
Classic Papers in Programming Languages and Logic

These papers provide a breadth of information about Programming Languages and Logic that is generally useful in the study of computation and interesting from a computer science perspective.


Contents

- Investigations into Logical Deduction
- Towards a Mathematical Semantics for Computer Languages
- Linear Logic
- The Essence of Algol
- Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs
- The Formulae-as-Types Notion of Construction
- A Type-Theoretical Alternative to ISWIM, CUCH, OWHY
- Abstract Types have Existential Types
- An Axiomatic Basis for Computer Programming
- An Evaluation Semantics for Classical Proofs
- Using Dependent Types to Express Modular Structure
- Higher-Order Modules and the Phase Distinction
- The Mechanical Evaluation of Expressions
- The Next 700 Programming Languages
- Communicating Sequential Processes
- Some Properties of Conversion
- Types, Abstraction, and Parametric Polymorphism
- Intuitionistic Type Theory
- A Structural Approach to Operational Semantics
- Towards a Theory of Type Structure
- Proof of a Program: FIND
- Guarded Commands, Nondeterminacy and Formal Derivation of Programs
- On the Meanings of the Logical Constants and the Justifications of the Logical Laws
- Notions of computation and monads
- Principal Type-Schemes for Functional Programs
- Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I
- Definitional interpreters for higher-order programming languages
- Fundamental Concepts in Programming Languages
- Computational lambda-calculus and monads
1977 ACM Turing Award Lecture

The 1977 ACM Turing Award was presented to John Backus at the ACM Annual Conference in Seattle, October 17. In introducing the recipient, Jean E. Sammet, Chairman of the Awards Committee, made the following comments and read a portion of the final citation. The full announcement is in the September 1977 issue of Communications, page 681.

"Probably there is nobody in the room who has not heard of Fortran and most of you have probably used it at least once, or at least looked over the shoulder of someone who was writing a Fortran program. There are probably almost as many people who have heard the letters BNF but don't necessarily know what they stand for. Well, the B is for Backus, and the other letters are explained in the formal citation. These two contributions, in my opinion, are among the half dozen most important technical contributions to the computer field and both were made by John Backus (which in the Fortran case also involved some colleagues). It is for these contributions that he is receiving this year's Turing award.

The short form of his citation is for 'profound, influential, and lasting contributions to the design of practical high-level programming systems, notably through his work on Fortran, and for seminal publication of formal procedures for the specifications of programming languages.'

The most significant part of the full citation is as follows:

'... Backus headed a small IBM group in New York City during the early 1950s. The earliest product of this group's efforts was a high-level language for scientific and technical computations called Fortran. This same group designed the first system to translate Fortran programs into machine language. They employed novel optimizing techniques to generate fast machine-language programs. Many other compilers for the language were developed, first on IBM machines, and later on virtually every make of computer. Fortran was adopted as a U.S. national standard in 1966.

During the latter part of the 1950s, Backus served on the international committees which developed Algol 58 and a later version, Algol 60. The language Algol, and its derivative compilers, received broad acceptance in Europe as a means for developing programs and as a formal means of publishing the algorithms on which the programs are based.

In 1959, Backus presented a paper at the UNESCO conference in Paris on the syntax and semantics of a proposed international algebraic language. In this paper, he was the first to employ a formal technique for specifying the syntax of programming languages. The formal notation became known as BNF -- standing for "Backus Normal Form," or "Backus Naur Form" to recognize the further contributions by Peter Naur of Denmark.

Thus, Backus has contributed strongly both to the pragmatic world of problem-solving on computers and to the theoretical world existing at the interface between artificial languages and computational linguistics. Fortran remains one of the most widely used programming languages in the world. Almost all programming languages are now described with some type of formal syntactic definition.' "

Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs

John Backus
IBM Research Laboratory, San Jose

Conventional programming languages are growing ever more enormous, but not stronger. Inherent defects at the most basic level cause them to be both fat and weak: their primitive word-at-a-time style of programming inherited from their common ancestor--the von Neumann computer, their close coupling of semantics to state transitions, their division of programming into a world of expressions and a world of statements, their inability to effectively use powerful combining forms for building new programs from existing ones, and their lack of useful mathematical properties for reasoning about programs.
An alternative functional style of programming is founded on the use of combining forms for creating programs. Functional programs deal with structured data, are often nonrepetitive and nonrecursive, are hierarchically constructed, do not name their arguments, and do not require the complex machinery of procedure declarations to become generally applicable. Combining forms can use high level programs to build still higher level ones in a style not possible in conventional languages.

General permission to make fair use in teaching or research of all or part of this material is granted to individual readers and to nonprofit libraries acting for them provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. To otherwise reprint a figure, table, other substantial excerpt, or the entire work requires specific permission as does republication, or systematic or multiple reproduction.
Author's address: 91 Saint Germain Ave., San Francisco, CA 94114.
© 1978 ACM 0001-0782/78/0800-0613 $00.75

Associated with the functional style of programming is an algebra of programs whose variables range over programs and whose operations are combining forms. This algebra can be used to transform programs and to solve equations whose "unknowns" are programs in much the same way one transforms equations in high school algebra. These transformations are given by algebraic laws and are carried out in the same language in which programs are written. Combining forms are chosen not only for their programming power but also for the power of their associated algebraic laws. General theorems of the algebra give the detailed behavior and termination conditions for large classes of programs.

A new class of computing systems uses the functional programming style both in its programming language and in its state transition rules. Unlike von Neumann languages, these systems have semantics loosely coupled to states--only one state transition occurs per major computation.

Key Words and Phrases: functional programming, algebra of programs, combining forms, functional forms, programming languages, von Neumann computers, von Neumann languages, models of computing systems, applicative computing systems, applicative state transition systems, program transformation, program correctness, program termination, metacomposition

CR Categories: 4.20, 4.29, 5.20, 5.24, 5.26

Introduction

I deeply appreciate the honor of the ACM invitation to give the 1977 Turing Lecture and to publish this account of it with the details promised in the lecture. Readers wishing to see a summary of this paper should turn to Section 16, the last section.

1. Conventional Programming Languages: Fat and Flabby

Programming languages appear to be in trouble. Each successive language incorporates, with a little cleaning up, all the features of its predecessors plus a few more. Some languages have manuals exceeding 500 pages; others cram a complex description into shorter manuals by using dense formalisms. The Department of Defense has current plans for a committee-designed language standard that could require a manual as long as 1,000 pages. Each new language claims new and fashionable features, such as strong typing or structured control statements, but the plain fact is that few languages make programming sufficiently cheaper or more reliable to justify the cost of producing and learning to use them.

Since large increases in size bring only small increases in power, smaller, more elegant languages such as Pascal continue to be popular. But there is a desperate need for a powerful methodology to help us think about programs, and no conventional language even begins to meet that need. In fact, conventional languages create unnecessary confusion in the way we think about programs.

For twenty years programming languages have been steadily progressing toward their present condition of obesity; as a result, the study and invention of programming languages has lost much of its excitement. Instead, it is now the province of those who prefer to work with thick compendia of details rather than wrestle with new ideas. Discussions about programming languages often resemble medieval debates about the number of angels that can dance on the head of a pin instead of exciting contests between fundamentally differing concepts.

Many creative computer scientists have retreated from inventing languages to inventing tools for describing them. Unfortunately, they have been largely content to apply their elegant new tools to studying the warts and moles of existing languages. After examining the appalling type structure of conventional languages, using the elegant tools developed by Dana Scott, it is surprising that so many of us remain passively content with that structure instead of energetically searching for new ones.

The purpose of this article is twofold; first, to suggest that basic defects in the framework of conventional languages make their expressive weakness and their cancerous growth inevitable, and second, to suggest some alternate avenues of exploration toward the design of new kinds of languages.

2. Models of Computing Systems

Underlying every programming language is a model of a computing system that its programs control. Some models are pure abstractions, some are represented by hardware, and others by compiling or interpretive programs. Before we examine conventional languages more closely, it is useful to make a brief survey of existing models as an introduction to the current universe of alternatives. Existing models may be crudely classified by the criteria outlined below.

2.1 Criteria for Models
2.1.1 Foundations. Is there an elegant and concise mathematical description of the model? Is it useful in proving helpful facts about the behavior of the model? Or is the model so complex that its description is bulky and of little mathematical use?
2.1.2 History sensitivity. Does the model include a notion of storage, so that one program can save information that can affect the behavior of a later program? That is, is the model history sensitive?
2.1.3 Type of semantics. Does a program successively transform states (which are not programs) until a terminal state is reached (state-transition semantics)? Are states simple or complex? Or can a "program" be successively reduced to simpler "programs" to yield a final "normal form program," which is the result (reduction semantics)?
"normal form program," which is the result (reduction three parts: a central processing unit (or CPU), a store,
semantics)? and a connecting tube that can transmit a single word
2.1.4 Clarity and conceptual usefulness of programs. between the CPU and the store (and send an address to
•Are programs of the model clear expressions of a process the store). I propose to call this tube the yon Neumann
or computation? Do they embody concepts that help us bottleneck. The task of a program is to change the
to formulate and reason about processes? contents of the store in some major way; when one
considers that this task must be accomplished entirely by
2.2 Classification of Models pumping single words back and forth through the von
Using the above criteria we can crudely characterize Neumann bottleneck, the reason for its name becomes
three classes of models for computing systems--simple clear.
operational models, applicative models, and von Neu- Ironically, a large part of the traffic in the bottleneck
mann models. is not useful data but merely names of data, as well as
2.2.1 Simple operational models. Examples: Turing operations and data used only to compute such names.
machines, various automata. Foundations: concise and Before a word can be sent through the tube its address
useful. History sensitivity: have storage, are history sen- must be in the CPU; hence it must either be sent through
sitive. Semantics: state transition with very simple states. the tube from the store or be generated by some CPU
Program clarity: programs unclear and conceptually not operation. If the address is sent from the store, then its
helpful. address must either have been sent from the store or
2.2.2 Applicative models. Examples: Church's generated in the CPU, and so on. If, on the other hand,
lambda calculus [5], Curry's system of combinators [6], the address is generated in the CPU, it must be generated
pure Lisp [17], functional programming systems de- either by a fixed rule (e.g., "add 1 to the program
scribed in this paper. Foundations: concise and useful. counter") or by an instruction that was sent through the
History sensitivity: no storage, not history sensitive. Se- tube, in which case its address must have been sent . . .
mantics: reduction semantics, no states. Program clarity: and so on.
programs can be clear and conceptually useful. Surely there must be a less primitive way of making
2.2.3 Von Neumann models. Examples: von Neu- big changes in the store than by pushing vast numbers
mann computers, conventional programming languages. of words back and forth through the von Neumann
Foundations: complex, bulky, not useful. History sensitiv- bottleneck. Not only is this tube a literal bottleneck for
ity: have storage, are history sensitive. Semantics: state the data traffic of a problem, but, more importantly, it is
transition with complex states. Program clarity: programs an intellectual bottleneck that has kept us tied to word-
can be moderately clear, are not very useful conceptually. at-a-time thinking instead of encouraging us to think in
The above classification is admittedly crude and terms of the larger conceptual units of the task at hand.
debatable. Some recent models may not fit easily into Thus programming is basically planning and detailing
any of these categories. For example, the data-flow the enormous traffic of words through the von Neumann
languages developed by Arvind and Gostelow [1], Den- bottleneck, and much of that traffic concerns not signif-
nis [7], Kosinski [13], and others partly fit the class of icant data itself but where to find it.
simple operational models, but their programs are clearer
than those of earlier models in the class and it is perhaps
possible to argue that some have reduction semantics. In 4. Von Neumann Languages
any event, this classification will serve as a crude map of
the territory to be discussed. We shall be concerned only Conventional programming languages are basically
with applicative and von Neumann models.
high level, complex versions of the von Neumann com-
puter. Our thirty year old belief that there is only one
kind of computer is the basis of our belief that there is
3. Von Neumann Computers only one kind of programming language, the conven-
tional--von Neumann--language. The differences be-
In order to understand the problems of conventional tween Fortran and Algol 68, although considerable, are
programming languages, we must first examine their less significant than the fact that both are based on the
intellectual parent, the von Neumann computer. What is programming style of the von Neumann computer. Al-
a v o n Neumann computer? When von Neumann and though I refer to conventional languages as "von Neu-
others conceived it over thirty years ago, it was an mann languages" to take note of their origin and style,
elegant, practical, and unifying idea that simplified a I do not, of course, blame the great mathematician for
number of engineering and programming problems that their complexity. In fact, some might say that I bear
existed then. Although the conditions that produced its some responsibility for that problem.
architecture have changed radically, we nevertheless still Von Neumann programming languages use variables
identify the notion of "computer" with this thirty year to imitate the computer's storage cells; control statements
old concept. elaborate its jump and test instructions; and assignment
In its simplest form a v o n Neumann computer has statements imitate its fetching, storing, and arithmetic.
The assignment statement is the von Neumann bottleneck of programming languages and keeps us thinking in word-at-a-time terms in much the same way the computer's bottleneck does.

Consider a typical program; at its center are a number of assignment statements containing some subscripted variables. Each assignment statement produces a one-word result. The program must cause these statements to be executed many times, while altering subscript values, in order to make the desired overall change in the store, since it must be done one word at a time. The programmer is thus concerned with the flow of words through the assignment bottleneck as he designs the nest of control statements to cause the necessary repetitions.

Moreover, the assignment statement splits programming into two worlds. The first world comprises the right sides of assignment statements. This is an orderly world of expressions, a world that has useful algebraic properties (except that those properties are often destroyed by side effects). It is the world in which most useful computation takes place.

The second world of conventional programming languages is the world of statements. The primary statement in that world is the assignment statement itself. All the other statements of the language exist in order to make it possible to perform a computation that must be based on this primitive construct: the assignment statement.

This world of statements is a disorderly one, with few useful mathematical properties. Structured programming can be seen as a modest effort to introduce some order into this chaotic world, but it accomplishes little in attacking the fundamental problems created by the word-at-a-time von Neumann style of programming, with its primitive use of loops, subscripts, and branching flow of control.

Our fixation on von Neumann languages has continued the primacy of the von Neumann computer, and our dependency on it has made non-von Neumann languages uneconomical and has limited their development. The absence of full scale, effective programming styles founded on non-von Neumann principles has deprived designers of an intellectual foundation for new computer architectures. (For a brief discussion of that topic, see Section 15.)

Applicative computing systems' lack of storage and history sensitivity is the basic reason they have not provided a foundation for computer design. Moreover, most applicative systems employ the substitution operation of the lambda calculus as their basic operation. This operation is one of virtually unlimited power, but its complete and efficient realization presents great difficulties to the machine designer. Furthermore, in an effort to introduce storage and to improve their efficiency on von Neumann computers, applicative systems have tended to become engulfed in a large von Neumann system. For example, pure Lisp is often buried in large extensions with many von Neumann features. The resulting complex systems offer little guidance to the machine designer.

5. Comparison of von Neumann and Functional Programs

To get a more detailed picture of some of the defects of von Neumann languages, let us compare a conventional program for inner product with a functional one written in a simple language to be detailed further on.

5.1 A von Neumann Program for Inner Product

  c := 0
  for i := 1 step 1 until n do
    c := c + a[i] × b[i]

Several properties of this program are worth noting:
a) Its statements operate on an invisible "state" according to complex rules.
b) It is not hierarchical. Except for the right side of the assignment statement, it does not construct complex entities from simpler ones. (Larger programs, however, often do.)
c) It is dynamic and repetitive. One must mentally execute it to understand it.
d) It computes word-at-a-time by repetition (of the assignment) and by modification (of variable i).
e) Part of the data, n, is in the program; thus it lacks generality and works only for vectors of length n.
f) It names its arguments; it can only be used for vectors a and b. To become general, it requires a procedure declaration. These involve complex issues (e.g., call-by-name versus call-by-value).
g) Its "housekeeping" operations are represented by symbols in scattered places (in the for statement and the subscripts in the assignment). This makes it impossible to consolidate housekeeping operations, the most common of all, into single, powerful, widely useful operators. Thus in programming those operations one must always start again at square one, writing "for i := ..." and "for j := ..." followed by assignment statements sprinkled with i's and j's.

5.2 A Functional Program for Inner Product

  Def Innerproduct ≡ (Insert +)∘(ApplyToAll ×)∘Transpose

Or, in abbreviated form:

  Def IP ≡ (/+)∘(α×)∘Trans.

Composition (∘), Insert (/), and ApplyToAll (α) are functional forms that combine existing functions to form new ones. Thus f∘g is the function obtained by applying first g and then f, and αf is the function obtained by applying f to every member of the argument. If we write f:x for the result of applying f to the object x, then we can explain each step in evaluating Innerproduct applied to the pair of vectors <<1,2,3>, <6,5,4>> as follows:

  IP: <<1,2,3>, <6,5,4>> =
  Definition of IP            (/+)∘(α×)∘Trans: <<1,2,3>, <6,5,4>>
  Effect of composition, ∘    (/+):((α×):(Trans: <<1,2,3>, <6,5,4>>))
  Applying Transpose          (/+):((α×): <<1,6>, <2,5>, <3,4>>)
  Effect of ApplyToAll, α     (/+): <×: <1,6>, ×: <2,5>, ×: <3,4>>
  Applying ×                  (/+): <6,10,12>
  Effect of Insert, /         +: <6, +: <10,12>>
  Applying +                  +: <6,22>
  Applying + again            28

Let us compare the properties of this program with those of the von Neumann program.
a) It operates only on its arguments. There are no hidden states or complex transition rules. There are only two kinds of rules, one for applying a function to its argument, the other for obtaining the function denoted by a functional form such as composition, f∘g, or ApplyToAll, αf, when one knows the functions f and g, the parameters of the forms.
b) It is hierarchical, being built from three simpler functions (+, ×, Trans) and three functional forms f∘g, αf, and /f.
c) It is static and nonrepetitive, in the sense that its structure is helpful in understanding it without mentally executing it. For example, if one understands the action of the forms f∘g and αf, and of the functions × and Trans, then one understands the action of α× and of (α×)∘Trans, and so on.
d) It operates on whole conceptual units, not words; it has three steps; no step is repeated.
e) It incorporates no data; it is completely general; it works for any pair of conformable vectors.
f) It does not name its arguments; it can be applied to any pair of vectors without any procedure declaration or complex substitution rules.
g) It employs housekeeping forms and functions that are generally useful in many other programs; in fact, only + and × are not concerned with housekeeping. These forms and functions can combine with others to create higher level housekeeping operators.

Section 14 sketches a kind of system designed to make the above functional style of programming available in a history-sensitive system with a simple framework, but much work remains to be done before the above applicative style can become the basis for elegant and practical programming languages. For the present, the above comparison exhibits a number of serious flaws in von Neumann programming languages and can serve as a starting point in an effort to account for their present fat and flabby condition.

6. Language Frameworks versus Changeable Parts

Let us distinguish two parts of a programming language. First, its framework which gives the overall rules of the system, and second, its changeable parts, whose existence is anticipated by the framework but whose particular behavior is not specified by it. For example, the for statement, and almost all other statements, are part of Algol's framework but library functions and user-defined procedures are changeable parts. Thus the framework of a language describes its fixed features and provides a general environment for its changeable features.

Now suppose a language had a small framework which could accommodate a great variety of powerful features entirely as changeable parts. Then such a framework could support many different features and styles without being changed itself. In contrast to this pleasant possibility, von Neumann languages always seem to have an immense framework and very limited changeable parts. What causes this to happen? The answer concerns two problems of von Neumann languages.

The first problem results from the von Neumann style of word-at-a-time programming, which requires that words flow back and forth to the state, just like the flow through the von Neumann bottleneck. Thus a von Neumann language must have a semantics closely coupled to the state, in which every detail of a computation changes the state. The consequence of this semantics closely coupled to states is that every detail of every feature must be built into the state and its transition rules.

Thus every feature of a von Neumann language must be spelled out in stupefying detail in its framework. Furthermore, many complex features are needed to prop up the basically weak word-at-a-time style. The result is the inevitable rigid and enormous framework of a von Neumann language.

7. Changeable Parts and Combining Forms

The second problem of von Neumann languages is that their changeable parts have so little expressive power. Their gargantuan size is eloquent proof of this; after all, if the designer knew that all those complicated features, which he now builds into the framework, could be added later on as changeable parts, he would not be so eager to build them into the framework.

Perhaps the most important element in providing powerful changeable parts in a language is the availability of combining forms that can be generally used to build new procedures from old ones. Von Neumann languages provide only primitive combining forms, and the von Neumann framework presents obstacles to their full use.

One obstacle to the use of combining forms is the split between the expression world and the statement world in von Neumann languages. Functional forms naturally belong to the world of expressions; but no matter how powerful they are they can only build expressions that produce a one-word result. And it is in the statement world that these one-word results must be combined into the overall result. Combining single words is not what we really should be thinking about, but it is a large part of programming any task in von Neumann languages. To help assemble the overall result from single words these languages provide some primitive combining forms in the statement world--the for, while, and if-then-else statements--but the split between the two worlds prevents the combining forms in either world from attaining the full power they can achieve in an undivided world.
A second obstacle to the use of combining forms in von Neumann languages is their use of elaborate naming conventions, which are further complicated by the substitution rules required in calling procedures. Each of these requires a complex mechanism to be built into the framework so that variables, subscripted variables, pointers, file names, procedure names, call-by-value formal parameters, call-by-name formal parameters, and so on, can all be properly interpreted. All these names, conventions, and rules interfere with the use of simple combining forms.

8. APL versus Word-at-a-Time Programming

Since I have said so much about word-at-a-time programming, I must now say something about APL [12]. We owe a great debt to Kenneth Iverson for showing us that there are programs that are neither word-at-a-time nor dependent on lambda expressions, and for introducing us to the use of new functional forms. And since APL assignment statements can store arrays, the effect of its functional forms is extended beyond a single assignment.

Unfortunately, however, APL still splits programming into a world of expressions and a world of statements. Thus the effort to write one-line programs is partly motivated by the desire to stay in the more orderly world of expressions. APL has exactly three functional forms, called inner product, outer product, and reduction. These are sometimes difficult to use, there are not enough of them, and their use is confined to the world of expressions.

Finally, APL semantics is still too closely coupled to states. Consequently, despite the greater simplicity and power of the language, its framework has the complexity and rigidity characteristic of von Neumann languages.

9. Von Neumann Languages Lack Useful Mathematical Properties

So far we have discussed the gross size and inflexibility of von Neumann languages; another important defect is their lack of useful mathematical properties and the obstacles they present to reasoning about programs. Although a great amount of excellent work has been published on proving facts about programs, von Neumann languages have almost no properties that are helpful in this direction and have many properties that are obstacles (e.g., side effects, aliasing).

Denotational semantics [23] and its foundations [20, 21] provide an extremely helpful mathematical understanding of the domain and function spaces implicit in programs. When applied to an applicative language (such as that of the "recursive programs" of [16]), its foundations provide powerful tools for describing the language and for proving properties of programs. When applied to a von Neumann language, on the other hand, it provides a precise semantic description and is helpful in identifying trouble spots in the language. But the complexity of the language is mirrored in the complexity of the description, which is a bewildering collection of productions, domains, functions, and equations that is only slightly more helpful in proving facts about programs than the reference manual of the language, since it is less ambiguous.

Axiomatic semantics [11] precisely restates the inelegant properties of von Neumann programs (i.e., transformations on states) as transformations on predicates. The word-at-a-time, repetitive game is not thereby changed, merely the playing field. The complexity of this axiomatic game of proving facts about von Neumann programs makes the successes of its practitioners all the more admirable. Their success rests on two factors in addition to their ingenuity: First, the game is restricted to small, weak subsets of full von Neumann languages that have states vastly simpler than real ones. Second, the new playing field (predicates and their transformations) is richer, more orderly and effective than the old (states and their transformations). But restricting the game and transferring it to a more effective domain does not enable it to handle real programs (with the necessary complexities of procedure calls and aliasing), nor does it eliminate the clumsy properties of the basic von Neumann style. As axiomatic semantics is extended to cover more of a typical von Neumann language, it begins to lose its effectiveness with the increasing complexity that is required.

Thus denotational and axiomatic semantics are descriptive formalisms whose foundations embody elegant and powerful concepts; but using them to describe a von Neumann language can not produce an elegant and powerful language any more than the use of elegant and modern machines to build an Edsel can produce an elegant and modern car.

In any case, proofs about programs use the language of logic, not the language of programming. Proofs talk about programs but cannot involve them directly since the axioms of von Neumann languages are so unusable. In contrast, many ordinary proofs are derived by algebraic methods. These methods require a language that has certain algebraic properties. Algebraic laws can then be used in a rather mechanical way to transform a problem into its solution. For example, to solve the equation

  ax + bx = a + b

for x (given that a + b ≠ 0), we mechanically apply the distributive, identity, and cancellation laws, in succession, to obtain

  (a + b)x = a + b
  (a + b)x = (a + b)·1
  x = 1.

Thus we have proved that x = 1 without leaving the "language" of algebra. Von Neumann languages, with their grotesque syntax, offer few such possibilities for transforming programs.

As we shall see later, programs can be expressed in a language that has an associated algebra. This algebra can be used to transform programs and to solve some equations whose "unknowns" are programs, in much the same way one solves equations in high school algebra. Algebraic transformations and proofs use the language of the programs themselves, rather than the language of logic, which talks about programs.

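Editor's note (not part of Backus's text): the kind of algebraic law Backus has in mind survives in today's functional languages. The Haskell fragment below is our illustration; the law used is the standard map-fusion law, map f . map g = map (f . g), and the function name doublePlusOne is invented.

  -- Two programs related by an algebraic law. Rewriting the first into
  -- the second is a mechanical application of map fusion, in the same
  -- spirit as the high school derivation above: here the "unknown"
  -- being transformed is the program itself.
  doublePlusOne :: [Int] -> [Int]
  doublePlusOne = map (+ 1) . map (* 2)     -- two passes over the list

  doublePlusOne' :: [Int] -> [Int]
  doublePlusOne' = map ((+ 1) . (* 2))      -- one pass; the same function, by the law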
10. What Are the Alternatives to von Neumann Languages?

Before discussing alternatives to von Neumann languages, let me remark that I regret the need for the above negative and not very precise discussion of these languages. But the complacent acceptance most of us give to these enormous, weak languages has puzzled and disturbed me for a long time. I am disturbed because that acceptance has consumed a vast effort toward making von Neumann languages fatter that might have been better spent in looking for new structures. For this reason I have tried to analyze some of the basic defects of conventional languages and show that those defects cannot be resolved unless we discover a new kind of language framework.

In seeking an alternative to conventional languages we must first recognize that a system cannot be history sensitive (permit execution of one program to affect the behavior of a subsequent one) unless the system has some kind of state (which the first program can change and the second can access). Thus a history-sensitive model of a computing system must have a state-transition semantics, at least in this weak sense. But this does not mean that every computation must depend heavily on a complex state, with many state changes required for each small part of the computation (as in von Neumann languages).

To illustrate some alternatives to von Neumann languages, I propose to sketch a class of history-sensitive computing systems, where each system: a) has a loosely coupled state-transition semantics in which a state transition occurs only once in a major computation; b) has a simply structured state and simple transition rules; c) depends heavily on an underlying applicative system both to provide the basic programming language of the system and to describe its state transitions.

These systems, which I call applicative state transition (or AST) systems, are described in Section 14. These simple systems avoid many of the complexities and weaknesses of von Neumann languages and provide for a powerful and extensive set of changeable parts. However, they are sketched only as crude examples of a vast area of non-von Neumann systems with various attractive properties. I have been studying this area for the past three or four years and have not yet found a satisfying solution to the many conflicting requirements that a good language must resolve. But I believe this search has indicated a useful approach to designing non-von Neumann languages.

This approach involves four elements, which can be summarized as follows.
a) A functional style of programming without variables. A simple, informal functional programming (FP) system is described. It is based on the use of combining forms for building programs. Several programs are given to illustrate functional programming.
b) An algebra of functional programs. An algebra is described whose variables denote FP functional programs and whose "operations" are FP functional forms, the combining forms of FP programs. Some laws of the algebra are given. Theorems and examples are given that show how certain function expressions may be transformed into equivalent infinite expansions that explain the behavior of the function. The FP algebra is compared with algebras associated with the classical applicative systems of Church and Curry.
c) A formal functional programming system. A formal (FFP) system is described that extends the capabilities of the above informal FP systems. An FFP system is thus a precisely defined system that provides the ability to use the functional programming style of FP systems and their algebra of programs. FFP systems can be used as the basis for applicative state transition systems.
d) Applicative state transition systems. As discussed above. The rest of the paper describes these four elements, gives some brief remarks on computer design, and ends with a summary of the paper.

11. Functional Programming Systems (FP Systems)

11.1 Introduction
In this section we give an informal description of a class of simple applicative programming systems called functional programming (FP) systems, in which "programs" are simply functions without variables. The description is followed by some examples and by a discussion of various properties of FP systems.

An FP system is founded on the use of a fixed set of combining forms called functional forms. These, plus simple definitions, are the only means of building new functions from existing ones; they use no variables or substitution rules, and they become the operations of an associated algebra of programs. All the functions of an FP system are of one type: they map objects into objects and always take a single argument.
In contrast, a lambda-calculus based system is founded on the use of the lambda expression, with an associated set of substitution rules for variables, for building new functions. The lambda expression (with its substitution rules) is capable of defining all possible computable functions of all possible types and of any number of arguments. This freedom and power has its disadvantages as well as its obvious advantages. It is analogous to the power of unrestricted control statements in conventional languages: with unrestricted freedom comes chaos. If one constantly invents new combining forms to suit the occasion, as one can in the lambda calculus, one will not become familiar with the style or useful properties of the few combining forms that are adequate for all purposes. Just as structured programming eschews many control statements to obtain programs with simpler structure, better properties, and uniform methods for understanding their behavior, so functional programming eschews the lambda expression, substitution, and multiple function types. It thereby achieves programs built with familiar functional forms with known useful properties. These programs are so structured that their behavior can often be understood and proven by mechanical use of algebraic techniques similar to those used in solving high school algebra problems.

Functional forms, unlike most programming constructs, need not be chosen on an ad hoc basis. Since they are the operations of an associated algebra, one chooses only those functional forms that not only provide powerful programming constructs, but that also have attractive algebraic properties: one chooses them to maximize the strength and utility of the algebraic laws that relate them to other functional forms of the system.

In the following description we shall be imprecise in not distinguishing between (a) a function symbol or expression and (b) the function it denotes. We shall indicate the symbols and expressions used to denote functions by example and usage. Section 13 describes a formal extension of FP systems (FFP systems); they can serve to clarify any ambiguities about FP systems.

11.2 Description
An FP system comprises the following:
1) a set O of objects;
2) a set F of functions f that map objects into objects;
3) an operation, application;
4) a set F of functional forms; these are used to combine existing functions, or objects, to form new functions in F;
5) a set D of definitions that define some functions in F and assign a name to each.
What follows is an informal description of each of the above entities with examples.

11.2.1 Objects, O. An object x is either an atom, a sequence <x1, ..., xn> whose elements xi are objects, or ⊥ ("bottom" or "undefined"). Thus the choice of a set A of atoms determines the set of objects. We shall take A to be the set of nonnull strings of capital letters, digits, and special symbols not used by the notation of the FP system. Some of these strings belong to the class of atoms called "numbers." The atom φ is used to denote the empty sequence and is the only object which is both an atom and a sequence. The atoms T and F are used to denote "true" and "false."

There is one important constraint in the construction of objects: if x is a sequence with ⊥ as an element, then x = ⊥. That is, the "sequence constructor" is "⊥-preserving." Thus no proper sequence has ⊥ as an element.

Examples of objects

  ⊥   1.5   φ   AB3   <AB, 1, 2.3>
  <A, <<B>, C>, D>   <A, ⊥> = ⊥

11.2.2 Application. An FP system has a single operation, application. If f is a function and x is an object, then f:x is an application and denotes the object which is the result of applying f to x. f is the operator of the application and x is the operand.

Examples of applications

  +:<1,2> = 3   tl:<A,B,C> = <B,C>
  1:<A,B,C> = A   2:<A,B,C> = B

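Editor's note (not part of Backus's text): the object domain of 11.2.1 and the application operation of 11.2.2 can be modeled directly in Haskell. The sketch below is ours; the names Object, mkSeq, sel1, and tl are invented, and Nothing-style partiality is replaced by an explicit Bottom value so that bottom-preservation can be written out.

  -- FP objects: atoms (including numbers and the truth values T, F),
  -- sequences, and an explicit bottom. phi is Seq [].
  data Object = Atom String
              | Num Double
              | Seq [Object]
              | Bottom
    deriving (Eq, Show)

  -- The sequence constructor is bottom-preserving: any sequence with
  -- Bottom as an element collapses to Bottom itself (11.2.1).
  mkSeq :: [Object] -> Object
  mkSeq xs | Bottom `elem` xs = Bottom
           | otherwise        = Seq xs

  -- Application f:x is ordinary application of an Object -> Object
  -- function; for example the selector 1 and the tail function tl,
  -- whose FP definitions appear in 11.2.3 below:
  sel1, tl :: Object -> Object
  sel1 (Seq (x : _)) = x
  sel1 _             = Bottom
  tl (Seq (_ : xs))  = Seq xs      -- tl:<x1> = phi, the empty sequence
  tl _               = Bottom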
11.2.3 Functions, F. All functions f in F map objects into objects and are bottom-preserving: f:⊥ = ⊥, for all f in F. Every function in F is either primitive, that is, supplied with the system, or it is defined (see below), or it is a functional form (see below).

It is sometimes useful to distinguish between two cases in which f:x = ⊥. If the computation for f:x terminates and yields the object ⊥, we say f is undefined at x, that is, f terminates but has no meaningful value at x. Otherwise we say f is nonterminating at x.

Examples of primitive functions

Our intention is to provide FP systems with widely useful and powerful primitive functions rather than weak ones that could then be used to define useful ones. The following examples define some typical primitive functions, many of which are used in later examples of programs. In the following definitions we use a variant of McCarthy's conditional expressions [17]; thus we write

  p1 → e1; ... ; pn → en; en+1

instead of McCarthy's expression

  (p1 → e1, ..., pn → en, T → en+1).

The following definitions are to hold for all objects x, xi, y, yi, z, zi:

Selector functions
  1:x ≡ x = <x1, ..., xn> → x1; ⊥
and for any positive integer s
  s:x ≡ x = <x1, ..., xn> & n ≥ s → xs; ⊥
Thus, for example, 3:<A,B,C> = C and 2:<A> = ⊥. Note that the function symbols 1, 2, etc. are distinct from the atoms 1, 2, etc.

Tail
  tl:x ≡ x = <x1> → φ;
    x = <x1, ..., xn> & n ≥ 2 → <x2, ..., xn>; ⊥

Identity
  id:x ≡ x

Atom
  atom:x ≡ x is an atom → T; x ≠ ⊥ → F; ⊥

Equals
  eq:x ≡ x = <y,z> & y = z → T; x = <y,z> & y ≠ z → F; ⊥

Null
  null:x ≡ x = φ → T; x ≠ ⊥ → F; ⊥

Reverse
  reverse:x ≡ x = φ → φ;
    x = <x1, ..., xn> → <xn, ..., x1>; ⊥

Distribute from left; distribute from right
  distl:x ≡ x = <y,φ> → φ;
    x = <y, <z1, ..., zn>> → <<y,z1>, ..., <y,zn>>; ⊥
  distr:x ≡ x = <φ,y> → φ;
    x = <<y1, ..., yn>, z> → <<y1,z>, ..., <yn,z>>; ⊥

Length
  length:x ≡ x = <x1, ..., xn> → n; x = φ → 0; ⊥

Add, subtract, multiply, and divide
  +:x ≡ x = <y,z> & y,z are numbers → y+z; ⊥
  −:x ≡ x = <y,z> & y,z are numbers → y−z; ⊥
  ×:x ≡ x = <y,z> & y,z are numbers → y×z; ⊥
  ÷:x ≡ x = <y,z> & y,z are numbers → y÷z; ⊥
    (where y÷0 = ⊥)

Transpose
  trans:x ≡ x = <φ, ..., φ> → φ;
    x = <x1, ..., xn> → <y1, ..., ym>; ⊥
  where xi = <xi1, ..., xim> and yj = <x1j, ..., xnj>, 1 ≤ i ≤ n, 1 ≤ j ≤ m.

And, or, not
  and:x ≡ x = <T,T> → T;
    x = <T,F> ∨ x = <F,T> ∨ x = <F,F> → F; ⊥
  etc.

Append left; append right
  apndl:x ≡ x = <y,φ> → <y>;
    x = <y, <z1, ..., zn>> → <y, z1, ..., zn>; ⊥
  apndr:x ≡ x = <φ,z> → <z>;
    x = <<y1, ..., yn>, z> → <y1, ..., yn, z>; ⊥

Right selectors; right tail
  1r:x ≡ x = <x1, ..., xn> → xn; ⊥
  2r:x ≡ x = <x1, ..., xn> & n ≥ 2 → xn−1; ⊥
  etc.
  tlr:x ≡ x = <x1> → φ;
    x = <x1, ..., xn> & n ≥ 2 → <x1, ..., xn−1>; ⊥

Rotate left; rotate right
  rotl:x ≡ x = φ → φ; x = <x1> → <x1>;
    x = <x1, ..., xn> & n ≥ 2 → <x2, ..., xn, x1>; ⊥
  etc.

11.2.4 Functional forms, F. A functional form is an expression denoting a function; that function depends on the functions or objects which are the parameters of the expression. Thus, for example, if f and g are any functions, then f∘g is a functional form, the composition of f and g, f and g are its parameters, and it denotes the function such that, for any object x,

  (f∘g):x = f:(g:x).

Some functional forms may have objects as parameters. For example, for any object x, x̄ is a functional form, the constant function of x, so that for any object y

  x̄:y ≡ y = ⊥ → ⊥; x.

In particular, ⊥̄ is the everywhere-⊥ function.

Below we give some functional forms, many of which are used later in this paper. We use p, f, and g with and without subscripts to denote arbitrary functions; and x, x1, ..., xn, y as arbitrary objects. Square brackets [...] are used to indicate the functional form for construction, which denotes a function, whereas pointed brackets <...> denote sequences, which are objects. Parentheses are used both in particular functional forms (e.g., in condition) and generally to indicate grouping.

Composition
  (f∘g):x ≡ f:(g:x)

Construction
  [f1, ..., fn]:x ≡ <f1:x, ..., fn:x>
(Recall that since <..., ⊥, ...> = ⊥ and all functions are ⊥-preserving, so is [f1, ..., fn].)

Condition
  (p → f; g):x ≡ (p:x) = T → f:x; (p:x) = F → g:x; ⊥

Conditional expressions (used outside of FP systems to describe their functions) and the functional form condition are both identified by "→". They are quite different although closely related, as shown in the above definitions. But no confusion should arise, since the elements of a conditional expression all denote values, whereas the elements of the functional form condition all denote functions, never values. When no ambiguity arises we omit right-associated parentheses; we write, for example, p1 → f1; p2 → f2; g for (p1 → f1; (p2 → f2; g)).

Constant (Here x is an object parameter.)
  x̄:y ≡ y = ⊥ → ⊥; x

Insert
  /f:x ≡ x = <x1> → x1;
    x = <x1, ..., xn> & n ≥ 2 → f:<x1, /f:<x2, ..., xn>>; ⊥
If f has a unique right unit u ≠ ⊥, where f:<x,u> ∈ {x, ⊥} for all objects x, then the above definition is extended: /f:φ = u. Thus

  /+:<4,5,6> = +:<4, +:<5, /+:<6>>>
    = +:<4, +:<5,6>> = 15
  /+:φ = 0

Apply to all
  αf:x ≡ x = φ → φ;
    x = <x1, ..., xn> → <f:x1, ..., f:xn>; ⊥

Binary to unary (x is an object parameter)
  (bu f x):y ≡ f:<x,y>
Thus
  (bu + 1):x = 1 + x

While
  (while p f):x ≡ p:x = T → (while p f):(f:x);
    p:x = F → x; ⊥

The above functional forms provide an effective method for computing the values of the functions they denote (if they terminate) provided one can effectively apply their function parameters.
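Editor's note (not part of Backus's text): in Haskell these functional forms are ordinary higher-order functions. The correspondence below is our gloss, using Haskell lists and total functions rather than FP objects; partiality and bottom-preservation are ignored, and all the names are invented.

  -- Composition: (f o g):x = f:(g:x) is just (.)
  compose :: (b -> c) -> (a -> b) -> (a -> c)
  compose f g = f . g

  -- Construction: [f1, ..., fn]:x = <f1:x, ..., fn:x>
  construction :: [a -> b] -> a -> [b]
  construction fs x = map ($ x) fs

  -- Condition: (p -> f; g)
  condition :: (a -> Bool) -> (a -> b) -> (a -> b) -> (a -> b)
  condition p f g x = if p x then f x else g x

  -- Constant: xbar returns x for every argument
  constant :: b -> a -> b
  constant = const

  -- Insert: /f folds a nonempty sequence from the right
  -- (on an empty list foldr1 fails, matching /f:phi = bottom
  -- in the absence of a unit)
  insertForm :: (a -> a -> a) -> [a] -> a
  insertForm = foldr1

  -- Apply to all: alpha f
  applyToAll :: (a -> b) -> [a] -> [b]
  applyToAll = map

  -- Binary to unary: (bu f x):y = f:<x,y>
  bu :: ((a, b) -> c) -> a -> b -> c
  bu f x y = f (x, y)

  -- While: (while p f):x applies f repeatedly while p holds
  while :: (a -> Bool) -> (a -> a) -> a -> a
  while p f x = if p x then while p f (f x) else x

  -- e.g. insertForm (+) (applyToAll (uncurry (*)) [(1,6),(2,5),(3,4)]) == 28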
11.2.5 Definitions. A definition in an FP system is an expression of the form

  Def l ≡ r

where the left side l is an unused function symbol and the right side r is a functional form (which may depend on l). It expresses the fact that the symbol l is to denote the function given by r. Thus the definition Def last1 ≡ 1∘reverse defines the function last1 that produces the last element of a sequence (or ⊥). Similarly,

  Def last ≡ null∘tl → 1; last∘tl

defines the function last, which is the same as last1. Here in detail is how the definition would be used to compute last:<1,2>:

  last:<1,2> =
  definition of last               (null∘tl → 1; last∘tl):<1,2>
  action of the form (p → f; g)    last∘tl:<1,2>
    since null∘tl:<1,2> = null:<2> = F
  action of the form f∘g           last:(tl:<1,2>)
  definition of primitive tail     last:<2>
  definition of last               (null∘tl → 1; last∘tl):<2>
  action of the form (p → f; g)    1:<2>
    since null∘tl:<2> = null:φ = T
  definition of selector 1         2

The above illustrates the simple rule: to apply a defined symbol, replace it by the right side of its definition. Of course, some definitions may define nonterminating functions. A set D of definitions is well formed if no two left sides are the same.
11.2.6 Semantics. It can be seen from the above that an FP system is determined by choice of the following sets: (a) The set of atoms A (which determines the set of objects). (b) The set of primitive functions P. (c) The set of functional forms F. (d) A well formed set of definitions D. To understand the semantics of such a system one needs to know how to compute f:x for any function f and any object x of the system. There are exactly four possibilities for f:
(1) f is a primitive function;
(2) f is a functional form;
(3) there is one definition in D, Def f ≡ r; and
(4) none of the above.
If f is a primitive function, then one has its description and knows how to apply it. If f is a functional form, then the description of the form tells how to compute f:x in terms of the parameters of the form, which can be done by further use of these rules. If f is defined, Def f ≡ r, as in (3), then to find f:x one computes r:x, which can be done by further use of these rules. If none of these, then f:x ≡ ⊥. Of course, the use of these rules may not terminate for some f and some x, in which case we assign the value f:x ≡ ⊥.
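Editor's note (not part of Backus's text): the four-way evaluation rule above can be transcribed as a small interpreter. The Haskell sketch below is ours and covers only a fragment -- primitive names, composition, and defined symbols; the types Fn, Prims, and Defs are invented, and Nothing plays the role of ⊥ for rule (4).

  import qualified Data.Map as Map

  -- A tiny fragment of FP function expressions: a primitive name,
  -- a composition f o g, or a symbol defined in D by Def name = r.
  data Fn = Prim String
          | Comp Fn Fn
          | Name String

  type Prims a = Map.Map String (a -> Maybe a)
  type Defs    = Map.Map String Fn

  -- apply prims defs f x computes f:x by the four rules of 11.2.6;
  -- a failed lookup is rule (4), f:x = bottom. As the paper notes,
  -- the recursion through Name need not terminate.
  apply :: Prims a -> Defs -> Fn -> a -> Maybe a
  apply prims defs f x = case f of
    Prim s   -> Map.lookup s prims >>= \p -> p x                -- rule (1)
    Comp g h -> apply prims defs h x >>= apply prims defs g     -- rule (2)
    Name n   -> Map.lookup n defs >>= \r -> apply prims defs r x -- rule (3)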
11.3 Examples of Functional Programs
The following examples illustrate the functional programming style. Since this style is unfamiliar to most readers, it may cause confusion at first; the important point to remember is that no part of a function definition is a result itself. Instead, each part is a function that must be applied to an argument to obtain a result.

11.3.1 Factorial.

  Def ! ≡ eq0 → 1̄; ×∘[id, !∘sub1]

where

  Def eq0 ≡ eq∘[id, 0̄]
  Def sub1 ≡ −∘[id, 1̄]

Here are some of the intermediate expressions an FP system would obtain in evaluating !:2:

  !:2 → (eq0 → 1̄; ×∘[id, !∘sub1]):2
      → ×∘[id, !∘sub1]:2
      → ×:<id:2, !∘sub1:2> → ×:<2, !:1>
      → ×:<2, ×:<1, !:0>>
      → ×:<2, ×:<1,1>> → ×:<2,1> → 2.

In Section 12 we shall see how theorems of the algebra of FP programs can be used to prove that ! is the factorial function.

11.3.2 Inner product. We have seen earlier how this definition works.

  Def IP ≡ (/+)∘(α×)∘trans
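Editor's note (not part of Backus's text): the factorial definition can be transcribed into Haskell while keeping the same combining-form structure, rather than rewriting it as the usual explicit recursion on n. The names cond and fork below are our stand-ins for the condition form and the two-function case of construction.

  -- Def ! = eq0 -> 1bar; x o [id, ! o sub1], with ordinary Haskell
  -- functions: cond plays condition, fork plays [f, g], and
  -- const 1 plays the constant function 1bar.
  cond :: (a -> Bool) -> (a -> b) -> (a -> b) -> (a -> b)
  cond p f g x = if p x then f x else g x

  fork :: (a -> b) -> (a -> c) -> a -> (b, c)
  fork f g x = (f x, g x)

  fact :: Integer -> Integer
  fact = cond (== 0) (const 1) (uncurry (*) . fork id (fact . subtract 1))

  -- fact 2 unwinds exactly as the trace of !:2 above: 2 * (1 * 1) = 2.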
11.3.3 Matrix multiply. This matrix multiplication program yields the product of any pair <m,n> of conformable matrices, where each matrix m is represented as the sequence of its rows:

  m = <m1, ..., mr>
    where mi = <mi1, ..., mis> for i = 1, ..., r.

  Def MM ≡ (ααIP)∘(αdistl)∘distr∘[1, trans∘2]

The program MM has four steps, reading from right to left; each is applied in turn, beginning with [1, trans∘2], to the result of its predecessor. If the argument is <m,n>, then the first step yields <m,n'> where n' = trans:n. The second step yields <<m1,n'>, ..., <mr,n'>>, where the mi are the rows of m. The third step, αdistl, yields

  <distl:<m1,n'>, ..., distl:<mr,n'>> = <p1, ..., pr>

where

  pi = distl:<mi,n'> = <<mi,n1'>, ..., <mi,ns'>>  for i = 1, ..., r

and nj' is the jth column of n (the jth row of n'). Thus pi, a sequence of row and column pairs, corresponds to the i-th product row. The operator ααIP, or α(αIP), causes αIP to be applied to each pi, which in turn causes IP to be applied to each row and column pair in each pi. The result of the last step is therefore the sequence of rows comprising the product matrix. If either matrix is not rectangular, or if the length of a row of m differs from that of a column of n, or if any element of m or n is not a number, the result is ⊥.

This program MM does not name its arguments or any intermediate results; contains no variables, no loops, no control statements nor procedure declarations; has no initialization instructions; is not word-at-a-time in nature; is hierarchically constructed from simpler components; uses generally applicable housekeeping forms and operators (e.g., αf, distl, distr, trans); is perfectly general; yields ⊥ whenever its argument is inappropriate in any way; does not constrain the order of evaluation unnecessarily (all applications of IP to row and column pairs can be done in parallel or in any order); and, using algebraic laws (see below), can be transformed into more "efficient" or into more "explanatory" programs (e.g., one that is recursively defined). None of these properties hold for the typical von Neumann matrix multiplication program.

Although it has an unfamiliar and hence puzzling form, the program MM describes the essential operations of matrix multiplication without overdetermining the process or obscuring parts of it, as most programs do; hence many straightforward programs for the operation can be obtained from it by formal transformations. It is an inherently inefficient program for von Neumann computers (with regard to the use of space), but efficient ones can be derived from it and realizations of FP systems can be imagined that could execute MM without the prodigal use of space it implies. Efficiency questions are beyond the scope of this paper; let me suggest only that since the language is so simple and does not dictate any binding of lambda-type variables to data, there may be better opportunities for the system to do some kind of "lazy" evaluation [9, 10] and to control data management more efficiently than is possible in lambda-calculus based systems.
11.4 Remarks About FP Systems which a language A would be m o r e expressive than
11.4.1 FP systems as programming languages. FP language B under the following roughly stated condi-
systems are so minimal that some readers may find it tions. First, form all possible functions of all types in A
difficult to view them as programming languages. by applying all existing functions to objects and to each
Viewed as such, a f u n c t i o n f i s a program, an object x is other in all possible ways until no new function of any
the contents of the store, and f : x is the contents of the type can be formed. (The set of objects is a type; the set
store after p r o g r a m f i s activated with x in the store. The of continuous functions [T->U] from type T to type U is
set of definitions is the program library. The primitive a type. IffE[T----~U] and tET, t h e n f t in U can be formed
functions and the functional forms provided by the by applying f to t.) Do the same in language B. Next,
system are the basic statements of a particular program- compare each type in A to the corresponding type in B.
ming language. Thus, depending on the choice of prim- If, for every type, A's type includes B's corresponding
type, then A is more expressive than B (or equally expressive). If some type of A's functions is incomparable to B's, then A and B are not comparable in expressive power.

11.4.4 Advantages of FP systems. The main reason FP systems are considerably simpler than either conventional languages or lambda-calculus-based languages is that they use only the most elementary fixed naming system (naming a function in a definition) with a simple fixed rule of substituting a function for its name. Thus they avoid the complexities both of the naming systems of conventional languages and of the substitution rules of the lambda calculus. FP systems permit the definition of different naming systems (see Sections 13.3.4 and 14.7) for various purposes. These need not be complex, since many programs can do without them completely. Most importantly, they treat names as functions that can be combined with other functions without special treatment.

FP systems offer an escape from conventional word-at-a-time programming to a degree greater even than APL [12] (the most successful attack on the problem to date within the von Neumann framework) because they provide a more powerful set of functional forms within a unified world of expressions. They offer the opportunity to develop higher level techniques for thinking about, manipulating, and writing programs.

12. The Algebra of Programs for FP Systems

12.1 Introduction
The algebra of the programs described below is the work of an amateur in algebra, and I want to show that it is a game amateurs can profitably play and enjoy, a game that does not require a deep understanding of logic and mathematics. In spite of its simplicity, it can help one to understand and prove things about programs in a systematic, rather mechanical way.

So far, proving a program correct requires knowledge of some moderately heavy topics in mathematics and logic: properties of complete partially ordered sets, continuous functions, least fixed points of functionals, the first-order predicate calculus, predicate transformers, weakest preconditions, to mention a few topics in a few approaches to proving programs correct. These topics have been very useful for professionals who make it their business to devise proof techniques; they have published a lot of beautiful work on this subject, starting with the work of McCarthy and Floyd, and, more recently, that of Burstall, Dijkstra, Manna and his associates, Milner, Morris, Reynolds, and many others. Much of this work is based on the foundations laid down by Dana Scott (denotational semantics) and C. A. R. Hoare (axiomatic semantics). But its theoretical level places it beyond the scope of most amateurs who work outside of this specialized field.

If the average programmer is to prove his programs correct, he will need much simpler techniques than those the professionals have so far put forward. The algebra of programs below may be one starting point for such a proof discipline and, coupled with current work on algebraic manipulation, it may also help provide a basis for automating some of that discipline.

One advantage of this algebra over other proof techniques is that the programmer can use his programming language as the language for deriving proofs, rather than having to state proofs in a separate logical system that merely talks about his programs.

At the heart of the algebra of programs are laws and theorems that state that one function expression is the same as another. Thus the law [f,g]∘h ≡ [f∘h, g∘h] says that the construction of f and g (composed with h) is the same function as the construction of (f composed with h) and (g composed with h), no matter what the functions f, g, and h are. Such laws are easy to understand, easy to justify, and easy and powerful to use. However, we also wish to use such laws to solve equations in which an "unknown" function appears on both sides of the equation. The problem is that if f satisfies some such equation, it will often happen that some extension f' of f will also satisfy the same equation. Thus, to give a unique meaning to solutions of such equations, we shall require a foundation for the algebra of programs (which uses Scott's notion of least fixed points of continuous functionals) to assure us that solutions obtained by algebraic manipulation are indeed least, and hence unique, solutions.

Our goal is to develop a foundation for the algebra of programs that disposes of the theoretical issues, so that a programmer can use simple algebraic laws and one or two theorems from the foundations to solve problems and create proofs in the same mechanical style we use to solve high-school algebra problems, and so that he can do so without knowing anything about least fixed points or predicate transformers.

One particular foundational problem arises: given equations of the form

f ≡ p0 → q0; ... ; pi → qi; Ei(f),   (1)

where the pi's and qi's are functions not involving f and Ei(f) is a function expression involving f, the laws of the algebra will often permit the formal "extension" of this equation by one more "clause" by deriving

Ei(f) ≡ pi+1 → qi+1; Ei+1(f)   (2)

which, by replacing Ei(f) in (1) by the right side of (2), yields

f ≡ p0 → q0; ... ; pi+1 → qi+1; Ei+1(f).   (3)

This formal extension may go on without limit. One question the foundations must then answer is: when can the least f satisfying (1) be represented by the infinite expansion

f ≡ p0 → q0; ... ; pn → qn; ...   (4)

in which the final clause involving f has been dropped,
so that we now have a solution whose right side is free of f's? Such solutions are helpful in two ways: first, they give proofs of "termination" in the sense that (4) means that f:x is defined if and only if there is an n such that, for every i less than n, pi:x = F and pn:x = T and qn:x is defined. Second, (4) gives a case-by-case description of f that can often clarify its behavior.

The foundations for the algebra given in a subsequent section are a modest start toward the goal stated above. For a limited class of equations its "linear expansion theorem" gives a useful answer as to when one can go from indefinitely extendable equations like (1) to infinite expansions like (4). For a larger class of equations, a more general "expansion theorem" gives a less helpful answer to similar questions. Hopefully, more powerful theorems covering additional classes of equations can be found. But for the present, one need only know the conclusions of these two simple foundational theorems in order to follow the theorems and examples appearing in this section.

The results of the foundations subsection are summarized in a separate, earlier subsection titled "expansion theorems," without reference to fixed point concepts. The foundations subsection itself is placed later, where it can be skipped by readers who do not want to go into that subject.

12.2 Some Laws of the Algebra of Programs
In the algebra of programs for an FP system, variables range over the set of functions of the system. The "operations" of the algebra are the functional forms of the system. Thus, for example, [f,g]∘h is an expression of the algebra for the FP system described above, in which f, g, and h are variables denoting arbitrary functions of that system. And

[f,g]∘h ≡ [f∘h, g∘h]

is a law of the algebra which says that, whatever functions one chooses for f, g, and h, the function on the left is the same as that on the right. Thus this algebraic law is merely a restatement of the following proposition about any FP system that includes the functional forms [f,g] and f∘g:

PROPOSITION: For all functions f, g, and h and all objects x, ([f,g]∘h):x ≡ [f∘h, g∘h]:x.
PROOF:
([f,g]∘h):x = [f,g]:(h:x) by definition of composition
= <f:(h:x), g:(h:x)> by definition of construction
= <(f∘h):x, (g∘h):x> by definition of composition
= [f∘h, g∘h]:x by definition of construction □
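The law can also be restated as an executable property. The following Haskell sketch (mine, with construction rendered as pairing) spot-checks law I.1 at one instance of f, g, and h:

    -- Law I.1 specialized to two-element constructions:
    --   [f,g] o h  ==  [f o h, g o h]
    cons :: (a -> b) -> (a -> c) -> a -> (b, c)
    cons f g x = (f x, g x)

    lawI1 :: Int -> Bool
    lawI1 x = (cons f g . h) x == cons (f . h) (g . h) x
      where f = (+ 1); g = (* 2); h = subtract 3

The proof above establishes the law for all f, g, and h; the property merely checks fixed instances, since quantifying over functions would need a testing framework.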
Some laws have a domain smaller than the domain of all objects. Thus 1∘[f,g] ≡ f does not hold for objects x such that g:x = ⊥. We write

defined∘g →→ 1∘[f,g] ≡ f

to indicate that the law (or theorem) on the right holds within the domain of objects x for which defined∘g:x = T, where

Def defined ≡ T̄

i.e. defined:x ≡ x = ⊥ → ⊥; T. In general we shall write a qualified functional equation:

p →→ f ≡ g

to mean that, for any object x, whenever p:x = T, then f:x = g:x.

Ordinary algebra concerns itself with two operations, addition and multiplication; it needs few laws. The algebra of programs is concerned with more operations (functional forms) and therefore needs more laws.

Each of the following laws requires a corresponding proposition to validate it. The interested reader will find most proofs of such propositions easy (two are given below). We first define the usual ordering on functions and equivalence in terms of this ordering:

DEFINITION: f ≤ g iff for all objects x, either f:x = ⊥, or f:x = g:x.
DEFINITION: f ≡ g iff f ≤ g and g ≤ f.

It is easy to verify that ≤ is a partial ordering, that f ≤ g means g is an extension of f, and that f ≡ g iff f:x = g:x for all objects x. We now give a list of algebraic laws organized by the two principal functional forms involved.

I Composition and construction
I.1 [f1, ..., fn]∘g ≡ [f1∘g, ..., fn∘g]
I.2 αf∘[g1, ..., gn] ≡ [f∘g1, ..., f∘gn]
I.3 /f∘[g1, ..., gn] ≡ f∘[g1, /f∘[g2, ..., gn]] when n ≥ 2
      ≡ f∘[g1, f∘[g2, ..., f∘[gn−1, gn]...]]
    /f∘[g] ≡ g
I.4 f∘[x̄, g] ≡ (bu f x)∘g
I.5 1∘[f1, ..., fn] ≡ f1
    s∘[f1, ..., fs, ..., fn] ≤ fs for any selector s, s ≤ n
    defined∘fi (for all i ≠ s, 1 ≤ i ≤ n) →→ s∘[f1, ..., fn] ≡ fs
I.5.1 [f1∘1, ..., fn∘n]∘[g1, ..., gn] ≡ [f1∘g1, ..., fn∘gn]
I.6 tl∘[f1] ≤ φ̄ and tl∘[f1, ..., fn] ≤ [f2, ..., fn] for n ≥ 2
    defined∘f1 →→ tl∘[f1] ≡ φ̄ and tl∘[f1, ..., fn] ≡ [f2, ..., fn] for n ≥ 2
I.7 distl∘[f, [g1, ..., gn]] ≡ [[f,g1], ..., [f,gn]]
    defined∘f →→ distl∘[f, φ̄] ≡ φ̄
    The analogous law holds for distr.
I.8 apndl∘[f, [g1, ..., gn]] ≡ [f, g1, ..., gn]
    null∘g →→ apndl∘[f,g] ≡ [f]
    And so on for apndr, reverse, rotl, etc.
I.9 [..., ⊥̄, ...] ≡ ⊥̄
I.10 apndl∘[f∘g, αf∘h] ≡ αf∘apndl∘[g,h]
I.11 pair & not∘null∘1 →→ apndl∘[[1∘1, 2], distr∘[tl∘1, 2]] ≡ distr
where f & g ≡ and∘[f,g]; pair ≡ atom → F̄; eq∘[length, 2̄]

II Composition and condition (right associated parentheses omitted) (Law II.2 is noted in Manna et al. [16], p. 493.)
II.1 (p → f; g)∘h ≡ p∘h → f∘h; g∘h
II.2 h∘(p → f; g) ≡ p → h∘f; h∘g
II.3 or∘[q, not∘q] →→ and∘[p,q] → f; and∘[p, not∘q] → g; h ≡ p → (q → f; g); h
II.3.1 p → (p → f; g); h ≡ p → f; h

III Composition and miscellaneous
III.1 x̄∘f ≤ x̄
      defined∘f →→ x̄∘f ≡ x̄
III.1.1 ⊥̄∘f ≡ f∘⊥̄ ≡ ⊥̄
III.2 f∘id ≡ id∘f ≡ f
III.3 pair →→ 1∘distr ≡ [1∘1, 2]   also: pair →→ 1∘tl ≡ 2 etc.
III.4 α(f∘g) ≡ αf∘αg
III.5 null∘g →→ αf∘g ≡ φ̄

IV Condition and construction
IV.1 [f1, ..., (p → g; h), ..., fn] ≡ p → [f1, ..., g, ..., fn]; [f1, ..., h, ..., fn]
IV.1.1 [f1, ..., (p1 → g1; ... ; pn → gn; h), ..., fm]
      ≡ p1 → [f1, ..., g1, ..., fm]; ... ; pn → [f1, ..., gn, ..., fm]; [f1, ..., h, ..., fm]

This concludes the present list of algebraic laws; it is by no means exhaustive: there are many others.
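Law II.1, for instance, also has a direct executable reading; this Haskell sketch (mine) renders the condition form and spot-checks the law:

    -- Law II.1: (p -> f; g) o h  ==  p o h -> f o h; g o h
    cond :: (a -> Bool) -> (a -> b) -> (a -> b) -> a -> b
    cond p f g x = if p x then f x else g x

    lawII1 :: Int -> Bool
    lawII1 x = (cond p f g . h) x == cond (p . h) (f . h) (g . h) x
      where p = even; f = (+ 1); g = (* 2); h = subtract 3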
Proof of two laws
We give the proofs of validating propositions for laws I.10 and I.11, which are slightly more involved than most of the others.

PROPOSITION 1: apndl∘[f∘g, αf∘h] ≡ αf∘apndl∘[g,h]
PROOF. We show that, for every object x, both of the above functions yield the same result.
CASE 1. h:x is neither a sequence nor φ. Then both sides yield ⊥ when applied to x.
CASE 2. h:x = φ. Then
apndl∘[f∘g, αf∘h]:x = apndl:<f∘g:x, φ> = <f:(g:x)>
αf∘apndl∘[g,h]:x = αf∘apndl:<g:x, φ> = αf:<g:x> = <f:(g:x)>
CASE 3. h:x = <y1, ..., yn>. Then
apndl∘[f∘g, αf∘h]:x = apndl:<f∘g:x, αf:<y1, ..., yn>> = <f:(g:x), f:y1, ..., f:yn>
αf∘apndl∘[g,h]:x = αf∘apndl:<g:x, <y1, ..., yn>> = αf:<g:x, y1, ..., yn> = <f:(g:x), f:y1, ..., f:yn> □

PROPOSITION 2: pair & not∘null∘1 →→ apndl∘[[1², 2], distr∘[tl∘1, 2]] ≡ distr,
where f & g is the function and∘[f,g], and f² ≡ f∘f.
PROOF. We show that both sides produce the same result when applied to any pair <x,y>, where x ≠ φ, as per the stated qualification.
CASE 1. x is an atom or ⊥. Then distr:<x,y> = ⊥, since x ≠ φ. The left side also yields ⊥ when applied to <x,y>, since tl∘1:<x,y> = ⊥ and all functions are ⊥-preserving.
CASE 2. x = <x1, ..., xn>. Then
apndl∘[[1², 2], distr∘[tl∘1, 2]]:<x,y>
= apndl:<<1:x, y>, distr:<tl:x, y>>
= apndl:<<x1,y>, φ> = <<x1,y>> if tl:x = φ
= apndl:<<x1,y>, <<x2,y>, ..., <xn,y>>> if tl:x ≠ φ
= <<x1,y>, ..., <xn,y>>
= distr:<x,y> □

12.3 Example: Equivalence of Two Matrix Multiplication Programs
We have seen earlier the matrix multiplication program:

Def MM ≡ ααIP∘αdistl∘distr∘[1, trans∘2].

We shall now show that its initial segment, MM', where

Def MM' ≡ ααIP∘αdistl∘distr,

can be defined recursively. (MM' "multiplies" a pair of matrices after the second matrix has been transposed. Note that MM', unlike MM, gives ⊥ for all arguments that are not pairs.) That is, we shall show that MM' satisfies the following equation, which recursively defines the same function (on pairs):

f ≡ null∘1 → φ̄; apndl∘[αIP∘distl∘[1∘1, 2], f∘[tl∘1, 2]].

Our proof will take the form of showing that the following function, R,

Def R ≡ null∘1 → φ̄; apndl∘[αIP∘distl∘[1∘1, 2], MM'∘[tl∘1, 2]]

is, for all pairs <x,y>, the same function as MM'. R "multiplies" two matrices, when the first has more than zero rows, by computing the first row of the "product" (with αIP∘distl∘[1∘1, 2]) and adjoining it to the "product" of the tail of the first matrix and the second matrix. Thus the theorem we want is

pair →→ MM' ≡ R,

from which the following is immediate:

MM ≡ MM'∘[1, trans∘2] ≡ R∘[1, trans∘2];

where

Def pair ≡ atom → F̄; eq∘[length, 2̄].

THEOREM: pair →→ MM' ≡ R

where
Def MM' ≡ ααIP∘αdistl∘distr
Def R ≡ null∘1 → φ̄; apndl∘[αIP∘distl∘[1², 2], MM'∘[tl∘1, 2]]

PROOF.
CASE 1. pair & null∘1 →→ MM' ≡ R.

pair & null∘1 →→ R ≡ φ̄ by def of R
pair & null∘1 →→ MM' ≡ φ̄

since distr:<φ,x> = φ by def of distr, and αf:φ = φ by def of Apply to all. And so: ααIP∘αdistl∘distr:<φ,x> = φ. Thus pair & null∘1 →→ MM' ≡ R.

CASE 2. pair & not∘null∘1 →→ MM' ≡ R.

pair & not∘null∘1 →→ R ≡ R',   (1)

by def of R and R', where

Def R' ≡ apndl∘[αIP∘distl∘[1², 2], MM'∘[tl∘1, 2]].

We note that

R' ≡ apndl∘[f∘g, αf∘h]

where

f ≡ αIP∘distl
g ≡ [1², 2]
h ≡ distr∘[tl∘1, 2]
αf ≡ α(αIP∘distl) ≡ ααIP∘αdistl (by III.4).   (2)

Thus, by I.10,

R' ≡ αf∘apndl∘[g,h].   (3)

Now apndl∘[g,h] ≡ apndl∘[[1², 2], distr∘[tl∘1, 2]]; thus, by I.11,

pair & not∘null∘1 →→ apndl∘[g,h] ≡ distr.   (4)

And so we have, by (1), (2), (3) and (4),

pair & not∘null∘1 →→ R ≡ R' ≡ αf∘distr ≡ ααIP∘αdistl∘distr ≡ MM'.

Case 1 and Case 2 together prove the theorem. □

12.4 Expansion Theorems
In the following subsections we shall be "solving" some simple equations (where by a "solution" we shall mean the "least" function which satisfies an equation). To do so we shall need the following notions and results drawn from the later subsection on foundations of the algebra, where their proofs appear.

12.4.1 Expansion. Suppose we have an equation of the form

f ≡ E(f)   (E1)

where E(f) is an expression involving f. Suppose further that there is an infinite sequence of functions fi for i = 0, 1, 2, ..., each having the following form:

f0 ≡ ⊥̄
fi+1 ≡ p0 → q0; ... ; pi → qi; ⊥̄   (E2)

where the pi's and qi's are particular functions, so that E has the property:

E(fi) ≡ fi+1 for i = 0, 1, 2, ...   (E3)

Then we say that E is expansive and has the fi's as approximating functions.

If E is expansive and has approximating functions as in (E2), and if f is the solution of (E1), then f can be written as the infinite expansion

f ≡ p0 → q0; ... ; pn → qn; ...   (E4)

meaning that, for any x, f:x ≠ ⊥ iff there is an n ≥ 0 such that (a) pi:x = F for all i < n, and (b) pn:x = T, and (c) qn:x ≠ ⊥. When f:x ≠ ⊥, then f:x = qn:x for this n. (The foregoing is a consequence of the "expansion theorem".)

12.4.2 Linear expansion. A more helpful tool for solving some equations applies when, for any function h,

E(h) ≡ p0 → q0; E1(h)   (LE1)

and there exist pi and qi such that

E1(pi → qi; h) ≡ pi+1 → qi+1; E1(h) for i = 0, 1, 2, ...   (LE2)

and

E1(⊥̄) ≡ ⊥̄.   (LE3)

Under the above conditions E is said to be linearly expansive. If so, and f is the solution of

f ≡ E(f)   (LE4)

then E is expansive and f can again be written as the infinite expansion

f ≡ p0 → q0; ... ; pn → qn; ...   (LE5)

using the pi's and qi's generated by (LE1) and (LE2).

Although the pi's and qi's of (E4) or (LE5) are not unique for a given function, it may be possible to find additional constraints which would make them so, in which case the expansion (LE5) would comprise a canonical form for a function. Even without uniqueness these expansions often permit one to prove the equivalence of two different function expressions, and they often clarify a function's behavior.
12.5 A Recursion Theorem
Using three of the above laws and linear expansion, one can prove the following theorem of moderate generality that gives a clarifying expansion for many recursively defined functions.

RECURSION THEOREM: Let f be a solution of

f ≡ p → g; Q(f)   (1)

where

Q(k) ≡ h∘[i, k∘j] for any function k   (2)

and p, g, h, i, j are any given functions; then

f ≡ p → g; p∘j → Q(g); ... ; p∘jⁿ → Qⁿ(g); ...   (3)

(where Qⁿ(g) is h∘[i, Qⁿ⁻¹(g)∘j], and jⁿ is j∘jⁿ⁻¹ for n ≥ 2) and

Qⁿ(g) ≡ /h∘[i, i∘j, ..., i∘jⁿ⁻¹, g∘jⁿ].   (4)

PROOF. We verify that p → g; Q(f) is linearly expansive. Let pn, qn and k be any functions. Then

Q(pn → qn; k)
≡ h∘[i, (pn → qn; k)∘j] by (2)
≡ h∘[i, (pn∘j → qn∘j; k∘j)] by II.1
≡ h∘(pn∘j → [i, qn∘j]; [i, k∘j]) by IV.1
≡ pn∘j → h∘[i, qn∘j]; h∘[i, k∘j] by II.2
≡ pn∘j → Q(qn); Q(k) by (2)   (5)

Thus if p0 ≡ p and q0 ≡ g, then (5) gives p1 ≡ p∘j and q1 ≡ Q(g) and in general gives the following functions satisfying (LE2):

pn ≡ p∘jⁿ and qn ≡ Qⁿ(g).   (6)

Finally,

Q(⊥̄) ≡ h∘[i, ⊥̄∘j]
≡ h∘[i, ⊥̄] by III.1.1
≡ h∘⊥̄ by I.9
≡ ⊥̄ by III.1.1.   (7)

Thus (5) and (6) verify (LE2) and (7) verifies (LE3), with E1 ≡ Q. If we let E(f) ≡ p → g; Q(f), then we have (LE1); thus E is linearly expansive. Since f is a solution of f ≡ E(f), conclusion (3) follows from (6) and (LE5). Now

Qⁿ(g) ≡ h∘[i, Qⁿ⁻¹(g)∘j]
≡ h∘[i, h∘[i∘j, ..., h∘[i∘jⁿ⁻¹, g∘jⁿ] ... ]] by I.1, repeatedly
≡ /h∘[i, i∘j, ..., i∘jⁿ⁻¹, g∘jⁿ] by I.3   (8)

Result (8) is the second conclusion (4). □

12.5.1 Example: correctness proof of a recursive factorial function. Let f be a solution of

f ≡ eq0 → 1̄; ×∘[id, f∘s]

where

Def s ≡ −∘[id, 1̄] (subtract 1).

Then f satisfies the hypothesis of the recursion theorem with p ≡ eq0, g ≡ 1̄, h ≡ ×, i ≡ id, and j ≡ s. Therefore

f ≡ eq0 → 1̄; ... ; eq0∘sⁿ → Qⁿ(1̄); ...

and

Qⁿ(1̄) ≡ /×∘[id, id∘s, ..., id∘sⁿ⁻¹, 1̄∘sⁿ].

Now id∘sᵏ ≡ sᵏ by III.2, and eq0∘sⁿ →→ 1̄∘sⁿ ≡ 1̄ by III.1, since eq0∘sⁿ:x implies defined∘sⁿ:x; and also eq0∘sⁿ:x ≡ eq0:(x − n) ≡ x = n. Thus if eq0∘sⁿ:x = T, then x = n and

Qⁿ(1̄):n = n × (n − 1) × ... × (n − (n − 1)) × (1̄:(n − n)) = n!.

Using these results for id∘sᵏ, eq0∘sⁿ, and Qⁿ(1̄) in the previous expansion for f, we obtain

f:x ≡ x = 0 → 1; ... ; x = n → n × (n − 1) × ... × 1 × 1; ...

Thus we have proved that f terminates on precisely the set of nonnegative integers and that it is the factorial function thereon.
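To see the expansion concretely, here is a small Haskell sketch (my own illustration): the recursive definition and a clause-by-clause search through the infinite expansion agree on nonnegative integers, and both diverge elsewhere, mirroring f:x = ⊥.

    import Data.Maybe (listToMaybe)

    -- Direct rendering of f = eq0 -> 1; x o [id, f o s]:
    fact :: Integer -> Integer
    fact x = if x == 0 then 1 else x * fact (x - 1)

    -- The expansion f = p -> q0; p o s -> q1; ...: try each clause in turn.
    -- The n-th clause fires exactly when x == n, and then yields n!.
    factExpanded :: Integer -> Maybe Integer
    factExpanded x = listToMaybe [product [1..n] | n <- [0..], x == n]

For negative x no clause ever fires, so factExpanded loops forever, just as the expansion predicts that f:x is undefined.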
12.6 An Iteration Theorem
This is really a corollary of the recursion theorem. It gives a simple expansion for many iterative programs.

ITERATION THEOREM: Let f be the solution (i.e., the least solution) of

f ≡ p → g; h∘f∘k

then

f ≡ p → g; p∘k → h∘g∘k; ... ; p∘kⁿ → hⁿ∘g∘kⁿ; ...

PROOF. Let h' ≡ h∘2, i' ≡ id, j' ≡ k; then

f ≡ p → g; h'∘[i', f∘j']

since h∘2∘[id, f∘k] ≡ h∘f∘k by I.5 (id is defined except for ⊥, and the equation holds for ⊥). Thus the recursion theorem gives

f ≡ p → g; ... ; p∘kⁿ → Qⁿ(g); ...

where

Qⁿ(g) ≡ h∘2∘[id, Qⁿ⁻¹(g)∘k] ≡ h∘Qⁿ⁻¹(g)∘k ≡ hⁿ∘g∘kⁿ by I.5 □

12.6.1 Example: Correctness proof for an iterative factorial function. Let f be the solution of

f ≡ eq0∘1 → 2; f∘[s∘1, ×]

where Def s ≡ −∘[id, 1̄] (subtract 1). We want to prove that f:<x,1> = x! iff x is a nonnegative integer. Let p ≡ eq0∘1, g ≡ 2, h ≡ id, k ≡ [s∘1, ×]. Then

f ≡ p → g; h∘f∘k

and so

f ≡ p → g; ... ; p∘kⁿ → g∘kⁿ; ...   (1)

by the iteration theorem, since hⁿ ≡ id. We want to show that

pair →→ kⁿ ≡ [an, bn]   (2)

holds for every n ≥ 1, where

an ≡ sⁿ∘1   (3)
bn ≡ /×∘[sⁿ⁻¹∘1, ..., s∘1, 1, 2]   (4)

Now (2) holds for n = 1 by definition of k. We assume it holds for some n ≥ 1 and prove it then holds for n + 1. Now

pair →→ kⁿ⁺¹ ≡ k∘kⁿ ≡ [s∘1, ×]∘[an, bn]   (5)

since (2) holds for n. And so
pair →→ kⁿ⁺¹ ≡ [s∘an, ×∘[an, bn]] by I.1 and I.5   (6)

To pass from (5) to (6) we must check that whenever an or bn yields ⊥ in (5), so will the right side of (6). Now

s∘an ≡ sⁿ⁺¹∘1 ≡ an+1   (7)
×∘[an, bn] ≡ /×∘[sⁿ∘1, sⁿ⁻¹∘1, ..., s∘1, 1, 2] ≡ bn+1 by I.3.   (8)

Combining (6), (7), and (8) gives

pair →→ kⁿ⁺¹ ≡ [an+1, bn+1].   (9)

Thus (2) holds for n = 1 and holds for n + 1 whenever it holds for n; therefore, by induction, it holds for every n ≥ 1. Now (2) gives, for pairs:

defined∘kⁿ →→ p∘kⁿ ≡ eq0∘1∘[an, bn] ≡ eq0∘an ≡ eq0∘sⁿ∘1   (10)
defined∘kⁿ →→ g∘kⁿ ≡ 2∘[an, bn] ≡ /×∘[sⁿ⁻¹∘1, ..., s∘1, 1, 2]   (11)

(both use I.5). Now (1) tells us that f:<x,1> is defined iff there is an n such that p∘kⁱ:<x,1> = F for all i < n, and p∘kⁿ:<x,1> = T, that is, by (10), eq0∘sⁿ:x = T, i.e., x = n; and g∘kⁿ:<x,1> is defined, in which case, by (11),

f:<x,1> = /×:<1, 2, ..., x − 1, x, 1> = n!,

which is what we set out to prove.
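The iterative program has an immediate Haskell reading (my sketch, using an accumulating pair exactly as in the FP version):

    -- f = eq0 o 1 -> 2; f o [s o 1, x], on pairs <x, acc>:
    factIter :: (Integer, Integer) -> Integer
    factIter (x, acc) = if x == 0 then acc else factIter (x - 1, x * acc)

The theorem just proved says factIter (x, 1) == x! precisely when x is a nonnegative integer; for negative x the recursion never reaches the eq0 test's true branch.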
12.6.2 Example: proof of equivalence of two iterative programs. In this example we want to prove that two iteratively defined programs, f and g, are the same function. Let f be the solution of

f ≡ p∘1 → 2; h∘f∘[k∘1, 2].   (1)

Let g be the solution of

g ≡ p∘1 → 2; g∘[k∘1, h∘2].   (2)

Then, by the iteration theorem:

f ≡ p0 → q0; ... ; pn → qn; ...   (3)
g ≡ p0' → q0'; ... ; pn' → qn'; ...   (4)

where (letting r⁰ ≡ id for any r), for n = 0, 1, ...

pn ≡ p∘1∘[k∘1, 2]ⁿ ≡ p∘1∘[kⁿ∘1, 2] by I.5.1   (5)
qn ≡ hⁿ∘2∘[k∘1, 2]ⁿ ≡ hⁿ∘2∘[kⁿ∘1, 2] by I.5.1   (6)
pn' ≡ p∘1∘[k∘1, h∘2]ⁿ ≡ p∘1∘[kⁿ∘1, hⁿ∘2] by I.5.1   (7)
qn' ≡ 2∘[k∘1, h∘2]ⁿ ≡ 2∘[kⁿ∘1, hⁿ∘2] by I.5.1.   (8)

Now, from the above, using I.5,

defined∘2 →→ pn ≡ p∘kⁿ∘1   (9)
defined∘hⁿ∘2 →→ pn' ≡ p∘kⁿ∘1   (10)
defined∘kⁿ∘1 →→ qn ≡ qn' ≡ hⁿ∘2   (11)

Thus

defined∘hⁿ∘2 →→ defined∘2 ≡ T̄   (12)
defined∘hⁿ∘2 →→ pn ≡ pn'   (13)

and

f ≡ p0 → q0; ... ; pn → hⁿ∘2; ...   (14)
g ≡ p0' → q0'; ... ; pn' → hⁿ∘2; ...   (15)

since pn and pn' provide the qualification needed for qn ≡ qn' ≡ hⁿ∘2.

Now suppose there is an x such that f:x ≠ g:x. Then there is an n such that pi:x = pi':x = F for i < n, and pn:x ≠ pn':x. From (12) and (13) this can only happen when hⁿ∘2:x = ⊥. But since h is ⊥-preserving, hᵐ∘2:x = ⊥ for all m ≥ n. Hence f:x = g:x = ⊥ by (14) and (15). This contradicts the assumption that there is an x for which f:x ≠ g:x. Hence f ≡ g.

This example (by J. H. Morris, Jr.) is treated more elegantly in [16] on p. 498. However, some may find that the above treatment is more constructive, leads one more mechanically to the key questions, and provides more insight into the behavior of the two functions.

12.7 Nonlinear Equations
The preceding examples have concerned "linear" equations (in which the "unknown" function does not have an argument involving itself). The question of the existence of simple expansions that "solve" "quadratic" and higher order equations remains open.

The earlier examples concerned solutions of f ≡ E(f), where E is linearly expansive. The following example involves an E(f) that is quadratic and expansive (but not linearly expansive).

12.7.1 Example: proof of idempotency ([16] p. 497). Let f be the solution of

f ≡ E(f) ≡ p → id; f²∘h.   (1)

We wish to prove that f ≡ f². We verify that E is expansive (Section 12.4.1) with the following approximating functions:

f0 ≡ ⊥̄   (2a)
fn ≡ p → id; ... ; p∘hⁿ⁻¹ → hⁿ⁻¹; ⊥̄ for n > 0   (2b)

First we note that p →→ fn ≡ id and so

p∘hⁱ →→ fn∘hⁱ ≡ hⁱ.   (3)

Now

E(f0) ≡ p → id; ⊥̄²∘h ≡ f1,   (4)

and

E(fn)
≡ p → id; fn²∘h
≡ p → id; fn∘(p → id; ... ; p∘hⁿ⁻¹ → hⁿ⁻¹; ⊥̄)∘h
≡ p → id; fn∘(p∘h → h; ... ; p∘hⁿ → hⁿ; ⊥̄∘h)
≡ p → id; p∘h → fn∘h; ... ; p∘hⁿ → fn∘hⁿ; fn∘⊥̄
≡ p → id; p∘h → h; ... ; p∘hⁿ → hⁿ; ⊥̄ by (3)
≡ fn+1.   (5)

Thus E is expansive by (4) and (5); so by (2) and Section 12.4.1 (E4)

f ≡ p → id; ... ; p∘hⁿ → hⁿ; ....   (6)

But (6), by the iteration theorem, gives

f ≡ p → id; f∘h.   (7)

Now, if p:x = T, then f:x = x = f²:x, by (1). If p:x = F, then

f:x = f²∘h:x by (1)
= f:(f∘h:x) = f:(f:x) by (7)
= f²:x.

If p:x is neither T nor F, then f:x ≡ ⊥ ≡ f²:x. Thus f ≡ f².
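A tiny Haskell instance of this equation (my illustration, with p as (<= 0) and h as subtract 2) makes the idempotency tangible:

    -- f = p -> id; f o f o h, at the instance p = (<= 0), h = subtract 2:
    f :: Integer -> Integer
    f x = if x <= 0 then x else f (f (x - 2))

    -- Here f x <= 0 whenever f x is defined, so f (f x) == f x:
    -- the idempotency f == f . f that the proof establishes in general.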
12.8 Foundations for the Algebra of Programs
Our purpose in this section is to establish the validity of the results stated in Section 12.4. Subsequent sections do not depend on this one; hence it can be skipped by readers who wish to do so. We use the standard concepts and results from [16], but the notation used for objects and functions, etc., will be that of this paper.

We take as the domain (and range) for all functions the set O of objects (which includes ⊥) of a given FP system. We take F to be the set of functions, and F to be the set of functional forms of that FP system. We write E(f) for any function expression involving functional forms, primitive and defined functions, and the function symbol f, and we regard E as a functional that maps a function f into the corresponding function E(f). We assume that all f ∈ F are ⊥-preserving and that all functional forms in F correspond to continuous functionals in every variable (e.g., [f,g] is continuous in both f and g). (All primitive functions of the FP system given earlier are ⊥-preserving, and all its functional forms are continuous.)

DEFINITIONS. Let E(f) be a function expression. Let

f0 ≡ ⊥̄
fi+1 ≡ p0 → q0; ... ; pi → qi; ⊥̄ for i = 0, 1, ...

where pi, qi ∈ F. Let E have the property that

E(fi) ≡ fi+1 for i = 0, 1, ....

Then E is said to be expansive with the approximating functions fi. We write

f ≡ p0 → q0; ... ; pn → qn; ...

to mean that f = limi{fi}, where the fi have the form above. We call the right side an infinite expansion of f. We take f:x to be defined iff there is an n ≥ 0 such that (a) pi:x = F for all i < n, and (b) pn:x = T, and (c) qn:x is defined, in which case f:x = qn:x.

EXPANSION THEOREM: Let E(f) be expansive with approximating functions as above. Let f be the least function satisfying

f ≡ E(f).

Then

f ≡ p0 → q0; ... ; pn → qn; ...

PROOF. Since E is the composition of continuous functionals (from F) involving only monotonic functions (⊥-preserving functions from F) as constant terms, E is continuous ([16] p. 493). Therefore its least fixed point f is limi{Eⁱ(⊥̄)} ≡ limi{fi} ([16] p. 494), which by definition is the above infinite expansion for f. □

DEFINITION. Let E(f) be a function expression satisfying the following:

E(h) ≡ p0 → q0; E1(h) for all h ∈ F   (LE1)

where pi ∈ F and qi ∈ F exist such that

E1(pi → qi; h) ≡ pi+1 → qi+1; E1(h) for all h ∈ F and i = 0, 1, ...   (LE2)

and

E1(⊥̄) ≡ ⊥̄.   (LE3)

Then E is said to be linearly expansive with respect to these pi's and qi's.

LINEAR EXPANSION THEOREM: Let E be linearly expansive with respect to pi and qi, i = 0, 1, .... Then E is expansive with approximating functions

f0 ≡ ⊥̄   (1)
fi+1 ≡ p0 → q0; ... ; pi → qi; ⊥̄.   (2)

PROOF. We want to show that E(fi) ≡ fi+1 for any i ≥ 0. Now

E(f0) ≡ p0 → q0; E1(⊥̄) ≡ p0 → q0; ⊥̄ ≡ f1   (3)

by (LE1), (LE3), and (1). Let i > 0 be fixed and let

fi ≡ p0 → q0; w1   (4a)
w1 ≡ p1 → q1; w2   (4b)
etc.
wi−1 ≡ pi−1 → qi−1; ⊥̄.   (4-)

Then, for this i > 0,

E(fi) ≡ p0 → q0; E1(fi) by (LE1)
E1(fi) ≡ p1 → q1; E1(w1) by (LE2) and (4a)
E1(w1) ≡ p2 → q2; E1(w2) by (LE2) and (4b)
etc.
E1(wi−1) ≡ pi → qi; E1(⊥̄) by (LE2) and (4-)
≡ pi → qi; ⊥̄ by (LE3)

Combining the above gives

E(fi) ≡ fi+1 for arbitrary i > 0, by (2).   (5)

By (3), (5) also holds for i = 0; thus it holds for all i ≥ 0. Therefore E is expansive and has the required approximating functions. □

COROLLARY. If E is linearly expansive with respect to pi and qi, i = 0, 1, ..., and f is the least function satisfying

f ≡ E(f)   (LE4)

then

f ≡ p0 → q0; ... ; pn → qn; ....   (LE5)
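The least-fixed-point reading has a direct Haskell echo (a sketch of mine, not the paper's formalism): the solution of f ≡ E(f) is fix E, and the approximating functions are the finite unrollings of E starting from the everywhere-undefined function.

    import Data.Function (fix)

    -- A functional E; its least fixed point solves f = E f.
    eFact :: (Integer -> Integer) -> (Integer -> Integer)
    eFact f x = if x == 0 then 1 else x * f (x - 1)

    solution :: Integer -> Integer
    solution = fix eFact          -- the least f with f = eFact f

    -- Approximating functions: f0 = bottom, f_{i+1} = E f_i.
    approx :: Int -> (Integer -> Integer)
    approx 0 = \_ -> undefined    -- the everywhere-undefined function
    approx i = eFact (approx (i - 1))

Here approx n agrees with solution on the arguments 0 through n − 1 and is undefined beyond them, exactly the shape of the approximating functions fi above.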
12.9 The Algebra of Programs for the Lambda Calculus and for Combinators
Because Church's lambda calculus [5] and the system of combinators developed by Schönfinkel and Curry [6]
are the primary mathematical systems for representing the notion of application of functions, and because they are more powerful than FP systems, it is natural to enquire what an algebra of programs based on those systems would look like.

The lambda calculus and combinator equivalents of FP composition, f∘g, are

λfgx.(f(gx)) ≡ B

where B is a simple combinator defined by Curry. There is no direct equivalent for the FP object <x,y> in the Church or Curry systems proper; however, following Landin [14] and Burge [4], one can use the primitive functions prefix, head, tail, null, and atomic to introduce the notion of list structures that correspond to FP sequences. Then, using FP notation for lists, the lambda calculus equivalent for construction is λfgx.<fx,gx>. A combinatory equivalent is an expression involving prefix, the null list, and two or more basic combinators. It is so complex that I shall not attempt to give it.

If one uses the lambda calculus or combinatory expressions for the functional forms f∘g and [f,g] to express the law I.1 in the FP algebra, [f,g]∘h ≡ [f∘h, g∘h], the result is an expression so complex that the sense of the law is obscured. The only way to make that sense clear in either system is to name the two functionals: composition ≡ B, and construction ≡ A, so that Bfg ≡ f∘g, and Afg ≡ [f,g]. Then I.1 becomes

B(Afg)h ≡ A(Bfh)(Bgh),

which is still not as perspicuous as the FP law.
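For comparison, here is the same combinator form of the law in Haskell dress (my illustration; ordinary named functions play the role of B and A):

    -- B and A as Haskell functions: B f g = f o g, A f g = [f, g].
    bC :: (b -> c) -> (a -> b) -> a -> c
    bC f g x = f (g x)

    aC :: (a -> b) -> (a -> c) -> a -> (b, c)
    aC f g x = (f x, g x)

    -- Law I.1 in combinator dress: B (A f g) h == A (B f h) (B g h).
    lawI1comb :: Int -> Bool
    lawI1comb x = bC (aC f g) h x == aC (bC f h) (bC g h) x
      where f = show; g = (+ 1); h = (* 2)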
The point of the above is that if one wishes to state clear laws like those of the FP algebra in either Church's or Curry's system, one finds it necessary to select certain functionals (e.g., composition and construction) as the basic operations of the algebra and to either give them short names or, preferably, represent them by some special notation, as in FP. If one does this and provides primitives, objects, lists, etc., the result is an FP-like system in which the usual lambda expressions or combinators do not appear. Even then these Church or Curry versions of FP systems, being less restricted, have some problems that FP systems do not have:

a) The Church and Curry versions accommodate functions of many types and can define functions that do not exist in FP systems. Thus, Bf is a function that has no counterpart in FP systems. This added power carries with it problems of type compatibility. For example, in f∘g, is the range of g included in the domain of f? In FP systems all functions have the same domain and range.

b) The semantics of Church's lambda calculus depends on substitution rules that are simply stated but whose implications are very difficult to fully comprehend. The true complexity of these rules is not widely recognized but is evidenced by the succession of able logicians who have published "proofs" of the Church-Rosser theorem that failed to account for one or another of these complexities. (The Church-Rosser theorem, or Scott's proof of the existence of a model [22], is required to show that the lambda calculus has a consistent semantics.) The definition of pure Lisp contained a related error for a considerable period (the "funarg" problem). Analogous problems attach to Curry's system as well.

In contrast, the formal (FFP) version of FP systems (described in the next section) has no variables and only an elementary substitution rule (a function for its name), and it can be shown to have a consistent semantics by a relatively simple fixed-point argument along the lines developed by Dana Scott and by Manna et al. [16]. For such a proof see McJones [18].

12.10 Remarks
The algebra of programs outlined above needs much work to provide expansions for larger classes of equations and to extend its laws and theorems beyond the elementary ones given here. It would be interesting to explore the algebra for an FP-like system whose sequence constructor is not ⊥-preserving (law I.5 is strengthened, but IV.1 is lost). Other interesting problems are: (a) find rules that make expansions unique, giving canonical forms for functions; (b) find algorithms for expanding and analyzing the behavior of functions for various classes of arguments; and (c) explore ways of using the laws and theorems of the algebra as the basic rules either of a formal, preexecution "lazy evaluation" scheme [9, 10], or of one which operates during execution. Such schemes would, for example, make use of the law 1∘[f,g] ≤ f to avoid evaluating g:x.

13. Formal Systems for Functional Programming (FFP Systems)

13.1 Introduction
As we have seen, an FP system has a set of functions that depends on its set of primitive functions, its set of functional forms, and its set of definitions. In particular, its set of functional forms is fixed once and for all, and this set determines the power of the system in a major way. For example, if its set of functional forms is empty, then its entire set of functions is just the set of primitive functions. In FFP systems one can create new functional forms. Functional forms are represented by object sequences; the first element of a sequence determines which form it represents, while the remaining elements are the parameters of the form.

The ability to define new functional forms in FFP systems is one consequence of the principal difference between them and FP systems: in FFP systems objects are used to "represent" functions in a systematic way. Otherwise FFP systems mirror FP systems closely. They are similar to, but simpler than, the Reduction (Red) languages of an earlier paper [2].

We shall first give the simple syntax of FFP systems, then discuss their semantics informally, giving examples, and finally give their formal semantics.

13.2 Syntax
We describe the set O of objects and the set E of expressions of an FFP system. These depend on the choice of some set A of atoms, which we take as given. We assume that T (true), F (false), φ (the empty sequence), and # (default) belong to A, as well as "numbers" of various kinds, etc.

1) Bottom, ⊥, is an object but not an atom.
2) Every atom is an object.
3) Every object is an expression.
4) If x1, ..., xn are objects [expressions], then <x1, ..., xn> is an object [resp., expression] called a sequence (of length n) for n ≥ 1. The object [expression] xi for 1 ≤ i ≤ n is the ith element of the sequence <x1, ..., xi, ..., xn>. (φ is both a sequence and an atom; its length is 0.)
5) If x and y are expressions, then (x:y) is an expression called an application; x is its operator and y is its operand. Both are elements of the expression.
6) If x = <x1, ..., xn> and if one of the elements of x is ⊥, then x = ⊥. That is, <..., ⊥, ...> = ⊥.
7) All objects and expressions are formed by finite use of the above rules.

A subexpression of an expression x is either x itself or a subexpression of an element of x. An FFP object is an expression that has no application as a subexpression. Given the same set of atoms, FFP and FP objects are the same.
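This syntax transcribes directly into a recursive datatype. The Haskell sketch below (my own; the constructor names are invented for the illustration) is one way to read rules 1 through 5, and it is reused in later sketches:

    -- Objects and expressions of an FFP system over a set of atoms.
    data Expr = Bottom              -- rule 1: bottom is an object, not an atom
              | Atom String         -- rule 2: T, F, the empty sequence, #, numbers, ...
              | Seq [Expr]          -- rule 4: <x1, ..., xn>
              | App Expr Expr       -- rule 5: (x:y), operator and operand
              deriving (Eq, Show)

    -- An FFP object is an expression with no application inside it.
    isObject :: Expr -> Bool
    isObject (App _ _) = False
    isObject (Seq xs)  = all isObject xs
    isObject _         = True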
13.3 Informal Remarks About FFP Semantics
13.3.1 The meaning of expressions; the semantic function μ. Every FFP expression e has a meaning, μe, which is always an object; μe is found by repeatedly replacing each innermost application in e by its meaning. If this process is nonterminating, the meaning of e is ⊥. The meaning of an innermost application (x:y) (since it is innermost, x and y must be objects) is the result of applying the function represented by x to y, just as in FP systems, except that in FFP systems functions are represented by objects, rather than by function expressions, with atoms (instead of function symbols) representing primitive and defined functions, and with sequences representing the FP functions denoted by functional forms.

The association between objects and the functions they represent is given by the representation function, ρ, of the FFP system. (Both ρ and μ belong to the description of the system, not the system itself.) Thus if the atom NULL represents the FP function null, then ρNULL = null and the meaning of (NULL:A) is

μ(NULL:A) = (ρNULL):A = null:A = F.

From here on, as above, we use the colon in two senses. When it is between two objects, as in (NULL:A), it identifies an FFP application that denotes only itself; when it comes between a function and an object, as in (ρNULL):A or null:A, it identifies an FP-like application that denotes the result of applying the function to the object.

The fact that FFP operators are objects makes possible a function, apply, which is meaningless in FP systems:

apply:<x,y> = (x:y).

The result of apply:<x,y>, namely (x:y), is meaningless in FP systems on two levels. First, (x:y) is not itself an object; it illustrates another difference between FP and FFP systems: some FFP functions, like apply, map objects into expressions, not directly into objects as FP functions do. However, the meaning of apply:<x,y> is an object (see below). Second, (x:y) could not be even an intermediate result in an FP system; it is meaningless in FP systems, since x is an object, not a function, and FP systems do not associate functions with objects. Now if APPLY represents apply, then the meaning of (APPLY:<NULL,A>) is

μ(APPLY:<NULL,A>)
= μ((ρAPPLY):<NULL,A>)
= μ(apply:<NULL,A>)
= μ(NULL:A) = μ((ρNULL):A)
= μ(null:A) = μF = F.

The last step follows from the fact that every object is its own meaning. Since the meaning function μ eventually evaluates all applications, one can think of apply:<NULL,A> as yielding F even though the actual result is (NULL:A).

13.3.2 How objects represent functions; the representation function ρ. As we have seen, some atoms (primitive atoms) will represent the primitive functions of the system. Other atoms can represent defined functions just as symbols can in FP systems. If an atom is neither primitive nor defined, it represents ⊥, the function which is ⊥ everywhere.

Sequences also represent functions and are analogous to the functional forms of FP. The function represented by a sequence is given (recursively) by the following rule.

Metacomposition rule

(ρ<x1, ..., xn>):y = (ρx1):<<x1, ..., xn>, y>,

where the xi's and y are objects. Here ρx1 determines what functional form <x1, ..., xn> represents, and x2, ..., xn are the parameters of the form (in FFP, x1 itself can also serve as a parameter). Thus, for example, let Def ρCONST ≡ 2∘1; then <CONST, x> in FFP represents the FP functional form x̄, since, by the metacomposition rule, if y ≠ ⊥,

(ρ<CONST, x>):y = (ρCONST):<<CONST, x>, y>
= 2∘1:<<CONST, x>, y> = x.

Here we can see that the first, controlling, operator of a sequence or form, CONST in this case, always has as its operand, after metacomposition, a pair whose first element is the sequence itself and whose second element is the original operand of the sequence, y in this case. The controlling operator can then rearrange and reapply the elements of the sequence and original operand in a great variety of ways.
position is that it permits the definition of new functional -- I x ( p M L A S T : < < M L A S T > , < A , B > > )
forms, in effect, merely by defining new functions. It also by metacomposition
permits one to write recursive functions without a deft- = #(applyo[1, t l o 2 ] : < < M L A S T > , < A , B > > )
nition. = #(apply:<<MLAST>,<B>>)
We give one more example of a controlling function = #(<MLAST>:<B>)
for a functional form: Def p C O N S = aapplyotlodistr. = p(pMLAST:<<MLAST>,<B>>)
This defmition results in < C O N S , f x . . . . . f n > - - w h e r e the = ~(lo2:<<MLAST>,<B>>)
fi are objects--representing the same function as mn.
[pfl . . . . . #fn]. The following shows this.
13.3.3 Summary of the properties of p and p. So far
( p < C O N S , f l . . . . . fn>):X we have shown how p maps atoms and sequences into
= ( p C O N S ) : < < C O N S , f ~ . . . . . fn > , x > functions and how those functions map objects into
by metacomposition expressions. Actually, p and all F F P functions can be
extended so that they are defined for all expressions.
= aapplyotlodistr:<<CONS, f i . . . . . fn>,X> With such extensions the properties of p and/~ can be
by def of p C O N S summarized as follows:
= aapply:<<fi,x> . . . . . <fn,X>> 1) # E [expressions ~ objects].
by def of tl and distr and o 2) If x is an object,/~x = x.
= <apply:<fi,x> . . . . . apply:<fn,x>> 3) If e is an expression and e = <el . . . . . en>, then
by def of a /~e -- <#el, ...,/ten>.
- < ( f i : x ) . . . . . (fn:X)> by def of apply. 4) p E [expressions ---> ]expressions ---> expressions]].
5) For any expression e, pe --- p ~ e ) .
In evaluating the last expression, the meaning function
6) If x is an object and e an expression, then
# will produce the meaning of each application, giving
px:e = px:~e).
pfi:x as the ith element.
7) If x and y are objects, then #(x:y) = #(px:y). In
Usually, in describing the function represented by a
words: the meaning o f an FFP application (x:y) is found
sequence, we shall give its overall effect rather than show
by applying px, the function represented by x, to y and
how its controlling operator achieves that effect. Thus
then finding the meaning of the resulting expression
we would simply write
(which is usually an object and is then its own meaning).
( p < C O N S , f l . . . . . fn>):x = < ( f l : x ) . . . . . (fn:X)> 13.3.4 Cells, fetching, and storing. For a number of
reasons it is convenient to create functions which serve
instead of the more detailed account above.
as names. In particular, we shall need this facility in
We need a controlling operator, C O M P , to give us
describing the semantics of definitions in F F P systems.
sequences representing the functional form composition. To introduce naming functions, that is, the ability to
We take p C O M P to be a primitive function such that,
fetch the contents of a cell with a given name from a
for all objects x,
store (a sequence of cells) and to store a cell with given
( p < C O M P , f i . . . . . fn>):X name and contents in such a sequence, we introduce
= (jq:(f2:(.-- :(fn:X)...))) for n _> 1. objects called cells and two new functional forms, fetch
and store.
(I am indebted to Paul McJones for his observation that
Cells
ordinary composition could be achieved by this primitive
A cell is a triple < C E L L , name, contents>. We use this
function rather than by using two composition rules in
the basic semantics, as was done in an earlier paper form instead of the pair <name, contents> so that ceils
can be distinguished from ordinary pairs.
[21.) Fetch
Although F F P systems permit the definition and
The functional form fetch takes an object n as its
investigation of new functional forms, it is to be expected
parameter (n is customarily an atom serving as a name);
that most programming would use a fixed set of forms
it is written ~'n (read "fetch n"). Its definition for objects
(whose controlling operators are primitives), as in FP, so
that the algebraic laws for those forms could be em- n and x is
ployed, and so that a structured programming style could "rn:x --- x = ~ ---> #; atom:x ~ 3-;
be used based on those forms. (l:x) = <CELL,n,c>---> c; l'notl:x,
In addition to its use in defining functional forms,
metacomposition can be used to create recursive func- where # is the atom "default." Thus ~'n (fetch n) applied
tions directly without the use of recursive definitions of to a sequence gives the contents o f the first cell in the
the form D e f f =- E ( f ) . For example, if p M L A S T - sequence whose name is n; If there is no cell named n,
nullotl*2 --+ lo2; applyo[1, tlo2], then p < M L A S T > - the result is default, # . Thus l'n is the name function for
last, where last:x -- x = <xl, ..., xn> --+ Xn; 3_. Thus the the name n. (We assume that p F E T C H is the primitive
operator < M L A S T > . w o r k s as follows: function such that p < F E T C H , n> -- Tn. Note that Tn
simply passes over elements in its operand that are not
p(<MLA S T > : < A , B > ) cells.)
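A Haskell sketch of fetch over the Expr type from the Section 13.2 sketch (mine; the cell encoding and the default atom # follow the text):

    -- fetch n: contents of the first cell named n, or the default atom #.
    fetch :: Expr -> Expr -> Expr
    fetch n (Seq (Seq [Atom "CELL", m, c] : rest))
      | m == n    = c
      | otherwise = fetch n (Seq rest)
    fetch n (Seq (_ : rest)) = fetch n (Seq rest)   -- pass over non-cells
    fetch _ (Seq [])         = Atom "#"             -- no such cell: default
    fetch _ _                = Bottom               -- atoms etc. give bottom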
Store and push, pop, purge
Like fetch, store takes an object n as its parameter; it is written ↓n ("store n"). When applied to a pair <x,y>, where y is a sequence, ↓n removes the first cell named n from y, if any, then creates a new cell named n with contents x and appends it to y. Before defining ↓n (store n) we shall specify four auxiliary functional forms. (These can be used in combination with fetch n and store n to obtain multiple, named, LIFO stacks within a storage sequence.) Two of these auxiliary forms are specified by recursive functional equations; each takes an object n as its parameter.

(cellname n) ≡ atom → F̄; eq∘[length, 3̄] → eq∘[[CELL̄, n̄], [1, 2]]; F̄
(push n) ≡ pair → apndl∘[[CELL̄, n̄, 1], 2]; ⊥̄
(pop n) ≡ null → φ̄; (cellname n)∘1 → tl; apndl∘[1, (pop n)∘tl]
(purge n) ≡ null → φ̄; (cellname n)∘1 → (purge n)∘tl; apndl∘[1, (purge n)∘tl]
↓n ≡ pair → (push n)∘[1, (pop n)∘2]; ⊥̄

The above functional forms work as follows. For x ≠ ⊥, (cellname n):x is T if x is a cell named n; otherwise it is F. (pop n):y removes the first cell named n from a sequence y; (purge n):y removes all cells named n from y. (push n):<x,y> puts a cell named n with contents x at the head of sequence y; ↓n:<x,y> is (push n):<x, (pop n):y>.

(Thus (push n):<x,y> = y' pushes x onto the top of a "stack" named n in y'; x can be read by ↑n:y' = x and can be removed by (pop n):y'; thus ↑n∘(pop n):y' is the element below x in the stack n, provided there is more than one cell named n in y'.)

13.3.5 Definitions in FFP systems. The semantics of an FFP system depends on a fixed set of definitions D (a sequence of cells), just as an FP system depends on its informally given set of definitions. Thus the semantic function μ depends on D; altering D gives a new μ' that reflects the altered definitions. We have represented D as an object because in AST systems (Section 14) we shall want to transform D by applying functions to it and to fetch data from it—in addition to using it as the source of function definitions in FFP semantics.

If <CELL, n, c> is the first cell named n in the sequence D (and n is an atom), then it has the same effect as the FP definition Def n ≡ ρc; that is, the meaning of (n:x) will be the same as that of ρc:x. Thus, for example, if <CELL, CONST, <COMP,2,1>> is the first cell in D named CONST, then it has the same effect as Def CONST ≡ 2∘1, and the FFP system with that D would find

μ(CONST:<<x,y>,z>) = y

and consequently

μ(<CONST, A>:B) = A.

In general, in an FFP system with definitions D, the meaning of an application of the form (atom:x) is dependent on D; if ↑atom:D ≠ # (that is, atom is defined in D), then its meaning is μ(c:x), where c = ↑atom:D, the contents of the first cell in D named atom. If ↑atom:D = #, then atom is not defined in D and either atom is primitive, i.e. the system knows how to compute ρatom:x, and μ(atom:x) = μ(ρatom:x); otherwise μ(atom:x) = ⊥.

13.4 Formal Semantics for FFP Systems
We assume that a set A of atoms, a set D of definitions, a set P ⊂ A of primitive atoms, and the primitive functions they represent have all been chosen. We assume that ρa is the primitive function represented by a if a belongs to P, and that ρa = ⊥ if a belongs to Q, the set of atoms in A−P that are not defined in D. Although ρ is defined for all expressions (see 13.3.3), the formal semantics uses its definition only on P and Q. The functions that ρ assigns to other expressions x are implicitly determined and applied in the following semantic rules for evaluating μ(x:y). The above choices of A and D, and of P and the associated primitive functions, determine the objects, expressions, and the semantic function μD for an FFP system. (We regard D as fixed and write μ for μD.) We assume D is a sequence and that ↑y:D can be computed (by the function ↑y as given in Section 13.3.4) for any atom y. With these assumptions we define μ as the least fixed point of the functional τ, where the function τμ is defined as follows for any function μ (for all expressions x, xi, y, yi, z, and w):

(τμ)x ≡ x ∈ A → x;
  x = <x1, ..., xn> → <μx1, ..., μxn>;
  x = (y:z) →
    (y ∈ A & (↑y:D) = # → μ((ρy)(μz));
     y ∈ A & (↑y:D) = w → μ(w:z);
     y = <y1, ..., yn> → μ(y1:<y,z>); μ(μy:z)); ⊥

The above description of μ expands the operator of an application by definitions and by metacomposition before evaluating the operand. It is assumed that predicates like "x ∈ A" in the above definition of τμ are ⊥-preserving (e.g., "⊥ ∈ A" has the value ⊥) and that the conditional expression itself is also ⊥-preserving. Thus (τμ)⊥ ≡ ⊥ and (τμ)(⊥:z) ≡ ⊥. This concludes the semantics of FFP systems.

14. Applicative State Transition Systems (AST Systems)

14.1 Introduction
This section sketches a class of systems mentioned earlier as alternatives to von Neumann systems. It must be emphasized again that these applicative state transition systems are put forward not as practical programming systems in their present form, but as examples of a class in which applicative style programming is made available in a history sensitive, but non-von Neumann, system. These systems are loosely coupled to states and depend on an underlying applicative system for both
their programming language and the description of their state transitions. The underlying applicative system of the AST system described below is an FFP system, but other applicative systems could also be used.

To understand the reasons for the structure of AST systems, it is helpful first to review the basic structure of a von Neumann system, Algol, observe its limitations, and compare it with the structure of AST systems. After that review a minimal AST system is described; a small, top-down, self-protecting system program for file maintenance and running user programs is given, with directions for installing it in the AST system and for running an example user program. The system program uses "name functions" instead of conventional names and the user may do so too. The section concludes with subsections discussing variants of AST systems, their general properties, and naming systems.

14.2 The Structure of Algol Compared to That of AST Systems
An Algol program is a sequence of statements, each representing a transformation of the Algol state, which is a complex repository of information about the status of various stacks, pointers, and variable mappings of identifiers onto values, etc. Each statement communicates with this constantly changing state by means of complicated protocols peculiar to itself and even to its different parts (e.g., the protocol associated with the variable x depends on its occurrence on the left or right of an assignment, in a declaration, as a parameter, etc.). It is as if the Algol state were a complex "store" that communicates with the Algol program through an enormous "cable" of many specialized wires. The complex communications protocols of this cable are fixed and include those for every statement type. The "meaning" of an Algol program must be given in terms of the total effect of a vast number of communications with the state via the cable and its protocols (plus a means for identifying the output and inserting the input into the state). By comparison with this massive cable to the Algol state/store, the cable that is the von Neumann bottleneck of a computer is a simple, elegant concept.

Thus Algol statements are not expressions representing state-to-state functions that are built up by the use of orderly combining forms from simpler state-to-state functions. Instead they are complex messages with context-dependent parts that nibble away at the state. Each part transmits information to and from the state over the cable by its own protocols. There is no provision for applying general functions to the whole state and thereby making large changes in it. The possibility of large, powerful transformations of the state S by function application, S → f:S, is in fact inconceivable in the von Neumann—cable and protocol—context: there could be no assurance that the new state f:S would match the cable and its fixed protocols unless f is restricted to the tiny changes allowed by the cable in the first place.

We want a computing system whose semantics does not depend on a host of baroque protocols for communicating with the state, and we want to be able to make large transformations in the state by the application of general functions. AST systems provide one way of achieving these goals. Their semantics has two protocols for getting information from the state: (1) get from it the definition of a function to be applied, and (2) get the whole state itself. There is one protocol for changing the state: compute the new state by function application. Besides these communications with the state, AST semantics is applicative (i.e. FFP). It does not depend on state changes because the state does not change at all during a computation. Instead, the result of a computation is output and a new state. The structure of an AST state is slightly restricted by one of its protocols: It must be possible to identify a definition (i.e. cell) in it. Its structure—it is a sequence—is far simpler than that of the Algol state.

Thus the structure of AST systems avoids the complexity and restrictions of the von Neumann state (with its communications protocols) while achieving greater power and freedom in a radically different and simpler framework.

14.3 Structure of an AST System
An AST system is made up of three elements:
1) An applicative subsystem (such as an FFP system).
2) A state D that is the set of definitions of the applicative subsystem.
3) A set of transition rules that describe how inputs are transformed into outputs and how the state D is changed.
The programming language of an AST system is just that of its applicative subsystem. (From here on we shall assume that the latter is an FFP system.) Thus AST systems can use the FP programming style we have discussed. The applicative subsystem cannot change the state D and it does not change during the evaluation of an expression. A new state is computed along with output and replaces the old state when output is issued. (Recall that a set of definitions D is a sequence of cells; a cell name is the name of a defined function and its contents is the defining expression. Here, however, some cells may name data rather than functions; a data name n will be used in ↑n (fetch n) whereas a function name will be used as an operator itself.)

We give below the transition rules for the elementary AST system we shall use for examples of programs. These are perhaps the simplest of many possible transition rules that could determine the behavior of a great variety of AST systems.

14.3.1 Transition rules for an elementary AST system. When the system receives an input x, it forms the application (SYSTEM:x) and then proceeds to obtain its meaning in the FFP subsystem, using the current state D as the set of definitions. SYSTEM is the distinguished name of a function defined in D (i.e. it is the "system program"). Normally the result is a pair
μ(SYSTEM:x) = <o,d>

where o is the system output that results from input x and d becomes the new state D for the system's next input. Usually d will be a copy or partly changed copy of the old state. If μ(SYSTEM:x) is not a pair, the output is an error message and the state remains unchanged.

14.3.2 Transition rules: exception conditions and startup. Once an input has been accepted, our system will not accept another (except <RESET, x>, see below) until an output has been issued and the new state, if any, installed. The system will accept the input <RESET, x> at any time. There are two cases: (a) If SYSTEM is defined in the current state D, then the system aborts its current computation without altering D and treats x as a new normal input; (b) if SYSTEM is not defined in D, then x is appended to D as its first element. (This ends the complete description of the transition rules for our elementary AST system.)

If SYSTEM is defined in D it can always prevent any change in its own definition. If it is not defined, an ordinary input x will produce μ(SYSTEM:x) = ⊥ and the transition rules yield an error message and an unchanged state; on the other hand, the input <RESET, <CELL,SYSTEM,s>> will define SYSTEM to be s.
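Under the same illustrative assumptions as the earlier sketch, these transition rules have roughly the following shape. The names step, Input, and toDefs are hypothetical, and the paper does not fix the output of a successful RESET, so the acknowledgment string here is our choice.

data Input = Reset Expr    -- <RESET, x>
           | Normal Expr   -- any other input x

-- One state transition: from state D and an input to <output, new D>.
step :: Defs -> Input -> (String, Defs)
step d (Reset x) =
  case lookup "SYSTEM" d of
    Just _  -> step d (Normal x)             -- (a) abort current work, treat x normally
    Nothing -> ("installed", cellOf x ++ d)  -- (b) put the cell at the head of D
  where
    cellOf (Seq [Atom "CELL", Atom n, c]) = [(n, c)]
    cellOf _                              = []   -- non-cell payloads ignored (our choice)
step d (Normal x) =
  case evalFFP d (App (Atom "SYSTEM") x) of
    Seq [o, newState] -> (show o, toDefs newState)       -- pair <o,d>: output o, next state d
    _                 -> ("error: result not a pair", d) -- error message, state unchanged

-- Read a sequence of <CELL, name, contents> objects back into Defs.
toDefs :: Expr -> Defs
toDefs (Seq cells) = [ (n, c) | Seq [Atom "CELL", Atom n, c] <- cells ]
toDefs _           = []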
14.3.3 Program access to the state; the function ρDEFS. Our FFP subsystem is required to have one new primitive function, defs, named DEFS such that for any object x ≠ ⊥,

defs:x ≡ ρDEFS:x = D

where D is the current state and set of definitions of the AST system. This function allows programs access to the whole state for any purpose, including the essential one of computing the successor state.

14.4 An Example of a System Program
The above description of our elementary AST system, plus the FFP subsystem and the FP primitives and functional forms of earlier sections, specify a complete history-sensitive computing system. Its input and output behavior is limited by its simple transition rules, but otherwise it is a powerful system once it is equipped with a suitable set of definitions. As an example of its use we shall describe a small system program, its installation, and operation.

Our example system program will handle queries and updates for a file it maintains, evaluate FFP expressions, run general user programs that do not damage the file or the state, and allow authorized users to change the set of definitions and the system program itself. All inputs it accepts will be of the form <key, input> where key is a code that determines both the input class (system-change, expression, program, query, update) and also the identity of the user and his authority to use the system for the given input class. We shall not specify a format for key. Input is the input itself, of the class given by key.

14.4.1 General plan of the system program. The state D of our AST system will contain the definitions of all nonprimitive functions needed for the system program and for users' programs. (Each definition is in a cell of the sequence D.) In addition, there will be a cell in D named FILE with contents file, which the system maintains. We shall give FP definitions of functions and later show how to get them into the system in their FFP form.

The transition rules make the input the operand of SYSTEM, but our plan is to use name-functions to refer to data, so the first thing we shall do with the input is to create two cells named KEY and INPUT with contents key and input and append these to D. This sequence of cells has one each for key, input, and file; it will be the operand of our main function called subsystem. Subsystem can then obtain key by applying ↑KEY to its operand, etc. Thus the definition

Def system ≡ pair → subsystem∘f; [NONPAIR, defs]

where

f ≡ ↓INPUT∘[2, ↓KEY∘[1, defs]]

causes the system to output NONPAIR and leave the state unchanged if the input is not a pair. Otherwise, if it is <key, input>, then

f:<key, input> = <<CELL,INPUT,input>, <CELL,KEY,key>, d1, ..., dn>

where D = <d1, ..., dn>. (We might have constructed a different operand than the one above, one with just three cells, for key, input, and file. We did not do so because real programs, unlike subsystem, would contain many name functions referring to data in the state, and this "standard" construction of the operand would suffice then as well.)
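In the running Haskell sketch, this standard construction of the operand is a one-line function over the flattened Defs representation (mkOperand is our name for the f above):

-- f:<key,input> = <<CELL,INPUT,input>, <CELL,KEY,key>, d1, ..., dn>
mkOperand :: (Expr, Expr) -> Defs -> Defs
mkOperand (key, input) d = ("INPUT", input) : ("KEY", key) : d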
14.4.2 The "subsystem" function. We now give the FP definition of the function subsystem, followed by brief explanations of its six cases and auxiliary functions.

Def subsystem ≡
    is-system-change∘↑KEY → [report-change, apply]∘[↑INPUT, defs];
    is-expression∘↑KEY → [↑INPUT, defs];
    is-program∘↑KEY → system-check∘apply∘[↑INPUT, defs];
    is-query∘↑KEY → [query-response∘[↑INPUT, ↑FILE], defs];
    is-update∘↑KEY →
        [report-update, ↓FILE∘[update, defs]]∘[↑INPUT, ↑FILE];
    [report-error∘[↑KEY, ↑INPUT], defs].

This subsystem has five "p → f" clauses and a final default function, for a total of six classes of inputs; the treatment of each class is given below. Recall that the operand of subsystem is a sequence of cells containing key, input, and file as well as all the defined functions of D, and that subsystem:operand = <output, newstate>.

Default inputs. In this case the result is given by the last (default) function of the definition when key does not satisfy any of the preceding clauses. The output is report-error:<key, input>. The state is unchanged since it is given by defs:operand = D. (We leave to the reader's imagination what the function report-error will generate from its operand.)
System-change inputs. When

is-system-change∘↑KEY:operand = is-system-change:key = T,

key specifies that the user is authorized to make a system change and that input = ↑INPUT:operand represents a function f that is to be applied to D to produce the new state f:D. (Of course f:D can be a useless new state; no constraints are placed on it.) The output is a report, namely report-change:<input,D>.

Expression inputs. When is-expression:key = T, the system understands that the output is to be the meaning of the FFP expression input; ↑INPUT:operand produces it and it is evaluated, as are all expressions. The state is unchanged.

Program inputs and system self-protection. When is-program:key = T, both the output and new state are given by (ρinput):D = <output, newstate>. If newstate contains file in suitable condition and the definitions of system and other protected functions, then we wish

system-check:<output,newstate> = <output,newstate>.

Otherwise,

system-check:<output,newstate> = <error-report,D>.

Although program inputs can make major, possibly disastrous changes in the state when it produces newstate, system-check can use any criteria to either allow it to become the actual new state or to keep the old. A more sophisticated system-check might correct only prohibited changes in the state. Functions of this sort are possible because they can always access the old state for comparison with the new state-to-be and control what state transition will finally be allowed.

File query inputs. If is-query:key = T, the function query-response is designed to produce the output = answer to the query input from its operand <input,file>.

File update inputs. If is-update:key = T, input specifies a file transaction understood by the function update, which computes updated-file = update:<input,file>. Thus ↓FILE has <updated-file, D> as its operand and thus stores the updated file in the cell FILE in the new state. The rest of the state is unchanged. The function report-update generates the output from its operand <input,file>.
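Written over Haskell values instead of FP combining forms, the six-way dispatch of subsystem and its checking discipline look roughly as follows; Key is our own enumeration of the input classes, and every auxiliary function is a stub standing in for behavior the paper only indicates.

data Key = SystemChange | Expression | Program | Query | Update | Other

-- Given key, input, file, and the definitions D, yield <output, newstate>.
subsystem :: Key -> Expr -> Expr -> Defs -> (Expr, Defs)
subsystem SystemChange input _ d = (Atom "CHANGE-REPORT", applyToState input d)
subsystem Expression   input _ d = (evalFFP d input, d)
subsystem Program      input _ d = systemCheck d (runProgram input d)
subsystem Query        input f d = (queryResponse input f, d)
subsystem Update       input f d = (Atom "UPDATE-REPORT", storeFile (update input f) d)
subsystem Other        _     _ d = (Atom "ERROR", d)

-- The new state f:D, for a system-change input representing a function f.
applyToState :: Expr -> Defs -> Defs
applyToState f d = toDefs (evalFFP d (App f (fromDefs d)))

-- (rho input):D = <output, newstate>, for a program input.
runProgram :: Expr -> Defs -> (Expr, Defs)
runProgram p d = case evalFFP d (App p (fromDefs d)) of
  Seq [o, s] -> (o, toDefs s)
  other      -> (other, d)

-- Keep the new state only if protected definitions survive (one possible criterion).
systemCheck :: Defs -> (Expr, Defs) -> (Expr, Defs)
systemCheck old (o, new)
  | lookup "SYSTEM" new /= Nothing = (o, new)
  | otherwise                      = (Atom "ERROR-REPORT", old)

queryResponse, update :: Expr -> Expr -> Expr
queryResponse _ _ = Atom "ANSWER"   -- stub
update _ file     = file            -- stub

storeFile :: Expr -> Defs -> Defs
storeFile file d = ("FILE", file) : filter ((/= "FILE") . fst) d

fromDefs :: Defs -> Expr
fromDefs d = Seq [ Seq [Atom "CELL", Atom n, c] | (n, c) <- d ]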
14.4.3 Installing the system program. We have described the function called system by some FP definitions (using auxiliary functions whose behavior is only indicated). Let us suppose that we have FP definitions for all the nonprimitive functions required. Then each definition can be converted to give the name and contents of a cell in D (of course this conversion itself would be done by a better system). The conversion is accomplished by changing each FP function name to its equivalent atom (e.g., update becomes UPDATE) and by replacing functional forms by sequences whose first member is the controlling function for the particular form. Thus ↓FILE∘[update, defs] is converted to

<COMP, <STORE,FILE>, <CONS,UPDATE,DEFS>>,

and the FP function is the same as that represented by the FFP object, provided that update = ρUPDATE and COMP, STORE, and CONS represent the controlling functions for composition, store, and construction.

All FP definitions needed for our system can be converted to cells as indicated above, giving a sequence D0. We assume that the AST system has an empty state to start with, hence SYSTEM is not defined. We want to define SYSTEM initially so that it will install its next input as the state; having done so we can then input D0 and all our definitions will be installed, including our program—system—itself. To accomplish this we enter our first input

<RESET, <CELL,SYSTEM,loader>>

where loader = <CONS, <CONST,DONE>, ID>.

Then, by the transition rule for RESET when SYSTEM is undefined in D, the cell in our input is put at the head of D = φ, thus defining ρSYSTEM ≡ ρloader ≡ [DONE, id]. Our second input is D0, the set of definitions to become the state. The regular transition rule causes the AST system to evaluate

μ(SYSTEM:D0) = [DONE, id]:D0 = <DONE, D0>.

Thus the output from our second input is DONE, the new state is D0, and ρSYSTEM is now our system program (which only accepts inputs of the form <key, input>).

Our next task is to load the file (we are given an initial value file). To load it we input a program into the newly installed system that contains file as a constant and stores it in the state; the input is

<program-key, [DONE, store-file]>

where

ρstore-file ≡ ↓FILE∘[file, id].

Program-key identifies [DONE, store-file] as a program to be applied to the state D0 to give the output and new state D1, which is

ρstore-file:D0 = ↓FILE∘[file, id]:D0,

or D0 with a cell containing file at its head. The output is DONE:D0 = DONE. We assume that system-check will pass <DONE, D1> unchanged. FP expressions have been used in the above in place of the FFP objects they denote, e.g. DONE for <CONST, DONE>.
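Under the same assumptions, the two-input bootstrap can be traced in the sketch. This presumes applyPrim has been extended with the CONS and CONST controlling functions, so that the loader actually reduces to [DONE, id]; bootstrap is a hypothetical name.

-- First input: install the loader as SYSTEM in the empty state;
-- second input: the converted definitions d0 become the state.
bootstrap :: Defs -> (String, Defs)
bootstrap d0 =
  let loader  = Seq [Atom "CONS", Seq [Atom "CONST", Atom "DONE"], Atom "ID"]
      cell    = Seq [Atom "CELL", Atom "SYSTEM", loader]
      (_, d1) = step [] (Reset cell)       -- rho SYSTEM is now [DONE, id]
  in step d1 (Normal (fromDefs d0))        -- output DONE, new state d0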
14.4.4 Using the system. We have not said how the system's file, queries or updates are structured, so we cannot give a detailed example of file operations. However, the structure of subsystem shows clearly how the system's response to queries and updates depends on the functions query-response, update, and report-update.

Let us suppose that matrices m, n named M, and N are stored in D and that the function MM described earlier is defined in D. Then the input

<expression-key, (MM∘[↑M, ↑N]∘DEFS:#)>

would give the product of the two matrices as output and an unchanged state. Expression-key identifies the application as an expression to be evaluated and since defs:# = D and [↑M, ↑N]:D = <m,n>, the value of the expression is the result MM:<m,n>, which is the output.
Our miniature system program has no provision for giving control to a user's program to process many inputs, but it would not be difficult to give it that capability while still monitoring the user's program with the option of taking control back.

14.5 Variants of AST Systems
A major extension of the AST systems suggested above would provide combining forms, "system forms," for building a new AST system from simpler, component AST systems. That is, a system form would take AST systems as parameters and generate a new AST system, just as a functional form takes functions as parameters and generates new functions. These system forms would have properties like those of functional forms and would become the "operations" of a useful "algebra of systems" in much the same way that functional forms are the "operations" of the algebra of programs. However, the problem of finding useful system forms is much more difficult, since they must handle RESETs, match inputs and outputs, and combine history-sensitive systems rather than fixed functions.

Moreover, the usefulness or need for system forms is less clear than that for functional forms. The latter are essential for building a great variety of functions from an initial primitive set, whereas, even without system forms, the facilities for building AST systems are already so rich that one could build virtually any system (with the general input and output properties allowed by the given AST scheme). Perhaps system forms would be useful for building systems with complex input and output arrangements.

14.6 Remarks About AST Systems
As I have tried to indicate above, there can be innumerable variations in the ingredients of an AST system—how it operates, how it deals with input and output, how and when it produces new states, and so on. In any case, a number of remarks apply to any reasonable AST system:
a) A state transition occurs once per major computation and can have useful mathematical properties. State transitions are not involved in the tiniest details of a computation as in conventional languages; thus the linguistic von Neumann bottleneck has been eliminated. No complex "cable" or protocols are needed to communicate with the state.
b) Programs are written in an applicative language that can accommodate a great range of changeable parts, parts whose power and flexibility exceed that of any von Neumann language so far. The word-at-a-time style is replaced by an applicative style; there is no division of programming into a world of expressions and a world of statements. Programs can be analyzed and optimized by an algebra of programs.
c) Since the state cannot change during the computation of system:x, there are no side effects. Thus independent applications can be evaluated in parallel.
d) By defining appropriate functions one can, I believe, introduce major new features at any time, using the same framework. Such features must be built into the framework of a von Neumann language. I have in mind such features as: "stores" with a great variety of naming systems, types and type checking, communicating parallel processes, nondeterminacy and Dijkstra's "guarded command" constructs [8], and improved methods for structured programming.
e) The framework of an AST system comprises the syntax and semantics of the underlying applicative system plus the system framework sketched above. By current standards, this is a tiny framework for a language and is the only fixed part of the system.

14.7 Naming Systems in AST and von Neumann Models
In an AST system, naming is accomplished by functions as indicated in Section 13.3.3. Many useful functions for altering and accessing a store can be defined (e.g. push, pop, purge, typed fetch, etc.). All these definitions and their associated naming systems can be introduced without altering the AST framework. Different kinds of "stores" (e.g., with "typed cells") with individual naming systems can be used in one program. A cell in one store may contain another entire store.

The important point about AST naming systems is that they utilize the functional nature of names (Reynolds' GEDANKEN [19] also does so to some extent within a von Neumann framework). Thus name functions can be composed and combined with other functions by functional forms. In contrast, functions and names in von Neumann languages are usually disjoint concepts and the function-like nature of names is almost totally concealed and useless, because a) names cannot be applied as functions; b) there are no general means to combine names with other names and functions; c) the objects to which name functions apply (stores) are not accessible as objects.

The failure of von Neumann languages to treat names as functions may be one of their more important weaknesses. In any case, the ability to use names as functions and stores as objects may turn out to be a useful and important programming concept, one which should be thoroughly explored.
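To make the composability point concrete, here is how a few such name functions come out in the Haskell sketch; being ordinary functions over the cell sequence, they compose with (.) like anything else. The names fetchN, pushN, and popN are ours, standing for the paper's fetch, push, and pop.

-- (fetch n): the contents of the first cell named n.
fetchN :: String -> Defs -> Expr
fetchN n d = maybe (error ("no cell named " ++ n)) id (lookup n d)

-- (push n): prepend a cell named n, making a "stack" of cells named n.
pushN :: String -> (Expr, Defs) -> Defs
pushN n (x, d) = (n, x) : d

-- (pop n): remove the first cell named n.
popN :: String -> Defs -> Defs
popN n d = let (before, rest) = break ((== n) . fst) d in before ++ drop 1 rest

-- Name functions compose: the element below the top of "stack" n is
-- fetched by (fetchN n . popN n), the analogue of (fetch n)o(pop n) earlier.
secondOf :: String -> Defs -> Expr
secondOf n = fetchN n . popN n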
15. Remarks About Computer Design

The dominance of von Neumann languages has left designers with few intellectual models for practical computer designs beyond variations of the von Neumann computer. Data flow models [1] [7] [13] are one alternative class of history-sensitive models. The substitution rules of lambda-calculus based languages present serious problems for the machine designer. Berkling [3] has developed a modified lambda calculus that has three kinds of applications and that makes renaming of
variables unnecessary. He has developed a machine to evaluate expressions of this language. Further experience is needed to show how sound a basis this language is for an effective programming style and how efficient his machine can be.

Magó [15] has developed a novel applicative machine built from identical components (of two kinds). It evaluates, directly, FP-like and other applicative expressions from the bottom up. It has no von Neumann store and no address register, hence no bottleneck; it is capable of evaluating many applications in parallel; its built-in operations resemble FP operators more than von Neumann computer operations. It is the farthest departure from the von Neumann computer that I have seen.

There are numerous indications that the applicative style of programming can become more powerful than the von Neumann style. Therefore it is important for programmers to develop a new class of history-sensitive models of computing systems that embody such a style and avoid the inherent efficiency problems that seem to attach to lambda-calculus based systems. Only when these models and their applicative languages have proved their superiority over conventional languages will we have the economic basis to develop the new kind of computer that can best implement them. Only then, perhaps, will we be able to fully utilize large-scale integrated circuits in a computer design not limited by the von Neumann bottleneck.

16. Summary

The fifteen preceding sections of this paper can be summarized as follows.
Section 1. Conventional programming languages are large, complex, and inflexible. Their limited expressive power is inadequate to justify their size and cost.
Section 2. The models of computing systems that underlie programming languages fall roughly into three classes: (a) simple operational models (e.g., Turing machines), (b) applicative models (e.g., the lambda calculus), and (c) von Neumann models (e.g., conventional computers and programming languages). Each class of models has an important difficulty: The programs of class (a) are inscrutable; class (b) models cannot save information from one program to the next; class (c) models have unusable foundations and programs that are conceptually unhelpful.
Section 3. Von Neumann computers are built around a bottleneck: the word-at-a-time tube connecting the CPU and the store. Since a program must make its overall change in the store by pumping vast numbers of words back and forth through the von Neumann bottleneck, we have grown up with a style of programming that concerns itself with this word-at-a-time traffic through the bottleneck rather than with the larger conceptual units of our problems.
Section 4. Conventional languages are based on the programming style of the von Neumann computer. Thus variables = storage cells; assignment statements = fetching, storing, and arithmetic; control statements = jump and test instructions. The symbol ":=" is the linguistic von Neumann bottleneck. Programming in a conventional—von Neumann—language still concerns itself with the word-at-a-time traffic through this slightly more sophisticated bottleneck. Von Neumann languages also split programming into a world of expressions and a world of statements; the first of these is an orderly world, the second is a disorderly one, a world that structured programming has simplified somewhat, but without attacking the basic problems of the split itself and of the word-at-a-time style of conventional languages.
Section 5. This section compares a von Neumann program and a functional program for inner product. It illustrates a number of problems of the former and advantages of the latter: e.g., the von Neumann program is repetitive and word-at-a-time, works only for two vectors named a and b of a given length n, and can only be made general by use of a procedure declaration, which has complex semantics. The functional program is nonrepetitive, deals with vectors as units, is more hierarchically constructed, is completely general, and creates "housekeeping" operations by composing high-level housekeeping operators. It does not name its arguments, hence it requires no procedure declaration.
Section 6. A programming language comprises a framework plus some changeable parts. The framework of a von Neumann language requires that most features must be built into it; it can accommodate only limited changeable parts (e.g., user-defined procedures) because there must be detailed provisions in the "state" and its transition rules for all the needs of the changeable parts, as well as for all the features built into the framework. The reason the von Neumann framework is so inflexible is that its semantics is too closely coupled to the state: every detail of a computation changes the state.
Section 7. The changeable parts of von Neumann languages have little expressive power; this is why most of the language must be built into the framework. The lack of expressive power results from the inability of von Neumann languages to effectively use combining forms for building programs, which in turn results from the split between expressions and statements. Combining forms are at their best in expressions, but in von Neumann languages an expression can only produce a single word; hence expressive power in the world of expressions is mostly lost. A further obstacle to the use of combining forms is the elaborate use of naming conventions.
Section 8. APL is the first language not based on the lambda calculus that is not word-at-a-time and uses functional combining forms. But it still retains many of the problems of von Neumann languages.
Section 9. Von Neumann languages do not have useful properties for reasoning about programs. Axiomatic and denotational semantics are precise tools for describing and understanding conventional programs,
but they only talk about them and cannot alter their ungainly properties. Unlike von Neumann languages, the language of ordinary algebra is suitable both for stating its laws and for transforming an equation into its solution, all within the "language."
Section 10. In a history-sensitive language, a program can affect the behavior of a subsequent one by changing some store which is saved by the system. Any such language requires some kind of state transition semantics. But it does not need semantics closely coupled to states in which the state changes with every detail of the computation. "Applicative state transition" (AST) systems are proposed as history-sensitive alternatives to von Neumann systems. These have: (a) loosely coupled state-transition semantics in which a transition occurs once per major computation; (b) simple states and transition rules; (c) an underlying applicative system with simple "reduction" semantics; and (d) a programming language and state transition rules both based on the underlying applicative system and its semantics. The next four sections describe the elements of this approach to non-von Neumann language and system design.
Section 11. A class of informal functional programming (FP) systems is described which use no variables. Each system is built from objects, functions, functional forms, and definitions. Functions map objects into objects. Functional forms combine existing functions to form new ones. This section lists examples of primitive functions and functional forms and gives sample programs. It discusses the limitations and advantages of FP systems.
Section 12. An "algebra of programs" is described whose variables range over the functions of an FP system and whose "operations" are the functional forms of the system. A list of some twenty-four laws of the algebra is followed by an example proving the equivalence of a nonrepetitive matrix multiplication program and a recursive one. The next subsection states the results of two "expansion theorems" that "solve" two classes of equations. These solutions express the "unknown" function in such equations as an infinite conditional expansion that constitutes a case-by-case description of its behavior and immediately gives the necessary and sufficient conditions for termination. These results are used to derive a "recursion theorem" and an "iteration theorem," which provide ready-made expansions for some moderately general and useful classes of "linear" equations. Examples of the use of these theorems treat: (a) correctness proofs for recursive and iterative factorial functions, and (b) a proof of equivalence of two iterative programs. A final example deals with a "quadratic" equation and proves that its solution is an idempotent function. The next subsection gives the proofs of the two expansion theorems.
The algebra associated with FP systems is compared with the corresponding algebras for the lambda calculus and other applicative systems. The comparison shows some advantages to be drawn from the severely restricted FP systems, as compared with the much more powerful classical systems. Questions are suggested about algorithmic reduction of functions to infinite expansions and about the use of the algebra in various "lazy evaluation" schemes.
Section 13. This section describes formal functional programming (FFP) systems that extend and make precise the behavior of FP systems. Their semantics are simpler than that of classical systems and can be shown to be consistent by a simple fixed-point argument.
Section 14. This section compares the structure of Algol with that of applicative state transition (AST) systems. It describes an AST system using an FFP system as its applicative subsystem. It describes the simple state and the transition rules for the system. A small self-protecting system program for the AST system is described, and how it can be installed and used for file maintenance and for running user programs. The section briefly discusses variants of AST systems and functional naming systems that can be defined and used within an AST system.
Section 15. This section briefly discusses work on applicative computer designs and the need to develop and test more practical models of applicative systems as the future basis for such designs.

Acknowledgments. In earlier work relating to this paper I have received much valuable help and many suggestions from Paul R. McJones and Barry K. Rosen. I have had a great deal of valuable help and feedback in preparing this paper. James N. Gray was exceedingly generous with his time and knowledge in reviewing the first draft. Stephen N. Zilles also gave it a careful reading. Both made many valuable suggestions and criticisms at this difficult stage. It is a pleasure to acknowledge my debt to them. I also had helpful discussions about the first draft with Ronald Fagin, Paul R. McJones, and James H. Morris, Jr. Fagin suggested a number of improvements in the proofs of theorems.
Since a large portion of the paper contains technical material, I asked two distinguished computer scientists to referee the third draft. David J. Gries and John C. Reynolds were kind enough to accept this burdensome task. Both gave me large, detailed sets of corrections and overall comments that resulted in many improvements, large and small, in this final version (which they have not had an opportunity to review). I am truly grateful for the generous time and care they devoted to reviewing this paper.
Finally, I also sent copies of the third draft to Gyula A. Magó, Peter Naur, and John H. Williams. They were kind enough to respond with a number of extremely helpful comments and corrections. Geoffrey A. Frank and Dave Tolle at the University of North Carolina reviewed Magó's copy and pointed out an important error in the definition of the semantic function of FFP systems. My grateful thanks go to all these kind people for their help.
References
1. Arvind, and Gostelow, K.P. A new interpreter for data flow schemas and its implications for computer architecture. Tech. Rep. No. 72, Dept. Comptr. Sci., U. of California, Irvine, Oct. 1975.
2. Backus, J. Programming language semantics and closed applicative languages. Conf. Record ACM Symp. on Principles of Programming Languages, Boston, Oct. 1973, 71-86.
3. Berkling, K.J. Reduction languages for reduction machines. Interner Bericht ISF-76-8, Gesellschaft für Mathematik und Datenverarbeitung MBH, Bonn, Sept. 1976.
4. Burge, W.H. Recursive Programming Techniques. Addison-Wesley, Reading, Mass., 1975.
5. Church, A. The Calculi of Lambda-Conversion. Princeton U. Press, Princeton, N.J., 1941.
6. Curry, H.B., and Feys, R. Combinatory Logic, Vol. 1. North-Holland Pub. Co., Amsterdam, 1958.
7. Dennis, J.B. First version of a data flow procedure language. Tech. Mem. No. 61, Lab. for Comptr. Sci., M.I.T., Cambridge, Mass., May 1973.
8. Dijkstra, E.W. A Discipline of Programming. Prentice-Hall, Englewood Cliffs, N.J., 1976.
9. Friedman, D.P., and Wise, D.S. CONS should not evaluate its arguments. In Automata, Languages and Programming, S. Michaelson and R. Milner, Eds., Edinburgh U. Press, Edinburgh, 1976, pp. 257-284.
10. Henderson, P., and Morris, J.H. Jr. A lazy evaluator. Conf. Record Third ACM Symp. on Principles of Programming Languages, Atlanta, Ga., Jan. 1976, pp. 95-103.
11. Hoare, C.A.R. An axiomatic basis for computer programming. Comm. ACM 12, 10 (Oct. 1969), 576-583.
12. Iverson, K. A Programming Language. Wiley, New York, 1962.
13. Kosinski, P. A data flow programming language. Rep. RC 4264, IBM T.J. Watson Research Ctr., Yorktown Heights, N.Y., March 1973.
14. Landin, P.J. The mechanical evaluation of expressions. Computer J. 6, 4 (1964), 308-320.
15. Magó, G.A. A network of microprocessors to execute reduction languages. To appear in Int. J. Comptr. and Inform. Sci.
16. Manna, Z., Ness, S., and Vuillemin, J. Inductive methods for proving properties of programs. Comm. ACM 16, 8 (Aug. 1973), 491-502.
17. McCarthy, J. Recursive functions of symbolic expressions and their computation by machine, Pt. 1. Comm. ACM 3, 4 (April 1960), 184-195.
18. McJones, P. A Church-Rosser property of closed applicative languages. Rep. RJ 1589, IBM Res. Lab., San Jose, Calif., May 1975.
19. Reynolds, J.C. GEDANKEN—a simple typeless language based on the principle of completeness and the reference concept. Comm. ACM 13, 5 (May 1970), 308-318.
20. Reynolds, J.C. Notes on a lattice-theoretic approach to the theory of computation. Dept. Syst. and Inform. Sci., Syracuse U., Syracuse, N.Y., 1972.
21. Scott, D. Outline of a mathematical theory of computation. Proc. 4th Princeton Conf. on Inform. Sci. and Syst., 1970.
22. Scott, D. Lattice-theoretic models for various type-free calculi. Proc. Fourth Int. Congress for Logic, Methodology, and the Philosophy of Science, Bucharest, 1972.
23. Scott, D., and Strachey, C. Towards a mathematical semantics for computer languages. Proc. Symp. on Comptrs. and Automata, Polytechnic Inst. of Brooklyn, 1971.
Abstract Types Have Existential Type
JOHN C. MITCHELL
Stanford University
AND
GORDON D. PLOTKIN
University of Edinburgh

Abstract data type declarations appear in typed programming languages like Ada, Alphard, CLU and
ML. This form of declaration binds a list of identifiers to a type with associated operations, a
composite “value” we call a data algebra. We use a second-order typed lambda calculus SOL to show
how data algebras may be given types, passed as parameters, and returned as results of function calls.
In the process, we discuss the semantics of abstract data type declarations and review a connection
between typed programming languages and constructive logic.
Categories and Subject Descriptors: D.3 [Software]: Programming Languages; D.3.2 [Program-
ming Languages]: Language Classifications-applicative languages; D.3.3 [Programming Lan-
guages]: Language Constructs--abstract data types; F.3 [Theory of Computation]: Logics and
Meanings of Programs; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming
Languages-denotational semantics, operational semantics; F.3.3 [Logics and Meanings of Pro-
grams]: Studies of Program Constructs-type structure
General Terms: Languages, Theory, Verification
Additional Key Words and Phrases: Abstract data types, lambda calculus, polymorphism, program-
ming languages, types

An earlier version of this paper appeared in the Proceedings of the 12th ACM Symposium on Principles of Programming Languages (New Orleans, La., Jan. 14-16). ACM, New York, 1985.
Authors' addresses: J. C. Mitchell, Department of Computer Science, Stanford University, Stanford, CA 94305; G. D. Plotkin, Department of Computer Science, University of Edinburgh, Edinburgh, Scotland EH9 3JZ.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1988 ACM 0164-0925/88/0700-0470 $01.50

1. INTRODUCTION
Ada packages [17], Alphard forms [66, 71], CLU clusters [41, 42], and abstype declarations in ML [23] all bind identifiers to values. Although there are minor variations among these constructs, each allows a list of names to be bound to a composite value consisting of "private" type and one or more operations. For example, the ML declaration

abstype complex = real # real
with create = ...
and plus = ...
and re = ...
and im = ...

binds the identifiers complex, create, plus, re, and im to the components of an
implementation of complex numbers. The implementation consists of the collec-
tion defined by the ML expression real # real, meaning the type of pairs of
reals, and the functions denoted by the code for create, plus, and so on. An
important aspect of this construct is that access to the representation is limited.
We cannot apply arbitrary operations on pairs of reals to elements of type
complex; only the explicitly declared operations may be used.
We will call a composite value constructed from a set and one or more
operations, packaged up in a way that limits access, a data algebra. We will
discuss the typing rules associated with the formation and the use of data algebras
and observe that data algebras themselves may be given types in a straightforward
manner. This will allow us to devise a typed programming notation in which
implementations of abstract data types may be passed as parameters or returned
as the results of function calls.
The phrase “abstract data type” sometimes refers to a class of algebras (or
perhaps an initial algebra) satisfying some specification. For example, the ab-
stract type stack is sometimes regarded as the class of all algebras satisfying the
familiar logical formulas axiomatizing push and pop. Associated with this view is
the tenet that a program must rely only on the data type specification, as opposed
to properties of a particular implementation. Although this is a valuable guiding
principle, most programming languages do not contain assertions or their proofs,
and without this information it is impossible for a compiler to guarantee that a
program depends only on a data type specification. Since we are primarily
concerned with properties of the abstract data type declarations used in common
programming languages, we will focus on the limited form of information hiding
or “abstraction” provided by conventional type checking rules.
We can be more specific about how data algebras are defined by considering
the declaration of complex numbers in more detail. Using an explicitly typed
ML-like notation, the declaration sketched earlier looks something like this:
abstype complex = real # real
with create: real → real → complex = λx:real. λy:real. (x, y)
and plus: complex → complex → complex =
    λz:real # real. λw:real # real. (fst(z) + fst(w), snd(z) + snd(w))
and re: complex → real = λz:real # real. fst(z)
and im: complex → real = λz:real # real. snd(z)
The identifiers complex, create, plus, re, and im are bound to a data algebra whose
elements are represented as pairs of reals, as specified by the type expression
real # real. The operations of the data algebra are given by the function expres-
sions to the right of the equals signs.¹ Notice that the declared types of the
operations differ from the types of the implementing functions. For example, re
is declared to have type complex → real, but the implementing expression has
type real # real → real. This is because operations are defined using the concrete
representation of values, but the representation is hidden outside the declaration.
¹ In most programming languages, function definitions have the form "create(x:real, y:real) = ...". In the example above, we have used explicit lambda abstraction to move the formal parameters from the left- to the right-hand sides of the equals signs.

In the next section, we will discuss the type checking rules associated with abstract data type declarations, which are designed to make complex numbers
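For readers who want to experiment, the declaration above transcribes directly into Haskell's existential types. ComplexAlg, cartesian, and useComplex are hypothetical names for this sketch, with Double standing in for real:

{-# LANGUAGE ExistentialQuantification #-}

-- The data algebra type: exists t. (R -> R -> t), (t -> t -> t), (t -> R), (t -> R).
data ComplexAlg = forall t.
  ComplexAlg (Double -> Double -> t)   -- create
             (t -> t -> t)             -- plus
             (t -> Double)             -- re
             (t -> Double)             -- im

-- "pack" at hidden representation type (Double, Double): the constructor
-- seals the representation inside the package.
cartesian :: ComplexAlg
cartesian = ComplexAlg (\x y -> (x, y))
                       (\z w -> (fst z + fst w, snd z + snd w))
                       fst
                       snd

-- "abstype complex with ... is M in N": pattern matching opens the package;
-- inside, the representation is an unknown type t.
useComplex :: ComplexAlg -> Double
useComplex (ComplexAlg create plus re _im) =
  re (plus (create 1 2) (create 3 4))   -- 4.0

Only create, plus, re, and im can be applied to values of the hidden type; the pairs-of-reals representation is invisible outside cartesian, which is exactly the limited access described above.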

“abstract” outside the data algebra definition. In the process, we will give types
to data algebras. These will be existential types, which were originally developed
in constructive logic and are closely related to infinite sums (as in category
theory, for example). In Section 3, we describe a statically typed language SOL.
This language is a notational variant of Girard’s system F, developed in the
analysis of constructive logic [21,22], and an extension of Reynolds’ polymorphic
lambda calculus [62]. An operational semantics of SOL, based on the work of
Girard and Reynolds, is presented using reduction rules. However, we do not
address a variety of practical implementation issues. Although the basic calculus
we use has been known for some time, we believe that the analysis of data
abstraction using existential types originates with this paper. (A preliminary
version appeared as [56].)
The use of SOL as a proof-theoretic tool is based on an analogy between types
and constructive logic. This analogy gives rise to a large family of typed languages
and suggests that our analysis of abstract data types applies to more expressive
languages involving specifications. Since the connection between constructive
proofs and typed programs does not seem to be well known in the programming
language community (at least at present), our brief discussion of specifications
will follow a review of the general analogy in Section 4. Additional SOL program-
ming examples are given in Section 5.
The design of SOL suggests new programming languages along the lines of
Ada, Alphard, CLU, and ML but with richer and more flexible type structures.
In addition, SOL seems to be a natural “kernel language” for studying the
semantics of languages with polymorphic functions and abstract data type
declarations. For this reason, we expect SOL to be useful in future studies of
current languages. It is clear that SOL provides greater flexibility in the use of
abstract data types than previous languages, since data algebras may be passed
as parameters and returned as results. We believe that this is accomplished
without any compromise in “type security.” However, since we do not have a
precise characterization of type security, we are unable to show rigorously that
SOL is secure.²
Some languages that are similar to SOL in scope and intent are Pebble [7],
designed to capture some essential features of Cedar (an extension of Mesa [57]),
and Kernel Russell, KR, of [28], based on Russell [14, 15, 16]. Martin-Löf's
constructive type theory [46] and the calculus of constructions [11] are farther
from programming language syntax but share many properties of SOL. Some
features of Martin-Löf's system have been incorporated into the Standard ML
module design [44, 54], which was formulated after the work described here was
completed. We will compare SOL with some of these languages in Section 3.8.

2. TYPING RULES FOR ABSTRACT DATA TYPE DECLARATIONS


The basic typing rules associated with abstract data type declarations do not
differ much from language to language. To avoid the unnecessary complication
of discussing a variety of syntactic forms, we describe abstract data types using
the syntax we will adopt in SOL. Although this syntax was chosen to resemble

² Research begun after this paper was written has shed some light on the type security of SOL. See
[52] and [55] for further discussion.

common languages, there is one novel aspect that leads to additional flexibility:
We separate the names bound by a declaration from the data algebra they come
to denote. For example, the complex number example is written as follows:
abstype complex with
    create: real → real → complex,
    plus: complex → complex → complex,
    re: complex → real,
    im: complex → real
is
    pack real ∧ real
        λx:real. λy:real. (x, y)
        λz:real ∧ real. λw:real ∧ real. (fst(z) + fst(w), snd(z) + snd(w))
        λz:real ∧ real. fst(z)
        λz:real ∧ real. snd(z)
    to ∃t.[(real → real → t) ∧ (t → t → t) ∧ (t → real) ∧ (t → real)],
where the expression beginning pack and running to the end of the example is
considered to be the definition of the data algebra. (In SOL, we write real ∧ real
for the type of pairs of reals. When parentheses are omitted, the connective ∧
has higher precedence than →.) This syntax is designed to allow implementations
of abstract data types (data algebras) to be defined using expressions of any form
and to emphasize the view that abstract data type declarations commonly
combine two separable actions, defining a data algebra and binding identifiers to
its components.
The SOL declaration of an abstract data type t with operations x1, ..., xn has
the general form

abstype t with x1: σ1, ..., xn: σn is M in N,

where σ1, ..., σn are the types of the operations and M is a data algebra
expression. As in the complex number example above, the type identifier t often
appears in the types of operations x1, ..., xn. The scope of the declaration is N.
The simplest data algebra expressions in SOL are those of the form
pack τ M1 ... Mn to ∃t.σ

where τ is a type expression, M1, ..., Mn are "ordinary" expressions (denoting
values of type τ, or functions, for example) and ∃t.σ is an "existential type"
describing the way that the data algebra may be used. The language SOL also
allows more general forms of data algebra expressions, which we will get to later
on. There are three typing rules associated with abstype.
It is easy to see that a declaration
abstype t with x1: σ1, ..., xn: σn
is pack τ M1 ... Mk to ∃t.σ
in N

involving a basic data algebra expression only makes sense if k = n (so that each
operation gets an implementation) and the types of M1, ..., Mk match the
declared types of the operations x1, ..., xn in some appropriate way. The matching
rule in SOL is that the type of Mi must be [τ/t]σi, the result of substituting τ for
t in σi (with appropriate renaming of bound type variables in σi). To see how this
works in practice, look back at the complex number declaration. The declared

type of the first operation create is real → real → complex, whereas the type of
the implementing function expression is real → real → (real ∧ real). The matching
rule is satisfied in this case because the type of the implementing code may be
obtained by substituting real ∧ real for complex in the declared type real → real
→ complex.
We can recast the matching rule using the existential types we have associated
with data algebra expressions. An appropriate type for a data algebra is an
expression that specifies how the operations may be used, without describing the
type used to represent values. If each Mi has type [τ/t]σi, then we say that

pack τ M1 ... Mn to ∃t.σ1 ∧ ... ∧ σn

has type ∃t.σ1 ∧ ... ∧ σn. This type may be read "there exists a type t with
operations of types σ1 and ... and σn." The operator ∃ binds the type variable t
in ∃t.σ, so ∃t.σ = ∃s.[s/t]σ when s does not occur in σ. Existential types provide
just enough information to verify the matching condition stated above, without
providing any information about the representation of the carrier or the algo-
rithms used to implement the operations. The matching rule for abstype may
now be stated.
(AB.1) In abstype t with x1: σ1, ..., xn: σn is M in N, the data algebra
expression M must have type ∃t.σ1 ∧ ... ∧ σn.
Although it may seem unnecessarily verbose to write the type of pack ... to
... as part of the expression, this is needed to guarantee that the type is unique.
Without the type designation, an expression like pack τM could have many
types. For example, if the type of M is τ → τ, then pack τM might have types
∃t.t → t, ∃t.t → τ, ∃t.τ → t, and ∃t.τ → τ. To avoid this, we have included the
intended type of the whole expression as part of the syntax. Something equivalent
to this is done in most other languages. In CLU, for example, types are determined
using the keyword cvt, which specifies which occurrences of the representation
type are to be viewed as abstract. ML, as documented in [23], uses keywords abs
and rep, whereas later versions [50] use type constructors and pattern matching.
An important constraint in abstract type declarations is that only the explicitly
declared operations may be applied to elements of the type [58]. In SOL, this
constraint is formulated as follows:
(AB.2) In abstype t with x1: σ1, ..., xn: σn is M in N, if y is any free identifier
in N different from x1, ..., xn, then t must not appear free in the type of y.
In addition to accomplishing the goals put forth in [58], this condition is easily
seen to be a natural scoping rule for type identifiers. We can see why (AB.2)
makes sense and what kind of expressions it prevents by considering the following
example.
let f = λx:stack. ... in
abstype stack with empty: stack,
        push: int ∧ stack → stack,
        pop: stack → int ∧ stack
is ...
in f(empty)
end

In this program fragment, the declaration of function f specifies a formal
parameter x:stack, and so the domain of f is some type called stack. For this
reason, the application of f to the empty stack might seem sensible at first glance.
However, notice that since the name stack in the declaration of f is outside the
scope of the stack declaration shown, the meaning of stack in the type of f is
determined by some outer declaration in the full program. Therefore, the identifier
stack in the type of f refers to a different type from the identifier stack in
the type of empty. Semantically, we have a type mismatch. Rule (AB.2) prohibits
exactly this kind of program since the identifier f is free in the scope of the
abstype declaration, but stack occurs free (unbound) in the type of f. Note that
rule (AB.2) mentions only free occurrences of type names. This is because
SOL has types with bound variables, and the names of bound variables are
unimportant.
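The Haskell encoding enforces the same scoping rule: an existentially bound type may not escape into the types of identifiers bound outside the match. A sketch, with StackAlg as a hypothetical analogue of the stack declaration above:

{-# LANGUAGE ExistentialQuantification #-}

data StackAlg = forall t.
  StackAlg t                  -- empty
           (Int -> t -> t)    -- push
           (t -> (Int, t))    -- pop

-- Fine: t is used only inside the match, and the result type Int is t-free.
topOfTwo :: StackAlg -> Int
topOfTwo (StackAlg empty push pop) = fst (pop (push 2 (push 1 empty)))

-- Rejected by the type checker, mirroring the f(empty) example: the
-- hidden type t would escape the scope of the match.
--   leak (StackAlg empty _ _) = empty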
Since SOL abstype declarations are local to a specific scope, rather than
global, we also need to consider whether the representation of a data type should
be accessible outside the scope of a declaration. The designers of ML, another
language with local abstype declarations, decided that it should not (see [23],
p. 56). In our notation and terminology, the ML restriction is
(AB.3) In abstype t with x1: σ1, ..., xn: σn is M in N, the type variable t
must not be free in the type of N.
One way for t to appear free in the type of N is for N to be one of the operations
of the abstract type. For example, if t appears free in σ1, then (AB.3) will prohibit
the expression

abstype t with x1: σ1, ..., xn: σn is M in x1
which exports the first operation outside the scope of the declaration. (If t does
not appear in the type of x1, then this expression is allowed.) In designing
modules for Standard ML, MacQueen has argued that this restriction is too
strong [44, 451. If programs are composed of sets of modules (instead of being
block structured, like SOL terms) then it makes sense to use the constituents of
a data algebra defined in one module to construct a related data algebra in
another module. However, this really seems to be an objection to block-structured
programs and not a criticism of abstype as a means of providing data abstraction.
In fact, there are several good reasons to adopt rule (AB.3) in SOL.
One justification for (AB.3) is that SOL type checking becomes algorithmically
intractable without it. In SOL, we consider any expression of the correct type a
data algebra expression. One useful example not allowed in many conventional
languages is the conditional expression. If both
pack τ M1 ... Mn to ∃t.σ

and

pack ρ P1 ... Pn to ∃t.σ

are data algebra expressions of the same existential type, then

if B then pack τ M1 ... Mn to ∃t.σ
else pack ρ P1 ... Pn to ∃t.σ

is a data algebra expression of SOL with type 3 t.a. Conditional algebra expres-
sions are useful for selecting between several alternative implementations of the
same abstract type. For example, a program using matrices may choose between
sparse or dense matrix implementations using a conditional data algebra expres-
sion inside an abstype declaration. Without (AB.3), the type of an abstype
expression with a data algebra conditional such as
abstype t with x1: σ1, ..., xn: σn
is if B then (pack τ M1 ... Mn to ∃t.σ)
else (pack ρ P1 ... Pn to ∃t.σ)
in x1

may depend on whether the conditional test is true or false. (Specifically, the
meaning of the expression above is either M1 or P1, depending on B). Thus,
without (AB.3), we cannot type check expressions with conditional data algebra
expressions at “compile time,” that is, without computing the values of arbitrary
tests.
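In the Haskell encoding this is visible directly: a conditional between two packages is well typed precisely because both branches carry the same existential type, so the result type never mentions B. Reusing ComplexAlg and cartesian from the earlier sketch, with a hypothetical polar implementation:

-- A second implementation of the same signature; the representation is
-- again (Double, Double), but read as (magnitude, angle).
polar :: ComplexAlg
polar = ComplexAlg
  (\x y -> (sqrt (x*x + y*y), atan2 y x))
  (\z w -> let x = fst z * cos (snd z) + fst w * cos (snd w)
               y = fst z * sin (snd z) + fst w * sin (snd w)
           in (sqrt (x*x + y*y), atan2 y x))
  (\z -> fst z * cos (snd z))
  (\z -> fst z * sin (snd z))

-- The conditional data algebra expression: type-checkable at compile time,
-- even though the chosen representation is only known at run time.
chooseImpl :: Bool -> ComplexAlg
chooseImpl b = if b then cartesian else polar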
Another way of describing this situation is to consider the form of type
expression we would need if we wanted to give the expression above a type
without evaluating B. Since the type of the expression actually depends on the
value of B, we would have to mention B in the type. This approach is used in
some languages (notably Martin-Löf's intuitionistic type theory), but it intro-
duces ordinary value expressions into types. Consequently, type equality depends
on equality of ordinary expressions. Some of the simplicity of SOL is due to the
separation of type expressions from “ordinary” expressions, and considerable
complication would arise from giving this up.
Finally, the termination of all recursion-free programs seems to fail if we drop
(AB.3). In other words, there is a roundabout way of writing programs that do
not halt on any input, without using any recursive declarations or iterative
constructs. This is a complex issue whose full explanation is beyond the scope of
this paper. The reader is referred to [10], [29], [49], and [54] for further discussion.
Putting all of these reasons together, it seems that dropping (AB.3) would change
the nature of SOL quite drastically. Therefore, we leave the study of abstype
without (AB.3) to future research.
With rule (AB.3) in place, we can allow very general computation with data
algebras. In addition to conditional data algebra expressions, SOL allows data
algebra parameters. An example that illustrates their use is the general tree
search routine given in Section 2.5. The usual algorithms for depth-first search
and breadth-first search may be written so that they are virtually identical,
except that depth-first search uses a stack and breadth-first search uses a queue.
The general tree-search algorithm in Section 2.6 is based on this idea, using a
formal parameter in place of a stack or queue. If a stack data algebra is supplied
as an actual parameter, then the algorithm performs depth-first search. Similarly
a queue parameter produces breadth-first search. Additional structures like
priority queues may also be passed as actual parameters, resulting in “best-first”
search algorithms.
Data algebra parameters are allowed in SOL simply because the typing rules
do not prevent them. If z is a variable with type ∃t.σ1 ∧ . . . ∧ σn, then
abstype t with x1: σ1, . . . , xn: σn is z in N
is a well-typed expression of SOL. Since SOL allows parameters of all types,
there is nothing to prevent the data algebra z from being a formal parameter. By
typing data algebra expressions and treating all types in SOL in the same way,
we allow conditional data algebra expressions, data algebra parameters, and many
other useful kinds of computation on data algebras.
The next section presents the language SOL in full. To emphasize our belief
that SOL abstype captures the “essence” of data abstraction, we have described
the construct as if we had designed it for this purpose. However, we emphasize
again that SOL is not our invention at all; SOL with existential types was
invented by Girard as a proof-theoretic tool [22], and SOL without existential
types was developed independently by Reynolds as a model of polymorphism
[62]. The purpose of our paper is to explain that existential types provide a
paradigm example of data type declarations and to suggest some advantages of
this point of view.

3. THE TYPED LANGUAGE SOL


We show how implementations of abstract data types can be typed and passed
as function parameters or results by describing the functional language SOL.
Although SOL is an applicative language, we believe that this treatment of data
algebras also pertains to imperative languages. This belief is based on the general
similarity between binding constructs of functional and imperative languages
and is supported by previous research linking lambda calculus and programming
languages (e.g., [36, 37, 63]).
There are two classes of expressions in SOL: type expressions and terms. In
contrast to more complicated languages such as Pebble [7], KR [28], Martin-Löf's
constructive type theory [46], and the calculus of constructions [11], types
may appear in terms, but terms do not appear in type expressions. The type
expressions are defined by the following abstract syntax
σ ::= t | c | σ → τ | σ ∧ τ | σ ∨ τ | ∀t.σ | ∃t.σ
In our presentation of SOL, we use two sorts of variables: type variables r, s,
t, . . . and ordinary variables x, y, z, . . . . In the syntax above, t may be any type
variable and c any type constant. Some possible type constants are int and bool,
which we often use in examples. Intuitively, σ → τ is the type of functions from
σ to τ, an element of the product type σ ∧ τ is a pair with one component
from σ and the other from τ, and an element of the disjoint union or tagged sum
type σ ∨ τ is an element of σ or τ.
The two remaining forms involve ∀ and ∃, which bind type variables in type
expressions. The universal type ∀t.σ is a type of polymorphic functions, and
elements of ∃t.σ are data algebras. Free and bound variables in type expressions
are determined precisely using a straightforward inductive definition, with ∀
binding t in ∀t.σ and ∃ binding t in ∃t.σ. Since t is bound in ∀t.σ and ∃t.σ, we
consider ∀t.σ = ∀s.[s/t]σ and ∃t.σ = ∃s.[s/t]σ, provided s does not occur free in
σ. (Recall that [s/t]σ is the result of substituting s for free occurrences of t in σ,
with bound type variables renamed to avoid capture.)
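The renaming convention is easy to make precise as a program. The following OCaml sketch of [s/t]σ is our own, over an invented syntax tree for SOL type expressions, with fresh assumed to produce names that occur nowhere else.

  type ty =
    | TVar of string
    | Arrow of ty * ty                (* σ → τ *)
    | Prod of ty * ty                 (* σ ∧ τ *)
    | Sum of ty * ty                  (* σ ∨ τ *)
    | Forall of string * ty           (* ∀t.σ *)
    | Exists of string * ty           (* ∃t.σ *)

  let counter = ref 0
  let fresh () = incr counter; Printf.sprintf "t%d" !counter

  (* subst s t sigma computes [s/t]sigma: the variable s replaces free
     occurrences of the variable t *)
  let rec subst s t ty =
    match ty with
    | TVar u -> if u = t then TVar s else ty
    | Arrow (a, b) -> Arrow (subst s t a, subst s t b)
    | Prod (a, b) -> Prod (subst s t a, subst s t b)
    | Sum (a, b) -> Sum (subst s t a, subst s t b)
    | Forall (u, a) -> let u', a' = binder u a s t in Forall (u', a')
    | Exists (u, a) -> let u', a' = binder u a s t in Exists (u', a')

  (* under a binder: a bound t shadows the substitution, and a bound
     variable equal to s would capture it, so that binder is renamed *)
  and binder u a s t =
    if u = t then (u, a)
    else if u = s then
      let u' = fresh () in (u', subst s t (subst u' u a))
    else (u, subst s t a)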
In SOL, as in most typed programming languages, the type of an expression
depends on the types given to its free variables. We incorporate “context” into
the typing rules using type assignments, which are functions from ordinary
variables to type expressions. For each type assignment A, we define a partial
function TypeA from expressions to types. Intuitively, TypeA(M) = σ means that
the type of M is σ, given the assignment A of types to variables that may appear
free in M. Each partial function TypeA is defined by a set of deduction rules of
the form

TypeA(M) = σ, . . .
TypeA(N) = τ

meaning that if the antecedents hold, then the value of TypeA at N is defined to
be τ. The conditions on TypeA may mention other type assignments if N
binds variables that occur in subterms.
A variable of any type is a term. Formally, we have the axiom
TypeA(x) = A(x)
saying that a variable x has whatever type it is given. We also allow term
constants, provided that each constant is assigned a type that does not contain
free type variables. One particularly useful constant is the polymorphic condi-
tional cond, which will be discussed after ∀-types are introduced.

3.1 Functions and Let


In SOL, we take functions of a single argument as basic and introduce functions
of several arguments as a derived form. A function expression explicitly declares
the type of the formal parameter. Consequently, the type of the function body is
determined in a typing context that incorporates the formal parameter type. If
A is a type assignment, then A[x : σ] is the type assignment with

(A[x : σ])(y) = σ if y is the same variable as x, and
(A[x : σ])(y) = A(y) otherwise.
The deduction rules for function abstraction and application are

TypeA[x:σ](M) = τ
TypeA(λx: σ.M) = σ → τ

and

TypeA(M) = σ → τ,   TypeA(N) = σ
TypeA(MN) = τ

Thus a typed lambda expression has a functional type and may be applied to any
argument of the correct type. An example function expression is the lambda
expression
λx:int. x + 1
for the successor function on integers.
The semantics of SOL is described using a set of operational reduction rules.
The reduction rules use substitution, and, therefore, require the ability to rename
bound variables. For functions, we rename bound variables according to the
equational axiom
λx: σ.M = λy: σ.[y/x]M,   y not free in M.
The operational semantics of function definition and application are captured by
the reduction rule
(λx: σ.M)N → [N/x]M,
where we assume that substitution [N/x]M includes renaming of bound variables
to avoid capture. (Technically speaking, the collection of SOL reduction rules
defines a relation on equivalence classes of SOL terms, where equivalence is
defined by the collection of all SOL axioms for renaming bound variables. See,
e.g., [2] for further discussion.) Intuitively, the reduction rule above says that the
expression (λx: σ.M)N may be evaluated by substituting the argument N for
each free occurrence of the variable x in M. For example,
(λx: int. x + 2)5 → 5 + 2.
Some readers may recognize this mechanism as the "copy rule" of ALGOL 60.
We write ⇒ for the congruent and transitive closure of →.
We introduce let declarations by the abbreviation
let x = M in N ::= (λx: σ.N)M,
where σ = TypeA(M). Note that since the assignment A of types to variables is
determined by context, the definition of let depends on the context in which it
is used. An alternative would be to write let x: σ = M in N, but since σ is always
uniquely determined, the more succinct let notation seems preferable.
The typing rules and operational semantics for let are inherited directly
from λ. For example, we have
let f = λx:int. x + 3 in f(f(2)) ⇒ (2 + 3) + 3.
A similar declaration is the ML recursive declaration
letrec f = M in N
which declares f to be a recursive function with body M. (If f occurs in M, then
this refers recursively to the function being defined; occurrences of f in M are
bound by letrec.) Although we use letrec in programming examples, it is
technically useful to define pure SOL as a language without recursion. This pure
language has simpler theoretical properties, making it easier to study the type
structure of SOL.

3.2 Products and Sums


A simple kind of record type is the unlabeled pair. In SOL, we use ∧ for pair or
product types. Product types have associated pairing and projection functions as
follows:
TypeA(M) = σ,   TypeA(N) = τ
TypeA((M, N)) = σ ∧ τ

TypeA(M) = σ ∧ τ
TypeA(fst M) = σ,   TypeA(snd M) = τ
ACM ‘hansactions on Programming Languages and Systems, Vol. 10, No. 3, July 1988.
480 l J. C. Mitchell and G. D. Plotkin

The operational semantics of pairing and projection are given by the reduction
rules
fst(M, N) → M,   snd(M, N) → N.
For example,
let p = (1, 2) in fst(p) ⇒ 1.

We can introduce multivariable lambda abstraction as an abbreviation involving
products. Some metanotation makes this easier to describe. We write f^i y as an
abbreviation for f(f . . . (f y) . . . ) with i occurrences of f. As a convenience, we
consider f^0 y = y. We also write y(i)n for the expression fst(snd^(i−1) y) if 1 ≤ i < n,
and snd^(n−1) y if i = n. Thus, if y = (x1, (x2, . . . , (xn−1, xn) . . . )), we have
y(i)n ⇒ xi. Using these abbreviations, we define multiargument lambda abstraction
by

λ(x1: σ1, . . . , xn: σn).M ::= λy: σ1 ∧ ( . . . ∧ σn . . . ).[y(1)n, . . . , y(n)n / x1, . . . , xn]M

For example, λ(x: σ, y: τ).M is an abbreviation for

λz: σ ∧ τ. [fst z, snd z / x, y]M.

We will also use the abbreviation

let f(x1: σ1, . . . , xn: σn) = M in N ::= let f = λ(x1: σ1, . . . , xn: σn).M in N

which allows us to declare functions using a more familiar syntax.


Sum types σ ∨ τ have injection functions and a case expression. The SOL case
statement is similar to the tagcase statement of CLU, for example [41].

TypeA(M) = σ
TypeA(inleft M to σ ∨ τ) = σ ∨ τ,   TypeA(inright M to τ ∨ σ) = τ ∨ σ

TypeA(M) = σ ∨ τ,   TypeA[x:σ](N) = ρ,   TypeA[y:τ](P) = ρ
TypeA(case M left x: σ.N right y: τ.P end) = ρ

In the expression above, case binds x in N and y in P. As with λ-binding, we
equate case expressions that differ only in the names of bound variables:

case M left x: σ.N right y: τ.P end
= case M left u: σ.[u/x]N right v: τ.[v/y]P end,

provided u is not free in N and v is not free in P.
It is possible to replace the bindings in case expressions with λ-bindings as
suggested in [63], making case a constant instead of a binding operator. However,
the syntax above seems slightly easier to read.
The reduction rules for sums are

case (inleft M to σ ∨ τ) left x: σ.N right y: τ.P end → [M/x]N
case (inright M to σ ∨ τ) left x: σ.N right y: τ.P end → [M/y]P

For example,

let z = inleft 3 to int ∨ bool in
case z left x: int.x right y: bool.if y then 1 else 0 end
⇒ 3
Note that the type of this case statement remains int if z is declared to be
inright of a Boolean instead of inleft of an integer.
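ML-family languages provide this discipline directly; in the OCaml sketch below (our own illustration), inleft and inright become constructors of a declared sum type and case becomes match, and the result type is int no matter which injection built z.

  type ('a, 'b) sum = Inleft of 'a | Inright of 'b

  let z : (int, bool) sum = Inleft 3

  (* the match has type int whether z is an Inleft or an Inright value *)
  let result =
    match z with
    | Inleft x -> x
    | Inright y -> if y then 1 else 0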

3.3 Polymorphism
Intuitively, λt.M is a polymorphic expression that can be "instantiated" to values
of various types. In an Ada-like syntax, the term λt.M would be written
generic (type t) M
Polymorphic expressions are instantiated using type application, which we will
write using braces { } to distinguish it from an ordinary function application.
If M has type ∀t.σ, then the type of M{τ} is [τ/t]σ. The Ada-like syntax for
M{τ} is
new M(τ).

The formal definitions are

TypeA(M) = τ
TypeA(λt.M) = ∀t.τ,   provided t is not free in A(x) for any x free in M

TypeA(M) = ∀t.σ
TypeA(M{τ}) = [τ/t]σ

The restriction on the bound variable t in λt.M eliminates nonsensical expres-
sions like λx: t.λt.x, where it is not clear whether t is free or bound. (See [20] for
further discussion.) Note that unlike many programming languages, a SOL
polymorphic function may be instantiated using any type expression whatsoever,
regardless of whether its value could be determined at compile time.
One use of ∀-types is in the polymorphic conditional cond. The constant cond
has type ∀t.bool → t → t → t. We often use the abbreviation
if M then N else P ::= cond{τ}MNP,
where TypeA(N) = TypeA(P) = τ.
Polymorphic type binding may be used to define polymorphic functions such
as the polymorphic maximum function. The type of a function Max which,
given any type t and order relation r: t ∧ t → bool, finds the maximum of a pair
of t's is
Max: ∀t.[(t ∧ t → bool) → (t ∧ t) → t].
A SOL expression for the polymorphic maximum function is
Max ::= λt. λr: t ∧ t → bool. λp: t ∧ t. if r(p) then fst(p) else snd(p)
If r: τ ∧ τ → bool is an order relation on type τ, then
Max{τ} r (x, y)
finds the maximum of a pair (x, y) of elements of type τ. While Max is written
with the expectation that the actual parameter r will be an order relation, the
SOL type checking rules cannot ensure this.
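In ML-style languages the same function can be written with the type abstraction and application left implicit; the OCaml sketch below is ours, and as in SOL nothing forces the parameter r to be an order relation.

  (* Max with λt and the instantiation Max{τ} inferred by the compiler *)
  let max_by (r : 'a * 'a -> bool) (p : 'a * 'a) : 'a =
    if r p then fst p else snd p

  (* instantiation at int happens silently; evaluates to 5 *)
  let _ = max_by (fun (x, y) -> x >= y) (3, 5)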
Intuitive Semantics and Reduction Rules for λt.M. The intuitive meaning of
λt.M is the infinite product of all meanings of M as t varies over all types. In the
next section, we see that abstract data type declarations involve infinite sums.
To see the similarity between ∀-types and infinite products, we review the
general notion of product, as used in category theory [1, 27, 43]. There are two
parts to the definition: product types (corresponding to product objects in
categories) and product elements (corresponding to product arrows). Given a
collection S of types, the product type ∏S has the property that for each s ∈ S
there is a projection function proj s from ∏S to s. Furthermore, given any
family F = {f_s} of elements indexed by S with f_s ∈ s, there is a unique product
element ∏F with the property that

proj s ∏F = f_s.

Uniqueness of products means that if proj s ∏F = g_s for all s ∈ S, then
∏F = ∏G.
The correspondence with SOL is that we can think of a type expression σ and
type variable t as defining a collection of types, namely the collection S of all
substitution instances [τ/t]σ of σ. If M is a term with t not free in the type of
any free ordinary variable, then M and t determine a collection of substitution
instances [τ/t]M. It is easy to show that if t is not free in the type of any
free variable of M and TypeA(M) = σ, then TypeA([τ/t]M) = [τ/t]σ. By letting

f_{[τ/t]σ} = [τ/t]M,

we may view the collection of substitution instances of M as a family F = {f_s}
indexed by elements of S. Using this indexing of instances, we
may regard ∀t.σ as a product type ∏S and λt.M as a product element ∏F, with
projection accomplished by type application. The product axiom above leads to
the reduction rule

(λt.M){τ} → [τ/t]M,

where we assume that bound variables are renamed in [τ/t]M to avoid capture
of free type variables in τ. Since λ binds type variables, we also have the renaming
rule

λt.M = λs.[s/t]M,   s not free in λt.M.

There is a third "extensionality" rule for λ-abstraction over types, stemming
from the uniqueness of products, but we are not concerned with it in this paper
(primarily because it does not seem to be a part of ordinary programming language
implementation and because it complicates the Static Typing Theorem in
Section 3.7).

3.4 Data Abstraction and Existential Types


Data algebras, or concrete representations of abstract data types, are elements
of existential types. The basic rule for data algebra expressions is this.
TypeA(M) = [τ/t]σ
TypeA(pack τ M to ∃t.σ) = ∃t.σ.
The more general form described in Section 2 may be introduced as the following
abbreviation:
pack τ M1 . . . Mn to ∃t.σ ::= pack τ (M1, ( . . . , Mn) . . . ) to ∃t.σ,
where

σ = σ1 ∧ ( . . . ∧ σn . . . ).
Polymorphic data algebras may be written in Ada, Alphard, CLU, and ML.
Since SOL has λ-binding of types, we can also write polymorphic representations
in SOL. For example, let t-stack be a representation of stacks of elements of t,
say,

t-stack ::= pack (int ∧ array of t) empty push pop
to ∃s.s ∧ (t ∧ s → s) ∧ (s → t ∧ s),

where empty represents the empty stack, and push and pop are functions
implementing the usual push and pop operations. Then the expression

stack ::= λt.t-stack

with type

stack: ∀t.∃s.[s ∧ (t ∧ s → s) ∧ (s → t ∧ s)]

is a polymorphic implementation of stacks. We could also define a polymorphic
implementation of queues

queue: ∀t.∃q.[q ∧ (t ∧ q → q) ∧ (q → t ∧ q)]

similarly. Note that stack and queue have the same existential type, reflecting
the fact that as algebras, they have the same signature.
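The shared signature can be written down once in a module language; in the OCaml sketch below (with an invented CONTAINER interface), stack and queue implementations receive the same module type, just as stack and queue above receive the same existential type.

  module type CONTAINER = sig
    type 'a t
    val empty : 'a t
    val insert : 'a -> 'a t -> 'a t
    val delete : 'a t -> 'a * 'a t
  end

  module Stack : CONTAINER = struct
    type 'a t = 'a list
    let empty = []
    let insert x s = x :: s
    let delete = function x :: s -> (x, s) | [] -> failwith "empty"
  end

  module Queue : CONTAINER = struct
    type 'a t = 'a list * 'a list        (* front, reversed back *)
    let empty = ([], [])
    let insert x (f, b) = (f, x :: b)
    let delete = function
      | (x :: f, b) -> (x, (f, b))
      | ([], b) ->
          (match List.rev b with
           | x :: f -> (x, (f, []))
           | [] -> failwith "empty")
  end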
Abstract data type declarations are formed according to the rule

TypeA(M) = ∃t.σ,   TypeA[x:σ](N) = ρ
TypeA(abstype t with x: σ is M in N) = ρ,

provided t is not free in ρ or the type A(y) of any free y ≠ x occurring in N.

This definition of abstype provides all the type constraints discussed in


Section 2. Condition (AB.l) is included in the assumption TypeA (M) = 3 t.a,
whereas (AB.2) and (AB.3) follow from the restrictions on free occurrences of t.
As mentioned earlier, the only restriction on data algebras is that they have the
correct type. The more general form is defined by the abbreviation
abstype t with x1: cl, . . . , x,: u, is M in N
::= abstypetwithy:a, A (... A mn -..)
is M in [y(l)*, . . . , y’“)“/x,, . . . , x,]N,

where y(l)” is as defined in Section 3.2.


One advantage of combining polymorphism with data abstraction is that we
can use the polymorphic representation of stacks to declare integer stacks. The
expression
abstype int-stk with empty: int-stk,
push: int ∧ int-stk → int-stk,
pop: int-stk → int ∧ int-stk
is stack{int}
in N
declares a type of integer stacks with three operations. Note that the names for
the stack operations are local to N, rather than defined globally by stack.
3.5 Programming with Data Algebras
One feature of SOL is that a program may select one of several data type
implementations at run time. For example, a parser that uses a symbol table
could be parameterized by the symbol table implementation and passed either a
hash table or binary tree implementation according to conditions. This ability to
manipulate data algebras makes a common feature of file systems and linkage
editors an explicit part of SOL. For example, many of the functions of the CLU
library, a design for handling multiple implementations [41], may be accom-
plished directly by programs.
In allowing programs to select representations, we also allow programs to
choose among data types that have the same signature. This flexibility accrues
from the fact that SOL types are signatures, rather than complete data type
specifications: Since we only check signature information, data types that have
the same signature have implementations of the same existential type. This is
used to advantage in the tree-search algorithm of Figure 1. It may also be argued
that this points out a deficiency in the SOL typing discipline. In a language with
specifications as types, type checking could guarantee that every actual parameter
to a function is an implementation of a stack, rather than just an implementation
with a designated element and two binary operations. Languages with this
capability will be discussed briefly in Section 4.4.
The common algorithm for depth-first search uses a stack, whereas the usual
approach to breadth-first search uses a queue. Since stack and queue implemen-
tations have the same SOL type, the program fragment in Figure 1 declares a
tree-search function with a data algebra parameter instead of a stack or queue.
If a stack is passed as a parameter, the function does depth-first search, while a
queue parameter produces breadth-first. In addition, other data algebras, such as
priority queues, could be passed as parameters. A priority queue produces a “best-
first” search; the search proceeds along paths that the priority queue deems
“best.”
The three arguments to the function search are a node start in a labeled tree,
a label goal to search for, and the data algebra parameter struct. We assume that
one tree node is labeled with the goal, so there is no error test. The result of a
call to search is the first node reached, starting from start, whose label matches
goal. The tree structure is declared at the top of the program fragment to make
the types of the tree functions explicit. The tree has a root, each node has a label
and is either a leaf or has two descendants. The function is-leaf? tests whether
a node is a leaf, while left and right return the left and right descendants of any
nonleaf.
3.6 Reduction Rules and Intuitive Semantics of Existential Types
Intuitively, the meaning of the abstype expression
abstype t with x: σ is (pack τ M to ∃t.σ) in N
is the meaning of N in an environment where t is bound to τ, and x to M.
Operationally, we can evaluate abstype expressions using the reduction rule
abstype t with x: σ is (pack τ M to ∃t.σ) in N → [M/x][τ/t]N,
/* Tree data type declaration */

abstype t with root: t, label: t → string, isleaf?: t → bool,
               left: t → t, right: t → t is tree
in

/* Search returns first node reached from start with label(node) = goal.
   The structure parameter may be a stack, queue, priority queue, etc. */
let search(start: t, goal: string, struct: ∀t.∃s[s ∧ (t ∧ s → s) ∧ (s → t ∧ s)]) =
    abstype s with empty: s, insert: t ∧ s → s, delete: s → t ∧ s
    is struct {t}
    in
    /* function to select next node; also returns updated structure */
    let next(node: t, st: s) =
        if isleaf?(node) then delete(st)
        else delete(insert(left(node), insert(right(node), st)))
    in
    /* recursive function find calls next until goal reached */
    letrec find(node: t, st: s) =
        if label(node) = goal then node else find(next(node, st))
    in
    /* call find to reach node with label(node) = goal */
    find(start, empty)
    end
    end
    end
in
. . . /* program using search function */
end
end

Fig. 1. Program with search function directed by data algebra parameter.
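As a cross-check on Figure 1, the following rough OCaml transcription is ours; it reuses the invented CONTAINER signature and the Stack and Queue modules from the sketch in Section 3.4 and assumes a simple labeled binary tree.

  type tree = Leaf of string | Node of string * tree * tree

  let label = function Leaf l | Node (l, _, _) -> l

  (* the structure parameter may be a stack, queue, priority queue, etc. *)
  let search start goal (module C : CONTAINER) =
    (* select next node; also return the updated structure *)
    let next node st =
      match node with
      | Leaf _ -> C.delete st
      | Node (_, l, r) -> C.delete (C.insert l (C.insert r st))
    in
    (* call next until the goal is reached; assumes some node matches *)
    let rec find node st =
      if label node = goal then node
      else let node', st' = next node st in find node' st'
    in
    find start C.empty

Then search t g (module Stack) performs depth-first search, and search t g (module Queue) performs breadth-first search.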

where substitution includes renaming of bound variables as usual. (It is not too
hard to prove that the typing rules of SOL guarantee that [M/x][τ/t]N is well-
typed.) Since abstype binds variables, we also have the renaming equivalence

abstype t with x: σ is M in N = abstype s with y: [s/t]σ is M in [y/x][s/t]N,

provided s is not free in σ, and neither s nor y is free in N.


Existential types are closely related to infinite sums. We can see the relation-
ship by reviewing the categorical definition of infinite sum [1, 27, 43]. The general
definition of sum includes sum types (corresponding to sum objects in categories)
and sum functions (corresponding to sum arrows). Given a collection S of types,
the sum type ΣS has the property that for each s ∈ S there is an injection
function inj s from s to ΣS. Furthermore, given any fixed type r and family
F = {f_s} of functions indexed by S, with f_s: s → r, there is a unique sum
function ΣF: ΣS → r with the property that

ΣF(inj s x) = f_s x.
Uniqueness of sums means that if ΣF(inj s x) = g_s x for all s ∈ S, then
ΣF = ΣG.
The correspondence with sums is similar to the correspondence between
polymorphism and products. It will be easier to see how abstype gives us sums
if, for any term M with TypeA(M) = σ → ρ and t not free in ρ, we adopt the
abbreviation

Σt.M ::= λz: (∃t.σ). abstype t with x: σ is z in Mx

where x and z are fresh variables. To see how Σt.M is a sum function, recall
from the discussion of ∀-types that a type expression σ and type variable t define
a collection of types, namely, the collection of substitution instances [τ/t]σ.
Similarly, a term M and type variable t define a family F of substitution instances
[τ/t]M. As before, we index elements of F by types in S by associating
[τ/t]M with [τ/t]σ. If M has type σ → ρ for some ρ that does not have t free,
then F is a family of functions from types in S to a fixed ρ. We may now
regard the type ∃t.σ as the sum type ΣS, the term Σt.M as the sum element
ΣF, and λy:s.(pack s y to ∃t.σ) as the injection function inj s. The sum axiom
holds in SOL, since

(Σt.M)(pack τ y to ∃t.σ)
::= (λz: ∃t.σ. abstype t with x: σ is z in Mx)(pack τ y to ∃t.σ)
→ abstype t with x: σ is (pack τ y to ∃t.σ) in Mx
⇒ [τ/t]M y.

It is interesting to compare abstype with case since ∨-types with inleft, inright,
and case correspond to finite categorical sums. Essentially, abstype is an
infinitary version of case.
As an aside, we note that the binding construct abstype may be replaced by a
constant sum. This treatment of abstype points out that the binding aspects of
abstype are essentially λ binding. If N is a term with type σ → ρ, and t is not
free in ρ, then both λt.N and Σt.N are well typed. Therefore, it suffices to have
a function sum ∃t.σ ρ that maps λt.N: ∀t.[σ → ρ] to Σt.N: (∃t.σ) → ρ.
Essentially, this means sum ∃t.σ ρ must satisfy the equation

(sum ∃t.σ ρ x)(pack τ y to ∃t.σ) = x{τ}y

for any x, y of the appropriate types. In the version of SOL with sum as basic,
we use this equation, read from left to right, as the defining reduction rule for
sum. Given sum, both Σ and abstype may be defined by

Σt.M ::= sum ∃t.σ ρ λt.M,
abstype t with x: σ is N in M ::= (Σt.λx: σ.M)N.

The reduction rules for Σ and abstype follow the reduction rules for sum. From
a theoretical point of view, it would probably be simpler to define SOL using
sum instead of Σ or abstype, since this reduces the number of binding operators
in the language. However, for expository purposes, it makes sense to take abstype
as primitive, since this makes the connection with data abstraction more readily
apparent. The difference is really inessential since any one of Σ, abstype, and
sum may be used to define the other two (using other constructs of the language).
3.7 Properties of SOL


Two important typing properties of SOL can be proved as theorems. The first
theorem may be interpreted as saying that SOL typing prevents run-time type
errors. Technically, the Type Preservation Theorem says that if we begin with a
well-typed term (expression or program) and evaluate or “run” it using the
reduction rules given, then at every step of the way we have a well-typed term
of the same type. This implies that if a term M contains a function f of type
int + int, say, then evaluation will never produce an expression containing f
applied to a Boolean argument, since this would not be well typed. Therefore,
although evaluating a term M may rearrange it dramatically, evaluation will only
produce terms in which f is applied to integer arguments.
TYPE PRESERVATION THEOREM. Let M be a term of SOL with TypeA(M) = σ.
If M → N, then TypeA(N) = σ.
A similar theorem for a simpler language without polymorphism appears in [12],
where it is called the Subject Reduction Theorem. The proof uses induction on
reduction paths and is essentially straightforward.
Another important theorem is a formal statement of the fact that type infor-
mation may be discarded at run time. More specifically, it is clear from the
language definition that SOL type checking can be done efficiently without
executing programs (i.e., without referring to the operational semantics of the
language). The Static Typing Theorem shows that once the type of a term has
been calculated, the term may be evaluated (or “run”) without further examining
types. This is stated formally by comparing the operational reduction rules given
in the language definition with a similar set of reduction rules on untyped terms.
Given a term M, we let Erase(M) denote the untyped expression produced by
erasing all type information from M. The function Erase has the simple inductive
definition

Erase(x) = x
Erase(c) = c
Erase(λx: σ.M) = λx.Erase(M)
Erase(MN) = Erase(M)Erase(N)
Erase((M, N)) = (Erase(M), Erase(N))
Erase(fst M) = fst Erase(M)
Erase(snd M) = snd Erase(M)
Erase(inleft M to σ ∨ τ) = inleft Erase(M)
Erase(inright M to σ ∨ τ) = inright Erase(M)
Erase(case M left x: σ.N right y: τ.P end)
    = case Erase(M) left x.Erase(N) right y.Erase(P) end
Erase(λt.M) = Erase(M)
Erase(M{τ}) = Erase(M)
Erase(pack τ M to ∃t.σ) = Erase(M)
Erase(abstype t with x: σ is M in N) = let x = Erase(M) in Erase(N)
We define the untyped reduction relation →E by erasing types from terms in
each reduction rule, for example,
(λx.M)N →E [N/x]M.
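Erase is itself an easy program to write. The OCaml sketch below is ours, over invented datatypes for typed and untyped terms (with ty as in the substitution sketch in Section 3); only representative cases are shown.

  type term =
    | Var of string
    | Lam of string * ty * term                  (* λx:σ.M *)
    | App of term * term
    | TLam of string * term                      (* λt.M *)
    | TApp of term * ty                          (* M{τ} *)
    | Pack of ty * term * ty                     (* pack τ M to ∃t.σ *)
    | Abstype of string * string * term * term   (* abstype t with x:σ is M in N *)

  type uterm =
    | UVar of string
    | ULam of string * uterm
    | UApp of uterm * uterm
    | ULet of string * uterm * uterm             (* let x = M in N *)

  let rec erase = function
    | Var x -> UVar x
    | Lam (x, _, m) -> ULam (x, erase m)
    | App (m, n) -> UApp (erase m, erase n)
    | TLam (_, m) -> erase m                     (* type bindings vanish *)
    | TApp (m, _) -> erase m
    | Pack (_, m, _) -> erase m
    | Abstype (_, x, m, n) -> ULet (x, erase m, erase n)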
Let ⇒E be the congruent and transitive closure of →E. Then we have the
following theorem:
STATIC TYPING THEOREM. Let M, N be two terms of SOL with TypeA(M) =
TypeA(N). Then M ⇒ N iff Erase(M) ⇒E Erase(N).
Since the theorem shows that two sets of reduction rules have essentially
equivalent results, it follows that programs may be executed using any interpreter
or compiler on the basis of untyped reduction rules. Like the Type Preservation
Theorem, the proof uses induction on the length of reduction paths and is
essentially straightforward. Although easily proved, these theorems are important
since they confirm our expectations about the relationship between typing and
program execution.
It is worth mentioning the relationship between the Static Typing Theorem
and the seemingly contradictory "folk theorem" that tagged sums (in SOL
notation, σ ∨ τ types) require run-time type information. Both are correct but
based on different notions of "untyped" evaluation. The Static Typing Theorem
says that if a term M is well typed, then M can be evaluated using untyped
reduction ⇒E. However, notice that Erase does not remove inleft and inright,
only the type designations on these constructs. Therefore, in evaluating a case
statement
case M left . . . right . . . end
the untyped evaluation rules can depend on whether M is of the form inleft M1
or inright M1. In the "folk theorem," this is considered type information, hence
the apparent contradiction.
The SOL reduction rules have several other significant properties. For example,
the reduction rules have the Church-Rosser property [22, 61].
CHURCH-ROSSER THEOREM. Suppose M is a term of SOL which reduces to
M1 and M2. Then there is a term N such that both M1 and M2 reduce to N.
In contrast to the untyped lambda calculus, no term of SOL can be reduced
infinitely many times.
STRONG NORMALIZATION THEOREM. There are no infinite reduction sequences.
The strong normalization theorem was first proved by Girard [22]. In light of
the strong normalization theorem, the Church-Rosser theorem follows from a
simple check of the weak Church-Rosser property (see Proposition 3.1.25 of [2]).
A consequence of Church-Rosser and Strong Normalization is that all maximal
reduction sequences (from a given term) end in the same normal form.3 As
proved in Girard’s thesis [22] and discussed in [20] and [59], the proof of the
strong normalization theorem cannot be carried out formally in either Peano
arithmetic or second-order Peano arithmetic (second-order Peano is also called
“analysis”). Furthermore, the class of number-theoretic functions that are

³ A normal form M is a term that cannot be reduced. Our use of the phrase strong normalization
follows [2]. Some authors use strong normalization for the property that all maximal reduction
sequences from a given term end in the same normal form.
representable in pure SOL without base types are precisely the recursive functions
that may be proved total in second-order Peano arithmetic [22, 68]. These and
related results are discussed in [20] at greater length.

3.8 Alternative Views of Abstract Data Type Declarations


As noted in the introduction, several language design efforts are similar in spirit
to ours. The language SOL is based on Reynolds’ polymorphic lambda calculus
[62] and Girard’s proof-theoretic language [22]. Some similar languages are
Pebble [ 71, Kernel Russel, KR, [2&J],ML with modules as proposed by MacQueen
[44], and Martin-Lof’s constructive type theory [46]. We compare abstype in
SOL with an early proposal of Reynolds [62] and, briefly, with the constructs of
Pebble and KR.
In defining the polymorphic lambda calculus, Reynolds proposed a kind of
abstype declaration based on λ-binding [62]. As Reynolds notes, the expression

abstype t with x1: σ1, . . . , xn: σn is M in N

has the same meaning as

(λt.λx1: σ1 . . . λxn: σn.N){τ}M1 . . . Mn

if M is of the form pack τ M1 . . . Mn to ∃t.σ. However, abstype should not be
considered an abbreviation for this kind of expression for two reasons. First, it
is not clear what to do if M is not of the form pack τ M1 . . . Mn to ∃t.σ.
Therefore, we can only simulate a restricted version of SOL by this means; much
flexibility is lost. A lesser drawback of using λ to define abstype in this way is
that the expression

(λt.λ(x1: σ1, . . . , xn: σn).N){τ}M1 . . . Mn

is well typed in cases in which the corresponding abstype expression fails to
satisfy (AB.3). As noted in Section 2, rule (AB.3) keeps the "abstract" type from
being exported outside the scope of a declaration. However, other justifications
for (AB.3) discussed in Section 2 do not apply here, since Reynolds' suggestion
cannot be used to construct conditional data algebra expressions, for example.
While the above definition of abstype using λ has some drawbacks, a more
suitable definition using λ is described in the final section of the later paper [64].
Pebble and KR take a view of data algebras that appears to differ from SOL.
An intuitively appealing view of pack τ M1 . . . Mn is simply as a record whose
first component is a type. This seems to lead one to introduce a "type of types,"
a path followed by [7] and [28]. We would expect a product type for pack . . . to
be something like

Type ∧ σ1 ∧ . . . ∧ σn.

However, this does not link the value of the first component to the types of the
remaining components. To solve this problem, Pebble and KR associate abstract
data types with "dependent product" types of the form

t: Type ∧ σ1 ∧ . . . ∧ σn,

where t is considered bound in σ1 ∧ . . . ∧ σn.
Since Pebble does not supply projection functions for dependent products, the
dependent product of Pebble actually seems to be a sum (in the sense of category
theory), like SOL ∃-types. KR dependent products do have something that looks
like a projection function: If A is a data algebra, then Carrier(A) is a type
expression of KR. However, since Carrier(pack τ M to ∃t.σ) is not considered
equal to τ, it seems that KR dependent products are not truly products. Perhaps
further analysis will show that KR dependent products are also sums and closer
to SOL existential types than might appear at first glance.
As pointed out in [30], there are actually two reasonable notions of sum type,
"weak" and "strong" sums. The SOL existential type is a typical example of weak
sums, whereas strong sums appear as the Σ-types of Martin-Löf's type theory
[46]. The main difference lies in rule (AB.3), which holds for weak sums, but not
for strong. Thus, while Martin-Löf's product types over universes give a form of
polymorphism that is similar to SOL polymorphism, Martin-Löf's sum types
differ from our existential types. For this reason, the languages are actually quite
different. In addition, the restrictions imposed by universes simplify the seman-
tics of Martin-Löf's language, at the cost of a slightly more complicated
syntax. (Some relatively natural programming examples, such as the Sieve of
Eratosthenes program given in Section 5.2 of this paper, are prohibited by
the universe restrictions of Martin-Löf type theory.) For further discussion of
sum and product types over universes, the reader is referred to [9], [10], [31],
[45], [46], [49], and [54].

4. FORMULAS AS TYPES

4.1 Introduction
The language SOL exhibits an analogy between logical formulas and types that
has been used extensively in proof theory [12, 13, 22, 30, 35, 38, 39, 46, 67, 69].
The programming significance of the analogy has been stressed by Martin-Löf
[46]. We review the basic idea using propositional logic and then discuss quan-
tification briefly. In addition to giving some intuition into the connection between
computer science and constructive logic, the formulas-as-types analogy also
suggests other languages with existential types. One such language, involving
specifications as types, is discussed briefly at the end of this section. In general,
our analysis of abstype suggests that any constructive proof rules for existential
formulas provide data type declarations. For this reason, the formulas-as-types
languages provide a general framework for studying many aspects of data
abstraction.

4.2 Propositional Logic


Implicational propositional logic uses formulas that contain only propositional
variables and →, implication. The formulas of implicational propositional logic
are defined by the grammar

σ ::= t | σ → τ,

where we understand that t is a propositional variable. We are concerned with
an intuitionistic interpretation of formulas, so it is best not to think of formulas
as simply being true or false whenever we assign truth values to each variable.
While various forms of intuitionistic semantics have been developed [10, 33, 34,
70], we will not go into this topic. Instead, we will characterize intuitionistic
validity by means of a proof system.
Natural deduction is a style of proof system that is intended to mimic the
common blackboard-style argument
Assume σ.
By . . . we conclude τ.
Therefore σ → τ.
We make an assumption in the first line of this argument. In the second line,
this assumption is combined with other reasoning to derive τ. At this point, we
have proved τ, but the proof depends on the assumption of σ. In the third step,
we observe that since σ leads to a proof of τ, the implication σ → τ follows. Since
the proof of σ → τ is sound without proviso, we have "discharged" the assumption
of σ in proceeding from τ to σ → τ. In a natural deduction proof, each proposition
may depend on one or more assumptions. A proposition is considered proved
only when all assumptions have been discharged.
The natural deduction proof system for implicational propositional logic
consists of three rules, given below. For technical reasons, we use labeled
assumptions. (This is useful from a proof-theoretic point of view as a means of
distinguishing between different assumptions of the same formula.) Let V be a
set, intended to be the set of labels, and let A be a mapping from labels to
formulas. We will use the notation ConseqA(M) = σ to mean that M is a proof
with consequence σ, given the association A of labels to assumptions. Proofs and
their consequences are defined as follows:

ConseqA(x) = A(x)

ConseqA(M) = σ → τ,   ConseqA(N) = σ
ConseqA(MN) = τ

ConseqA[x:σ](M) = τ
ConseqA(λx: σ.M) = σ → τ
The set Assume(M) of undischarged assumptions of M is defined by

Assume(x) = {x}
Assume(MN) = Assume(M) ∪ Assume(N)
Assume(λx: σ.M) = Assume(M) − {x}

In English, we may summarize these two definitions as follows:
A label x is a proof of A(x) with assumption labeled x.
If M is a proof of σ → τ and N is a proof of σ, then MN is a proof of τ
(depending on all assumptions used in either proof).
If M is a proof of τ with assumption σ labeled x, then λx: σ.M is a proof of
σ → τ with the assumption x discharged.
A formula σ is intuitionistically provable if there is a proof M with ConseqA(M)
= σ and Assume(M) = ∅. (It is easy to show that if Assume(M) = ∅, then
ConseqA(M) does not depend on A.) Even when → is the only propositional
connective, there are classical tautologies that are not intuitionistically provable.
For example, it is easy to check that the formula ((s → t) → s) → s is a classical
tautology just by trying all possible assignments of true and false to s and t.
However, this formula is not intuitionistically provable.
Of course, we have just defined the typed lambda calculus: The terms of typed
lambda calculus are precisely the proofs defined above and their types are the
formulas given. In fact, ConseqA and TypeA are precisely the same function, and
Assume(M) is precisely the set of free variables of M. The similarity between
natural deduction proofs and terms extends to the other connectives and quan-
tifiers. The proof rules for ∧, ∨, ∀, and ∃ are precisely the formation rules given
earlier for terms of these types.
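The identification can be exercised in any typed functional language. In the OCaml sketch below (our own), each term is a proof of the formula written as its type, reading → as implication.

  (* a proof of σ → (σ → τ) → τ: modus ponens, all assumptions discharged *)
  let mp : 'a -> ('a -> 'b) -> 'b = fun x f -> f x

  (* a proof of (σ → τ) → (τ → ρ) → (σ → ρ) *)
  let cut : ('a -> 'b) -> ('b -> 'c) -> ('a -> 'c) = fun f g x -> g (f x)

  (* ((s → t) → s) → s, the classical tautology mentioned above, has no
     proof: no total OCaml term of type (('s -> 't) -> 's) -> 's can be
     written without effects such as exceptions or control operators. *)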
One interesting feature of the proof rule for ∨ of [60] is that it is the
discriminating case statement of CLU [42], rather than the problematic outleft
and outright functions of ML [23]. The "out" functions of ML are undesirable
since they rely on run-time exceptions (cf. [41], p. 569). Specifically, if x: τ
in ML, then (inright x): σ ∨ τ and outleft(inright x): σ. However, we cannot
actually compute a value of type σ from x: τ, so this is not semantically sensible.
The ML solution to this problem is to raise a run-time exception when
outleft(inright x) is evaluated, which introduces a form of run-time type
checking. Since the ∨ rule leads us directly to a case statement that requires no
run-time type checking, it seems that the formulas-as-types analogy may be a
useful guide in designing programming languages.

4.3 Universal and Existential Quantifiers


The intuitionistic proof rules for universal and existential types are repeated
below for emphasis. It is a worthwhile exercise for the reader to become convinced
that these make logical sense.
ConseqA(M) = ∀t.σ
ConseqA(M{τ}) = [τ/t]σ

ConseqA(M) = τ,   t not free in A(x) for x free in M
ConseqA(λt.M) = ∀t.τ

ConseqA(M) = [τ/t]σ
ConseqA(pack τ M to ∃t.σ) = ∃t.σ

ConseqA(M) = ∃t.σ,   ConseqA[x:σ](N) = ρ
ConseqA(abstype t with x: σ is M in N) = ρ,

provided t is not free in ρ or the type A(y) of any free y ≠ x occurring in N.
The rules for ∀ are the usual universal instantiation and generalization. The
third is an existential generalization rule, and the fourth a form of existential
instantiation. Except for the explicit proof notation chosen to suggest program-
ming language syntax, these proof rules are exactly those found in [60]. Although
a full discussion would take us beyond the scope of this paper, it is worth
remarking that reduction rules may also be derived using the formulas-as-types
analogy: The reduction rules of SOL are precisely the proof-simplification rules
given in [61].
4.4 Other Languages With Existential Types


The formulas-as-types analogy can be applied to other natural deduction proof
systems. Two particularly relevant logics are the second-order logics of [60],
Chapter V. The simpler of these amounts to adding first-order terms to the
second-order logic of SOL. In this language, types are formulas that describe the
behavior of terms.
In an ideal programming language, we would like to use specifications to
describe abstract data types. The ideal or "intended" type of stack is the
specification

∀t. ∃s. ∃empty: s. ∃push: t ∧ s → s. ∃pop: s → t ∧ s.
∀x: t. ∀y: s. (pop(push(x, y)) = (x, y)),

or, perhaps more properly, a similar specification with an induction axiom:

∀t. ∃s. ∃empty: s. ∃push: t ∧ s → s. ∃pop: s → t ∧ s.
∀x: t. ∀y: s. (pop(push(x, y)) = (x, y) ∧ induction axiom).
Both specifications are, in fact, type expressions in the language based on first-
and second-order logic. We expect the meaning of each type expression to
correspond to a class of algebras satisfying the specification (see, e.g., [24] for a
discussion of universal algebra). However, the language based on first- and
second-order logic is cumbersome for programming since constructing an element
of one of these existential types involves proving that an implementation meets
its specification. Some interesting research into providing environments for
programming with specifications as types is provided in [8] and [9]. Induction
rules, used for proofs by “data type induction” [25], are easily included in
specifications since induction is expressible in second-order logic.
A richer “ramified second-order” system in Chapter V of [60] includes X-
abstraction in the language of types. Via formulas-as-types, this leads to the
richer languages of [47] and [51].

5. MORE PROGRAMMING EXAMPLES

5.1 Universal and Existential Parameterization


Some useful constructions involving abstract data types are to pass representa-
tions as parameters, parameterize the data types themselves, and return imple-
mentations as results of procedures. In SOL, we can distinguish between two
kinds of type parameterization. Suppose M uses operations x: σ on type t, and t
is not free in the type of any other free variable of M. Then the terms

M1 = λt.λx: σ. M
M2 = Σt.λx: σ. M

are both parameterized by a type and operations. However, there are significant
differences between these two terms. To begin with, M1 is well typed even if t
appears free in the type of M, whereas M2 is not. Furthermore, the two terms have
different types. If the type of M is ρ, then their types are

M1: ∀t.(σ → ρ)
and

M2: (∃t.σ) → ρ.

We will say that M1 is universally parameterized and M2 is existentially param-
eterized.
Generic packages are universally parameterized data algebras. For example,
given any type t with operations

plus: t ∧ t → t
times: t ∧ t → t,

we can write a data algebra t-matrix implementing matrix operations over t. Four
operations we might choose to include are

create: t ∧ . . . ∧ t → mat,
mplus: mat ∧ mat → mat,
mtimes: mat ∧ mat → mat,
det: mat → t.

If mbody is an expression of the form

mbody ::= pack τ M1 . . . Mn to ∃s[(t ∧ . . . ∧ t → s)
∧ (s ∧ s → s) ∧ (s ∧ s → s) ∧ (s → t)]

implementing create, mplus, mtimes, and det using plus and times, then

matrix ::= λt. λplus: t ∧ t → t. λtimes: t ∧ t → t. mbody

is a universally parameterized data algebra. The type of matrix is

∀t.(t ∧ t → t) → (t ∧ t → t) → ∃s[(t ∧ . . . ∧ t → s) ∧ (s ∧ s → s) ∧ (s ∧ s → s) ∧ (s → t)].

Note that mbody could not be existentially parameterized by t since t appears
free in the type of mbody.
Functions from data algebras to data algebras are existentially parameterized.
One simple manipulation of data algebras is to remove operations from the
signature. For example, a doubly ended queue, or dequeue, has two insert and
two remove operations. The type of an implementation dq of dequeues with
empty, insert1, insert2, remove1, and remove2, is

dq-type ::= ∀t.∃d.[d ∧ (t ∧ d → d) ∧
(t ∧ d → d) ∧ (d → t ∧ d) ∧ (d → t ∧ d)]

A function that converts dequeue implementations to queue implementations
is a simple example of an existentially parameterized structure. Given dq, we can
implement queues using the form

Q(x, t) ::= abstype d with empty: . . . , insert1: . . . , insert2: . . . ,
remove1: . . . , remove2: . . .
is x{t}
in pack d empty insert1 remove2 to ∃s.[s ∧ (t ∧ s → s) ∧ (s → t ∧ s)]

with dq substituted for x. Thus the term

dq-to-q ::= λx: dq-type.λt.Q(x, t)
with type

dq-type → ∀t.∃s.[s ∧ (t ∧ s → s) ∧ (s → t ∧ s)]
is a function from data algebras to data algebras. Suppose that queue is the data
algebra produced by applying dq-to-q to dq. Since the type of queue is a closed
type expression, the fact that queue uses the same representation type as dq
seems effectively hidden. Generally, universal parameterization may be used to
effect some kind of sharing of types, whereas existential parameterization ob-
scures the identity of representations. (See [45], which was written later, for
related discussion.)
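In a module language the same conversion is a functor; the OCaml sketch below is ours, with invented DEQUEUE and QUEUE signatures. Sealing the result with QUEUE hides the fact that the queue reuses the dequeue's representation type, as discussed above.

  module type DEQUEUE = sig
    type 'a t
    val empty : 'a t
    val insert1 : 'a -> 'a t -> 'a t
    val insert2 : 'a -> 'a t -> 'a t
    val remove1 : 'a t -> 'a * 'a t
    val remove2 : 'a t -> 'a * 'a t
  end

  module type QUEUE = sig
    type 'a t
    val empty : 'a t
    val insert : 'a -> 'a t -> 'a t
    val remove : 'a t -> 'a * 'a t
  end

  (* keep empty, insert1, and remove2, as in Q(x, t) above *)
  module DqToQ (D : DEQUEUE) : QUEUE = struct
    type 'a t = 'a D.t
    let empty = D.empty
    let insert = D.insert1
    let remove = D.remove2
  end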
Some other useful transformations on data algebras are the analogs of the
theory building operations combine, enrich, and derive of CLEAR [5, 6]. Although
a general combine operation as in CLEAR, for example, cannot be written in
SOL because of type constraints, we can write a combine operation for any pair
of existential types. For example, we can write a procedure to combine data
algebras of types ∃s.σ and ∃t.ρ into a single data algebra. The type of this
function

Combine1 = λx: ∃s.σ. λy: ∃t.ρ.
abstype s with z: σ is x in
abstype t with w: ρ is y in
pack s [pack t (z, w) to ∃t.(σ ∧ ρ)] to ∃s∃t.(σ ∧ ρ)

is

Combine1: ∃s.σ → ∃t.ρ → ∃s∃t.(σ ∧ ρ).

For universally parameterized data algebras of types ∀r∃s.σ and ∀r∃t.ρ, we can
write combine so that in the combined data algebra, the type parameter will be
shared. The combine function with sharing

Combine2 = λx: ∀r∃s.σ. λy: ∀r∃t.ρ.
λr.abstype s with z: σ is x{r} in
abstype t with w: ρ is y{r} in
pack s [pack t (z, w) to ∃t.(σ ∧ ρ)] to ∃s∃t.(σ ∧ ρ)

has type

Combine2: ∀r∃s.σ → ∀r∃t.ρ → ∀r∃s∃t.(σ ∧ ρ).

A similar, but slightly more complicated, combine function can be written for
the case in which the two parameters are both universally parameterized by a
type and several operations on the type. For example, a polymorphic matrix
package could be combined with a polymorphic polynomial package to give a
combined package parameterized by a type t and two binary operations plus and
times providing both matrices and polynomials over t. Furthermore, the combine
function could be written to enrich the combined package by adding a function
that finds the characteristic polynomial of a matrix.
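A sharing-constrained functor gives the flavor of Combine2; the OCaml sketch below is ours, with placeholder signatures S1 and S2 standing for σ and ρ, and the constraint with type r = X.r playing the role of the shared parameter r.

  module type S1 = sig type r type s val ops1 : r -> s end
  module type S2 = sig type r type t val ops2 : r -> t end

  (* pair the two structures; the parameter type r is shared *)
  module Combine (X : S1) (Y : S2 with type r = X.r) = struct
    type r = X.r
    type s = X.s
    type t = Y.t
    let ops1 = X.ops1
    let ops2 = Y.ops2
  end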
5.2 Data Structures Using Existential Types
Throughout this paper, we have viewed data algebras as implementations of
abstract data types. An alternative view is that data algebras are simply records
tagged with types. This view leads us to consider using data algebras as parts of
data structures. In many cases, these data structures do not seem directly related
to any kind of abstract data type. The following example uses existentially typed
data structures to represent streams.
Intuitively, streams are infinite lists. In an applicative language, it is convenient
to think of a stream as a kind of "process" that has a set of possible internal
states and a specific value associated with each state. Since the process imple-
ments a list, there is a designated initial state and a deterministic state transition
function. Therefore, a stream consists of a type s (of states) with a designated
individual (start state) of type s, a next-state function of type s → s, and a value
function of type s → t, for some t. An integer stream, for example, will
have a value function of type s → int, and so the type of integer streams will be

∃s[s ∧ (s → s) ∧ (s → int)].
The Sieve of Eratosthenes can be used to produce an integer stream enumer-
ating all prime numbers. This stream is constructed using a sift operation on
streams. Given an integer stream sl, Sift(sl) is a stream of integers that are not
divisible by the first value of sl. If Num is the stream 2, 3, . . . , then the sequence
formed by taking the first value of each stream
Num, Sift(Num), Sift(Sift(Num)), ...
will be the sequence of all primes.
With streams represented using existential types, Sift may be written as the
following function over existential types.
Sift =
X stream: 3s[s A (s --, s) A (s - int)].
abstype s with start : s, next : s -+ s, value : s + int is stream
in let n = value(start)
in letrec f = X state : s.
if n divides value(state) then f (next(state))
else state
in
pack s f (start) Xx: s.f (next(x)) value to 3s[s A (s + s) A (s + int)]
end
end
end
Sieve will be the stream with states represented by integer streams, start state
the stream of all integers greater than 1, and Sift the successor function on states.
The value associated with each Sieve state is the first value of the integer stream,
so that the values of Sieve enumerate all primes.
Sieve =
pack ∃t[t ∧ (t → t) ∧ (t → int)]
(pack int 2 Successor λx:int.x to ∃t[t ∧ (t → t) ∧ (t → int)])   /* start */
Sift                                                             /* next */
(λstate: ∃t[t ∧ (t → t) ∧ (t → int)].
abstype r with r-start: r, r-next: r → r, r-val: r → int is state
in r-val(r-start))                                               /* value */
to ∃t[t ∧ (t → t) ∧ (t → int)]
Expressed in terms of Sieve, the ith prime number is

abstype s with start: s, next: s → s, value: s → int
is Sieve
in value(next^i start),
where “next’ start” is the expression next(next(. . . (next start). . .)) with i occur-
rences of next.
It is worth noticing that Sieve is "circular" in the sense that the representation
type ∃t[t ∧ (t → t) ∧ (t → int)] used to define Sieve is also the type of Sieve
itself. For this reason, this example could not have been written in a predicative
system like Martin-Löf's intuitionistic type theory [9, 46]. The typing rules of
that theory require that elements of one type be composed only of elements of
simpler types.
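The stream constructions transcribe directly into OCaml, where a GADT constructor hides the state type in the same way as ∃s[s ∧ (s → s) ∧ (s → int)]; the sketch below is our own informal rendering of Num, Sift, and Sieve.

  type 'v stream = Pack : 's * ('s -> 's) * ('s -> 'v) -> 'v stream

  (* Num: states are ints, the start state is 2, value is the state *)
  let num : int stream = Pack (2, (fun n -> n + 1), (fun n -> n))

  let first : int stream -> int = function Pack (s, _, v) -> v s

  (* Sift: skip states whose value the first value divides *)
  let sift : int stream -> int stream = function
    | Pack (start, next, value) ->
        let n = value start in
        let rec f st = if value st mod n = 0 then f (next st) else st in
        Pack (f start, (fun st -> f (next st)), value)

  (* Sieve: states are integer streams and the transition is sift itself;
     its type is again int stream, the circularity noted above *)
  let sieve : int stream = Pack (num, sift, first)

  (* the ith prime, counting from 0, is nth sieve i *)
  let rec nth : int stream -> int -> int = fun str i ->
    match str with
    | Pack (s, next, value) ->
        if i = 0 then value s else nth (Pack (next s, next, value)) (i - 1)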

6. CONCLUSION AND DIRECTIONS FOR FURTHER INVESTIGATION


We have used the language SOL, a syntactic variant of Girard's system F and an
extension of Reynolds' polymorphic lambda calculus [22, 62], to discuss abstract
data type declarations in programming languages. SOL is easily defined and has
straightforward operational semantics. The language also allows us to decompose
abstract data type declarations into two parts: defining a data algebra and binding
names to its components. For this reason, SOL allows implementations of
abstract data types to be passed as function parameters or returned as results.
This makes the language more flexible than many contemporary typed languages,
without sacrificing efficient compile-time type checking.
The flexibility of SOL comes about primarily because we treat data algebras
as values that have types themselves. The types of data algebras in SOL are
existential types, a type motivated by an analogy between programming languages
and constructive logic and closely related to infinite sums. We believe that
although the design of SOL does not address certain practical objectives, the
language demonstrates useful extensions to current programming languages. SOL
also seems very useful for studying the mathematical semantics of data type
declarations.
One promising research direction is to use SOL to formalize and prove some
natural properties of abstract data types. For example, if M and N implement
two data algebras with the same observable behavior (see, e.g., [32]), then the
meaning of a program using M should correspond appropriately to the meaning
of the same program using N. However, SOL is sufficiently complicated that it
is not clear how to define “observable behavior.” Among other difficulties, data
algebras are heterogeneous structures whose operations may be polymorphic or
involve existential types. Reynolds, Donahue, and Haynes have examined various
related "representation independence" properties of SOL-like languages without
existential types [18, 26, 55, 64]. Some of these ideas have been applied to SOL
in [52], which was written after the work described here was completed. However,
there is still much to be done in this direction.
There are a number of technical questions about SOL that merit further study.
The semantics of various fragments of SOL are studied in [3], [4], [18], [26],
[47], [51], [62], and [65], but many questions remain. Some open problems are
listed in [3], [4], and [51]. In addition, there are a number of questions related
to automatic insertion of type information into partially typed expressions of
SOL. For example, it would be useful to find an algorithm which, given a term
M of the untyped lambda calculus, could determine whether type expressions
ACM Transactions on Programming Languages and Systems, Vol. 10, No. 3, July 1988.
498 - J. C. Mitchell and G. D. Plotkin

and type binding can be added to M to produce a well-typed term of SOL. Some
questions of this nature are discussed in [40], [48], and [53].
A general problem in the study of types is a formal characterization of type
security. We have given two theorems about typing in SOL: Expressions may be
evaluated without considering type information, and the syntactic type of an
expression is not affected by reducing the expression to simpler forms. These
theorems imply that types may be ignored when evaluating SOL expressions and
that SOL type checking is sufficient to prevent run-time type errors. The study
of representation independence (mentioned above) leads to another notion of
type security, but further research seems necessary to show that SOL programs
are “type-safe” in other ways.
One interesting aspect of SOL is that it may be derived from quantified
propositional (second-order) logic using the formulas-as-types analogy discussed
in Section 4. Our analysis of abstype demonstrates that the proof rules for
existential formulas in a variety of logical systems all correspond to declaring
and using abstract data types. Thus, the formulas-as-types languages provide a
general framework for studying abstract data types. In particular, the language
derived from first- and second-order logic seems to incorporate specifications
into programs in a very natural way. The semantics and programming properties
of this language seem worth investigating and relating to other studies of data
abstraction based on specification.

APPENDIX. COLLECTED DEFINITION OF SOL


The type expressions of SOL are defined by the following abstract syntax:

    σ ::= t | c | σ → τ | σ ∧ τ | σ ∨ τ | ∀t.σ | ∃t.σ

where t is any type variable and c is any type constant. (We use two sorts of
variables, type variables r, s, t, . . . and ordinary variables x, y, z, . . . .)
A type assignment A is a function from ordinary variables to type expressions.
We use A[x:σ] to denote the type assignment A1 with A1(y) = A(y) for y different
from x, and A1(x) = σ. The partial functions Type_A, for all type assignments A,
and the operational semantics of SOL are defined as follows:

Constants and Variables


Type_A(c_τ) = τ,   for constant c_τ of type τ
Type_A(x) = A(x)

Functions and Application


Type_{A[x:σ]}(M) = τ
--------------------------------
Type_A(λx:σ.M) = σ → τ

Type_A(M) = σ → τ,   Type_A(N) = σ
--------------------------------
Type_A(MN) = τ

λx:σ.M = λy:σ.[y/x]M,   y not free in M
(λx:σ.M)N ⇒ [N/x]M

Products
Type_A(M) = σ,   Type_A(N) = τ
--------------------------------
Type_A(⟨M, N⟩) = σ ∧ τ

Type_A(M) = σ ∧ τ
--------------------------------
Type_A(fst M) = σ,   Type_A(snd M) = τ

fst⟨M, N⟩ ⇒ M,   snd⟨M, N⟩ ⇒ N
Sums
Type_A(M) = σ
--------------------------------
Type_A(inleft M to σ ∨ τ) = σ ∨ τ,   Type_A(inright M to τ ∨ σ) = τ ∨ σ

Type_A(M) = σ ∨ τ,   Type_{A[x:σ]}(N) = ρ,   Type_{A[y:τ]}(P) = ρ
--------------------------------
Type_A(case M left x:σ.N right y:τ.P end) = ρ

case M left x:σ.N right y:τ.P end
  = case M left u:σ.[u/x]N right v:τ.[v/y]P end,

provided u is not free in N and v is not free in P.

case (inleft M to σ ∨ τ) left x:σ.N right y:τ.P end ⇒ [M/x]N
case (inright M to σ ∨ τ) left x:σ.N right y:τ.P end ⇒ [M/y]P
Polymorphism
Type_A(M) = τ,   t not free in A(x) for any x free in M
--------------------------------
Type_A(Λt.M) = ∀t.τ

Type_A(M) = ∀t.σ
--------------------------------
Type_A(M{τ}) = [τ/t]σ

Λt.M = Λs.[s/t]M,   s not free in Λt.M
(Λt.M){τ} ⇒ [τ/t]M
Abstract Data Types
Type_A(M) = [τ/t]σ
--------------------------------
Type_A(pack τ M to ∃t.σ) = ∃t.σ

Type_A(M) = ∃t.σ,   Type_{A[x:σ]}(N) = ρ
--------------------------------
Type_A(abstype t with x:σ is M in N) = ρ,

provided t is not free in ρ or the type A(y) of any free y ≠ x occurring in N.

abstype t with x:σ is M in N = abstype s with y:[s/t]σ is M in [y/x][s/t]N,
provided s is not free in σ, and neither s nor y is free in N.

abstype t with x:σ is (pack τ M to ∃t.σ) in N ⇒ [M/x][τ/t]N
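To make the flavor of these rules concrete, the following OCaml sketch implements Type_A for the fragment with functions, pack, and abstype. The datatypes and names are ours; substitution is naive (bound type variables are assumed distinct from free ones), types are compared syntactically rather than up to alpha-conversion, and the side condition on the free variables of N is checked only for the result type ρ.

    type ty =
      | TVar of string
      | TConst of string
      | TArrow of ty * ty                 (* σ → τ *)
      | TExists of string * ty            (* ∃t.σ *)

    type term =
      | Var of string
      | Lam of string * ty * term                   (* λx:σ.M *)
      | App of term * term
      | Pack of ty * term * ty                      (* pack τ M to ∃t.σ *)
      | Abstype of string * string * term * term    (* abstype t with x is M in N *)

    (* [tau/t]sigma, assuming bound variables are chosen apart from free ones *)
    let rec subst t tau = function
      | TVar s -> if s = t then tau else TVar s
      | TConst c -> TConst c
      | TArrow (a, b) -> TArrow (subst t tau a, subst t tau b)
      | TExists (s, a) -> if s = t then TExists (s, a) else TExists (s, subst t tau a)

    let rec occurs t = function
      | TVar s -> s = t
      | TConst _ -> false
      | TArrow (a, b) -> occurs t a || occurs t b
      | TExists (s, a) -> s <> t && occurs t a

    (* Type_A as a partial function; the assignment A is an association list. *)
    let rec type_of a = function
      | Var x -> List.assoc x a
      | Lam (x, s, m) -> TArrow (s, type_of ((x, s) :: a) m)
      | App (m, n) ->
          (match type_of a m with
           | TArrow (s, t) when type_of a n = s -> t
           | _ -> failwith "bad application")
      | Pack (tau, m, TExists (t, s)) ->
          (* Type_A(M) = [τ/t]σ  implies  Type_A(pack τ M to ∃t.σ) = ∃t.σ *)
          if type_of a m = subst t tau s then TExists (t, s)
          else failwith "bad pack"
      | Pack _ -> failwith "pack target must be existential"
      | Abstype (t, x, m, n) ->
          (match type_of a m with
           | TExists (t', s) ->
               let rho = type_of ((x, subst t' (TVar t) s) :: a) n in
               if occurs t rho then failwith "abstract type escapes" else rho
           | _ -> failwith "abstype of non-existential")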

ACKNOWLEDGMENTS
Thanks to John Guttag and Albert Meyer for helpful discussions. Mitchell thanks
IBM for a graduate fellowship while at MIT, and Plotkin acknowledges the
support of the BP Venture Research Unit.

REFERENCES
1. ARBIB, M. A., AND MANES, E. G. Arrows, Structures, and Functors: The Categorical Imperative. Academic Press, Orlando, Fla., 1975.
2. BARENDREGT, H. P. The Lambda Calculus: Its Syntax and Semantics. North-Holland, Amsterdam, The Netherlands, 1984 (revised edition).
3. BRUCE, K. B., AND MEYER, A. A completeness theorem for second-order polymorphic lambda calculus. In Proceedings of the International Symposium on Semantics of Data Types. Lecture Notes in Computer Science 173, Springer-Verlag, New York, 1984, pp. 131-144.
4. BRUCE, K. B., MEYER, A. R., AND MITCHELL, J. C. The semantics of second-order lambda calculus. Inf. Comput. (to be published).
5. BURSTALL, R. M., AND GOGUEN, J. Putting theories together to make specifications. In Fifth International Joint Conference on Artificial Intelligence, 1977, pp. 1045-1058.
6. BURSTALL, R. M., AND GOGUEN, J. An informal introduction to specification using CLEAR. In The Correctness Problem in Computer Science, Boyer and Moore, Eds. Academic Press, Orlando, Fla., 1981, pp. 185-213.
7. BURSTALL, R., AND LAMPSON, B. A kernel language for abstract data types and modules. In Proceedings of the International Symposium on Semantics of Data Types. Lecture Notes in Computer Science 173, Springer-Verlag, New York, 1984, pp. 1-50.
8. CONSTABLE, R. L. Programs and types. In 21st IEEE Symposium on Foundations of Computer Science (Syracuse, N.Y., Oct. 1980). IEEE, New York, 1980, pp. 118-128.
9. CONSTABLE, R. L., ET AL. Implementing Mathematics with the Nuprl Proof Development System. Prentice-Hall, Englewood Cliffs, N.J., 1986.
10. COQUAND, T. An analysis of Girard's paradox. In Proceedings of the IEEE Symposium on Logic in Computer Science (June 1986). IEEE, New York, 1986, pp. 227-236.
11. COQUAND, T., AND HUET, G. The calculus of constructions. Inf. Comput. 76, 2/3 (Feb./Mar. 1988), 95-120.
12. CURRY, H. B., AND FEYS, R. Combinatory Logic I. North-Holland, Amsterdam, 1958.
13. DE BRUIJN, N. G. A survey of the project Automath. In To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism. Academic Press, Orlando, Fla., 1980, pp. 579-607.
14. DEMERS, A. J., AND DONAHUE, J. E. Data types, parameters and type checking. In 7th ACM Symposium on Principles of Programming Languages (Las Vegas, Nev., Jan. 28-30, 1980). ACM, New York, 1980, pp. 12-23.
15. DEMERS, A. J., AND DONAHUE, J. E. 'Type-completeness' as a language principle. In 7th ACM Symposium on Principles of Programming Languages (Las Vegas, Nev., Jan. 28-30, 1980). ACM, New York, 1980, pp. 234-244.
16. DEMERS, A. J., DONAHUE, J. E., AND SKINNER, G. Data types as values: polymorphism, type-checking, encapsulation. In 5th ACM Symposium on Principles of Programming Languages (Tucson, Ariz., Jan. 23-25, 1978). ACM, New York, 1978, pp. 23-30.
17. U.S. DEPARTMENT OF DEFENSE. Reference Manual for the Ada Programming Language. GPO 008-000-00354-8, 1980.
18. DONAHUE, J. On the semantics of data type. SIAM J. Comput. 8 (1979), 546-560.
19. FITTING, M. C. Intuitionistic Logic, Model Theory and Forcing. North-Holland, Amsterdam, 1969.
20. FORTUNE, S., LEIVANT, D., AND O'DONNELL, M. The expressiveness of simple and second order type structures. J. ACM 30, 1 (1983), 151-185.
21. GIRARD, J.-Y. Une extension de l'interprétation de Gödel à l'analyse, et son application à l'élimination des coupures dans l'analyse et la théorie des types. In 2nd Scandinavian Logic Symposium, J. E. Fenstad, Ed. North-Holland, Amsterdam, 1971, pp. 63-92.
22. GIRARD, J.-Y. Interprétation fonctionnelle et élimination des coupures de l'arithmétique d'ordre supérieur. Thèse d'État, Univ. Paris VII, Paris, 1972.
23. GORDON, M. J., MILNER, R., AND WADSWORTH, C. P. Edinburgh LCF. Lecture Notes in Computer Science 78, Springer-Verlag, New York, 1979.
24. GRÄTZER, G. Universal Algebra. Van Nostrand, New York, 1968.
25. GUTTAG, J. V., HOROWITZ, E., AND MUSSER, D. R. Abstract data types and software validation. Commun. ACM 21, 12 (Dec. 1978), 1048-1064.

26. HAYNES, C. T. A theory of data type representation independence. In Proceedings of the International Symposium on Semantics of Data Types. Lecture Notes in Computer Science 173, Springer-Verlag, New York, 1984, pp. 157-176.
27. HERRLICH, H., AND STRECKER, G. E. Category Theory. Allyn and Bacon, Newton, Mass., 1973.
28. HOOK, J. G. Understanding Russell - a first attempt. In Proceedings of the International Symposium on Semantics of Data Types. Lecture Notes in Computer Science 173, Springer-Verlag, New York, 1984, pp. 69-85.
29. HOOK, J., AND HOWE, D. Impredicative strong existential equivalent to type:type. Tech. Rep. TR 86-760, Cornell Univ., Ithaca, N.Y., 1986.
30. HOWARD, W. The formulas-as-types notion of construction. In To H. B. Curry: Essays on Combinatory Logic, Lambda-Calculus and Formalism. Academic Press, Orlando, Fla., 1980, pp. 479-490.
31. HOWE, D. J. The computational behavior of Girard's paradox. In IEEE Symposium on Logic in Computer Science (June 1987). IEEE, New York, 1987, pp. 205-214.
32. KAPUR, D. Towards a theory for abstract data types. Tech. Rep. MIT/LCS/TM-237, MIT, Cambridge, Mass., 1980.
33. KLEENE, S. C. Realizability: A retrospective survey. In Cambridge Summer School in Mathematical Logic. Lecture Notes in Mathematics 337, Springer-Verlag, New York, 1973, pp. 95-112.
34. KRIPKE, S. A. Semantical analysis of intuitionistic logic I. In Formal Systems and Recursive Functions, Proceedings of the 8th Logic Colloquium (Oxford, 1963). North-Holland, Amsterdam, 1965, pp. 92-130.
35. LAMBEK, J. From lambda calculus to Cartesian closed categories. In To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism. Academic Press, Orlando, Fla., 1980, pp. 375-402.
36. LANDIN, P. J. A correspondence between Algol 60 and Church's lambda-notation. Commun. ACM 8, 2-3 (Feb.-Mar. 1965), 89-101; 158-165.
37. LANDIN, P. J. The next 700 programming languages. Commun. ACM 9, 3 (Mar. 1966), 157-166.
38. LÄUCHLI, H. Intuitionistic propositional calculus and definably non-empty terms. J. Symbolic Logic 30 (1965), 263.
39. LÄUCHLI, H. An abstract notion of realizability for which intuitionistic predicate calculus is complete. In Intuitionism and Proof Theory: Proceedings of the Summer Conference at Buffalo, N.Y. (1968). North-Holland, Amsterdam, 1970, pp. 227-234.
40. LEIVANT, D. Polymorphic type inference. In Proceedings of the 10th ACM Symposium on Principles of Programming Languages (Austin, Tex., Jan. 24-26, 1983). ACM, New York, 1983, pp. 88-98.
41. LISKOV, B., SNYDER, A., ATKINSON, R., AND SCHAFFERT, C. Abstraction mechanisms in CLU. Commun. ACM 20, 8 (Aug. 1977), 564-576.
42. LISKOV, B., ET AL. CLU Reference Manual. Lecture Notes in Computer Science 114, Springer-Verlag, New York, 1981.
43. MAC LANE, S. Categories for the Working Mathematician. Graduate Texts in Mathematics 5, Springer-Verlag, New York, 1971.
44. MACQUEEN, D. B. Modules for Standard ML. Polymorphism 2, 2 (1985), 35 pages. An earlier version appeared in Proceedings of the 1984 ACM Symposium on Lisp and Functional Programming.
45. MACQUEEN, D. B. Using dependent types to express modular structure. In Proceedings of the 13th ACM Symposium on Principles of Programming Languages (St. Petersburg Beach, Fla., Jan. 13-15, 1986). ACM, New York, 1986, pp. 277-286.
46. MARTIN-LÖF, P. Constructive mathematics and computer programming. Paper presented at the 6th International Congress for Logic, Methodology and Philosophy of Science. Preprint, Univ. of Stockholm, Dept. of Mathematics, Stockholm, 1979.
47. MCCRACKEN, N. An investigation of a programming language with a polymorphic type structure. Ph.D. dissertation, Syracuse Univ., Syracuse, N.Y., 1979.
48. MCCRACKEN, N. The typechecking of programs with implicit type structure. In Proceedings of the International Symposium on Semantics of Data Types. Lecture Notes in Computer Science 173, Springer-Verlag, New York, 1984, pp. 301-316.
49. MEYER, A. R., AND REINHOLD, M. B. 'Type' is not a type. In Proceedings of the 13th ACM Symposium on Principles of Programming Languages (St. Petersburg Beach, Fla., Jan. 13-15, 1986). ACM, New York, 1986, pp. 287-295.

50. MILNER, R. The standard ML core language. Polymorphism 2, 2 (1985), 28 pages. An earlier version appeared in Proceedings of the 1984 ACM Symposium on Lisp and Functional Programming.
51. MITCHELL, J. C. Semantic models for second-order lambda calculus. In Proceedings of the 25th IEEE Symposium on Foundations of Computer Science (1984). IEEE, New York, 1984, pp. 289-299.
52. MITCHELL, J. C. Representation independence and data abstraction. In Proceedings of the 13th ACM Symposium on Principles of Programming Languages (St. Petersburg Beach, Fla., Jan. 13-15, 1986). ACM, New York, 1986, pp. 263-276.
53. MITCHELL, J. C. Polymorphic type inference and containment. Inf. Comput. 76, 2/3 (Feb./Mar. 1988), 211-249.
54. MITCHELL, J. C., AND HARPER, R. The essence of ML. In Proceedings of the 15th ACM Symposium on Principles of Programming Languages (San Diego, Calif., Jan. 13-15, 1988). ACM, New York, 1988, pp. 28-46.
55. MITCHELL, J. C., AND MEYER, A. R. Second-order logical relations. In Logics of Programs. Lecture Notes in Computer Science 193, Springer-Verlag, New York, 1985, pp. 225-236.
56. MITCHELL, J. C., AND PLOTKIN, G. D. Abstract types have existential types. In Proceedings of the 12th ACM Symposium on Principles of Programming Languages (New Orleans, La., Jan. 14-16, 1985). ACM, New York, 1985, pp. 37-51.
57. MITCHELL, J. G., MAYBERRY, W., AND SWEET, R. Mesa language manual. Tech. Rep. CSL-79-3, Xerox PARC, Palo Alto, Calif., 1979.
58. MORRIS, J. H. Types are not sets. In 1st ACM Symposium on Principles of Programming Languages (Boston, Mass., Oct. 1-3, 1973). ACM, New York, 1973, pp. 120-124.
59. O'DONNELL, M. A practical programming theorem which is independent of Peano arithmetic. In 11th ACM Symposium on the Theory of Computation (Atlanta, Ga., Apr. 30-May 2, 1979). ACM, New York, 1979, pp. 176-188.
60. PRAWITZ, D. Natural Deduction. Almqvist & Wiksell, Stockholm, 1965.
61. PRAWITZ, D. Ideas and results in proof theory. In 2nd Scandinavian Logic Symposium. North-Holland, Amsterdam, 1971, pp. 235-308.
62. REYNOLDS, J. C. Towards a theory of type structure. In Paris Colloquium on Programming. Lecture Notes in Computer Science 19, Springer-Verlag, New York, 1974, pp. 408-425.
63. REYNOLDS, J. C. The essence of Algol. In Algorithmic Languages, J. W. de Bakker and J. C. van Vliet, Eds. IFIP, North-Holland, Amsterdam, 1981, pp. 345-372.
64. REYNOLDS, J. C. Types, abstraction, and parametric polymorphism. In IFIP Congress (Paris, Sept. 1983).
65. REYNOLDS, J. C. Polymorphism is not set-theoretic. In Proceedings of the International Symposium on Semantics of Data Types. Lecture Notes in Computer Science 173, Springer-Verlag, New York, 1984, pp. 145-156.
66. SHAW, M. (ED.) ALPHARD: Form and Content. Springer-Verlag, New York, 1981.
67. STATMAN, R. Intuitionistic propositional logic is polynomial-space complete. Theor. Comput. Sci. 9 (1979), 67-72.
68. STATMAN, R. Number theoretic functions computable by polymorphic programs. In 22nd IEEE Symposium on Foundations of Computer Science. IEEE, New York, 1981, pp. 279-282.
69. STENLUND, S. Combinators, λ-terms and Proof Theory. Reidel, Dordrecht, Holland, 1972.
70. TROELSTRA, A. S. Metamathematical Investigation of Intuitionistic Arithmetic and Analysis. Lecture Notes in Mathematics 344, Springer-Verlag, New York, 1973.
71. WULF, W. W., LONDON, R., AND SHAW, M. An introduction to the construction and verification of Alphard programs. IEEE Trans. Softw. Eng. SE-2 (1976), 253-264.

Received June 1986; revised March 1988; accepted March 1988

Using Dependent Types to Express Modular Structure

David MacQueen

AT&T Bell Laboratories


Murray Hill, New Jersey 07974

Introduction

Writing any large program poses difficult problems of organization. In many modern programming
languages these problems are addressed by special linguistic constructs, variously known as modules, packages,
or clusters, which provide for partitioning programs into manageable components and for securely combining
these components to form complete programs. Some general purpose components are able to take on a life of
their own, being separately compiled and stored in libraries of generic, reusable program units. Usually
modularity constructs also support some form of information hiding, such as "abstract data types." "Programming
in the large" is concerned with using such constructs to impose structure on large programs, in contrast
to "programming in the small", which deals with the detailed implementation of algorithms in terms of
data structures and control constructs. Our goal here is to examine some of the proposed linguistic notions
with respect to how they meet the pragmatic requirements of programming in the large.
Originally, linguistic constructs supporting modularity were introduced as a matter of pragmatic
language engineering, in response to a widely perceived need. More recently, the underlying notions have
been analyzed in terms of type systems incorporating second-order concepts. Here I use the term "second-order"
in the sense of "second-order" logic, which admits quantification over predicate variables [Pra65].
Similarly, the type systems in question introduce variables ranging over types and allow various forms of
abstraction or "quantification" over them.
Historically, these type systems are based on fundamental insights in proof theory, particularly the "formulas
as types" notion that evolved through the work of Curry and Feys [CF58], Howard [How80], de Bruijn
[deB80] and Scott [Sco70]. This notion provided the basis for Martin-Löf's formalizations of constructive logic
as Intuitionistic Type Theory (ITT) [M-L71, M-L74, M-L82], and was utilized by Girard [Gir71], who introduced
a form of second-order typed lambda calculus as a tool in his proof-theoretic work. The "formulas as
types" notion, as developed in de Bruijn's AUTOMATH system and Martin-Löf's ITT, is also central to the
"programming logics" PL/CV3 and nu-PRL developed by Constable and his coworkers [CZ84, BC85].
In the programming language area, Reynolds [Rey74] independently invented a language similar to that
used by Girard, and his version has come to be called the second-order lambda calculus. An extended form of
this language, called SOL, was used by Mitchell and Plotkin [MP85] to give an explanation of abstract data
types. The programming languages ML [GMW78, Mil78] and Russell [BDD80, Hoo84, DD85] represent two
distinctly different ways of realizing "polymorphism" by abstraction with respect to types. ML is basically a
restricted form of second-order lambda calculus, while Russell employs the more general notion of "dependent
types" (Martin-Löf's general product and sum, defined in §2). The Pebble language of Burstall and Lampson
[BL84, Bur84] also provides dependent types, but in a somewhat purer form. Finally, Huet and Coquand's
Calculus of Constructions is another variant of typed lambda calculus using the general product dependent
type. It also provides a form of metatype (or type of types), called a "context", that characterizes the structure
of second-order types, thus making it possible to abstract not only with respect to types, but also with
respect to families of types and type constructors. The Calculus of Constructions is an explicit attempt to
combine a logic and a programming language in one system.

Among these languages, Russell and Pebble are distinguished by having "reflexive" type systems, meaning
that there is a "type of all types" that is a member of itself (Type:Type). Martin-Löf's initial version of
ITT [M-L71] was also reflexive in this sense, but he abandoned this version in favor of a "ramified"¹ system
with a hierarchy of type universes when Girard's paradox [Gir71] showed that the reflexive system was inconsistent
as a constructive logic. In terms of programming languages, the paradox implies at least the existence
of divergent expressions, but it is not yet clear whether more serious pathologies might follow from it (see
Meyer and Reinhold's paper, this proceedings [MR86]). Since types are simply values belonging to the type
Type, reflexive type systems tend to obscure the distinction between types and the values they are meant to
describe, and this in turn tends to complicate the task of type checking. It is, on the other hand, possible to
construct reasonable semantic models for reflexive type systems [McC79, Car85].
The remaining nonreflexive languages distinguish, at least implicitly, between individual types and the
universe of types to which they belong and over which type variables range. However, the second-order
lambda calculus, SOL, and the Calculus of Constructions (despite its "contexts") are "impredicative,"² meaning
that there is only one type universe and it is closed under type constructions like ∀t.σ(t) and ∃t.σ(t) that
involve quantifiers ranging over itself. The reflexive type systems of Russell and Pebble are also impredicative,
in perhaps an even stronger sense, since type variables can actually take on Type, the universe of types, as
a value. In contrast, the later versions of ITT and Constable's logics are ramified systems in which quantification
or abstraction over a type universe at one level produces an element of the next higher level, and they are
therefore predicative.
Our purpose here is not to set out the mathematical nuances of these various languages, but to look at
some of the pragmatic issues that arise when we actually attempt to use such languages as vehicles for programming
in the large. We will begin by discussing some of the consequences of the SOL type system for
modular programming. Then in §2 we briefly sketch a ramified (i.e. stratified) system of dependent types
from which we derive a small language called DL, which is a generalized and "desugared" version of the
extended ML language presented in [Mac85]. The final section uses DL to illustrate some of the stylistic
differences between ML and Pebble.

1. Shortcomings of SOL's existential types
The SOL language [MP85] provides existential types of the form

    ∃t.σ(t)

where t is a type variable and σ(t) is a type expression possibly containing free occurrences of t. Values of
such types are introduced by expressions of the form

    rep_{∃t.σ(t)} τ P

where P is an expression of type σ(τ). These values are intended to model abstract data types, and were
called data algebras in [MP85] and packages in [CW85]; we will use the term structure to agree with the terminology
of [Mac85] that we will be adopting in later sections. The type component τ will be called the witness
or representation type of the structure. Access to the components of a structure is provided by an expression
of the form

    abstype t with x is M in N : ρ

which is well-typed assuming M : ∃t.σ(t) and x: σ(t) ⇒ N : ρ, with the restriction that t does not appear free in
ρ nor in the type of any variable y appearing free in N.
As mentioned in [MP85], and because of the impredicative nature of SOL, these existential types are
ordinary types just like int and bool, and the structures that are their values are just ordinary values. This
implies that all the standard value-manipulating constructs such as conditionals and functional abstraction
apply equally to structures. Thus a parametric module is just an ordinary function of type τ → ∃t.σ(t), for
example.
There is a tradeoff for this simplicity, however. Let us consider carefully the consequences of the restrictions
on the abstype expression. Once a structure has been constructed, say

    A = rep_{∃t.σ(t)} τ P

¹ Since Bertrand Russell introduced his "ramified type theory," the word "ramified" has been used in logic to
mean "stratified into a sequence of levels," normally an infinite ascending sequence of levels.
² Roughly speaking, a definition of a set is said to be impredicative if the set contains members defined with
reference to the entire set.

the type τ is essentially forgotten. Although we may locally "open" the structure, as in

    abstype t with x is A in N

there is absolutely no connection between the bound type variable t and the original representation type τ.
Moreover, we cannot even make a connection between the witness type names obtained from two different
openings of the same structure. For example, the types s and t will not agree within the body of

    abstype s with x is A in
    abstype t with y is A in . . .

In effect, not only is the form and identity of the representation type hidden, but we are not even allowed to
assume that there is a unique witness type associated with the structure A. The witness type has been made
not only opaque, but hypothetical! This very strong restriction on our access to an abstraction goes beyond
common practice in language design, since we normally have some means of referring to an abstract type as a
definite though unrecognizable type within the scope of its definition. This indefiniteness seems to be the price
paid for being able to treat the abstract type structure as an ordinary value rather than as a type. (See
[CM85], where we use the terms "virtual witness," "abstract witness", and "transparent witness" to describe
three possible treatments of the witness type in an existential structure.)
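OCaml's first-class modules behave the same way, so the point is easy to demonstrate. In the sketch below (our names), restoring the commented line is a type error, because the two openings of cartesian_point introduce distinct abstract witness types.

    module type POINT = sig
      type p
      val mk_point : float * float -> p
      val x_coord  : p -> float
    end

    let cartesian_point : (module POINT) =
      (module struct
        type p = float * float
        let mk_point xy = xy
        let x_coord = fst
      end)

    let _ =
      (* two openings of the same structure *)
      let (module P1 : POINT) = cartesian_point in
      let (module P2 : POINT) = cartesian_point in
      let pt  = P1.mk_point (0.0, 1.0) in
      let pt2 = P2.mk_point (2.0, 3.0) in
      [ P1.x_coord pt; P2.x_coord pt2 ]   (* fine: each point with its own opening *)
      (* P1.x_coord pt2  -- rejected: P1.p and P2.p are unrelated abstract types *)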
Hierarchies of structures. The consequences of SOL's treatment of the witness type become clearer
when we consider building abstractions in terms of other abstractions. Consider the following definition of a
structure representing a geometric point abstraction.

    PointWRT(p) = ⟨mk_point: (real × real) → p,
                   x_coord: p → real,
                   y_coord: p → real⟩

    Point = ∃p. PointWRT(p)

    CartesianPoint = rep_Point (real × real)
                       ⟨mk_point = λ(x: real, y: real). (x, y),
                        x_coord = λp: real × real. (fst p),
                        y_coord = λp: real × real. (snd p)⟩

Now suppose that we want to define a rectangle abstraction that uses CartesianPoint. We must first open
CartesianPoint, define the rectangle structure, and then close the rectangle structure with respect to the point type.

    RectWRT(p) = ∃rect. ⟨point_interp: PointWRT(p),
                         mk_rect: p × p → rect,
                         topleft: rect → p,
                         botright: rect → p⟩

    Rect = ∃p. RectWRT(p)

    CartesianRect = abstype point with P is CartesianPoint in
                      rep_Rect point
                        (rep_{RectWRT(point)} (point × point)
                           ⟨point_interp = P,
                            mk_rect = λ(tl: point, br: point). (tl, br),
                            topleft = λr: point × point. (fst r),
                            botright = λr: point × point. (snd r)⟩)

If we (doubly) open CartesianRect we will get a new virtual point type unrelated to any existing type. We had
to incorporate an interpretation of this point type in the Rect structure as point_interp to provide the means to
create elements of that type, which in turn allows us to create rectangles.
Now suppose we also define a circle abstraction based on the same CartesianPoint structure, and we
want to allow interactions between the two abstractions, such as creating a unit circle centered at the top left-hand
corner of a given rectangle. This requires that the rectangle structure, the circle structure, and any
operations relating them all be defined within the scope of a single opening of the CartesianPoint structure. In
general, we must anticipate all abstractions that use the point structure and might possibly interact in terms of
points, and define them within a single abstype expression.
It appears that when building a collection of interrelated abstractions, the lower the level of the abstraction
the wider the scope in which it must be opened. We thus have the traditional disadvantages of block-structured
languages, where low-level facilities must be given the widest visibility. (For further details, see the
examples in §6 of Cardelli and Wegner's tutorial [CW85].)
Interpreting known types. The notion of providing operations to interpret a type does not apply only to
"abstract" types. It is often useful to impose additional structure on a given type without hiding the identity
of that type. For instance, we might want to temporarily view int × bool as an ordered set with some special
ordering. To do this we might define the structure IntBoolOrd as follows:

    OrdSet = ∃t. ⟨le: t × t → bool⟩

    IntBoolOrd = rep_OrdSet (int × bool)
                   ⟨le = λ((n1, b1), (n2, b2)). if b1 and b2 then n1 ≤ n2
                                                elseif ¬(b1 or b2) then n1 ≤ n2
                                                else b1⟩

The following related and potentially useful mapping would take an OrdSet structure to the corresponding
lexicographic ordering on lists:

    LexOrd : OrdSet → OrdSet =
      λO: OrdSet. abstype t with L is O in
                    rep_OrdSet (list t)
                      ⟨le = fix f. λ(l, m). if (null l) then true . . . ⟩

Under the SOL typing rules, there is no way to make use of IntBoolOrd, because we could never create any
elements to which the ordering operation could be applied. In fact, no structure of type OrdSet can ever be
used, because of our inability to express values of type t. Of course, this also means that LexOrd is useless.
However, if we had access to the witness types, then structures like IntBoolOrd and mappings like LexOrd
could be quite useful.
There are various ways of working around these problems within SOL. We can, for instance, delay or
avoid entirely the creation of closed structures and instead deal separately with types and their interpreting
operations. Thus, LexOrd could be rewritten to have the type ∀t. OrdSetWRT(t) → OrdSetWRT(list t) with
OrdSetWRT(t) = ⟨le: t × t → bool⟩. However, our preferred solution is to abandon the restrictive SOL rule and
view structures as inherently "open" or "transparent." This is suggested by the type rules of ITT, which provide
access to both the witness and interpretation components of an existential (i.e. general sum) structure.
Intuitively, within the scope of the local declaration

    abstype t with x is M in N

we consider t to be simply an abbreviation or local name for the witness type of M. Of course, t itself should
not appear in the types of free variables or of the entire expression, because it has only local significance, but
its meaning, that is, the witness type of M, may. "Abstraction" is then achieved by other means, namely by
real or simulated functional abstraction with respect to a structure variable (see [Mac85]), which is merely an
"uncurried" form of the approach to data abstraction originally proposed by Reynolds in [Rey74]. When
structures are transparent, it is clear that they carry a particular type, together with its interpretation; in fact,
it is reasonable to think of structures as interpreted types rather than as a kind of value. Consequently we also
abandon the impredicative two-level system of SOL and move to a ramified system in which quantified types
are objects of level 2, while level 1 is occupied by ordinary monomorphic types, structures, and polymorphic
functions.
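This transparent reading is essentially what ML-family module systems provide. The following OCaml sketch (our names, with the ordering condensed to one line) makes the witness type of IntBoolOrd manifest in its signature, so that a LexOrd-style functor is actually usable, in contrast to the SOL version above.

    module type ORD_SET = sig
      type t                       (* witness *)
      val le : t -> t -> bool     (* interpretation *)
    end

    (* Transparent structure: the witness type is manifest in the
       signature, so clients can build values of it. *)
    module IntBoolOrd : ORD_SET with type t = int * bool = struct
      type t = int * bool
      let le (n1, b1) (n2, b2) =
        if b1 = b2 then n1 <= n2 else b1   (* the ordering used above *)
    end

    (* LexOrd: lift an ordering to the lexicographic ordering on lists. *)
    module LexOrd (O : ORD_SET) : ORD_SET with type t = O.t list = struct
      type t = O.t list
      let rec le l m =
        match l, m with
        | [], _ -> true
        | _ :: _, [] -> false
        | x :: l', y :: m' ->
            if O.le x y && O.le y x then le l' m' else O.le x y
    end

    module L = LexOrd (IntBoolOrd)
    let _ = L.le [ (1, true) ] [ (2, true) ]   (* usable, unlike in SOL *)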

2. A language with ramified dependent types

2.1. Dependent types
There are two basic forms of dependent types, which we will call the general product and the general
sum. The general product, written Πx:A.B(x), is naively interpreted as the ordinary Cartesian product of the
family of sets {B(x)}_{x∈A} indexed by A, i.e.

    Πx:A.B(x) = { f ∈ A → ∪_{x∈A} B(x) | ∀a∈A. f(a) ∈ B(a) }

It denotes the type of functions that map an element a ∈ A into B(a), that is, functions whose result type
depends on the argument, with B specifying the dependence. Elements of Πx:A.B(x) are introduced by
lambda abstraction and eliminated by function application. In the degenerate case where B is a constant function,
e.g. when B(x) is defined by an expression not containing x free, the general product reduces to the ordinary
function space A → B.³

³ General product types are also called "indexed products", "Cartesian products," or "dependent function
spaces." Other notations include x:A ⇒ B(x) [CZ84], x:A → B(x) [BL84], and ∀x:A.B(x) (from the formulas-as-types
isomorphism).
The general sum, written Σx:A.B(x), is intuitively just the disjoint union of the family {B(x)}_{x∈A}, i.e.

    Σx:A.B(x) = { (a, b) ∈ A × ∪_{x∈A} B(x) | a ∈ A & b ∈ B(a) }

Elements of the general sum have the form of pairs, where the first element, called the witness or index,
determines the type of the second element. Elements of the general sum are constructed by a primitive injection
function

    inj : Πa:A. (B(a) → Σx:A.B(x))

and they can be analyzed by using the two primitive projection functions

    witness : (Σx:A.B(x)) → A
    out : Πp:(Σx:A.B(x)). B(witness p)

Note that the existence of these projection functions (corresponding roughly to Martin-Löf's E operation) makes
the general sum an "open" construct, in contrast to the existential type of SOL or the categorical sum (see
[MP85], §2.6).⁴ In the degenerate case where B(x) is independent of x, the general sum is isomorphic to the
ordinary binary Cartesian product A × B.⁵
In the following sections we will sometimes take the liberty of saying simply "product" or "sum" when
we mean "general product" and "general sum."

2.2. Small and large types
The stratified type system we will be working with is basically a simplified version of the type system
described in [CZ84]. It has several (in fact infinitely many) levels, though only the first two or three will be
mentioned here. At the bottom of the hierarchy are the small types, contained in the level 1 type universe
Type1. The small types are generated from the customary primitive types int, bool, . . . by closing under
"first-order" general products and sums (i.e. Πx:A.B(x) and Σx:A.B(x) where A: Type1 and
λx:A.B(x): A → Type1, including their degenerate forms → and ×) and perhaps other constructions such as
recursion.⁶
Type1 serves as a type of all small types, but it is not itself a small type. It resides in the level 2
universe of "large types," Type2, which in turn is a "very large type" belonging to the next universe Type3,
and so on. The type universes are cumulative, so Type2 also contains all the small types. Type2 contains
other large types generated from Type1 using second-order products and sums. For instance, the first-order
products and sums can be viewed as operations⁷ and as such they have the following large type:

    Π1, Σ1 : Π2 X: Type1. (X → Type1) → Type1 : Type2

(where → is the degenerate form of Π2, which has an analogous type in Type3). Note that as elements of a
large type in Type2, Π1 and Σ1 are considered level 1 objects even though they do not belong to the basic type
universe Type1.
The existential and universal types of SOL correspond to the following large types:

    ∀t.σ(t) ≡ Π2 t: Type1. σ(t) : Type2
    ∃t.σ(t) ≡ Σ2 t: Type1. σ(t) : Type2

The elements of these large types are, respectively, the (first-order) polymorphic functions and the Σ2-structures,
which are the open analogues of SOL's existential structures (we will call them simply "structures"
when there is no danger of confusion). Being elements of large types, polymorphic functions and Σ2-structures
are also level 1 objects, i.e. they are of the same level as small types. This means that neither
polymorphic functions nor Σ2-structures can be manipulated as ordinary values (which are level 0 objects).

⁴ A "closed" version of the general sum, analogous to SOL's existential type, can be derived from the general
product [Pra65], but the open version used here and in ITT appears to be an independent primitive notion.
⁵ General sums have also been called "indexed sums", "disjoint unions", and "dependent products" (an unfortunate
clash with the "general product" terminology). Other notations used include x:A × B(x) [BL84] and
∃x:A.B(x) (from the formulas-as-types isomorphism).
⁶ The simpler forms of type language will not admit variables ranging over values, and only constant functions B
will be definable. Under these circumstances the first-order general product and sum always reduce to their
degenerate forms A → B and A × B.
⁷ With, e.g., Πx:A.B(x) = Π1(A)(λx:A.B(x)).

We will in fact think of Σ2-structures as a generalized form of small type.
The level 2 general sum operation Σ2 and its associated primitive operations actually have very general
polymorphic types:

    Σ2 : ΠA: Type2. (A → Type2) → Type2 : Type3
    inj2 : ΠA: Type2. ΠB: (A → Type2). Πx: A. (B(x) → Σ2(A)(B)) : Type3

The corresponding types for witness2 and out2 are left as exercises. The basic structure expression

    rep_{∃t.σ(t)} τ P : ∃t.σ(t)

translates into the following:

    inj2(Type1)(λt: Type1. σ(t))(τ)(P) : Σ2 t: Type1. σ(t)

which we will often abbreviate to inj2 τ P when the polymorphic parameters Type1 and λt.σ(t) are clear from the
context. Note that because of the generality of Σ2, we may also create structures with structures rather than
types as witnesses (or even with polymorphic functions as witnesses, though we won't pursue this possibility
here). We will exploit this generality in the language described in the next section.
The rules for type checking in this system are conventional, consisting of the appropriate generalizations
of the usual introduction and elimination rules at each level, together with additional rules to deal with α-conversion
and definitional equality.
3. A simple Pebble-like language

We will now describe a fairly simple language which is intended to isolate a useful subset of the ramified
type system sketched in the previous section. We will call this language DL, just to have a name for it.
DL resembles Pebble in having explicit dependent types, but because of its ramified nature it is closer in spirit
to ML and the module facilities of [Mac85].

3.1. Small types

The base type language of DL will be a simplified version of that of ML. For simplicity, we omit
recursive types, but add a labeled product to express types of value environments. Type expressions,
represented by the metavariable texp, have the following abstract syntax:

    texp ::= bool | int | real | tvar | texp × texp' | {id1: texp1, . . . , idn: texpn} | texp → texp' | witness(svar)

where tvar ranges over type variables and svar over structure variables.⁸ The actual small types of DL
correspond to the closed (i.e. variable-free) type expressions, and this class is denoted simply by Type (short
for Type1).

3.2. Signatures

The class of signatures is obtained by starting with Type1 and closing with respect to the Σ2 operator.
This gives a class of types characterizing the union of small types and "abstraction-free" Σ2-structures (i.e.
those that do not contain any second-order lambda abstractions). Rather than use the Σ2 operator directly, we
give a little grammar for signatures that covers the cases of interest:

    sig ::= Type | Σ svar: sig. texp | Σ svar: sig. sig'

where Σ is short for Σ2. Typically, the texp forming the body of a signature is a labeled product type specifying
a collection of named functions and other values. Note that if sig is Type in either of the Σ forms, the
structure variable is actually a type variable, so structure variables subsume type variables. Note also that in a
signature such as Σ s:A.B(s), the structure variable s can appear in B only as a component of a type subexpression.
It can appear either directly, if A = Type, or else in a subexpression "witness(...s...)," formed by
nested application of witness and out and denoting a small type.

⁸ For witness(svar) to be a proper small type, svar should be restricted to range over structures with Type
witnesses.

3.3. Structures

In DL, the term "structure" is used in a somewhat broader sense than above, to match the notion of signature.
DL structures may be either small types or nested Σ2-structures. As in the case of signatures, we
substitute some syntax for the use of the inj primitive in its full generality. The syntax of structure expressions
naturally follows that of signatures, viz.

    sexp ::= svar | texp | inj sexp exp | inj sexp sexp'

where svar ranges over structure variables and exp ranges over ordinary value expressions. We will not
further specify exp other than to say that it includes labeled tuples (called bindings in Pebble) expressing elements
of labeled products, and expressions of the form "out(...svar...)," formed by nested application of witness
and out and denoting a value of type depending on the signature of svar.

3.4. Functors

We allow (second-order) lambda-abstraction of structure expressions with respect to structure variables
to form functions from structures to structures. Following [Mac85], we will call such abstractions functors.
We will allow nested abstractions, yielding "curried" functors. The type of a functor is a general product that
we will call a functor signature.
The abstract syntax of functor signatures and functor expressions is

    msig ::= Π svar: sig. sig | Π svar: sig. msig
    mexp ::= λ svar: sig. sexp | λ svar: sig. mexp

where Π represents Π2. The syntax of structure expressions must be extended to encompass functor applications
by adding

    sexp ::= mexp(sexp)

The restrictions embodied in the structure and functor syntax amount to saying that structures cannot
have functors as components, nor can functors have functors as arguments. In other words, functors are restricted
to be essentially "first-order" mappings over structures. These restrictions are partly a reflection of
certain tentative principles for programming with parametric modules, and partly an attempt to simplify implementation
of the language. Further experience with parametric modules (functors) and their implementation
should help refine our ideas about what restrictions on the full type theory are pragmatically justified.
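Read through ML eyes, DL's signatures, structures, and functors correspond closely to module types, modules, and functors. The sketch below (our names) shows a signature with a witness type and an interpretation, a structure inhabiting it, and a curried functor of the kind the mexp syntax allows.

    (* A signature Σ p: Type. ⟨mk_point: real × real → p⟩, as a module type. *)
    module type POINT_SIG = sig
      type p
      val mk_point : float * float -> p
    end

    (* A structure: a small type together with its interpretation. *)
    module CPoint : POINT_SIG with type p = float * float = struct
      type p = float * float
      let mk_point xy = xy
    end

    (* A curried functor, as permitted by DL's mexp syntax. *)
    module PairOfPoints (A : POINT_SIG) (B : POINT_SIG) = struct
      type t = A.p * B.p
      let mk a b = (A.mk_point a, B.mk_point b)
    end

    module P = PairOfPoints (CPoint) (CPoint)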

4. Dependence, abstraction, and signature closure

This section considers some of the interactions that occur as structures and functors are defined in terms
of one another. The interactions between Π-abstraction and hierarchical chains of definitions, particularly
those involving sharing, are particularly subtle and interesting.
The definition of a new structure will frequently refer to existing ones, setting up various forms of
dependency between the new structure and the older structures mentioned in its definition (known as its
antecedents). For instance, suppose CPoint (short for CartesianPoint, perhaps) is an existing structure of signature
Point, and we define a new rectangle structure CRect in terms of CPoint as follows:

    RectWRT(P: Point) =
      Σ rect: Type. ⟨mk_rect: |P| × |P| → rect,
                     topleft: rect → |P|,
                     botright: rect → |P|⟩

    CRect : RectWRT(CPoint) =
      inj (|CPoint| × |CPoint|)
          ⟨mk_rect = λ(tl, br). (tl, br), . . . ⟩

Here the dependence of CRect on CPoint is explicitly indicated by the fact that the name CPoint appears free in
the signature of CRect. In such cases of overt dependency, significant use of the dependent structure usually
requires access to the referenced structures as auxiliaries. In this instance the manipulation of rectangles using
CRect is very likely to entail the manipulation of associated points using CPoint.
In other cases the dependency between a structure and one of its antecedents may be tacit rather than
overt, as when a structure B is defined in terms of a structure expression str_B(A) but A does not appear in the
signature of B. This generally occurs when A is used for purely internal purposes in the implementation of B
and therefore is not relevant to the use of B. The structures on which a structure overtly depends, i.e. those

referred to in its signature, will be called its supporting structures, or more briefly, its support.
If we have an overt dependency, such as

    B = str_B(A) : sig_B(A)

where A : sig_A, there are two ways of making B self-sufficient relative to A, both of which have the effect of
closing the signature sig_B(A) with respect to A. One method is to abstract with respect to A, thus turning B
into a functor:

    B' = λA: sig_A. str_B(A) : ΠA: sig_A. sig_B(A)

whose signature is the Π-closure of sig_B with respect to A.

The other alternative is to incorporate A as the witness component in a Σ-structure with B as the body,
yielding the Σ-closure of sig_B as signature:

    B'' = inj A str_B(A) : ΣX: sig_A. sig_B(X)

Note that the B' closure is no longer a structure; in order to get a usable structure we have to apply it to a
structure expression, thus recreating the original situation of overt dependency, as in

    B'(F(G(X))) : sig_B(F(G(X))).

On the other hand, B'' is truly self-contained, at least so far as A is concerned, and is usable as it stands
because it incorporates the necessary supporting structure A within itself. In ML, A is called a substructure of
B''.
Now consider what happens when there is a chain of dependencies such as

    A = str_A : sig_A
    B = str_B(A) : sig_B(A)
    C = str_C(A, B) : sig_C(A, B)

and we wish to abstract C with respect to its supporting structures. There are three different ways to do this:
(1) full abstraction with respect to all supporting structures:

    MkC1 = λA: sig_A. λB: sig_B(A). str_C(A, B)
         : ΠA: sig_A. ΠB: sig_B(A). sig_C(A, B)

(2) abstraction with respect to B, with a residual dependence on the fixed A:

    MkC2 = λB: sig_B(A). str_C(A, B)
         : ΠB: sig_B(A). sig_C(A, B)

and (3) abstraction of both B and C with respect to A:

    MkB = λA: sig_A. str_B(A) : ΠA: sig_A. sig_B(A)

    MkC3 = λA: sig_A. str_C(A, MkB(A))
         : ΠA: sig_A. sig_C(A, MkB(A))

Now suppose that we first Σ-close B with respect to A, obtaining

    B' = inj A (str_B(A)) : sig_B' = ΣX: sig_A. sig_B(X)

Then abstracting C with respect to B' gives

    MkC' = λB': sig_B'. str_C(|B'|, out(B'))
         : ΠB': sig_B'. sig_C(|B'|, out(B'))

If we both Σ-close C with respect to B' and abstract with respect to B' we get

    MkC'' = λB': sig_B'. inj B' (str_C(|B'|, out(B'))) : ΠB': sig_B'. sig_C'

where sig_C' = ΣB': sig_B'. sig_C(|B'|, out(B')). The rules of type equality will insure that for all structures
S : sig_B', |MkC''(S)| = S, even though the relation between the argument and result of MkC'' is not manifest in
its signature.
Note that when B was Σ-closed to form B', the support of C was coalesced into a single structure, which
made it easier to fully abstract C with respect to its support. When there are many levels of supporting structures
this efficiency of abstraction becomes a significant advantage. On the other hand, it became impossible

to abstract with respect to B' while leaving A fixed, because A had become a component of B'.
The final example illustrates the interplay between sharing and abstraction. Suppose structures A, B, C,
and D are related as follows:

    A = str_A : sig_A
    B = str_B(A) : sig_B(A)
    C = str_C(A) : sig_C(A)
    D = str_D(A, B, C) : sig_D(A, B, C)

i.e. D depends on A, B, and C, while B and C both depend on A. If we fully abstract D with respect to its support
we have

    MkD = λA: sig_A. λB: sig_B(A). λC: sig_C(A). str_D(A, B, C)
        : ΠA: sig_A. ΠB: sig_B(A). ΠC: sig_C(A). sig_D(A, B, C)

If, on the other hand, we first Σ-close B and C with respect to A and then abstract D with respect to its support,
we get

    B' = inj A str_B(A) : sig_B' = ΣX: sig_A. sig_B(X)

    C' = inj A str_C(A) : sig_C' = ΣX: sig_A. sig_C(X)

    MkD' = λB': sig_B'. λC': sig_C'. str_D(|B'|, out(B'), out(C'))
         : ΠB': sig_B'. ΠC': sig_C'. sig_D(|B'|, out(B'), out(C')),  sharing |B'| = |C'|

In the type of MkD' something new has been added. The way that B and C support the definition of D probably
depends on the fact that B and C share the same support A (think of B and C as rectangles and circles,
and A as points, for example). For MkD this sharing is directly expressed by the signature, but this is not the
case for MkD', so a special sharing constraint must be added to the signature.
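In ML terms (and in the OCaml sketch below, with our names), this is exactly a sharing constraint between substructures: MkD' can compose the operations of B' and C' only because a type equation identifies their witness components.

    module type SIG_A = sig type t end

    (* Σ-closed signatures: each carries its support A as a substructure. *)
    module type SIG_B = sig
      module A : SIG_A
      val b : A.t -> A.t
    end

    module type SIG_C = sig
      module A : SIG_A
      val c : A.t -> A.t
    end

    (* The sharing constraint |B'| = |C'| becomes a type equation. *)
    module MkD' (B' : SIG_B) (C' : SIG_C with type A.t = B'.A.t) = struct
      let d x = C'.c (B'.b x)   (* legal only because the witnesses agree *)
    end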
Two styles of modular programming have been illustrated here. The first, which is favored in Pebble,
expresses dependencies by allowing structure names to appear in the signatures of other structures, and tends
to abstract directly and individually on each supporting structure. The other style is representative of modules
in ML. It involves forming Σ-closures to capture dependencies and coalesce the support of structures into one
level. In fact, the ML module language goes so far as to require that all signatures be Σ-closed, even the
argument and result signatures of functors. There are several other factors involved which indirectly support
this strict closure rule. In particular, ML's "generative" declarations of datatypes and exceptions, and the fact
that structures can contain state, make it necessary to maintain fairly rigid relations between structures. In
addition, Σ-closed structures appear to be more appropriate units for separate compilation and persistent
storage.

5. Conclusions

The main thrust of this work is that a ramified type system with general dependent type constructs is an
effective tool for the analysis and design of programming language type systems, particularly those oriented
toward programming in the large. We have explored some of the design choices that have been raised by
recently proposed languages such as Pebble, SOL, and Standard ML with modules. But many important questions
remain to be answered. For instance, we need to have precise characterizations of the relative strengths
of predicative vs impredicative type systems, and reflexive vs irreflexive systems. It would be desirable to
have a representation independence result analogous to that of Mitchell [Mit86] for the stratified system used
here. Finally, it appears that the basic polymorphic type system of ML [Mil78] is in fact a ramified system,
and that the system described in §2, rather than the second-order lambda calculus, can be viewed as its most
natural generalization.

References
[BC85] J. L. Bates and R. L. Constable, Proofs as Programs, ACM Trans. on Programming Languages and Systems, 7, 1, January 1985, pp. 113-136.
[BDD80] H. Boehm, A. Demers, and J. Donahue, An informal description of Russell, Technical Report TR 80-430, Computer Science Dept., Cornell Univ., October 1980.
[BL84] R. M. Burstall and B. Lampson, A kernel language for abstract data types and modules, in Semantics of Data Types, G. Kahn, D. B. MacQueen, and G. Plotkin, Eds., LNCS, Vol. 173, Springer-Verlag, Berlin, 1984.
[Bur84] R. M. Burstall, Programming with modules as typed functional programming, Int'l Conf. on 5th Generation Computing Systems, Tokyo, Nov. 1984.
[Car85] L. Cardelli, The impredicative typed λ-calculus, unpublished manuscript, 1985.
[CF58] H. B. Curry and R. Feys, Combinatory Logic I, North-Holland, 1958.
[CH85] T. Coquand and G. Huet, A calculus of constructions, Information and Control, to appear.
[CM85] L. Cardelli and D. B. MacQueen, Persistence and type abstraction, Proceedings of the Appin Workshop on Data Types and Persistence, Aug 1985, to appear.
[CW85] L. Cardelli and P. Wegner, On understanding types, data abstraction, and polymorphism, Technical Report No. CS-85-14, Brown University, August 1985.
[CZ84] R. L. Constable and D. R. Zlatin, The type theory of PL/CV3, ACM Trans. on Programming Languages and Systems, 6, 1, January 1984, pp. 94-117.
[deB80] N. G. de Bruijn, A survey of the project AUTOMATH, in To H. B. Curry: Essays on Combinatory Logic, Lambda-Calculus and Formalism, Academic Press, 1980, pp. 579-607.
[DD85] J. Donahue and A. Demers, Data Types are Values, ACM Trans. on Programming Languages and Systems, 7, 3, July 1985, pp. 426-445.
[Gir71] J.-Y. Girard, Une extension de l'interprétation de Gödel à l'analyse, et son application à l'élimination des coupures dans l'analyse et la théorie des types, in Second Scandinavian Logic Symposium, J. E. Fenstad, Ed., North-Holland, 1971, pp. 63-92.
[GMW78] M. Gordon, R. Milner, and C. Wadsworth, Edinburgh LCF, Lecture Notes in Computer Science, Vol. 78, Springer-Verlag, 1979.
[Hoo84] J. G. Hook, Understanding Russell - a first attempt, in Semantics of Data Types, G. Kahn, D. B. MacQueen, and G. Plotkin, Eds., LNCS Vol. 173, Springer-Verlag, 1984, pp. 69-85.
[How80] W. Howard, The formulas-as-types notion of construction, in To H. B. Curry: Essays on Combinatory Logic, Lambda-Calculus and Formalism, Academic Press, 1980, pp. 476-490. (written 1969)
[Mac85] D. B. MacQueen, Modules for Standard ML (Revised), Polymorphism Newsletter, II, 2, Oct 1985.
[McC79] N. J. McCracken, An investigation of a programming language with a polymorphic type structure, Ph.D. Thesis, Computer and Information Science, Syracuse Univ., June 1979.
[M-L71] P. Martin-Löf, A theory of types, unpublished manuscript, October 1971.
[M-L74] P. Martin-Löf, An intuitionistic theory of types: predicative part, Logic Colloquium 73, H. Rose and J. Shepherdson, Eds., North-Holland, 1974, pp. 73-118.
[M-L82] P. Martin-Löf, Constructive mathematics and computer programming, in Logic, Methodology and Philosophy of Science, VI, North-Holland, Amsterdam, 1982, pp. 153-175.
[MR86] A. R. Meyer and M. B. Reinhold, 'Type' is not a type, 13th Annual ACM POPL Symposium, St. Petersburg, January 1986.
[Mil78] R. Milner, A theory of type polymorphism in programming, JCSS, 17, 3, Dec 1978, pp. 348-375.
[Mit86] J. C. Mitchell, Representation independence and data abstraction, 13th Annual ACM POPL Symposium, St. Petersburg, January 1986.
[MP85] J. C. Mitchell and G. D. Plotkin, Abstract types have existential types, 12th ACM Symp. on Principles of Programming Languages, New Orleans, Jan. 1985, pp. 37-51.
[Pra65] D. Prawitz, Natural Deduction: A Proof-Theoretical Study, Almqvist & Wiksell, Stockholm, 1965.
[Rey74] J. C. Reynolds, Towards a theory of type structure, in Colloque sur la Programmation, Lecture Notes in Computer Science, Vol. 19, Springer-Verlag, Berlin, 1974, pp. 408-423.
[Sco70] D. Scott, Constructive Validity, in Symposium on Automatic Demonstration, Lecture Notes in Math., Vol. 125, Springer-Verlag, 1970, pp. 237-275.

Higher-Order Modules and the Phase Distinction

Robert Harper*                John C. Mitchell†              Eugenio Moggi‡
Carnegie Mellon University    Stanford University            University of Cambridge
Pittsburgh, PA 15213          Stanford, CA 94305             Cambridge CB2 3QG, UK
                                                             (on leave from Univ. of Edinburgh)

Abstract

In earlier work, we used a typed function calculus, λML, with dependent types to analyze several aspects of the Standard ML type system. In this paper, we introduce a refinement of λML with a clear compile-time/run-time phase distinction and a direct compile-time type checking algorithm. The calculus uses a finer separation of types into universes than λML and enforces the phase distinction using a nonstandard equational theory for module and signature expressions. While unusual from a type-theoretic point of view, the nonstandard equational theory arises naturally from the well-known Grothendieck construction on an indexed category.

1 Introduction

The module system of Standard ML [HMM86] provides a convenient mechanism for factoring ML programs into separate but interrelated program units. The basic constructs are structures, which are a form of generalized "records" with type, value and structure components, and functors, which may be regarded as parameterized structures or functions from structures to structures. The types of structures and functors are called signatures. The signature of a structure lists the component names and their types, while the signature of a functor also includes the types of all parameters. Typically, program units are represented as structures that are linked together by functor application. When two structure parameters of a functor must share a common substructure, this is specified using a "sharing" constraint within the functor parameter list. In Standard ML as currently implemented, there are no functors with functor parameters. In this respect, the current language only uses "first-order" modules.

There are two formal analyses of the module system, one operational and the other a syntactic translation leading to a denotational semantics. The structured operational semantics of [HMT87b, HMT87a, Tof87] includes a computational characterization of the type checker. This gives a precise, implementation-independent definition of the Standard ML language that may be used for a variety of purposes. The second formal analysis is a type-theoretic description of ML, which leads to a denotational semantics for the language. This second line of work, beginning with [Mac86] and continued in [MH88], uses dependent sum types Σx:A.B to explain structures and dependent function types Πx:A.B for functors. In addition to providing some insight into the functional behavior of the module constructs, the λML calculus introduced in [MH88] establishes a framework for studying a class of ML-like languages. Because variants of Standard ML may be considered as λML theories, the emphasis of this approach is on properties of Standard ML that remain invariant under extensions of the language. In addition, λML is most naturally defined with higher-order modules, suggesting a useful extension of Standard ML. However, some important aspects of Standard ML are not accurately reflected in the λML analysis.

*Supported by the Office of Naval Research under contract N00014-84-K-0415 and by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 5404, monitored by the Office of Naval Research under the same contract.
†Partially supported by an NSF PYI Award, matching funds from Digital Equipment Corporation, Xerox Corporation and the Powell Foundation, and by NSF grant CCR-8814921.
‡Supported by ESPRIT Basic Research Action No. 3003, Categorical Logic in Computer Science.
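To make these constructs concrete, the following Standard ML sketch (ours, not the paper's; the names ORD, IntOrd, MkSet and Merge are invented for illustration) shows a signature, a structure, a functor, and a "sharing" constraint on two structure parameters:

    signature ORD =
    sig
      type t                          (* a type component *)
      val le : t * t -> bool          (* a value component *)
    end

    structure IntOrd : ORD =
    struct
      type t = int
      val le : t * t -> bool = (op <=)
    end

    (* A functor: a function from structures to structures. *)
    functor MkSet (O : ORD) =
    struct
      type elem = O.t
      type set = elem list
      fun insert (x : elem, s : set) = x :: s
    end

    (* Two structure parameters that must share a common type
       component, expressed by a sharing constraint. *)
    functor Merge (structure A : ORD
                   structure B : ORD
                   sharing type A.t = B.t) =
    struct
      fun min (x : A.t, y : B.t) = if A.le (x, y) then x else y
    end

Since Standard ML has no functors with functor parameters, a construct such as MkSet can be applied but never passed as an argument; this is the "first-order" restriction just described.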
Although ML is designed to allow compile-time type checking, it is not clear how to "statically" type check versions of λML with certain additional type constructors or with higher-order modules. This is particularly unfortunate for higher-order modules, since these seem useful in supporting separate compilation or as an alternative to ML's "sharing" specifications [BL84, Mac86]. In this paper, we redesign λML so that compile-time type checking is an intrinsic part of the type-theoretic framework. Since it is difficult to characterize the difference between compile-time and run-time precisely, we focus on establishing a phase distinction, in the terminology of [Car88]. However, to give better intuition, we generally refer to these phases as compile-time and run-time. The main benefit of our redesign is that type checking becomes decidable, even in the presence of higher-order functors and arbitrary equational axioms between "run-time" expressions.

The main difficulty with higher-order functors may be illustrated by considering an expression e containing a "functor" variable F which maps (type, int) pairs (representing structures) to (type, int) pairs. Such an expression e might occur as the body of a higher-order functor, with functor parameter F. In type checking e, we might encounter a type expression of the form Fst(F [int, e₁]), referring to the type component of the structure obtained by applying the functor parameter F to the structure [int, e₁]. Since F is a formal parameter, we cannot hope to evaluate this type expression without performing functor application, which we consider a "run-time," or second phase, operation. However, in type checking e, we might need to decide whether two such type expressions, say Fst(F [int, e₁]) and Fst(F [int, e₂]), are equal. The natural equality to consider involves deciding whether the structure components e₁ and e₂ are equal. However, if these are complicated integer expressions, perhaps containing recursive functions, then it is impossible to algorithmically compare two such expressions for equality. While it is possible to simplify type checking by using syntactic equality of possibly divergent expressions, this is too restrictive in practice.

In this paper, we present a typed calculus λML_mod which includes both higher-order modules and a clear separation into "phases" which correspond intuitively to compile-time and run-time. The new calculus is at once a refinement and an extension of λML. The universe structure of λML is refined so that the core language (i.e., the language without modules) possesses a natural phase distinction. Then the language is extended in a systematic way to include dependent types for representing structures and functors. In order to preserve the phase distinction, a non-standard formulation of the rules for dependent types is needed. Rather than restrict the syntax of structures and functors, as one might initially expect, we adopt non-standard equational axioms that allow us to simplify each structure or functor into separate "compile-time" and "run-time" parts. Referring back to the example above, we test whether Fst(F [int, e₁]) and Fst(F [int, e₂]) are equal essentially by simplifying F into a pair of maps, one compile-time and the other run-time. This allows us to compute the compile-time (type) values of these expressions without evaluating the run-time expressions e₁ or e₂. This approach follows naturally from the development of [Mog89a], which defines the category of modules over any suitable indexed category representing a typed language. In categorical terms, the category of modules is the Grothendieck construction on an indexed category, which is proved relatively Cartesian closed when certain natural assumptions about the indexed category are satisfied. Our λML_mod calculus is a concrete outgrowth of Moggi's categorical development, providing an explicit lambda notation for the category of modules.

Like λML, λML_mod may be extended with any typed constants and corresponding equational axioms. In contrast to λML, constants and non-logical λML_mod axioms only affect the "run-time" theory of the language and do not interact with type checking. We show that λML_mod typing is decidable for any variant of the calculus based on any (possibly undecidable) equational theory for "run-time" expressions. A similar development may be carried out using the computational λ-calculus approach of [Mog89b] in place of equational axioms, but we will not go into that in this paper.

The paper is organized as follows. In Section 2 we introduce the core calculus, λML, which we later extend to include modules. In Section 3 we introduce λML_mod, the full calculus of higher-order modules. We prove that λML_mod is a definitional extension of a simpler "structures-only" calculus and use this result to establish decidability and compile-time type checking for the full calculus of modules. Brief concluding remarks appear in Section 4.
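Standard ML itself stops at first-order functors, but the difficulty described above can be made concrete in SML/NJ's higher-order extension. The following sketch is ours and assumes SML/NJ's funsig syntax; it is meant only to show where a type component of a functor application would force the type checker to look at run-time terms:

    signature S =
    sig
      type t
      val x : int
    end

    funsig FS (X : S) = S               (* the functor signature S -> S *)

    functor Body (functor F : FS) =     (* F is a functor parameter *)
    struct
      structure A = struct type t = int  val x = 3 end  (* plays [int, e1] *)
      structure B = struct type t = int  val x = 5 end  (* plays [int, e2] *)
      structure FA = F (A)
      structure FB = F (B)
      (* Deciding whether FA.t and FB.t are the same type seems to
         require comparing the run-time components 3 and 5, which is
         exactly the comparison the non-standard equations avoid. *)
    end

Under the non-standard equational theory, both FA.t and FB.t simplify to an application of the compile-time part of F to int, so they are equal regardless of the terms 3 and 5.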
2 Core Calculus

We begin by giving the definition of the λML core calculus, which is essentially the calculus HML of [Mog89a]. This calculus captures many of the essential features of the ML type system, but omits, for the sake of simplicity, ML's concrete and abstract types (which could be modeled using existential types [MP88]), recursive types (which can be described through a λML theory), and record types. We also do not consider pattern matching, or computational aspects such as side-effects and exceptions. A promising approach toward integrating these features is described in [Mog89b].

2.1 Syntactic Preliminaries

There are four basic syntactic classes in λML: kinds, constructors, types and terms. The kinds include T, the collection of all monotypes, and are closed under formation of products and function spaces. The constructors, which include monotypes such as int, and type constructors such as list, are elements of kinds. The types of λML, whose elements are terms, include Cartesian products, function spaces and polymorphic types. The terms of the calculus correspond to the basic expression forms of ML, but are written in an explicitly-typed syntax, following [MH88]. It is important to note that our "types" correspond roughly to ML's "type schemes," the essential difference being that we require them to be closed with respect to quantification over all kinds (not just the kind of monotypes) and function spaces. These additional closure conditions for type schemes are needed to make the category of modules for λML relatively Cartesian closed (i.e., closed under formation of dependent products and sums).

The organization of λML is a refinement of the type structure of Core-XML [MH88]. The kind T of monotypes corresponds directly to the first universe U₁ of Core-XML. However, the second universe, U₂, of Core-XML is separated into distinct collections of kinds and types. For technical reasons, the cumulativity of the Core-XML universes is replaced by the explicit "injection" of T into the collection of types, written using the keyword set.

2.2 Syntax

The syntax of λML raw expressions is given in Table 1. The collection of term variables, ranged over by x, and the collection of constructor variables, ranged over by v, are assumed to be disjoint. The metavariable τ ranges over the collection of monotypes (constructors of kind T). Contexts consist of a sequence of declarations of the form v:k and x:σ declaring the kind or type, respectively, of a constructor or term variable. In addition to the context-free syntax, we require that no variable be declared more than once in a context Φ, so that we may unambiguously regard Φ as a partial function with finite domain Dom(Φ) assigning kinds to constructor variables and types to term variables.

2.3 Judgement Forms

There are two classes of judgements in λML, the formation judgements and the equality judgements. The formation judgements are used to define the set of well-formed λML expressions. With the exception of the kind expressions, there is one formation judgement for each syntactic category. (Every raw kind expression is well-formed.) The equality judgements are used to axiomatize equivalence of expressions. (There is no equality judgement for kinds; kind equivalence is just syntactic identity.) The equality judgements are divided into two classes, the compile-time equations and the run-time equations, reflecting the intuitive phase distinction: kind and type equivalence are compile-time, term equivalence is run-time. The judgement forms of λML are summarized in Table 2. The metavariable F ranges over formation judgements, E ranges over equality judgements, and J ranges over all forms of judgement. We sometimes write Φ ≫ α to stand for an arbitrary judgement when we wish to make the context part explicit.

2.4 Formation Rules

The syntax of λML is specified by a set of inference rules for deriving formation judgements. These resemble rules in [MH88, Mog89a] and are essentially standard. Due to space constraints, they are omitted from this conference paper. We write λML ⊢ F to indicate that the formation judgement F is derivable using these rules. The formation rules may be summarized as follows. The constructors and kinds form a simply-typed λ-calculus (with product and unit types) with base kind T, and basic constructors 1, ×, and →. The collection of types is built from the base types 1 and set(τ), where τ is a constructor of kind T, using the type constructors × and →, and quantification over an arbitrary kind. The terms amount to an explicitly-typed presentation of the ML core language, similar to that presented in [MH88]. (The let construct is omitted since it is definable here.)

2.5 Equality Rules

The rules for deriving equational judgements also resemble rules in [MH88, Mog89a] and are essentially standard. We write λML ⊢ E to indicate that an equation E is derivable in accordance with these rules.
    k ∈ kind    ::= 1 | T | k₁ × k₂ | k₁ → k₂
    u ∈ constr  ::= v | 1 | × | → | * | ⟨u₁, u₂⟩ | πᵢ(u) | (λv:k.u) | u₁ u₂
    σ ∈ type    ::= 1 | set(u) | σ₁ × σ₂ | σ₁ → σ₂ | (∀v:k.σ)
    e ∈ term    ::= x | * | ⟨e₁, e₂⟩ | πᵢ(e) | (λx:σ.e) | e₁ e₂ | (Λv:k.e) | e[u]
    Φ ∈ context ::= ∅ | Φ, v:k | Φ, x:σ

Table 1: λML raw expressions

    Φ context              Φ is a context
    Φ ≫ u : k              u is a constructor of kind k
    Φ ≫ σ type             σ is a type
    Φ ≫ e : σ              e is a term of type σ

    Φ ≫ u₁ = u₂ : k        u₁ and u₂ are equal constructors of kind k
    Φ ≫ σ₁ = σ₂ type       σ₁ and σ₂ are equal types
    Φ ≫ e₁ = e₂ : σ        e₁ and e₂ are equal terms of type scheme σ

Table 2: λML judgement forms
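The raw syntax of Table 1 transcribes almost directly into Standard ML datatypes. The following rendering is ours (not the paper's) and is used by later sketches; it makes the four syntactic classes and the two variable namespaces explicit:

    type cvar = string                    (* constructor variables v *)
    type tvar = string                    (* term variables x *)

    datatype kind                         (* k *)
      = KUnit                             (* 1 *)
      | KType                             (* T, the kind of monotypes *)
      | KProd of kind * kind              (* k1 x k2 *)
      | KArrow of kind * kind             (* k1 -> k2 *)

    datatype constr                       (* u *)
      = CVar of cvar
      | CUnit | CProdCon | CArrowCon      (* basic constructors 1, x, -> *)
      | CStar                             (* * *)
      | CPair of constr * constr          (* <u1, u2> *)
      | CProj of int * constr             (* pi_i(u) *)
      | CLam of cvar * kind * constr      (* \v:k.u *)
      | CApp of constr * constr           (* u1 u2 *)

    datatype ty                           (* sigma *)
      = TUnit                             (* 1 *)
      | TSet of constr                    (* set(u) *)
      | TProd of ty * ty
      | TArrow of ty * ty
      | TAll of cvar * kind * ty          (* forall v:k.sigma *)

    datatype term                         (* e *)
      = EVar of tvar
      | EStar
      | EPair of term * term
      | EProj of int * term
      | ELam of tvar * ty * term          (* \x:sigma.e *)
      | EApp of term * term
      | ETLam of cvar * kind * term       (* /\v:k.e *)
      | ETApp of term * constr            (* e[u] *)

    datatype decl = DCon of cvar * kind | DTerm of tvar * ty
    type context = decl list              (* Phi, most recent first *)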

The λML equational rules are formulated so as to ensure that if an equational judgement is derivable, then it is well-formed, meaning that the evident associated formation judgements are derivable. For the sake of convenience we give a brief summary of the equational rules of λML.

2.5.1 Compile-Time Equality

Constructors.  Equivalence of constructor expressions is the standard equivalence of terms in the simply-typed λ-calculus, based on the following axioms:

    (× β)   Φ ≫ u₁ : k₁    Φ ≫ u₂ : k₂
            ⟹ Φ ≫ πᵢ(⟨u₁, u₂⟩) = uᵢ : kᵢ    (i = 1, 2)

    (× η)   Φ ≫ u : k₁ × k₂
            ⟹ Φ ≫ ⟨π₁(u), π₂(u)⟩ = u : k₁ × k₂

    (→ β)   Φ ≫ u₁ : k₁    Φ, v:k₁ ≫ u₂ : k₂
            ⟹ Φ ≫ (λv:k₁.u₂) u₁ = [u₁/v]u₂ : k₂

    (→ η)   Φ ≫ u : k₁ → k₂
            ⟹ Φ ≫ (λv:k₁.u v) = u : k₁ → k₂    (v ∉ Dom(Φ))

Types.  The equivalence relation on types includes the following axioms expressing the interpretation of the basic ML type constructors:

    (1 T=)  Φ context ⟹ Φ ≫ set(1) = 1 type

    (× T=)  Φ ≫ τ₁ : T    Φ ≫ τ₂ : T
            ⟹ Φ ≫ set(τ₁ × τ₂) = set(τ₁) × set(τ₂) type

    (→ T=)  Φ ≫ τ₁ : T    Φ ≫ τ₂ : T
            ⟹ Φ ≫ set(τ₁ → τ₂) = set(τ₁) → set(τ₂) type

2.5.2 Run-Time Equality

Terms.  There are seven axioms corresponding to the reduction rules associated with each of the type constructors:

    (1 η)   Φ ≫ e : 1 ⟹ Φ ≫ e = * : 1

    (× β)   Φ ≫ e₁ : σ₁    Φ ≫ e₂ : σ₂
            ⟹ Φ ≫ πᵢ(⟨e₁, e₂⟩) = eᵢ : σᵢ    (i = 1, 2)

    (× η)   Φ ≫ e : σ₁ × σ₂
            ⟹ Φ ≫ ⟨π₁(e), π₂(e)⟩ = e : σ₁ × σ₂

    (→ β)   Φ ≫ e₁ : σ₁    Φ, x:σ₁ ≫ e₂ : σ₂
            ⟹ Φ ≫ (λx:σ₁.e₂) e₁ = [e₁/x]e₂ : σ₂    (x ∉ Dom(Φ))
    (→ η)   Φ ≫ e : σ₁ → σ₂
            ⟹ Φ ≫ (λx:σ₁.e x) = e : σ₁ → σ₂    (x ∉ Dom(Φ))

    (∀ β)   Φ ≫ u : k    Φ, v:k ≫ e : σ
            ⟹ Φ ≫ (Λv:k.e)[u] = [u/v]e : [u/v]σ

    (∀ η)   Φ ≫ e : (∀v:k.σ)
            ⟹ Φ ≫ (Λv:k.e[v]) = e : (∀v:k.σ)    (v ∉ Dom(Φ))

2.6 Theories

The λML calculus is defined with respect to an arbitrary theory T = (Φ_T, A_T) consisting of a well-formed context Φ_T and a set A_T of run-time equational axioms of the form e₁ = e₂ : σ with Φ_T ≫ eᵢ : σ derivable for i = 1, 2. A theory corresponds to the programming language notion of standard prelude, and might contain declarations such as int : T and fix : ∀t:T. set((t → t) → t), and axioms expressing, for example, the fixed-point property of fix. For T = (Φ_T, A_T), we write λML[T] ⊢ J to indicate that the judgement J is derivable in λML, taking the variables declared in Φ_T as basic constructors and terms, and taking the equations in A_T as non-logical axioms. We write λML[T] ⊢_ct J to indicate that the judgement J is derivable from theory T using only the compile-time equational rules and equational axioms of T.

2.7 Properties of λML

We will describe the phase distinction in λML by separating contexts into sets of "compile-time" and "run-time" declarations. If Φ is a λML context, we let Φᶜ be the context obtained by omitting all term variable declarations from Φ, and let Φʳ be the context obtained by eliminating all constructor variable declarations from Φ. The following lemma expresses the compile-time type checking property of λML:

Lemma 2.1  Let T be any theory. The following implications hold:

    If λML[T] ⊢              then λML[Φ_T, ∅] ⊢_ct
    Φ context                Φᶜ, Φʳ context
    Φ ≫ u : k                Φᶜ ≫ u : k
    Φ ≫ σ₁ = σ₂ type         Φᶜ ≫ σ₁ = σ₂ type
    Φ ≫ e : σ                Φᶜ, Φʳ ≫ e : σ
    Φ ≫ e₁ = e₂ : σ          Φᶜ, Φʳ ≫ eᵢ : σ

Since the constructors and kinds form a simply-typed λ-calculus, it is a routine matter to show that equality of well-formed constructors (and, consequently, types) in λML is decidable. It is then easy to show that type checking in λML is decidable. This is a well-known property of the polymorphic lambda calculus F_ω (cf. [Gir71, Gir72, Rey74, BMM89]), which may be seen as an impredicative extension of the λML calculus.

Lemma 2.2  There is a straightforward one-pass algorithm which decides, for an arbitrary well-formed theory T and formation judgement F, whether or not λML[T] ⊢ F.

The main technical accomplishment of this paper is to present a full calculus encompassing the module expressions of ML which has a compile-time decidable type checking problem.
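The decidability of constructor (and hence type) equality can be seen by normalizing: since constructors form a simply-typed λ-calculus, β-normal forms exist for well-kinded terms and can be compared. The following Standard ML sketch (ours, over the constr datatype introduced after Table 2) shows the idea; it elides α-renaming, so it assumes bound constructor variables are distinct from free ones:

    (* Naive substitution [a/v]u; a full version would alpha-rename. *)
    fun subst (a, v, CVar w) = if v = w then a else CVar w
      | subst (a, v, CPair (u1, u2)) = CPair (subst (a,v,u1), subst (a,v,u2))
      | subst (a, v, CProj (i, u)) = CProj (i, subst (a,v,u))
      | subst (a, v, CApp (u1, u2)) = CApp (subst (a,v,u1), subst (a,v,u2))
      | subst (a, v, CLam (w, k, u)) =
          if v = w then CLam (w, k, u) else CLam (w, k, subst (a,v,u))
      | subst (_, _, u) = u

    (* Beta-normalization for constructors. *)
    fun norm (CApp (f, a)) =
          (case norm f of
             CLam (v, _, body) => norm (subst (norm a, v, body))  (* beta *)
           | f' => CApp (f', norm a))
      | norm (CProj (i, u)) =
          (case norm u of
             CPair (u1, u2) => if i = 1 then u1 else u2           (* pi-beta *)
           | u' => CProj (i, u'))
      | norm (CPair (u1, u2)) = CPair (norm u1, norm u2)
      | norm (CLam (v, k, u)) = CLam (v, k, norm u)
      | norm u = u

    (* Compile-time equality of well-formed constructors: compare
       normal forms (comparison up to renaming of bound variables
       is omitted here). *)
    fun equalConstr (u1, u2) = norm u1 = norm u2

For well-kinded constructors normalization terminates, which is what makes this a decision procedure; on ill-kinded input the sketch may diverge.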
3 Modules Calculus

3.1 Overview

In the λML account of Standard ML modules [Mac86, MH88] (see also [NPS88, C+86, Mar84] for related ideas), a structure is an element of a strong sum type of the form Σx:A.B. For example, a structure with one type and one value component is regarded as a pair [τ, e] of type S = Σt:T.σ. Although Standard ML structures bind names to their components, component selection in λML is simplified using the projections Fst and Snd. Functors are treated as elements of dependent function types of the form Πx:A.B. For example, a functor mapping structures with signature S to structures with the same signature would have type Πs:(Σt:T.σ).(Σt:T.σ). In λML, functors are therefore written as λ-terms mapping structures to structures. As discussed in the introduction, the standard use of dependent types conflicts with compile-time type checking, since a type expression (which we expect to evaluate at compile time) may depend on an arbitrary (possibly run time) expression. For example, if F is a functor variable of signature S → S (where S is as above), then Fst(F [int, 3]) is an irreducible type expression involving a run-time sub-expression.

In this section we develop a calculus λML_mod of higher-order modules with a phase distinction based on the categorical analysis of [Mog89a]. We begin with a simpler "structures-only" calculus that is primarily a technical device used in the proofs. The full calculus of higher-order modules has a standard syntax for dependent strong sums and functions, resembling λML, but a non-standard equational theory inspired by the categorical interpretation of program modules [Mog89a]. The calculus also employs a single non-standard typing rule for structures that we conjecture is not needed for decidable typing, but which allows a more generous (and simple) type-checking algorithm without invalidating the categorical semantics. Although inspired by a categorical construction, we prove our main results directly, using only standard techniques of lambda calculus. The non-standard aspects of the λML_mod calculus are justified by showing that this calculus is a definitional extension of the "structures-only" calculus, which itself bears a straightforward relationship to the core calculus. This definitional extension result is used to prove that λML_mod type equivalence is decidable and that the language therefore has a practical type checking algorithm.

3.2 The Calculus of Structures

In this section, we extend λML with structures and signatures. The resulting calculus, λML_str, has a straightforward phase distinction and forms the basis for the full calculus of modules. We assume we have some set of structure variables, disjoint from the constructor and term variables, and use s, s′, s₁, ... as metavariables for structure variables. The additional syntax of λML_str is given in Table 3. Note that contexts are extended to include declarations of structure identifiers, but structures are required to be in "split" form [u, e]. (A variable s is not a structure, and there is no need for operations to select the components of a structure.)

The judgement forms of λML are extended with two additional formation judgements and two additional equality judgements, summarized in Table 4. The rules for deriving judgements in λML_str are obtained by extending the rules of λML (taking contexts now in the extended sense) with the obvious rules for structures in "split" form, in particular the following two rules governing the use of structure variables:

    ([] E1)  Φ context ⟹ Φ ≫ sᶜ : k           (Φ(s) = [v:k, σ])

    ([] E2)  Φ context ⟹ Φ ≫ sʳ : [sᶜ/v]σ     (Φ(s) = [v:k, σ])

The notion of theory, and of derivability with respect to a theory, are the same as in λML.

The calculus of structures may be understood in terms of a translation into the core calculus, which amounts to showing that λML_str may be interpreted into the category of modules of [Mog89a]. For Φ a λML_str context, define Φ* to be the λML context obtained by replacing all structure variable declarations s : [v:k, σ] by the pair of declarations sᶜ : k and sʳ : [sᶜ/v]σ.

Lemma 3.1  Let T be a well-formed λML theory.

1. λML_str[T] ⊢ Φ ≫ [v:k, σ] sig iff λML[T] ⊢ Φ*, v:k ≫ σ type, and similarly for signature equality.

2. λML_str[T] ⊢ Φ ≫ [u, e] : [v:k, σ] iff λML[T] ⊢ Φ* ≫ u : k and λML[T] ⊢ Φ* ≫ e : [u/v]σ, and similarly for structure equality.

3. λML_str[T] ⊢ Φ ≫ α iff λML[T] ⊢ Φ* ≫ α, for any judgement α other than of the four forms considered in items 1 and 2 above.

It is an immediate consequence of this lemma and the decidability of λML type equivalence that λML_str type equivalence is decidable. This will be important for the decidability of type checking in the full modules calculus.
    k ∈ kind    ::= ...
    u ∈ constr  ::= ... | sᶜ
    σ ∈ type    ::= ...
    e ∈ term    ::= ... | sʳ
    S ∈ sig     ::= [v:k, σ]
    M ∈ mod     ::= [u, e]
    Φ ∈ context ::= ... | Φ, s:S

Table 3: λML_str raw expressions

    Φ ≫ S sig              S is a signature
    Φ ≫ M : S              M is a structure of signature S
    Φ ≫ S₁ = S₂ sig        S₁ and S₂ are equal signatures
    Φ ≫ M₁ = M₂ : S        M₁ and M₂ are equal modules of signature S

Table 4: λML_str judgement forms

3.3 The Calculus of Modules

The relative Cartesian closure of Moggi's category of modules implies that higher-order functors are definable in λML_str. This may seem surprising, since λML_str is a rather minimal calculus of structures, with nothing syntactically resembling lambda abstraction over structures. The key idea in understanding this phenomenon is to regard all modules as "mixed-phase" entities, consisting of a compile-time part and a run-time part. For basic structures of the form [u, e], the partitioning is clear: u, a constructor, may be evaluated at compile-time, while e, a term, is left until run-time. For more complex module expressions such as functors, the separation requires further explanation.

Consider the signature S = [v:T, set(v)], and let F : S → S be a functor. Since this functor lies within the first-order fragment of λML, we may rely on Standard ML for intuition. The functor F takes a structure of signature S as argument, and returns a structure, also of signature S. On the face of it, F might compute the type component of the result as a function of both the type and term components of the argument. However, no such computation is possible in ML, since there are no primitives for building types from terms. Thus we may regard F as consisting of two parts: the compile-time part, which computes the type component of the result as a function of the type component of the argument, and the run-time part, which computes the term component of the result as a function of both the type and term components of the argument. (Since we are working in a typed framework with explicit polymorphism, the term component may contain type information that depends on the compile-time functor argument.) For a more concrete example, suppose I is the identity functor λs:S.s. Separated into compile-time and run-time parts, I becomes the structure

    [λsᶜ:T.sᶜ, Λsᶜ:T.λsʳ:set(sᶜ).sʳ]

of signature

    [f:T → T, ∀sᶜ:T. set(sᶜ) → set(f sᶜ)].

In other words, I may be represented by the structure consisting of the identity constructor on types, and the polymorphic identity on terms. (A technical side comment is that the structure corresponding to I has more than one signature, as we shall see.)

With functors represented by structures, functor application becomes a form of "structure application." In keeping with the above discussion, structure application is computed by applying the first component of the functor to the first component of the argument, and the second component of the functor to both components of the argument. More precisely, if [u, e] is a structure of signature [f:k′ → k, ∀v′:k′.σ′ → [f v′/v]σ] and [u′, e′] is a structure of signature [v′:k′, σ′], then the application [u, e] [u′, e′] is defined to be the structure [u u′, e[u′] e′] of signature [v:k, σ]. As we shall see below, the appropriate typing conditions are satisfied whenever the first structure is the image of a functor under the translation sketched below. Moreover, both type correctness and equality are preserved under the translation.

Although λML_str already "has" higher-order modules, the syntax for representing them forces the user to explicitly decompose every functor into distinct compile-time and run-time parts, even for the first-order functors of Standard ML. This is syntactically cumbersome. In keeping with the syntax of Standard ML, and with practical programming considerations, we will consider a more natural notation based on [Mac86, MH88]. However, our calculus will nonetheless respect the phase distinction inherent in representing functors as structures. This is achieved by employing a non-standard equational theory that, when used during type checking, makes explicit the underlying "split" interpretation of module expressions, and hence eliminates apparent phase violations. For example, if A is a functor of signature [t:T, set(int)] → [t:T, 1], then the type expression σ = Fst(A [int, 3]) is equal, using the non-standard rules, to Fst(A) int, which is free of run-time subexpressions. As a result, if e is a term of type σ, then the application

    (λx:Fst(A [int, 5]).x) e

is type-correct, whereas in the absence of the non-standard equations this would not be so (assuming 3 ≠ 5 : int).

The raw syntax of λML_mod is an extension of that of λML_str; the extensions are given in Table 5. The judgement forms are the same as for λML_str, and are axiomatized by standard structure and functor rules, as in [MH88].
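The phase splitting of the identity functor has a direct Standard ML reading: both halves of the split live in the core language. A minimal sketch (ours; the names are invented):

    signature S = sig type t  val x : t end

    (* The identity functor, written at the module level ... *)
    functor I (X : S) = struct type t = X.t  val x = X.x end

    (* ... and its two phase components as core-language entities:
       the compile-time part Fst(I), a function on types ... *)
    type 'a fstI = 'a

    (* ... and the run-time part Snd(I), a polymorphic identity of
       type  forall t:T. set(t) -> set(Fst(I) t): *)
    val sndI : 'a -> 'a = fn x => x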
    k ∈ kind    ::= ...
    u ∈ constr  ::= ... | Fst(M)
    σ ∈ type    ::= ...
    e ∈ term    ::= ... | Snd(M)
    S ∈ sig     ::= [v:k, σ] | 1 | (Σs:S₁.S₂) | (Πs:S₁.S₂)
    M ∈ mod     ::= s | [u, e] | * | ⟨M₁, M₂⟩ | πᵢ(M) | (λs:S.M) | M₁ M₂
    Φ ∈ context ::= ... | Φ, s:S

Table 5: λML_mod raw expressions

The λML_mod calculus is parametric in a theory, defined as in λML (i.e., we do not admit module constants, or axioms governing module expressions).

The formation rules of λML_mod are essentially the standard rules for dependent strong sums and dependent function types. The equational rules include the expected rules for dependent types, together with the non-standard rules summarized in Table 6.

Beside the non-standard equational rules (and "orthogonal" to them), there is also a non-standard typing rule for structures:

    Φ ≫ M : [v:k, σ]
    Φ, v:k ≫ σ′ type
    Φ ≫ Snd M : [Fst M/v]σ′
    ⟹ Φ ≫ M : [v:k, σ′]

The non-standard typing rule is consistent with the interpretation in the category of modules [Mog89a], but (we conjecture that) without it the main properties of λML_mod, namely the compile-time type checking theorem and the decidability of typing judgements, would still hold. The reason for having such a rule is mainly pragmatic: to have a simple type checking algorithm (see Definition 3.9). Moreover, this additional typing rule captures a particularly natural property of Σ-types (once uniqueness of types has been abandoned), namely that a structure M should be identified with its expansion [Fst M, Snd M]. A typical example of a typing judgement derivable by the non-standard typing rule is s:[v:k, σ] ≫ s : [v:k, [Fst s/v]σ].

3.4 Translation of λML_mod into λML_str

The non-standard equational theory used in the definition of λML_mod is justified by proving that λML_mod is a definitional extension of λML_str, in a sense to be made precise below. This definitional extension result will then play an important role in establishing the decidability and compile-time type checking property of λML_mod.

We begin by giving a translation (-)♭ from raw λML_mod expressions into raw λML_str expressions. This translation is defined by induction on the structure of λML_mod expressions. Apart from the cases given in Table 7, the translation is defined to commute with the expression constructors. For the basis we associate with every module variable s a constructor variable sᶜ and a term variable sʳ in λML_str. For convenience in defining the translation we fix a constructor variable v that may occur in expressions of λML_str, but not in expressions of λML_mod. Signatures of λML_mod will be translated to λML_str signatures of the form [v:k, σ]. The translation is extended "declaration-wise" to contexts: Φ♭ is obtained from Φ by replacing declarations of the form x:σ by x:σ♭, and declarations of the form s:S by s:S♭. Note that the translation leaves λML expressions fixed; consequently, the translation need not be extended to theories.

Lemma 3.2 (Substitutivity)  The translation (-)♭ commutes with substitution. In particular, if M♭ = [u, e], then ([M/s]-)♭ = [u, e/sᶜ, sʳ](-♭).

Theorem 3.3 ((-)♭ interpretation)  Let T be a well-formed theory, and let J be a λML_mod judgement. If λML_mod[T] ⊢ J, then λML_str[T] ⊢ J♭.

Conversely, λML_str is essentially a sub-calculus of λML_mod, differing only in the treatment of structure variables. To make this precise, define the embedding (-)ᵉ of λML_str raw expressions into λML_mod raw expressions by replacing all occurrences of sᶜ by Fst(s), and all occurrences of sʳ by Snd(s).

Theorem 3.4 ((-)ᵉ interpretation)  Let T be a well-formed theory, and let J be a λML_str judgement. If λML_str[T] ⊢ J, then λML_mod[T] ⊢ Jᵉ.
Non-standard equational rules for signatures

    (1 ≫)    Φ context
             ⟹ Φ ≫ 1 = [v:1, 1] sig

    (Σ ≫)    Φ, v₁:k₁ ≫ σ₁ type    Φ, v₁:k₁, v₂:k₂ ≫ σ₂ type
             ⟹ Φ ≫ (Σs:[v₁:k₁, σ₁].[v₂:k₂, [Fst(s)/v₁]σ₂])
                   = [v:k₁ × k₂, [π₁v/v₁]σ₁ × [π₁v, π₂v/v₁, v₂]σ₂] sig

    (Π ≫)    Φ, v₁:k₁ ≫ σ₁ type    Φ, v₁:k₁, v₂:k₂ ≫ σ₂ type
             ⟹ Φ ≫ (Πs:[v₁:k₁, σ₁].[v₂:k₂, [Fst(s)/v₁]σ₂])
                   = [v:k₁ → k₂, (∀v₁:k₁.σ₁ → [v v₁/v₂]σ₂)] sig

Non-standard equational rules for modules

    (1 I ≫)  Φ context
             ⟹ Φ ≫ * = [*, *] : [v:1, 1]

    (Σ I ≫)  Φ, v₁:k₁ ≫ σ₁ type    Φ, v₁:k₁, v₂:k₂ ≫ σ₂ type
             Φ ≫ u₁ : k₁    Φ ≫ e₁ : [u₁/v₁]σ₁
             Φ ≫ u₂ : k₂    Φ ≫ e₂ : [u₁, u₂/v₁, v₂]σ₂
             ⟹ Φ ≫ ⟨[u₁, e₁], [u₂, e₂]⟩ = [⟨u₁, u₂⟩, ⟨e₁, e₂⟩]
                   : [v:k₁ × k₂, [π₁v/v₁]σ₁ × [π₁v, π₂v/v₁, v₂]σ₂]

    (Σ E1 ≫) Φ, v₁:k₁ ≫ σ₁ type    Φ, v₁:k₁, v₂:k₂ ≫ σ₂ type
             Φ ≫ u : k₁ × k₂    Φ ≫ e : [π₁u/v₁]σ₁ × [π₁u, π₂u/v₁, v₂]σ₂
             ⟹ Φ ≫ π₁[u, e] = [π₁u, π₁e] : [v₁:k₁, σ₁]

    (Σ E2 ≫) Φ, v₁:k₁ ≫ σ₁ type    Φ, v₁:k₁, v₂:k₂ ≫ σ₂ type
             Φ ≫ u : k₁ × k₂    Φ ≫ e : [π₁u/v₁]σ₁ × [π₁u, π₂u/v₁, v₂]σ₂
             ⟹ Φ ≫ π₂[u, e] = [π₂u, π₂e] : [v₂:k₂, [π₁u/v₁]σ₂]

    (Π I ≫)  Φ, v₁:k₁ ≫ σ₁ type    Φ, v₁:k₁, v₂:k₂ ≫ σ₂ type
             Φ, v₁:k₁ ≫ u : k₂    Φ, v₁:k₁, x:σ₁ ≫ e : [u/v₂]σ₂
             ⟹ Φ ≫ (λs:[v₁:k₁, σ₁].[Fst s, Snd s/v₁, x][u, e])
                   = [(λv₁:k₁.u), (Λv₁:k₁.λx:σ₁.e)]
                   : [v:k₁ → k₂, (∀v₁:k₁.σ₁ → [v v₁/v₂]σ₂)]

    (Π E ≫)  Φ, v₁:k₁ ≫ σ₁ type    Φ, v₁:k₁, v₂:k₂ ≫ σ₂ type
             Φ ≫ u₁ : k₁    Φ ≫ e₁ : [u₁/v₁]σ₁
             Φ ≫ u : k₁ → k₂    Φ ≫ e : (∀v₁:k₁.σ₁ → [u v₁/v₂]σ₂)
             ⟹ Φ ≫ [u, e] [u₁, e₁] = [u u₁, e[u₁] e₁] : [v₂:k₂, [u₁/v₁]σ₂]

Table 6: Non-standard equations
    expression     translation                                        induction hypotheses

    Fst(M)         u                                                  where M♭ = [u, e]
    Snd(M)         e                                                  where M♭ = [u, e]
    s              [sᶜ, sʳ]
    [u, e]         [u♭, e♭]
    (Σs:S₁.S₂)     [v:(k₁ × k₂), ([π₁v/v]σ₁ × [π₁v, π₂v/sᶜ, v]σ₂)]    where Sᵢ♭ = [v:kᵢ, σᵢ]
    (Πs:S₁.S₂)     [v:(k₁ → k₂), ∀sᶜ:k₁.[sᶜ/v]σ₁ → [v sᶜ/v]σ₂]        where Sᵢ♭ = [v:kᵢ, σᵢ]
    *              [*, *]
    ⟨M₁, M₂⟩       [⟨u₁, u₂⟩, ⟨e₁, e₂⟩]                               where Mᵢ♭ = [uᵢ, eᵢ]
    πᵢM            [πᵢu, πᵢe]                                         where M♭ = [u, e]
    (λs:S.M)       [(λsᶜ:k.u), (Λsᶜ:k.λsʳ:[sᶜ/v]σ.e)]                 where S♭ = [v:k, σ] and M♭ = [u, e]
    M₁ M₂          [u₁ u₂, e₁[u₂] e₂]                                 where Mᵢ♭ = [uᵢ, eᵢ]

Table 7: Translation of λML_mod into λML_str
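The module cases of Table 7 are directly executable. The following Standard ML sketch (ours) implements them over the datatypes introduced after Table 2; encoding sᶜ and sʳ by the name suffixes "_c" and "_r" is our assumption, and the signature cases, as well as the substitution [sᶜ/v]σ in the λ case, are elided (MLam simply carries the already-flattened kind and type of its parameter signature):

    datatype module
      = MVar of string                        (* s *)
      | MStruct of constr * term              (* [u, e] *)
      | MUnit                                 (* * *)
      | MPair of module * module
      | MProj of int * module
      | MLam of string * kind * ty * module   (* \s:S.M, S-flat = [v:k, sigma] *)
      | MApp of module * module

    (* flatten M = (u, e) such that M-flat = [u, e]. *)
    fun flatten (MVar s) = (CVar (s ^ "_c"), EVar (s ^ "_r"))
      | flatten (MStruct (u, e)) = (u, e)
      | flatten MUnit = (CStar, EStar)
      | flatten (MPair (m1, m2)) =
          let val (u1, e1) = flatten m1
              val (u2, e2) = flatten m2
          in (CPair (u1, u2), EPair (e1, e2)) end
      | flatten (MProj (i, m)) =
          let val (u, e) = flatten m
          in (CProj (i, u), EProj (i, e)) end
      | flatten (MLam (s, k, t, m)) =         (* t plays [s_c/v]sigma *)
          let val (u, e) = flatten m
          in (CLam (s ^ "_c", k, u),
              ETLam (s ^ "_c", k, ELam (s ^ "_r", t, e)))
          end
      | flatten (MApp (m1, m2)) =
          let val (u1, e1) = flatten m1
              val (u2, e2) = flatten m2
          in (CApp (u1, u2), EApp (ETApp (e1, u2), e2)) end

The application case shows the phase discipline at work: the compile-time part is an ordinary constructor application u₁ u₂, while the run-time part first instantiates e₁ at the argument's type part and then applies it to the argument's term part.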

Theorem 3.5 (Definitional extension)  Let T be a well-formed theory.

• For any formation judgement F of λML_str, if λML_str[T] ⊢ F, then (Fᵉ)♭ is syntactically equal to F, modulo the names of bound variables.

• If λML_mod[T] ⊢ Φ ≫ M : S, then the following equality judgements are derivable in λML_mod[T]:

  - Φ₀ ≫ Φ(s) = ((Φ(s))♭)ᵉ sig, for all s ∈ Dom(Φ), where Φ ≡ Φ₀, s:Φ(s), Φ′ (and similarly for x and v in Dom(Φ));
  - Φ ≫ S = (S♭)ᵉ sig;
  - Φ ≫ M = (M♭)ᵉ : S

  (and similarly for the other formation judgements).

Corollary 3.6 (Conservative extension)  Let T be an arbitrary well-formed theory. For any λML_str judgement J, λML_str[T] ⊢ J iff λML_mod[T] ⊢ Jᵉ.

3.5 Compile-Time Type Checking for λML_mod

The compile-time equational theory of λML_mod and λML_str is determined using a restricted equational proof system, defined as follows.

Definition 3.7 (Compile-time calculus)  Compile-time provability in λML_mod and λML_str is defined by disallowing the use of all β and η rules for term equivalence, and all β and η rules for module equivalence, apart from those related to "basic" signatures [v:k, σ].

Let us designate the β and η axioms for terms by βη; then the full λML_mod calculus may be recovered by working in the theory (∅, βη), since the β and η axioms for modules are derivable in such a theory. It may be easily verified that the variants of Theorems 3.3, 3.4 and 3.5 obtained by considering compile-time derivability hold.

Theorem 3.8 (Compile-time type checking)  Given any well-formed theory T = (Φ_T, A_T), the following implications hold:

    If λML_mod[T] ⊢        then λML_mod[Φ_T, ∅] ⊢_ct
    Φ context              Φ context
    Φ ≫ σ type             Φ ≫ σ type
    Φ ≫ S sig              Φ ≫ S sig
    Φ ≫ u : k              Φ ≫ u : k
    Φ ≫ e : σ              Φ ≫ e : σ
    Φ ≫ M : S              Φ ≫ M : S

3.6 Decidability of λML_mod

The decidability of λML_mod is proved by giving an algorithm that "flattens" structures and signatures during type checking. As a result, checking signature equivalence is reduced to checking type equivalence in λML_str, and this is, as we have already argued, decidable. The main complication in the algorithm stems from the failure of unicity of types. For example, the structure [int, 3] has both of the inequivalent signatures [t:T, set(t)] and [t:T, int]. Our approach is to compute the "most specific" signature for a structure (in the foregoing example this would be the second),
which will always have the form [v:k, σ] where v does not occur free in σ. As a notational convenience, we will usually omit explicit designation of the non-occurring variable, and write such signatures in the form [:k, σ]. The algorithm defined below takes as input a raw context Φ and, for instance, a raw module expression M of λML_mod, and produces one of the following results:

• The context Φ♭ and M♭ ≡ [u, e] : [:k, σ], meaning that Φ ≫ M : [:k, σ] is derivable in λML_mod.

• An error, meaning that Φ context is not derivable in λML_mod, or that Φ ≫ M : S is not derivable in λML_mod for any S.

Definition 3.9 (Type-checking algorithm)  The type-checking algorithm TC is given by a deterministic set of inference rules to derive judgements of the following forms:

    input        output
    Φ            ↠ Φ♭ context
    Φ ≫ σ        ↠ Φ♭ ≫ σ♭ type
    Φ ≫ S        ↠ Φ♭ ≫ S♭ sig
    Φ ≫ u        ↠ Φ♭ ≫ u♭ : k
    Φ ≫ e        ↠ Φ♭ ≫ e♭ : σ
    Φ ≫ M        ↠ Φ♭ ≫ M♭ : [:k, σ]

In the last three cases TC not only computes the translation, but also a kind/type/signature. A sample of the inference rules that constitute the algorithm is given in Table 8.

TC is parametric in a theory T, and we write TC[T] for the instance of the algorithm in which the constants declared in Φ_T are regarded as variables. More precisely, Φ ↠ Φ♭ context in TC[T] iff Φ_T, Φ ↠ Φ_T♭, Φ♭ context in TC.

Theorem 3.10 (Soundness)  Let T be a well-formed theory. The following implications hold:

    If TC[T] ⊢ Φ ↠ Φ♭ context, then λML_mod[T] ⊢_ct Φ context.
    If TC[T] ⊢ Φ ≫ σ ↠ Φ♭ ≫ σ♭ type, then λML_mod[T] ⊢_ct Φ ≫ σ type.
    If TC[T] ⊢ Φ ≫ S ↠ Φ♭ ≫ S♭ sig, then λML_mod[T] ⊢_ct Φ ≫ S sig.
    If TC[T] ⊢ Φ ≫ u ↠ Φ♭ ≫ u♭ : k, then λML_mod[T] ⊢_ct Φ ≫ u : k.
    If TC[T] ⊢ Φ ≫ e ↠ Φ♭ ≫ e♭ : σ, then λML_mod[T] ⊢_ct Φ ≫ e : σ′
        for some σ′ whose translation is the computed type σ.
    If TC[T] ⊢ Φ ≫ M ↠ Φ♭ ≫ [u, e] : [:k, σ], then λML_mod[T] ⊢_ct Φ ≫ M : [:k, σ′]
        for some σ′ whose translation is the computed type σ.

Theorem 3.11 (Completeness)  Let T be any well-formed theory. The following implications hold:

    If λML_mod[T] ⊢ Φ ≫ σ type, then TC[T] ⊢ Φ ≫ σ ↠ Φ♭ ≫ σ♭ type.
    If λML_mod[T] ⊢ Φ ≫ S sig, then TC[T] ⊢ Φ ≫ S ↠ Φ♭ ≫ S♭ sig.
    If λML_mod[T] ⊢ Φ ≫ u : k, then TC[T] ⊢ Φ ≫ u ↠ Φ♭ ≫ u♭ : k.
    If λML_mod[T] ⊢ Φ ≫ e : σ, then TC[T] ⊢ Φ ≫ σ ↠ Φ♭ ≫ σ♭ type and
        TC[T] ⊢ Φ ≫ e ↠ Φ♭ ≫ e♭ : σ′, with λML_str[T] ⊢_ct Φ♭ ≫ σ♭ = σ′ type.
    If λML_mod[T] ⊢ Φ ≫ M : S, then TC[T] ⊢ Φ ≫ S ↠ Φ♭ ≫ [v:k, σ] sig and
        TC[T] ⊢ Φ ≫ M ↠ Φ♭ ≫ [u, e] : [:k, σ′], with λML_str[T] ⊢_ct Φ♭ ≫ σ′ = [u/v]σ type.

    If λML_mod[T] ⊢_ct Φ ≫ σ₁ = σ₂ type, then TC[T] ⊢ Φ ≫ σᵢ ↠ Φ♭ ≫ σᵢ♭ type and
        λML_str[T] ⊢_ct Φ♭ ≫ σ₁♭ = σ₂♭ type.
    If λML_mod[T] ⊢_ct Φ ≫ S₁ = S₂ sig, then TC[T] ⊢ Φ ≫ Sᵢ ↠ Φ♭ ≫ Sᵢ♭ sig and
        λML_str[T] ⊢_ct Φ♭ ≫ S₁♭ = S₂♭ sig.
    If λML_mod[T] ⊢_ct Φ ≫ u₁ = u₂ : k, then TC[T] ⊢ Φ ≫ uᵢ ↠ Φ♭ ≫ uᵢ♭ : k and
        λML_str[T] ⊢_ct Φ♭ ≫ u₁♭ = u₂♭ : k.
    If λML_mod[T] ⊢_ct Φ ≫ e₁ = e₂ : σ, then TC[T] ⊢ Φ ≫ σ ↠ Φ♭ ≫ σ♭ type and
        TC[T] ⊢ Φ ≫ eᵢ ↠ Φ♭ ≫ eᵢ♭ : σᵢ, with λML_str[T] ⊢_ct Φ♭ ≫ σ♭ = σᵢ type
        and Φ♭ ≫ e₁♭ = e₂♭ : σ♭.
    If λML_mod[T] ⊢_ct Φ ≫ M₁ = M₂ : S, then TC[T] ⊢ Φ ≫ S ↠ Φ♭ ≫ [v:k, σ] sig and
        TC[T] ⊢ Φ ≫ Mᵢ ↠ Φ♭ ≫ [uᵢ, eᵢ] : [:k, σᵢ], with λML_str[T] ⊢_ct
        Φ♭ ≫ u₁ = u₂ : k, Φ♭ ≫ σ = [uᵢ/v]σᵢ type, and Φ♭ ≫ e₁ = e₂ : σ.

Theorem 3.12 (Decidability)  It is decidable whether a raw type-checking judgement lhs ↠ rhs is derivable using the inference rules in Definition 3.9.

Corollary 3.13  Given any well-formed theory T, the derivability of formation judgements in λML_mod[T] is decidable, and does not depend on the run-time equational axioms in T.
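The failure of unicity has a familiar Standard ML counterpart: one structure matches many signatures, and it is ascription that selects the view. A small illustration (ours):

    structure N = struct type t = int  val x = 3 end   (* plays [int, 3] *)

    (* N matches both of these inequivalent signatures: *)
    signature OPAQUE   = sig type t  val x : t   end   (* [t:T, set(t)] *)
    signature CONCRETE = sig type t  val x : int end   (* [t:T, int]    *)

    structure N1 : OPAQUE   = N
    structure N2 : CONCRETE = N

    (* The "most specific" view keeps t = int and x : int; either
       of the above can be recovered from it, but not conversely. *)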
    (Φ, s:S)  Φ ≫ S ↠ Φ♭ ≫ S♭ sig
              ⟹ Φ, s:S ↠ Φ♭, s:S♭ context    (s ∉ Dom(Φ))

    ([] sig)  Φ, v:k ≫ σ ↠ Φ♭, v:k ≫ σ♭ type
              ⟹ Φ ≫ [v:k, σ] ↠ Φ♭ ≫ [v:k, σ♭] sig

    ([] I)    Φ ≫ u ↠ Φ♭ ≫ u♭ : k    Φ ≫ e ↠ Φ♭ ≫ e♭ : σ
              ⟹ Φ ≫ [u, e] ↠ Φ♭ ≫ [u♭, e♭] : [:k, σ]

    ([] E1)   Φ ≫ M ↠ Φ♭ ≫ [u, e] : [:k, σ]
              ⟹ Φ ≫ Fst(M) ↠ Φ♭ ≫ u : k

    ([] E2)   Φ ≫ M ↠ Φ♭ ≫ [u, e] : [:k, σ]
              ⟹ Φ ≫ Snd(M) ↠ Φ♭ ≫ e : σ

    (VAR)     Φ ↠ Φ♭ context
              ⟹ Φ ≫ s ↠ Φ♭ ≫ [sᶜ, sʳ] : [:k, [sᶜ/v]σ]    (Φ♭(s) = [v:k, σ])

    (1 I)     Φ ↠ Φ♭ context
              ⟹ Φ ≫ * ↠ Φ♭ ≫ [*, *] : [:1, 1]

    (Σ Eᵢ)    Φ ≫ M ↠ Φ♭ ≫ [u, e] : [:k₁ × k₂, σ₁ × σ₂]
              ⟹ Φ ≫ πᵢ M ↠ Φ♭ ≫ [πᵢu, πᵢe] : [:kᵢ, σᵢ]

    (Π I)     Φ, s:S₁ ≫ M ↠ Φ♭, s:[v:k₁, σ₁] ≫ [u, e] : [:k₂, σ₂]
              ⟹ Φ ≫ (λs:S₁.M) ↠ Φ♭ ≫ [(λsᶜ:k₁.u), (Λsᶜ:k₁.λsʳ:[sᶜ/v]σ₁.e)]
                    : [:k₁ → k₂, ∀sᶜ:k₁.[sᶜ/v]σ₁ → σ₂]

    (Π E)     Φ ≫ M₁ ↠ Φ♭ ≫ [u, e] : [:k₁ → k₂, ∀v:k₁.σ₁ → σ₂]
              Φ ≫ M₂ ↠ Φ♭ ≫ [u₂, e₂] : [:k₁, σ₁′]    Φ♭ ≫ σ₁′ = [u₂/v]σ₁ type
              ⟹ Φ ≫ M₁ M₂ ↠ Φ♭ ≫ [u u₂, e[u₂] e₂] : [:k₂, [u₂/v]σ₂]

Table 8: Type checking algorithm (selected rules)
4 Conclusion

Although the relatively straightforward ML-like function calculus λML of [MH88] illustrates some important properties of ML-like languages, it does not provide an adequate basis for the design of a compile-time type checker. Similar problems arise in other programming language models based on dependent types. To address this pragmatic issue, we have developed an alternate form of the λML calculus in which there is a clear compile-time/run-time distinction. Essentially, our technique is to add equational axioms that allow us to decompose structures and functors into separate compile-time and run-time components. While the phase distinction in λML reduces to the syntactic difference between types and their elements, the general technique seems applicable to other forms of phase distinction.

The basis for our development is the "category of modules" over an indexed category, which is an instance of the Grothendieck construction. General properties of the category of modules are explained in the companion paper [Mog89a]. In the specific case of λML, our non-standard equational axioms lead to a calculus which bears a natural relationship to the category of modules. In future work, it would be interesting to explore the exact connection between our calculus and the categorical construction, and to develop phase distinctions for languages whose type expressions may contain "run-time" subexpressions in more complicated ways.

References

[BL84] R. Burstall and B. Lampson. A kernel language for abstract data types and modules. In Proc. Int. Symp. on Semantics of Data Types, Sophia-Antipolis (France), Springer LNCS 173, pages 1-50, 1984.

[BMM89] K. B. Bruce, A. R. Meyer, and J. C. Mitchell. The semantics of second-order lambda calculus. Information and Computation, 1989. (to appear).

[C+86] Constable et al. Implementing Mathematics with the Nuprl Proof Development System. Prentice-Hall, 1986.

[Car88] L. Cardelli. Phase distinctions in type theory. Manuscript, 1988.

[Gir71] J.-Y. Girard. Une extension de l'interprétation de Gödel à l'analyse, et son application à l'élimination des coupures dans l'analyse et la théorie des types. In J. E. Fenstad, editor, 2nd Scandinavian Logic Symposium, pages 63-92. North-Holland, 1971.

[Gir72] J.-Y. Girard. Interprétation fonctionelle et élimination des coupures de l'arithmétique d'ordre supérieur. Thèse d'État, Université Paris VII, 1972.

[HMM86] R. Harper, D. B. MacQueen, and R. Milner. Standard ML. Technical Report ECS-LFCS-86-2, Lab. for Foundations of Computer Science, University of Edinburgh, March 1986.

[HMT87a] R. Harper, R. Milner, and M. Tofte. The semantics of Standard ML. Technical Report ECS-LFCS-87-36, Lab. for Foundations of Computer Science, University of Edinburgh, August 1987.

[HMT87b] R. Harper, R. Milner, and M. Tofte. A type discipline for program modules. In TAPSOFT '87, volume 250 of LNCS. Springer-Verlag, March 1987.

[Mac86] D. B. MacQueen. Using dependent types to express modular structure. In Proc. 13th ACM Symp. on Principles of Programming Languages, pages 277-286, 1986.

[Mar84] P. Martin-Löf. Intuitionistic Type Theory. Bibliopolis, Napoli, 1984.

[MH88] J. C. Mitchell and R. Harper. The essence of ML. In Proc. 15th ACM Symp. on Principles of Programming Languages, pages 28-46, January 1988.

[Mog89a] E. Moggi. A category-theoretic account of program modules. In Summer Conf. on Category Theory and Computer Science, pages 101-117, 1989.

[Mog89b] E. Moggi. Computational lambda calculus and monads. In Fourth IEEE Symp. Logic in Computer Science, pages 14-23, 1989.

[MP88] J. C. Mitchell and G. D. Plotkin. Abstract types have existential types. ACM Trans. on Programming Languages and Systems, 10(3):470-502, 1988. Preliminary version appeared in Proc. 12th ACM Symp. on Principles of Programming Languages, 1985.

[NPS88] B. Nordström, K. Petersson, and J. Smith. Programming in Martin-Löf's type theory. University of Gothenburg / Chalmers Institute of Technology, book draft of midsummer 1988.

[Rey74] J. C. Reynolds. Towards a theory of type structure. In Paris Colloq. on Programming, pages 408-425. Springer-Verlag LNCS 19, 1974.

[Tof87] M. Tofte. Operational Semantics and Polymorphic Type Inference. PhD thesis, University of Edinburgh, 1987.
The mechanical evaluation of expressions
By P. J. Landin
This paper is a contribution to the "theory" of the activity of using computers. It shows how some forms of expression used in current programming languages can be modelled in Church's λ-notation, and then describes a way of "interpreting" such expressions. This suggests a method of analyzing the things computer users write, that applies to many different problem orientations and to different phases of the activity of using a computer. Also a technique is introduced by which the various composite information structures involved can be formally characterized in their essentials, without commitment to specific written or other representations.

Introduction

The point of departure of this paper is the idea of a machine for evaluating schoolroom sums, such as

    1.  (3 + 4)(5 + 6)(7 + 8)

    2.  if 2¹⁹ < 3¹² then 12√2 else 5√2

    3.  (17 cos π/17 - √(1 + 17 sin π/17)) / (17 cos π/17 + √(1 + 17 sin π/17))

Any experienced computer user knows that his activity scarcely resembles giving a machine a numerical expression and waiting for the answer. He is involved with flow diagrams, with replacement and sequencing, with programs, data and jobs, and with input and output. There are good reasons why current information-processing systems are ill-adapted to doing sums. Nevertheless, the questions arise: Is there any way of extending the notion of "sums" so as to serve some of the needs of computer users without all the elaborations of using computers? Are there features of "sums" that correspond to such characteristically computerish concepts as flow diagrams, jobs, output, etc.?

This paper is an introduction to a current attempt to provide affirmative answers to these questions. It leaves many gaps, gets rather cursory towards the end and, even so, does not take the development very far. It is hoped that further piecemeal reports, putting right these defects, will appear elsewhere.

Expressions

Applicative structure

Many symbolic expressions can be characterized by their "operator/operand" structure. For instance

    a/(2b + 3)

can be characterized as the expression whose operator is '/' and whose two operands are respectively 'a,' and the expression whose operator is '+' and whose two operands are respectively the expression whose operator is '×' and whose two operands are respectively '2' and 'b,' and '3.' Operator/operand structure, or "applicative" structure, as it will be called here, can be exhibited more clearly by using a notation in which each operator is written explicitly and prefixed to its operand(s), and each operand (or operand-list) is enclosed in brackets, e.g.

    /(a, +(×(2, b), 3)).

This notation is a sort of standard notation in which all the expressions in this paper could (with some loss of legibility) be rendered.

The following remarks about applicative structure will be illustrated by examples in which an expression is written in two ways: on the left in some notation whose applicative structure is being discussed, and on the right in a form that displays the applicative structure more explicitly, e.g.

    a/(2b + 3)              /(a, +(×(2, b), 3))

    (a + 3)(b - 4)          /(×(+(a, 3), -(b, 4)),
    (c - 5)(d - 6)            ×(-(c, 5), -(d, 6)))

In both these examples the right-hand version is in the "standard" notation. In most of the illustrations that follow, the right-hand version will not adhere rigorously to the standard notation. The particular point illustrated by each example will be more clearly emphasized if irrelevant features of the left-hand version are carried over in non-standard form. Thus the applicative structure of subscripts is illustrated by

    aᵢbⱼₖ                   a(i) b(j, k).

Some familiar expressions have features that offer several alternative applicative structures, with no obvious criterion by which to choose between them. For example

    3 + 4 + 5 + 6           +(+(+(3, 4), 5), 6)
                            +(3, +(4, +(5, 6)))
                            Σ′(3, 4, 5, 6)

where Σ′ is taken to be a function that operates on a list of numbers and produces their sum. Again

    a²                      ↑(a, 2)
                            square(a)

where ↑ is taken to be exponentiation.
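The alternative applicative structures imputed to 3 + 4 + 5 + 6 have direct renderings in a modern functional language. A small Standard ML illustration (ours, not Landin's):

    (* Left-nested and right-nested operator/operand structure: *)
    val a = ((3 + 4) + 5) + 6            (* +(+(+(3,4),5),6) *)
    val b = 3 + (4 + (5 + 6))            (* +(3,+(4,+(5,6))) *)

    (* Sigma' as a function from a list of numbers to their sum: *)
    val sum' = foldl (op +) 0            (* plays the role of Sigma' *)
    val c = sum' [3, 4, 5, 6]            (* Sigma'(3,4,5,6) *)

    (* Exponentiation as a function makes "a squared" a combination: *)
    fun up (x, n) = if n = 0 then 1 else x * up (x, n - 1)
    fun square x = up (x, 2)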
Sometimes the choice may be more material to the meaning. For instance, without background information it is impossible to decide whether or not

    f(y + 1) + n(y - 1)

contains a sub-expression whose operator is multiplication. We are not concerned here with offering specific rules for answering such questions. What interests us is that in many cases such a rule can be considered as a rule about applicative structure.

Using Church's λ-notation [1] we can impute applicative structure to some familiar notations that use "private" (or "internal," or "local," or "dummy," or "bound") variables, such as the second and third occurrences of 'x' in the following:

    ∫₀ᵃ x² dx               ∫(0, a, λx.x²)

Similarly

    Σᵢ₌₀ⁿ aᵢⱼbⱼₖ            Σ″(0, n, λi.a(i, j) b(j, k))

where Σ″ is a triadic function that is analogous to ∫.

Auxiliary definitions

The use of auxiliary definitions to qualify an expression can also be rendered in terms of λ. E.g.

    (u - 1)(u + 2)          {λu.(u - 1)(u + 2)}[7 - 3]
    where u = 7 - 3.

Notice that λu.(u - 1)(u + 2) is a function and hence it is appropriate to write this expression in a context that is more familiarly occupied by an identifier, such as 'sin' or 'f,' designating a function. Notice also that an expression that denotes a function does not necessarily occur in such a context; witness some previous examples and also

    ∫(0, π/2, sin).

We shall consistently distinguish between "operators" and "functions" as follows. An operator is a sub-expression of a (larger) expression appearing in a context that, when written in standard form, would have an operand (or operand-list) to the right of it. A function bears the same relation to an operator as a number, e.g. the fourth non-negative integer, does to a numerical expression, e.g. (16 - 7)/(5 - 2). The "value" of this expression is a number; similarly we shall speak of the "value" of an expression that can occur as an operator. Just as the value of an expression that occurs as an operand combined with '√' must, to make sense, be a number, so the value of an expression that occurs as an operator must be a function. However, any expression that can occur sensibly as an operator can also occur sensibly as an operand.

Applicative structure can be indicated unambiguously by brackets. Legibility is improved by using a variety of bracket shapes. In particular we shall tend to use braces for enclosing (long) operators and square brackets for enclosing operands (and operand-lists). However, except that we observe correct mating, no formal significance will be attached to differences of bracket shape. That is to say, the rules for making sense of a written expression do not rely on them; no information would be lost by disregarding the differences in a correctly mated expression.

There is another informal device by which we shall bring out the internal grouping of long expressions, namely indentation. For instance, the connection between the items of an operand-list, or the two components of an operator/operand combination, will frequently be emphasized by indenting them equally.

The use of several auxiliary definitions, rather than just one, can be rendered in terms of λ. For example, if the definitions are mutually independent, they can be considered as a "simultaneous," or "parallel," definition of several identifiers, e.g.

    u = 2p + q              (u, v) = (2p + q, p - 2q).
    and v = p - 2q

So if Church's notation is extended to permit a list of identifiers between the 'λ' and the '.', a group of mutually independent auxiliary definitions raises no new issue, e.g.

    u(u + 1) - v(v + 1)     {λ(u, v).u(u + 1) - v(v + 1)}
    where u = 2p + q        [2p + q, p - 2q]
    and v = p - 2q.

If the definitions are inter-dependent the correspondence is more elaborate. Some examples of this situation will be given below.
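The "where"-to-λ correspondence just described survives essentially unchanged in Standard ML, where let plays the role of "where" and the operator/operand combination can be written out directly. An illustrative snippet (ours; p and q are given arbitrary bindings):

    val p = 5 and q = 7                  (* illustrative bindings *)

    (* "u(u+1) - v(v+1) where u = 2p+q and v = p-2q", via let: *)
    val viaLet =
      let val u = 2 * p + q
          and v = p - 2 * q              (* 'and' makes it simultaneous *)
      in u * (u + 1) - v * (v + 1) end

    (* ... and as the equivalent combination
       {\(u,v). u(u+1) - v(v+1)} [2p+q, p-2q]: *)
    val viaLambda =
      (fn (u, v) => u * (u + 1) - v * (v + 1)) (2 * p + q, p - 2 * q)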
When we say that the applicative structure of a specific piece of algebraic notation is such-and-such, we are providing unique answers to certain questions about it, such as "What is its operator?" "What are its operands?" Our discussion of specific algebraic notations will now be interrupted by a discussion of what precisely these questions are. That is to say, the next Section is devoted to explaining what is meant by "applicative structure" rather than to exhibiting the applicative structure of specific notations.

This attempt to characterize applicative structure will use a particular technique, called here "structure definitions," and used later in the paper to characterize other sorts of structure. The next Section but one explains this technique. After these two Sections, the discussion of the applicative structure of specific notations will be resumed.

Applicative expressions

The expressions in this paper are constructed out of certain basic components which are, for our purposes, "atomic"; i.e. their internal structure (if any) does not concern us. They comprise single- and multi-character constants and variables, including decimal numbers. All these will be called identifiers. There will be no need for a more precise characterization of identifiers.
By a λ-expression we mean, provisionally, an expression characterized by two parts: its bound variable part, written between the 'λ' and the '.'; and its λ-body, written after the '.'. (A more precise characterization appears below.)

Some of the right-hand versions appearing above contain a λ-expression. Some of those below contain several λ-expressions, sometimes one inside another. This paper shows that many expressions can be considered as constructed out of identifiers in three ways: by forming λ-expressions, by forming operator/operand combinations, and by forming lists of expressions. Of these three ways of constructing composite expressions, the first two are called "functional abstraction" and "functional application," respectively. We shall show below that the third way can be considered as a special case of functional application and so, in so far as our discussion refers to functional application, it implicitly refers also to this special case.

We are, therefore, interested in a class of expressions about any one of which it is appropriate to ask the following questions:

Q1. Is it an identifier? If so, what identifier?

Q2. Is it a λ-expression? If so, what identifier or identifiers constitute its bound variable part, and in what arrangement? Also, what is the expression constituting its λ-body?

Q3. Is it an operator/operand combination? If so, what is the expression constituting its operator? Also, what is the expression constituting its operand?

We call these expressions applicative expressions (AEs). Later the notion of the "value of" (or "meaning of," or "unique thing denoted by") an AE will be given a formal definition that is consistent with our correspondence between AEs and less formal notations. We shall find that, roughly speaking, an AE denotes something as long as we know the value of each identifier that occurs free in it, and provided also that the expression does not associate any argument with a function that is not applicable to it. In particular, for a combination to denote something, its operator must denote a function that is applicable to the value of its operand. On the other hand, any λ-expression denotes a function; roughly speaking, its domain (which might have few members, or even none) contains anything that makes sense when substituted for occurrences of its bound variable throughout its body.

Given a mathematical notation it is a trivial matter to find a correspondence between it and AEs. It is less trivial to discover one in which the intuitive meaning of the notation corresponds to the value of AEs in the sense just given. A correspondence that meets this condition might be called a "semantically acceptable" correspondence. For instance, someone might conceivably denote the sum

    v_r + v_(r+1) + ... + v_(r+s-1)

of a segment of a vector by

    Σ_s v_r.

The most direct rendering of this as an AE is something like

    sum(v(r), s).

However, this is not a semantically acceptable correspondence, since it wrongly implies dependence on only one element of v, namely v_r. The same criterion prevents λ from being considered as an operator, in our sense of that word; more precisely, it rules that

    λ(x, x² + 1)

incorrectly exhibits the applicative structure of 'λx.x² + 1.'

We are interested in finding semantically acceptable correspondences that enable a large piece of mathematical symbolism (with supporting narrative) to be rendered by a single AE.

Structure definitions

AEs are a particular sort of composite information structure. Lists are another sort of composite information structure. Several others will be used below, and they will be explained in a fairly uniform way, each sort being characterized by a "structure definition." A structure definition specifies a class of composite information structures, or constructed objects (COs) as they will be called in future. It does this by indicating how many components each member of the class has and what sort of object is appropriate in each position; or, if there are several alternative formats, it gives this information in the case of each alternative. A structure definition also specifies the identifiers that will be used to designate various operations on members of the class, namely some or all of the following:

(a) predicates, for testing which of the various alternative formats (if there are alternatives) is possessed by a given CO;

(b) selectors, for selecting the various components of a given CO once its format is known;

(c) constructors, for constructing a CO of given format from given components.

The questions Q1 to Q3 above comprise the main part of the structure definition for AEs. What they do not convey is the particular identifiers to be used to designate the predicates, selectors and constructors. Future structure definitions in this paper will be laid out in roughly the following way:

An AE is either
    an identifier,
or a λ-expression (λexp)
        and has a bound variable (bv)
            which is an identifier or identifier-list,
        and a λ-body (body)
            which is an AE,
or a combination
        and has an operator (rator)
            which is an AE,
        and an operand (rand)
            which is an AE.
Mechanical evaluation
or a combination and has
    an operator (rator) which is an AE,
    and an operand (rand) which is an AE.

This is intended to indicate that 'identifier,' 'λ-expression' and 'combination' (and also the abbreviations written after them, if any) designate the predicates, and 'bv,' 'body,' 'rator,' 'rand' (mentioning here the abbreviated forms) designate the selectors. We consider a predicate to be a function whose result for any suitable argument is a "truth-value," i.e. either true or false. For instance, if X is a λ-expression, then the predicate λexp applied to X yields true, whereas identifier yields false; i.e. the following equations hold:

    λexp X = true
    identifier X = false.

(It will be observed that, by considering predicates as functions, we are led into a slight conflict with the normal use of the word "apply." For instance, in normal use it might be said that the predicate even "applies to" the number six, and "does not apply to" the number seven. We must here avoid this turn of phrase and say instead that even "holds for," or "yields true when applied to," six; and "does not hold for," or "yields false when applied to," seven.)

The constructors will not usually be named explicitly. Instead we shall use obviously suggestive identifiers such as 'constructλexp.' E.g. the following equations hold:

    identifier (constructλexp (J, X)) = false
    λexp (constructλexp (J, X)) = true
    bv (constructλexp (J, X)) = J
    constructλexp (bv X, body X) = X

and many others. (More precisely, each of these equations holds provided J and X are such as to make both sides meaningful. Thus the first three hold provided J is an identifier or list of identifiers and X is an AE. Again, the last holds provided X is a λ-expression.)

A structure definition can also be written more formally, as a definition with a left-hand side and a right-hand side. The left-hand side consists of all the identifiers to which the structure definition gives meaning. The right-hand side is an AE containing references to the component-classes involved (e.g. some class of character-strings, in the case of AEs that are identifiers) and also to one or more of a small number of functions concerned with classes of COs. However, in this paper we shall not formalize the notion of structure definitions, and shall write any we need in the style illustrated above.
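This structure definition transcribes naturally into a present-day typed language. The following OCaml sketch (our rendering, not the paper's notation; all names are ours) uses one variant per format, with pattern matching supplying the predicates and the functions below supplying the selectors:

    (* The structure definition of AEs as an OCaml datatype: one
       alternative per format, one component per question Q1-Q3. *)
    type ae =
      | Identifier of string               (* Q1: which identifier *)
      | Lambda of string list * ae         (* Q2: bound variable part and λ-body *)
      | Combination of ae * ae             (* Q3: operator and operand *)

    (* Selectors, named after the paper's bv, body, rator and rand.
       Each is partial, as in the paper: meaningful only for the
       appropriate format. *)
    let bv    = function Lambda (j, _) -> j      | _ -> failwith "bv"
    let body  = function Lambda (_, x) -> x      | _ -> failwith "body"
    let rator = function Combination (f, _) -> f | _ -> failwith "rator"
    let rand  = function Combination (_, x) -> x | _ -> failwith "rand"

The constructors Identifier, Lambda and Combination play the role of constructλexp and its companions; the equations bv (Lambda (j, x)) = j and body (Lambda (j, x)) = x hold exactly as required.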
Function definitions

In ordinary use, definitions frequently give a functional, rather than numerical, meaning to the definiendum by using a dummy argument variable. This can be rendered as an explicit definition with a λ-expression for its right-hand side, e.g.

    f(y) = y(y + 1)                f = λy.y(y + 1)

So an expression using an auxiliary function definition can be rendered by using two λ-expressions, one for its operator and one for its operand, e.g.

    f(3) + f(4)                    {λf.f(3) + f(4)}[λy.y(y + 1)]
    where f(y) = y(y + 1)

A group of auxiliary definitions may include both numerical and functional definitions, e.g.

    f(a + b, a − b) +              {λ(a, b, f).f(a + b, a − b) +
      f(a − b, a + b)                          f(a − b, a + b)}
    where a = 33                   [33, 44, λ(u, v).uv(u + v)]
    and b = 44
    and f(u, v) = uv(u + v)

When a λ-expression is written as a sub-expression of a larger expression, the question may arise: how far to the right does its body extend? This question can always be evaded by using enough brackets, e.g.

    (λ(u, v).(uv(u + v))).

However, to economize in brackets, we adopt the convention that it extends as far as is compatible with the brackets, except that it is stopped by a comma. Another way of saying this is that the "binding power" of the '.' is less than that of functional application, multiplication and all the written operators such as '+,' '/,' etc., but exceeds that of the comma. For example:

    f(g(a)) + g(f(b))              {λ(f, g).f(g(a)) + g(f(b))}
    where f(z) = z² + 1            [λz.z² + 1, λz.z² − 1]
    and g(z) = z² − 1

An identifier may occur in the bound variable part of a λ-expression (either constituting the entire bound variable, or as one item of it). Apart from this, every written occurrence of an AE is in one of the following four sorts of context:

(a) It is the λ-body of some λ-expression.
(b) It is the operator of some combination.
(c) It is the operand of some combination.
(d) It is a "complete" AE occurring in a context of English narrative, or other non-AE.

Each of the three formats of AE can appropriately appear in any of the four sorts of contexts. We have already seen that λ-expressions, like identifiers, can appropriately occur both as operators and as operands. Below we shall find combinations appearing as operators, and λ-expressions appearing as λ-bodies. These last two possibilities are both associated with the possibility that a function might produce a function as its result. Together with more obviously acceptable possibilities, they almost complete the full range of ways in which a particular sort of AE can appear in a particular sort of context. (The one remaining case is that of an identifier occurring as a λ-body, which occurs in a later example.)

The right-hand side of an auxiliary definition might itself be qualified by an auxiliary definition, e.g.

    u/(u + 5)                      {λu.u/(u + 5)}
    where u = a(a + 1)             [{λa.a(a + 1)}[7 − 3]]
      where a = 7 − 3

In particular this might happen with an auxiliary definition of a function, e.g.

    f(3)                           {λf.f(3)}
    where f(x) = ax(a + x)         [{λa.λx.ax(a + x)}
      where a = 7 − 3                             [7 − 3]].

This last example contains a λ-expression whose body is another λ-expression. Notice that such a λ-expression describes a "function-producing" function and hence can meaningfully give rise to a combination whose operator is a combination, e.g.

    {{λa.λx.ax(a + x)}[7 − 3]}[3].

We shall slightly abbreviate such expressions by omitting brackets round an operator that is a combination, i.e.

    {λa.λx.ax(a + x)}[7 − 3][3].

This amounts to an "association to the left" rule. We also abbreviate by omitting brackets round a single identifier,

    {λa.λx.ax(a + x)}[7 − 3]3.

Similarly we may write 'fa + f3 + Dfb' for 'f(a) + f(3) + {D(f)}[b],' and rely on context to distinguish between 'f applied to a' and 'f times a' (as indeed we also do when writing 'f(a + 1)').

Since we shall use multicharacter identifiers (excluding spaces), this abbreviation means that the reader will sometimes be obliged to use his intelligence, together with the context, to decide whether, e.g.

    prefix x nullist

is to be read as

    {prefix[x]}[nullist]

or

    {prefi[x][x]}[nul[list]]

or many other conceivable alternatives. Generally we shall use spaces wherever they are helpful without being ungainly in appearance.
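The correspondence between auxiliary function definitions and operator/operand pairs can be checked mechanically in any language with first-class functions. A small OCaml illustration of the pattern (ours, under the assumption that OCaml application models functional application):

    (* f(3) + f(4) where f(y) = y(y + 1), rendered as
       {λf. f(3) + f(4)}[λy. y(y + 1)]: the auxiliary definition
       becomes the operand of a λ-expression. *)
    let example =
      (fun f -> f 3 + f 4)       (* the operator: λf. f(3) + f(4) *)
        (fun y -> y * (y + 1))   (* the operand:  λy. y(y + 1)    *)
    (* example = 12 + 20 = 32 *)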
We now turn to three forms of expression that play an important role in programming languages, namely lists (in particular argument-lists), conditional expressions and recursive definitions. The next three Sections are devoted to showing how these can be rendered as operator/operand combinations using certain basic functions.

Lists

In an earlier Section we gave a structure definition for AEs that made no explicit provision for lists of operands. Our illustrations have begged this issue by using dyadic and triadic functions. It will turn out below that discussion of the evaluation of AEs can be simplified if we can avoid classifying operands into "single operands" and "operand-lists," and avoid classifying functions into those that take one argument and those that take several. We now show how this is done.

Lists can be characterized by a structure definition as follows:

A list is either null,
or else has
    a head (h),
    and a tail (t) which is a list.

A null-list has length zero. A non-null-list has length one or more; if its items are a₁, a₂, ..., a_k (k ≥ 1), then its head is a₁ and its tail is the (null or non-null) list whose k − 1 items are a₂, ..., a_(k−1) and a_k. So we let

    1st = h
    2nd L = h(tL)
    3rd L = h(t(tL)), etc.

defining the functions 1st, 2nd, etc., in terms of the selectors h and t. So the "items" of a list are the things that result from applying 1st, 2nd, etc., to it.

On the lines mentioned earlier, the two identifiers constructnullist and constructlist designate constructors for lists, taking respectively zero and two arguments. So the following equations hold:

    null(constructnullist()) = true
    null(constructlist(x, L)) = false
    h(constructlist(x, L)) = x
    constructlist(hL, tL) = L

and several more.

We shall not distinguish between lists in this sense and the argument lists of dyadic and triadic functions. That is to say, we consider a triadic function to be a function whose arguments are limited to lists of length three. So an operator denoting a triadic function is not necessarily prefixed to an operand-list of three items; e.g. if L is a list of two numbers, the following expression is acceptable:

    {λ(x, y, z).x + y + z}[constructlist(3, L)].

We use nullist to designate a list of length zero, and consider an empty bracket pair as merely an alternative way of writing nullist. Also we consider commas as merely an alternative way of writing a particular sort of combination, which we now explain.

Associated with any object x there is a function that transforms any given list into a list of length one more, by adding x at the beginning of it. We denote this function by

    prefix(x).

So if L is a list whose k items are a₁, a₂, ..., a_k, then

    prefix(x)L

denotes a list whose k + 1 items are x, a₁, ..., a_k. The function prefix is function-producing and so gives rise to combinations whose operators are combinations. It can be defined in terms of constructlist as follows:

    prefix(x) = λL.constructlist(x, L).

By a natural extension of the notation for function definitions this can also be written

    prefix(x)(L) = constructlist(x, L).

The following examples illustrate the applicative structure we are now imposing on operand-lists of length two or more, and of length zero:

    f(a, b, c)            f(prefix a(prefix b(prefix c nullist)))
    a + b                 +(prefix a(prefix b(nullist)))
    constructnullist()    constructnullist(nullist).

Notice that while it is meaningful to ask whether a function is dyadic (i.e. has arguments restricted to lists of length two), there is no significance to asking whether a function is monadic, since any function may be denoted in combination with a single operand rather than a list of operand expressions.

For the rare cases in which we wish to refer to a list with just one item, we use the function defined as follows:

    unitlist(x) = prefix x nullist.

We shall use the following abbreviation for 'prefix x L':

    x : L.

So, e.g.

    x, y, z = x:(y, z) = x:(y : unitlist z) = x:(y : (z : ())).

We shall treat ':' as more "binding" than ',', e.g.

    2nd(2nd(L, x:M, N)) = 1st M.

The last example refers to a list whose items include a list. We admit this possibility and write, e.g.

    (a, b), (c, (), e), unitlist f.

In what follows, a list whose items include lists (i.e. a list which has items that are amenable to null, 1st, 2nd, etc.) will be called a "list-structure."
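The paper's list apparatus maps directly onto a modern functional language. A minimal OCaml sketch (our names first, second, third stand in for 1st, 2nd, 3rd, which are not legal OCaml identifiers, and OCaml's homogeneous built-in lists stand in for the untyped list-structures):

    let h = List.hd                            (* the head selector *)
    let t = List.tl                            (* the tail selector *)
    let null l = (l = [])                      (* predicate for the null format *)
    let constructlist x l = x :: l             (* the two-argument constructor *)
    let prefix x = fun l -> constructlist x l  (* prefix(x) = λL.constructlist(x, L) *)
    let unitlist x = prefix x []               (* unitlist(x) = prefix x nullist *)
    let first l = h l                          (* 1st = h *)
    let second l = h (t l)                     (* 2nd L = h(tL) *)
    let third l = h (t (t l))                  (* 3rd L = h(t(tL)) *)

    (* x, y, z as x:(y:(z:())): *)
    let xyz x y z = prefix x (prefix y (prefix z []))

Note that prefix x is itself a function, so prefix a (prefix b []) is a combination whose operator is a combination, just as in the text.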
Conditional expressions

We now show how AEs provide a match for conditional expressions, e.g. for

    if a < b then a² else b².    (A)

This expression somewhat resembles

    ith(a², b²)

where i is a computed index number, used to select an item from a list which is not referred to elsewhere. So, we consider 'if' to be an identifier designating a function-producing function such that

    if(true) = 1st
    if(false) = 2nd.

Then (A) is equivalent to the following AE:

    if(a < b)(a², b²).    (A1)

This rendering is not, however, adequate. For it would match

    if a = 0 then 1 else 1/a    (B)

by

    if(a = 0)(1, 1/a).    (B1)

But the value of this expression, i.e. to be more explicit, of

    if(a = 0)(prefix 1(prefix(1/a)()))    (B1')

depends on the value of the sub-expression '1/a,' and hence only exists if 1/a exists. So (B1) is not an acceptable rendering of (B) if a is zero and division by zero is undefined. More generally, this method of rendering conditional expressions as AEs does not meet our criterion of semantic acceptability unless the domain of every function is artificially extended to contain any argument that might conceivably arise on either "branch" of a conditional expression. We now present another method that avoids any such commitment.

Consider instead the following alternative:

    if(a = 0)(λx.1, λx.1/a)(3)    (B2)

where 'x' is an arbitrarily chosen variable and '3' is an arbitrarily chosen operand. Unlike (B1), (B2) has a value even if a = 0; for, λx.1/a denotes a function even if a = 0 (albeit with null domain—this is in accordance with our view of the "value" of an expression, as introduced informally in a previous Section and formalized in a subsequent one). So (B2) is precisely equivalent to (B) in the sense that either they are equivalent or they are both without value.

The arbitrary 'x' and '3' in (B2) can be obviated.* For the bv of a λ-expression can be a list of identifiers, and in particular a list whose length is zero. Such a λ-expression is applicable to an argument list of the same length. This suggests that all conditional expressions can be rendered in a uniform way as follows:

    if a < b then a² else b²      if(a < b)(λ().a², λ().b²)()
    if a = 0 then 1 else 1/a      if(a = 0)(λ().1, λ().1/a)().

* The device given here was suggested by W. H. Burge.
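The λ() device can be imitated with argumentless functions (thunks). In OCaml (a sketch with our own names; unit plays the role of the empty argument list):

    (* if(true) = 1st and if(false) = 2nd, as in the text: *)
    let if_ p = if p then fst else snd

    (* if a = 0 then 1 else 1/a, with each branch wrapped in λ(): *)
    let safe_recip a =
      if_ (a = 0) ((fun () -> 1), (fun () -> 1 / a)) ()

    (* safe_recip 0 = 1: the λ().1/a for the untaken branch is
       constructed but never applied, so no division by zero is
       ever attempted. *)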
Recursive definitions

The use of self-referential, or "circular," or what have come to be called in the computer world "recursive" definitions can also be rendered in operator/operand terms. By a circular definition we mean an implicit definition having the form

    x = ... x ...

i.e. a definition of x in which x is referred to one or more times in the definiens. For example, suppose 'M' designates a list-structure; then

    (a, M, (b, c))

denotes a list-structure whose second item is the list-structure M. The equation

    L = (a, L, (b, c))

is satisfied by the "infinite" list-structure containing three items, of which the first is a, the third is (b, c) and the second is the infinite list-structure whose first item is a, and whose third item is (b, c) and whose second, ... and so on. So the above equation may be considered as a circular definition that associates this "infinite" list-structure with the identifier 'L.'

Again

    f(n) = if n = 0 then 1 else nf(n − 1)

i.e.

    f = λn.if n = 0 then 1 else nf(n − 1)

may be considered as a circular definition of the factorial function. (In this brief discussion the important question of whether each circular definition characterizes a unique object will be skipped.)

Making use of λ, any circular definition can be rearranged so that there is just one self-referential occurrence, and moreover so that the single occurrence constitutes the operand of the definiens, e.g.

    L = (a, L, (b, c))          L = {λL′.(a, L′, (b, c))}L
    f(n) = if n = 0 then 1      f = {λf′.λn.if n = 0 then 1
           else nf(n − 1)                    else nf′(n − 1)}f.

Notice that, had we used 'L' and 'f' instead of 'L′' and 'f′' they would still have been bound and so would not have constituted self-referential occurrences.

A circular definition of the form

    x = Fx

(such as the last two above) characterizes an object as being invariant when transformed by the function F, i.e. as the "fixed-point" of F. If we use 'Y' to designate the function of finding the fixed-point of a given function, such a circular definition can be rearranged so that it is formally no longer circular:

    x = YF.

Thus the above examples become

    L = (a, L, (b, c))          L = Y λL.(a, L, (b, c))
    f(n) = if n = 0 then 1      f = Y λf.λn.if n = 0 then 1
           else nf(n − 1)                    else nf(n − 1).

Notice that, according to the above treatment of conditional expressions, the existence of f(0) does not involve the existence of f(−1). Notice also that Y may produce a function, and hence gives rise to combinations whose operators are combinations, e.g.

    {Y λf.λn.if n = 0 then 1 else nf(n − 1)}6

is a meaningful combination. In fact its value is 720.

This device can also be used for a group of "jointly circular" or "simultaneously recursive" definitions, e.g.

    fx = F[f, g, x]             (f, g) = Y λ(f, g).(λx.F[f, g, x],
    and gx = G[f, g, x]                             λx.G[f, g, x]).

So the fixed-point of a function might be a list of functions. This gives rise to the possibility that a dyadic function might appear with what looks like one, rather than two, arguments, e.g. when the above jointly circular functions appear in an auxiliary definition:

    f(ga) + g(fb)               {λ(f, g).f(ga) + g(fb)}
    where fx = F(f, g, x)       [Y λ(f, g).(λx.F(f, g, x),
    and gx = G(f, g, x)                     λx.G(f, g, x))].

Notice that the circularity is explicitly indicated in the right-hand version, whereas the left-hand version is only recognizable as circular by virtue of our comments about it or by common sense. In the next Section we shall extend our hitherto informal use of where so as to provide a match for any use of λ.
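Y itself can be written in a present-day eager language, provided the recursion is guarded by an extra argument. An OCaml sketch (ours; a literal let y f = f (y f) would recurse forever under OCaml's evaluation order):

    (* The fixed-point finder, eta-expanded so that f (fix f) is
       only demanded when an argument arrives: *)
    let rec fix f = fun n -> f (fix f) n

    (* f = Y λf.λn. if n = 0 then 1 else n·f(n−1): *)
    let fact = fix (fun f n -> if n = 0 then 1 else n * f (n - 1))
    (* fact 6 = 720, the value computed above for
       {Y λf.λn.if n = 0 then 1 else nf(n−1)}6. *)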
The difference between structure and written representation

Our notation for AEs is deliberately loose. There are many ways in which we can write the same AE, differing in layout, use of brackets and use of infixed as opposed to prefixed operators. However, they are all written representations of the same AE, in the sense that the information elicited by the questions Q1, Q2 and Q3 above is the same in each case. This is the essential information that characterizes the AE. We call this information the "structure" of the AE. Our laxity with written representations is based on the knowledge that any expression we write could, at the cost of legibility, have been written in standard form, with exclusively prefixed operators and every bracket in place.

One of the syntactic liberties that we shall take is to use where instead of λ. More precisely, we shall use an expression of the form

    L where X = M

as a "syntactic variant" of

    {λX.L}[M]

even in cases that go rather further than the familiar use of where, e.g.

    n² + 3n + 2                 {λn.n² + 3n + 2}[n + 1]
    where n = n + 1

    xy(x + y)                   {λy.{λx.xy(x + y)}
    where x = a² + a√y                       [a² + a√y]}
    where y = a² + b²           [a² + b²].

We use indentation to indicate that the where qualifies a sub-expression, e.g. in each of the following examples 'y' occurs both bound and free:

    xy(x + y)                   {λ(x, y).xy(x + y)}
    where x = a² + a√y          [a² + a√y, a² + b²]
    and y = a² + b²

    xy(x + y)                   {λx.xy(x + y)}
    where x = a² + a√y          [{λy.a² + a√y}[a² + b²]].
      where y = a² + b²

The where notation can be extended to allow for circular definitions and jointly circular definitions, thus formalizing a feature of auxiliary definitions that has previously required verbal comment. An occurrence of 'Y' is indicated by the word recursive or, more shortly, rec.

    f(−3)                       {λf.f(−3)}
    where rec f(n) =            [Y λf.λn.if n = 0 then 1
      if n = 0 then 1                     else nf(n − 1)].
      else nf(n − 1)

    f(ga) + g(fb)               {λ(f, g).f(ga) + g(fb)}
    where rec fx = F(f, g, x)   [Y λ(f, g).(λx.F(f, g, x),
    and gx = G(f, g, x)                     λx.G(f, g, x))].

It will be observed that our discussion of applicative structure has doubled back on itself. We started by remarking the possibility of analyzing certain more or less familiar notations in terms of functional application and functional abstraction. We are now remarking the possibility of looking upon these notations as "really" AEs, written with syntactic variations that make them more palatable. Clearly, once a semantically acceptable correspondence between AEs and some other notation has been established, it can be looked at in either way.

The above explanation of where and AEs leaves some details unsettled, but should be enough to make the use of where in what follows (a) comprehensible and (b) plausibly a mere "syntactic sugaring" of AEs. Further discussion of where, or of other sorts of syntactic sugar, is outside the scope of this paper.

Another example of alternative notations concerns conditional expressions. Interchangeably with the if ... then ... notation we use the → notation as illustrated by the following two examples:

    if p then a                 p → a
    else b                      else → b

    if p then a                 p → a
    else if q then b            else → (q → b
         else c                         else → c).

Any particular set of rules about representing AEs by written text (the correspondence with where is one such set of rules) has two aspects:

(a) a rule for deriving the structure of an AE, given a text that represents it,
(b) a rule for deriving a text that represents an AE, given its structure.

The formalization of these rules, and in particular their formalization as AEs, is another topic that is outside the scope of this paper.
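OCaml's let is a where that precedes its main clause, so these equivalences can be replayed directly (our transcription, not the paper's notation):

    (* L where X = M  as  {λX.L}[M]: *)
    let v1 = (fun x -> x * (x + 1)) 4     (* {λx.x(x+1)}[4]     *)
    let v2 = let x = 4 in x * (x + 1)     (* x(x+1) where x = 4 *)
    (* v1 = v2 = 20 *)

    (* A variable occurring both bound and free, in the spirit of
       the nested where-examples above: here the parameter y is
       free in the definiens of x, while the inner binding of y
       shadows it in the main clause. *)
    let nested y =
      let x = y * y in
      let y = x + 1 in
      x * y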
The power of applicative expressions

We have described how certain expressions can be considered as being constructed from "known" identifiers, or "constants," by means of functional application and functional abstraction. We might look at the situation the other way round and consider how many expressions can be constructed, starting with a given selection of constants, using these same means of construction. More precisely, we might compare working within such constraints to working within some other set of constraints, e.g. some algebraic programming language or machine code, or system of formal logic. It transpires that the seven objects null, h, t, nullist, prefix, if and Y provide a basis for describing a wide range of things.

Roughly speaking, when taken together with functional application and functional abstraction, they perform the "house-keeping" or "red-tape" roles that are performed by sequencing, indices and copying in a conventional programming language, and by narrative in informal mathematics. For example:

(1) With a few basic numbers and numerical functions they are sufficient to describe the numbers and functions of recursive number theory. So they are in some sense "as powerful as" other current symbolisms of mathematical logic and computer programming. The question whether this sense has much practical significance is one that will not be discussed here.

(2) With a few basic symbols, and functions associated with classes of symbol-strings, they are sufficient to describe syntax (of, say, ALGOL 60, or of AEs themselves), from the point of view both of synthesizing and of analyzing.

(3) With a few basic classes, and functions associated with classes of composite information structures, they are sufficient to formalize "structure definitions," as introduced above (for example the structure definition of AEs themselves).

(4) With a few structure definitions they are sufficient to characterize formally the "value" of an AE, and to describe a mechanical process for "producing" it. This is the use to which AEs will be put in the rest of this paper.

A discussion of the relative convenience of various notations in the fields mentioned here is outside the scope of this paper.

Evaluation

The value of an applicative expression

Every AE in the above examples, including every sub-expression of every AE, has a "value," which is either a number, or a function, or a list of numbers, or a list of functions, etc. More precisely, an AE X has a value (or rather might have a value) relative to some background information that provides a value for each identifier that is free in X. This background information will be called the environment relative to which evaluation is conducted. It will be considered as a function that associates with each of certain identifiers either a number, or a list, or a function, etc. Each identifier to which an environment E gives a value is called a constant of E, and each object "named," or "designated," by a constant of E (possibly by several) is called a primitive of E. So E is a function whose domain and range comprise respectively its constants and its primitives.

If we let

    val(E)(X)

denote the value of X relative to E (or in E for short), the function that val designates can be specified by means of three rules, R1, R2 and R3. These correspond to the three questions, Q1, Q2 and Q3, that were introduced earlier to elucidate the structure of AEs.

R1. If X is an identifier, valEX is EX;
(R2 appears below);
R3. If X is a combination, valEX can be found by first subjecting both its operator and operand to valE, and then applying the result of the former to the result of the latter.

The rules R1 and R3 are enough to specify valEX provided that X contains no λ-expressions. For example, consider an environment in which the identifier k is associated with the number 7 and the identifier p with the truth-value false, and other identifiers have their expected meanings. Then R1 and R3 suffice to fix the value of, say,

    if((219 < 312) ∨ p)(sin, cos)(π/k).

This example illustrates the need for evaluating the operator of a combination as well as its operand.

R2. If X is a λ-expression, valEX is a function. Like any function it can be specified by specifying what result it produces for an arbitrary argument, and we now do this as follows: valEX is that function whose result for any given argument can be found by evaluating bodyX in a new environment derived from E in a way we shall presently describe. For example, suppose E is the environment postulated above, and X is the λ-expression 'λr.k² + r².' Then its value in E is that function whose result for any given argument, say 13, can be found by evaluating 'k² + r²' in a new environment E′, derived from E. To be precise, E′ agrees with E except that it gives the value 13 to the identifier r.

More generally, this derived environment consists of E, modified by pairing the identifier(s) in bvX with corresponding components of the given argument x (and using the new value for preference if any variable in bvX coincides with a constant of E). We denote this derived environment by

    derive(assoc(bvX, x))E.

We shall describe below a mechanical process for obtaining the value, if it exists, of any given AE relative to any given environment. This process can be implemented with pencil and paper, or (as we shall briefly sketch) with a digital computer. The rules R1 and R3 provide a criterion for deciding whether or not the outcome of this process is in fact the value as we understand it.

The three rules can be formalized as a definition of val, thus:

    recursive valEX = identifierX → EX
                      λexpX → f
                        where fx = val(derive(assoc(bvX, x))E)(bodyX)
                      else → {valE(ratorX)}[valE(randX)].

For example, suppose thrice is the function-producing function defined by

    thrice(f)(x) = f(f(f(x))).

Then it follows from the above definition of val that the values of the following five AEs,

    square 5
    thrice square 5
    thrice square (thrice square 5)
    thrice (thrice square) 5
    thrice thrice square 5

are respectively 5², 5^(2^3), 5^(2^6), 5^(2^9) and 5^(2^27). The reader may be better equipped to check this assertion when he has read the next Section, which describes an orderly way of evaluating AEs.

The set of objects that can be denoted by an AE relative to an environment E is the range of the function valE. It contains all the primitives of E, and everything produced by such an object, and every function that can be denoted by a λ-expression.
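The recursive definition of val goes over almost clause for clause into OCaml. A sketch (ours) reusing the ae type from the earlier insert, restricting bound variables to a single identifier for brevity, and letting OCaml closures stand for the functions that R2 describes:

    type value =
      | Num of int
      | Fn of (value -> value)

    type env = string -> value        (* E maps constants to primitives *)

    (* derive(assoc(x, v))E, for a single bound variable: *)
    let derive x v e = fun y -> if y = x then v else e y

    let rec eval (e : env) : ae -> value = function
      | Identifier x -> e x                              (* R1 *)
      | Lambda ([x], b) ->                               (* R2 *)
          Fn (fun arg -> eval (derive x arg e) b)
      | Lambda (_, _) -> failwith "single bound variables only in this sketch"
      | Combination (f, a) ->                            (* R3 *)
          (match eval e f with
           | Fn g -> g (eval e a)
           | Num _ -> failwith "operator does not denote a function")

    (* The thrice example (the partial matches below are a sketch): *)
    let e0 : env = function
      | "square" -> Fn (function Num n -> Num (n * n) | _ -> failwith "square")
      | "thrice" -> Fn (function Fn f -> Fn (fun x -> f (f (f x)))
                               | _ -> failwith "thrice")
      | "five"   -> Num 5
      | _ -> failwith "free identifier"

    let _ = eval e0 (Combination (Combination (Identifier "thrice",
                                               Identifier "square"),
                                  Identifier "five"))
    (* yields Num 390625, i.e. 5^(2^3), agreeing with the text. *)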
Mechanical evaluation

In order to mechanize the above rule, we represent an environment by a list-structure made up of name/value pairs. There is a function designated by location such that if E* is this structure and X is an identifier then

    locationE*X

denotes the selector that selects the value of X from E*. So if E* represents the environment E then the following equation holds:

    valEX = locationE*XE*.

We shall not bother below to distinguish between E and E*.

Also we represent the value of a λ-expression by a bundle of information called a "closure," comprising the λ-expression and the environment relative to which it was evaluated. We must therefore arrange that such a bundle is correctly interpreted whenever it has to be applied to some argument. More precisely:

a closure has
    an environment part, which is a list whose two items are:
        (1) an environment,
        (2) an identifier or list of identifiers,
    and a control part, which consists of a list whose sole item is an AE.

The value relative to E of a λ-expression X is represented by the closure denoted by

    constructclosure((E, bvX), unitlist(bodyX)).

This particular arrangement of the information in a closure has been chosen for our later convenience.

We now describe a "mechanization" of evaluation in the following sense. We define a class of COs, called "states," constructed out of AEs and their values; and we define a "transition" rule whose successive application, starting at a "state" that contains an environment E and an AE X (in a certain arrangement), leads eventually to a "state" that contains (in a certain position) either valEX or a closure representing valEX. (We use the phrase "result of evaluation" to cover both objects and closures. We suppose that the identifier closure designates a predicate that detects whether or not a given result of evaluation is a closure.)

A state consists of
    a stack, which is a list, each of whose items is an intermediate result of evaluation, awaiting subsequent use;
    and an environment, which is a list-structure made up of name/value pairs;
    and a control, which is a list, each of whose items is either an AE awaiting evaluation, or a special object designated by 'ap,' distinct from all AEs;
    and a dump, which is a complete state, i.e. comprising four components as listed here.

We denote a state thus:

    (S, E, C, D).

The environment-part (both of states and of closures) would be unnecessary if λ-expressions containing free variables were prohibited. Also the dump would be unnecessary if all λ-expressions were prohibited.

Each step of evaluation is completely determined by the current state (S, E, C, D) in the following way:

1. If C is null, suppose the current dump D is (S′, E′, C′, D′). Then the current state is replaced by the state denoted by

    (hS : S′, E′, C′, D′).

2. If C is not null, then hC is inspected, and:

(2a) If hC is an identifier X (whose value relative to E occupies the position locationEX in E), then S is replaced by

    locationEXE : S

and C is replaced by tC. We describe this step as follows: "Scanning X causes locationEXE to be loaded."

(2b) If hC is a λ-expression X, scanning it causes the closure derived from E and X (as indicated above) to be loaded on to the stack.

(2c) If hC is ap, scanning it changes S as follows: hS is inspected and:

(2c1) If hS is a closure, derived from E′ and X′, then:
    S is replaced by the nullist,
    E is replaced by derive(assoc(bvX′, 2ndS))E′,
    C is replaced by unitlist(bodyX′),
    and D is replaced by (t(tS), E, tC, D).

(2c2) If hS is not a closure, then scanning ap causes S to be replaced by

    (1stS)(2ndS) : t(tS).

(2d) If hC is a combination X, C is replaced by

    randX : (ratorX : (ap : tC)).

Formally this transformation of one state into another is

    Transform(S, E, C, D) =
        nullC → [hS : S′, E′, C′, D′]
                  where S′, E′, C′, D′ = D
        else →
            identifierX → [locationEXE : S, E, tC, D]
            λexpX → [constructclosure((E, bvX), unitlist(bodyX)) : S, E, tC, D]
            X = ap → closure(hS) →
                         [(), derive(assoc(J, 2ndS))E′, C′, (t(tS), E, tC, D)]
                           where E′, J = environmentpart(hS)
                           and C′ = controlpart(hS)
                     else → [(1stS)(2ndS) : t(tS), E, tC, D]
            else → [S, E, randX : (ratorX : (ap : tC)), D]
              where X = hC

We assume here that an AE composed of a single identifier is the same object as the identifier itself. This suggests a more general assumption that whenever one of the alternative formats of a structure definition has just one component, the corresponding selector and constructor are both merely the identity function. We also assume that a state is identical to a list of its four components. This suggests a more general assumption that whenever a structure definition allows just one format, the constructor is the identity function. Without these assumptions the above definition would be a bit more elaborate. A formal account of structure definitions would lead to a more careful discussion of these points.

Notice that, whereas a previous formula described a rule for deriving from an AE its value, this new formula describes a rule for advancing a certain information-structure through one step. If X is an AE, and E is an environment such that valEX is defined, then starting at any state of the form

    S, E, X : C, D

and repeatedly applying this transformation, we shall eventually reach the state denoted by

    valEX : S, E, C, D.

That is to say, at some later time X will have been scanned and its value relative to the current environment will have been loaded (on to the stack). In particular, if S and C are both null, i.e. if the initial state is

    (), E, unitlistX, (S′, E′, C′, D′)

there will be a subsequent state

    unitlist(valEX), E, (), (S′, E′, C′, D′)

which will be immediately succeeded by the state denoted by

    valEX : S′, E′, C′, D′.

These assertions can be verified by performing the appropriate substitutions in the definition of Transform.
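Transform, too, can be transcribed directly. The OCaml sketch below (our names and dress throughout) reuses the ae type from earlier, restricts bound variables to a single identifier as in the val sketch, gives closures a dedicated format, and lets an association list stand for the name/value list-structure; the six sorts of step appear in the same order as in the text:

    type secd_value =
      | Int of int
      | Basic of (secd_value -> secd_value)       (* a basic function *)
      | Closure of secd_env * string * ae         (* (E, bvX), bodyX  *)
    and secd_env = (string * secd_value) list

    type control = Term of ae | Ap

    type state = { s : secd_value list;            (* stack       *)
                   e : secd_env;                   (* environment *)
                   c : control list;               (* control     *)
                   d : state option }              (* dump        *)

    let transform st =
      match st.c, st.s with
      | [], v :: _ ->                                         (* step 1 *)
          (match st.d with
           | Some d -> { d with s = v :: d.s }
           | None -> failwith "no dump")
      | Term (Identifier x) :: c, s ->                        (* 2a *)
          { st with s = List.assoc x st.e :: s; c = c }
      | Term (Lambda ([x], b)) :: c, s ->                     (* 2b *)
          { st with s = Closure (st.e, x, b) :: s; c = c }
      | Ap :: c, Closure (e', x, b) :: arg :: s ->            (* 2c1 *)
          { s = []; e = (x, arg) :: e'; c = [Term b];
            d = Some { st with s = s; c = c } }
      | Ap :: c, Basic f :: arg :: s ->                       (* 2c2 *)
          { st with s = f arg :: s; c = c }
      | Term (Combination (f, a)) :: c, _ ->                  (* 2d *)
          { st with c = Term a :: Term f :: Ap :: c }
      | _ -> failwith "stuck"

    let rec run st =
      match st.c, st.d with
      | [], None -> List.hd st.s          (* the result of evaluation *)
      | _ -> run (transform st)

    let eval_secd env0 x = run { s = []; e = env0; c = [Term x]; d = None }

For example, eval_secd [("one", Int 1)] (Combination (Lambda (["x"], Identifier "x"), Identifier "one")) takes steps 2d, 2a, 2b, 2c1, 2a and 1 in turn and returns Int 1.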
Basic functions

By a "basic" function of E we mean a function, other than a closure, that can arise as a result of evaluation. At the most the basic functions comprise

(1) primitive functions;
(2) any functions produced by basic functions.

For, any result of a closure must also be a result of a primitive (or be a result of a result of a primitive, or, etc.). However, this may be an over-estimate of the number of basic functions, for it is clearly possible that a primitive might be a closure. For instance the evaluation of

    {λf.f3 + f4}[λx.x² + 1]

relative to E involves evaluating

    f3 + f4

relative to an environment in which 'f' names the closure that we may roughly denote by

    constructclosure((E, 'x'), unitlist('x² + 1')).

Of the six sorts of step described above, namely (1), (2a), (2b), (2c1), (2c2) and (2d), all except (2c2) are mere rearrangements. (2c2) arises whenever ap finds that the head of the stack is a basic function.

Other ways of mechanizing evaluation

It should be observed that this is only one of many ways of mechanizing the evaluation of AEs, all producing the value, as specified above. For instance, it is not essential that the operand of a combination be evaluated before its operator. The operand might be evaluated after the operator; it might even be evaluated piecemeal when and if it is required during the application of the value of the operator. Again, the evaluation of a λ-expression might be accompanied by partial evaluation of its body. The AE might be subjected to pre-processing of various kinds, e.g. to disentangle combinations once for all or to remove its dependence on an arbitrary choice of identifiers occurring bound in it. The pre-processing might be more elaborate and perform symbolic reduction.

The particular evaluation process described above will be called normal evaluation, and its significance partly lies in that many other evaluation processes can be described in terms of it; i.e. they can be specified as a transformation of the AE into some other AE, followed by normal evaluation of the derived AE. Further discussion of evaluation processes and of their mutual relationships is outside the scope of the present paper.
Evaluating with a digital computer

This Section describes how a "state," in the above sense, can be represented in the instantaneous state of a digital computer, and how the transformation formalized above can be represented by a stored program. The method chosen here is one of many and is distinguished by its simplicity in description, rather than by its cheapness. It will hold no surprises for anyone familiar with the "chaining" techniques of storage and location pioneered in list-processing systems. It is given here as a demonstration of possibility, not of practicability.

Representing each composite object by an address

Each component of a state, from the entire state downwards, and including such COs as are definable objects, can be represented in a computer by an address. The way of doing this is closely related to the structure definitions used to introduce the various COs concerned. For, given that the components can be represented by addresses, the complete CO can be represented by a short segment of store, large enough to contain these addresses (and, if the CO is one admitting alternative formats, a distinguishing tag). So the complete CO can also be represented by the address of this short segment. There is need for one fixed area in the store, large enough to hold an address and representing the current state. The merit of this method is that the predicates, selectors and constructors can be represented by stored programs whose size and speed are independent of the size of the COs operated on. Hence this is also true of the information-rearranging steps that occur during evaluation, namely (1), (2a), (2b), (2c1) and (2d); for each of these is a composition of predicates, selectors and constructors.

Each of these steps can be represented by a stored program of ten or twenty orders in most current machine-codes. Obviously, the possibility arises of designing a machine-code that favours these steps. However, the implementation sketched here has less claim to such embodiment than some others whose properties are briefly referred to below.

Shared components

One consequence of this method is the presence of "shared" components. For instance, suppose the environment denoted by

    derive(assoc(bvX, x))E

is being formed in step (2c1). It is possible that a copy of the address representing E is "incorporated" into the new environment. As long as environment components are not updated, the extent of sharing is immaterial. However, there are two possible developments in which it would become important to consider precisely what components are shared.

(a) We might vary the evaluation process by introducing a preparatory investigation of each AE, to determine whether any of the transformations of COs that occur during its evaluation can be performed by overwriting rather than by reconstructing.

(b) We might generalize AEs by introducing a fourth format playing the role of an assignment.

Representing each non-composite object by an address

The possibility of using the above storage technique depends on

(1) being able to represent each non-composite definable object by an address: namely, identifiers, primitives and all results of evaluation other than closures and composite definable objects;

(2) being able to represent each basic function f by a stored program such that, if the head of the stack represents x, the program replaces it by an address representing fx.

Representing Y

If we consider a specific (powerful) set of primitives, comprising some basic numbers, some numerical functions, the basic list-processing primitives, and Y, only the latter involves any unfamiliar technique. Y can be represented by a stored program that, given an argument F at the head of the stack, performs the following steps:

1. Take a fresh cell z, whose address is Z.
2. Use Z as a spurious argument for F, producing a result-address Z′.
3. Copy the word addressed by Z′ to the cell z. Then Z is the required result of Y.

This representation of Y is adequate for the uses of it mentioned in the Section on "Recursive definitions."

Source of storage

The stored programs for constructing COs must have access to a source of fresh storage cells, which (unless the machine is to be congested rapidly) must in turn be able to retrieve for re-use used cells that have become irrelevant.

Other ways of representing our mechanization with a digital computer

It was earlier observed that the mechanization in terms of SECD-states is only one of many ways of mechanizing evaluation. Likewise, given a particular mechanization, there may be many ways of representing it with a digital computer. In particular, the method just sketched is not the only way of mechanizing SECD-states.

For example, of all the occasions on which a fresh cell is required, there are certain sub-sets that can reasonably be acquired and disposed of in a "last in/first out" (LIFO) pattern. Hence by distributing these requirements among more than one source of fresh cells it is possible to exploit consecutive addressing. In particular, by restricting the structure of AEs it is possible to rely exclusively on the LIFO pattern. Such restrictions suggest a pre-evaluational transformation for eliminating expensive structures in favour of equivalent cheaper ones. Such variations are outside the scope of this paper.
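The knot that the stored program for Y ties above — hand F a fresh cell, then overwrite the cell with F's result — has a close OCaml analogue using a reference cell (a sketch with our own names; like the paper's device, it is adequate only when F does not interrogate its argument before the copy-back):

    (* y_knot F: allocate a fresh cell z, use it as a spurious
       argument for F, then copy F's result back into z. *)
    let y_knot (f : (int -> int) -> int -> int) : int -> int =
      let z = ref (fun _ -> assert false) in   (* step 1: fresh cell z   *)
      let z' = f (fun n -> !z n) in            (* step 2: spurious arg   *)
      z := z';                                 (* step 3: copy back to z *)
      z'

    let fact = y_knot (fun f n -> if n = 0 then 1 else n * f (n - 1))
    (* fact 6 = 720 *)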
"semantics" of a language insofar as it is distinct from etc. [6]. However, they have another merit, that of
the semantics of other languages. being less associated with a particular internal repre-
These remarks about "languages" are subject to an sentation, and, in particular, with a particular ordering
important qualification. They apply only to languages of the components. (Gilmore [5] effectively uses
that can be considered as AEs plus syntactic sugar. "selectors" to avoid entanglement with a specific
While most languages in current use can partly be written representation of expressions.)
accounted for in these terms, entirely "applicative" Church [1], Curry [2] and Rosenbloom [9] all include
languages have yet to be proved adequate for practical discussions of how to eliminate various uses of bound
purposes. Whether or not they will be, and whether variables in terms of just one use, namely functional
their interesting properties can be extended to hold for abstraction; also of how to eliminate lists, and functions
currently useful languages are questions outside the that select items from lists, in terms of functional
scope of this paper. application. The function Y is called 0 in Rosenbloom
[9], and Y in Curry [2]; roughly speaking Y\ is
McCarthy's label [6].
Relation to other work Formalizing a system in its own terms is now a
Most of the above ideas are to be found in the familiar occupation. The relative simplicity of the
literature. In particular Church and Curry, and function val, compared, say, with LISP's eval, apply,
McCarthy and the ALGOL 60 authors, are so large a etc. [6, 7], is due partly to the fact that it treats the
part of the history of their respective disciplines as to operator and operand of a combination symmetrically.
make detailed attributions inevitably incomplete and The formalization of a machine for evaluating
probably impertinent. expressions seems to have no precedent. Gilmore's
The criterion of "semantic acceptability," whereby a machine [5] is specified by a flow diagram. The relative
proposed rendering in terms of AEs can be judged simplicity of the function Transform, compared with his
correct or incorrect, is closely related to what some specification, is also due in part to the above mentioned
logicians (e.g. Quine [8]) call "referential transparency," symmetry. Closures are roughly the same as McCarthy's
and to what Curry [2] calls the "monotony" of "FUNARG" lists [7] and Dijkstra's PARD's [3].
equivalence. (This method of "evaluating" a A-expression is to be
Structure definitions are in some sense merely a contrasted with "literal substitution" such as is used in
convenient way of avoiding the uninformative strings of Church's normalization process, in Gilmore's machine
a's and d's that occur in LISP's 'cadar,' 'cadaddr,' [5], and in Dijkstra's mechanism [4]).

References

1. CHURCH, A. (1941). The Calculi of Lambda-Conversion, Princeton: Princeton University Press.
2. CURRY, H. B., and FEYS, R. (1958). Combinatory Logic, Vol. 1, Amsterdam: North Holland Publishing Co.
3. DIJKSTRA, E. W. (1962). "An ALGOL 60 Translator for the X1," Automatic Programming Bulletin, No. 13.
4. DIJKSTRA, E. W. (1962). "Substitution Processes," Preliminary Publication, Amsterdam: Mathematisch Centrum.
5. GILMORE, P. C. (1963). "An Abstract Computer with a LISP-like Machine Language without a Label Operator," in Computer Programming and Formal Systems, ed. Braffort, P., and Hirschberg, D., Amsterdam: North Holland Publishing Co.
6. MCCARTHY, J. (1960). "Recursive Functions of Symbolic Expressions and their Computation by Machine, Part I," Comm. A.C.M., Vol. 3, No. 4, pp. 184-195.
7. MCCARTHY, J., et al. (1962). LISP 1.5 Programmer's Manual, Cambridge: M.I.T.
8. QUINE, W. V. (1960). Word and Object, New York: Technology Press and Wiley.
9. ROSENBLOOM, P. (1950). The Elements of Mathematical Logic, New York: Dover.
The Next 700 Programming Languages

P. J. Landin
Univac Division of Sperry Rand Corp., New York, New York

"... today ... 1,700 special programming languages used to 'communicate' in over 700 application areas."--Computer Software Issues, an American Mathematical Association Prospectus, July 1965.
A family of unimplemented computing languages is described that is intended to span differences of application area by a unified framework. This framework dictates the rules about the uses of user-coined names, and the conventions about characterizing functional relationships. Within this framework the design of a specific language splits into two independent parts. One is the choice of written appearances of programs (or more generally, their physical representation). The other is the choice of the abstract entities (such as numbers, character-strings, lists of them, functional relations among them) that can be referred to in the language.

The system is biased towards "expressions" rather than "statements." It includes a nonprocedural (purely functional) subsystem that aims to expand the class of users' needs that can be met by a single print-instruction, without sacrificing the important properties that make conventional right-hand-side expressions easy to construct and understand.

1. Introduction

Most programming languages are partly a way of expressing things in terms of other things and partly a basic set of given things. The ISWIM (If you See What I Mean) system is a byproduct of an attempt to disentangle these two aspects in some current languages.

This attempt has led the author to think that many linguistic idiosyncrasies are concerned with the former rather than the latter, whereas aptitude for a particular class of tasks is essentially determined by the latter rather than the former. The conclusion follows that many language characteristics are irrelevant to the alleged problem orientation.

ISWIM is an attempt at a general purpose system for describing things in terms of other things, that can be problem-oriented by appropriate choice of "primitives." So it is not a language so much as a family of languages, of which each member is the result of choosing a set of primitives. The possibilities concerning this set and what is needed to specify such a set are discussed below.

ISWIM is not alone in being a family, even after mere syntactic variations have been discounted (see Section 4). In practice, this is true of most languages that achieve more than one implementation, and if the dialects are well disciplined, they might with luck be characterized as differences in the set of things provided by the library or operating system. Perhaps had ALGOL 60 been launched as a family instead of proclaimed as a language, it would have fielded some of the less relevant criticisms of its deficiencies.

At first sight the facilities provided in ISWIM will appear comparatively meager. This appearance will be especially misleading to someone who has not appreciated how much of current manuals are devoted to the explanation of common (i.e., problem-orientation independent) logical structure rather than problem-oriented specialties. For example, in almost every language a user can coin names, obeying certain rules about the contexts in which the name is used and their relation to the textual segments that introduce, define, declare, or otherwise constrain its use. These rules vary considerably from one language to another, and frequently even within a single language there may be different conventions for different classes of names, with near-analogies that come irritatingly close to being exact. (Note that restrictions on what names can be coined also vary, but these are trivial differences. When they have any logical significance it is likely to be pernicious, by leading to puns such as ALGOL's integer labels.)

So rules about user-coined names is an area in which we might expect to see the history of computer applications give ground to their logic. Another such area is in specifying functional relations. In fact these two areas are closely related since any use of a user-coined name implicitly involves a functional relation; e.g., compare

    x(x+a)                  f(b+2c)
    where x = b + 2c        where f(x) = x(x+a)

ISWIM is thus part programming language and part program for research. A possible first step in the research program is 1700 doctoral theses called "A Correspondence between x and Church's λ-notation."¹

Presented at an ACM Programming Languages and Pragmatics Conference, San Dimas, California, August 1965.

¹ There is no more use or mention of λ in this paper--cognoscenti will nevertheless sense an undercurrent. A not inappropriate title would have been "Church without lambda."

2. The where-Notation

In ordinary mathematical communication, these uses of 'where' require no explanation. Nor do the following:

    f(b+2c) + f(2b-c)
    where f(x) = x(x+a)

    f(b+2c) - f(2b-c)
    where f(x) = x(x+a)
    and b = u/(u+1)
    and c = v/(v+1)

    g(f where f(x) = ax² + bx + c,
      u/(u+1),
      v/(v+1))
    where g(f, p, q) = f(p+2q, 2p-q)
A phrase of the form ' w h e r e definition' will be called a method of expressing these functionally is explained in
"where-clause." An expression of the form 'expression [2]. I t amounts to using named transfer-functions instead
where-clause' is a "where-expression." Its two principal of class names like i n t e g e r , i.e., writing
components are called, respectively, its "main clause" where n = round(n)
and its "supporting definition." To put ' w h e r e ' into a
programming language the following questions need instead of the specification
answers. integer n
Linguistic Structure. What structures of expression
can appropriately be qualified by a where-clause, e.g., Thus the use of functional notation does not jeopardize
conditional expressions, operand-listings, statements, the determination of type from textual evidence.
declarations, where-expressions?
3. P h y s i c a l I S W I M a n d :Logical I S W I M
Likewise, what structures of expression can appro-
priately be written as the right-hand side (rhs) of a Like ALGOL 60, ISWIM has no prescribed physical
supporting definition? What contexts are appropriate for a where-expression, e.g., as an arm of a conditional expression, an operator, the main-clause of a where-expression, the left-hand side (lhs) of a supporting definition, the rhs of a supporting definition?

Syntax. Having answered the above questions, what are the rules for writing the acceptable configurations unambiguously? E.g., where are brackets optional or obligatory? or other punctuation? or line breaks? or indentation? Note the separation of decisions about structure from decisions about syntax. (This is not a denial that language designers might iterate, like hardware designers who distinguish levels of hardware design.)

Semantic Constraints on Linguistic Structure. In the above examples each main clause was a numerical expression; i.e., given appropriate meanings for the various identifiers in it, it denoted a number. What other kinds of meaning are appropriate for a main-clause, e.g., arrays, functions, structure descriptions, print-formats?

Likewise what kinds of meaning are appropriate for rhs's of supporting definitions? Notice there is not a third question analogous to the third question above under linguistic structure. This is because a where-expression must mean the same kind of thing as its main clause and hence raises no new question concerning what contexts are meaningful. Notice also that the questions about meaning are almost entirely independent of those about structure. They depend on classifying expressions in two ways that run across each other.

Outcome. What is the outcome of the more recondite structural configurations among those deemed admissible, e.g. mixed nests of where-expressions, function definitions, conditional expressions, etc.?

Experimental programming has led the author to think that there is no configuration, however unpromising it might seem when judged cold, that will not turn up quite naturally. Furthermore, some configurations of 'where' that might first appear to reflect somewhat pedantic distinctions, in fact provide close matches for current language features such as name/value and own (see [2, 3]).
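The flavor of such configurations is easy to show in Haskell, whose local definitions descend from ISWIM's; the following is a hedged sketch with invented names, not an example from the paper. Haskell splits ISWIM's where-expression into an expression form, let, and a declaration form, where, and both nest freely:

    -- A supporting definition used inside one arm of a conditional
    -- (via 'let', the expression form).
    arm :: Double -> Double
    arm p = if p > 0 then let x = p + 2 in x * (x + 1) else negate p

    -- Nested supporting definitions: a 'where'-bound definition
    -- carrying its own 'where'.
    scaled :: Double -> Double
    scaled t = u + v
      where
        u = w * 2
          where w = t + 1
        v = t * t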
All these questions are not answered in this paper. The techniques for answering them are outlined in Section 4. One other issue arises when 'where' is added to a programming language--types and specifications.

ALGOL 60's designers sought to avoid commitment to any particular sets of characters or type faces. Accordingly they distinguish between "publication language," "reference language" and "hardware languages." Of these the reference language was the standard and was used in the report itself whenever pieces of ALGOL 60 occurred. Publication and hardware languages are transliterations of the reference language, varying according to the individual taste, needs and physical constraints on available type faces and characters.

Such variations are different physical representations of a single abstraction, whose most faithful physical representation is the reference language. In describing ISWIM we distinguish an abstract language called "logical ISWIM," whose texts are made up of "textual elements," characterized without commitment to a particular physical representation. There is a physical representation suitable for the medium of this report, and used for presenting each piece of ISWIM that occurs in this report. So this physical representation corresponds to "reference ALGOL 60," and is called "reference ISWIM," or the "ISWIM reference representation," or the "ISWIM reference language."

To avoid imprecision one should never speak just of "ISWIM," but always of "logical ISWIM" or of "such-and-such physical ISWIM." However, in loose speech, where the precise intention is clear or unimportant, we refer to "ISWIM" without qualification. We aim at a more formal relation between physical and logical languages than was the case in ALGOL 60. This is necessary since we wish to systematize and mechanize the use of different physical representations.

4. Four Levels of Abstraction

The "physical/logical" terminology is often used to distinguish features that are a fortuitous consequence of physical conditions from features that are in some sense more essential. This idea is carried further by making a similar distinction among the "more essential" features. In fact ISWIM is presented here as a four-level concept comprising the following:

(1) physical ISWIMs, of which one is the reference language and others are various publication and hardware languages (not described here).



(2) logical ISWIM, which is uncommitted as to character sets and type faces, but committed as to the sequence of textual elements, and the grammatical rules for grouping them, e.g., by parentheses, indentation and precedence relations.

(3) abstract ISWIM, which is uncommitted as to the grammatical rules of sequence and grouping, but committed as to the grammatical categories and their nesting structure. Thus abstract ISWIM is a "tree language" of which logical ISWIM is one linearization.

(4) applicative expressions (AEs), which constitute another tree language, structurally more austere than abstract ISWIM, and providing certain basic grammatical categories in terms of which all of ISWIM's more numerous categories can be expressed.

The set of acceptable texts of a physical ISWIM is specified by the relations between 1 and 2, and between 2 and 3. The outcome of each text is specified by these relations, together with a "frame of reference," i.e., a rule that associates a meaning with each of a chosen set of identifiers.

These are the things that vary from one member of our language family to the next. The specification of the family is completed by the relation between abstract ISWIM and AEs, together with an abstract machine that interprets AEs. These elements are the same for all members of the family and are not discussed in this paper (see [1, 2, 4]).

The relationship between physical ISWIM and logical ISWIM is fixed by saying what physical texts represent each logical element, and also what layout is permitted in stringing them together. The relationship between logical ISWIM and abstract ISWIM is fixed by a formal grammar not unlike the one in the ALGOL 60 report, together with a statement connecting the phrase categories with the abstract grammatical categories.

These two relations cover what is usually called the "syntax" or "grammar" of a language. In this paper syntax is not discussed beyond a few general remarks and a few examples whose meaning should be obvious.

The relationship between abstract ISWIM and AEs is fixed by giving the form of AE equivalent to each abstract ISWIM grammatical category. It happens that these latter include a subset that exactly matches AEs. Hence this link in our chain of relations is roughly a mapping of ISWIM into an essential "kernel" of ISWIM, of which all the rest is mere decoration.

5. Abstract ISWIM

The texts of abstract ISWIM are composite information structures called amessage's. The following structure definition defines² the class amessage in terms of a class called identifier. It also defines several functions for manipulating amessage's. These comprise the predicates demand, simple, infixed, etc; also the selectors body, rator, leftarm, nee, etc; also (taking for granted certain unformalized conventions concerning structure definitions) the constructors consdemand, conscombination (elsewhere abbreviated to combine), consstandarddef, etc. Examples of reference ISWIM are given alongside, against the right margin.

² Writing a structure definition involves coining names for the various alternative formats of amessage's and their components. The only obscure coinage is "beet," which abbreviates "beta-redex," i.e., "an expression amenable to rule (β)"; see Section 7.

    An amessage is
      either a demand, and has
        a body which is an aexpression,                  Print a+2b
      or else a definition,                              Def x = a+2b

    where rec an aexpression (aexp) is
      either simple, and has
        a body which is an identifier,                   Cath231
      or a combination, in which case it has
        a rator, which is an aexp,                       sin(a+2b)
        and a rand, which is an aexp,                      or a+2b
      or conditional, in which case it is
        either two-armed, and has
          a condition, which is an aexp,                 p → a+2b; 2a-b
          and a leftarm, which is an aexp,
          and a rightarm, which is an aexp,
        or one-armed, and has
          a condition, which is an aexp,                 q → 2a-b
          and an arm, which is an aexp,
      or a listing, and has
        a body which is an aexp-list,                    a+b, c+d, e+f
      or beet, and has
        a mainclause, which is an aexp,                  x(x+1) where x = a+2b
        and a support, which is an adef,                   or let x = a+2b; x(x+1)

    and an adefinition (adef) is
      either standard, and has
        a definee (nee), which is an abv,                x = a+2b
        and a definiens (niens), which is an aexp,
      or functionform, and has
        a lefthandside (lhs),
          which is an abv-list of length ≥ 2,            f(x) = x(x+1)
        and a righthandside (rhs), which is an aexp,
      or programpoint, and has
        a body which is an adef,                         pp f(x) = x(x+1)
      or circular, and has
        a body which is an adef,                         rec f(n) = (n=0) → 1; nf(n-1)
      or simultaneous, and has
        a body, which is an adef-list,                   x = a+2b and y = 2a-b
      or beet, and has
        a mainclause, which is an adef,                  f(y) = x(x+y) where x = a+2b
        and a support, which is an adef,

    where an abv is
      either simple, and has
        a body, which is an identifier,
      or else, is an abv-list.                           x, (y, z), w

A program-point definition introduces a deviant kind of function. Applying such a function precipitates premature termination of the where-expression containing it, and causes its result to be delivered as the value of the entire where-expression. Program-points are ISWIM's nearest thing to jumping. Assignment is covered as a particular case of an operator. For both of these the precise specification is in terms of the underlying abstract machine (see [2]).
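The structure definition above is, in today's vocabulary, an algebraic datatype: the cons... functions are data constructors, and pattern matching supplies the predicates and selectors. As a hedged illustration, here is one possible transcription into Haskell; the transcription and its exact spellings are ours, not the paper's:

    type Identifier = String

    data AMessage
      = Demand Aexp                 -- Print a+2b
      | Definition Adef             -- Def x = a+2b

    data Aexp
      = Simple Identifier           -- Cath231
      | Combination Aexp Aexp       -- rator and rand: sin(a+2b), a+2b
      | TwoArmed Aexp Aexp Aexp     -- condition, leftarm, rightarm
      | OneArmed Aexp Aexp          -- condition, arm
      | Listing [Aexp]              -- a+b, c+d, e+f
      | BeetE Aexp Adef             -- mainclause, support: x(x+1) where x = a+2b

    data Adef
      = Standard Abv Aexp           -- definee, definiens: x = a+2b
      | FunctionForm [Abv] Aexp     -- lhs (an abv-list of length >= 2), rhs
      | ProgramPoint Adef           -- pp f(x) = x(x+1)
      | Circular Adef               -- rec f(n) = (n=0) -> 1; nf(n-1)
      | Simultaneous [Adef]         -- x = a+2b and y = 2a-b
      | BeetD Adef Adef             -- mainclause, support

    data Abv
      = SimpleAbv Identifier
      | AbvList [Abv]               -- x, (y, z), w

A selector such as rator is then just a pattern match, e.g. rator (Combination f a) = f, and the abstract tree is exactly what survives when every choice of brackets, layout and character set has been stripped away.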



6. Relationship to LISP

ISWIM can be looked on as an attempt to deliver LISP from its eponymous commitment to lists, its reputation for hand-to-mouth storage allocation, the hardware dependent flavor of its pedagogy, its heavy bracketing, and its compromises with tradition. These five points are now dealt with in turn:

(1) ISWIM has no particular problem orientation. Experiments so far have been mainly in numerical work and language processing with brief excursions into "commercial" programming and elsewhere. No bias towards or away from a particular field of application has emerged.

(2) Outside a certain subset (corresponding closely to ALGOL 60 without dynamic own arrays), ISWIM needs garbage collection. An experimental prototype implementation followed common ALGOL 60 practice. It used dynamic storage allocation with two sources, one LIFO and the other garbage collected, with the intention that the LIFO source should take as big a share as possible. However, as with ALGOL 60, there is a latent potential for preallocating storage in certain favorable and commonly useful situations. Compared with LISP the chief amelioration of storage allocation comes out of a mere syntactic difference, namely, the use of where (or 'let', which is exactly equal in power and program structure). This provides a block-structure not dissimilar in textual appearance from ALGOL 60's, though it slips off the pen more easily, and is in many respects more general.

(3) LISP has some dark corners, especially outside "pure LISP," in which both teachers and programmers resort to talking about addresses and to drawing storage diagrams. The abstract machine underlying ISWIM is aimed at illuminating these corners with a minimum of hardware dependence.

(4) The textual appearance of ISWIM is not like LISP's S-expressions. It is nearer to LISP's M-expressions (which constitute an informal language used as an intermediate result in hand-preparing LISP programs). ISWIM has the following additional features:

(a) "Auxiliary" definitions, indicated by 'let' or 'where', with two decorations: 'and' for simultaneous definitions, and 'rec' for self-referential definitions (not to be mistaken for a warning about recursive activation, which can of course also arise from self-application, and without self-reference).

(b) Infixed operators, incorporated systematically. A logical ISWIM can be defined in terms of four unspecified parameters: three subsets of the class of identifiers, for use as prefixed, infixed and postfixed operators; and a precedence relation defined over the union of these subsets.

(c) Indentation, used to indicate program structure. A physical ISWIM can be defined in terms of an unspecified parameter: a subset of phrase categories, instances of which are restricted in layout by the following rule called "the offside rule." The southeast quadrant that just contains the phrase's first symbol must contain the entire phrase, except possibly for bracketed subsegments. This rule has three important features. It is based on vertical alignment, not character width, and hence is equally appropriate in handwritten, typeset or typed texts. Its use is not obligatory, and use of it can be mixed freely with more conventional alternatives like punctuation. Also, it is incorporated in ISWIM in a systematic way that admits of alternatives without changing other features of ISWIM and that can be applied to other languages.
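The offside rule is recognizably the ancestor of the layout rule later adopted by Haskell and its relatives. A hedged illustration, with an example of our own: each definition below must stay southeast of its first symbol, so the grouping needs no brackets or semicolons, and shifting the final binding to the left margin would change the parse.

    mean :: [Double] -> Double
    mean xs = total / count
      where
        total = sum xs
        count = fromIntegral (length xs)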
(5) The most important contribution of LISP was not in list processing or storage allocation or in notation, but in the logical properties lying behind the notation. Here ISWIM makes little improvement because, except for a few minor details, LISP left none to make. There are two equivalent ways of stating these properties.

(a) LISP simplified the equivalence relations that determine the extent to which pieces of program can be interchanged without affecting the outcome.

(b) LISP brought the class of entities that are denoted by expressions a programmer can write nearer to those that arise in models of physical systems and in mathematical and logical systems.

These remarks are expanded in Sections 7 and 8.

7. The Characteristic Equivalences of ISWIM

For most programming languages there are certain statements of the kind, "There is a systematic equivalence between pieces of program like this, and pieces like that," that nearly hold but not quite. For instance in ALGOL 60 there is a nearly true such statement concerning procedure calls and blocks.

At first sight it might appear pedantic to quibble about such untidiness--"What's the point of having two different ways of doing the same thing anyway? Isn't it better to have two facilities than just one?" The author believes that expressive power should be by design rather than accident, and that there is great point in equivalences that hold without exception. It is a platitude that any given outcome can be achieved by a wide variety of programs. The practicability of all kinds of program-processing (optimizing, checking satisfaction of given conditions, constructing a program satisfying given conditions) depends on there being elegant equivalence rules. For ISWIM there are four groups³, concerning:

(1) the extent to which a subexpression can be replaced by an equivalent subexpression without disturbing the equivalence class of the whole expression. Without this group the other rules would be applicable only to complete expressions, not to subexpressions.

(2) user-coined names, i.e., in definitions and, in particular, function definitions.

(3) built-in entities implicit in special forms of expression. The only instances of this in ISWIM are conditional expressions, listings and self-referential definitions.

(4) named entities added in any specific problem-orientation of ISWIM.

³ To facilitate subsequent discussion each rule is preceded by a name, e.g., "(μ)", "(ν)", etc. These are chosen to conform with precedents in Curry's Combinatory Logic.
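Haskell keeps both of ISWIM's forms of auxiliary definition, so the first of the rules below, (let), and the central substitution rule (β) can be previewed concretely; a hedged sketch with invented names:

    -- rule (let):  let x = M; L  is equivalent to  L where x = M
    area1 = let s = 3 + 4 in s * s

    area2 = s * s
      where s = 3 + 4

    -- rule (beta): either form is equivalent to the result of
    -- substituting M for x throughout L.
    area3 = (3 + 4) * (3 + 4)   -- all three denote 49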



GROUP 1

    (μ)     If L ≡ L' then L(M) ≡ L'(M)
    (ν)     If M ≡ M' then L(M) ≡ L(M')
    (ν')    If M ≡ M' then (L, ..., M, ..., N) ≡ (L, ..., M', ..., N)
    (ν'')   If L ≡ L' then (L→M; N) ≡ (L'→M; N)
    (ν''')  If M ≡ M' then (L→M; N) ≡ (L→M'; N)
    (ν^iv)  If N ≡ N' then (L→M; N) ≡ (L→M; N')
    (ν^v)   If M ≡ M' then (L where x = M) ≡ (L where x = M')

The significant omissions here are the main-clause in the last case above, the rhs of a function definition "f(x) = M" and of a circular definition "rec x = M".

GROUP 2

    (let)   let x = M; L ≡ L where x = M
    (I')    f(x) = L ≡ f = (g where g(x) = L)
            f(a,b,c)(x,y) = L ≡ f(a,b,c) = (g where g(x,y) = L)
            and so on for each shape of left-hand side
    (I)     (f where f(x) = L)M ≡ L where x = M
    (β')    (x = L) where y = M ≡ x = (L where y = M)
    (D')    x = L and y = M and ... and z = N ≡ (x, y, ..., z) = (L, M, ..., N)

Rules (I'), (β'), (D'), together with (Y) below, enable any definition to be "standardized," i.e., expressed in a lhs/rhs form, in which the lhs is precisely the definee. Thus a nonstandard definition can be transformed so as to be amenable to rule (β) (see Group 2').

GROUP 2'

    (β)     L where x = M ≡ Subst [M/x] L

where "Subst [A/B] C" means roughly the expression resulting from substituting A for B throughout C. Here 'x' may be any list-structure of distinct identifiers, provided that 'M' has structure that fits it.

This rule is the most important, but it has only limited validity, namely, within the "purely functional" subset of ISWIM that results from not using the program-point feature or assignment.

Its importance lies in a variant of a famous theorem of mathematical logic, the Church-Rosser theorem. This concerns the possibility of eliminating 'where' from an expression by repeatedly applying the rules stated above, including crucially (β). The theorem ensures that if there are several ways of doing this they all reach the same result.

The author thinks that the fruitful development to encompass all ISWIM will depend on establishing "safe" areas of an ISWIM expression, in which imperative features can be disregarded. The usefulness of this development will depend on how successfully ISWIM's nonimperative features supersede conventional programming.
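The "roughly" above conceals the usual caveat about variable capture. To make (β) concrete, here is a hedged sketch of substitution over a toy fragment of abstract ISWIM in Haskell; the fragment and names are ours, and the substitution is deliberately naive, with no renaming:

    -- A toy fragment: identifiers, application, and where.
    data Exp
      = Var String
      | App Exp Exp
      | Where Exp String Exp   -- Where l x m  represents  "l where x = m"
      deriving Show

    -- subst m x l computes Subst [M/x] L, ignoring capture problems.
    subst :: Exp -> String -> Exp -> Exp
    subst m x (Var y)       = if x == y then m else Var y
    subst m x (App f a)     = App (subst m x f) (subst m x a)
    subst m x (Where l y n)
      | x == y              = Where l y (subst m x n)   -- x is rebound in l
      | otherwise           = Where (subst m x l) y (subst m x n)

    -- One application of rule (beta): eliminate an outermost where.
    beta :: Exp -> Exp
    beta (Where l x m) = subst m x l
    beta e             = e

A serious version would rename bound identifiers whenever a definee of an inner where occurs free in M; that refinement is exactly what the word "roughly" is hedging.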
GROUP 3

    (→)         true → M; N ≡ M
    (→')        false → M; N ≡ N
    (→'')       P → M ≡ P → M; undefined
    (undefined) undefined ≡ selfapply(selfapply)
                    where selfapply(f) = f(f)
    (Y)         rec x = L ≡ x = (L where rec x = L)
    (D'')       (x, ..., z) = M ≡ (x, ..., z) =
                    null(t^k w) →
                      h(w), ..., h(t^(k-1) w)
                    where w = M        (for k ≥ 2)
                (x, (u, v), z) = M ≡ (x, (u, v), z) =
                    null(t^3 w) →
                      h(w),
                      (null(t^2 w') →
                         h(w'), h(t(w'))
                       where w' = h(t(w))),
                      h(t^2 w)
                    where w = M
                and so on for each shape of definee
    (null)      null(nullist) ≡ true
    (null')     null(L1, ..., Lk) ≡ false
                    where (x, ..., z) = L1, ..., Lk    (k ≥ 2)
    (h)         h(L1, ..., Lk) ≡ x
                    where (x, ..., z) = L1, ..., Lk    (k ≥ 2)
    (t)         t(L1, ..., Lk) ≡ y, ..., z
                    where (x, y, ..., z) = L1, ..., Lk (k ≥ 3)
    (t')        t(t(L1, L2)) ≡ nullist
                    where (x, y) = L1, L2

The rules about listings may appear willfully indirect. The more natural transformations are those effected by applying, for example, (D'') then (β). But these would have suffered the same limited validity as (β). In their above slightly cautious formulation the validity of (D''), etc. is unrestricted, and the more powerful equivalences that hold for nonimperative expressions arise entirely from (β).

GROUP 4

A problem-orientation of ISWIM can be characterized by additional axioms. In the simplest case such an axiom is an ISWIM definition. The resulting modification is called a "definitional extension" of the original system.

In more elaborate cases axioms may mutually constrain a group of identifiers; e.g. the following rule for equality among integers:

(=) Suppose L and M are ISWIM-written integers (i.e., strings of digits); then either one or the other of the following holds:

    L = M ≡ true
    L = M ≡ false

according as L and M differ at most in lefthand zeros, or not.

Another example, presented even less formally, is the structure definition for abstract ISWIM.

Group 1 above makes no provision for substitutions within expressions that are qualified by a supporting definition or are used to define a function. However, such a substitution is legitimized as long as it does not involve the definees or variables, by encasing it within applications of rule (β) and its inverse (with any other rules that might



be needed to produce something that is amenable to (β), i.e., a beet with a standard definition).

Equivalence rules can be used to prove things about the system. For example, the reader will readily verify that the equivalence of

    f(6) where rec f(n) = (n=0) → 1; nf(n-1)

and

    6(f(5) where rec f(n) = (n=0) → 1; nf(n-1))

can be established with the following steps:

    (I'), (Y), (β), (Y), (β), (I), (=), (β) backwards, (Y) backwards, (I') backwards.

In this sequence we omit the auxiliary applications of (μ), etc. that are needed at almost every step to legitimize the substitution.
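As a hedged sanity check of extensional agreement, not of the formal derivation itself: rendering the example in Haskell, with the juxtaposition nf(n-1) written as explicit multiplication, both sides evaluate to the same number.

    f :: Integer -> Integer
    f n = if n == 0 then 1 else n * f (n - 1)

    check :: Bool
    check = f 6 == 6 * f 5   -- True: both sides are 720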
8. Application and Denotation

The commonplace expressions of arithmetic and algebra have a certain simplicity that most communications to computers lack. In particular, (a) each expression has a nesting subexpression structure, (b) each subexpression denotes something (usually a number, truth value or numerical function), (c) the thing an expression denotes, i.e., its "value", depends only on the values of its subexpressions, not on other properties of them.

It is these properties, and crucially (c), that explain why such expressions are easier to construct and understand. Thus it is (c) that lies behind the evolutionary trend towards "bigger righthand sides" in place of strings of small, explicitly sequenced assignments and jumps. When faced with a new notation that borrows the functional appearance of everyday algebra, it is (c) that gives us a test for whether the notation is genuinely functional or merely masquerading.

The important feature of ISWIM's equivalence rules is that they guarantee the same desirable properties to ISWIM's nonimperative subset. We may equate "abstract object" with "equivalence class," and equate "denotes" with "is a member of." Then the properties (μ) and (ν) ensure analogies of (c) above. They state that the value of an operator/operand combination depends only on the values of its component subexpressions, not on any other aspects of them.

Thus conditions (μ) and (ν) are equivalent to the existence of a dyadic operation among the abstract objects; we call this operation "application."

The terminology of "abstract objects," "denoting" and "application" is frequently more convenient than that of equivalence relations. For example, it suggests another way of characterizing each problem-orientation of ISWIM. We can think of a set of abstract objects with a partially defined dyadic "application" operation and a monadic "designation" operation that associates a "primitive" abstract object with each of some chosen set of names, called the "constants" of the special system.

Consider for example a programming language that contains expressions such as

    'wine'

Anyone with a generous ontology will admit that this 6-character expression denotes the 4-character-string

    wine

For such a person its use in the language is characterized by

• the objects that it is applicable to, and the object it produces in each case (e.g., strings might be used like vectors, whose application to an integer produces an item of the string).

• the objects that it is amenable to, and the object it yields in each case (e.g., prefixing, appending, selection, etc.).
with "is a member of." Then the properties (g) and (v) please the generals. Conversely, it is a special ease of the
ensures anologies of (c) above. They state that the value thesis underlying ISWlM that any pidgin English that has
of an operator/operand combination depends only on the so far been implemented can be stripped to pidgin algebra.
values of its component subexpressions, not on any other There is nevertheless an important possibility of having
aspects of them. languages that are heuristic on account of their "applica-
Thus conditions (g) and (v) are equivalent to the tive structure" being heuristic.
existence of a dyadic operation among the abstract ob- An important distinction is the one between indicating
jects; we call this operation "application." what behavior, step-by-step, you want the machine to
The terminology of "abstract objects," "denoting" and perform, and merely indicating what outcome you want.
"application" is frequently more convenient than that of P u t that way, the distinction will not stand up to close
equivalence relations. For example, it suggests another investigation. I suggest that the conditions (a-e) in Section
way of characterizing each problem-orientation of ISWlM. 8 are a necessary part of "merely indicating what outcome
We can think of a set of abstract objects with a partially you want." The word "denotative" seems more appro-
defined dyadic "application" operation and a monadic priate than nonproeedural, declarative or functional. The
"designation" operation that associates a "primitive" antithesis of denotative is " i m p e r a t i v e . " Effectively
abstract object with each of some chosen set of names, "denotative" means "can be mapped into ISW~M without
called the "constants" of the special system. using jumping or assignment," given appropriate primi-
Consider for example a programming language that tives.
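A hedged illustration of the contrast, with an example of our own: the comment shows a computation conveyed by explicit sequencing and assignment, while the Haskell below it is denotative in this sense, the value of the whole depending only on the values of its subexpressions.

    -- Imperative:  s := 0;  for i := 1 to n do s := s + i*i;  result s
    sumSquares :: Integer -> Integer
    sumSquares n = sum [i * i | i <- [1 .. n]]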



It follows that functional programming has little to do with functional notation. It is a trivial and pointless task to rearrange some piece of symbolism into prefixed operators and heavy bracketing. It is an intellectually demanding activity to characterize some physical or logical system as a set of entities and functional relations among them. However, it may be less demanding and more revealing than characterizing the system by a conventional program, and it may serve the same purpose. Having formulated the model, a specific desired feature of the system can be systematically expressed in functional notation. But other notations may be better human engineering. So the role of functional notation is a standard by which to describe others, and a standby when they fail.

The phrase "describe in terms of" has been used above with reference to algorithmic modes of expression, i.e., interchangeably with "express in terms of." In this sense "3 + 4" is a description of the number 7 in terms of the numbers 3 and 4. This conflicts with current use of the phrase "descriptive languages," which appears to follow the logicians. For example, a language is descriptive in which the machine is told

    Print the x such that x² - x - 6 = 0 ∧ x > 0

Such a classification of languages (as opposed to merely expressions within languages) is useless, and even harmful by encouraging stupidly restrictive language design, if it excludes the following:

    Print square (the x such that x² - x - 6 = 0 ∧ x > 0)
    Print u(u+1)
      where u = the x such that x² - x - 6 = 0 ∧ x ≥ 0
    Print f(1, -1, -6)
      where f(a, b, c) = the x such that ax² + bx + c = 0 ∧ x ≥ 0

On the other hand it might reasonably exclude

    Print solepositivezeroof(1, -1, -6)

where solepositivezeroof happens to be a library function.

The author therefore suggests that there is a useful distinction that can be made here concerning languages. Consider the function i, which operates on a class (or property) having a sole member (or instance), and transforms it into its sole member. We are interested in whether or not a language permits reference to i, with more or less restricted domain.

For example the above programs become:

    Print i(p where p(x) = x² - x - 6 = 0 ∧ x > 0)
    Print square (i(p where p(x) = x² - x - 6 = 0 ∧ x > 0))
    Print u(u+1)
      where u = i(p where p(x) = x² - x - 6 = 0 ∧ x > 0)
    Print f(1, -1, -6)
      where f(a, b, c) = i(p where p(x) = ax² + bx + c = 0 ∧ x > 0)

More precisely, the distinction hinges on whether, when "applicative structure" is imputed to the language, it can be done without resorting to i, or to primitives in terms of which i can be defined.

This discussion of i reveals the possibility that primitives might be sensationally nonalgorithmic. So the algorithmic/heuristic distinction cuts across the denotative/imperative (i.e., nonprocedural/procedural) distinction. On the other hand if limited forms of i can be algorithmized, they still deserve the term "descriptive." So this factor is also independent.

10. Eliminating Explicit Sequencing

There is a game sometimes played with ALGOL 60 programs--rewriting them so as to avoid using labels and go to statements. It is part of a more embracing game--reducing the extent to which the program conveys its information by explicit sequencing. Roughly speaking this amounts to using fewer and larger statements. The game's significance lies in that it frequently produces a more "transparent" program--easier to understand, debug, modify and incorporate into a larger program.

The author does not argue the case against explicit sequencing here. Instead he takes as point of departure the observation that the user of any programming language is frequently presented with a choice between using explicit sequencing or some alternative feature of the language. Furthermore languages vary greatly in the alternatives they offer. For example, our game is greatly facilitated by ALGOL 60's conditional statements and conditional expressions. So the question considered here is: What other such features are there? This question is considered because, not surprisingly, it turns out that an emphasis on describing things in terms of other things leads to the same kind of requirements as an emphasis against explicit sequencing.

Though ALGOL 60 is comparatively favorable to this activity, it shares with most other current languages certain deficiencies that severely limit how far the game can go. The author's experiments suggest that two of the needed features are the following (see the sketch after this list):

• Treat a listing of expressions as a special case of the class of expressions, especially in the arms of a conditional expression, and in defining a function.

• Treat argument lists as a special case of lists. So a triadic function can have its arguments supplied by a conditional whose arms are 3-listings, or by application of a function that produces a 3-list. A similar situation arises when a 3-listing occurs as a definee. (Even LISP trips up here, over lists of length one.)
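A hedged sketch of the second feature in Haskell, with names of our own: an argument list can be treated as a genuine value, so a triadic function may receive its three arguments either from an arm of a conditional or from a function that builds the 3-list.

    g :: (Int, Int, Int) -> Int
    g (x, y, z) = x * y + z

    triple :: Int -> (Int, Int, Int)
    triple t = (t, t + 1, t + 2)

    use :: Int -> Int -> Int
    use a b = g (if a > b then (a, b, 1) else (b, a, -1)) + g (triple a)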
To clarify their practical use, here are some of the steps by which many a conventional ALGOL 60 or PL/I program can be transformed into an ISWIM program that exploits ISWIM's nonimperative features.

(1) Rewrite the program so as to use two-dimensional layout and arrows to illuminate the explicit sequencing, i.e., as a flowchart with algebraic steps. Rearrange this to achieve the least confusing network of arrows.

(2) Apply the following changes repeatedly wherever they are applicable:

(a) Replace a string of independent assignments by one multiple assignment.

(b) Replace an assignment having purely local significance by a where-clause.

(c) Replace procedures by type-procedures (possibly



with multiple type), and procedure statements by assignment statements.

(d) Replace conditional jumps by conditional statements having bigger arms.

(e) Replace a branch whose arms have assignees in common by an assignment with conditional right-hand side.

(f) Replace a join by two calls for a procedure.
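Steps (b) and (e) in miniature, as a hedged illustration of our own rather than the paper's: a local assignment becomes a where-clause, and a branch whose arms assign to the same variable becomes a conditional right-hand side.

    -- Before:  if x > 0 then y := x else y := -x;  t := y*y;  r := t+1
    r :: Double -> Double
    r x = t + 1
      where
        t = y * y
        y = if x > 0 then x else negate x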
It should be observed that translating into ISWIM does not force such rearrangements; it merely facilitates them. One interesting observation is that the most recalcitrant uses of explicit sequencing appear to be associated with success/failure situations and the action needed on failure.

Section 2 discussed adding 'where' to a conventional programming language. Theory and experiment both support the opposite approach, that taken in LISP, of adding imperative features to a basically nonimperative language. One big advantage is that the resulting language will have a nonimperative subset.

The special claim of ISWIM is that it grafts procedural notions onto a purely functional base without disturbing many of the desirable properties. The underlying ideas have been presented in [2]. This paper can do no more than begin the task of explaining their practical significance.

11. Conclusion

The languages people use to communicate with computers differ in their intended aptitudes, towards either a particular application area, or a particular phase of computer use (high level programming, program assembly, job scheduling, etc). They also differ in physical appearance, and more important, in logical structure. The question arises, do the idiosyncrasies reflect basic logical properties of the situations that are being catered for? Or are they accidents of history and personal background that may be obscuring fruitful developments? This question is clearly important if we are trying to predict or influence language evolution.

To answer it we must think in terms, not of languages, but of families of languages. That is to say we must systematize their design so that a new language is a point chosen from a well-mapped space, rather than a laboriously devised construction.

To this end the above paper has marshalled three techniques of language design: abstract syntax, axiomatization, and an underlying abstract machine.

It is assumed that future calls on language development cannot be forestalled without generalizing the alternatives to explicit sequencing. The innovations of "program-points" and the "off-side rule" are directed at two of the problems (respectively in the areas of semantics and syntax) that must consequently be faced.

Acknowledgments. The author is grateful for helpful discussions with W. H. Burge. Wider influences on the investigation of which this paper is one outcome are mentioned in [1]. Of these the main ones are the publications of Curry and of McCarthy.

REFERENCES
1. Landin, P. J. The mechanical evaluation of expressions. Comput. J. 6, 4 (Jan. 1964), 308-320.
2. ——. A correspondence between ALGOL 60 and Church's lambda-notation. Comm. ACM 8 (1965), 89-101; 158-165.
3. ——. A formal description of ALGOL 60. In Formal Language Description Languages for Computer Programming, T. B. Steel, Jr. (Ed.), North Holland, Amsterdam, 1965.
4. ——. An abstract machine for designers of computing languages. (Summary). IFIP 65 Proc., Part II.

DISCUSSION

Naur: Regarding indentation, in many ways I am in sympathy with this, but I believe that if it came about that this notation were used for very wide communication and also publication, you would regret it because of the kind of rearrangement of manuscripts done in printing, for example. You very frequently run into the problem that you have a wide written line and then suddenly you go to the Communications of the ACM and radically, perhaps, you have to compress it. The printer will do this in any way he likes; he is used to having great freedom here and he will foul up your notation.

Landin: I have great experience with this. (Laughter) I think I am probably the only person who has run through three versions of the galley proofs for the Communications of the ACM. However, I think that next time I could do better, and I think it is worth looking into. At any rate, the principle that I have described here is a good deal better than some that one might think of; for example it does not depend on details of character width, character by character--it is just as good handwritten as it is printed. Secondly, limiting the breadth of the page, I agree with you, needs more consideration. By the time I got through with the particular example I am talking about, by getting it printed, I had devised what I thought was a fairly reasonable method of communicating the principles that have been used in indentation.

Floyd: Another objection that I think is quite serious to indentation is that while it works on the micro-scale--that is, one page is all right--when dealing with an extensive program, turning from one page to the next there is no obvious way of indicating how far indentation stretches because there is no printing at all to indicate how far you have indented. I would like you to keep that in mind.

Landin: Yes, I agree. In practice I deal with this by first making the page breaks in sensible places.

Floyd: That's all right as long as you don't have an indented region which is simply several pages long.

Landin: Well in that case the way I did it was to cut down the number of carryover levels to about four or five from one page to another. You can at least make it simpler when you are handwriting by putting some kind of symbols at the bottom of the page and top of the continuation.

Floyd: Even if you regard your indentation spaces as characters there still doesn't seem to be any way--in fact, I am fairly sure there is no way--of representing the indentation conventions within a phrase-structure grammar.

Landin: Yes, but some indentation conventions can be kept within phrase structure grammars by introducing two terminal symbols that are grammatically like parentheses, but are textually like typewriter keys for setting and clearing tabulation positions. More precisely, the textual representation of the second of these symbols can be explained as the following sequence of typewriter actions: 1) line-feed; 2) back-space as far as the right-most tab position that is still currently active; 3) clear tab position; and 4) do step 2 again.

While this fits some indentation conventions, the one I propose is too permissive to be included. For my language I have written a formal grammar that is not phrase structure and includes one departure that meets this problem.
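Landin's paired tab-setting and tab-clearing symbols are, in effect, the INDENT/DEDENT tokens of later layout-sensitive languages. A hedged sketch of the idea, ours rather than his: a preprocessor that turns changes of indentation into explicit bracket characters, after which an ordinary phrase-structure grammar suffices.

    -- One '{' for each new, deeper tab position; one '}' for each
    -- position cleared on return to a shallower column. The base
    -- column 0 stays on the stack throughout.
    layout :: [(Int, String)] -> String     -- (indent, text) per line
    layout = go [0]
      where
        go stack [] = replicate (length stack - 1) '}'
        go stack@(top : rest) lls@((i, s) : ls)
          | i > top   = '{' : s ++ "\n" ++ go (i : stack) ls
          | i < top   = '}' : go rest lls
          | otherwise = s ++ "\n" ++ go stack ls
        go [] _ = ""                        -- unreachable: 0 is never cleared

For instance, layout [(0,"a"), (2,"b"), (2,"c"), (0,"d")] yields "a\n{b\nc\n}d\n".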



Leavenworth: I should like to raise the question of eliminating explicit jumps, I mean of using recursion as against iteration.

Landin: It seems to me that there are rather a small number of functions which you could use if you were writing a Lisp program in the places where ordinary programs would use iterations, and that if you were to use these the processor might do as well as if you had written a loop. For example, iterate(m, f, x) might apply f, m times to x with the result f^m(x). This is the simplest kind of loop I know and the function iterate provides a purely functional notation for this rather simple kind of loop. If a lot of familiar types of loop can be represented by a few functions which could be defined recursively, I think it is sensible to take these as primitive. Another such function is while(p, f, x) which goes on applying f to x until the predicate p becomes false.

Strachey: I must just interpolate here something which is a bit of advertising I suppose. Nearly all the linguistic features, such as where and while and and and recursive, that Peter Landin has been talking about are incorporated as an integral part of a programming language being developed at Cambridge and London called CPL. In fact the where clauses are a very important feature of this language.

Irons: I have put together a program which uses some of these features and which has a standard output which prints the program in an indented manner. If it runs off the right end of the page, it produces another page to go on the right, and so forth. While certainly there are some situations that occur when it would be a bit awkward to make the paper go around the room, I have found that in practice, by and large it is true that this is a very profitable way of operating.

Strachey: I should like to intervene now and try to initiate a slightly more general discussion on declarative or descriptive languages and to try to clear up some points about which there is considerable confusion. I have called the objects I am trying to discuss DLs because I don't quite know what they are. Here are some questions concerning DLs: (1) What are DLs? (2) What is their relationship to imperative languages? (3) Why do we need DLs? (4) How can we use them to program? (5) How can we implement them? (6) How can we do this efficiently? (7) Should we mix DLs with imperative languages?

It seems to me that what I mean by DLs is not exactly what other people mean. I mean, roughly, languages which do not contain assignment statements or jumps. This is, as a matter of fact, not a very clear distinction because you can always disguise the assignments and the jumps, for that matter, inside other statement forms which make them look different. The important characteristic of DLs is that it is possible to produce equivalence relations, particularly the rule for substitution which Peter Landin describes as (β) in his paper. That equivalence relation, which appears to be essential in almost every proof, does not hold if you allow assignment statements. The great advantage then of DLs is that they give you some hope of proving the equivalence of program transformations and to begin to have a calculus for combining and manipulating them, which at the moment we haven't got.

I suggest that an answer to the second question is that DLs form a subset of all languages. They are an interesting subset, but one which is inconvenient to use unless you are used to it. We need them because at the moment we don't know how to construct proofs with languages which include imperatives and jumps.

How should we use them to program? I think this is a matter of learning a new programming technique. I am not convinced that all problems are amenable to programming in DLs but I am not convinced that there are any which are not either; I preserve an open mind on this point. It is perfectly true that in the process of rewriting programs to avoid labels and jumps, you've gone half the way towards going into DLs. When you have also avoided assignment statements, you've gone the rest of the way. With many problems you can, in fact, go the whole way. LISP has no assignment statements and it is remarkable what you can do with pure LISP if you try. If you think of it in terms of the implementations that we know about, the result is generally intolerably inefficient--but then that is where we come to the later questions.

How do we implement them? There have not been many attempts to implement DLs efficiently, I think. Obviously, it can be done fairly straightforwardly by an interpretive method, but this is very slow. Methods which compile a runnable program run into a lot of very interesting problems. It can be done, because DLs are a subset of ordinary programming languages; any programming language which has sufficient capabilities can cope with them. There are problems, however: we need entities whose value is a function--not the application of a function but a function--and these involve some problems.

How to implement efficiently is another very interesting and difficult problem. It means, I think, recognizing certain subsets and transforming them from, say, recursions into loops. This can certainly be done even if they have been written in terms of recursions and not, as Peter Landin suggested, in terms of already transformed functions like iterate or while.

I think the last question, "Should DLs be mixed with imperative languages?", clearly has the answer that they should, because at the moment we don't know how to do everything in pure DLs. If you mix declarative and imperative features like this, you may get an apparently large programming language, but the important thing is that it should be simple and easy to define a function. Any language which by mere chance of the way it is written makes it extremely difficult to write compositions of functions and very easy to write sequences of commands will, of course, in an obvious psychological way, hinder people from using descriptive rather than imperative features. In the long run, I think the effect will delay our understanding of basic similarities, which underlie different sorts of programs and different ways of solving problems.

Smith: As I understand the declarative languages, there has to be a mixture of imperative and descriptive statements or no computation will take place. If I give you a set of simultaneous equations, you may say "yes?", meaning well, what am I supposed to do about it, or you may say "yes", meaning yes I understand, but you don't do anything until I say "now find the values of the variables." In fact, in a well-developed language there is not just one question that I can ask but a number of questions. So, in effect, the declarative statements are like data which you set aside to be used later after I give you the imperatives, of which I had a choice, which get the action.

Strachey: This is a major point of confusion. There are two ideas here and I think we should try to sort them out. If you give a quadratic equation to a machine and then say "print the value of x", this is not the sort of language that I call a DL. I regard it as an implicit language--that is, one where you give the machine the data and then hope that it will be smart enough to solve the problem for you. It is very different from a language such as LISP, where you define a function explicitly and have only one imperative, which says "evaluate this expression and print the result."

Abrahams: I've done a fair amount of programming in LISP, and there is one situation which I feel is symptomatic of the times when you really do want an imperative language. It is a situation that arises if you are planning to do programming in pure LISP and you find that your functions accumulate a large number of arguments. This often happens when you have a number of variables and you are actually going through a process and at each stage of the process you want to change the state of the world a little bit--say, to change one of these variables. So you have the choice of either trying to communicate them all, or trying to do some sort of essentially imperative action that changes one of them. If you try to list all of the transitions from state to state and incorporate them into one function, you'll find that this is not really a very natural kind of function because the natures of the transitions are too different.



Landin: I said in my talk that LISP had not gone quite all the way and I think that this difficulty is connected with going all the way. If we write a function definition where the right-hand side is a listing of expressions such as

    F(x) = E1, E2, E3

then we can say that this function will produce a three-list as its result. If now we have another function G(x, y, z) = E, on some occasion we might have an expression such as G(a², b², c²) and we often feel that we should be able to write G(F(t)), and another example which should be allowed is

    G(a > b → E1, E2, E3 else E4, E5, E6).

I am not quite sure but I think you can get around your problem by treating every function as if it were in fact monadic and had a single argument which was the list structure you are trying to process.

Abrahams: This is a difficulty in other programming languages too; you cannot define a function of an indefinite number of arguments.

Naur: I still don't understand this distinction about an implicit language. Does it mean that whenever you have such a language there is a built-in feature for solving equations?

Abrahams: I think the point is whether you are concerned with the problem or are concerned with the method of solution of the problem.

Ingerman: I suggest that in the situation where you have specified everything that you want to know, though the exact sequence in which you evoke the various operations to cause the solution is left unspecified, then you have something which is effectively a descriptive language; if you have exactly the same pieces of information, surrounded with promises that you will do this and then this, then you have an imperative language. The significant point is that it is not all or nothing but there is a scale and while it is probably pretty simple to go all the way with imperatives, the chances of being able to get all the way descriptive is about zero, but there is a scale and we should recognize this scale.

Smith: I think that there is a confusion between implicit or explicit on the one hand and imperative or declarative on the other. These are two separate distinctions and can occur in all combinations. For instance, an analog computer handles implicit declaratives.

Young: I think it is fairly obvious that you've got to have the ability for sequencing imperatives in any sort of practical language. There are many, many cases in which only a certain sequence of operations will produce the logically correct results. So that we cannot have a purely declarative language, we must have a general purpose one. A possible definition of a declarative language is one in which I can make the statements (a), (b), (c) and (d) and indicate whether I mean these to be taken as a sequence or as a set; that is, must they be performed in a particular order or do I merely mean that so long as they are all performed, they may be performed in any sequence at any time and whenever convenient for efficiency.

Strachey: You can, in fact, impose an ordering on a language which doesn't have the sequencing of commands by nesting the functional applications.

Landin: The point is that when you compound functional expressions you are imposing a partial ordering, and when you decompose this into commands you are unnecessarily giving a lot of information about sequencing.

Strachey: One inconvenient thing about a purely imperative language is that you have to specify far too much sequencing. For example, if you wish to do a matrix multiplication, you have to do n³ multiplications. If you write an ordinary program to do this, you have to specify the exact sequence in which they are all to be done. Actually, it doesn't matter in what order you do the multiplications so long as you add them together in the right groups. Thus the ordinary sort of imperative language imposes much too much sequencing, which makes it very difficult to rearrange if you want to make things more efficient.
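The loop functions Landin sketches in this discussion are one-liners in his language's descendants; a hedged rendering in Haskell (named iterateN here, since the Prelude's iterate instead returns the whole infinite list of iterates):

    -- iterate(m, f, x): apply f m times to x, giving f^m(x).
    iterateN :: Int -> (a -> a) -> a -> a
    iterateN 0 _ x = x
    iterateN m f x = iterateN (m - 1) f (f x)

    -- while(p, f, x): keep applying f to x while the predicate p holds.
    while :: (a -> Bool) -> (a -> a) -> a -> a
    while p f x = if p x then while p f (f x) else x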

Syntax-Directed Interpretation of Classes of Pictures

R. Narasimhan
Tata Institute of Fundamental Research, Bombay, India

A descriptive scheme for classes of pictures based on labeling techniques using parallel processing algorithms was proposed by the author some years ago. Since then much work has been done in applying this to bubble chamber pictures. The parallel processing simulator, originally written for an IBM 7094 system, has now been rewritten for a CDC 3600 system. This paper describes briefly the structure of syntactic descriptive models by considering their specific application to bubble chamber pictures. How the description generated in this phase can be embedded in a larger "conversation" program is explained by means of a certain specific example that has been worked out. A partial generative grammar for "handwritten" English letters is given, as are also a few computer-generated outputs using this grammar and the parallel processing simulator mentioned earlier.

Presented at an ACM Programming Languages and Pragmatics Conference, San Dimas, California, August, 1965.

1. Introduction

Recent active interest in the area of graphic data-based "conversation programs"¹ has pointed up the urgent need for sophisticated picture processing models in a convincing manner. Kirsch [2] has very ably argued that "from the point of view of computer information processing, the important fact about natural language text and pictures is that both have a syntactic structure which is capable of being described to a machine and of being used for purposes of interpreting the information within a data processing system." "The problem of how to describe the syntactic structure of text and pictures and how to use the syntactic description in interpreting the text and pictures" has been tackled in a certain specific way by Kirsch and his co-workers. (For other references, see [9].)

¹ See [9] for a good survey of work accomplished and in progress in this area, as well as in the general field of "English question-answer" programs.



Programming Techniques

S. L. Graham, R. L. Rivest, Editors

Communicating Sequential Processes

C.A.R. Hoare
The Queen's University
Belfast, Northern Ireland

This paper suggests that input and output are basic primitives of programming and that parallel composition of communicating sequential processes is a fundamental program structuring method. When combined with a development of Dijkstra's guarded command, these concepts are surprisingly versatile. Their use is illustrated by sample solutions of a variety of familiar programming exercises.

Key Words and Phrases: programming, programming languages, programming primitives, program structures, parallel programming, concurrency, input, output, guarded commands, nondeterminacy, coroutines, procedures, multiple entries, multiple exits, classes, data representations, recursion, conditional critical regions, monitors, iterative arrays

CR Categories: 4.20, 4.22, 4.32

General permission to make fair use in teaching or research of all or part of this material is granted to individual readers and to nonprofit libraries acting for them provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. To otherwise reprint a figure, table, other substantial excerpt, or the entire work requires specific permission as does republication, or systematic or multiple reproduction.

This research was supported by a Senior Fellowship of the Science Research Council.

Author's present address: Programming Research Group, 45, Banbury Road, Oxford, England.

© 1978 ACM 0001-0782/78/0800-0666 $00.75

1. Introduction

Among the primitive concepts of computer programming, and of the high level languages in which programs are expressed, the action of assignment is familiar and well understood. In fact, any change of the internal state of a machine executing a program can be modeled as an assignment of a new value to some variable part of that machine. However, the operations of input and output, which affect the external environment of a machine, are not nearly so well understood. They are often added to a programming language only as an afterthought.

Among the structuring methods for computer programs, three basic constructs have received widespread recognition and use: A repetitive construct (e.g. the while loop), an alternative construct (e.g. the conditional if..then..else), and normal sequential program composition (often denoted by a semicolon). Less agreement has been reached about the design of other important program structures, and many suggestions have been made: Subroutines (Fortran), procedures (Algol 60 [15]), entries (PL/I), coroutines (UNIX [17]), classes (SIMULA 67 [5]), processes and monitors (Concurrent Pascal [2]), clusters (CLU [13]), forms (ALPHARD [19]), actors (Hewitt [1]).

The traditional stored program digital computer has been designed primarily for deterministic execution of a single sequential program. Where the desire for greater speed has led to the introduction of parallelism, every attempt has been made to disguise this fact from the programmer, either by hardware itself (as in the multiple function units of the CDC 6600) or by the software (as in an I/O control package, or a multiprogrammed operating system). However, developments of processor technology suggest that a multiprocessor machine, constructed from a number of similar self-contained processors (each with its own store), may become more powerful, capacious, reliable, and economical than a machine which is disguised as a monoprocessor.

In order to use such a machine effectively on a single task, the component processors must be able to communicate and to synchronize with each other. Many methods of achieving this have been proposed. A widely adopted method of communication is by inspection and updating of a common store (as in Algol 68 [18], PL/I, and many machine codes). However, this can create severe problems in the construction of correct programs and it may lead to expense (e.g. crossbar switches) and unreliability (e.g. glitches) in some technologies of hardware implementation. A greater variety of methods has been proposed for synchronization: semaphores [6], events (PL/I), conditional critical regions [10], monitors and queues (Concurrent Pascal [2]), and path expressions [3]. Most of these are demonstrably adequate for their purpose, but there is no widely recognized criterion for choosing between them.

This paper makes an ambitious attempt to find a single simple solution to all these problems. The essential proposals are:

(1) Dijkstra's guarded commands [8] are adopted (with a slight change of notation) as sequential control structures, and as the sole means of introducing and controlling nondeterminism.

(2) A parallel command, based on Dijkstra's parbegin [6], specifies concurrent execution of its constituent sequential commands (processes). All the processes start simultaneously, and the parallel command ends only when they are all finished. They may not communicate with each other by updating global variables.

(3) Simple forms of input and output command are introduced. They are used for communication between concurrent processes.
(4) Such communication occurs when one process names another as destination for output and the second process names the first as source for input. In this case, the value to be output is copied from the first process to the second. There is no automatic buffering: In general, an input or output command is delayed until the other process is ready with the corresponding output or input. Such delay is invisible to the delayed process.
(5) Input commands may appear in guards. A guarded command with an input guard is selected for execution only if and when the source named in the input command is ready to execute the corresponding output command. If several input guards of a set of alternatives have ready destinations, only one is selected and the others have no effect; but the choice between them is arbitrary. In an efficient implementation, an output command which has been ready for a long time should be favored; but the definition of a language cannot specify this since the relative speed of execution of the processes is undefined.
(6) A repetitive command may have input guards. If all the sources named by them have terminated, then the repetitive command also terminates.
(7) A simple pattern-matching feature, similar to that of [16], is used to discriminate the structure of an input message, and to access its components in a secure fashion. This feature is used to inhibit input of messages that do not match the specified pattern.

The programs expressed in the proposed language are intended to be implementable both by a conventional machine with a single main store, and by a fixed network of processors connected by input/output channels (although very different optimizations are appropriate in the different cases). It is consequently a rather static language: The text of a program determines a fixed upper bound on the number of processes operating concurrently; there is no recursion and no facility for process-valued variables. In other respects also, the language has been stripped to the barest minimum necessary for explanation of its more novel features.

The concept of a communicating sequential process is shown in Sections 3-5 to provide a method of expressing solutions to many simple programming exercises which have previously been employed to illustrate the use of various proposed programming language features. This suggests that the process may constitute a synthesis of a number of familiar and new programming ideas. The reader is invited to skip the examples which do not interest him.

However, this paper also ignores many serious problems. The most serious is that it fails to suggest any proof method to assist in the development and verification of correct programs. Secondly, it pays no attention to the problems of efficient implementation, which may be particularly serious on a traditional sequential computer. It is probable that a solution to these problems will require (1) imposition of restrictions in the use of the proposed features; (2) reintroduction of distinctive notations for the most common and useful special cases; (3) development of automatic optimization techniques; and (4) the design of appropriate hardware.

Thus the concepts and notations introduced in this paper (although described in the next section in the form of a programming language fragment) should not be regarded as suitable for use as a programming language, either for abstract or for concrete programming. They are at best only a partial solution to the problems tackled. Further discussion of these and other points will be found in Section 7.

2. Concepts and Notations

The style of the following description is borrowed from Algol 60 [15]. Types, declarations, and expressions have not been treated; in the examples, a Pascal-like notation [20] has usually been adopted. The curly braces { } have been introduced into BNF to denote none or more repetitions of the enclosed material. (Sentences in parentheses refer to an implementation: they are not strictly part of a language definition.)

<command> ::= <simple command> | <structured command>
<simple command> ::= <null command> | <assignment command> | <input command> | <output command>
<structured command> ::= <alternative command> | <repetitive command> | <parallel command>
<null command> ::= skip
<command list> ::= {<declaration>; | <command>;} <command>

A command specifies the behavior of a device executing the command. It may succeed or fail. Execution of a simple command, if successful, may have an effect on the internal state of the executing device (in the case of assignment), or on its external environment (in the case of output), or on both (in the case of input). Execution of a structured command involves execution of some or all of its constituent commands, and if any of these fail, so does the structured command. (In this case, whenever possible, an implementation should provide some kind of comprehensible error diagnostic message.)

A null command has no effect and never fails.

A command list specifies sequential execution of its constituent commands in the order written. Each declaration introduces a fresh variable with a scope which extends from its declaration to the end of the command list.

2.1 Parallel Commands

<parallel command> ::= [<process> {||<process>}]
<process> ::= <process label> <command list>
<process label> ::= <empty> | <identifier> :: | <identifier>(<label subscript>{,<label subscript>}) ::
<label subscript> ::= <integer constant> | <range>
<integer constant> ::= <numeral> | <bound variable>
<bound variable> ::= <identifier>
<range> ::= <bound variable>:<lower bound>..<upper bound>
<lower bound> ::= <integer constant>
<upper bound> ::= <integer constant>

Each process of a parallel command must be disjoint from every other process of the command, in the sense that it does not mention any variable which occurs as a target variable (see Sections 2.2 and 2.3) in any other process.

A process label without subscripts, or one whose label subscripts are all integer constants, serves as a name for the command list to which it is prefixed; its scope extends over the whole of the parallel command. A process whose label subscripts include one or more ranges stands for a series of processes, each with the same label and command list, except that each has a different combination of values substituted for the bound variables. These values range between the lower bound and the upper bound inclusive. For example, X(i:1..n) :: CL stands for

X(1) :: CL1 || X(2) :: CL2 || ... || X(n) :: CLn

where each CLj is formed from CL by replacing every occurrence of the bound variable i by the numeral j. After all such expansions, each process label in a parallel command must occur only once and the processes must be well formed and disjoint.

A parallel command specifies concurrent execution of its constituent processes. They all start simultaneously and the parallel command terminates successfully only if and when they have all successfully terminated. The relative speed with which they are executed is arbitrary.

Examples:

(1) [cardreader?cardimage || lineprinter!lineimage]

Performs the two constituent commands in parallel, and terminates only when both operations are complete. The time taken may be as low as the longer of the times taken by each constituent process, i.e. the sum of its computing, waiting, and transfer times.

(2) [west :: DISASSEMBLE || X :: SQUASH || east :: ASSEMBLE]

The three processes have the names "west," "X," and "east." The capitalized words stand for command lists which will be defined in later examples.

(3) [room :: ROOM || fork(i:0..4) :: FORK || phil(i:0..4) :: PHIL]

There are eleven processes. The behavior of "room" is specified by the command list ROOM. The behavior of the five processes fork(0), fork(1), fork(2), fork(3), fork(4), is specified by the command list FORK, within which the bound variable i indicates the identity of the particular fork. Similar remarks apply to the five processes PHIL.

2.2 Assignment Commands

<assignment command> ::= <target variable> := <expression>
<expression> ::= <simple expression> | <structured expression>
<structured expression> ::= <constructor>(<expression list>)
<constructor> ::= <identifier> | <empty>
<expression list> ::= <empty> | <expression>{,<expression>}
<target variable> ::= <simple variable> | <structured target>
<structured target> ::= <constructor>(<target variable list>)
<target variable list> ::= <empty> | <target variable>{,<target variable>}

An expression denotes a value which is computed by an executing device by application of its constituent operators to the specified operands. The value of an expression is undefined if any of these operations are undefined. The value denoted by a simple expression may be simple or structured. The value denoted by a structured expression is structured; its constructor is that of the expression, and its components are the list of values denoted by the constituent expressions of the expression list.

An assignment command specifies evaluation of its expression, and assignment of the denoted value to the target variable. A simple target variable may have assigned to it a simple or a structured value. A structured target variable may have assigned to it a structured value, with the same constructor. The effect of such assignment is to assign to each constituent simpler variable of the structured target the value of the corresponding component of the structured value. Consequently, the value denoted by the target variable, if evaluated after a successful assignment, is the same as the value denoted by the expression, as evaluated before the assignment.

An assignment fails if the value of its expression is undefined, or if that value does not match the target variable, in the following sense: A simple target variable matches any value of its type. A structured target variable matches a structured value, provided that: (1) they have the same constructor, (2) the target variable list is the same length as the list of components of the value, (3) each target variable of the list matches the corresponding component of the value list. A structured value with no components is known as a "signal."

Examples:

(1) x := x + 1   the value of x after the assignment is the same as the value of x + 1 before.
(2) (x, y) := (y, x)   exchanges the values of x and y.
(3) x := cons(left, right)   constructs a structured value and assigns it to x.
(4) cons(left, right) := x   fails if x does not have the form cons(y, z); but if it does, then y is assigned to left, and z is assigned to right.
(5) insert(n) := insert(2*x + 1)   equivalent to n := 2*x + 1.
(6) c := P()   assigns to c a "signal" with constructor P, and no components.
(7) P() := c   fails if the value of c is not P(); otherwise has no effect.
(8) insert(n) := has(n)   fails, due to mismatch.

Note: Successful execution of both (3) and (4) ensures the truth of the postcondition x = cons(left, right); but (3) does so by changing x and (4) does so by changing left and right. Example (4) will fail if there is no value of left and right which satisfies the postcondition.

2.3 Input and Output Commands

<input command> ::= <source>?<target variable>
<output command> ::= <destination>!<expression>
<source> ::= <process name>

<destination> ::= <process name>
<process name> ::= <identifier> | <identifier>(<subscripts>)
<subscripts> ::= <integer expression>{,<integer expression>}

Input and output commands specify communication between two concurrently operating sequential processes. Such a process may be implemented in hardware as a special-purpose device (e.g. cardreader or lineprinter), or its behavior may be specified by one of the constituent processes of a parallel command. Communication occurs between two processes of a parallel command whenever (1) an input command in one process specifies as its source the process name of the other process; (2) an output command in the other process specifies as its destination the process name of the first process; and (3) the target variable of the input command matches the value denoted by the expression of the output command. On these conditions, the input and output commands are said to correspond. Commands which correspond are executed simultaneously, and their combined effect is to assign the value of the expression of the output command to the target variable of the input command.

An input command fails if its source is terminated. An output command fails if its destination is terminated or if its expression is undefined.

(The requirement of synchronization of input and output commands means that an implementation will have to delay whichever of the two commands happens to be ready first. The delay is ended when the corresponding command in the other process is also ready, or when the other process terminates. In the latter case the first command fails. It is also possible that the delay will never be ended, for example, if a group of processes are attempting communication but none of their input and output commands correspond with each other. This form of failure is known as a deadlock.)

Examples:

(1) cardreader?cardimage   from cardreader, read a card and assign its value (an array of characters) to the variable cardimage
(2) lineprinter!lineimage   to lineprinter, send the value of lineimage for printing
(3) X?(x, y)   from process named X, input a pair of values and assign them to x and y
(4) DIV!(3*a + b, 13)   to process DIV, output the two specified values.

Note: If a process named DIV issues command (3), and a process named X issues command (4), these are executed simultaneously, and have the same effect as the assignment (x, y) := (3*a + b, 13) (i.e. x := 3*a + b; y := 13).

(5) console(i)?c   from the ith element of an array of consoles, input a value and assign it to c
(6) console(j - 1)!"A"   to the (j - 1)th console, output character "A"
(7) X(i)?V()   from the ith of an array of processes X, input a signal V(); refuse to input any other signal
(8) sem!P()   to sem output a signal P()

2.4 Alternative and Repetitive Commands

<repetitive command> ::= *<alternative command>
<alternative command> ::= [<guarded command> {□<guarded command>}]
<guarded command> ::= <guard> → <command list> | (<range>{,<range>})<guard> → <command list>
<guard> ::= <guard list> | <guard list>;<input command> | <input command>
<guard list> ::= <guard element>{;<guard element>}
<guard element> ::= <boolean expression> | <declaration>

A guarded command with one or more ranges stands for a series of guarded commands, each with the same guard and command list, except that each has a different combination of values substituted for the bound variables. The values range between the lower bound and upper bound inclusive. For example, (i:1..n)G → CL stands for

G1 → CL1 □ G2 → CL2 □ ... □ Gn → CLn

where each Gj → CLj is formed from G → CL by replacing every occurrence of the bound variable i by the numeral j.

A guarded command is executed only if and when the execution of its guard does not fail. First its guard is executed and then its command list. A guard is executed by execution of its constituent elements from left to right. A Boolean expression is evaluated: If it denotes false, the guard fails; but an expression that denotes true has no effect. A declaration introduces a fresh variable with a scope that extends from the declaration to the end of the guarded command. An input command at the end of a guard is executed only if and when a corresponding output command is executed. (An implementation may test whether a guard fails simply by trying to execute it, and discontinuing execution if and when it fails. This is valid because such a discontinued execution has no effect on the state of the executing device.)

An alternative command specifies execution of exactly one of its constituent guarded commands. Consequently, if all guards fail, the alternative command fails. Otherwise an arbitrary one with successfully executable guard is selected and executed. (An implementation should take advantage of its freedom of selection to ensure efficient execution and good response. For example, when input commands appear as guards, the command which corresponds to the earliest ready and matching output command should in general be preferred; and certainly, no executable and ready output command should be passed over unreasonably often.)

A repetitive command specifies as many iterations as possible of its constituent alternative command. Consequently, when all guards fail, the repetitive command terminates with no effect. Otherwise, the alternative command is executed once and then the whole repetitive command is executed again.
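Go's select statement is a close modern relative of the alternative command with input guards: of the communications that are ready, exactly one is chosen, and the choice among several ready ones is arbitrary. The sketch below is our analogue, not part of the paper; it assumes that termination of a source is modeled by closing a channel, and it disables a "failed" guard by setting its channel to nil, since a nil channel never communicates.

    package main

    import "fmt"

    // Rough analogue of *[X?v → ... □ Y?w → ...]: repeatedly accept from
    // whichever source is ready, and terminate once both sources have
    // terminated (here, once both channels are closed).
    func main() {
        x := make(chan int)
        y := make(chan int)
        go func() { x <- 1; close(x) }()
        go func() { y <- 2; close(y) }()
        for x != nil || y != nil {
            select {
            case v, ok := <-x:
                if !ok { // source X terminated: this guard now fails
                    x = nil
                    continue
                }
                fmt.Println("from X:", v)
            case w, ok := <-y:
                if !ok { // source Y terminated: this guard now fails
                    y = nil
                    continue
                }
                fmt.Println("from Y:", w)
            }
        }
    }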
(Consider a repetitive command when all its true guard lists end in an input guard. Such a command may have to be delayed until either (1) an output command corresponding to one of the input guards becomes ready, or (2) all the sources named by the input guards have terminated. In case (2), the repetitive command terminates. If neither event ever occurs, the process fails (in deadlock).)

Examples:

(1) [x ≥ y → m := x □ y ≥ x → m := y]

If x ≥ y, assign x to m; if y ≥ x assign y to m; if both x ≥ y and y ≥ x, either assignment can be executed.

(2) i := 0; *[i < size; content(i) ≠ n → i := i + 1]

The repetitive command scans the elements content(i), for i = 0, 1, ..., until either i ≥ size, or a value equal to n is found.

(3) *[c:character; west?c → east!c]

This reads all the characters output by west, and outputs them one by one to east. The repetition terminates when the process west terminates.

(4) *[(i:1..10)continue(i); console(i)?c → X!(i, c); console(i)!ack(); continue(i) := (c ≠ sign off)]

This command inputs repeatedly from any of ten consoles, provided that the corresponding element of the Boolean array continue is true. The bound variable i identifies the originating console. Its value, together with the character just input, is output to X, and an acknowledgment signal is sent back to the originating console. If the character indicated "sign off," continue(i) is set false, to prevent further input from that console. The repetitive command terminates when all ten elements of continue are false. (An implementation should ensure that no console which is ready to provide input will be ignored unreasonably often.)

(5) *[n:integer; X?insert(n) → INSERT
    □ n:integer; X?has(n) → SEARCH; X!(i < size)
    ]

(Here, and elsewhere, capitalized words INSERT and SEARCH stand as abbreviations for program text defined separately.)

On each iteration this command accepts from X either (a) a request to "insert(n)," (followed by INSERT) or (b) a question "has(n)," to which it outputs an answer back to X. The choice between (a) and (b) is made by the next output command in X. The repetitive command terminates when X does. If X sends a nonmatching message, deadlock will result.

(6) *[X?V() → val := val + 1
    □ val > 0; Y?P() → val := val - 1
    ]

On each iteration, accept either a V() signal from X and increment val, or a P() signal from Y, and decrement val. But the second alternative cannot be selected unless val is positive (after which val will remain invariantly nonnegative). (When val > 0, the choice depends on the relative speeds of X and Y, and is not determined.) The repetitive command will terminate when both X and Y are terminated, or when X is terminated and val ≤ 0.

3. Coroutines

In parallel programming coroutines appear as a more fundamental program structure than subroutines, which can be regarded as a special case (treated in the next section).

3.1 COPY
Problem: Write a process X to copy characters output by process west to process east.
Solution:

X :: *[c:character; west?c → east!c]

Notes: (1) When west terminates, the input "west?c" will fail, causing termination of the repetitive command, and of process X. Any subsequent input command from east will fail. (2) Process X acts as a single-character buffer between west and east. It permits west to work on production of the next character, before east is ready to input the previous one.

3.2 SQUASH
Problem: Adapt the previous program to replace every pair of consecutive asterisks "**" by an upward arrow "↑". Assume that the final character input is not an asterisk.
Solution:

X :: *[c:character; west?c →
    [c ≠ asterisk → east!c
    □ c = asterisk → west?c;
        [c ≠ asterisk → east!asterisk; east!c
        □ c = asterisk → east!upward arrow
    ]   ]   ]

Notes: (1) Since west does not end with asterisk, the second "west?c" will not fail. (2) As an exercise, adapt this process to deal sensibly with input which ends with an odd number of asterisks.

3.3 DISASSEMBLE
Problem: to read cards from a cardfile and output to process X the stream of characters they contain. An extra space should be inserted at the end of each card.
Solution:

*[cardimage:(1..80)character; cardfile?cardimage →
    i:integer; i := 1;
    *[i ≤ 80 → X!cardimage(i); i := i + 1];
    X!space
]

Notes: (1) "(1..80)character" declares an array of 80 characters, with subscripts ranging between 1 and 80. (2) The repetitive command terminates when the cardfile process terminates.
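The COPY process of Section 3.1 is, in later terminology, a one-stage pipeline. As a hedged illustration (ours, not the paper's), it transcribes into Go almost line for line, with west and east as channels and the termination of west modeled by closing its channel:

    package main

    import "fmt"

    // COPY: forward each character from west to east, and terminate when
    // west terminates; corresponds to X :: *[c:character; west?c → east!c].
    func copyProc(west <-chan rune, east chan<- rune) {
        for c := range west {
            east <- c
        }
        close(east) // propagate termination downstream
    }

    func main() {
        west := make(chan rune)
        east := make(chan rune)
        go func() {
            for _, c := range "hello" {
                west <- c
            }
            close(west)
        }()
        go copyProc(west, east)
        for c := range east {
            fmt.Printf("%c", c)
        }
        fmt.Println()
    }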

3.4 ASSEMBLE
Problem: To read a stream of characters from process X and print them in lines of 125 characters on a lineprinter. The last line should be completed with spaces if necessary.
Solution:

lineimage:(1..125)character;
i:integer; i := 1;
*[c:character; X?c →
    lineimage(i) := c;
    [i ≤ 124 → i := i + 1
    □ i = 125 → lineprinter!lineimage; i := 1
]   ];
[i = 1 → skip
□ i > 1 → *[i ≤ 125 → lineimage(i) := space; i := i + 1];
    lineprinter!lineimage
]

Note: (1) When X terminates, so will the first repetitive command of this process. The last line will then be printed, if it has any characters.

3.5 Reformat
Problem: Read a sequence of cards of 80 characters each, and print the characters on a lineprinter at 125 characters per line. Every card should be followed by an extra space, and the last line should be completed with spaces if necessary.
Solution:

[west::DISASSEMBLE || X::COPY || east::ASSEMBLE]

Notes: (1) The capitalized names stand for program text defined in previous sections. (2) The parallel command is designed to terminate after the cardfile has terminated. (3) This elementary problem is difficult to solve elegantly without coroutines.

3.6 Conway's Problem [4]
Problem: Adapt the above program to replace every pair of consecutive asterisks by an upward arrow.
Solution:

[west::DISASSEMBLE || X::SQUASH || east::ASSEMBLE]

4. Subroutines and Data Representations

A conventional nonrecursive subroutine can be readily implemented as a coroutine, provided that (1) its parameters are called "by value" and "by result," and (2) it is disjoint from its calling program. Like a Fortran subroutine, a coroutine may retain the values of local variables (own variables, in Algol terms) and it may use input commands to achieve the effect of "multiple entry points" in a safer way than PL/I. Thus a coroutine can be used like a SIMULA class instance as a concrete representation for abstract data.

A coroutine acting as a subroutine is a process operating concurrently with its user process in a parallel command: [subr::SUBROUTINE || X::USER]. The SUBROUTINE will contain (or consist of) a repetitive command: *[X?(value params) → ... ; X!(result params)], where ... computes the results from the values input. The subroutine will terminate when its user does. The USER will call the subroutine by a pair of commands: subr!(arguments); ... ; subr?(results). Any commands between these two will be executed concurrently with the subroutine.

A multiple-entry subroutine, acting as a representation for data [11], will also contain a repetitive command which represents each entry by an alternative input to a structured target with the entry name as constructor. For example,

*[X?entry1(value params) → ...
□ X?entry2(value params) → ...
]

The calling process X will determine which of the alternatives is activated on each repetition. When X terminates, so does this repetitive command. A similar technique in the user program can achieve the effect of multiple exits.

A recursive subroutine can be simulated by an array of processes, one for each level of recursion. The user process is level zero. Each activation communicates its parameters and results with its predecessor and calls its successor if necessary:

[recsub(0)::USER || recsub(i:1..reclimit)::RECSUB].

The user will call the first element of recsub: recsub(1)!(arguments); ... ; recsub(1)?(results);. The imposition of a fixed upper bound on recursion depth is necessitated by the "static" design of the language.

This clumsy simulation of recursion would be even more clumsy for a mutually recursive algorithm. It would not be recommended for conventional programming; it may be more suitable for an array of microprocessors for which the fixed upper bound is also realistic.

In this section, we assume each subroutine is used only by a single user process (which may, of course, itself contain parallel commands).

4.1 Function: Division With Remainder
Problem: Construct a process to represent a function-type subroutine, which accepts a positive dividend and divisor, and returns their integer quotient and remainder. Efficiency is of no concern.
Solution:

[DIV :: *[x,y:integer; X?(x,y) →
    quot,rem:integer; quot := 0; rem := x;
    *[rem ≥ y → rem := rem - y; quot := quot + 1];
    X!(quot,rem)
]
||X::USER
]
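The subroutine-as-process pattern just described, and the DIV example of 4.1, correspond to what would now be called a server goroutine. In the Go sketch below (ours; the struct and channel names are invented for the example), the caller's pair subr!(arguments); ... ; subr?(results) becomes a send followed, possibly much later, by a receive, so the caller and the subroutine do indeed run concurrently in between.

    package main

    import "fmt"

    type divArgs struct{ x, y int }
    type divResults struct{ quot, rem int }

    // DIV as a server process: each request carries a dividend and a
    // divisor; the reply carries quotient and remainder, computed by
    // repeated subtraction exactly as in Section 4.1.
    func main() {
        args := make(chan divArgs)
        results := make(chan divResults)
        go func() {
            for a := range args { // *[X?(x,y) → ... ; X!(quot,rem)]
                quot, rem := 0, a.x
                for rem >= a.y {
                    rem -= a.y
                    quot++
                }
                results <- divResults{quot, rem}
            }
        }()
        args <- divArgs{x: 17, y: 5} // DIV!(17,5)
        // ... other commands here would execute concurrently with DIV ...
        r := <-results // DIV?(quot,rem)
        fmt.Println(r.quot, r.rem) // prints 3 2
    }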

4.2 Recursion: Factorial
Problem: Compute a factorial by the recursive method, to a given limit.
Solution:

[fac(i:1..limit)::
*[n:integer; fac(i - 1)?n →
    [n = 0 → fac(i - 1)!1
    □ n > 0 → fac(i + 1)!n - 1;
        r:integer; fac(i + 1)?r; fac(i - 1)!(n * r)
]   ]
||fac(0)::USER
]

Note: This unrealistic example introduces the technique of the "iterative array" which will be used to better effect in later examples.

4.3 Data Representation: Small Set of Integers [11]
Problem: To represent a set of not more than 100 integers as a process, S, which accepts two kinds of instruction from its calling process X: (1) S!insert(n), insert the integer n in the set, and (2) S!has(n); ... ; S?b, b is set true if n is in the set, and false otherwise. The initial value of the set is empty.
Solution:

S::
content:(0..99)integer; size:integer; size := 0;
*[n:integer; X?has(n) → SEARCH; X!(i < size)
□ n:integer; X?insert(n) → SEARCH;
    [i < size → skip
    □ i = size; size < 100 →
        content(size) := n; size := size + 1
]   ]

where SEARCH is an abbreviation for:

i:integer; i := 0;
*[i < size; content(i) ≠ n → i := i + 1]

Notes: (1) The alternative command with guard "size < 100" will fail if an attempt is made to insert more than 100 elements. (2) The activity of insertion will in general take place concurrently with the calling process. However, any subsequent instruction to S will be delayed until the previous insertion is complete.

4.4 Scanning a Set
Problem: Extend the solution to 4.3 by providing a fast method for scanning all members of the set without changing the value of the set. The user program will contain a repetitive command of the form:

S!scan(); more:boolean; more := true;
*[more; x:integer; S?next(x) → ... deal with x ...
□ more; S?noneleft() → more := false
]

where S!scan() sets the representation into a scanning mode. The repetitive command serves as a for statement, inputting the successive members of x from the set and inspecting them until finally the representation sends a signal that there are no members left. The body of the repetitive command is not permitted to communicate with S in any way.

Solution: Add a third guarded command to the outer repetitive command of S:

... □ X?scan() → i:integer; i := 0;
    *[i < size → X!next(content(i)); i := i + 1];
    X!noneleft()

4.5 Recursive Data Representation: Small Set of Integers
Problem: Same as above, but an array of processes is to be used to achieve a high degree of parallelism. Each process should contain at most one number. When it contains no number, it should answer "false" to all inquiries about membership. On the first insertion, it changes to a second phase of behavior, in which it deals with instructions from its predecessor, passing some of them on to its successor. The calling process will be named S(0). For efficiency, the set should be sorted, i.e. the ith process should contain the ith largest number.
Solution:

S(i:1..100)::
*[n:integer; S(i - 1)?has(n) → S(0)!false
□ n:integer; S(i - 1)?insert(n) →
    *[m:integer; S(i - 1)?has(m) →
        [m ≤ n → S(0)!(m = n)
        □ m > n → S(i + 1)!has(m)
        ]
    □ m:integer; S(i - 1)?insert(m) →
        [m < n → S(i + 1)!insert(n); n := m
        □ m = n → skip
        □ m > n → S(i + 1)!insert(m)
]   ]   ]

Notes: (1) The user process S(0) inquires whether n is a member by the commands S(1)!has(n); ... ; [(i:1..100)S(i)?b → skip]. The appropriate process will respond to the input command by the output command in line 2 or line 5. This trick avoids passing the answer back "up the chain." (2) Many insertion operations can proceed in parallel, yet any subsequent "has" operation will be performed correctly. (3) All repetitive commands and all processes of the array will terminate after the user process S(0) terminates.

4.6 Multiple Exits: Remove the Least Member
Exercise: Extend the above solution to respond to a command to yield the least member of the set and to remove it from the set. The user program will invoke the facility by a pair of commands:

S(1)!least(); [x:integer; S(1)?x → ... deal with x ...
□ S(1)?noneleft() → ...
]

or, if he wishes to scan and empty the set, he may write:

S(1)!least(); more:boolean; more := true;
*[more; x:integer; S(1)?x → ... deal with x ... ; S(1)!least()
□ more; S(1)?noneleft() → more := false
]

Hint: Introduce a Boolean variable, b, initialized to true, and prefix this to all the guards of the inner loop. After responding to a !least() command from its predecessor, each process returns its contained value n, asks its successor for its least, and stores the response in n. But if the successor returns "noneleft()," b is set false and the inner loop terminates. The process therefore returns to its initial state (solution due to David Gries).
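The set S of Section 4.3 is a process that owns private data and serves two kinds of instruction, which is exactly the shape of a state-owning goroutine. In the Go sketch below (ours; the request types and channel names are invented), the alternative command with two input guards becomes a select over two request channels, and the answer to "has" returns on a reply channel carried inside the request.

    package main

    import "fmt"

    type hasReq struct {
        n     int
        reply chan bool
    }

    // A small set of at most 100 integers, represented as a process.
    func set(insert <-chan int, has <-chan hasReq) {
        content := make([]int, 0, 100)
        member := func(n int) bool {
            for _, m := range content {
                if m == n {
                    return true
                }
            }
            return false
        }
        for {
            select {
            case n := <-insert: // X?insert(n)
                if !member(n) && len(content) < 100 {
                    content = append(content, n)
                }
            case r := <-has: // X?has(n); ... ; X!(i < size)
                r.reply <- member(r.n)
            }
        }
    }

    func main() {
        insert := make(chan int)
        has := make(chan hasReq)
        go set(insert, has)
        insert <- 7 // S!insert(7)
        reply := make(chan bool)
        has <- hasReq{n: 7, reply: reply} // S!has(7)
        fmt.Println(<-reply)              // S?b; prints true
    }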
5. Monitors and Scheduling

This section shows how a monitor can be regarded as a single process which communicates with more than one user process. However, each user process must have a different name (e.g. producer, consumer) or a different subscript (e.g. X(i)), and each communication with a user must identify its source or destination uniquely.

Consequently, when a monitor is prepared to communicate with any of its user processes (i.e. whichever of them calls first) it will use a guarded command with a range. For example: *[(i:1..100)X(i)?(value parameters) → ... ; X(i)!(results)]. Here, the bound variable i is used to send the results back to the calling process. If the monitor is not prepared to accept input from some particular user (e.g. X(j)) on a given occasion, the input command may be preceded by a Boolean guard. For example, two successive inputs from the same process are inhibited by j := 0; *[(i:1..100)i ≠ j; X(i)?(values) → ... ; j := i]. Any attempted output from X(j) will be delayed until a subsequent iteration, after the output of some other process X(i) has been accepted and dealt with.

Similarly, conditions can be used to delay acceptance of inputs which would violate scheduling constraints -- postponing them until some later occasion when some other process has brought the monitor into a state in which the input can validly be accepted. This technique is similar to a conditional critical region [10] and it obviates the need for special synchronizing variables such as events, queues, or conditions. However, the absence of these special facilities certainly makes it more difficult or less efficient to solve problems involving priorities -- for example, the scheduling of head movement on a disk.

5.1 Bounded Buffer
Problem: Construct a buffering process X to smooth variations in the speed of output of portions by a producer process and input by a consumer process. The consumer contains pairs of commands X!more(); X?p, and the producer contains commands of the form X!p. The buffer should contain up to ten portions.
Solution:

X::
buffer:(0..9) portion;
in,out:integer; in := 0; out := 0;
comment 0 ≤ out ≤ in ≤ out + 10;
*[in < out + 10; producer?buffer(in mod 10) → in := in + 1
□ out < in; consumer?more() → consumer!buffer(out mod 10);
    out := out + 1
]

Notes: (1) When out < in < out + 10, the selection of the alternative in the repetitive command will depend on whether the producer produces before the consumer consumes, or vice versa. (2) When out = in, the buffer is empty and the second alternative cannot be selected even if the consumer is ready with its command X!more(). However, after the producer has produced its next portion, the consumer's request can be granted on the next iteration. (3) Similar remarks apply to the producer, when in = out + 10. (4) X is designed to terminate when out = in and the producer has terminated.

5.2 Integer Semaphore
Problem: To implement an integer semaphore, S, shared among an array X(i:1..100) of client processes. Each process may increment the semaphore by S!V() or decrement it by S!P(), but the latter command must be delayed if the value of the semaphore is not positive.
Solution:

S::val:integer; val := 0;
*[(i:1..100)X(i)?V() → val := val + 1
□ (i:1..100)val > 0; X(i)?P() → val := val - 1
]

Notes: (1) In this process, no use is made of knowledge of the subscript i of the calling process. (2) The semaphore terminates only when all hundred processes of the process array X have terminated.
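The bounded buffer of Section 5.1 is instructive to transcribe into a modern notation, because its Boolean guards "in < out + 10" and "out < in" have no direct syntax in Go's select. A common idiom, used in the hedged sketch below (ours), is to disable a case for one iteration by setting its channel to nil, since a nil channel never communicates; alternatively, a buffered channel make(chan portion, 10) would give the same external behavior for free.

    package main

    import "fmt"

    // The bounded buffer of Section 5.1: a guard that is false on this
    // iteration is simulated by a nil channel in the select.
    func buffer(producer <-chan int, request <-chan struct{}, deliver chan<- int) {
        var buf [10]int
        in, out := 0, 0 // invariant: 0 <= out <= in <= out+10
        for {
            insert := producer
            if in == out+10 { // buffer full: first alternative disabled
                insert = nil
            }
            serve := request
            if out == in { // buffer empty: second alternative disabled
                serve = nil
            }
            select {
            case p := <-insert: // producer?buffer(in mod 10)
                buf[in%10] = p
                in++
            case <-serve: // consumer?more()
                deliver <- buf[out%10] // consumer!buffer(out mod 10)
                out++
            }
        }
    }

    func main() {
        producer := make(chan int)
        request := make(chan struct{})
        deliver := make(chan int)
        go buffer(producer, request, deliver)
        go func() {
            for i := 0; i < 5; i++ {
                producer <- i
            }
        }()
        for i := 0; i < 5; i++ {
            request <- struct{}{}  // X!more()
            fmt.Println(<-deliver) // X?p
        }
    }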

5.3 Dining Philosophers (Problem due to E.W. Dijkstra)
Problem: Five philosophers spend their lives thinking and eating. The philosophers share a common dining room where there is a circular table surrounded by five chairs, each belonging to one philosopher. In the center of the table there is a large bowl of spaghetti, and the table is laid with five forks (see Figure 1). On feeling hungry, a philosopher enters the dining room, sits in his own chair, and picks up the fork on the left of his place. Unfortunately, the spaghetti is so tangled that he needs to pick up and use the fork on his right as well. When he has finished, he puts down both forks, and leaves the room. The room should keep a count of the number of philosophers in it.

Fig. 1. [Figure: a round table with five philosophers' places and five forks laid between them.]

Solution: The behavior of the ith philosopher may be described as follows:

PHIL = *[... during ith lifetime ... →
    THINK;
    room!enter();
    fork(i)!pickup(); fork((i + 1) mod 5)!pickup();
    EAT;
    fork(i)!putdown(); fork((i + 1) mod 5)!putdown();
    room!exit()
]

The fate of the ith fork is to be picked up and put down by a philosopher sitting on either side of it:

FORK =
*[phil(i)?pickup() → phil(i)?putdown()
□ phil((i - 1) mod 5)?pickup() → phil((i - 1) mod 5)?putdown()
]

The story of the room may be simply told:

ROOM = occupancy:integer; occupancy := 0;
*[(i:0..4)phil(i)?enter() → occupancy := occupancy + 1
□ (i:0..4)phil(i)?exit() → occupancy := occupancy - 1
]

All these components operate in parallel:

[room::ROOM || fork(i:0..4)::FORK || phil(i:0..4)::PHIL].

Notes: (1) The solution given above does not prevent all five philosophers from entering the room, each picking up his left fork, and starving to death because he cannot pick up his right fork. (2) Exercise: Adapt the above program to avert this sad possibility. Hint: Prevent more than four philosophers from entering the room. (Solution due to E. W. Dijkstra).

6. Miscellaneous

This section contains further examples of the use of communicating sequential processes for the solution of some less familiar problems; a parallel version of the sieve of Eratosthenes, and the design of an iterative array. The proposed solutions are even more speculative than those of the previous sections, and in the second example, even the question of termination is ignored.

6.1 Prime Numbers: The Sieve of Eratosthenes [14]
Problem: To print in ascending order all primes less than 10000. Use an array of processes, SIEVE, in which each process inputs a prime from its predecessor and prints it. The process then inputs an ascending stream of numbers from its predecessor and passes them on to its successor, suppressing any that are multiples of the original prime.
Solution:

[SIEVE(i:1..100)::
    p,mp:integer;
    SIEVE(i - 1)?p;
    print!p;
    mp := p; comment mp is a multiple of p;
    *[m:integer; SIEVE(i - 1)?m →
        *[m > mp → mp := mp + p];
        [m = mp → skip
        □ m < mp → SIEVE(i + 1)!m
]   ]   ]
||SIEVE(0)::print!2; n:integer; n := 3;
    *[n < 10000 → SIEVE(1)!n; n := n + 2]
||SIEVE(101)::*[n:integer; SIEVE(100)?n → print!n]
||print::*[(i:0..101) n:integer; SIEVE(i)?n → ...]
]

Note: (1) This beautiful solution was contributed by David Gries. (2) It is algorithmically similar to the program developed in [7, pp. 27-32].

6.2 An Iterative Array: Matrix Multiplication
Problem: A square matrix A of order 3 is given. Three streams are to be input, each stream representing a column of an array IN. Three streams are to be output, each representing a column of the product matrix IN × A. After an initial delay, the results are to be produced at the same rate as the input is consumed. Consequently, a high degree of parallelism is required. The solution should take the form shown in Figure 2. Each of the nine nonborder nodes inputs a vector component from the west and a partial sum from the north. Each node outputs the vector component to its east, and an updated partial sum to the south. The input data is produced by the west border nodes, and the desired results are consumed by south border nodes. The north border is a constant source of zeros and the east border is just a sink. No provision need be made for termination nor for changing the values of the array A.

Fig. 2. [Figure: a square array of nodes; zeros enter from the north border, vector components enter from the west border, and partial sums such as A11x, A11x + A21y, A11x + A21y + A31z accumulate southward toward the south border.]

Solution: There are twenty-one nodes, in five groups, comprising the central square and the four borders:

[M(i:1..3,0)::WEST
||M(0,j:1..3)::NORTH
||M(i:1..3,4)::EAST
||M(4,j:1..3)::SOUTH
||M(i:1..3,j:1..3)::CENTER
]

The WEST and SOUTH borders are processes of the user program; the remaining processes are:

NORTH = *[true → M(1,j)!0]
EAST = *[x:real; M(i,3)?x → skip]
CENTER = *[x:real; M(i,j - 1)?x →
    M(i,j + 1)!x; sum:real;
    M(i - 1,j)?sum; M(i + 1,j)!(A(i,j)*x + sum)
]
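The sieve of Section 6.1 later became a canonical demonstration of channel-based concurrency. Our Go transcription below differs from the paper in one respect worth noticing: because Go processes are created dynamically, no fixed array SIEVE(1..100) is needed, and each stage spawns its successor on demand (here bounded to primes below 100 for brevity).

    package main

    import "fmt"

    // SIEVE(0): generate the odd numbers in ascending order.
    func generate(ch chan<- int) {
        for n := 3; n < 100; n += 2 {
            ch <- n
        }
        close(ch)
    }

    // Each stage inputs a prime from its predecessor, prints it, then
    // passes on the rest of the stream, suppressing multiples of p.
    func sieve(in <-chan int) {
        p, ok := <-in
        if !ok {
            return // predecessor terminated without sending a prime
        }
        fmt.Println(p)
        out := make(chan int)
        done := make(chan struct{})
        go func() {
            sieve(out) // the next element of the "iterative array"
            close(done)
        }()
        for m := range in {
            if m%p != 0 {
                out <- m
            }
        }
        close(out)
        <-done // wait for the rest of the chain before terminating
    }

    func main() {
        fmt.Println(2)
        ch := make(chan int)
        go generate(ch)
        sieve(ch)
    }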
7. Discussion

A design for a programming language must necessarily involve a number of decisions which seem to be fairly arbitrary. The discussion of this section is intended to explain some of the underlying motivation and to mention some unresolved questions.

7.1 Notations
I have chosen single-character notations (e.g. !,?) to express the primitive concepts, rather than the more traditional boldface or underlined English words. As a result, the examples have an APL-like brevity, which some readers find distasteful. My excuse is that (in contrast to APL) there are only a very few primitive concepts and that it is standard practice of mathematics (and also good coding practice) to denote common primitive concepts by brief notations (e.g. +,×). When read aloud, these are replaced by words (e.g. plus, times).

Some readers have suggested the use of assignment notation for input and output:

<target variable> := <source>
<destination> := <expression>

I find this suggestion misleading: it is better to regard input and output as distinct primitives, justifying distinct notations.

I have used the same pair of brackets ([...]) to bracket all program structures, instead of the more familiar variety of brackets (if..fi, begin..end, case...esac, etc.). In this I follow normal mathematical practice, but I must also confess to a distaste for the pronunciation of words like fi, od, or esac.

I am dissatisfied with the fact that my notation gives the same syntax for a structured expression and a subscripted variable. Perhaps tags should be distinguished from other identifiers by a special symbol (say #).

I was tempted to introduce an abbreviation for combined declaration and input, e.g. X?(n:integer) for n:integer; X?n.

7.2 Explicit Naming
My design insists that every input or output command must name its source or destination explicitly. This makes it inconvenient to write a library of processes which can be included in subsequent programs, independent of the process names used in that program. A partial solution to this problem is to allow one process (the main process) of a parallel command to have an empty label, and to allow the other processes in the command to use the empty process name as source or destination of input or output.

For construction of large programs, some more general technique will also be necessary. This should at least permit substitution of program text for names defined elsewhere -- a technique which has been used informally throughout this paper. The Cobol COPY verb also permits a substitution for formal parameters within the copied text. But whatever facility is introduced, I would recommend the following principle: Every program, after assembly with its library routines, should be printable as a text expressed wholly in the language, and it is this printed text which should describe the execution of the program, independent of which parts were drawn from a library.

Since I did not intend to design a complete language, I have ignored the problem of libraries in order to concentrate on the essential semantic concepts of the program which is actually executed.

7.3 Port Names
An alternative to explicit naming of source and destination would be to name a port through which communication is to take place. The port names would be local to the processes, and the manner in which pairs of ports are to be connected by channels could be declared in the head of a parallel command.

This is an attractive alternative which could be designed to introduce a useful degree of syntactically checkable redundancy. But it is semantically equivalent to the present proposal, provided that each port is connected to exactly one other port in another process. In this case each channel can be identified with a tag, together with the name of the process at the other end. Since I wish to concentrate on semantics, I preferred in this paper to use the simplest and most direct notation, and to avoid raising questions about the possibility of connecting more than two ports by a single channel.

7.4 Automatic Buffering
As an alternative to synchronization of input and output, it is often proposed that an outputting process should be allowed to proceed even when the inputting process is not yet ready to accept the output. An implementation would be expected automatically to interpose a chain of buffers to hold output messages that have not yet been input.

I have deliberately rejected this alternative, for two reasons: (1) It is less realistic to implement in multiple disjoint processors, and (2) when buffering is required on a particular channel, it can readily be specified using the given primitives. Of course, it could be argued equally well that synchronization can be specified when required by using a pair of buffered input and output commands.
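Hoare's position in 7.4, that buffering should be programmable from synchronous primitives rather than built in, is visible in Go, which offers both disciplines. The fragment below (ours) contrasts them; an explicit buffer process, as in Section 5.1, remains the way to obtain intermediate policies.

    package main

    import "fmt"

    func main() {
        // Buffered: the sender may run ahead of the receiver by up to ten
        // portions, which is the "automatic buffering" discussed in 7.4.
        c := make(chan int, 10)
        for i := 0; i < 10; i++ {
            c <- i // does not block while the buffer has room
        }
        close(c)
        for v := range c {
            fmt.Println(v)
        }
        // An unbuffered make(chan int) would instead give the synchronized
        // input/output of Section 2.3: every send waits for its receiver.
    }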

7.5 Unbounded Process Activation
The notation for an array of processes permits the same program text (like an Algol recursive procedure) to have many simultaneous "activations"; however, the exact number must be specified in advance. In a conventional single-processor implementation, this can lead to inconvenience and wastefulness, similar to the fixed-length array of Fortran. It would therefore be attractive to allow a process array with no a priori bound on the number of elements; and to specify that the exact number of elements required for a particular execution of the program should be determined dynamically, like the maximum depth of recursion of an Algol procedure or the number of iterations of a repetitive command.

However, it is a good principle that every actual run of a program with unbounded arrays should be identical to the run of some program with all its arrays bounded in advance. Thus the unbounded program should be defined as the "limit" (in some sense) of a series of bounded programs with increasing bounds. I have chosen to concentrate on the semantics of the bounded case -- which is necessary anyway and which is more realistic for implementation on multiple microprocessors.

7.6 Fairness
Consider the parallel command:

[X::Y!stop() || Y::continue:boolean; continue := true;
    *[continue; X?stop() → continue := false
    □ continue → n := n + 1
]   ]

If the implementation always prefers the second alternative in the repetitive command of Y, it is said to be unfair, because although the output command in X could have been executed on an infinite number of occasions, it is in fact always passed over.

The question arises: Should a programming language definition specify that an implementation must be fair? Here, I am fairly sure that the answer is NO. Otherwise, the implementation would be obliged to successfully complete the example program shown above, in spite of the fact that its nondeterminism is unbounded. I would therefore suggest that it is the programmer's responsibility to prove that his program terminates correctly -- without relying on the assumption of fairness in the implementation. Thus the program shown above is incorrect, since its termination cannot be proved.

Nevertheless, I suggest that an efficient implementation should try to be reasonably fair and should ensure that an output command is not delayed unreasonably often after it first becomes executable. But a proof of correctness must not rely on this property of an efficient implementation. Consider the following analogy with a sequential program: An efficient implementation of an alternative command will tend to favor the alternative which can be most efficiently executed, but the programmer must ensure that the logical correctness of his program does not depend on this property of his implementation.

This method of avoiding the problem of fairness does not apply to programs such as operating systems which are intended to run forever because in this case termination proofs are not relevant. But I wonder whether it is ever advisable to write or to execute such programs. Even an operating system should be designed to bring itself to an orderly conclusion reasonably soon after it inputs a message instructing it to do so. Otherwise, the only way to stop it is to "crash" it.

7.7 Functional Coroutines
It is interesting to compare the processes described here with those proposed in [12]; the differences are most striking. There, coroutines are strictly deterministic: No choice is given between alternative sources of input. The output commands are automatically buffered to any required degree. The output of one process can be automatically fanned out to any number of processes (including itself!) which can consume it at differing rates. Finally, the processes there are designed to run forever, whereas my proposed parallel command is normally intended to terminate. The design in [12] is based on an elegant theory which permits proof of the properties of programs. These differences are not accidental -- they seem to be natural consequences of the difference between the more abstract applicative (or functional) approach to programming and the more machine-oriented imperative (or procedural) approach, which is taken by communicating sequential processes.

7.8 Output Guards
Since input commands may appear in guards, it seems more symmetric to permit output commands as well. This would allow an obvious and useful simplification in some of the example programs, for example, in the bounded buffer (5.1). Perhaps a more convincing reason would be to ensure that the externally visible effect and behavior of every parallel command can be modeled by some sequential command. In order to model the parallel command

Z :: [X!2 || Y!3]

we need to be able to write the sequential alternative command:

Z :: [X!2 → Y!3 □ Y!3 → X!2]

Note that this cannot be done by the command

Z :: [true → X!2; Y!3 □ true → Y!3; X!2]

which can fail if the process Z happens to choose the first alternative, but the processes Y and X are synchronized with each other in such a way that Y must input from Z before X does, e.g.

Y :: Z?y; X!go()
||X :: Y?go(); Z?x
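The output guards argued for in 7.8 were eventually adopted in practice: Go's select permits send operations as cases alongside receives, so the command Z :: [X!2 → Y!3 □ Y!3 → X!2] transcribes directly. In the hedged sketch below (our channel names), the main goroutine inputs from Z in the order that defeats the naive "true → ..." version quoted above; with output guards, Z still succeeds.

    package main

    import "fmt"

    func main() {
        x := make(chan int)
        y := make(chan int)
        go func() { // Z :: [X!2 → Y!3 □ Y!3 → X!2]
            select {
            case x <- 2:
                y <- 3
            case y <- 3:
                x <- 2
            }
        }()
        fmt.Println(<-y) // Y inputs from Z first ...
        fmt.Println(<-x) // ... and X inputs afterwards; no deadlock
    }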
7.9 Restriction: Repetitive Command With Input Guard
In proposing an unfamiliar programming language feature, it seems wiser at first to specify a highly restrictive version rather than to propose extensions -- especially when the language feature claims to be primitive. For example, it is clear that the multidimensional process array is not primitive, since it can readily be constructed in a language which permits only single-dimensional arrays. But I have a rather more serious misgiving about the repetitive command with input guards.

The automatic termination of a repetitive command on termination of the sources of all its input guards is an extremely powerful and convenient feature but it also involves some subtlety of specification to ensure that it is implementable; and it is certainly not primitive, since the required effect can be achieved (with considerable inconvenience) by explicit exchange of "end()" signals. For example, the subroutine DIV (4.1) could be rewritten:

[DIV :: continue:boolean; continue := true;
    *[continue; X?end() → continue := false
    □ continue; x,y:integer; X?(x,y) → ... ; X!(quot,rem)
    ]
||X :: USER PROG; DIV!end()
]

Other examples would be even more inconvenient.

But the dangers of convenient facilities are notorious. For example, the repetitive commands with input guards may tempt the programmer to write them without making adequate plans for their termination; and if it turns out that the automatic termination is unsatisfactory, reprogramming for explicit termination will involve severe changes, affecting even the interfaces between the processes.

8. Conclusion

This paper has suggested that input, output, and concurrency should be regarded as primitives of programming, which underlie many familiar and less familiar programming concepts. However, it would be unjustified to conclude that these primitives can wholly replace the other concepts in a programming language. Where a more elaborate construction (such as a procedure or a monitor) is frequently useful, has properties which are more simply provable, and can also be implemented more efficiently than the general case, there is a strong reason for including in a programming language a special notation for that construction. The fact that the construction can be defined in terms of simpler underlying primitives is a useful guarantee that its inclusion is logically consistent with the remainder of the language.

Acknowledgments. The research reported in this paper has been encouraged and supported by a Senior Fellowship of the Science Research Council of Great Britain. The technical inspiration was due to Edsger W. Dijkstra [9], and the paper has been improved in presentation and content by valuable and painstaking advice from D. Gries, D. Q. M. Fay, Edsger W. Dijkstra, N. Wirth, Robert Milne, M. K. Harper, and its referees. The role of IFIP W.G.2.3 as a forum for presentation and discussion is acknowledged with pleasure and gratitude.

Received March 1977; revised August 1977

References
1. Atkinson, R., and Hewitt, C. Synchronisation in actor systems. Working Paper 83, M.I.T., Cambridge, Mass., Nov. 1976.
2. Brinch Hansen, P. The programming language Concurrent Pascal. IEEE Trans. Software Eng. 1, 2 (June 1975), 199-207.
3. Campbell, R.H., and Habermann, A.N. The specification of process synchronisation by path expressions. Lecture Notes in Computer Science 16, Springer, 1974, pp. 89-102.
4. Conway, M.E. Design of a separable transition-diagram compiler. Comm. ACM 6, 7 (July 1963), 396-408.
5. Dahl, O-J., et al. SIMULA 67, common base language. Norwegian Computing Centre, Forskningveien, Oslo, 1967.
6. Dijkstra, E.W. Co-operating sequential processes. In Programming Languages, F. Genuys, Ed., Academic Press, New York, 1968, pp. 43-112.
7. Dijkstra, E.W. Notes on structured programming. In Structured Programming, Academic Press, New York, 1972, pp. 1-82.
8. Dijkstra, E.W. Guarded commands, nondeterminacy, and formal derivation of programs. Comm. ACM 18, 8 (Aug. 1975), 453-457.
9. Dijkstra, E.W. Verbal communication, Marktoberdorf, Aug. 1975.
10. Hoare, C.A.R. Towards a theory of parallel programming. In Operating Systems Techniques, Academic Press, New York, 1972, pp. 61-71.
11. Hoare, C.A.R. Proof of correctness of data representations. Acta Informatica 1, 4 (1972), 271-281.
12. Kahn, G. The semantics of a simple language for parallel programming. In Proc. IFIP Congress 74, North Holland, 1974.
13. Liskov, B.H. A note on CLU. Computation Structures Group Memo. 112, M.I.T., Cambridge, Mass., 1974.
14. McIlroy, M.D. Coroutines. Bell Laboratories, Murray Hill, N.J., 1968.
15. Naur, P., Ed. Report on the algorithmic language ALGOL 60. Comm. ACM 3, 5 (May 1960), 299-314.
16. Reynolds, J.C. COGENT. ANL-7022, Argonne Nat. Lab., Argonne, Ill., 1965.
17. Thompson, K. The UNIX command language. In Structured Programming, Infotech, Nicholson House, Maidenhead, England, 1976, pp. 375-384.
18. van Wijngaarden, A., Ed. Report on the algorithmic language ALGOL 68. Numer. Math. 14 (1969), 79-218.
19. Wulf, W.A., London, R.L., and Shaw, M. Abstraction and verification in ALPHARD. Dept. of Comptr. Sci., Carnegie-Mellon U., Pittsburgh, Pa., June 1976.
20. Wirth, N. The programming language PASCAL. Acta Informatica 1, 1 (1971), 35-63.

Some Properties of Conversion
Author(s): Alonzo Church and J. B. Rosser
Source: Transactions of the American Mathematical Society, Vol. 39, No. 3 (May, 1936), pp.
472-482
Published by: American Mathematical Society
Stable URL: http://www.jstor.org/stable/1989762
A Structural Approach to Operational Semantics

Gordon D. Plotkin
Laboratory for Foundations of Computer Science, School of Informatics,
University of Edinburgh, King’s Buildings, Edinburgh EH9 3JZ, Scotland

Contents

1 Transition Systems and Interpreting Automata
1.1 Introduction
1.2 Transition Systems
1.3 Examples of Transition Systems
1.4 Interpreting Automata
1.5 Exercises
2 Bibliography
3 Simple Expressions and Commands
3.1 Simple Expressions
3.2 Simple Commands
3.3 L-commands
3.4 Structural Induction
3.5 Dynamic Errors
3.6 Simple Type-Checking
3.7 Static Errors
3.8 Exercises
3.9 Bibliographical Remarks
4 Bibliography
5 Definitions and Declarations
5.1 Introduction
5.2 Simple Definitions in Applicative Languages
5.3 Compound Definitions
5.4 Type-Checking and Definitions
5.5 Exercises
5.6 Remarks
6 Bibliography
7 Functions, Procedures and Classes
7.1 Functions in Applicative Languages
7.2 Procedures and Functions
7.3 Other Parameter Mechanisms
7.4 Higher Types
7.5 Modules and Classes
7.6 Exercises
A A Guide to the Notation
B Notes on Sets

Email address: gdp@inf.ed.ac.uk (Gordon D. Plotkin).

Preprint submitted to Journal of Logic and Algebraic Programming, 30 January 2004
1 Transition Systems and Interpreting Automata

1.1 Introduction

It is the purpose of these notes to develop a simple and direct method for specifying the seman-
tics of programming languages. Very little is required in the way of mathematical background;
all that will be involved is “symbol-pushing” of one kind or another of the sort which will al-
ready be familiar to readers with experience of either the non-numerical aspects of programming
languages or else formal deductive systems of the kind employed in mathematical logic.

Apart from a simple kind of mathematics the method is intended to produce concise com-
prehensible semantic definitions. Indeed the method is even intended as a direct formalisation
of (many aspects of) the usual informal natural language descriptions. I should really confess
here that, while I have some experience, what has been expressed above is rather a pious hope
than a statement of fact. I would therefore be most grateful to readers for their comments and
particularly their criticisms.

I will follow the approach to programming languages taken by such authors as Gordon [Gor] and
Tennent [Ten] considering the main syntactic classes – expressions, commands and declarations
– and the various features found in each. The linguistic approach is that developed by the Scott-
Strachey school (together with Landin and McCarthy and others) but within an operational
rather than a denotational framework. These notes should be considered as an attempt at
showing the feasibility of such an approach. Apart from various inadequacies of the treatment
as presented many topics of importance are omitted. These include data structures and data
types; various forms of control structure from jumps to exceptions and coroutines; concurrency
including semaphores, monitors and communicating processes.

Many thanks are due to the Department of Computer Science at Aarhus University at whose
invitation I was enabled to spend a very pleasant six months developing this material. These
notes partially cover a series of lectures given at the department. I would like also to thank the
staff and students whose advice and criticism had a strong influence and also Jette Milwertz
whose typing skills made the work look better than it should.

1.2 Transition Systems

The announced “symbol-pushing” nature of our method suggests what is the truth; it is an
operational method of specifying semantics based on syntactic transformations of programs
and simple operations on discrete data. The idea is that in general one should be interested in
computer systems, whether hardware or software, and for semantics one thinks of systems whose configurations are a mixture of syntactical objects – the programs – and data – such as stores or environments. Thus in these notes we have

SYSTEM = PROGRAM + DATA

One wonders if this study could be generalised to other kinds of systems, especially hardware
ones.

Clearly systems have some behaviour and it is that which we wish to describe. In an opera-
tional semantics one focuses on the operations the system can perform – whether internally
or interactively with some supersystem or the outside world. For in our discrete (digital) com-
puter systems behaviour consists of elementary steps which are occurrences of operations. Such
elementary steps are called here (and also in many other situations in Computer Science) tran-
sitions (= moves). Thus a transition steps from one configuration to another and as a first idea
we take it to be a binary relation between configurations.

Definition 1 A Transition System (ts) is (just!) a structure ⟨Γ, −→⟩ where Γ is a set (of elements, γ, called configurations) and −→ ⊆ Γ × Γ is a binary relation (called the transition relation). Read γ −→ γ′ as saying that there is a transition from the configuration γ to the configuration γ′. (Other notations sometimes seen include ⊢ and ⇒.)

[Figure: two configurations γ and γ′ joined by an arrow. Caption: A Transition.]

Of course this idea is hardly new and examples can be found in any book on automata or formal
languages. Its application to the definition of programming languages can be found in the work
of Landin and the Vienna Group [Lan,Oll,Weg].

Structures of the form ⟨Γ, −→⟩ are rather simple and later we will consider several more elaborate variants, tailored to individual circumstances. For example it is often helpful to have an idea of terminal (= final = halting) configurations.

Definition 2 A Terminal Transition System (tts) is a structure ⟨Γ, −→, T⟩ where ⟨Γ, −→⟩ is a ts, and T ⊆ Γ (the set of final configurations) satisfies ∀γ ∈ T. ∀γ′ ∈ Γ. γ ̸−→ γ′.
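Though nothing in the notes depends on it, these definitions are easily made concrete. Here is a minimal OCaml sketch (names and encoding ours, not the notes'): a transition system is presented by a successor function, a tts adds a terminal predicate, and reachability is just the reflexive-transitive closure.

(* A sketch: a transition system over configurations 'g, presented by a
   successor function; a tts adds a terminal predicate.  For a genuine tts,
   [terminal g] should imply [sys.step g = []]. *)
type 'g ts = { step : 'g -> 'g list }

type 'g tts = { sys : 'g ts; terminal : 'g -> bool }

(* All configurations reachable from [start] in zero or more transitions
   (terminates only when finitely many configurations are reachable). *)
let reachable (t : 'g ts) (start : 'g) : 'g list =
  let rec go seen frontier =
    match frontier with
    | [] -> List.rev seen
    | g :: rest ->
        if List.mem g seen then go seen rest
        else go (g :: seen) (t.step g @ rest)
  in
  go [] [start]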

A point to watch is to make a distinction between internal and external behaviour. Internally
a system’s behaviour is nothing but the sum of its transitions. (We ignore here the fact that
often these transitions make sense only at a certain level; what counts as one transition for one
purpose may in fact consist of many steps when viewed in more detail. Part of the spirit of our
method is to choose steps of the appropriate “size”.) However externally many of the transitions
produce no detectable effect. It is a matter of experience to choose the right definition of external
behaviour. Often two or more definitions of behaviour (or of having the same behaviour) are
possible for a given transition system. Indeed on occasion one must turn the problem around and
look for a transition system which makes it possible to obtain an expected notion of behaviour.

1.3 Examples of Transition Systems

We recall a few familiar and not so familiar examples from computability and formal languages.

1.3.1 Finite Automata

A finite automaton is a quintuple M = ⟨Q, Σ, δ, q₀, F⟩ where

• Q is a finite set (of states)
• Σ is a finite set (the input alphabet)
• δ : Q × Σ −→ P(Q) (is the state transition relation)
• q₀ ∈ Q (is the initial state)
• F ⊆ Q (is the set of final states)

To obtain a transition system we set

Γ = Q × Σ∗

So any configuration, γ = ⟨q, w⟩, has a state component, q, and a data component, w.

For the transitions we put, whenever q′ ∈ δ(q, a):

⟨q, aw⟩ ⊢ ⟨q′, w⟩

(More formally, ⊢ = {⟨⟨q, aw⟩, ⟨q′, w⟩⟩ | q, q′ ∈ Q, a ∈ Σ, w ∈ Σ∗, q′ ∈ δ(q, a)}.)

The behaviour of a finite automaton is just the set L(M) of strings it accepts:

L(M) = {w ∈ Σ∗ | ∃q ∈ F. ⟨q₀, w⟩ ⊢∗ ⟨q, ε⟩}

Of course we could also define the terminal configurations by:

T = {⟨q, ε⟩ | q ∈ F}

and then

L(M) = {w ∈ Σ∗ | ∃γ ∈ T. ⟨q₀, w⟩ ⊢∗ γ}

In fact we can even get a little more abstract. Let ⟨Γ, −→, T⟩ be a tts. An input function for it is any mapping in : I −→ Γ and the language it accepts is then L(Γ) ⊆ I where:

L(Γ) = {i ∈ I | ∃γ ∈ T. in(i) −→∗ γ}

(For finite automata as above we take I = Σ∗, and in(w) = ⟨q₀, w⟩.) Thus we can easily formalise at least one general notion of behaviour.
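As an aside, the whole construction is small enough to execute. The following OCaml sketch (our encoding; the particular delta follows one reading of the machine of Example 3 below) represents configurations as pairs of a state and the remaining input.

(* A sketch: a finite automaton as a terminal transition system over
   configurations (state, remaining input). *)
type state = P | Q | R

let delta (q : state) (a : char) : state list =
  match q, a with
  | P, '0' -> [Q]
  | P, '1' -> [P]          (* self-loop *)
  | Q, '1' -> [P]
  | Q, '0' -> [Q; R]       (* nondeterministic: loop or move on *)
  | R, ('0' | '1') -> [R]
  | _, _ -> []

let final = function R -> true | _ -> false

(* <q, aw> |- <q', w> whenever q' is in delta q a. *)
let step (q, w) =
  match w with
  | [] -> []
  | a :: w' -> List.map (fun q' -> (q', w')) (delta q a)

(* w is in L(M) iff some final state is reachable with the input consumed. *)
let rec accepts cfg =
  match cfg with
  | q, [] -> final q
  | _ -> List.exists accepts (step cfg)

let () = assert (accepts (P, ['0'; '1'; '0'; '0'; '1']))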

Example 3 The machine:

[Figure: a three-state machine with states p (start), q and r (final); its edges include p −0→ q, q −1→ p, q −0→ r, and self-loops labelled 1 on p, 0 on q, and 0, 1 on r.]

A transition sequence:

⟨p, 01001⟩ ⊢ ⟨q, 1001⟩ ⊢ ⟨p, 001⟩ ⊢ ⟨q, 01⟩ ⊢ ⟨r, 1⟩ ⊢ ⟨r, ε⟩

1.3.2 Three Counter Machines

We have three counters, C, namely I, J and K. There are instructions, O, of the following four
types:

• Increment: inc C : m
• Decrement: dec C : m
• Zero Test: zero C : m/n
• Stop: stop

Then programs are just sequences P = O₁, . . . , O_l of instructions. Now, fixing P, the set of configurations is:

Γ = {⟨m, i, j, k⟩ | 1 ≤ m ≤ l; i, j, k ∈ N}
Then the transition relation is defined in terms of the various possibilities by:

• Case II: O_m = inc I : m′

⟨m, i, j, k⟩ ⊢ ⟨m′, i + 1, j, k⟩

• Case ID: O_m = dec I : m′

⟨m, i + 1, j, k⟩ ⊢ ⟨m′, i, j, k⟩

• Case IZ: O_m = zero I : m′/m″

⟨m, 0, j, k⟩ ⊢ ⟨m′, 0, j, k⟩
⟨m, i + 1, j, k⟩ ⊢ ⟨m″, i + 1, j, k⟩

and similarly for J and K.

Note 1 There is no case for the stop instruction.
Note 2 In case m′ or m″ are 0 or > l the above definitions do not (of course!) apply.
Note 3 The transition relation is deterministic, that is:

∀γ, γ′, γ″. γ −→ γ′ ∧ γ −→ γ″ ⊃ γ′ = γ″

(Exercise – prove this.)

Now the set of terminal configurations is defined by:

T = {⟨m, 0, j, 0⟩ | O_m = stop}

and the behaviour is a partial function f : N ⇀ N where:

f(i) = j ≝ ⟨1, i, 0, 0⟩ −→∗ ⟨m, 0, j, 0⟩ ∈ T

This can be put a little more abstractly, if we take for any tts ⟨Γ, −→, T⟩ an input function, in : I −→ Γ, as before and also an output function, out : T −→ O, and define a partial function f_Γ : I ⇀ O by

f_Γ(i) = o ≡ ∃γ. in(i) −→∗ γ ∈ T ∧ o = out(γ)

Of course for this to make sense the tts must be deterministic (why?). In the case of a three-counter machine we have

I = O = N
in(i) = ⟨1, i, 0, 0⟩
out(⟨m, i, j, k⟩) = j
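Again, the definition transcribes directly into a program. In the OCaml sketch below (names ours), step realises cases II, ID and IZ, and behaviour extraction follows the definitions of in and out just given; like f, the function is partial – a looping program simply fails to return.

(* A sketch: a three-counter machine as a transition system.
   Labels m are 1-based positions in the program list. *)
type counter = I | J | K
type instr =
  | Inc of counter * int          (* inc C : m'      *)
  | Dec of counter * int          (* dec C : m'      *)
  | Zero of counter * int * int   (* zero C : m'/m'' *)
  | Stop

let get (i, j, k) = function I -> i | J -> j | K -> k
let set (i, j, k) c n =
  match c with I -> (n, j, k) | J -> (i, n, k) | K -> (i, j, n)

(* One transition; [None] covers stop, out-of-range labels and decrementing
   a zero counter (notes 1 and 2: the configuration is then stuck). *)
let step prog (m, regs) =
  if m < 1 || m > List.length prog then None
  else
    match List.nth prog (m - 1) with
    | Inc (c, m') -> Some (m', set regs c (get regs c + 1))
    | Dec (c, m') when get regs c > 0 -> Some (m', set regs c (get regs c - 1))
    | Zero (c, m', m'') -> Some ((if get regs c = 0 then m' else m''), regs)
    | _ -> None

(* f(i): iterate from <1, i, 0, 0>; we check only that i = k = 0 at the end
   (a simplification: we do not re-check that the final instruction is stop). *)
let behaviour prog i =
  let rec run cfg =
    match step prog cfg with Some cfg' -> run cfg' | None -> cfg
  in
  match run (1, (i, 0, 0)) with
  | _, (0, j, 0) -> Some j
  | _ -> None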

Example 4 A program for the successor function, n ↦ n + 1:

[Flowchart: inc J; then test zero I – if yes, stop; if no, dec I, inc J, and return to the test.]

1.3.3 Context-Free Grammars

A context-free grammar is a quadruple, G = ⟨N, Σ, P, S⟩ where

• N is a finite set (of non-terminals)
• Σ is a finite set (the input alphabet)
• P ⊆ N × (N ∪ Σ)∗ (is the set of productions)
• S ∈ N (is the start symbol)

Then the configurations are given by:

Γ = (N ∪ Σ)∗

and the transition relation ⇒ is given by:

wXv ⇒ wxv (when X → x is in P )

Now the behaviour is just

L(G) = {w | S ⇒∗ w}

Amusingly, this already does not fit into our abstract idea for behaviours as sets (the one which worked for finite automata). The problem is that that was intended for acceptance, whereas here we have to do with generation (by leftmost derivations).

Exercise: Write down an abstract model of generation.

Example 5 The grammar is:

S → ε
S → (S)
S → SS

and a transition sequence could be

S ⇒ SS ⇒ (S)S ⇒ ()S ⇒ ()(S) ⇒ ()(SS) ⇒² ()(()S) ⇒² ()(()())

1.3.4 Labelled Transition Systems

Transition systems in general do not give the opportunity of saying very much about any
individual transition. By adding the possibility of such information we arrive at a definition.

Definition 6 A Labelled Transition System (lts) is a structure ⟨Γ, A, −→⟩ where Γ is a set (of configurations) and A is a set (of actions (= labels = operations)) and

−→ ⊆ Γ × A × Γ

is the transition relation.

We write a transition as γ −a→ γ′ where γ, γ′ are configurations and a is an action. The idea is that an action can give information about what went on in the configuration during the transition (internal actions) or about the interaction between the system and its environment (external actions) (or both). The labels are particularly useful for specifying distributed systems where the actions may relate to the communications between sub-systems. The idea seems to originate with Keller [Kel].

The idea of Labelled Terminal Transition Systems ⟨Γ, A, −→, T⟩ should be clear to the reader, who will also expect the following generalisation of reflexive (resp. transitive) closure. For any lts let γ and γ′ be configurations and take x = a₁ . . . a_k in A⁺ (resp. A∗); then:

γ −x→⁺ γ′ (resp. γ −x→∗ γ′) ≝ ∃γ₁, . . . , γ_k. γ −a₁→ γ₁ . . . −a_k→ γ_k = γ′

where k > 0 (resp. k ≥ 0).

Example 7 (Finite Automata (continued)) This time define a tts by taking

• Γ = Q
• A = Σ
• q −a→ q′ ≡ q′ ∈ δ(q, a)
• T = F

Then we have L(M) = {w ∈ A∗ | ∃q ∈ T. q₀ −w→∗ q}. The example transition sequence given above now becomes simply:

p −0→ q −1→ p −0→ q −0→ r −1→ r ∈ F

Example 8 (Petri Nets) One idea of a Petri Net is just a quadruple N = ⟨B, E, F, m⟩ where

• B is a finite set (of conditions)
• E is a finite set (of events)
• F ⊆ (B × E) ∪ (E × B) (is the flow relation)
• m ⊆ B (is the initial case)

A configuration, m, is contact-free if

¬∃e ∈ E. (F⁻¹(e) ⊆ m ∧ F(e) ∩ m ≠ ∅)

[Figure: a net fragment with conditions a, b, a′, b′ and an event e, illustrating a contact situation for m = {a, a′, b}.]

The point of this definition is that the occurrence of an event, e, is nothing more than the ceasing-to-hold of its preconditions (= F⁻¹(e)) and the starting-to-hold of its postconditions (= F(e)) in any given case. Here a case is a set of conditions (those that hold in the case). A contact-situation is one where this idea does not make sense. Often one excludes this possibility axiomatically (and imposes also other intuitively acceptable axioms). We will just (somewhat arbitrarily) regard them as "runtime errors" and take

Γ = {m ⊆ B | m is contact-free}

If two different events share a precondition in a case, then according to the above intentions they cannot both occur at once. Accordingly we define a conflict relation between events by:

e # e′ ≡ (F⁻¹(e) ∩ F⁻¹(e′) ≠ ∅ ∧ e ≠ e′)

An event can occur from a given case if all its preconditions hold in the case. What is (much) more, Petri Nets model concurrency in that several events (not in conflict) can occur together in a given case. So we put

A = {X ⊆ E | ¬∃e, e′ ∈ X. e # e′}

and define

m −X→ m′ ≡ F⁻¹(X) ⊆ m ∧ m′ = (m \ F⁻¹(X)) ∪ F(X)
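The transition relation just defined can be written out directly, as in the OCaml sketch below (our encoding: conditions and events are integers, pre e standing for F⁻¹(e) and post e for F(e)).

(* A sketch of the net step: fire a conflict-free set X of events. *)
module S = Set.Make (Int)

type net = { pre : int -> S.t; post : int -> S.t }

let conflict n e e' =
  e <> e' && not (S.is_empty (S.inter (n.pre e) (n.pre e')))

(* m --X--> m' iff X is conflict-free, F^-1(X) is contained in m, and
   m' = (m \ F^-1(X)) u F(X). *)
let fire (n : net) (x : int list) (m : S.t) : S.t option =
  let pres = List.fold_left (fun s e -> S.union s (n.pre e)) S.empty x in
  let posts = List.fold_left (fun s e -> S.union s (n.post e)) S.empty x in
  let conflict_free =
    List.for_all (fun e -> List.for_all (fun e' -> not (conflict n e e')) x) x
  in
  if conflict_free && S.subset pres m
  then Some (S.union (S.diff m pres) posts)
  else None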

Here is a pictorial example of such a transition:

[Figure: a net with events 1–4; the step labelled {1, 4} fires events 1 and 4 concurrently, moving the tokens from their preconditions to their postconditions. Caption: A Transition.]

We give no definition of behaviour as there does not seem to be any generally accepted one in
the literature. For further information on Petri Nets see [Bra,Pet].

Of course our transitions with their actions must also be thought of as kinds of events; even
more so when we are discussing the semantics of languages for concurrency. We believe there
are very strong links between our ideas and those in Net Theory, but, alas, do not have time
here to pursue them.

Example 9 (Readers and Writers) This is a (partial) specification of a Readers and Writers problem with two agents, each of whom can read and write (and do some local processing) but where the writes should not overlap.

[Figure: a net for the two agents, with events SR₁, FR₁, SW₁, FW₁, LP₁ and SR₂, FR₂, SW₂, FW₂, LP₂, and a shared condition preventing the two writes from overlapping.]

SWᵢ is Start Writing i
FWᵢ is Finish Writing i
SRᵢ is Start Reading i
FRᵢ is Finish Reading i
LPᵢ is Local Processing i, where 1 ≤ i ≤ 2

1.4 Interpreting Automata

To finish Chapter 1 we give an example of how to define the operational semantics of a language by an interpreting automaton. The reader should obtain some feeling for what is possible along these lines (see the references given above for more information), as well as a feeling that the method is somehow a little too indirect, thus paving the way for the approach taken in the next chapter.

1.4.1 The Language L

We begin with the Abstract Syntax of a very simple programming language called L. What is
abstract about it will be discussed a little here and later at greater length. For us syntax is a
collection of syntactic sets of phrases; each set corresponds to a different type of phrase. Some
of these sets are very simple and can be taken as given:

• Basic Syntactic Sets


Truth-values This is the set T = {tt, ff} and is ranged over by (the metavariable) t (and we also happily employ for this (and any other) metavariable sub- and super-scripts to generate other metavariables: t′, t₀, t₁‴).
Numbers m, n are the metavariables over N = {0, 1, 2, . . .}.
Variables v ∈ Var = {a, b, c, . . . , z}
Note how we have progressed to a fairly spare style of specification in the above.
• Derived Syntactic Sets
Expressions e ∈ Exp given by

e ::= m | v | e + e′ | e − e′ | e ∗ e′

Boolean Expressions b ∈ BExp given by

b ::= t | e = e′ | b or b′ | ∼b

Commands c ∈ Com given by

c ::= nil | v := e | c; c′ | if b then c else c′ | while b do c

This specification can be taken, roughly speaking, as a context-free grammar if the reader just ignores the use of the infinite set N and the use of primes. It can also (despite appearances!) be taken as unambiguous if the reader just regards the author as having lazily omitted brackets as in:

b ::= t | e = e′ | b or b′ | ∼b

specifying parse trees so that rather than saying ambiguously that (for example):

while b do c; c′

is a program, what is being said is that both

while b do (c; c′)   and   (while b do c); c′

(that is, the parse tree with while at the root and the one with the semicolon at the root) are trees.

So we are abstract in not worrying about some lexical matters and just using for example
integers rather than numerals and in not worrying about the exact specification of phrases.
What we are really trying to do is abstract away from the problems of parsing the token strings
that really came into the computer and considering instead the “deep structure” of programs.
Thus the syntactic categories we choose are supposed to be those with independent semantic
significance; the various program constructs – such as semicolon or while . . . do . . . – are the
constructive operations on phrases that possess semantic significance.

For example contrast the following concrete syntax for (some of) our expressions (taken from [Ten]):

⟨expression⟩ ::= ⟨term⟩ | ⟨expression⟩ ⟨addop⟩ ⟨term⟩
⟨term⟩ ::= ⟨factor⟩ | ⟨term⟩ ⟨multop⟩ ⟨factor⟩
⟨factor⟩ ::= ⟨variable⟩ | ⟨literal⟩ | (⟨expression⟩)
⟨addop⟩ ::= + | −
⟨multop⟩ ::= ∗
⟨variable⟩ ::= a | b | c | . . . | z
⟨literal⟩ ::= 0 | 1 | . . . | 9

Now, however convenient it is for a parser to distinguish between ⟨expression⟩, ⟨term⟩ and ⟨factor⟩, it does not make much semantic sense!

Thus we will never give semantics directly to token strings but rather to their real structure.
However, we can always obtain the semantics of token strings via parsers which we regard as
essentially just maps:

Parser: Concrete Syntax −→ Abstract Syntax

Of course it is not really so well-defined what the abstract syntax for a given language is, and
we shall clearly make good use of the freedom of choice available.

Returning to our language L we observe the following "dependency diagram":

[Diagram: C depends on B and on E, and B depends on E.]

1.4.2 The SMC-Machine

Now we define a suitable transition system whose configurations are those of the SMC-machine.

• Value Stacks is ranged over by S and is the set (T ∪ N ∪ Var ∪ BExp ∪ Com)∗
• Memories is ranged over by M and is Var −→ N
• Control Stacks is ranged over by C and is

(Com ∪ BExp ∪ Exp ∪ {+, −, ∗, =, or, ∼, :=, if , while})∗

The set of configurations is

Γ = Value Stacks × Memories × Control Stacks

and so a typical configuration is γ = ⟨S, M, C⟩. The idea is that we interpret commands and
produce as our interpretation proceeds, stacks C, of control information (initially a command
but later bits of commands). Along the way we accumulate partial results (when evaluating
expressions), and bits of command text which will be needed later; this is all put (for some
reason) on the value stack, S. Finally we have a model of the store (= memory) as a function
M : Var −→ N which given a variable, v, says what its value M (v) is in the store.

Notation: In order to discuss updating variables, we introduce for a memory, M, natural number, m, and variable v the memory M′ = M[m/v] where

M′(v′) = m (if v′ = v)
M′(v′) = M(v′) (otherwise)

So M[m/v] is the memory resulting from updating M by changing the value of v from M(v) to m.
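For the record, this notation has a one-line rendering in OCaml (our encoding of memories as functions):

(* A sketch: memories as functions, with M[m/v] as just defined. *)
type memory = string -> int

let update (mem : memory) (m : int) (v : string) : memory =
  fun v' -> if v' = v then m else mem v'

(* E.g. (update mem 3 "x") "x" = 3, and all other variables are unchanged. *)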

The transition relation, ⇒, is defined by cases according to what is on the top of the control
stack.

• Expressions

En: ⟨S, M, n C⟩ ⇒ ⟨n S, M, C⟩
Ev: ⟨S, M, v C⟩ ⇒ ⟨M(v) S, M, C⟩
E±I: ⟨S, M, (e ± e′) C⟩ ⇒ ⟨S, M, e e′ ± C⟩ (one rule for each of +, −, ∗)
E±E: ⟨m′ m S, M, ± C⟩ ⇒ ⟨n S, M, C⟩ (where n = m ± m′; one rule for each of +, −, ∗)

Note 1 The symbols +, −, ∗ are being used both as symbols of L and to stand for the functions addition, subtraction and multiplication.

• Boolean Expressions

Bt: ⟨S, M, t C⟩ ⇒ ⟨t S, M, C⟩
B=I: ⟨S, M, (e = e′) C⟩ ⇒ ⟨S, M, e e′ = C⟩
B=E: ⟨m′ m S, M, = C⟩ ⇒ ⟨t S, M, C⟩ (where t = (m = m′))
B or I: ⟨S, M, (b or b′) C⟩ ⇒ ⟨S, M, b b′ or C⟩
B or E: ⟨t′ t S, M, or C⟩ ⇒ ⟨t″ S, M, C⟩ (where t″ = (t ∨ t′))
B∼I: ⟨S, M, (∼b) C⟩ ⇒ ⟨S, M, b ∼ C⟩
B∼E: ⟨t S, M, ∼ C⟩ ⇒ ⟨t′ S, M, C⟩ (where t′ = ¬t)

• Commands

C nil: ⟨S, M, nil C⟩ ⇒ ⟨S, M, C⟩
C:=I: ⟨S, M, (v := e) C⟩ ⇒ ⟨v S, M, e := C⟩
C:=E: ⟨m v S, M, := C⟩ ⇒ ⟨S, M[m/v], C⟩
C;: ⟨S, M, (c; c′) C⟩ ⇒ ⟨S, M, c c′ C⟩
C if I: ⟨S, M, (if b then c else c′) C⟩ ⇒ ⟨c c′ S, M, b if C⟩
C if E: ⟨t c c′ S, M, if C⟩ ⇒ ⟨S, M, c″ C⟩ (where if t = tt then c″ = c else c″ = c′)
C while I: ⟨S, M, (while b do c) C⟩ ⇒ ⟨b c S, M, b while C⟩
C while E1: ⟨tt b c S, M, while C⟩ ⇒ ⟨S, M, c while b do c C⟩
C while E2: ⟨ff b c S, M, while C⟩ ⇒ ⟨S, M, C⟩

Now that we have at some length defined the transition relation, the terminal configurations are defined by:

T = {⟨ε, M, ε⟩}

and an input function in : Commands × Memories −→ Γ is defined by:

in(C, M) = ⟨ε, M, C⟩

and out : T −→ Memories by:

out(⟨ε, M, ε⟩) = M

The behaviour of the SMC-machine is then a partial function, Eval : Commands × Memories ⇀ Memories, and clearly:

Eval(C, M) = M′ ≡ ⟨ε, M, C⟩ ⇒∗ ⟨ε, M′, ε⟩
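Transcribing a fragment of these rules gives a tiny executable model of the machine. The OCaml sketch below (ours; it covers only numbers, variables, +, nil, assignment and sequencing) makes plain how each rule becomes one clause of a step function.

(* A sketch of (a fragment of) the SMC-machine: value stack, memory and
   control stack. *)
type exp = Num of int | Var of string | Add of exp * exp
type com = Nil | Assign of string * exp | Seq of com * com

type ctl = CE of exp | CC of com | OpAdd | OpAssign
type value = VN of int | VV of string
type cfg = value list * (string -> int) * ctl list

let step : cfg -> cfg option = function
  | s, m, CE (Num n) :: c -> Some (VN n :: s, m, c)                    (* En   *)
  | s, m, CE (Var v) :: c -> Some (VN (m v) :: s, m, c)                (* Ev   *)
  | s, m, CE (Add (e, e')) :: c ->
      Some (s, m, CE e :: CE e' :: OpAdd :: c)                         (* E+I  *)
  | VN n' :: VN n :: s, m, OpAdd :: c -> Some (VN (n + n') :: s, m, c) (* E+E  *)
  | s, m, CC Nil :: c -> Some (s, m, c)                                (* Cnil *)
  | s, m, CC (Assign (v, e)) :: c ->
      Some (VV v :: s, m, CE e :: OpAssign :: c)                       (* C:=I *)
  | VN n :: VV v :: s, m, OpAssign :: c ->
      Some (s, (fun v' -> if v' = v then n else m v'), c)              (* C:=E *)
  | s, m, CC (Seq (c0, c1)) :: c -> Some (s, m, CC c0 :: CC c1 :: c)   (* C;   *)
  | _ -> None

(* Iterate to a terminal (or stuck) configuration; partial like Eval. *)
let rec run cfg = match step cfg with Some cfg' -> run cfg' | None -> cfg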

Example 10 (Factorial) Consider the program y := 1; C, where C is the while-command

C = while ∼(x = 0) do C′,  with body C′ = (y := y ∗ x; x := x − 1)

and where memories ⟨i, j⟩ record x = i, y = j.

⟨ε, ⟨3, 5⟩, y := 1; C⟩
⇒ ⟨ε, ⟨3, 5⟩, y := 1 C⟩ by C;
⇒ ⟨y, ⟨3, 5⟩, 1 := C⟩ by C:=I
⇒ ⟨1 y, ⟨3, 5⟩, := C⟩ by En
⇒ ⟨ε, ⟨3, 1⟩, C⟩ by C:=E
⇒ ⟨∼(x = 0) C′, ⟨3, 1⟩, ∼(x = 0) while⟩ by C while I
⇒ ⟨∼(x = 0) C′, ⟨3, 1⟩, (x = 0) ∼ while⟩ by B∼I
⇒ ⟨∼(x = 0) C′, ⟨3, 1⟩, x 0 = ∼ while⟩ by B=I
⇒ ⟨3 ∼(x = 0) C′, ⟨3, 1⟩, 0 = ∼ while⟩ by Ev
⇒ ⟨0 3 ∼(x = 0) C′, ⟨3, 1⟩, = ∼ while⟩ by En
⇒ ⟨ff ∼(x = 0) C′, ⟨3, 1⟩, ∼ while⟩ by B=E
⇒ ⟨tt ∼(x = 0) C′, ⟨3, 1⟩, while⟩ by B∼E
⇒ ⟨ε, ⟨3, 1⟩, C′ C⟩ by C while E1
⇒ ⟨ε, ⟨3, 1⟩, y := y ∗ x x := x − 1 C⟩ by C;
⇒∗ ⟨ε, ⟨3, 3⟩, x := x − 1 C⟩
⇒∗ ⟨ε, ⟨2, 3⟩, C⟩
⇒∗ ⟨ε, ⟨1, 6⟩, C⟩
⇒∗ ⟨ε, ⟨0, 6⟩, C⟩
⇒ ⟨∼(x = 0) C′, ⟨0, 6⟩, ∼(x = 0) while⟩ by C while I
⇒∗ ⟨ff ∼(x = 0) C′, ⟨0, 6⟩, while⟩
⇒ ⟨ε, ⟨0, 6⟩, ε⟩ by C while E2

Many other machines have been proposed along these lines. It is, perhaps, fair to say that
none of them can be considered as directly formalising the intuitive operational semantics to
be found in most language definitions. Rather they are more or less clearly correct on the basis
of this intuitive understanding. Further, although this is of less importance, they all have a
tendency to pull the syntax to pieces or at any rate to wander around the syntax creating
various complex symbolic structures which do not seem particularly forced by the demands
of the language itself. Finally, they do not in general have any great claim to being syntax-
directed in the sense of defining the semantics of compound phrases in terms of the semantics of
their components, although the definition of the transition relation does fall into natural cases
following the various syntactical possibilities.

1.5 Exercises

Finite Automata

Let M = ⟨Q, Σ, δ, q₀, F⟩ be a finite automaton.

1. Redefine the behaviour of M so that it accepts infinite strings a₁a₂ . . . aₙ . . ., that is so that L(M) ⊆ Σω. [Hint: There are actually two answers, which can with difficulty be proved equivalent.]

2. Suppose that δ were changed so that the labelled transition relation had instead the form:

q −a→ q₁, q₂

and F so that F ⊆ Q × Σ. What is the new type of δ? How can binary trees like

[Figure: a small binary tree with root a, children b and c, and two further children d and e below]

now be accepted by M?

3. Suppose instead transitions occurred with probability, so that we had

q −a→_p q′

with 0 ≤ p ≤ 1 and, for any q and a:

Σ{p | q −a→_p q′ for some q′} ≤ 1

What is a good definition of behaviour now?

4. Finite automata can be turned into transducers by taking δ to be a finite set of transitions of the form:

q −v/w→ q′

with v, w ∈ Σ∗ (input v, output w). Define the relation q −v/w→∗ q′ and the appropriate notion of behaviour. Show any finite-state transducer can be turned into an equivalent one where we have in any transition that 0 ≤ |v| ≤ 1.

Various Machines

5. Define k counter machines. Show that any function computable by a k counter machine is computable by a 3-counter machine. [Hint: First program elementary functions on the 3-counter machine including pairing, pair : N² −→ N, and selection functions, fst, snd : N −→ N such that:

fst(pair(m, n)) = m
snd(pair(m, n)) = n

Then simulate by coding all the registers of the k counter machine by a big tuple held in one of the registers of the 3-counter machine.]
Show that any partial-recursive function (= one computable by a Turing Machine) can be computed by some 3-counter machine (and vice-versa).

6. Consider stack machines where the registers hold stacks and operations on a stack (= element of Σ∗) are pushₐ, pop, ishdₐ (for each a ∈ Σ) given by:

pushₐ(w) = aw
pop(aw) = w
ishdₐ(w) = true (if w = aw′ for some w′)
ishdₐ(w) = false (otherwise)

Show stack machines compute the same functions as Turing Machines. How many stacks are needed at most?

7. Define and investigate queue machines.

8. See how your favourite machines (Turing Machines, Push-Down Automata) fit into our
framework. For a general view of machines, consult the eminently readable: [Bir] or [Sco].
Look too at [Gre].

Grammars

9. For CF grammars our notion of behaviour is adapted to generation. Define a notion that
is good for acceptance. What about mixed generation/acceptance? Change the definitions
so that you get parse trees as behaviour. What is the nicest way you can find to handle
syntax-directed translation schemes?

10. Show that for LL(1) grammars you can obtain deterministic labelled (with Σ) transitions of the form

w −a→ w′

with w strings of terminals and non-terminals. What can you say about LL(k), LR(k)?

11. Have another look at other kinds of grammar too, e.g., Context-Sensitive, Type 0 (=
arbitrary) grammars. Discover other ideas for Transition Systems in the literature. Ex-
amples include: Tag, Semi-Thue Systems, Markov Algorithms, λ-Calculus, Post Systems,
L-Systems, Conway’s Game of Life and other forms of Cell Automata, Kleene’s Nerve
Nets . . .

Petri Nets

12. Show that if we have m −X→ m′ and m −X′→ m″, where F⁻¹(X) ∩ F⁻¹(X′) = ∅ (i.e., no conflict between X and X′), then for some m‴ we have

m′ −X′→ m‴ and m″ −X→ m‴, and indeed m −Y→ m‴ where Y = X ∪ X′.

This is a so-called Church-Rosser Property.
13. Show that if we have m −X→ m′ where X = {e₁, . . . , e_k} then for some m₁, . . . , m_k we have:

m −{e₁}→ m₁ −{e₂}→ · · · −{e_k}→ m_k = m′

What happens if we remove the restrictions on finiteness?

14. Write some Petri Nets for a parallel situation you know well (e.g., something from home or some computational situation).

15. How can nets accept languages (= subsets of Σ∗ )? Are they always regular?

16. Find, for the Readers and Writers net given above, all the cases you can reach by transition
sequences starting at the initial case. Draw (nicely!) the graph of cases and transitions
(this is a so-called case graph).

Interpreting Automata

17. Let G = ⟨N, Σ, P, S⟩ be a context-free grammar. It is strongly unambiguous if there are no two leftmost derivations of the same word in Σ∗, even possibly starting from different non-terminals. Find suitable conditions on the productions of P which ensure that G′ = ⟨N, Σ′, P′, S⟩ is strongly unambiguous, where Σ′ = Σ ∪ {(, )}, where the parentheses are assumed not to be in N or Σ and where

T −→ (w) is in P′ if T −→ w is in P.

18. See what changes you should make in the definition of the interpreting automaton when
some of the following features are added:

e ::= if b then e else e | begin c result e

c ::= if b then c |
      case e of e₁ : c₁ . . . e_k : c_k end |
      for v := e, e′ do c |
      repeat c until b

19. Can you handle constructions that drastically change the flow of control such as:

c ::= stop | m : c | goto m

(Here stop just stops everything!)

20. Can you handle elementary read/write instructions such as:

c ::= read(v) | write(e)

[Hint: Consider an analogy with finite automata – especially transducers.]

21. Can you add facilities to the automaton to handle run-time errors?

22. Can you produce measures of time/space complexity by adding extra components to the
automaton?

23. Can you treat diagnostic (debugging, tracing) facilities?

24. What about real-time? That is suppose we had the awful expression:

e ::= time

which delivers the correct time.

25. Treat the following PASCAL subset. The basic sets are T, N and x ∈ I × {i, r, b} – the
set of typical identifiers (which is infinite) and o ∈ O – the set { =, <>, <, <=, >, >=,

22
+, −, ∗, /, div, mod, and } of operations. The idea for typical identifiers is that i, r, b
are type symbols for integer, real and boolean respectively and so hFRED, ri is the real
identifier FRED.
The derived sets are expressions and commands where:

e ::= m | t | v | −e | not e | e o e
c ::= nil | v := e | c; c0 | if e then c else c0 | while e do c

The point of the question is that you must think about compile-time type-checking and
the memories used in the hS, M, Ci machine should be finite (even although there are
potentially infinitely many identifiers).

26. Can you treat the binding mechanism

s ::= i | r | b
c ::= var v : s begin c end

so that you must now incorporate symbol tables?

2 Bibliography

[Bir] Bird, R. (1976) Programs and Machines, Wiley and Sons.


[Bra] Brauer, W., ed. (1980) Net Theory and Applications, LNCS 84, Springer.
[Gor] Gordon, M.J. (1979) The Denotational Description of Programming Languages,
Springer.
[Gre] Greibach, S.A. (1975) Theory of Program Structures: Schemes, Semantics, Verification,
LNCS 36, Springer.
[Kel] Keller, R.M. (1976) Formal Verification of Parallel Programs, Communications of the
ACM 19(7):371–384.
[Lan] Landin, P.J. (1966) A Lambda-calculus Approach, Advances in Programming and Non-
numerical Computation, ed. L. Fox, Chapter 5, pp. 97–154, Pergamon Press.
[Oll] Ollengren, A. (1976) Definition of Programming Languages by Interpreting Automata,
Academic Press.
[Pet] Peterson, J.L. (1977) Petri Nets, ACM Computing Surveys 9(3):223–252.
[Sco] Scott, D.S. (1967) Some Definitional Suggestions for Automata Theory, Journal of
Computer and System Sciences 1(2):187–212.
[Ten] Tennent, R.D. (1981) Principles of Programming Languages, Prentice-Hall.
[Weg] Wegner, P. (1972) The Vienna Definition Language, ACM Computing Surveys 4(1):5–
63.

3 Simple Expressions and Commands

The ⟨S, M, C⟩ machine emphasises the idea of computation as a sequence of transitions involving simple data manipulations; further the definition of the transitions falls into simple cases according to the syntactic structure of the expression or command on top of the control stack. However, many of the transitions are of little intuitive importance, contradicting our idea of the right choice of the "size" of the transitions. Further, the definition of the transitions is not syntax-directed, so that, for example, the transitions of c; c′ are not directly defined in terms of those for c and those for c′. Finally, and really most important, the ⟨S, M, C⟩ machine is not a formalisation of intuitive operational ideas but is rather, fairly clearly, correct given these intuitive ideas.

In this chapter we develop a method designed to answer these objections, treating simple
expressions and commands as illustrated by the language L. We consider run-time errors and
say a little on how to establish properties of transition relations. Finally we take a first look at
simple type-checking.

3.1 Simple Expressions

Let us consider first the very simple subset of expressions given by:

e ::= m | e₀ + e₁
and how the ⟨S, M, C⟩ machine deals with them. For example we have the transition sequence for the expression (1 + (2 + 3)) + (4 + 5):

⟨ε, M, (1 + (2 + 3)) + (4 + 5)⟩ −→ ⟨ε, M, (1 + (2 + 3)) (4 + 5) +⟩
−→ ⟨ε, M, 1 (2 + 3) + (4 + 5) +⟩
−→ ⟨1, M, (2 + 3) + (4 + 5) +⟩
−→ ⟨1, M, 2 3 + + (4 + 5) +⟩
−→ ⟨2 1, M, 3 + + (4 + 5) +⟩
−→ ⟨3 2 1, M, + + (4 + 5) +⟩ (∗)
−→ ⟨5 1, M, + (4 + 5) +⟩ (∗)
−→ ⟨6, M, (4 + 5) +⟩
−→³ ⟨5 4 6, M, + +⟩ (∗)
−→ ⟨9 6, M, +⟩ (∗)
−→ ⟨15, M, ε⟩

In these 13 transitions only the 4 additions marked (∗) are of any real interest as system events. Further, the intermediate structures generated on the stacks are also of little interest. Preferable would be a sequence of 4 transitions on the expression itself, thus:

(1 + (2 + 3)) + (4 + 5) −→ (1 + 5) + (4 + 5)
−→ 6 + (4 + 5)
−→ 6 + 9
−→ 15

where we are ignoring the memory, and where at each transition the occurrence of the addition performed is the one reduced. (These transition sequences of expressions are often called reduction sequences (= derivations) and the occurrences are called redexes; this notation originates in the λ-calculus (see, e.g., [Hin]).)

Now consider an informal specification of this kind of expression evaluation. Briefly one might
just say one evaluates from left-to-right. More pedantically one could say:

Constants Any constant, m, is already evaluated with itself as value.


Sums To evaluate e0 + e1
(1) Evaluate e0 obtaining m0 , say, as result.
(2) Evaluate e1 obtaining m1 , say, as result.

25
(3) Add m0 to m1 obtaining m2 , say, as result.
This finishes the evaluation and m2 is the result of the evaluation.

Note that this specification is syntax-directed, and we use it to obtain rules for describing steps
(= transitions) of evaluation which we think of as nothing else than a derivation of the form:

e = e₁ −→ e₂ −→ . . . −→ e_{n−1} −→ e_n = m

(where m is the result). Indeed if we just look at the first step we see from the above specification that

(1) If e₀ is not a constant, the first step of the evaluation of e₀ + e₁ is the first step of the evaluation of e₀.
(2) If e₀ is a constant, but e₁ is not, the first step of the evaluation of e₀ + e₁ is the first step of the evaluation of e₁.
(3) If e₀ and e₁ are constants, the first (and last!) step of the evaluation of e₀ + e₁ is the addition of e₀ and e₁.

Clearly too the first step of evaluating an expression, e, can be taken as resulting in an expression e′ with the property that the evaluation of e is the first step followed by the evaluation of e′. We now put all this together to obtain rules for the first step. These are rules for establishing binary relationships of the form:

e −→ e′ ≡ e′ is the result of the first step of the evaluation of e.

Rules: Sum

(1)  e₀ −→ e₀′
     ─────────────────────
     e₀ + e₁ −→ e₀′ + e₁

(2)  e₁ −→ e₁′
     ─────────────────────
     m₀ + e₁ −→ m₀ + e₁′

(3)  m₀ + m₁ −→ m₂ (if m₂ is the sum of m₀ and m₁)

Thus, for example, rule 1 states what is obvious from the above discussion:

If e₀′ is the result of the first step of the evaluation of e₀ then e₀′ + e₁ is the result of the first step of the evaluation of e₀ + e₁.

We now take these rules as a definition of what relationships hold – namely exactly these we
can establish from the rules. We take the above discussion as showing why this mathematical
definition makes sense from an intuitive view; it is the direct formalisation referred to above.
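These three rules determine a one-step function by cases on the expression, as the following OCaml sketch (ours, not the notes') shows; iterating it performs exactly the four-transition evaluation displayed above.

(* A sketch of the Sum rules as a one-step reduction function:
   [step e] is the result of the first step of evaluating e, if any. *)
type exp = Num of int | Add of exp * exp

let rec step : exp -> exp option = function
  | Num _ -> None                                    (* already evaluated *)
  | Add (Num m0, Num m1) -> Some (Num (m0 + m1))     (* rule 3 *)
  | Add (Num m0, e1) ->                              (* rule 2 *)
      Option.map (fun e1' -> Add (Num m0, e1')) (step e1)
  | Add (e0, e1) ->                                  (* rule 1 *)
      Option.map (fun e0' -> Add (e0', e1)) (step e0)

(* Evaluation is the iteration of [step]. *)
let rec eval e = match step e with Some e' -> eval e' | None -> e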

As an example consider the step:

(1 + (2 + 3)) + (4 + 5) −→ (1 + 5) + (4 + 5)

To establish this step we have

1. 2 + 3 −→ 5 (by rule 3)
2. 1 + (2 + 3) −→ 1 + 5 (by rule 2)
3. (1 + (2 + 3)) + (4 + 5) −→ (1 + 5) + (4 + 5) (by rule 1)

Rather than this unnatural “bottom-up” method we usually display these little proofs in the
“top-down” way they are actually “discovered”. The arrow is supposed to show the “direction”
of discovery:

6
Sum 1
(1+(2+3))+(4+5) −→ (1+5)+(4+5))

Sum 2
1 + (2+3) −→ 1 + 5

Sum 3
2+3 −→ 5

& %

Thus, while the evaluation takes four steps, the justification (proof) of each step has a certain
size of its own (which need not be displayed). In this light the ⟨S, M, C⟩ machine can be viewed
as mixing-up the additions with the reasons why they should be performed into one long linear
sequence.

It could well be argued that our formalisation is not really that direct. A more direct approach
would be to give rules for the transition sequences themselves (the evaluations). For the intuitive
specification refers to these evaluations rather than any hypothetical atomic actions from which
they are composed. However, axiomatising a step is intuitively simpler, and we prefer to follow
a simple approach until it leads us into such difficulties that it is better to consider whole
derivations.

Another point concerns the lack of formalisation of our ideas. The above rules are easily turned
into a formal system of formulae, axioms and rules. What we would want is a sufficiently
elastic conception of a range of such formal systems which on the one hand allows the natural
expression of all the systems of rules we wish, and on the other hand returns some profit in
the form of interesting theorems about such systems or interesting computer systems based on
such systems. However, the present work is too exploratory for us to fix our ideas, although we
may later try out one or two possibilities. We also fear that introducing such formalities could
easily lead us into obscurities in the presentation of otherwise natural ideas.

Now we try out more expressions. To evaluate variables we need the memory component of the
⟨S, M, C⟩ machines – indeed that is the only "natural" component they have! It is convenient here to change our notation to a more generally accepted one:

OLD                        NEW
Memory                     Store
Memories = (Var −→ N)      S
M                          σ
M[m/v]                     σ[m/v]

3.1.1 L-Expressions

Now for the expression language of L:

e ::= m | v | (e + e′) | (e − e′) | (e ∗ e′)

we introduce the configurations

Γ = {⟨e, σ⟩}

and the relation

⟨e, σ⟩ −→ ⟨e′, σ⟩

meaning one step of the evaluation of e (with store σ) results in the expression e′ (with store σ). The rules are just those we already have, adapted to take account of stores, plus an obvious rule for looking up the value of a variable in a store.

Rules: Sum

(1)  ⟨e₀, σ⟩ −→ ⟨e₀′, σ⟩
     ──────────────────────────────
     ⟨e₀ + e₁, σ⟩ −→ ⟨e₀′ + e₁, σ⟩

(2)  ⟨e₁, σ⟩ −→ ⟨e₁′, σ⟩
     ──────────────────────────────
     ⟨m + e₁, σ⟩ −→ ⟨m + e₁′, σ⟩

(3)  ⟨m + m′, σ⟩ −→ ⟨n, σ⟩ (where n = m + m′)

Minus
(1), (2) Exercise for the reader.
(3)  ⟨m − m′, σ⟩ −→ ⟨n, σ⟩ (if m ≥ m′ and n = m − m′)

Times
(1), (2), (3) Exercise for the reader.

Variable
(1)  ⟨v, σ⟩ −→ ⟨σ(v), σ⟩

Note the two uses of the symbol, +, in rule Sum 3: one as a syntactic construct and one for
the addition function. We will often overload symbols in this way relying on the context for
disambiguation. So here, for example, to make sense of n = m + m′ we must be meaning addition
as the left-hand-side of the equation denotes a natural number.

Of course the terminal configurations are those of the form ⟨m, σ⟩, and m is the result of the evaluation. Note that there are configurations such as:

γ = ⟨5 + (7 − 11), σ⟩

which are not terminal but for which there is no γ′ with γ −→ γ′.

Definition 11 Let ⟨Γ, −→, T⟩ be a tts. A configuration γ is stuck if γ ∉ T and ¬∃γ′. γ −→ γ′.

In most programming languages these stuck configurations result in run-time errors. These will
be considered below.

The behaviour of expressions is the result of their evaluation and is defined by:

eval(e, σ) = m ⇔ ⟨e, σ⟩ −→∗ ⟨m, σ⟩

The reader will see (from 2.3 below, if needed) that eval is a well-defined partial function.

One can also define the equivalence of expressions by:

e ≡ e′ ⇔ ∀σ. eval(e, σ) = eval(e′, σ)

3.1.2 Boolean Expressions

Now we turn to the Boolean expressions of the language L given by:

b ::= t | b or b′ | e = e′ | ∼b

Here we take Γ = {⟨b, σ⟩} and consider the rules for the transition relation. There are clearly none for truth-values, t, but there are several possibilities for disjunctions, b or b′. These possibilities differ not only in the order of the transitions, but even on which transitions occur. The configurations are pairs ⟨b, σ⟩.

A. Complete Evaluation: This is just the Boolean analogue of our rules for expressions and corresponds to the method used by our SMC-machine.

(1)  ⟨b₀, σ⟩ −→ ⟨b₀′, σ⟩
     ────────────────────────────────
     ⟨b₀ or b₁, σ⟩ −→ ⟨b₀′ or b₁, σ⟩

(2)  ⟨b₁, σ⟩ −→ ⟨b₁′, σ⟩
     ────────────────────────────────
     ⟨t or b₁, σ⟩ −→ ⟨t or b₁′, σ⟩

(3)  t or t′ −→ t″ (where t″ = t ∨ t′)

B. Left-Sequential Evaluation: This takes advantage of the fact that there is no need to evaluate b in tt or b, as the result will be tt independently of the result of evaluating b.

(1)  ⟨b₀, σ⟩ −→ ⟨b₀′, σ⟩
     ────────────────────────────────
     ⟨b₀ or b₁, σ⟩ −→ ⟨b₀′ or b₁, σ⟩

(2)  ⟨tt or b₁, σ⟩ −→ ⟨tt, σ⟩
(3)  ⟨ff or b₁, σ⟩ −→ ⟨b₁, σ⟩

C. Right-Sequential Evaluation: Like B but "backwards".

D. Parallel Evaluation: This tries to combine the advantages of B and C by evaluating b₀ and b₁ in parallel. In practice that would mean having two processors, one for b₀ and one for b₁, or using one but interleaving, somehow, the evaluations of b₀ and b₁. This idea is therefore not found in the usual sequential programming languages (as opposed to those making explicit provisions for concurrency). However, it may be useful for hardware specification.

(1)  ⟨b₀, σ⟩ −→ ⟨b₀′, σ⟩
     ────────────────────────────────
     ⟨b₀ or b₁, σ⟩ −→ ⟨b₀′ or b₁, σ⟩

(2)  ⟨b₁, σ⟩ −→ ⟨b₁′, σ⟩
     ────────────────────────────────
     ⟨b₀ or b₁, σ⟩ −→ ⟨b₀ or b₁′, σ⟩

(3)  ⟨tt or b₁, σ⟩ −→ ⟨tt, σ⟩
(4)  ⟨b₀ or tt, σ⟩ −→ ⟨tt, σ⟩
(5)  ⟨ff or b₁, σ⟩ −→ ⟨b₁, σ⟩
(6)  ⟨b₀ or ff, σ⟩ −→ ⟨b₀, σ⟩

The above evaluation mechanisms are very different when subexpressions can have non-terminating evaluations; then we have the following relationships:

B ⇐ A
⇓     ⇓
D ⇐ C

where X ⇒ Y means that if method X terminates with result t, so does method Y. We take method A for the semantics of our example language L.
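The contrast between methods A and B is easily seen in code. In the OCaml sketch below (ours; only the or-cases are shown, and σ is omitted since these rules do not touch the store), the two step functions differ exactly in whether tt or b forces the evaluation of b.

(* A sketch contrasting methods A and B on disjunctions. *)
type bexp = Const of bool | Or of bexp * bexp

(* Method A: complete evaluation, left operand then right operand. *)
let rec step_a = function
  | Or (Const t, Const t') -> Some (Const (t || t'))               (* A.3 *)
  | Or (Const t, b1) ->                                            (* A.2 *)
      Option.map (fun b1' -> Or (Const t, b1')) (step_a b1)
  | Or (b0, b1) ->                                                 (* A.1 *)
      Option.map (fun b0' -> Or (b0', b1)) (step_a b0)
  | Const _ -> None

(* Method B: left-sequential; tt or b short-circuits without touching b. *)
let rec step_b = function
  | Or (Const true, _) -> Some (Const true)                        (* B.2 *)
  | Or (Const false, b1) -> Some b1                                (* B.3 *)
  | Or (b0, b1) ->                                                 (* B.1 *)
      Option.map (fun b0' -> Or (b0', b1)) (step_b b0)
  | Const _ -> None

(* If b1 has a non-terminating evaluation, iterating step_a on
   Or (Const true, b1) diverges while step_b returns immediately. *)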

For Boolean expressions of the form e = e′ our rules depend on those for expressions, but otherwise are normal (and for brevity we omit the σ's).

• Equality

(1)  e₀ −→ e₀′
     ─────────────────────
     e₀ = e₁ −→ e₀′ = e₁

(2)  e₁ −→ e₁′
     ─────────────────────
     m = e₁ −→ m = e₁′

(3)  m = n −→ t (where t is tt if m = n and ff otherwise)

For negations ∼b we have, again omitting the σ's:

• Negation

(1)  b −→ b′
     ───────────
     ∼b −→ ∼b′

(2)  ∼t −→ t′ (where t′ = ¬t)

The behaviour of Boolean expressions is defined by:

eval(b, σ) = t ⇔ ⟨b, σ⟩ −→∗ ⟨t, σ⟩

One can also define equivalence of Boolean expressions by:

b ≡ b′ ⇔ ∀σ. eval(b, σ) = eval(b′, σ)

3.2 Simple Commands

Again we begin with a trivial language of commands,

c ::= nil | v := e | c; c′

and see how the SMC-machine behaves on an example:

⟨ε, abc, z := x; (x := y; y := z)⟩ −→ ⟨ε, abc, z := x x := y; y := z⟩
−→ ⟨z, abc, x := x := y; y := z⟩
−→ ⟨a z, abc, := x := y; y := z⟩ (∗)
−→ ⟨ε, aba, x := y; y := z⟩
−→ ⟨ε, aba, x := y y := z⟩
−→² ⟨b x, aba, := y := z⟩ (∗)
−→ ⟨ε, bba, y := z⟩
−→² ⟨a y, bba, :=⟩ (∗)
−→ ⟨ε, baa, ε⟩

And we see that of the eleven transitions only three – the assignments – are of interest as system events.

Preferable here would be a sequence of three transitions on configurations of the form ⟨c, σ⟩, thus:

⟨z := x; (x := y; y := z), abc⟩ −→ ⟨(x := y; y := z), aba⟩
−→ ⟨y := z, bba⟩
−→ baa

where each of the three transitions performs one of the assignments.

Now informally one can specify such command executions as follows:

• Nil: To execute nil from store σ take no action and terminate with σ as the final store of
the execution.
• Assignment: To execute v := e from store σ evaluate e, and if the result is m, change σ to
σ[m/v] (the final store of the execution).
• Composition: To execute c; c′ from store σ
(1) Execute c from store σ obtaining a final store, σ′, say, if this execution terminates.
(2) Execute c′ from the store σ′. The final store of this execution is also the final store of the execution of c; c′.

Sometimes the execution of c; c′ is pictured in terms of a little flowchart:

−→ [c] −→ [c′] −→
As in the case of expressions one sees that this description is syntax-directed. We formalise it considering terminating executions of a command c from a store σ to be transition sequences of the form:

⟨c, σ⟩ = ⟨c₀, σ₀⟩ −→ ⟨c₁, σ₁⟩ −→ . . . −→ ⟨c_{n−1}, σ_{n−1}⟩ −→ σ_n

Here we take the configurations to be:

Γ = {⟨c, σ⟩} ∪ {σ}

and the terminal configurations to be

T = {σ}

where the transition relation ⟨c, σ⟩ −→ ⟨c′, σ′⟩ (resp. σ′) is read as:

One step of execution of the command c from the store σ results in the store σ′ and the rest of the execution of c is the execution of c′ from σ′ (resp. and the execution terminates).

Thus we choose c′ to represent, in as simple a way as is available, the remainder of the execution of c after its first step. The rules are

• Nil: ⟨nil, σ⟩ −→ σ

• Assignment:

(1)  ⟨e, σ⟩ −→∗ ⟨m, σ⟩
     ──────────────────────
     ⟨v := e, σ⟩ −→ σ[m/v]

• Composition:

(1)  ⟨c₀, σ⟩ −→ ⟨c₀′, σ′⟩
     ─────────────────────────────
     ⟨c₀; c₁, σ⟩ −→ ⟨c₀′; c₁, σ′⟩

(2)  ⟨c₀, σ⟩ −→ σ′
     ──────────────────────────
     ⟨c₀; c₁, σ⟩ −→ ⟨c₁, σ′⟩

Note: In formulating the rule for assignment we have considered the entire evaluation of the
right-hand-side as part of one execution step. This corresponds to a change in view of the size
of our step when considering commands, but we could just as well have chosen otherwise.
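These rules too transcribe directly. In the following OCaml sketch (ours), eval_exp stands in for the assumed complete evaluation ⟨e, σ⟩ −→∗ ⟨m, σ⟩ of the Assignment rule, and a step yields either an intermediate configuration or a final store.

(* A sketch of the Nil, Assignment and Composition rules. *)
type store = string -> int
type exp = Num of int | Var of string | Add of exp * exp
type com = Nil | Assign of string * exp | Seq of com * com

let rec eval_exp (s : store) = function
  | Num m -> m
  | Var v -> s v
  | Add (e0, e1) -> eval_exp s e0 + eval_exp s e1

(* A step goes to <c', s'> or directly to a final store. *)
type next = More of com * store | Done of store

let rec step ((c, s) : com * store) : next =
  match c with
  | Nil -> Done s                                           (* Nil    *)
  | Assign (v, e) ->
      let m = eval_exp s e in
      Done (fun v' -> if v' = v then m else s v')           (* Ass 1  *)
  | Seq (c0, c1) ->
      (match step (c0, s) with
       | More (c0', s') -> More (Seq (c0', c1), s')         (* Comp 1 *)
       | Done s' -> More (c1, s'))                          (* Comp 2 *)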

As an example consider the first transition desired above for the execution

⟨z := x; (x := y; y := z), abc⟩

It is presented in the top-down way:

Comp 2: ⟨z := x; (x := y; y := z), abc⟩ −→ ⟨(x := y; y := z), aba⟩
  Ass 1: ⟨z := x, abc⟩ −→ aba
    Var 1: ⟨x, abc⟩ −→ ⟨a, abc⟩

Again we see, as in the case of expressions, a "two-dimensional" structure consisting of a "horizontal" transition sequence of the events of system significance and, for each transition, a "vertical" explanation of why and how it occurs.

[Figure: γ₀ −→ γ₁ −→ . . . −→ γₙ ∈ T, with a proof displayed beneath each transition.]

For terminating executions of c₀; c₁ this will have the form:

[Figure: ⟨c₀; c₁, σ⟩ −→ · · · −→ ⟨c₀′; c₁, σ′⟩ −→ ⟨c₁, σ′⟩ −→ · · · −→ σ″ – the transitions of c₀ (from σ to σ′), each with its proof, followed by those of c₁ (from σ′ to σ″).]

Again we see that the SMC-machine transition sequences are more-or-less linearisations of these structures. Note the appearance of rules for binary relations (with additional data components) such as:

R(c, c′, σ, σ′) ≝ ⟨c, σ⟩ −→ ⟨c′, σ′⟩
S(e, e′, σ) ≝ ⟨e, σ⟩ −→ ⟨e′, σ⟩

Later we shall make extensive use of predicates to treat the context-sensitive aspects of syntax (= the static aspects of semantics). As far as we can see there is no particular need for ternary relations, although the above discussion on the indirectness of our formalisation does suggest the possibility of needing relations of variable degree for dealing with execution sequences.

3.3 L-commands

Recalling the syntax of L-commands,

c ::= nil | v := e | c; c′ | if b then c else c′ | while b do c

we see that it remains only to treat conditionals and repetitions.

• Conditionals: To execute if b then c else c′ from σ
1. Evaluate b in σ.
2.1 If the result was tt, execute c from σ.
2.2 If the result was ff, execute c′ from σ.

In pictures we have:

[Flowchart: a diamond testing b, with the tt branch leading to c and the ff branch leading to c′.]

And the rules are:

(1)  ⟨b, σ⟩ −→∗ ⟨tt, σ⟩
     ─────────────────────────────────────
     ⟨if b then c else c′, σ⟩ −→ ⟨c, σ⟩

(2)  ⟨b, σ⟩ −→∗ ⟨ff, σ⟩
     ─────────────────────────────────────
     ⟨if b then c else c′, σ⟩ −→ ⟨c′, σ⟩

Note: Again we are depending on the transition relation of another syntactic class – here
Boolean expressions – and a whole computation from that class becomes one step of the com-
putation.

Note: No rules for T(if b then c else c′) are given, as that predicate never applies. For a conditional is never terminal, as one always has at least one action – namely evaluating the condition.

• While: To execute while b do c from σ
1. Evaluate b.
2.1 If the result is tt, execute c from σ. If that terminates with final state σ′, execute while b do c from σ′.
2.2 If the result is ff, the execution is finished and the final state is σ.

In pictures we have the familiar flowchart:

[Flowchart: a diamond testing b, with the tt branch leading through c and back to the test, and the ff branch leading out.]

The rules are:

(1)  ⟨b, σ⟩ −→∗ ⟨tt, σ⟩
     ──────────────────────────────────────────
     ⟨while b do c, σ⟩ −→ ⟨c; while b do c, σ⟩

(2)  ⟨b, σ⟩ −→∗ ⟨ff, σ⟩
     ──────────────────────
     ⟨while b do c, σ⟩ −→ σ
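Adding these two constructs to the command-step sketch above gives the following (ours; the Seq case is repeated so that the unfolding of while can run, and eval_b stands in for the assumed evaluation ⟨b, σ⟩ −→∗ ⟨t, σ⟩).

(* A sketch of the conditional and repetition rules. *)
type store = string -> int
type bexp = Eq of string * int | Neg of bexp
type com = Nil | If of bexp * com * com | While of bexp * com | Seq of com * com

let rec eval_b (s : store) = function
  | Eq (v, n) -> s v = n
  | Neg b -> not (eval_b s b)

type next = More of com * store | Done of store

let rec step ((c, s) : com * store) : next =
  match c with
  | Nil -> Done s
  | Seq (c0, c1) ->
      (match step (c0, s) with
       | More (c0', s') -> More (Seq (c0', c1), s')
       | Done s' -> More (c1, s'))
  | If (b, c1, c2) ->
      More ((if eval_b s b then c1 else c2), s)       (* rules (1) and (2) *)
  | While (b, c0) ->
      if eval_b s b
      then More (Seq (c0, While (b, c0)), s)          (* While rule (1)    *)
      else Done s                                     (* While rule (2)    *)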

Example 12 Consider the factorial example y := 1; w from Chapter 1, where w = while ∼(x = 0) do c and c = (y := y ∗ x; x := x − 1). We start from the state ⟨3, 5⟩.

⟨y := 1; w, ⟨3, 5⟩⟩ −→ ⟨w, ⟨3, 1⟩⟩ by COMP2 (via ASS1)
−→ ⟨c; w, ⟨3, 1⟩⟩ by WHI1
−→ ⟨x := x − 1; w, ⟨3, 3⟩⟩ by COMP1 (via COMP2 and ASS1)
−→ ⟨w, ⟨2, 3⟩⟩ by COMP2 (via ASS1)
−→ ⟨c; w, ⟨2, 3⟩⟩ by WHI1
−→ ⟨x := x − 1; w, ⟨2, 6⟩⟩ by COMP1 (via COMP2 and ASS1)
−→ ⟨w, ⟨1, 6⟩⟩ by COMP2
−→ ⟨c; w, ⟨1, 6⟩⟩ by WHI1
−→ ⟨x := x − 1; w, ⟨1, 6⟩⟩ by COMP1
−→ ⟨w, ⟨0, 6⟩⟩ by COMP2
−→ ⟨0, 6⟩ by WHI2

A terminating execution sequence of a while-loop w = while b do c looks like this (omitting σ's):

w −→ c; w −→ . . . −→ w −→ c; w −→ . . . −→ w −→ · · · −→ w −→ ·

where each transition from w is justified by an evaluation b −→∗ tt (followed by the execution of c), except the last, which is justified by b −→∗ ff.

[Figure: the same sequence with the proof of each transition displayed beneath it.]

One can now define the behaviour and equivalence of commands by:

exec(c, σ) = σ′ ⇔ ⟨c, σ⟩ −→∗ σ′

and

c ≃ c′ ⇔ ∀σ. exec(c, σ) = exec(c′, σ)

where we are using Kleene equality, which means that one side is defined iff the other is, and in that case they are both equal.
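In code, behaviour is just iterated stepping: the little OCaml driver below (ours) turns any of the one-step functions sketched earlier into the corresponding partial function, diverging exactly when the command does.

(* A sketch: exec as iteration of an arbitrary one-step function. *)
type ('c, 'r) next = More of 'c | Done of 'r

let rec drive (step : 'c -> ('c, 'r) next) (c : 'c) : 'r =
  match step c with
  | Done r -> r
  | More c' -> drive step c'

(* E.g. with the command step above, drive step (c, s) computes exec(c, s). *)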

3.4 Structural Induction

Although we have no particular intention of proving very much either about or with our oper-
ational semantics, we would like to introduce enough mathematical apparatus to enable us to
establish the truth of such obvious statements as:

if γ ∉ T then for some γ′ we have γ −→ γ′

The standard tool is the principle of Structural Induction (SI). It enables us to prove properties
P (p) of syntactic phrases, and it takes on different forms according to the abstract syntax of the
language. For L we have three such principles, one for expressions, one for Boolean expressions
and one for commands.

Structural Induction for Expressions

Let P (e) be a property of expressions. Suppose that:

(1) For all m in N it is the case that P(m) holds, and
(2) For all v in Var it is the case that P(v) holds, and
(3) For all e and e′ in E, if P(e) and P(e′) hold so does P(e + e′), and
(4) As 3 but for −, and
(5) As 3 but for ∗.

Then for all expressions e, it is the case that P(e) holds.

We take this principle as being intuitively obvious. It can be stated more compactly by using standard logical notation:

[(∀m ∈ N. P(m)) ∧ (∀v ∈ Var. P(v))
 ∧ (∀e, e′ ∈ E. P(e) ∧ P(e′) ⊃ P(e + e′))
 ∧ (∀e, e′ ∈ E. P(e) ∧ P(e′) ⊃ P(e − e′))
 ∧ (∀e, e′ ∈ E. P(e) ∧ P(e′) ⊃ P(e ∗ e′))]
⊃ ∀e ∈ E. P(e)

As an example we prove

Fact 13 The transition relation for expressions is deterministic.

PROOF. We proceed by SI on the property P(e) where

P(e) ≡ ∀σ, γ′, γ″. (⟨e, σ⟩ −→ γ′ ∧ ⟨e, σ⟩ −→ γ″) ⊃ γ′ = γ″

Now there are five cases according to the hypotheses necessary to establish the conclusion by SI.

1. e = m ∈ N: Suppose ⟨m, σ⟩ −→ γ′, γ″. But this cannot be the case as ⟨m, σ⟩ has no transitions. Thus P(e) holds vacuously.
2. e = v ∈ Var: Suppose ⟨v, σ⟩ −→ γ′, γ″. Then as there is only one rule for variables, we have γ′ = ⟨σ(v), σ⟩ = γ″.
3. e = e₀ + e₁: Suppose ⟨e₀ + e₁, σ⟩ −→ γ′, γ″. There are three subcases according to why ⟨e₀ + e₁, σ⟩ −→ γ′.
3.1 Rule 1: For some e₀′ we have ⟨e₀, σ⟩ −→ ⟨e₀′, σ⟩ and γ′ = ⟨e₀′ + e₁, σ⟩. Then e₀ is not in N (otherwise ⟨e₀, σ⟩ would have no transitions) and so for some e₀″ we have ⟨e₀, σ⟩ −→ ⟨e₀″, σ⟩ and so γ″ = ⟨e₀″ + e₁, σ⟩. But by the induction hypothesis applied to e₀ we therefore have e₀′ = e₀″ and so γ′ = γ″.
3.2 Rule 2: We have e₀ = m ∈ N and for some e₁′ we have ⟨e₁, σ⟩ −→ ⟨e₁′, σ⟩ and γ′ = ⟨m + e₁′, σ⟩. Then e₁ is not in N and for some e₁″ we have ⟨e₁, σ⟩ −→ ⟨e₁″, σ⟩ and γ″ = ⟨m + e₁″, σ⟩. But applying the induction hypothesis to e₁, we see that e₁′ = e₁″ and so γ′ = γ″.
3.3 Rule 3: We have e₀ = m₀, e₁ = m₁. Then clearly γ′ = γ″.
4. e = e₀ − e₁
5. e = e₀ ∗ e₁: These cases are similar to the third case and are left to the reader.

In the above we did not need such a strong induction hypothesis. Instead we could choose a fixed σ and proceed by SI on Q(e) where:

Q(e) ≡ ∀γ′, γ″. (⟨e, σ⟩ −→ γ′ ∧ ⟨e, σ⟩ −→ γ″) ⊃ γ′ = γ″

However, this is just a matter of luck (here that the evaluation of expressions does not side-effect the state). Generally it is wise to choose one's induction hypothesis as strong as possible.

The point is that if one's hypothesis has the form (for example)

P(e) ≡ ∀σ. Q(e, σ)

then when proving P(e₀ + e₁) given P(e₀) and P(e₁) one fixes σ and tries to prove Q(e₀ + e₁, σ). But in this proof one is at liberty to use the facts Q(e₀, σ), Q(e₀, σ′), Q(e₁, σ), Q(e₁, σ″) for any σ′ and σ″.

SI for Boolean Expressions

We just write down the symbolic version for a desired property P (b) of Boolean expressions.

[(∀t ∈ T. P(t)) ∧ (∀e, e′ ∈ E. P(e = e′))
 ∧ (∀b, b′ ∈ B. P(b) ∧ P(b′) ⊃ P(b or b′))
 ∧ (∀b ∈ B. P(b) ⊃ P(∼b))]
⊃ ∀b ∈ B. P(b)

In general when applying this principle one may need further structural inductions on expres-
sions. For example:

Fact 14 If b is not in T and contains no occurrence of an expression of the form (m − n) where m < n, then no ⟨b, σ⟩ is stuck.

PROOF. We fix σ and proceed by SI on Boolean expressions on the property:

Q(b) ≡ [b ∉ T ∧ (∀m < n. (m − n) does not occur in b)] ⊃ ⟨b, σ⟩ is not stuck

Case 1 b = tt: This holds vacuously.
Case 2 b = (e = e′): Here there are three subcases depending on the forms of e and e′.
Case 2.1 If e is not in N, then for some e″ we have ⟨e, σ⟩ −→ ⟨e″, σ⟩.
Lemma For any expression e not in N, if e has no subexpressions of the form m − n, where m < n, then no ⟨e, σ⟩ is stuck.
Proof By SI on expressions; left to the reader. □
Continuing with case 2.1 we see that ⟨e = e′, σ⟩ −→ ⟨e″ = e′, σ⟩ so ⟨b, σ⟩ is not stuck.
Case 2.2 Here e is in N but e′ is not; the proof is much like case 2.1 and also uses the lemma.
Case 2.3 Here e, e′ are in N and we can use rule Equality 3.
Case 3 b = (b₀ or b₁): This is like case 3 of the proof of Fact 13.
Case 4 b = ∼b′: If b′ is not in T we can easily apply the induction hypothesis. Otherwise use rule Negation 2.

This concludes all the cases and hence the proof.

SI for Commands

We just write down the symbolic version for a (desired) property P (c) of commands:

[P(nil) ∧ (∀v ∈ Var, e ∈ E. P(v := e))
 ∧ (∀c, c′ ∈ C. P(c) ∧ P(c′) ⊃ P(c; c′))
 ∧ (∀b ∈ B. ∀c, c′ ∈ C. P(c) ∧ P(c′) ⊃ P(if b then c else c′))
 ∧ (∀b ∈ B. ∀c ∈ C. P(c) ⊃ P(while b do c))]
⊃ ∀c ∈ C. P(c)

For an example we prove:

Fact 15 If v does not occur on the left-hand-side of an assignment in c, then the execution of c cannot affect its value. That is, if ⟨c, σ⟩ −→∗ σ′ then σ(v) = σ′(v).

PROOF. By SI on commands. The statement of the hypothesis should be apparent from the proof, and is left to the reader.

Case 1 c = nil: Clear.
Case 2 c = (v′ := e): Here v′ ≠ v and we just use the definition of σ[m/v′].
Case 3 c = (c₀; c₁): Here if ⟨c₀; c₁, σ⟩ −→∗ σ′ then for some σ″ we have ⟨c₀, σ⟩ −→∗ σ″ and ⟨c₁, σ″⟩ −→∗ σ′. (This requires a lemma, for proof by the reader.)
Then by the induction hypothesis applied first to c₁ and then to c₀ we have:

σ′(v) = σ″(v) = σ(v)

Case 4 c = if b then c₀ else c₁: Here we easily use the induction hypothesis on c₀ and c₁ (according to the outcome of the evaluation of b).
Case 5 c = while b do c₀: Here we argue on the length of the transition sequence ⟨c, σ⟩ −→ . . . −→ σ′. This is just an ordinary mathematical induction. In case the sequence has length 0, we have σ′ = σ. Otherwise there are two cases according to the result of evaluating b. We just look at the harder one.
Case 5.1 ⟨c, σ⟩ −→ ⟨c₀; c, σ⟩ −→ . . . −→ σ₁. Here we see that ⟨c₀, σ⟩ −→∗ σ₂ (and apply the main SI hypothesis) and also that ⟨c, σ₂⟩ −→∗ σ₁ by a shorter transition sequence, to which the induction hypothesis can therefore be applied.

This last proof shows that on occasion we will use other induction principles, such as
induction on the length of a derivation sequence.

Another possibility is to use induction on some measure of the size of the proof of an assertion
γ −→ γ′ (which would, strictly speaking, require a careful definition of the size measure).

Anyway we repeat that we will not develop too much “technology” for making these proofs,
but would like the reader to be able, in principle, to check out simple facts.

3.5 Dynamic Errors

In the definition of the operational semantics of L-expressions we allowed configurations of the
kind ⟨(5 + 7) ∗ (10 − 16), σ⟩ to stick. Thus, although we did ensure:

γ ∈ T ⊃ ¬∃γ′. γ −→ γ′

we did not ensure the converse. Implementations of real programming languages will generally
ensure the converse by issuing a run-time (= dynamic) error report and forcibly terminating
the computation. It would therefore be pleasant if we could also specify dynamic errors.

As a first approximation we add an error configuration to the possible configurations of each
of the syntactic classes of L. Then we add some error rules.

• Expressions
· Sum
4. ⟨e0, σ⟩ −→ error ⇒ ⟨e0 + e1, σ⟩ −→ error
5. ⟨e1, σ⟩ −→ error ⇒ ⟨m + e1, σ⟩ −→ error
· Minus
4, 5 as for Sum
6. ⟨m − m′, σ⟩ −→ error (if m < m′)
· Times
4, 5 as for Sum
• Boolean Expressions
· Disjunction
4, 5 as for Sum
· Equality
4, 5 as for Sum
· Negation
3. ⟨b, σ⟩ −→ error ⇒ ⟨∼b, σ⟩ −→ error
• Commands
· Assignment
2. ⟨e, σ⟩ −→ error ⇒ ⟨v := e, σ⟩ −→ error
· Composition
3. ⟨c0, σ⟩ −→ error ⇒ ⟨c0; c1, σ⟩ −→ error
· Conditional
3. ⟨b, σ⟩ −→* error ⇒ ⟨if b then c else c′, σ⟩ −→ error
· Repetition
3. ⟨b, σ⟩ −→* error ⇒ ⟨while b do c, σ⟩ −→ error

So the only possibility of dynamic errors in L arises from the subtraction of a greater number
from a smaller. Of course other languages can provide many other kinds of dynamic errors:
division by zero, overflow, taking the square root of a negative number, failing dynamic
type-checking tests, overstepping array bounds, following a dangling reference or reaching an
uninitialised location, etc. But the above simple example does at least indicate a possibility.

Fact 16 No L-configuration sticks (with the above rules added).

PROOF. Left to the reader as an exercise.
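To complement the rules, here is a hedged Haskell sketch (our own rendering, not from the notes) of one evaluation step for the arithmetic fragment, in which a distinguished Error configuration plays the role of error; treating an unbound variable as an error is our own addition. The numbered propagation rules appear as the helper inCtx:

    import qualified Data.Map as M

    type Var   = String
    type Store = M.Map Var Int
    data Exp   = Num Int | V Var | Add Exp Exp | Sub Exp Exp
    data Conf  = Next Exp | Error    -- ⟨e,σ⟩ −→ ⟨e′,σ⟩ or ⟨e,σ⟩ −→ error

    -- Lift a step of a subterm into a context: the "4, 5 as for Sum"
    -- pattern, where an error in the subterm is an error of the whole.
    inCtx :: (Exp -> Exp) -> Maybe Conf -> Maybe Conf
    inCtx k (Just (Next e')) = Just (Next (k e'))
    inCtx _ other            = other

    step :: Store -> Exp -> Maybe Conf
    step _ (Num _)               = Nothing                          -- terminal
    step s (V x)                 = Just (maybe Error (Next . Num) (M.lookup x s))
    step s (Add (Num m) (Num n)) = Just (Next (Num (m + n)))
    step s (Add (Num m) e1)      = inCtx (Add (Num m)) (step s e1)  -- rules 2, 5
    step s (Add e0 e1)           = inCtx (`Add` e1) (step s e0)     -- rules 1, 4
    step s (Sub (Num m) (Num n))
      | m >= n                   = Just (Next (Num (m - n)))
      | otherwise                = Just Error                       -- rule 6
    step s (Sub (Num m) e1)      = inCtx (Sub (Num m)) (step s e1)
    step s (Sub e0 e1)           = inCtx (`Sub` e1) (step s e0)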

3.6 Simple Type-Checking

We consider a variant, L0, of L in which expressions and Boolean expressions are amalgamated
into one syntactic class and have to be sorted out again by type-checking. Here is the language
L0.

• Basic Syntactic Sets


· Truth-values: t ∈ T
· Numbers: m, n ∈ N
· Variables: v ∈ Var = {a, b, x, x0 , x1 , x2 , . . .}
· Binary Operations: bop ∈ Bop = {+, −, ∗, =, or}
• Derived Syntactic Sets
· Expressions: e ∈ Exp where:

e ::= m | t | v | e0 bop e1 | ∼e

· Commands: c ∈ Com where:

c ::= nil | v := e | c0 ; c1 | if e then c0 else c1 | while e do c

Note: We have taken Var to be infinite in the above in order to raise a little problem (later)
on how to avoid infinite memories.

Many expressions such as (tt + 5) or ∼6 now have no sense to them, and nor do such commands
as if x or 5 then c0 else c1 . To make sense an expression must have a type, and in L0 there are
exactly two possibilities:

• Types: τ ∈ Types = {nat, bool}

To see which expressions have types and what they are we will just give some rules for assertions:

e:τ ≡ e has type τ

Note first that the basic syntactic sets have, in a natural way, associated type information.
Clearly we will have truth-values having type bool, numbers having type nat, variables having
type nat and for each binary operation, bop, we have a partial binary function τbop on Types:

+, −, ∗ | bool  nat          =  | bool  nat          or | bool  nat
bool    |  ?     ?         bool |  ?     ?         bool | bool   ?
nat     |  ?    nat        nat  |  ?    bool       nat  |  ?     ?

(Here ? marks the argument pairs on which τbop is undefined.)

• Rules
Truth-values: t : bool
Numbers: m : nat
Variables: v : nat
Binary Operations: e0 : τ0, e1 : τ1 ⇒ e0 bop e1 : τ2 (where τ2 = τbop(τ0, τ1))
Negation: e : bool ⇒ ∼e : bool
Now for commands we need to sort out those commands which are well-formed in the sense that
all subexpressions have a type and are Boolean when they ought to be. The rules for commands
involve assertions:

Wfc(c) ≡ c is a well-formed command.

Nil: Wfc(nil)
Assignment: e : nat ⇒ Wfc(v := e)
Sequencing: Wfc(c0), Wfc(c1) ⇒ Wfc(c0; c1)
Conditional: e : bool, Wfc(c0), Wfc(c1) ⇒ Wfc(if e then c0 else c1)
While: e : bool, Wfc(c) ⇒ Wfc(while e do c)

Of course all of this is really quite trivial and one could have separated out the Boolean ex-
pressions very easily in the first place, as was done with L. However, we will see that the
method generalises to the context-sensitive aspects, also referred to in the literature as the
static semantics.
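The rules are directly executable. The following is a hedged Haskell sketch of a checker for L0 (ours; the datatype and function names are invented), with tyBop read off the tables above:

    data Ty  = Nat | Bool deriving Eq
    data Bop = Plus | Minus | Times | Equal | Or deriving Eq
    data Exp = Num Int | TV Bool | Var String | BopE Bop Exp Exp | Neg Exp
    data Com = Nil | Assign String Exp | Seq Com Com
             | If Exp Com Com | While Exp Com

    -- The partial function τbop, read off the three tables.
    tyBop :: Bop -> Ty -> Ty -> Maybe Ty
    tyBop b Nat Nat | b `elem` [Plus, Minus, Times] = Just Nat
    tyBop Equal Nat Nat = Just Bool
    tyBop Or Bool Bool  = Just Bool
    tyBop _ _ _         = Nothing

    -- e : τ, one clause per rule; variables all have type nat.
    tyExp :: Exp -> Maybe Ty
    tyExp (Num _)        = Just Nat
    tyExp (TV _)         = Just Bool
    tyExp (Var _)        = Just Nat
    tyExp (BopE b e0 e1) = do t0 <- tyExp e0; t1 <- tyExp e1; tyBop b t0 t1
    tyExp (Neg e)        = case tyExp e of Just Bool -> Just Bool
                                           _         -> Nothing

    -- Wfc(c)
    wfc :: Com -> Bool
    wfc Nil          = True
    wfc (Assign _ e) = tyExp e == Just Nat
    wfc (Seq c0 c1)  = wfc c0 && wfc c1
    wfc (If e c0 c1) = tyExp e == Just Bool && wfc c0 && wfc c1
    wfc (While e c)  = tyExp e == Just Bool && wfc c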

Turning to the dynamic semantics we want now to avoid configurations ⟨c, σ⟩ with σ : Var −→
N, as such stores are infinite objects. For we have more or less explicitly indicated that we are
doing (hopefully nice) finitary mathematics. The problem is easily overcome by noting that we
only need σ to give values for all the variables in c, and there are certainly only finitely many
such variables. Consequently for any finite subset V of Var we set:

StoresV = V −→ N

and take the configurations also to be indexed by V

ΓE,V = {⟨e, σ⟩ | ∃τ. e : τ, Var(e) ⊆ V, σ ∈ StoresV}
ΓC,V = {⟨c, σ⟩ | Wfc(c), Var(c) ⊆ V, σ ∈ StoresV} ∪ {σ | σ ∈ StoresV}

where Var(e) is the set of variables occurring in e. The rules are much the same as before,
formally speaking. That is they are the same as before but with the variables and metavariables
ranging over the appropriate sets and an added index. So for example in the rule

Comp 2: ⟨c0, σ⟩ −→V σ′ ⇒ ⟨c0; c1, σ⟩ −→V ⟨c1, σ′⟩

it is meant that c0 , c1 (and hence c0 ; c1 ) are well formed commands with their variables all in
V and all of the configurations mentioned in the rule are in ΓC,V .

Equally in the rule

Sum 1: ⟨e0, σ⟩ −→V ⟨e0′, σ⟩ ⇒ ⟨e0 + e1, σ⟩ −→V ⟨e0′ + e1, σ⟩

it is meant that all the expressions e0, e0′, e0 + e1, e0′ + e1 have a type (which must here be
nat), that all their variables are in V, and that all the configurations mentioned in the rule are in
ΓE,V. Thus the rules define families of transition relations: −→V ⊆ ΓE,V × ΓE,V for expressions,
and −→V ⊆ ΓC,V × ΓC,V for commands.

In the above we have taken the definition of Var(e), the variables occurring in e and also of
Var(c) for granted as it is rather obvious what is meant. However, it is easily given by a so-called
definition by structural induction.

Var(t) = Var(m) = ∅
Var(v) = {v}
Var(e0 bop e1 ) = Var(e0 ) ∪ Var(e1 )
Var(∼e) = Var(e)

With this kind of syntax-directed definition what is meant is that it can easily be shown by
SI that the above equations determine, for any e, exactly one V with Var(e) = V. The
definition for commands is similar and is left to the reader; the only point of (very slight)
interest is the definition of Var(v := e).

The definition can also be cast in the form of rules for assertions of the form Var(e) = V.

Truth-values: Var(t) = ∅
Numbers: Var(m) = ∅
Variables: Var(v) = {v}
Binary Operations: Var(e0) = V0, Var(e1) = V1 ⇒ Var(e0 bop e1) = V0 ∪ V1
Negation: Var(e) = V ⇒ Var(∼e) = V
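As a sketch of how such a syntax-directed definition looks when programmed (our illustration; the names are invented), Var is just a structurally recursive function:

    import qualified Data.Set as S

    data Exp = TV Bool | Num Int | Var String
             | Bop String Exp Exp | Neg Exp

    -- Definition by structural induction of Var(e); that each e is
    -- mapped to exactly one set is what makes this a function.
    vars :: Exp -> S.Set String
    vars (TV _)        = S.empty
    vars (Num _)       = S.empty
    vars (Var v)       = S.singleton v
    vars (Bop _ e0 e1) = vars e0 `S.union` vars e1
    vars (Neg e)       = vars e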

Exercise: Give rules for the assertion Var(e) ⊆ V .

Finally we have a parametric form of behaviour. For example for commands we have a partial
function:

Exec : CV × StoresV −→ StoresV

where CV = {c ∈ C | Wfc(c) ∧ Var(c) ⊆ V}, given by:

Exec(c, σ) = σ′ ≡ ⟨c, σ⟩ −→* σ′
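Determinacy is what lets Exec be computed by iterating a one-step function. A tiny hedged sketch (assuming some one-step interpreter step, with Nothing meaning that no transition is possible):

    -- Iterate a one-step relation-as-function to a terminal configuration.
    -- Divergence of the program shows up as non-termination of exec,
    -- matching the partiality of Exec.
    exec :: (a -> Maybe a) -> a -> a
    exec step c = maybe c (exec step) (step c)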

3.7 Static Errors

The point here is to specify failures in the type-checking mechanism. Here are some rules for a
very crude specification where one just adds a new predicate Error.

• Binary Operations
(1) Error(e0) ⇒ Error(e0 bop e1)
(2) Error(e1) ⇒ Error(e0 bop e1)
(3) e0 : τ0, e1 : τ1 ⇒ Error(e0 bop e1) (if τbop(τ0, τ1) is undefined)
• Negation
Error(e) ⇒ Error(∼e)
• Assignment
(1) Error(e) ⇒ Error(v := e)
(2) e : bool ⇒ Error(v := e)
• Sequencing
(1) Error(c0) ⇒ Error(c0; c1)
(2) Error(c1) ⇒ Error(c0; c1)
• Conditional
(1) Error(e) ⇒ Error(if e then c0 else c1)
(2) Error(c0) ⇒ Error(if e then c0 else c1)
(3) Error(c1) ⇒ Error(if e then c0 else c1)
(4) e : nat ⇒ Error(if e then c0 else c1)
• While
(1) Error(e) ⇒ Error(while e do c)
(2) Error(c) ⇒ Error(while e do c)
(3) e : nat ⇒ Error(while e do c)

3.8 Exercises

Expressions

1. Try out a few example evaluations.

2. Write down rules for the right-to-left evaluation of expressions, as opposed to the left-to-
right evaluation described above.

3. Write down rules for the parallel evaluation of expressions, so that the following kind of
transition sequence is possible:

(1 + (2 + 3)) + ((4 + 5) + 6) −→ (1 + (2 + 3)) + (9 + 6) −→ (1 + 5) + (9 + 6)


−→ 6 + (9 + 6) −→ 6 + 15 −→ 21

Here one transition is one action of imaginary processors situated just above the leaves
of the expressions (considered as a tree).

4. Note that in the rules if ⟨e, σ⟩ −→ ⟨e′, σ′⟩ then σ′ = σ. This is the mathematical coun-
terpart of the fact that evaluation of L-expressions produces no side-effects. Rephrase the
rules for L-expressions in terms of relations σ ⊢ e −→ e′ where σ ⊢ e −→ e′ means that
⟨e, σ⟩ −→ ⟨e′, σ⟩ and can be read as "given σ, e reduces to e′".

5. Give rules for “genuine” parallel evaluation where one or more processors as imagined in
3 can perform an action during the same transition. [Hint: Use the idea of exercise 4.]
∗∗
6. Try to develop a method of axiomatising entire derivation sequences. Can you find any
advantages for this idea?

Boolean Expressions

7. Can you find various kinds of rules analogous to those for or for conjunctions b and b′?
By the way, the left-sequential construct is often advantageous to avoid array subscripts
going out of range as in:

while (i <= n) and a[i] <> x
do i := i + 3; c

8. Treat the following additions to the syntax

e ::= if b then e0 else e1


b ::= if b0 then b1 else b2

Presumably you will have given rules for the usual sequential conditional. Can you find
and give rules for a parallel conditional analogous to parallel disjunction?

9. Treat the following additions to the syntax which introduce the possibilities of side-effects
in the evaluation of expressions:

e ::= begin c result e

(meaning: execute c then evaluate e) and the assignment expression:

e ::= (v := e)

where the intention is that the value of (v := e) is the value of e but the assignment also
occurs, producing a side-effect in general.

10. Show that the equivalence relations on expressions and boolean expressions are respected
by the program constructs discussed above so that for example:

a) e0 ≡ e0′ ∧ e1 ≡ e1′ ⊃ (e0 + e1) ≡ (e0′ + e1′)
b) e0 ≡ e0′ ∧ e1 ≡ e1′ ⊃ (e0 − e1) ≡ (e0′ − e1′)
c) e0 ≡ e0′ ∧ e1 ≡ e1′ ⊃ (e0 = e1) ≡ (e0′ = e1′)
d) b ≡ b′ ⊃ ∼b ≡ ∼b′

Commands

11. Give a semantics for the “desk calculator” command

v+ := e

so that the equivalence

(v+ := e) ≡ (v := v + e)

holds (and you can prove it!)

12. Give a semantics for the ALGOL-60 assignment command

v1 := (v2 := . . . (vn := e) . . .)

so that (see exercise 9) the equivalence

(v1 := (v2 := . . . (vn := e) . . .)) ≡ (v1 := e′)

where e′ = (v2 := . . . (vn := e) . . .), holds (and you can prove it).

13. Treat the simultaneous assignment

v1 := e1 and . . . and vn := en

where the vi must all be different. Execution of this command consists of first evaluating
all the expressions and then performing the assignments.

14. Treat the following variations on the conditional command:

if b then c | unless b then c |

if b1 then c1
else if b2 then c2
. . .
else if bn then cn
else cn+1

and show they can all be eliminated (to within equivalence) in favour of the ordinary
conditional.

15. Treat the simple iteration command

do e times c

and the following variations on repetitive commands like while b do c:

repeat c until b | until b repeat c | repeat c unless b |

loop
c1
when b1 do c1′ exit
c2
. . .
when bn do cn′ exit
cn+1
repeat

where the last construct has n possible exits from the loop.

16. Show that equivalence is respected by the above constructs on commands so that, for
example

a) e ≡ e′ ⊃ (v := e) ≡ (v := e′)
b) c0 ≡ c0′ ∧ c1 ≡ c1′ ⊃ c0; c1 ≡ c0′; c1′
c) b ≡ b′ ∧ c0 ≡ c0′ ∧ c1 ≡ c1′ ⊃ if b then c0 else c1 ≡ if b′ then c0′ else c1′
d) b ≡ b′ ∧ c ≡ c′ ⊃ while b do c ≡ while b′ do c′

17. Redefine behaviour and equivalence to take account of run-time errors. Do the statements
of exercise 16 remain valid?

∗∗
18. Try time and space complexity in the present setting. [Hint: Consider configurations of
the form, say, ⟨c, σ, t, s⟩ where

t = "the total time used so far"
s = "the maximum space used so far"]

There is lots to do. Try finding fairly general definitions, define behaviour and equivalence
(approximate equivalence?) and see which program equivalences preserve equivalence. Try
looking at measures for the parallel evaluation of expressions. Try to see what is reasonable
to incorporate from complexity literature. Can you use the benefits of our structured
languages to make standard simulation results easier/nicer for students?

∗∗
19. Try exercises 23 and 24 from Chapter 1 again.

20. Give an operational semantics for L, but where only 1 step of the evaluation of an expres-
sion or Boolean expression is needed for 1 step of execution of a command. Which of the
two possibilities – the “big steps” of the main text or the “little steps” of the exercise –
do you prefer and why?

Proof

21. Let c be any command not involving subexpressions of the form (e − e′) or while loops but
allowing the simple iteration command of exercise 15. Show that any execution sequence
⟨c, σ⟩ −→* . . . terminates.

22. Establish (for L) the following “arithmetic” equivalences:

e0 + 0 ≡ e0
e0 + e1 ≡ e1 + e0
e0 + (e1 + e2 ) ≡ (e0 + e1 ) + e2
etc

Which ones fail if side-effects are allowed in expressions?


Establish the equivalences:
a) if b then c else c ≡ c
b) if b then if b then c0 else c0′ else if b then c1 else c1′ ≡ if b then c0 else c1′
c) if b then if b′ then c0 else c1 else if b′ then c2 else c3 ≡
if b′ then if b then c0 else c2 else if b then c1 else c3.
Which ones remain true if Boolean expressions have side-effects/need not terminate?

23. Establish or refute each of the following suggested equivalences for the language L (and
slight extensions, as indicated):

a) nil; c ≡ c ≡ c; nil
b) c; if b then c0 else c1 ≡ if begin c result b then c0 else c1
c) (if b then c0 else c1 ); c ≡ if b then c0 ; c else c1 ; c
d) while b do c ≡ if b then (c; while b do c) else nil
e) repeat c until b ≡ c; while ∼b do c

Type-Checking

24. Make L0 a little more realistic by adding a type real, decimals, variables of all three types,
and a variety of operators. Allow nat to real conversion, but not vice-versa.

25. Show that if ⟨c, σ⟩ −→ ⟨c′, σ′⟩ and x ∈ Dom(σ)\Var(c) then σ(x) = σ′(x).

26. Show that if ⟨c, σ⟩ −→ ⟨c′, σ′⟩ is a transition within ΓC,V and ⟨c, σ̄⟩ −→ ⟨c′, σ̄′⟩ is a
transition within ΓC,V′ where V ⊆ V′ then, if σ = σ̄ ↾ V, it follows that σ′ = σ̄′ ↾ V.

27. The static error specification is far too crude. Instead one should have a set M of messages
and a relation:

Error(e, m) ≡ m is a report on an error in e

and similarly for commands. Design a suitable M and a specification of Error for L0 . Try
to develop a philosophy of what a nice error message should be. See [Hor] for some ideas.

28. How would you treat dynamic type-checking in L0 ? What would be the new ideas for error
messages (presumably one adds an M (see exercise 27) to the configurations).

3.9 Bibliographical Remarks

The idea of reduction sequences originates in the λ-calculus [Hin] as does the present method
of specifying steps axiomatically where I was motivated by Barendregt’s thesis [Bar1]. I applied
the idea to λ-calculus-like programming languages in [Plo1], [Plo2] and Milner saw how to
extend it to simple imperative languages in [Mil1]. More recently the idea has been applied to
languages for concurrency and distributed systems [Hen1], [Mil2], [Hen2]. The present course
is a systematic attempt to apply the idea as generally as possible. A good deal of progress
has been made on other aspects of reduction and the λ-calculus, a partial survey and further
references can be found in [Ber] and see [Bar2].

Related ideas can be found in work by de Bakker and de Roever. A direct precursor of our
method can be found in the work by Lauer and Hoare [Hoa], who use configurations which have
the rough form ⟨s1, . . . , sn, σ⟩ where the si are statements (including commands). They define
a next-configuration function and the definition is to some extent syntax-directed. The idea of
a syntax-directed approach was independently conceived and mentioned all too briefly in the
work of Salwicki [Sal].

Somewhat more distantly various grammatical (= symbol-pushing too) approaches have been
tried. For example W-grammars [Cle] and attribute grammars [Mad]; although these defini-
tions are not syntax-directed definitions of single transitions it should be perfectly possible to
use the formalisms to write definitions which are. The question is rather how appropriate the
formalisms would be with regard to such issues as completeness, clarity (= readability), nat-
uralness, realism, modularity (= modifiability + extensibility). One good discussion of some
of these issues can be found in [Mar]. For concern with modularity consult the course notes
of Peter Mosses. Our method is clearly intended to be complete and natural and realistic, and
we try to be clear; the only point is that it is quite informal, being normal finite mathematics.
There must be many questions on good choices of formalism. As regards modularity we just
hope that if we get the other things in a reasonable state, then current ideas for imposing
modularity on specifications will prove useful.

For examples of good syntax-directed English specifications consult the excellent article by

Ledgard on ten mini-languages [Led]. These languages will provide you with mini-projects
which you should find very useful in understanding the course, and which could very well be
the basis for more extended projects. For a much more extended example see the ALGOL 68
Report [Wij]. Structural Induction seems to have been introduced to Computer Science by
Burstall in [Bur]; for a system which performs automatic proofs by Structural Induction on
lists see [Boy]. For discussions of what error messages should be see [Hor] and for remarks on
how and whether to specify them see [Mar].

4 Bibliography

[Bar1] Barendregt, H. (1971) Some Extensional Term Models for Combinatory Logic and
Lambda-calculi, PhD thesis, Department of Mathematics, Utrecht University.
[Bar2] Barendregt, H. (1981) The Lambda Calculus, Studies in Logic 103, North-Holland.
[Ber] Berry, G. and Lévy, J-J. A Survey of Some Syntactic Results in the Lambda-calculus,
Proc. MFCS’79, ed. J. Becvár, LNCS 74, pp. 552–566.
[Boy] Boyer, R.S. and Moore, J.S. (1979) A Computational Logic, Academic Press.
[Bur] Burstall, R.M.B. (1969) Proving Properties of Programs by Structural Induction, Com-
puter Journal 12(1):41–48.
[Cle] Cleaveland, J.C. and Uzgalis, R.C. (1977) Grammars for Programming Languages,
Elsevier.
[Hen1] Hennessy, M.C.B. and Plotkin, G.D. (1979) Full Abstraction for a Simple Parallel
Programming Language, Proc. MFCS’79, ed. J. Becvár, LNCS 74, pp. 108–120.
[Hen2] Hennessy, M.C.B., Li, Wei and Plotkin, G.D. (1981) A First Attempt at Translating
CSP into CCS, Proc. ICDCS’81, pp. 105–115, IEEE.
[Hin] Hindley, J.R., Lercher, B. and Seldin, J.P. (1972) Introduction to Combinatory Logic,
Cambridge University Press.
[Hoa] Hoare, C.A.R. and Lauer, P.E. (1974) Consistent and Complementary Formal Theories
of the Semantics of Programming Languages, Acta Informatica 3:135–153.
[Hor] Horning, J.J. (1974) What the Compiler Should Tell The User, Compiler Construction:
An Advanced Course, eds F.L. Bauer and J. Eickel, LNCS 21, pp. 525–548.
[Lau] Lauer, P.E. (1971) Consistent Formal Theories of The Semantics of Programming
Languages, PhD thesis, Queen’s University of Belfast, IBM Laboratories Vienna TR
25.121.
[Led] Ledgard, H.F. (1971) Ten Mini-Languages: A Study of Topical Issues in Programming
Languages, ACM Computing Surveys 3(3):115–146.
[Mad] Madsen, O.L. (1980) On Defining Semantics by Means of Extended Attribute Gram-
mars, Semantics-Directed Compiler Generation, ed. N.D. Jones, LNCS 94, pp. 259–299.
[Mar] Marcotty, M., Ledgard, H.F. and von Bochmann, G. (1976) A Sampler of Formal
Definitions, ACM Computing Surveys 8(2):191–276.
[Mil1] Milner, A.J.R.G. (1976) Program Semantics and Mechanized Proof, Foundations of
Computer Science II, eds K.R. Apt and J.W. de Bakker, Mathematical Centre Tracts
82, pp. 3–44.

[Mil2] Milner, A.J.R.G. (1980) A Calculus of Communicating Systems, LNCS 92.
[Plo1] Plotkin, G.D. (1975) Call-by-name, Call-by-value and the Lambda-calculus, Theoretical
Computer Science 1(2):125–159.
[Plo2] Plotkin, G.D. (1977) LCF Considered as a Programming Language, Theoretical Com-
puter Science 5(3):223–255.
[Sal] Salwicki, A. (1976) On Algorithmic Logic and its Applications, Mathematical Institute,
Polish Academy of Sciences.
[Wij] van Wijngaarden, A., Mailloux, B.J., Peck, J.E.L., Koster, C.H.A., Sintzoff, M., Lind-
sey, C.H., Meertens, L.G.T. and Fisker, R.G. (1975) Revised Report on the Algorithmic
Language ALGOL 68, Acta Informatica 5:1–236.

5 Definitions and Declarations

5.1 Introduction

In this chapter we begin the journey towards realistic programming languages by considering
binding mechanisms which enable the introduction of new names in local contexts. This leads to
definitions of local variables in applicative languages and declarations of constant and variable
identifiers in imperative languages. We will distinguish the semantic concepts of environments
and stores. The former concerns those aspects of identifiers which do not change throughout
the evaluation of expressions or the execution of commands and so on; the latter concerns those
aspects which do, as in side-effects in the evaluation of expressions or the effects of the execution
of commands. In the static semantics context-free methods no longer suffice, and we show how
our rules enable the context-sensitive aspects to be handled in a natural and syntax-directed
way.

5.2 Simple Definitions in Applicative Languages

We consider a little applicative (= functional) language with simple local definitions of variables.
It can be considered as a first step towards full-scale languages like ML [Gor].

• Syntax: Basic Sets
Numbers: m, n ∈ N
Binary Op.: bop ∈ Bop = {+, −, ∗}
Variables: x, y, z ∈ Var = {x1, x2, . . .}
• Derived Sets
Expressions: e ∈ Exp where

e ::= m | x | e0 bop e1 | let x = e0 in e1

Note: Sometimes let x = e0 in e1 is written instead as e1 where x = e0. From the point of
view of readability the first form is preferable when a bottom-up style is appropriate, and the
second where a top-down style is appropriate. For in the first case one first defines x and then
uses it, and in the second it is used before being defined.

Clearly any expression contains various occurrences of variables, and in our language there are
two kinds of these occurrences. First we have defining occurrences where variables are intro-
duced; second we have applied occurrences where variables are used. For example, considering
the figure below the defining occurrences are 2, 6, 9 and the others are applied. In some lan-
guages - but not ours! - one finds other occurrences which can fairly be termed useless.

x¹ ∗ ( let x² = 5 ∗ y³ ∗ x⁴
       in x⁵ + ( let y⁶ = 14 − x⁷
                 in y⁸ + ( let x⁹ = 3 + x¹⁰ + x¹¹
                           in x¹² ∗ y¹³ )))

Some Variable Occurrences

Now the region of program text over which defining occurrences have an influence is known as
their scope. One often says, a little loosely, that, for example, the scope of the first occurrence
of x in e = let x = e0 in e1 is the expression e1 . But then one considers examples such as
that of the above figure, where occurrence 12 is not in the scope of 2 (as it is instead in the
scope of 9); this is called a hole in the scope of 2. It is more accurate to say that the scope
of a defining occurrence is a set of applied occurrences. In the case of let x = e0 in e1 the
scope of x is all those applied occurrences of x in e1 , which are not in the scope of any defining
occurrence of x in e1 . Thus in the case of figure 1 we have the following table showing which
applied occurrences are in the scope of which defining occurrences (equivalently which defining
occurrences bind which applied occurrences).

Defining Occurrence    Applied Occurrences
2                      {5, 7, 10, 11}
6                      {8, 13}
9                      {12}

Note that each applied occurrence is in the scope of at most one defining occurrence. Those
not in any scope are termed free (versus bound); for example occurrences 1, 3, 4 above are free.
One can picture the bindings and the free variables by means of a drawing with arrows such
as:
let x = 5 + y
in let y = 4 + x + y
   in x + y + z

[Figure: arrows run from each applied occurrence to the defining occurrence that binds it; the unbound occurrences are free.]

From the point of view of semantics it is irrelevant which identifiers are chosen just so long as
the same set of bindings is generated. (Of course a sensible choice of identifiers greatly affects
readability, but that is not a semantic matter.) All we really need are the arrows, but it is
hard to accommodate them into our one-dimensional languages. In the literature on λ-calculus
one does find direct attempts to formalise the arrows and also attempts to eliminate variables
altogether, as in Combinatory Logic [Hin]; in Dataflow one sees graphical languages where the
graphs display the arrows [Ack].

Static Semantics

Free Variables: The following definition by structural induction is of FV(e), the set of free
variables (= variables with free occurrences) of e:

        m    x      e0 bop e1             let x = e0 in e1
FV      ∅    {x}    FV(e0) ∪ FV(e1)       FV(e0) ∪ (FV(e1)\{x})

Example 17

FV(let x = 5 + y in ( let y = 4 + x + y in x + y + z))
= FV(5 + y) ∪ (FV( let y = 4 + x + y in x + y + z)\{x})
= {y} ∪ (({x, y} ∪ ({x, y, z}\{y}))\{x})
= {y} ∪ ({x, y, z}\{x})
= {y, z}
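Again the definition is directly programmable; a hedged Haskell sketch (our names), whose let clause implements FV(e0) ∪ (FV(e1)\{x}):

    import qualified Data.Set as S

    data Exp = Num Int | Var String | Bop String Exp Exp
             | Let String Exp Exp               -- let x = e0 in e1

    fv :: Exp -> S.Set String
    fv (Num _)       = S.empty
    fv (Var x)       = S.singleton x
    fv (Bop _ e0 e1) = fv e0 `S.union` fv e1
    fv (Let x e0 e1) = fv e0 `S.union` S.delete x (fv e1)

Running fv on the expression of Example 17 returns the set {y, z}, as computed above.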

Dynamic Semantics

For the most part applicative languages have no concept of state; there is only the evaluation
of expressions in different environments (= semantic contexts). We take:

EnvV = (V −→ N)

for any finite subset V of the set Var of variables, and let ρ range over Env = ΣV EnvV and
write ρ : V to mean that ρ is in EnvV. Of course EnvV = StoresV, but we introduce a new
notation in order to emphasise the new idea.

The set of configurations is also parameterised on V and

ΓV = {e ∈ Exp | FV(e) ⊆ V }
TV = N

The transition relation is now relative to an environment, and for any ρ : V and e, e′ in ΓV we
write

ρ ⊢V e −→ e′

and read that in (= given) environment ρ one step of the evaluation of the expression e results
in the expression e′. The use of the turnstile is borrowed from formal logic as we wish to think
of the above as an assertion of e −→ e′ conditional on ρ, which in turn is thought of as an
assertion supplied by the environment on the values of the free variables of e and e′. As this
environment will not change from step to step of the evaluation of an expression, we will often
use, fixing ρ, the transitive reflexive closure ρ ⊢V e −→* e′. It is left
to the reader to define relative transition systems.

Rules:

Variables: ρ ⊢V x −→ ρ(x)
Binary Operations: (1) ρ ⊢V e0 −→ e0′ ⇒ ρ ⊢V e0 bop e1 −→ e0′ bop e1
(2) ρ ⊢V e1 −→ e1′ ⇒ ρ ⊢V m bop e1 −→ m bop e1′
(3) ρ ⊢V m bop m′ −→ n (where n = m bop m′)

Note: To save space we are using an evident horizontal lay-out for our rules. That is, the rule

A1 . . . Ak
───────────
     A

can alternatively be written in the form A1, . . . , Ak ⇒ A.

Definition 18 Informally, to evaluate e = let x = e0 in e1 given ρ:

(1) Evaluate e0 given ρ to get the value m0.
(2) Change from ρ to ρ′ = ρ[m0/x].
(3) Evaluate e1 given ρ′ to get the value m.

Then m is the value of e given ρ.

The rules for one step of the evaluation are:

(1) ρ ⊢V e0 −→ e0′ ⇒ ρ ⊢V let x = e0 in e1 −→ let x = e0′ in e1
(2) ρ[m/x] ⊢V∪{x} e1 −→ e1′ ⇒ ρ ⊢V let x = m in e1 −→ let x = m in e1′
(3) ρ ⊢V let x = m in n −→ n

Of course these rules are just a clearer version of those given in Chapter 2 for expressions (as
suggested in exercise 4). Continuing the logical analogy our rules look like a Gentzen system

of natural deduction [Pra] written in a linear way. Possible definitions of behaviour are left to
the reader.
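As a hedged Haskell sketch of these rules (ours; a one-step function rather than a relation, which is possible because the rules are deterministic; an unbound variable simply has no transition):

    import qualified Data.Map as M

    type Env = M.Map String Int
    data Exp = Num Int | Var String | Add Exp Exp
             | Let String Exp Exp

    -- ρ ⊢V e −→ e′ as a function; Nothing on terminal configurations.
    step :: Env -> Exp -> Maybe Exp
    step rho (Var x)                 = Num <$> M.lookup x rho
    step _   (Add (Num m) (Num n))   = Just (Num (m + n))
    step rho (Add (Num m) e1)        = Add (Num m) <$> step rho e1
    step rho (Add e0 e1)             = (\e0' -> Add e0' e1) <$> step rho e0
    step _   (Let _ (Num _) (Num n)) = Just (Num n)                  -- rule (3)
    step rho (Let x (Num m) e1)      = Let x (Num m) <$> step (M.insert x m rho) e1  -- rule (2)
    step rho (Let x e0 e1)           = (\e0' -> Let x e0' e1) <$> step rho e0        -- rule (1)
    step _   (Num _)                 = Nothing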

5.3 Compound Definitions

In general it is not convenient just to repeat simple definitions, and so we consider several ways
of putting definitions together. The category of expressions is now:

e ::= m | x | e0 bop e1 | let d in e

where d ranges over the category Def of definitions where:

d ::= nil | x = e | d0 ; d1 | d0 and d1 | d0 in d1

To understand this it is convenient to think in terms of import and export. An expression, e,
imports values for its free variables from its environment (and produces a value). This can be
pictured as:
[Figure: a box labelled e, with an incoming arrow for each free variable (such as x) and a single outgoing arrow for the value produced.]

An Expression

where x is a typical free variable of e. A definition, d, imports values for its free variables and
exports values for its defining variables (those with defining occurrences). This can be pictured
as:

[Figure: a box labelled d, with incoming arrows for the imported variables (such as x) and outgoing arrows for the exported variables (such as y).]

A Definition

These are dataflow diagrams and they also help explain compound expressions and definitions.
For example a definition block let d in e imports from its environment into d and then d exports
into e with any other needed imports of e coming from the block environment. Pictorially

[Figure: dataflow picture of the definition block let d in e, as described below.]

A Definition Block

Here a is a typical variable imported by d but not e, and b is one imported by d and e, and
c is one imported by e and not d; again x is a variable exported by d and not imported by e
(useless but logically possible), and y is a variable exported by d and imported by e. Of course
we later give a precise explanation of all this by formal rules of an operational semantics.

Turning to compound definitions we have sequential definition, d0; d1, and simultaneous
definition, d0 and d1, and private definition, d0 in d1. What d0; d1 does is import from the
environment into d0 and export from d0 into d1 (with any additional exports needed for d1
being taken from the environment); then d0; d1 exports from both d0 and d1 with the latter
taking precedence for common exports. Pictorially (and we need a picture!):

[Figure: dataflow picture of d0; d1: d0's exports feed d1, and the exports of both emerge, with d1's taking precedence for common variables.]

Sequential Definition

Simultaneous definition is much simpler; d0 and d1 imports into both d0 and d1 from the
environment and then exports from both (and there must be no common defined variable).
Pictorially

[Figure: dataflow picture of d0 and d1: both definitions import directly from the environment and export side by side.]

Simultaneous Definition

Finally, a private definition d0 in d1 is just like a sequential one, except that the only exports
are from d1 . It can be pictured as:

[Figure: dataflow picture of d0 in d1: like sequential definition, but only d1's exports emerge.]

Private Definition

We may also write d0 in d1 as let d0 in d1 or as private d0 within d1. Private definitions
provide examples of blocks where the body is a definition. We have already seen blocks with
expression bodies and will see ones with command bodies. Tennent's Principle of Qualification
says that in principle any semantically meaningful syntactic class can be the body of a block
[Ten]. We shall later encounter other examples of helpful organisational principles.

As remarked in [Ten] many programming languages essentially force one construct to do jobs
better done by several; for instance it is common to try to get something of the effect of both
sequential and simultaneous definition. A little thought should convince the reader that there
are essentially just the three interesting ways of putting definitions together.

Example 19 Consider the expression

let x = 3
in let x = 5 & y = 6 ∗ x
in x + y

Depending on whether & is ; or and or in, the expression has the values 35 = 5 + (6 ∗ 5) or
23 = 5 + (6 ∗ 3) or 33 = 3 + (6 ∗ 5).

Static Semantics

We will define the set DV(d) of defined variables of a definition d and also FV(d/e), the set of
free variables of a definition d or expression e.

      nil    x = e    d0; d1                       d0 and d1           d0 in d1
DV    ∅      {x}      DV(d0) ∪ DV(d1)              DV(d0) ∪ DV(d1)     DV(d1)
FV    ∅      FV(e)    FV(d0) ∪ (FV(d1)\DV(d0))     FV(d0) ∪ FV(d1)     FV(d0) ∪ (FV(d1)\DV(d0))

For expressions the definition of free variables is the same as before except for the case

FV(let d in e) = FV(d) ∪ (FV(e)\DV(d))

Because of the restriction on simultaneous definitions not all expressions or definitions are well-
formed - for example consider let x = 3 and x = 6 in x. So we also define the well-formed ones
by means of rules for a predicate W(d/e) on definitions and expressions.

Rules:

• Definitions
Nil: W(nil)
Simple: W(e) ⇒ W(x = e)
Sequential: W(d0 ), W(d1 ) ⇒ W(d0 ; d1 )
Simultaneous: W(d0 ), W(d1 ) ⇒ W(d0 and d1 ) (if DV(d0 ) ∩ DV(d1 ) = ∅)
Private: W(d0 ), W(d1 ) ⇒ W(d0 in d1 )
• Expressions
Constants: W(m)
Variables: W(x)
Binary Op.: W(e0 ), W(e1 ) ⇒ W(e0 bop e1 )

Definitions: W(d), W(e) ⇒ W(let d in e)
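All of these clauses are directly executable; a hedged Haskell sketch (ours, with invented constructor names) of DV, FV and W over the mutually recursive syntax:

    import qualified Data.Set as S

    data Exp = Num Int | Var String | Bop String Exp Exp | Let Def Exp
    data Def = DNil | Eq String Exp | DSeq Def Def | DAnd Def Def | DPriv Def Def

    dv :: Def -> S.Set String
    dv DNil          = S.empty
    dv (Eq x _)      = S.singleton x
    dv (DSeq d0 d1)  = dv d0 `S.union` dv d1
    dv (DAnd d0 d1)  = dv d0 `S.union` dv d1
    dv (DPriv _ d1)  = dv d1

    fvD :: Def -> S.Set String
    fvD DNil          = S.empty
    fvD (Eq _ e)      = fvE e
    fvD (DSeq d0 d1)  = fvD d0 `S.union` (fvD d1 S.\\ dv d0)
    fvD (DAnd d0 d1)  = fvD d0 `S.union` fvD d1
    fvD (DPriv d0 d1) = fvD d0 `S.union` (fvD d1 S.\\ dv d0)

    fvE :: Exp -> S.Set String
    fvE (Num _)       = S.empty
    fvE (Var x)       = S.singleton x
    fvE (Bop _ e0 e1) = fvE e0 `S.union` fvE e1
    fvE (Let d e)     = fvD d `S.union` (fvE e S.\\ dv d)

    -- W(d) and W(e): only simultaneous definition has a side condition.
    wD :: Def -> Bool
    wD DNil          = True
    wD (Eq _ e)      = wE e
    wD (DSeq d0 d1)  = wD d0 && wD d1
    wD (DAnd d0 d1)  = wD d0 && wD d1 && S.null (dv d0 `S.intersection` dv d1)
    wD (DPriv d0 d1) = wD d0 && wD d1

    wE :: Exp -> Bool
    wE (Bop _ e0 e1) = wE e0 && wE e1
    wE (Let d e)     = wD d && wE e
    wE _             = True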

Dynamic Semantics

It is convenient to introduce some new notation to handle environments. For purposes of dis-
playing environments consider, for example, ρ : {x, y, z}, where ρ(x) = 1, ρ(y) = 2, ρ(z) = 3.
We will also write ρ as {x = 1, y = 2, z = 3} and drop the set brackets when desired; this
notation makes it clearer that environments can be thought of as assertions.

Next for any V0, V1 and ρ0 : V0, ρ1 : V1 we define ρ = ρ0[ρ1] : V0 ∪ V1 by:

ρ(x) = ρ1(x)   (x ∈ V1)
ρ(x) = ρ0(x)   (x ∈ V0\V1)

We now have the nice ρ[x = m] to replace the less readable ρ[m/x]. Finally for any ρ0 :V0 , ρ1 :V1
with V0 ∩ V1 = ∅ we write ρ0 , ρ1 for ρ0 ∪ ρ1 . Of course this is equal to ρ0 [ρ1 ], and also to ρ1 [ρ0 ],
but the extra notation makes it clear that it is required that V0 ∩ V1 = ∅.

The expression configurations are parameterised on V by:

ΓV = {e | W(e), FV(e) ⊆ V }

and of course

TV = N

And our transition relation, ρ ⊢V e −→ e′, is defined only for ρ : V, and e, e′ in ΓV.

For definitions the idea is that just as an expression is evaluated to yield values so is a definition
elaborated to yield a “little” environment (for its defined variables). For example, given ρ =
{x = 1, y = 2, z = 3} the definition x = 5 + x + z; y = x + y + z is elaborated to yield
{x = 9, y = 14}. In order to make this work we add another clause to the definition of Def

d ::= ρ

What this means is that the abstract syntax of definition configurations allows environments;
it does not mean that the abstract syntax of definitions does so.

In a sense we slipped a similar trick in under the carpet when we allowed numbers as expressions.
Strictly speaking we should only have allowed literals and then allowed natural numbers as part
of the configurations and given rules for evaluating literals to numbers. Similar statements hold
for other kinds of literals. However, there seemed little point in forcing the reader through this
tedious procedure.

Returning to definitions we now add clauses for free and defined variables:

FV(ρ) = ∅
DV(ρ) = V (if ρ : V )

and also add for any ρ that W(ρ) holds, and for any V that

ΓV = {d | W(d), FV(d) ⊆ V }

and

TV = {ρ}

and consider for V and ρ : V and d, d′ ∈ ΓV the transition relation

ρ ⊢V d −→ d′

which means that, given ρ, one step of the elaboration of d yields d′.

Example 20 We shall expect to see that:

x = 1, y = 2, z = 3 ⊢ x = (5 + x) + z; y = (x + y) + z
−→* {x = 9}; y = (x + y) + z
−→* {x = 9}; {y = 14}
−→ {x = 9, y = 14}

Rules:

• Expressions: As before but with a change for definitions:

Definitions: Informally, to evaluate e1 = let d in e0 in the environment ρ:
(1) Elaborate d in ρ yielding ρ0.
(2) Change from ρ to ρ′ = ρ[ρ0].
(3) Evaluate e0 in ρ′ yielding m.
Then the evaluation of e1 yields m. Formally we have:
(1) ρ ⊢V d −→ d′ ⇒ ρ ⊢V let d in e −→ let d′ in e
(2) ρ[ρ0] ⊢V∪V0 e −→ e′ ⇒ ρ ⊢V let ρ0 in e −→ let ρ0 in e′ (where ρ0 : V0)
(3) ρ ⊢V let ρ0 in m −→ m
• Definitions: The first two cases are self-explanatory.
Nil: ρ ⊢V nil −→ ∅
Simple: (1) ρ ⊢V e −→ e′ ⇒ ρ ⊢V x = e −→ x = e′
(2) ρ ⊢V x = m −→ {x = m}
Sequential: Informally, to elaborate d0; d1 given ρ:
(1) Elaborate d0 in ρ yielding ρ0.
(2) Elaborate d1 in ρ[ρ0] yielding ρ1.
Then the elaboration of d0; d1 yields ρ0[ρ1]. Formally we have:
(1) ρ ⊢V d0 −→ d0′ ⇒ ρ ⊢V d0; d1 −→ d0′; d1
(2) ρ[ρ0] ⊢V∪V0 d1 −→ d1′ ⇒ ρ ⊢V ρ0; d1 −→ ρ0; d1′ (where ρ0 : V0)
(3) ρ ⊢V ρ0; ρ1 −→ ρ0[ρ1]
Simultaneous: Informally, to elaborate d0 and d1 given ρ:
(1) Elaborate d0 in ρ yielding ρ0.
(2) Elaborate d1 in ρ yielding ρ1.
Then the elaboration of d0 and d1 yields ρ0, ρ1 if that is defined. Formally:
(1) ρ ⊢V d0 −→ d0′ ⇒ ρ ⊢V d0 and d1 −→ d0′ and d1
(2) ρ ⊢V d1 −→ d1′ ⇒ ρ ⊢V ρ0 and d1 −→ ρ0 and d1′
(3) ρ ⊢V ρ0 and ρ1 −→ ρ0, ρ1
Private: Informally, to elaborate d0 in d1 given ρ:
(1) Elaborate d0 in ρ yielding ρ0.
(2) Elaborate d1 in ρ[ρ0] yielding ρ1.
Then the elaboration of d0 in d1 yields ρ1. Formally:
(1) ρ ⊢V d0 −→ d0′ ⇒ ρ ⊢V d0 in d1 −→ d0′ in d1
(2) ρ[ρ0] ⊢V∪V0 d1 −→ d1′ ⇒ ρ ⊢V ρ0 in d1 −→ ρ0 in d1′ (where ρ0 : V0)
(3) ρ ⊢V ρ0 in ρ1 −→ ρ1
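Because elaboration here has no side-effects, a big-step Haskell counterpart is easy to sketch (ours; it computes the environment that the small-step rules reach). Data.Map's union is left-biased, so ρ[ρ0] is rendered as r0 `M.union` rho:

    import qualified Data.Map as M

    type Env = M.Map String Int
    data Exp = Num Int | Var String | Add Exp Exp | Let Def Exp
    data Def = DNil | Eq String Exp | DSeq Def Def | DAnd Def Def | DPriv Def Def

    eval :: Env -> Exp -> Int
    eval _   (Num m)     = m
    eval rho (Var x)     = rho M.! x
    eval rho (Add e0 e1) = eval rho e0 + eval rho e1
    eval rho (Let d e)   = eval (elab rho d `M.union` rho) e      -- ρ[ρ0]

    -- Given ρ, a definition yields a "little" environment.
    elab :: Env -> Def -> Env
    elab _   DNil          = M.empty
    elab rho (Eq x e)      = M.singleton x (eval rho e)
    elab rho (DSeq d0 d1)  = let r0 = elab rho d0
                                 r1 = elab (r0 `M.union` rho) d1
                             in r1 `M.union` r0                   -- ρ0[ρ1]
    elab rho (DAnd d0 d1)  = elab rho d0 `M.union` elab rho d1    -- disjoint DVs
    elab rho (DPriv d0 d1) = let r0 = elab rho d0
                             in elab (r0 `M.union` rho) d1

For instance, elaborating x = (5 + x) + z; y = (x + y) + z in {x = 1, y = 2, z = 3} with this sketch yields {x = 9, y = 14}, as in Example 20.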

Example 21

x = 1, y = 2, z = 3 ⊢ x = (5 + x) + z; y = (x + y) + z
−→  x = (5 + 1) + z; y = (x + y) + z     (SEQ1, using SIM1)
−→* x = 9; y = (x + y) + z               (SEQ1, using SIM1)
−→  {x = 9}; y = (x + y) + z             (SEQ1, using SIM2)
−→  {x = 9}; y = (9 + y) + z             (SEQ2)
−→* {x = 9}; {y = 14}                    (SEQ2)
−→  {x = 9, y = 14}                      (SEQ3)

The reader is encouraged here (and generally too) to work out examples for all the other
constructs.

5.4 Type-Checking and Definitions

New problems arise in static semantics when we consider type-checking and definitions. For
example one cannot tell whether or not such an expression as x or tt or x + x is well-typed
without knowing what the type of x is and that depends on the context of its occurrence. We
will be able to solve these problems by introducing static environments α to give this type
information and giving rules to establish properties of the form

α `V e : τ

As usual we work by considering an example language.

• Basic Sets
Types: τ ∈ Types = {nat, bool}
Numbers: m, n ∈ N;
Truth-values: t ∈ T;
Variables: x, y, z ∈ Var;
Binary Operations: bop ∈ Bop = {+, −, ∗, =, or}.
• Derived Sets
Constants: con ∈ Con where con ::= m | t
Definitions: d ∈ Def where

d ::= nil | x : τ = e | d0 ; d1 | d0 and d1 | d0 in d1

Expressions: e ∈ Exp where

e ::= con | x | ∼e | e0 bop e1 | if e0 then e1 else e2 | let d in e

Static Semantics

The definitions of DV(d) and FV(d) are as before, as is that of FV(e), just adding that

FV(if e0 then e1 else e2 ) = FV(e0 ) ∪ FV(e1 ) ∪ FV(e2 )

We now need type environments over V . These form the set

TEnvV = V −→ Types

and the set TEnv = ΣV TEnvV is ranged over by α and β; we write α : V for α ∈ TEnvV.

Of course all the notation α[β] and α, β extends without change from ordinary environments
to type environments.

Now for every V and α : V, τ and e with FV(e) ⊆ V we give rules for the relation

α ⊢V e : τ

meaning that given α the expression e is well-formed and has type τ. This will involve us in
giving similar rules for constants and also, for every V and α : V, β and definition d with
FV(d) ⊆ V, for the relation

α ⊢V d : β

meaning that given α the definition d is well-formed and yields the type environment β.

Example 22 (1) y = bool ⊢ (let x : nat = 1 in (x = x) or y) : bool
(2) y = bool ⊢ (x : nat = if y then 0 else 1; y : nat = x + 1) : {x = nat, y = nat}

Rules:

• Constants:
Numbers: α ⊢V m : nat
Truth-values: α ⊢V t : bool
• Expressions:
Constants: α ⊢V con : τ ⇒ α ⊢V con : τ (this makes sense!)
Variables: α ⊢V x : α(x)
Negation: α ⊢V e : bool ⇒ α ⊢V ∼e : bool
Binary Operations: α ⊢V e0 : τ0, α ⊢V e1 : τ1 ⇒ α ⊢V e0 bop e1 : τ (if τ = τ0 τbop τ1)
Conditional: α ⊢V e0 : bool, α ⊢V e1 : τ, α ⊢V e2 : τ ⇒ α ⊢V if e0 then e1 else e2 : τ
Definition: α ⊢V d : β, α[β] ⊢V∪V0 e : τ ⇒ α ⊢V let d in e : τ (where β : V0)

Note that this allows the type of variables to be redefined.

Definition 23

Nil: α ⊢V nil : ∅
Simple: α ⊢V e : τ ⇒ α ⊢V (x : τ = e) : {x = τ}
Sequential: α ⊢V d0 : β0, α[β0] ⊢V∪V0 d1 : β1 ⇒ α ⊢V (d0; d1) : β0[β1] (where β0 : V0)
Simultaneous: α ⊢V d0 : β0, α ⊢V d1 : β1 ⇒ α ⊢V (d0 and d1) : β0, β1 (if DV(d0) ∩ DV(d1) = ∅)
Private: α ⊢V d0 : β0, α[β0] ⊢V∪V0 d1 : β1 ⇒ α ⊢V (d0 in d1) : β1 (where β0 : V0)
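A hedged Haskell sketch of these two judgement forms (ours; binary operations and the conditional are elided for brevity, and Nothing signals ill-formedness):

    import qualified Data.Map as M

    data Ty   = Nat | Bool deriving Eq
    type TEnv = M.Map String Ty
    data Exp  = Num Int | TV Bool | Var String | Let Def Exp     -- a fragment
    data Def  = DNil | Simple String Ty Exp
              | DSeq Def Def | DAnd Def Def | DPriv Def Def

    tyE :: TEnv -> Exp -> Maybe Ty
    tyE _ (Num _)   = Just Nat
    tyE _ (TV _)    = Just Bool
    tyE a (Var x)   = M.lookup x a
    tyE a (Let d e) = do b <- tyD a d
                         tyE (b `M.union` a) e                   -- α[β]

    tyD :: TEnv -> Def -> Maybe TEnv
    tyD _ DNil = Just M.empty
    tyD a (Simple x t e) = do t' <- tyE a e
                              if t == t' then Just (M.singleton x t) else Nothing
    tyD a (DSeq d0 d1)   = do b0 <- tyD a d0
                              b1 <- tyD (b0 `M.union` a) d1
                              Just (b1 `M.union` b0)             -- β0[β1]
    tyD a (DAnd d0 d1)   = do b0 <- tyD a d0
                              b1 <- tyD a d1
                              if M.null (M.intersection b0 b1)
                                then Just (b0 `M.union` b1) else Nothing
    tyD a (DPriv d0 d1)  = do b0 <- tyD a d0
                              tyD (b0 `M.union` a) d1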

It is hoped that these rules are self-explanatory. It is useful to define for any V and α : V and
e with FV(e) ⊆ V the property of being well-formed

WV(e, α) ≡ ∃τ. α ⊢V e : τ

and also for any V, α : V and d with FV(d) ⊆ V the property of being well-formed

WV(d, α) ≡ ∃β. α ⊢V d : β.

Dynamic Semantics

If x has type τ in environment α then in the corresponding ρ it should be the case that ρ(x)
also has type τ ; that is if τ = nat, then we should have ρ(x) ∈ N and otherwise ρ(x) ∈ T. To
this end for any V and α : V and ρ : V −→ N + T we define:

ρ : α ≡ ∀x ∈ V. (α(x) = nat ⊃ ρ(x) ∈ N) ∧ (α(x) = bool ⊃ ρ(x) ∈ T)

and put Envα = {ρ : V −→ N + T | ρ : α}. Note that if ρ0 : α0 and ρ1 : α1 then ρ0 [ρ1 ] : α0 [α1 ]
and so too that (if it makes sense) (ρ0 , ρ1 ) : (α0 , α1 ).

Configurations: We separate out the various syntactic categories according to the possible
type environments.

• Expressions: For every α : V we put Γα = {e | WV (e, α)} and Tα = N + T.


• Definitions: We add the production d ::= ρ as before (but with ρ ranging over the Envα )
and then for every α : V we put Γα = {d | WV (d, α)} and Tα = {ρ}.

Transition Relations:

• Expressions: For every α : V we have the relation, where ρ : α and e, e′ ∈ Γα:

ρ ⊢α e −→ e′

• Definitions: For every α : V we have the relation, where ρ : α and d, d′ ∈ Γα:

ρ ⊢α d −→ d′

Rules: The rules are much as usual but with the normal constraints that all mentioned ex-
pressions and definitions be configurations and environments be of the right type-environment.
Here are three examples which should make the others obvious.

• Expressions:
Definition 2: ρ[ρ0] ⊢α[α0] e −→ e′ ⇒ ρ ⊢α let ρ0 in e −→ let ρ0 in e′ (where ρ0 : α0)
• Definitions:
Simple 2: ρ ⊢α x = con −→ {x = con}
Sequential 2: ρ[ρ0] ⊢α[α0] d1 −→ d1′ ⇒ ρ ⊢α ρ0; d1 −→ ρ0; d1′ (where ρ0 : α0)

Example 24

{x = tt, y = 5} ⊢{x=bool,y=nat} let private (x : nat = 1 and y : nat = 2)
                                    within z : nat = x + y
                                in if x then y + z else y
−→³ let private {x = 1, y = 2} within z : nat = x + y
    in if x then y + z else y
−→⁴ let private {x = 1, y = 2} within {z = 3}
    in if x then y + z else y
−→  let {z = 3} in if x then y + z else y
−→² let {z = 3} in y + z
−→⁴ 8.

Declarations in Imperative Languages

The ideas so far developed transfer to imperative languages where we will speak of declarations
(of identifiers) rather than definitions (of variables). Previously we have used stores for imper-
ative languages and environments for applicative ones, although mathematically they are the
same - associations of values to identifiers/variables. It now seems appropriate, however, to use
both environments and stores; the former shows what does not vary and the latter what does
vary when commands are executed.

It is also very convenient to change the definitions of stores by introducing an (arbitrary) infinite
set, Loc, of locations (= references = cells) and taking for any L ⊆ Loc

StoresL = L −→ Values

and

Stores = ΣL StoresL (= Loc −→fin Values)

and putting

Env = Id −→fin (Values + Loc)

The idea is that if in some environment ρ we have an identifier x whose values should not
vary then ρ(x) = that value; otherwise ρ(x) is a location, l, and given a store σ : L (with l
in L) then σ(l) is the value held in the location l (its contents). In the first case we talk of
constant identifiers and in the second we talk of variable identifiers. The former are introduced
by constant declarations like

const x = 5

and the latter by variable declarations like

var x = 5

In all cases declarations will produce new (little) environments, just as before. The general form
of transitions will be:

ρ ⊢ ⟨d, σ⟩ −→ ⟨d′, σ′⟩

where ρ is the elaboration environment and σ, σ′ are the stores. So, for example, we will have

ρ ⊢ ⟨const x = 5, σ⟩ −→ ⟨{x = 5}, σ⟩

and

ρ ⊢ ⟨var x = 5, σ⟩ −→ ⟨{x = l}, σ[l = 5]⟩ (∗)

where l is a certain “new” location.

Locations can be thought of as “abstract addresses” where we do not really want to commit
ourselves to any machine architecture, but only to the needed logical properties. A better way
to think of a location is as an individual or object which has lifetime (= extent); it is created
in a transition such as (∗) and its lifetime continues either throughout the entire computation
(execution sequence) or until it is deleted (= disposed of) (the deletion being achieved either
through such mechanisms as block exit or through explicit storage management primitives
in the language). Throughout its lifetime it has a (varying) contents, generally an ordinary
mathematical value (or perhaps other locations). It is generally referred to by some identifier
and is then said to be the L-value (or left-hand value) of the identifier and its contents, in
some state, is the R-value (right-hand value) of the identifier, in that state. The lifetime of the
location is related to, but logically distinct from the scope of the identifier. Thus we have a
two-level picture

[Figure: the two-level picture: in the environment ρ an identifier x denotes a location l, and in the store σ the location l holds a value v.]

The L/R value terminology comes from considering assignment statements

x := y

where on the left we think of the variable as referring to a location and on the right as referring
to a value. Indeed we analyse the effect of assignment as changing the contents of the location
to the R-value of y:

ρ ⊢ ⟨x := y, σ⟩ −→ σ[ρx = σ(ρy)]

This is of course a more complicated analysis of assignment than in Chapter 2. The L/R ter-
minology is a little inappropriate in that some programming languages write their assignments
in the opposite order and also in that not all occurrences on the left of an assignment are
references to L-values.

The general idea of locations and separation of environments and stores comes from the Scott-
Strachey tradition (e.g., [Gor,Ten,Led]); it is also reminiscent of ideas of individuals in modal
logic [Hug]. In fact we do not need locations for most of the problems we encounter in the rest
of this chapter (see exercise 26) but they will provide a secure foundation for later concepts
such as

• Static binding of the same global variables in different procedure bodies (storage sharing).
• Call-by-reference (aliasing problems).
• Arrays (location expressions).
• Reference types (anonymous references).

On the other hand it would be interesting to see how far one can get without locations and to
what extent programming languages would suffer from their excision (see [Don], [Rey]). One can
argue that it is the concept of location that distinguishes imperative from applicative languages.

We now make all this precise by considering a suitable mini-language.

Syntax:

• Basic Sets:
Types: τ ∈ Types = {bool, nat}
Numbers: m, n ∈ N
Truth-values: t∈T

Binary Operations: bop ∈ Bop
• Derived Sets
Constants: con ∈ Con where con ::= m | t
Expressions: e ∈ Exp where

e ::= con | x | ∼e | e0 bop e1 | if e0 then e1 else e2

Declarations: d ∈ Dec where

d ::= nil | const x : τ = e | var x : τ = e | d0; d1 | d0 and d1 | d0 in d1

Commands: c ∈ Com where

c ::= nil | x := e | c0; c1 | if e then c0 else c1 | while e do c | d; c

Note: On occasion we write begin c end for (c). That is begin . . . end act as command
parentheses, and have no particular semantic significance. However, their use can make scopes
more apparent.

The whole of our discussion of defining, applied, and free and bound occurrences carries over
to commands and is illustrated by the command in figure 2.

var x : bool = tt;
var y : int = if x then 0 else z;
const z : bool = if ∼(x = 0) then tt else v;
begin
  y := if x then 0 else z
  x := tt or v
end

[Figure: arrows link each applied occurrence to the declaration, if any, that binds it.]

Bindings

Note that left-hand variable occurrences in assignments are applied, not binding.

Static Semantics

Identifiers: For expressions we need the set, FI(e), of identifiers occurring freely in e (defined
as usual). For declarations we need the sets FI(d) and DI(d) of identifiers with free and defining
occurrences in d; they are defined just as in the case of definitions and of course

FI(const x : τ = e) = FI(var x : τ = e) = FI(e)
DI(const x : τ = e) = DI(var x : τ = e) = {x}

For commands we only need FI(c), defined as usual plus FI(d; c) = FI(d) ∪ (FI(c)\DI(d)).

Type-Checking: We take

TEnv = Id −→fin (Types + Types × {loc})

and write α : I for any α in TEnv with domain I ⊆ Id. The idea is that α(x) = τ means that x
denotes a value of type τ, whereas α(x) = τ loc (=def ⟨τ, loc⟩) means that x denotes a location
which holds a value of type τ.

Assertions:

• Expressions: For each I and expression e with FI(e) ⊆ I and type-environment α : I we
define

α ⊢I e : τ

meaning that given α the expression e is well-formed and of type τ.

• Declarations: Here for each I and declaration d with FI(d) ⊆ I and type-environment α : I
we define

α ⊢I d : β

meaning that given α the declaration d is well-formed and yields the type-environment β.

• Commands: Here for each I and command c with FI(c) ⊆ I and type-environment α : I we
define:

α ⊢I c

meaning that given α the command c is well-formed.

Rules:

• Expressions: As usual except for identifiers where:
Identifiers: α ⊢I x : τ (if α(x) = τ or α(x) = τ loc)
• Declarations: Just like definitions before, except for simple ones:
Constants: α ⊢I e : τ ⇒ α ⊢I (const x : τ = e) : {x = τ}
Variables: α ⊢I e : τ ⇒ α ⊢I (var x : τ = e) : {x = τ loc}
• Commands: The rules are similar to those in Chapter 2. We give an illustrative sample.
Nil: α ⊢I nil
Assignment: α ⊢I e : τ ⇒ α ⊢I x := e (if α(x) = τ loc)
Sequencing: α ⊢I c0, α ⊢I c1 ⇒ α ⊢I c0; c1
Blocks: α ⊢I d : β, α[β] ⊢I∪I0 c ⇒ α ⊢I d; c (where β : I0)

Dynamic Semantics

Following the ideas on environments and stores we consider suitably typed locations and assume
we have for each τ infinite sets

Locτ

which are disjoint, and that (in order to create new locations) we have for each finite I ⊆ Locτ a
location Newτ(I) ∈ Locτ with Newτ(I) ∉ I (the new property).

Note: It is very easy to arrange these matters. Just put Locτ = N × {τ} and Newτ(I) =
⟨µm. ⟨m, τ⟩ ∉ I, τ⟩.

Now putting Loc = ∪τ Locτ we take

Stores = {σ : L ⊆ Loc −→fin Con | ∀l ∈ Locnat ∩ L. σ(l) ∈ N ∧ ∀l ∈ Locbool ∩ L. σ(l) ∈ T}

(as Con is the set of values). And we also take

Env = Id −→fin Con + Loc

For any ρ : I and α : I we define ρ : α by:

ρ : α ≡ ∀x ∈ I. (α(x) = bool ∧ ρ(x) ∈ T) ∨ (α(x) = nat ∧ ρ(x) ∈ N)
                ∨ ∃τ. (α(x) = τ loc ∧ ρ(x) ∈ Locτ)
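The Note's construction of fresh locations is easily rendered in Haskell (a hedged sketch of ours, using a list of the locations in use for the set I):

    data Ty = Nat | Bool deriving Eq
    type Loc = (Int, Ty)              -- Locτ = N × {τ}

    -- Newτ(I): the least m with ⟨m, τ⟩ ∉ I, paired with τ.
    newLoc :: Ty -> [Loc] -> Loc
    newLoc t used = head [ (m, t) | m <- [0 ..], (m, t) `notElem` used ]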

Transition Relations:

• Expressions: For any α : I we set

Γα = {⟨e, σ⟩ | ∃τ. α ⊢I e : τ}
Tα = {⟨con, σ⟩}

and for any α : I we will define transition relations of the form

ρ ⊢α ⟨e, σ⟩ −→ ⟨e′, σ′⟩

where ρ : α and ⟨e, σ⟩ and ⟨e′, σ′⟩ are in Γα.

• Declarations: We extend Dec by adding the production

d ::= ρ

and putting FI(ρ) = ∅ and DI(ρ) = I (where ρ : I), and putting α ⊢I ρ : β (where ρ : β).
Now for any α : I we take

Γα = {⟨d, σ⟩ | ∃β. α ⊢I d : β} ∪ {ρ} and Tα = {ρ}

and the transition relation has the form

ρ ⊢α ⟨d, σ⟩ −→ ⟨d′, σ′⟩ (or ρ′)

where ρ : α and ⟨d, σ⟩ and ⟨d′, σ′⟩ (or ρ′) are in Γα.

• Commands: For any α : I we take

Γα = {⟨c, σ⟩ | α ⊢I c} ∪ {σ} and Tα = {σ}

and the transition relation has the form

ρ ⊢α ⟨c, σ⟩ −→ ⟨c′, σ′⟩ (or σ′)

where ρ : α and ⟨c, σ⟩ and ⟨c′, σ′⟩ (or σ′) are in Γα.

Rules:

• Expressions: These should be fairly obvious and we just give some examples.
Identifiers: (1) ρ ⊢α ⟨x, σ⟩ −→ ⟨con, σ⟩ (if ρ(x) = con)
(2) ρ ⊢α ⟨x, σ⟩ −→ ⟨con, σ⟩ (if ρ(x) = l and σ(l) = con)
Conditional: (1) ρ ⊢α ⟨e0, σ⟩ −→ ⟨e0′, σ⟩ ⇒
                 ρ ⊢α ⟨if e0 then e1 else e2, σ⟩ −→ ⟨if e0′ then e1 else e2, σ⟩
(2) ρ ⊢α ⟨if tt then e1 else e2, σ⟩ −→ ⟨e1, σ⟩
(3) ρ ⊢α ⟨if ff then e1 else e2, σ⟩ −→ ⟨e2, σ⟩
• Declarations:
Nil: ρ ⊢α ⟨nil, σ⟩ −→ ⟨∅, σ⟩
Constants: (1) ρ ⊢α ⟨e, σ⟩ −→ ⟨e′, σ′⟩ ⇒ ρ ⊢α ⟨const x : τ = e, σ⟩ −→ ⟨const x : τ = e′, σ′⟩
(2) ρ ⊢α ⟨const x : τ = con, σ⟩ −→ ⟨{x = con}, σ⟩
Variables: Informally, to elaborate var x : τ = e from state σ given ρ:
(1) Evaluate e from state σ given ρ, yielding con.
(2) Get a new location l, change σ to σ[l = con] and yield {x = l}.
Formally:
(1) ρ ⊢α ⟨e, σ⟩ −→ ⟨e′, σ′⟩ ⇒ ρ ⊢α ⟨var x : τ = e, σ⟩ −→ ⟨var x : τ = e′, σ′⟩
(2) ρ ⊢α ⟨var x : τ = con, σ⟩ −→ ⟨{x = l}, σ[l = con]⟩
    (where σ : L and l = Newτ(L ∩ Locτ))
Sequential: (1) ρ ⊢α ⟨d0, σ⟩ −→ ⟨d0′, σ′⟩ ⇒ ρ ⊢α ⟨d0; d1, σ⟩ −→ ⟨d0′; d1, σ′⟩
(2) ρ[ρ0] ⊢α[α0] ⟨d1, σ⟩ −→ ⟨d1′, σ′⟩ ⇒ ρ ⊢α ⟨ρ0; d1, σ⟩ −→ ⟨ρ0; d1′, σ′⟩ (where ρ0 : α0)
(3) ρ ⊢α ⟨ρ0; ρ1, σ⟩ −→ ⟨ρ0[ρ1], σ⟩
Private: 1./2. Like Sequential.
3. ρ ⊢α ⟨ρ0 in ρ1, σ⟩ −→ ⟨ρ1, σ⟩
Simultaneous: (1) Like Sequential.
(2) ρ ⊢α ⟨d1, σ⟩ −→ ⟨d1′, σ′⟩ ⇒ ρ ⊢α ⟨ρ0 and d1, σ⟩ −→ ⟨ρ0 and d1′, σ′⟩
(3) ρ ⊢α ⟨ρ0 and ρ1, σ⟩ −→ ⟨(ρ0, ρ1), σ⟩
Note: These definitions follow those for definitions very closely.
• Commands: On the whole the rules for commands are much like those we have already seen
in Chapter 2.
Nil: ρ ⊢α ⟨nil, σ⟩ −→ σ
Assignment: ρ ⊢α ⟨e, σ⟩ −→* ⟨con, σ′⟩ ⇒ ρ ⊢α ⟨x := e, σ⟩ −→ σ′[l = con]
(where ρ(x) = l, and l ∈ L where σ : L)
Composition: 1./2. Like Chapter 2, but with ρ.
Conditional, While: Like Chapter 2, but with ρ.
Blocks: Informally, to execute d; c from σ given ρ:
(1) Elaborate d from σ given ρ, yielding ρ0 and a store σ′.
(2) Execute c from σ′ given ρ[ρ0], yielding σ″. Then σ″ is the result of the execution.
Formally:
(1) ρ ⊢α ⟨d, σ⟩ −→ ⟨d′, σ′⟩ ⇒ ρ ⊢α ⟨d; c, σ⟩ −→ ⟨d′; c, σ′⟩
(2) ρ[ρ0] ⊢α[α0] ⟨c, σ⟩ −→ ⟨c′, σ′⟩ ⇒ ρ ⊢α ⟨ρ0; c, σ⟩ −→ ⟨ρ0; c′, σ′⟩ (where ρ0 : α0)
(3) ρ[ρ0] ⊢α[α0] ⟨c, σ⟩ −→ σ′ ⇒ ρ ⊢α ⟨ρ0; c, σ⟩ −→ σ′
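To gather the pieces, here is a hedged big-step Haskell counterpart of these rules (our own sketch, not the notes' small-step semantics; all names are invented and error handling is elided). It shows the division of labour between environment and store, including the allocation of a new location by var:

    import qualified Data.Map as M

    type Ide = String
    type Loc = Int
    data Con = N Int | B Bool
    data DVal = V Con | L Loc              -- denotable values: Con + Loc
    type Env = M.Map Ide DVal
    type Store = M.Map Loc Con

    data Exp = C Con | Var Ide | If Exp Exp Exp            -- a fragment
    data Dec = DNil | Const Ide Exp | VarD Ide Exp | DSeq Dec Dec
    data Com = Nil | Assign Ide Exp | Seq Com Com | Block Dec Com
             | While Exp Com

    eval :: Env -> Store -> Exp -> Con
    eval _ _ (C con) = con
    eval rho sig (Var x) = case rho M.! x of
      V con -> con                         -- constant identifier: rule (1)
      L l   -> sig M.! l                   -- variable identifier: rule (2)
    eval rho sig (If e0 e1 e2) = case eval rho sig e0 of
      B True -> eval rho sig e1            -- assumes a well-typed program
      _      -> eval rho sig e2

    newLoc :: Store -> Loc
    newLoc sig = if M.null sig then 0 else fst (M.findMax sig) + 1

    elab :: Env -> Store -> Dec -> (Env, Store)
    elab _ sig DNil = (M.empty, sig)
    elab rho sig (Const x e) = (M.singleton x (V (eval rho sig e)), sig)
    elab rho sig (VarD x e) =
      let l = newLoc sig                   -- {x = l}, σ[l = con]
      in (M.singleton x (L l), M.insert l (eval rho sig e) sig)
    elab rho sig (DSeq d0 d1) =
      let (r0, sig')  = elab rho sig d0
          (r1, sig'') = elab (r0 `M.union` rho) sig' d1
      in (r1 `M.union` r0, sig'')          -- ρ0[ρ1]

    exec :: Env -> Store -> Com -> Store
    exec _ sig Nil = sig
    exec rho sig (Assign x e) = case rho M.! x of
      L l -> M.insert l (eval rho sig e) sig
      _   -> error "assignment to a constant identifier"
    exec rho sig (Seq c0 c1) = exec rho (exec rho sig c0) c1
    exec rho sig (Block d c) =
      let (r0, sig') = elab rho sig d
      in exec (r0 `M.union` rho) sig' c
    exec rho sig (While e c) = case eval rho sig e of
      B True -> exec rho (exec rho sig c) (While e c)
      _      -> sig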

In the above we have not connected up ρ and σ. In principle it could happen either that

(1) There is an l in the range of ρ but not in the domain of σ. This is an example of a dangling
reference. They are also possible in relation to a configuration such as ⟨c, σ⟩ where l occurs
in c (via some ρ) but not in the domain of σ.
(2) There is an l not in the range of ρ but in the domain of σ. And similarly wrt c and σ, etc.
This is an example of an inaccessible reference.

However, we easily show that if, for example, we have no dangling references in ρ and σ, or in c
and σ, and if ρ ⊢ ⟨c, σ⟩ −→* ⟨c′, σ′⟩, then there are none either in ρ and σ′ or in c′ and σ′.
One says that the language has no storage insecurities. An easy way to obtain a language
which is not secure is to add the command

c ::= dispose(x)

with the dynamic semantics

ρ ⊢α ⟨dispose(x), σ⟩ −→ σ\l (where l = ρ(x))

(where σ\l = σ\{⟨l, σ(l)⟩}) (and with the obvious static semantics). One might wish to add an
error rule for attempted assignments to dangling references.

On the other hand, according to our semantics we do have inaccessible references. For example,
at a block exit:

ρ ⊢ ⟨var x : bool = tt; begin nil end, σ⟩ −→ ⟨{x = l}; nil, σ[l = tt]⟩
−→ σ[l = tt]

Another example is provided by sequential or private declarations, e.g.,

ρ ⊢ ⟨var x : bool = tt; var x : bool = tt, σ⟩ −→ ⟨{x = l1}; var x : bool = tt, σ[l1 = tt]⟩
−→ ⟨{x = l1}; {x = l2}, σ[l1 = tt, l2 = tt]⟩
−→ ⟨{x = l2}, σ[l1 = tt, l2 = tt]⟩

and again

ρ ⊢ ⟨var x : bool = tt in var y : bool = tt, σ⟩ −→* ⟨{x = l1} in {y = l2}, σ[l1 = tt, l2 = tt]⟩
−→ ⟨{y = l2}, σ[l1 = tt, l2 = tt]⟩

It is not clear whether inaccessible references should be allowed. They can easily be avoided,
at the cost of complicating the definitions, by "pruning" them away as they are created, a kind
of logical garbage collection. We prefer here to leave them in, for the sake of simple definitions;
they do not, unlike dangling references, cause any harm.

The semantics for expressions is a little more complicated than necessary in that if ρ ⊢ ⟨e, σ⟩ −→
⟨e′, σ′⟩ then σ = σ′; that is, there are no side-effects. However, the extra generality will prove
useful. For example, suppose we had a production:

e ::= begin c result e

To evaluate begin c result e from σ given ρ one first executes c from σ given ρ yielding σ′ and
then evaluates e from σ′ given ρ. The transition rules would, of course, be:

ρ ⊢α ⟨c, σ⟩ −→ ⟨c′, σ′⟩ ⇒ ρ ⊢α ⟨begin c result e, σ⟩ −→ ⟨begin c′ result e, σ′⟩
ρ ⊢α ⟨c, σ⟩ −→ σ′ ⇒ ρ ⊢α ⟨begin c result e, σ⟩ −→ ⟨e, σ′⟩

(and the static semantics is obvious).

With this construct one now also has the possibility of side-effects during the elaboration of
definitions; previously we had instead that if

ρ ⊢α ⟨d, σ⟩ → ⟨d′, σ′⟩

then σ′↾L = σ, where σ : L.

We note some other important constructs. The principle of qualification suggests we include
expression blocks:

e ::= let d in e

with evident static semantics and the rules:

          ρ ⊢α ⟨d, σ⟩ → ⟨d′, σ′⟩
──────────────────────────────────────────────
ρ ⊢α ⟨let d in e, σ⟩ → ⟨let d′ in e, σ′⟩

          ρ[ρ₀] ⊢α[α₀] ⟨e, σ⟩ → ⟨e′, σ′⟩
──────────────────────────────────────────────   (where ρ₀ : α₀)
ρ ⊢α ⟨let ρ₀ in e, σ⟩ → ⟨let ρ₀ in e′, σ′⟩

ρ ⊢α ⟨let ρ₀ in con, σ⟩ → ⟨con, σ⟩

As another kind of atomic declaration consider

d ::= x == y

meaning that x should refer to the location referred to by y (in ρ). The relevant static semantics
will, of course, be:

DI(x == y) = {x};   FI(x == y) = {y}

α ⊢I x == y : {x = τ loc}   (if α(y) = τ loc)

and the dynamic semantics is:

ρ ⊢α ⟨x == y, σ⟩ → ⟨{x = l}, σ⟩   (if ρ(y) = l)

This construct is an example where it is hard to do without locations; more complex versions
allowing the evaluation of expressions to references will be considered in the next chapter.

It can be important to allow initialisation commands in declarations, such as

d ::= d initial c end

and the static semantics is:

DI(d initial c end) = DI(d);   FI(d initial c end) = FI(d) ∪ (FI(c)\DI(d))

and

α ⊢I d : β    α[β] ⊢I∪I₀ c
────────────────────────────   (if β : I₀)
α ⊢I d initial c end

However, we may wish to add other conditions (like the drastic FI(c) ⊆ DI(d)) to avoid side-
effects. The dynamic semantics is:

          ρ ⊢α ⟨d, σ⟩ → ⟨d′, σ′⟩
──────────────────────────────────────────────────────
ρ ⊢α ⟨d initial c end, σ⟩ → ⟨d′ initial c end, σ′⟩

          ρ[ρ₀] ⊢α[α₀] ⟨c, σ⟩ → ⟨c′, σ′⟩
──────────────────────────────────────────────────────   (where ρ₀ : α₀)
ρ ⊢α ⟨ρ₀ initial c end, σ⟩ → ⟨ρ₀ initial c′ end, σ′⟩

          ρ[ρ₀] ⊢α[α₀] ⟨c, σ⟩ → σ′
───────────────────────────────────────   (where ρ₀ : α₀)
ρ ⊢α ⟨ρ₀ initial c end, σ⟩ → ⟨ρ₀, σ′⟩

In the exercises we consider a dual idea of declaration finalisation commands, which are executed
after the actions associated with the scope of the declaration rather than before them.

Finally, we stand back a little and look at the various classes of values associated with our
language.

• Expressible Values: These are the values of expressions. In our language this set, EVal, is
just the set, Con, of constants.
• Denotable Values: These are the values of identifiers in environments. Here the set, DVal,
is the set Con + Loc of constants and locations. Note that Env = Id →fin DVal.
• Storeable Values: These are the values of locations in the store. Here the set, SVal, is the
set Con of constants. Note that Stores is the set of type-respecting finite maps from Loc to
SVal.

Thus we can consider the sets EVal, DVal, SVal of expressible, denotable and storeable values;
languages can differ greatly in what they are and in their relationships to each other [Str]. Other
classes of values – e.g., writeable ones – may also be of interest.
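
These three classes can be made concrete in a typed metalanguage. A small illustrative sketch
in Haskell follows; the representations are assumptions of the sketch, not definitions from the
notes:

    import qualified Data.Map as M

    type Ide = String
    type Loc = Int
    data Con   = N Int | B Bool      -- the constants
    type EVal  = Con                 -- expressible values: EVal = Con
    data DVal  = DCon Con | DLoc Loc -- denotable values: DVal = Con + Loc
    type SVal  = Con                 -- storeable values: SVal = Con
    type Env   = M.Map Ide DVal      -- Env = Id -->fin DVal
    type Store = M.Map Loc SVal      -- type-respecting in the formal treatment

    -- In richer languages the three sets drift apart: e.g., procedures may be
    -- denotable but not storeable, and arrays storeable but not expressible.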

5.5 Exercises

1. It is possible to formalise the notion of occurrence. An occurrence is a sequence l =
   m₁ . . . mₙ (n ≥ 0) of non-zero natural numbers. For any expression, e, (say in the first
   language of Chapter 3) and occurrence, l, one has the expression e′ = Occ(e, l) occurring
   in e at l (it may not be defined). For example:

   Occ(e, ε) = e

   Occ(let x = e₀ in e₁, m⌢l) = Occ(x, l)     (m = 1)
                                Occ(e₀, l)    (m = 2)
                                Occ(e₁, l)    (m = 3)
                                undefined     (otherwise)

   Define Occ(e, l) in general. Define FO(x, e) = the set of free occurrences of x in e, and also
   the sets AO(x, e) and BO(x, e) of applied and binding occurrences of x in e. For any l in
   BO(x, e) define Scope(l) = the set of applied occurrences of x in the scope of l; for any
   bound occurrence, l, of x in e (i.e., l in [AO(x, e) ∪ BO(x, e)]\FO(x, e)), define binder(l) =
   the unique occurrence in whose scope l is.
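
   For the base cases asked for here, Occ might be programmed as in the following Haskell
   sketch (the datatype Exp and all names are illustrative assumptions, not the notes' official
   syntax):

       data Exp = Var String
                | Num Int
                | Plus Exp Exp
                | Let String Exp Exp   -- let x = e0 in e1

       type Occurrence = [Int]         -- sequences of non-zero naturals

       occ :: Exp -> Occurrence -> Maybe Exp
       occ e [] = Just e               -- Occ(e, eps) = e
       occ (Plus e0 e1) (m : l)
         | m == 1 = occ e0 l
         | m == 2 = occ e1 l
       occ (Let x e0 e1) (m : l)
         | m == 1 = occ (Var x) l      -- the binding occurrence of x
         | m == 2 = occ e0 l
         | m == 3 = occ e1 l
       occ _ _ = Nothing               -- undefined otherwise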

2. Repeat exercise 1 for the other languages in Chapter 3 (and later chapters!).

3. Ordinary mathematical language also has binding constructions. Notable are such exam-
   ples as integration and summation:

   ∫₀^y ∫₁^x f(y) dy dx    and    Σ_{n≥0} aₙxⁿ

   Define a mathematical expression language with these constructs and then define free vari-
   ables and occurrences etc., just as in exercise 1.

4. The language of predicate logic also contains binders. Given a syntax for arithmetic
   expressions (say) we can define formulae by:

   F ::= e = e | e > e | . . . | ¬F | F ∨ F | F ∧ F | F ⊃ F | ∀x. F | ∃x. F

   where ∧, ∨, ⊃ mean logical and, or and implies; to assert ∀x. F means that for all
   x we have F, and to assert ∃x. F means that we have F for some x. Repeat the work
   of exercise 3 for predicate logic. To what extent is it feasible to construct an operational
   semantics for the languages of exercises 3 and 4? How would it help to only consider finite
   sums, Σ_{a≤n≤b} e, and quantifications ∀x ≤ b. F, and piecewise approximation?

5. Can you specify the location of dynamic errors? Thus, starting from c, σ, suppose we reach
   c′, σ′ and the next action is (for example) a division by zero; then we want to specify that
   the error occurred at some occurrence in the original command c. [Hint: Add a labelling
   facility, c ::= L :: c, and transition rules for it, and start not from c but from a labelled
   version in which the occurrences are used for labels.]

6. Define the behaviour and equivalence of definitions and expressions of the second language
   of this chapter; prove that the program constructs respect equivalence. Establish or refute
   each of the following suggested equivalences:

   d₀ and (d₁ and d₂) ≡ (d₀ and d₁) and d₂
   d₀ and d₁ ≡ d₁ and d₀
   d₀ and nil ≡ d₀
   d₀ and nil ≡ nil

   and similar ones for private and sequential definition.

7. Show that the following right-distributive law holds:

   d₀ in (d₁ and d₂) ≡ (d₀ in d₁) and (d₀ in d₂)

   What about the left-distributive law? What about other such laws? Show that
   d₀ in (x = e) ≡ (x = let d₀ in e). Show that d₀ ; d₁ ≡ d₀ in (d₁ and d_V) where V =
   DV(d₀)\DV(d₁) and where, for any V = {x₁, . . . , xₙ}, we put d_V = (x₁ = x₁ and . . . and
   xₙ = xₙ). Conclude that any d can be put, to within equivalence, in the form x₁ = e₁ and
   . . . and xₙ = eₙ.

8. Show that let d₀ ; d₁ in e ≡ let d₀ in (let d₁ in e). Under what general conditions do we
   have d₀ ; d₁ ≡ d₁ ; d₀? When do we have d₀ ; d₁ ≡ d₀ in d₁? When do we have let d₀ ; d₁
   in e ≡ let d₀ in d₁ in d₀ ; e?

9. It has been said that, in blocks like let d₀ in e, all free variables of e should be bound by
   d₀, for reasons of programming readability. Introduce strict blocks let d₀ in e and d₀ in d₁
   where it is required that FV(e) (resp. FV(d₁)) ⊆ DV(d₀). Show that the non-strict blocks
   are easily defined in terms of the strict ones. [Hint: Use simultaneous definitions and the
   d_V of exercise 7.] Investigate equivalences for the strict constructions.

10. Two expressions (of the first language of the present chapter) e and e′ are α-equivalent –
    written e ≡α e′ – if they are identical "up to renaming of bound variables". For example

    let x = e in let y = e′ in x + y ≡α let y = e in let x = e′ in y + x

    if x, y ∉ FV(e′), but let x = e in x + y ≢α let y = e in y + y. Define α-equivalence.

    [Hint: For a definition by structural induction, to show let x = e₀ in e₁ ≡α let y =
    e₀′ in e₁′ it is necessary to show some relation between e₁ and e₁′. So define π : e ≡α e′,
    where π : FV(e) ≅ FV(e′) is a bijection; this relation means e is α-equivalent to e′ up
    to the renaming, π, of the free variables.] Show that e ≡α e′ implies e ≡ e′. Show that
    for any e there is an e′ with e ≡α e′ such that no bound variable of e′ is in some specified
    finite set and no variable of e′ has more than one binding occurrence.

11. Define, for the first language of the present chapter, the substitution of an expression e
    for a variable x in the expression e′ – written [e/x]e′; in the substitution process no free
    variable of e should be captured by a binding occurrence in e′, so that some systematic
    renaming of bound variables will be needed. For example we could not have

    [x/y] let x = e in x + y = let x = [x/y]e in x + x

    but could have

    [x/y] let x = e in x + y = let z = [x/y]e in z + x

    where z ≠ x. Show the following:

    let x = e in e′ ≡α let y = e in [y/x]e′   (if y ∉ FV(e′))
    [e/x][e′/y]e″ ≡α [[e/x]e′/y][e/x]e″   (if x ≠ y)
    [e/x][e′/x]e″ ≡α [[e/x]e′/x]e″
    [e/x]e′ ≡α e′   (if x ∉ FV(e′))
    FV([e/x]e′) = FV(e) ∪ (FV(e′)\{x})
    [e/x]e′ ≡ let x = e in e′
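
    A capture-avoiding substitution along these lines can be sketched in Haskell (again the
    datatype and helper names – fv, fresh, subst – are assumptions of the sketch):

        import Data.List (union, delete)

        data Exp = Var String | Num Int | Plus Exp Exp | Let String Exp Exp

        fv :: Exp -> [String]
        fv (Var x)       = [x]
        fv (Num _)       = []
        fv (Plus e0 e1)  = fv e0 `union` fv e1
        fv (Let x e0 e1) = fv e0 `union` delete x (fv e1)

        -- A variable not occurring in the given list, obtained by priming.
        fresh :: [String] -> String -> String
        fresh avoid x = head [x' | x' <- iterate (++ "'") x, x' `notElem` avoid]

        -- subst e x e' computes [e/x]e', renaming binders to avoid capture.
        subst :: Exp -> String -> Exp -> Exp
        subst e x (Var y)    = if y == x then e else Var y
        subst _ _ (Num n)    = Num n
        subst e x (Plus a b) = Plus (subst e x a) (subst e x b)
        subst e x (Let y a b)
          | y == x        = Let y (subst e x a) b        -- x is shadowed in b
          | y `elem` fv e = let z  = fresh (x : fv e `union` fv b) y
                                b' = subst (Var z) y b   -- systematic renaming
                            in  Let z (subst e x a) (subst e x b')
          | otherwise     = Let y (subst e x a) (subst e x b)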

12. By using substitution we could avoid the use of environments in the dynamic semantics
    of the first language of the present chapter. The transition relation would have the form
    e → e′ for closed e, e′ (no free variables), and the rules would be as usual for binary
    operations, none (needed) for identifiers, and let x = e₀ in e₁ → [e₀/x]e₁. Show this
    gives the same notion of behaviour for closed expressions as the usual semantics.

13. Extend the work of exercises 10, 11 and 12 to the second language of the present chapter.

14. It is possible to have iterative constructs in applicative languages. Tennent has suggested
    the construct

    e = for x = e₀ to e₁ op bop on e₂

    so that, for example, if e₀ = 0 and e₁ = n and bop = + and e₂ = x∗x then e = Σ_{0≤x≤n} x∗x.
    Give the operational semantics of this construct.

15. It is even possible to use definitions to obtain analogues of while loops. Consider the
    definition construct

    d = while e do d

    so that

    let private x = 1 and y = 1
        within while y ≠ n
              do x = x ∗ y and y = y + 1
    in x

    computes n! for n ≥ 1. Give this construct a semantics; show that the construct of exercise
    14 can be defined in terms of it. Is the new construct a "good idea"?

16. Consider the third language of the present chapter. Show that the type-environments
    generated by definitions are determined, by defining by Structural Induction a partial
    function DTE : Definitions → TEnv and then proving that, for any α, V, d, β:

    α ⊢V d : β ⇒ DTE(d) is defined and equal to β.

17. Give a semantics to a variant of the third language in which the types of variables are
not declared and type-checking is dynamic.

18. Change the fourth language of the present chapter so that the atomic declarations have
the more usual forms:

const x = e and var x : τ

Can you type-check the resulting language? To what extent can you impose in the static
semantics the requirement that variables should be initialised before use? Give an op-
erational semantics following one of the obvious alternatives regarding initialisation at
declaration:
(1) The variable is initialised to a conventional value (e.g., 0/ff), or an unlikely one (e.g.,
the maximum natural number available/?).

(2) The variable is not initialised at declaration. [Hint: Use undefined maps for stores
or (equivalently) introduce a special UNDEF value into the natural numbers (and
another for truth-values).] In this case show how to specify the error of access before
initialisation. Which alternative do you prefer?

19. In PL/I identifiers can be declared to be “EXTERNAL”; as such they take their value
from an external environment - and so the declaration is an applied occurrence - but
they have local scope - and so the declaration is also a binding occurrence. For example
consider the following fragment in an extension of our fourth mini-language (not PL/I!)
(where we allow d ::= external x : τ ):

external x : nat;
begin
x := 2;
var x : nat;
begin
x := 1;
external x : nat;
begin y := x end
end
end

This sets y equal to 2. Give a semantics to external declarations.

20. In PL/I variables can be declared without storage allocation being made until explicitly
requested. Thus a program fragment like

var x : nat
begin
x := 1; allocate(x)
end

would result in a dynamic error under that interpretation of variable declaration. Give a
semantics to this idea.

21. In the programming language EUCLID it is possible to declare identifiers as pervasive,
    meaning that no holes are allowed in their scope – they cannot be redeclared within their
    scope. Formulate an extension of the imperative language of this chapter which allows
    pervasive declarations and give it a static semantics. Are there any problems with its
    dynamic semantics?

22. Formalise Dijkstra’s ideas on scope as presented in Section 10 of his book, A Discipline of
Programming (Prentice-Hall, 1976). To do this define and give a semantics to a variant
of the fourth mini-language which incorporates his ideas in as elegant a way as you can
manage.

23. Suppose we have two flavours of variable declaration

local var x : τ and heap var x : τ

(cf PL/I, ALGOL 68). From an implementation point of view local variables are allocated
space on the stack and heap ones on the heap; from a semantical point of view the locations
are disposed of on block exit (i.e., they live until the end of the variable’s scope is reached)
or never (unless explicitly disposed of). Formalise the semantics for these ideas. Does
replacing local by heap make any difference to a program’s behaviour? If not, find some
language extensions for which it does.

24. Add to the considerations of exercise 23 the possibility

static var x : τ

    Here, the locations are allocated as part of the static semantics (cf. FORTRAN, COBOL,
    PL/I).

25. Consider the finalisation construct d = d₀ final c. Informally, to elaborate this from
    an environment ρ one elaborates d₀, obtaining ρ₀; then, after the actions (whether
    elaboration, execution or evaluation) involved in the scope of d, one executes c in the
    environment ρ′ = ρ[ρ₀] (equivalently, one executes ρ₀ ; c). Give an operational semantics for
    an extension of the imperative language of the present chapter by a finalisation construct.
    [Hint: The elaboration of declarations should result in an environment and a command
    (with no free identifiers).] Justify your treatment of the interaction of finalisation and the
    various compound definition forms.

26. How far can you go in treating the constructs of the imperative language of this chapter
    (or later ones) without using locations? One idea would be for declarations to produce
    couples ⟨ρ, σ⟩ of environments and stores (in the sense of Chapter 2), where ρ : I₁, σ : I₂
    and I₁ ∩ I₂ = ∅. What problems arise with the declaration x == y?

27. Formalise the notion of accessibility of a location and of a dangling location by defining,
    given an environment ρ and a configuration ⟨c, σ⟩ (or ⟨d, σ⟩ or ⟨e, σ⟩), when a location,
    l, is accessible. Define the notion of lifetime with respect to the imperative language of
    the present chapter. Would it be best to define it so that the lifetime of a location ends
    exactly when it is no longer accessible or dangling? Using your definition formulate and
    prove a theorem, for the imperative language, relating scope and lifetime.

28. Locations can be considered as “dynamic place holders” (in the execution sequence) just
as we considered identifiers as “static place holders” (in program text). Draw some arrow
diagrams for locations in execution sequences to show their creation occurrences analogous
to those drawn in this chapter to show binding occurrences.

29. Define α-equivalence for the imperative programming language of the present chapter
    (see exercise 10). One can consider c ≡α c′ as saying that c and c′ are equivalent up
    to choice of static place holders. Define a relation of location equivalence between couples
    of environments and configurations, written ρ, γ ≡ₗ ρ′, γ′ (where γ is an expression,
    command or declaration configuration); it should mean that the couples are equivalent
    up to choice of locations (dynamic place holders). For example,

    {x = l₁}, ⟨{y = l₂}; x := x + y, {l₁ = 3, l₂ = 4}⟩ ≡ₗ {x = l₂}, ⟨{y = l₁}; x := x + y, {l₂ = 3, l₁ = 4}⟩

    holds.

30. Define the behaviour of commands, expressions and declarations and define an equivalence
relation ≡l between behaviours which should reflect equality of behaviours up to choice
of dynamic place holders. Prove, for example, that

(var x : nat = 1; var y : nat = 1) ≡l (var y : nat = 1; var x : nat = 1)

even though the two sides do not have identical behaviours. Investigate the issues of
exercises 10, 11, and 12 using ≡l .

5.6 Remarks

The ideas of structuring definitions and declarations seem to go back to Landin [Lan] and Milne
and Strachey [Mil]. The idea of separating environments and stores, via locations, can also be
found in [Mil]. The concepts of scope, extent, environments, stores and their mathematical
formulations seem to be due to Burstall, Landin, McCarthy, Scott and Strachey. [I do not want
to risk exact credits, or exclude others . . . ] For another account of these matters see [Sto].

The ideas of Section 5.4 on static semantics, where the constraints are in general clearly
context-sensitive, were formulated in line with the general ideas on dynamic semantics. In fact,
they are simpler, as it is only needed to establish properties of phrases rather than having relations
between them. It is hoped that the method is easy to read and in line with one's intuition.
There are many other methods for the purpose; for a survey with references, see [Wil].
It is also possible to use the techniques of denotational semantics for this purpose [Gor2, Sto].
Our method seems particularly close to the production systems of Ledgard and the extended
attribute grammars used by Watt; one can view, in such formulae as α ⊢V d : β, the α and V
on the left of the turnstile as inherited attributes and β as a synthesized attribute of the definition
d; obviously, too, the type-environments α and β are nothing but symbol tables. It would be
interesting to compare the methods on a formal basis.

As pointed out in exercise 26 one can go quite far without using locations. Donahue also tries
to avoid them in [Don]. In a first version of our ideas we also avoided them, but ran into
unpleasantly complicated systems when considering shared global variables of function bodies.

As pointed out in exercise 12 one can try to avoid environments by using substitutions; it
is not clear how far one can go in this direction (which is the usual one in syntactic studies
of the λ-calculus). However, we have made a definite decision in these notes to stick to the
Scott-Strachey tradition of environments. Note that in such rules as

let x = e₀ in e₁ → [e₀/x]e₁

there is no offence against the idea of syntax-directed operational semantics. It is just that
substitution is a rather “heavy” primitive and one can argue that the use of environments is
closer to the intuitions normally used for understanding programming languages. (One awful
exception is the ALGOL 60 call-by-name mechanism.)

6 Bibliography

[Ack] Ackerman, W.B. (1982) Data Flow Languages, IEEE Computer 15(2):15–25.
[Don] Donahue, J.E. (1977) Locations Considered Unnecessary, Acta Informatica 8:221–242.
[Gor1] Gordon, M.J., Milner, A.J.R.G. and Wadsworth, C.P. (1979) Edinburgh LCF, LNCS
78, Springer.
[Gor2] Gordon, M.J. (1979) The Denotational Description of Programming Languages,
Springer.
[Hin] Hindley, J.R., Lercher, B. and Seldin, J.P. (1972) Introduction to Combinatory Logic,
Cambridge University Press.
[Hug] Hughes, G.E. and Cresswell, M.J. (1968) An Introduction to Modal Logic, Methuen.
[Lan1] Landin, P.J. (1964) The Mechanical Evaluation of Expressions, Computer Journal
6(4):308–320.
[Lan2] Landin, P.J. (1965) A Correspondence between ALGOL 60 and Church’s Lambda-
notation, Communications of the ACM 8(2):89–101 and 8(3):158–165.
[Led] Ledgard, H.F. and Marcotty, M. (1981) The Programming Language Landscape, Science
Research Associates.
[Mil] Milne, R.E. and Strachey, C. (1976) A Theory of Programming Language Semantics,
Chapman and Hall.
[Pra] Prawitz, D. (1971) Ideas and Results in Proof Theory, Proc. 2nd Scandinavian Logic
Symposium, ed. J.E. Fenstad, p. 237–309, North Holland.
[Rey] Reynolds, J.C. (1978) Syntactic Control of Interference, Proc. POPL’78, pp. 39–46.

[Str] Strachey, C. (1973) The Varieties of Programming Language, Technical Monograph
PRG-10, Programming Research Group, Oxford University.
[Sto] Stoy, J.E. (1977) Denotational Semantics: The Scott-Strachey Approach to Program-
ming Language Theory, MIT Press.
[Wil] Williams, M.H. (1981) Methods for Specifying Static Semantics, Computer Languages
6(1):1–17.

7 Functions, Procedures and Classes

In this chapter we consider various mechanisms allowing different degrees of abbreviation and
abstraction in programming languages. The idea of abbreviating the repeated use of some
expressions by using definitions or declarations of identifiers was considered in Chapter 3; if we
apply the same idea to commands we arrive at (parameterless) procedures (= subroutines). It
is very much more useful to abstract many similar computations together, different ones being
obtained by varying the values of parameters. In this way we obtain functions from expressions
and procedures from commands.

Tennent's Principle of Abstraction declares that the same thing can be done with any semanti-
cally meaningful category of phrases. Applying the idea to definitions or declarations we obtain
a version of the class concept, introduced by SIMULA and recently taken up in many modern
programming languages. (If we just use identifiers to stand for definitions or declarations we
obtain the simpler, but still most useful, idea of module.)

Calling (= invoking) abstractions with actual parameters (their arguments) in place of the
formal ones appearing in their definitions results in appropriate computations – whether
evaluations, executions or elaborations – of the bodies of their definitions. We will explain this
by allowing abstraction identifiers to denote closures which record their formal parameters and
bodies. Invocations will be explained in terms of computations of blocks, chosen in terms of
Tennent's Principle of Correspondence, which declares that in principle to every parameter
mechanism there corresponds an appropriate definition or declaration mechanism. For example,
if we define

f(x : nat) : nat = x + 1

then the elaboration results in the environment

{f = λx : nat. x + 1 : nat}

To invoke f in an expression, say f(5), we just evaluate the expression block

let x : nat = 5 in x + 1

Note that this block exists by virtue of Tennent's Principle of Qualification.

Below we use these ideas to consider an applicative programming language with (possibly re-
cursive) definitions of functions of several arguments. We then consider an imperative language
where we consider both functions and procedures and use the Principle of Correspondence
to obtain the parameter mechanisms of call-by-constant and call-by-value. Other parameter
mechanisms are easily handled using the same ideas (some explicitly in the text and others
in exercises); let us mention call-by-reference, call-by-result, call-by-value-result, call-by-name
and call-by-text. Next we consider higher order functions and procedures. Finally we use the
Principles of Abstraction and Correspondence to handle modules and classes; this needs no new
ideas although some of the type-checking issues are interesting.

7.1 Functions in Applicative Languages

We begin with the simplest case, where it is possible to define functions of one argument (unary
functions). Let us consider throughout extensions of the second applicative language of Chapter
3. Add the following kind of function definition:

d ::= f (x : τ0 ) : τ1 = e

and function calls

e ::= f (e)

where f is another letter we will use to range over variables (but reserving its use to contexts
where functions are expected).

Static Semantics

This is just as before as regards free and defining variables, with the extensions:

FV(f(x : τ₀) : τ₁ = e) = FV(e)\{x}
DV(f(x : τ₀) : τ₁ = e) = {f}
FV(f(e)) = {f} ∪ FV(e)

It is convenient to consider types a little more systematically than before. Just as we have
expressible and denotable values (EVal and DVal), we now introduce the sets ETypes and
DTypes of expressible and denotable types (ranged over by et and dt respectively), where

et ::= τ
dt ::= τ | τ₀ → τ₁

More complex expressible types will be needed later; denotable types of the form τ₀ → τ₁ will
be used for functions which take arguments of type τ₀ and deliver results of type τ₁. Later we
will also want sets of storeable types and other such sets. Now we take

TEnv = Var →fin DTypes

ranged over, as before, by α and β, and give rules for the predicates

α ⊢V e : et

where α : V and FV(e) ⊆ V, and

α ⊢V d : β

where α : V and FV(d) ⊆ V. These rules are just as before, with the evident extensions for
function calls and definitions:

Function Calls:       α ⊢V e : et₀
                ─────────────────────   (if α(f) = et₀ → et₁)
                   α ⊢V f(e) : et₁

Function Definitions:    α[{x = τ₀}] ⊢V∪{x} e : τ₁
                      ─────────────────────────────────────────
                      α ⊢V (f(x : τ₀) : τ₁ = e) : {f = τ₀ → τ₁}

Dynamic Semantics

We introduce the set, Closures, of closures:

Closures = {λx : et₀. e : et₁ | {x = et₀} ⊢{x} e : et₁}

and define the set of denotable values by

DVal = Con + Closures

and then we define, as usual,

Env = Var →fin DVal

and add the following production to the definition of the category of definitions:

d ::= ρ

(and put, for ρ : V, DV(ρ) = V and FV(ρ) = ∅).

It is important to note that what is meant here is that the sets Dec, Exp, Closures, DVal and
Env are being defined mutually recursively. For example, the following is an expression of type
nat:

let f = λx : nat. (let {y = 3, g = λy : bool. ∼y : bool} in if g(ff) then x else y) : nat
and w = 5
in f(2) + w

There is no more harm in such recursions than in those found in context-free grammars; a
detailed discussion is left to Appendix B.
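
Such mutually recursive definitions are directly expressible in a typed metalanguage, just as
for context-free grammars; compare the following Haskell sketch (all names are illustrative
assumptions, not the notes' definitions):

    import qualified Data.Map as M

    type Var = String
    data Type = TNat | TBool

    data Exp = Num Int | Ide Var | Call Var Exp | LetIn Def Exp
    data Def = Simple Var Type Exp
             | FunDef Var Var Type Type Exp   -- f(x : t0) : t1 = e
             | EnvDef Env                     -- the production d ::= rho
    data Closure = Clo Var Type Exp Type      -- lambda x : et0. e : et1
    data DVal = DCon Int | DClo Closure       -- DVal = Con + Closures
    type Env = M.Map Var DVal                 -- Env = Var -->fin DVal

    -- Exp, Def, Closure, DVal and Env are mutually recursive: environments
    -- contain closures, closures contain expressions, and definitions
    -- (via d ::= rho) contain environments.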

Note too that closures have, in an obvious sense, no free variables. This raises the puzzle of
what we intend to do about the free variables in function definitions. In fact, in elaborating such
definitions we will bind the free variables to their values in the elaboration environment. This
is known as static binding (= binding of free variables determined by their textual occurrence),
and will be followed throughout these notes. The alternative of delaying binding until the
function is called, and then using the calling environment, is known as dynamic binding, and is
considered in the exercises.
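
The difference between the two bindings is visible in a toy evaluator. The following Haskell
sketch uses value-level closures that carry an environment (rather than the syntactic let ρ↾V in e
closures of the text); all names are assumptions of the sketch:

    import qualified Data.Map as M

    type Var = String
    data Exp = Num Int | Ide Var | Lam Var Exp | App Exp Exp
    data Val = VNum Int | VClo Var Exp Env    -- closure records its environment
    type Env = M.Map Var Val

    eval :: Env -> Exp -> Maybe Val
    eval _   (Num n)   = Just (VNum n)
    eval rho (Ide x)   = M.lookup x rho
    eval rho (Lam x e) = Just (VClo x e rho)  -- static binding: snapshot rho now
    eval rho (App f a) = do VClo x e rho' <- eval rho f
                            v <- eval rho a
                            eval (M.insert x v rho') e  -- body in rho', not rho

    -- Dynamic binding would instead evaluate the body in the calling
    -- environment: eval (M.insert x v rho) e in the App case.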

To extend the static semantics we type denotable values, defining the predicate

dval : dt

for dval in DVal and dt in DTypes, and, for ρ : V in Env and α : V in TEnv, defining

ρ : α

by the rules:

Constants: m : nat    t : bool

Closures: (λx : et₀. e : et₁) : et₀ → et₁

               ∀x ∈ V. ρ(x) : α(x)
Environments: ─────────────────────   (where ρ : V, α : V)
                      ρ : α

and add the rule for environments considered as definitions:

                  ρ : α
Environments: ────────────
              β ⊢V ρ : α

With all this we now easily extend the old dynamic semantics, with the usual transition relations

ρ ⊢α e → e′
ρ ⊢α d → d′

by rules for function calls and definitions.

• Function Calls:

ρ ⊢α f(e₀) → let x : et₀ = e₀ in e   (if ρ(f) = λx : et₀. e : et₁)

This rule is just a formal version of the Principle of Correspondence for the language under
consideration.

• Function Definitions:

ρ ⊢α (f(x : τ₀) : τ₁ = e) → {f = λx : τ₀. (let ρ↾V in e) : τ₁}   (where V = FV(e)\{x})

Example 25 We write f(x : τ₀) : τ₁ = e for the less readable f = λx : τ₀. e : τ₁ (and omit
τ₀ and/or τ₁ when they are obvious). Consider the expression

e ≝ let double(x : nat) : nat = 2 ∗ x
    in double(double(2))

We have

∅ ⊢∅ e → let ρ in double(double(2))

where ρ ≝ {double(x) = 2 ∗ x}, and now note the computation

ρ ⊢ double(double(2)) → let x : nat = double(2) in 2 ∗ x
                      → let x : nat = (let x : nat = 2 in 2 ∗ x) in 2 ∗ x
                      →³ let x : nat = 4 in 2 ∗ x
                      →³ 8

and so

∅ ⊢ e →* 8

Our function calls are call-by-value in the sense that the argument is evaluated before the body
of the function. On the other hand it is evaluated just after the function call; a slight variant
effects the evaluation before:

• Function Call (Amended):

         ρ ⊢V e → e′
(1)  ─────────────────────
     ρ ⊢V f(e) → f(e′)

(2)  ρ ⊢V f(con) → let x : τ₀ = con in e   (if f(x : τ₀) = e is in ρ)

This variant has no effect on the result of our computations (prove this!), although it is not
hard to define imperative languages where there could be a difference (because of side-effects).
Another important possibility – call-by-name – is considered below and in the exercises.

We now consider how to extend the above to definitions of functions of several arguments, such
as

max(x : nat, y : nat) : nat = if x ≥ y then x else y

Intending to use the Principle of Correspondence to account for function calls, we expect such
transitions as

ρ ⊢ max(3, 5) → let x : nat, y : nat = 3, 5
                in if x ≥ y then x else y

and therefore simultaneous simple definitions. To this end we adopt a "minimalist" approach,
adding two syntactic classes to the applicative language of the last chapter.

Formals: This is the set Forms, ranged over by form, and given by

form ::= · | x : τ, form

Actual Expressions: This is the set AcExp, ranged over by ae, where

ae ::= · | e, ae

Then we extend the category of definitions, allowing more simple definitions and function defi-
nitions

d ::= form = ae | f(form) : τ = e

and adding function calls to the stock of expressions

e ::= f(ae)

To obtain a conventional notation, x : τ, · and e, · are written x : τ and e respectively, and f()
replaces f(·). In a "maximalist" solution we could include actual expressions as expressions and
allow corresponding "tuple" types as types of identifiers and function results; see exercise 2.

Static Semantics

Formals give rise to defining variable occurrences:

DV(·) = ∅    DV(x : τ, form) = {x} ∪ DV(form)

Then we have free variables in actual expressions:

FV(·) = ∅    FV(e, ae) = FV(e) ∪ FV(ae)

and for the new kinds of definitions:

FV(form = ae) = FV(ae)    DV(form = ae) = DV(form)
FV(f(form) : τ = e) = FV(e)\DV(form)    DV(f(form) : τ = e) = {f}

and for function calls, FV(f(ae)) = {f} ∪ FV(ae).

Turning to types, we now have ETypes, AcETypes (ranged over by aet) and DTypes, where

et ::= τ    aet ::= · | τ, aet    dt ::= et | aet → et

Then with TEnv = Var →fin DTypes as always we have the evident predicates

α ⊢V e : et    α ⊢V ae : aet    α ⊢V d : β

Formals give positional information and type environments. So we define T : Forms →
AcETypes by

T(·) = ·    T(x : τ, form) = τ, T(form)

and give rules for the predicate form : β:

(1) · : ∅
(2) form : β ⇒ (x : τ, form) : {x = τ}, β   (if x ∉ DV(form))

Note that it is here that the natural restriction that no variable occurs twice in a formal is made.

Here are the rules for the other predicates:

Function Calls:       α ⊢V ae : aet
                ─────────────────────   (if α(f) = aet → et)
                   α ⊢V f(ae) : et

Definitions:    form : β    α ⊢V ae : aet
             ──────────────────────────────   (where aet = T(form))
                 α ⊢V (form = ae) : β

                form : β    α[β] ⊢V∪V₀ e : τ
             ──────────────────────────────────────────   (where β : V₀ and aet = T(form))
             α ⊢V (f(form) : τ = e) : {f = aet → τ}

Actual Expr.: α ⊢V · : ·

                α ⊢V e : et    α ⊢V ae : aet
             ──────────────────────────────────
                α ⊢V (e, ae) : (et, aet)

Dynamic Semantics

We proceed much as before as regards closures, denotable values and environments:

Closures = {λform. e : et | ∃β, V. form : β and β : V and β ⊢V e : et}
DVal = Con + Closures
Env = Var →fin DVal
d ::= ρ

with the free and defining variables of ρ as usual, and we extend the static semantics by defining
the predicates dval : dt and ρ : α much as before.

As regards transition rules we will naturally define ρ ⊢V e → e′ and ρ ⊢V d → d′ and, for
actuals, ρ ⊢V ae → ae′. The terminal actual configurations are the "actual constants" – tuples
of constants – given by the rules

acon ::= · | con, acon

As for formals, they give rise to environments in the context of a value for the corresponding
actuals, and so we begin with rules for the predicate

acon ⊢ form : ρ

(1) · ⊢ · : ∅

           acon ⊢ form : ρ
(2) ─────────────────────────────────────────
    con, acon ⊢ (x : τ, form) : ρ ∪ {x = con}

While this is formally adequate, it does seem odd to use values rather than environments
as dynamic contexts.

The other rules should now be easy to understand.

Function Calls: ρ ⊢α f(ae) → let form = ae in e   (if ρ(f) = λform. e : et)

Definitions:
                     ρ ⊢α ae → ae′
  Simple: (1) ─────────────────────────────────
              ρ ⊢α (form = ae) → (form = ae′)

                  acon ⊢ form : ρ₀
          (2) ─────────────────────────
              ρ ⊢α (form = acon) → ρ₀

  Function: ρ ⊢α (f(form) : τ = e) → {f = λform. (let ρ↾V in e) : τ}
            (where V = FV(e)\DV(form))

Actual Expr.: ρ ⊢α e → e′ ⇒ ρ ⊢α (e, ae) → (e′, ae)
              ρ ⊢α ae → ae′ ⇒ ρ ⊢α (con, ae) → (con, ae′)

Example 26 We calculate the maximum of 2 + 3 and 2 ∗ 3. Let ρ₀ be the environment {max =
λx : nat, y : nat. (let ∅ in if x ≥ y then x else y) : nat}. Then we have

∅ ⊢ let (max(x : nat, y : nat) : nat = if x ≥ y then x else y) in max(2 + 3, 2 ∗ 3)
   → let ρ₀ in max(2 + 3, 2 ∗ 3)
   → let ρ₀ in (let x : nat, y : nat = 2 + 3, 2 ∗ 3 in let ∅ in (if x ≥ y then x else y))
   →* let ρ₀ in (let {x = 5, y = 6} in let ∅ in (if x ≥ y then x else y))
   →* let ρ₀ in (let {x = 5, y = 6} in let ∅ in 6)
   →³ 6

as one sees that

ρ₀ ⊢ (x : nat, y : nat = 2 + 3, 2 ∗ 3) →* {x = 5, y = 6}
Recursion

It will not have escaped the reader's attention that, no matter how interesting our applicative
language may be, it is useless, as there is no ability to prescribe interesting computations. For
example, we do not succeed in defining the factorial function by

d_fact ≝ fact(n : nat) : nat = if n = 0 then 1 else n ∗ fact(n − 1)

as the fact on the right will be taken from the environment of d_fact and not understood re-
cursively. (Of course the imperative languages are interesting owing to the possibility of loops;
note too exercises 3, 14 and 15.)

Clearly we need to introduce recursion. Syntactically we just postulate a unary operator on
definitions (and later on declarations):

d ::= rec d

Thus rec d_fact will define the factorial function. In terms of imports and exports, rec d imports
all imports of d other than its exports, which provide the rest of the imports to d; the exports of
rec d are those of d. In other words, define X to be FV(d)\DV(d), Y to be DV(d) and R to be
FV(d) ∩ DV(d). Then X is the set of imports of rec d and Y is the set of its exports, with R
being defined recursively. Diagrammatically we have:

[Diagram: A Recursive Declaration, rec d – a box d with external imports X and exports Y,
and with the recursively defined R fed back from the exports to the imports.]

The unary recursion operator gives a very flexible way to make recursive definitions, since the
d in rec d can take many forms other than simple function definitions like f(x : τ₁, . . .) : τ = e.
Simultaneous recursive definitions are written

rec f(. . .) = . . . f . . . g . . . and . . . and g(. . .) = . . . f . . . g . . .

A narrow-scope form of sequential recursive definitions is

rec f(. . .) = . . . f . . . g . . . ; . . . ;
rec g(. . .) = . . . f . . . g . . . ;

where the g in the definition of f is taken from the environment but the f in the definition of
g is the recursively defined one. A wide-scope form is obtained by writing

rec f(. . .) = . . . f . . . g . . . ; . . . ;
    g(. . .) = . . . f . . . g . . .

which is equivalent to the simultaneous definition unless, for example, f = g.

Static Semantics

For free and defining variables we note that

FV(rec d) = FV(d)\DV(d)
DV(rec d) = DV(d)

We keep TEnv, DTypes, ETypes and AcETypes as before. The natural rule for recursive
declarations is

α[β↾R] ⊢V∪R d : β
───────────────────   (where R = FV(d) ∩ DV(d))
  α ⊢V rec d : β

However, this is not easy to use in a top-down fashion, as given rec d and α one would have
to guess β. But, as covered by exercise 11, it would work. It is more convenient to use the fact
that in α ⊢V d : β the elaborated β does not depend on α but is uniquely determined by d, the
α only being used to check the validity of β. We make this explicit by defining two predicates
for definitions. First, for any V and d with FV(d) ⊆ V, and any β, we define

⊢V d : β

and secondly, for any α : V and d with FV(d) ⊆ V, we define

α ⊢V d

The first predicate can be read as saying that if d is a valid definition then it will have type β;
the second says that, given α, d is valid. The other predicates will be as before:

α ⊢V e : et    α ⊢V ae : aet    form : β

Rules:

• Definitions:

  Nil: (1) ⊢V nil : ∅
       (2) α ⊢V nil

  Simple: (1)       form : β
              ─────────────────────
              ⊢V (form = ae) : β

          (2) form : β    α ⊢V ae : T(form)
              ──────────────────────────────
                   α ⊢V form = ae

  Function: (1) ⊢V (f(form) : τ = e) : {f = T(form) → τ}

            (2) form : β    α[β] ⊢V∪V₀ e : τ
                ─────────────────────────────   (where β : V₀)
                  α ⊢V f(form) : τ = e

  Sequential: (1) ⊢V d₀ : β₀    ⊢V d₁ : β₁
                  ─────────────────────────
                   ⊢V d₀ ; d₁ : β₀[β₁]

              (2) α ⊢V d₀    ⊢V d₀ : β    α[β] ⊢V∪V₀ d₁
                  ───────────────────────────────────────   (where β : V₀)
                               α ⊢V d₀ ; d₁

  Simultaneous: (1) ⊢V d₀ : β₀    ⊢V d₁ : β₁
                    ─────────────────────────
                     ⊢V d₀ and d₁ : β₀, β₁

                (2) α ⊢V d₀    α ⊢V d₁
                    ───────────────────   (if DV(d₀) ∩ DV(d₁) = ∅)
                      α ⊢V d₀ and d₁

  Private: (1)      ⊢V d₁ : β₁
               ─────────────────────
               ⊢V d₀ in d₁ : β₁

           (2) α ⊢V d₀    ⊢V d₀ : β₀    α[β₀] ⊢V∪V₀ d₁
               ─────────────────────────────────────────   (where β₀ : V₀)
                            α ⊢V d₀ in d₁

  Recursion: (1)    ⊢V d : β
                 ──────────────
                 ⊢V rec d : β

             (2) ⊢V d : β    α[β↾R] ⊢V∪R d
                 ───────────────────────────   (where R = FV(d) ∩ DV(d))
                        α ⊢V rec d

The other rules are as before, except for expression blocks:

⊢V d : β    α ⊢V d    α[β] ⊢V∪V₀ e : et
──────────────────────────────────────────   (where β : V₀)
         α ⊢V let d in e : et
Example 27 Consider the definition

d = rec (f(x : nat) : nat = g(x) and g(x : nat) : nat = f(x))

Here, as ⊢∅ (f(x : nat) : nat = g(x)) : {f = nat → nat}, etc., we have

⊢∅ d : {f = nat → nat, g = nat → nat}.

Then to see that ∅ ⊢∅ d one just shows that {f = nat → nat, g = nat → nat} ⊢{f,g} d₀ (where
rec d₀ = d). This example also shows why it is necessary to mention explicitly the result
(= output) types of functions.

Dynamic Semantics

Before discussing our specific proposal we should admit that this seems, owing to a certain
clumsiness and its somewhat unnatural approach, to be a possible weak point in our treatment
of operational semantics.

At first sight one wants to get something of the following effect with recursive definitions:

ρ[ρ₀↾V₀] ⊢α∪α₀ d →* ρ₀
─────────────────────────   (where ρ₀ : DV(d) and for suitable α₀ : V₀)
   ρ ⊢α rec d →* ρ₀

Taken literally this is not possible. For example, put d = (f(x : nat) : nat = f(x)) and suppose
ρ₀(f) = dval. Then for V = ∅ and ρ = ∅ we would have

ρ₀ ⊢{f} (f(x : nat) : nat = f(x)) → {f = λx : nat. (let ρ₀ in f(x)) : nat}

and so we would have dval = λx : nat. (let ρ₀ in f(x)) : nat, which is clearly impossible as dval
cannot occur in itself (via ρ₀). Of course it is just in finding solutions to suitable analogues of
this equation that the Scott–Strachey approach finds one of its major achievements.

Let us try to overcome the problem by not trying to guess ρ₀, but instead elaborating d without
any knowledge of the values of the recursively defined identifiers. Thus in our example we first
elaborate the body:

∅ ⊢∅ (f(x : nat) : nat = f(x)) → {f = λx : nat. (let ∅ in f(x)) : nat}

and let ρ₀ be the resulting "environment". Note that we no longer have closures, as there can
be free variables in the abstractions. So we know that for any imported value of f, ρ₀ gives
the corresponding export. But in rec d the imports and the exports must be the same, that is,
f = ρ₀(f) in some recursive sense, and we can take f ≝ rec ρ₀. To get a closure we now take
the all-important step of binding f to rec ρ₀ in ρ₀, and take the elaboration of rec d to be

ρ₁ = {f(x : nat) : nat = let rec ρ₀ in (let ∅ in f(x)) : nat}

What we have done is unwound the recursive definition by one step and bound into the body
instructions for further unwinding. Indeed it will be the case that

⊢ rec ρ₀ → ρ₁

and so when we call f(e) we will arrive at the expression

let x : nat = e in let rec ρ₀ in (let ∅ in f(x)) : nat

Then we will evaluate the argument e; then we will unwind the definition once more (in prepa-
ration for the next call!); then we will evaluate the body. This is perhaps not too bad; in the
usual operational semantics of recursive definitions (see exercise 7) one first evaluates the ar-
gument, then unwinds the definition for the present call, and then evaluates the body. Thus we
have simply performed in advance one step of the needed unwindings, during the elaboration.
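
The effect of the rec rules can be rehearsed in a small evaluator. In the Haskell sketch below,
lazy "knot-tying" plays the role of rec ρ₀: the closure's environment already binds f, which
amounts to the one-step unwinding described above (in a strict metalanguage one would
literally re-wrap the body as let rec ρ₀ in . . . instead). All names are assumptions of the
sketch:

    import qualified Data.Map as M

    type Var = String
    data Exp = Num Int | Ide Var | Mul Exp Exp | Sub Exp Exp
             | If0 Exp Exp Exp | Call Var Exp
             | LetRec Var Var Exp Exp        -- let rec f(x) = e0 in e1
    data Val = VNum Int | VClo Var Exp Env
    type Env = M.Map Var Val

    eval :: Env -> Exp -> Maybe Val
    eval _   (Num n)     = Just (VNum n)
    eval rho (Ide x)     = M.lookup x rho
    eval rho (Mul a b)   = do VNum m <- eval rho a
                              VNum n <- eval rho b
                              Just (VNum (m * n))
    eval rho (Sub a b)   = do VNum m <- eval rho a
                              VNum n <- eval rho b
                              Just (VNum (m - n))
    eval rho (If0 c t e) = do VNum n <- eval rho c
                              eval rho (if n == 0 then t else e)
    eval rho (Call f a)  = do VClo x body rho' <- M.lookup f rho
                              v <- eval rho a
                              eval (M.insert x v rho') body
    eval rho (LetRec f x e0 e1) =
      let clo = VClo x e0 (M.insert f clo rho)   -- cyclic environment
      in  eval (M.insert f clo rho) e1

    -- let rec fact(x) = if x = 0 then 1 else x * fact(x - 1) in fact(5)
    test :: Maybe Val   -- evaluates to Just (VNum 120)
    test = eval M.empty
             (LetRec "fact" "x"
                (If0 (Ide "x") (Num 1)
                     (Mul (Ide "x") (Call "fact" (Sub (Ide "x") (Num 1)))))
                (Call "fact" (Num 5)))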

Let us now turn our attention to the formal details; the changes from before mostly concern
allowing free variables in closures. We define

Abstracts = {λform. e : et}

and put

DVal = Con + Abstracts

and

Env = Var →fin DVal

and add

d ::= ρ

To extend the static semantics we define FV(dval) by

FV(con) = ∅    FV(λform. e : et) = FV(e)\DV(form)

and then, for ρ : V,

DV(ρ) = V    and    FV(ρ) = ∪_{x∈V} FV(ρ(x))

Now we define the predicates ⊢V dval : dt and α ⊢V dval by:

Constants: (1) ⊢V m : nat
           (2) α ⊢V m
           (3) ⊢V t : bool
           (4) α ⊢V t

Abstracts: (1) ⊢V (λform. e : et) : T(form) → et

               form : β    α[β] ⊢V∪V₀ e : et
           (2) ──────────────────────────────   (where β : V₀)
                    α ⊢V λform. e : et

Then the rules for environments, ρ : V:

    ∀x ∈ V. ⊢W ρ(x) : β(x)
(1) ────────────────────────
           ⊢W ρ : β

    ∀x ∈ V. α ⊢W ρ(x)
(2) ───────────────────
          α ⊢W ρ

Turning to the transition relations, we define, for α : V and β : W with W ⊆ V and ρ : α↾W,
and for e, e′ in Γα (as before),

ρ ⊢α e → e′

keeping the same set Tα of terminal expressions. Similarly we define ρ ⊢α ae → ae′ and
ρ ⊢α d → d′.

The rules are formally the same as before, except that for ρ : W conditions of the form ρ(f) = . . .
are understood to mean that f ∈ W and ρ(f) = . . ., and similarly for ρ(x) = . . . (this affects
looking up the values of variables and function calls).

We need rules for recursion:

         ρ↾X ⊢α[α₀] d → d′
(1)  ─────────────────────────
      ρ ⊢α rec d → rec d′
     (where X = FV(d)\DV(d) and, taking β from the requirement that ⊢ d : β, we have
     α₀ = β↾R where R = FV(d) ∩ DV(d))

(2)  ρ ⊢α rec ρ₀ → {x = con | x = con in ρ₀} ∪
                   {f(form) : τ = let rec ρ₀\DV(form) in e | f(form) : τ = e in ρ₀}

In other words, we first elaborate d without knowing anything about the values of the recursively
defined variables, and then from the resulting ρ₀ we yield ρ₀ altered so as to bind its free variables
by rec ρ₀. Here are a couple of examples; more can be found in the exercises.

Example 28 Consider the traditional definition of factorial:

d = rec fact(x : nat) : nat = if x = 0 then 1 else x ∗ fact(x − 1)

Then for any suitable ρ and α we have

ρ ⊢α[α₀] (fact(x : nat) : nat = . . .) → ρ₀   (with α₀ as given above)

where ρ₀ = {fact(x : nat) : nat = let ∅ in . . .} (and from now on we omit the tedious
"let ∅ in"). Then we have

ρ ⊢α d → rec ρ₀ → ρ₁

where ρ₁ = {fact(x) = let rec ρ₀ in . . .}.

To compute fact(0) we look at the derivation:

∅ ⊢∅ let d in fact(0) →* let ρ₁ in fact(0)
                       → let x : nat = 0 in let rec ρ₀ in . . .
                       →* let {x = 0} in let ρ₁ in if x = 0 then 1 else . . .
                       →³ 1

Equally for fact(1) we have

∅ ⊢∅ let d in fact(1) →* let {x = 1} in let ρ₁ in if x = 0 then 1 else x ∗ fact(x − 1)
                       → let {x = 1} in let ρ₁ in x ∗ fact(x − 1)
                       →* let {x = 1} in let ρ₁ in 1 ∗ [let x : nat = x − 1 in let rec ρ₀ in . . .]
                       →* let {x = 1} in let ρ₁ in 1 ∗ [let {x = 0} in let ρ₁ in . . .]
                       →* 1

Example 29 It is allowed to define natural numbers or truth-values recursively. For example,
consider d = (rec x = x + 1). To elaborate d given ρ = {x = 1} we must elaborate x = x + 1
from ρ\{x} = ∅, and that elaboration sticks, as we must evaluate x + 1 in the empty environment.
It could be helpful to specify a dynamic error in this case. Again, the elaboration of

d = rec (x = fact(0) and fact(x : nat) : nat = . . .)

does not succeed as, intuitively, we need to know the value of fact before the elaboration –
which produces this value – has finished. On the other hand simple things like the elaboration
of rec x = 5 do succeed. If desired we could have specified in the static semantics that only
recursive function definitions were allowed.

7.2 Procedures and Functions

We now consider abstractions in imperative languages. Abstracts of expressions give rise to
functions, as before, but now with the possibility of side-effects, as in:

function f(var x : nat) : nat =
    begin
      y := y + 1
    result x + y

In several programming languages the bodies of functions are commands, but are treated, via
special syntactic devices, as expressions – see exercise 12. We take a straightforward view where
the bodies are (clearly) expressions. Abstracts of commands give rise to procedures, as in:

procedure p(var x : nat)
    begin
      y := x + y
    end

which may also have side-effects, and indeed are often executed for their side-effects. To see why
we write var in the formal parameter, let us see how the Principle of Correspondence allows us
to treat a procedure call. First, the above declaration, d, will be elaborated thus:

ρ ⊢α ⟨d, σ⟩ → ⟨{p(var x : nat) = {y = l}; y := x + y}, σ⟩

where l = ρ(y). Then the procedure call p(e) in the resulting environment ρ′ will look like this:

ρ′ ⊢α ⟨p(e), σ⟩ → ⟨var x : nat = e; begin {y = l}; y := x + y end, σ⟩

And we see that the reason for writing var . . . is to get an easy correspondence with our previous
declaration mechanism. The computation now proceeds by evaluating e, finding a new location
l′, making l′ refer to the value of e in the state, and then executing the body of the procedure
with x bound to l′. This is very clearly nothing else but the classical call-by-value. Constant
declarations will give rise to a call-by-constant parameter mechanism.
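
The allocation step underlying call-by-value can be sketched in Haskell (a toy model: the
representations and names – new, bindVar, bindConst – are assumptions, not the notes'
definitions):

    import qualified Data.Map as M

    type Ide = String
    type Loc = Int
    data DVal = DLoc Loc | DCon Int
    type Env   = M.Map Ide DVal
    type Store = M.Map Loc Int

    -- A location not in the domain of sigma (cf. New_tau).
    new :: Store -> Loc
    new sigma = if M.null sigma then 0 else 1 + fst (M.findMax sigma)

    -- var x : tau = con (call-by-value): allocate l, bind x to l, set l to con.
    bindVar :: Ide -> Int -> Env -> Store -> (Env, Store)
    bindVar x con rho sigma =
      let l = new sigma
      in  (M.insert x (DLoc l) rho, M.insert l con sigma)

    -- const x : tau = con (call-by-constant): no location is allocated.
    bindConst :: Ide -> Int -> Env -> Store -> (Env, Store)
    bindConst x con rho sigma = (M.insert x (DCon con) rho, sigma)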

We begin by working these ideas out in the evident extension of the imperative language of
Chapter 3. Then we proceed to other parameter mechanisms by considering the corresponding
declaration mechanisms. (Many real languages will not possess such a convenient correspon-
dence; one way to deal with their parameter mechanisms would be to add the corresponding
declaration mechanisms when defining the set of possible configurations.)

For the extension we drop the const x : τ = e and var x : τ = e productions and add:

Expressions:   e ::= let d in e | f(ae) | begin c result e
Actual Expr.:  ae ::= · | e, ae
Declarations:  d ::= form = ae | function f(form) : τ = e | procedure p(form) c | rec d
Formals:       form ::= · | const x : τ, form | var x : τ, form
Commands:      c ::= p(ae)

Static Semantics

We have the following sets of identifiers, with the evident definitions and meanings: FI(e), FI(ae),
FI(d), DI(d), DI(form), FI(c). For example:

FI(procedure p(form) c) = FI(c)\DI(form)
DI(procedure p(form) c) = {p}
FI(p(ae)) = {p} ∪ FI(ae)

Turning to types, we define ETypes, AcETypes and DTypes; these are as before except that
both locations and procedures are denotable, causing a change in DTypes:

et ::= τ
aet ::= · | τ, aet
dt ::= et | et loc | aet → et | aet proc

and of course TEnv = Id →fin DTypes. We also need T(form) ∈ AcETypes, with the evident
definition:

T(·) = ·    T(const x : τ, form) = τ, T(form)    T(var x : τ, form) = τ, T(form)

Then we define the expected predicates

α ⊢I e : et    α ⊢I ae : aet    ⊢I d : β    α ⊢I d    form : β    α ⊢I c

We give some representative rules:

Procedure Declarations:
  (1) ⊢I (procedure p(form) c) : {p = T(form) proc}

      form : β    α[β] ⊢I∪I₀ c
  (2) ──────────────────────────   (where β : I₀)
      α ⊢I procedure p(form) c

Formals:
  (1) · : ∅

               form : β
  (2) ───────────────────────────────────   (if x ∉ I₀, where β : I₀)
      (const x : τ, form) : {x = τ}, β

               form : β
  (3) ─────────────────────────────────────   (if x ∉ I₀, where β : I₀)
      (var x : τ, form) : {x = τ loc}, β

Procedure Calls:
          α ⊢I ae : aet
  (1) ────────────────────   (if α(p) = aet proc)
          α ⊢I p(ae)

Dynamic Semantics

We begin with environments, abstracts and denotable values. First, the set Abstracts (ranged
over by abs) is

Abstracts = {λform. e : et} ∪ {λform. c}

then

DVal = Con + Loc + Abstracts

where Loc is the set Locnat ∪ Locbool of Chapter 3, and

Env = Id →fin DVal

and we add the production

d ::= ρ

and all the above is to be interpreted recursively as usual.

Then FI(dval) is defined in the obvious way; for example

FI(λform. c) = FI(c)\DI(form)

Then DI(ρ) and FI(ρ) are defined. Next we define the evident predicates

⊢I dval : dt    α ⊢I dval    ⊢I ρ : β    α ⊢I ρ
as expected; for example:

Procedure Abstracts:
  (1) ⊢I (λform. c) : T(form) proc

      form : β    α[β] ⊢I∪I₀ c
  (2) ──────────────────────────   (where β : I₀)
           α ⊢I λform. c

Transition Relations: Turning to the transition relations we first need the set of stores

Stores = {σ : L →fin Con | L ⊆ Loc and σ respects types}

– the same as in Chapter 3.

• Expressions: We have

Γα = {⟨e, σ⟩ | ∃et. α ⊢I e : et}   (for α : I)

and

Tα = {⟨con, σ⟩}

and the evident relation

ρ ⊢α ⟨e, σ⟩ → ⟨e′, σ′⟩

• Actual Expressions: We have

Γα = {⟨ae, σ⟩ | ∃aet. α ⊢I ae : aet}   (for α : I)

and

Tα = {⟨acon, σ⟩}

where acon is in AeCon, as before. And we have the relation

ρ ⊢α ⟨ae, σ⟩ → ⟨ae′, σ′⟩

• Declarations: We have

Γα = {⟨d, σ⟩ | α ⊢I d}   (for α : I)

and

Tα = {⟨ρ, σ⟩ | ⟨ρ, σ⟩ ∈ Γα}

and the relation

ρ ⊢α ⟨d, σ⟩ → ⟨d′, σ′⟩

• Formals: We define

acon, L ⊢ form : ρ, σ

meaning that, in the context of an actual expression constant acon and given an existing set,
L, of locations, the formal (part of a declaration) form yields a new (little) environment ρ
and store σ.

• Commands: We have Γα, Tα and

ρ ⊢α ⟨c, σ⟩ → ⟨c′, σ′⟩ (or σ′)

as usual.

Rules: The rules are generally just those we already know; only the new points are covered.

• Declarations:

  Simple: (1)    ρ ⊢α ⟨ae, σ⟩ → ⟨ae′, σ′⟩
              ──────────────────────────────────────────
              ρ ⊢α ⟨form = ae, σ⟩ → ⟨form = ae′, σ′⟩

          (2)    acon, L ⊢ form : ρ₀, σ₀
              ────────────────────────────────────────   (where σ : L)
              ρ ⊢α ⟨form = acon, σ⟩ → ⟨ρ₀, σ ∪ σ₀⟩

  Procedure: ρ ⊢α ⟨procedure p(form) c, σ⟩ → ⟨{p = λform. (ρ↾I); c}, σ⟩
             (where I = FI(c)\DI(form))

  Recursive: (1)    ρ\R ⊢α[α₀] d → d′
                 ─────────────────────────
                  ρ ⊢α rec d → rec d′
             (where, if ⊢FI(d) d : β, then R = FI(d) ∩ DI(d) and α₀ = β↾R)

             (2) ρ ⊢α rec ρ₀ → ρ₁
                 (where ρ₁ = {x = con | x = con in ρ₀} ∪
                             {x = l loc | x = l loc in ρ₀} ∪
                             {f(form) : et = let rec ρ₀\I in e |
                                 f(form) : et = e in ρ₀ and I = DI(form)} ∪
                             {p(form) = (rec ρ₀\I); c |
                                 p(form) c in ρ₀ and I = DI(form)})

• Formals:

  Empty: ·, L ⊢ · : ∅, ∅

  Constant:    acon, L ⊢ form : ρ₀, σ₀
            ─────────────────────────────────────────────────────────
            (con, acon), L ⊢ (const x : τ, form) : ρ₀ ∪ {x = con}, σ₀

  Variable:    acon, L ∪ {l} ⊢ form : ρ₀, σ₀
            ──────────────────────────────────────────────────────────────────
            (con, acon), L ⊢ (var x : τ, form) : {x = l} ∪ ρ₀, {l = con} ∪ σ₀
            (where l = Newτ(L ∩ Locτ))

Example 30 The following program demonstrates the use of private variables shared between
several procedures. This provides a nice version of ALGOL's own variables and anticipates the
facilities provided by classes and abstract data types. Consider the command

c = private var x : nat = 1
    within procedure inc() x := x + 1;
           procedure dec() if x > 0 then x := x − 1 else nil;
    begin
      inc(); dec()
    end

First look at the declaration part, d:

ρ ⊢ ⟨d, σ⟩ → ⟨private {x = l} within procedure inc() −; procedure dec() . . . , σ[l = 1]⟩
           → ⟨private {x = l} within {inc() = {x = l}; −}; procedure dec() . . . , σ[l = 1]⟩
           →³ ⟨private {x = l} within {inc() = {x = l}; −, dec() = {x = l}; . . .}, σ[l = 1]⟩
           → ⟨{inc() = {x = l}; −, dec() = {x = l}; . . .}, σ[l = 1]⟩
           = ⟨ρ₀, σ[l = 1]⟩, say

So we see that

ρ ⊢ ⟨c, σ⟩ →* ⟨ρ₀; (inc(); dec()), σ[l = 1]⟩

and so we should examine the computation:

ρ[ρ₀] ⊢ ⟨inc(); dec(), σ[l = 1]⟩
   → ⟨({x = l}; x := x + 1); dec(), σ[l = 1]⟩
   →* ⟨dec(), σ[l = 2]⟩
   → ⟨{x = l}; if x > 0 then x := x − 1 else nil, σ[l = 2]⟩
   →* σ[l = 1]

7.3 Other Parameter Mechanisms

Other parameter mechanisms can be considered in the same manner. The general principle is to
admit more ways to declare identifiers (as discussed above) and to admit more ways of evaluating
expressions (and/or actual expressions). The latter is needed because actual expressions can be
evaluated to various degrees when abstracts are called. One extreme is absolutely no evaluation
(see exercise 16 for this call-by-text mechanism). We shall first consider call-by-name in the
context of our applicative language; we regard it as evaluating the argument just to the extent
of binding the call-time environment to it. This well-known idea differs from the official
ALGOL 60 definition and is discussed further in exercise 15.

Then we consider call-by-reference in the context of our imperative language, where the argu-
ment is evaluated to produce a reference. Other mechanisms are considered in the exercises.
Note that in call-by-name, for example, the actual parameter may be evaluated further during
computation of the body of the abstract. It is even possible to have mechanisms (e.g., variants
of call-by-result) where some or all of the evaluation is delayed until after the computation of
the body of the abstract.

Call-by-Name

Syntactically it is only necessary to add another possibility for the formal parameters to the
syntax of our applicative language

form ::= name x : τ, form

Static Semantics

The set of defining variables of name x : τ, form is clearly {x} ∪ DV(form). Regarding types
we add

aet ::= τ name, aet
dt ::= et | τ name | aet → et

The definition of the type T(form) of a formal needs the new clause

T(name x : τ, form) = τ name, T(form)

Here are the new predicate rules:

• Formals: form : β ⇒ (name x : τ, form) : {x = τ name} ∪ β   (if x ∉ DV(form))

• Expressions:
  Variables: α ⊢V x : τ   (if α(x) = τ name)
  This rule expresses the fact that if x is a call-by-name formal parameter, as in name x : τ,
  then in the calling environment its denotation can be evaluated to a value of type τ.

• Actual Expr.:
      α ⊢V e : et    α ⊢V ae : aet
  ──────────────────────────────────
   α ⊢V (e, ae) : (et name, aet)

  It is important to note that this rule is in addition to the previous rule. So, given α, an
  actual expression can have several different types; these are needed as the same expression
  can correspond to formals of different types, and that will require different kinds of evaluation.
Example 31 Consider these two expressions:

let function fred(x : nat, name y : nat) : nat = x + y
in fred(u + v, u − v)

and

let function fred(name x : nat, y : nat) : nat = x + y
in fred(u + v, u − v)

In the first case we need the fact that α ⊢ (u + v, u − v) : (nat, nat name) and in the second that
α ⊢ (u + v, u − v) : (nat name, nat) (where α = {u = nat, v = nat}).

Dynamic Semantics

Clearly we must add a new component to the set of denotable values, corresponding to the new
denotable types τ name:

DVal = Con + NExp + Abstracts

where we take NExp = {e : τ name}; we must allow free variables in these expressions because
of the possibility of recursive definitions. For example, consider

rec name x : nat = f(5) and f(x : nat) = . . . f . . .

The extension to the definition of FV(dval) is, of course, clear:

FV(e : τ name) = FV(e)

For the predicates ⊢V dval : dt and α ⊢V dval we add the rules:

(1) ⊢V (e : τ name) : τ name

         α ⊢V e : τ
(2) ───────────────────
    α ⊢V (e : τ name)
Transition Relations: For expressions and definitions we refine the usual ρ ⊢α e → e′ and
ρ ⊢α d → d′ a little, parameterising also on the set of variables whose definition is currently
available in the environment (the others will be in the process of being recursively defined). So
for α : V and W ⊆ V we will define the relations

ρ ⊢α,W e → e′   and   ρ ⊢α,W d → d′

where ρ : α↾W and e, e′ ∈ Γ^exp_α,W and d, d′ ∈ Γ^def_α,W, where

Γ^exp_α,W = {e | ∃et. α ⊢V e : et}
Γ^def_α,W = {d | α ⊢V d}

We also have the evident T^exp_α,W and T^def_α,W.

For formals we have the predicate

ae ⊢µ,W form : ρ₀

where ae ∈ Tµ,W and µ = M(T(form)).

For actual expressions the result desired will depend on the context, and we introduce an
apparatus of different evaluation modes. The set Modes of modes, ranged over by µ, is given by

µ ::= · | value, µ | name, µ

Each actual expression type, aet, has a mode, M(aet), where

M(·) = ·
M(τ, aet) = value, M(aet)
M(τ name, aet) = name, M(aet)

We define transition relations ρ ⊢α,W,µ ae → ae′ which are also parameterised on modes. The
set of configurations is, for α : V, W ⊆ V and mode µ,

Γα,W,µ = {ae | ∃aet. α ⊢V ae : aet and M(aet) = µ}

and we define the set Tα,W,µ of terminal actual expressions by some rules of the form ⊢µ,W T(ae):

(1) ⊢·,W T(·)

          ⊢µ,W T(ae)
(2) ─────────────────────────
    ⊢(value,µ),W T(con, ae)

          ⊢µ,W T(ae)
(3) ─────────────────────────   (if FV(e) ∩ W = ∅)
    ⊢(name,µ),W T(e, ae)

It is rule (3) which introduces the need for W, insisting that all variables are bound except,
possibly, those being recursively defined.

The transition relation is defined for ρ : α↾W and ae, ae′ ∈ Γα,W,µ and has the form ρ ⊢α,W,µ
ae → ae′. The apparatus of modes gives types what might also be called metatypes, and this
may be a useful general idea. The reader should not confuse this with one normal usage of the
term mode as synonymous with type.

Transition Rules:

• Expressions: These are the same as before except for identifiers:

  Identifiers: (1) ρ ⊢ x → con   (if ρ(x) = con)
               (2) ρ ⊢ x → e   (if ρ(x) = e : τ name)

• Actual Expr.:

  Value Mode: (1)    ρ ⊢α,W e → e′
                 ──────────────────────────────────────
                 ρ ⊢α,(value,µ),W (e, ae) → (e′, ae)

              (2)    ρ ⊢α,µ,W ae → ae′
                 ──────────────────────────────────────────
                 ρ ⊢α,(value,µ),W (con, ae) → (con, ae′)

  Name Mode: (1) ρ ⊢α,(name,µ),W (e, ae) → ((let ρ↾FV(e) in e), ae)

             (2)    ρ ⊢α,µ,W ae → ae′
                ──────────────────────────────────   (if FV(e) ∩ W = ∅)
                ρ ⊢α,(name,µ),W (e, ae) → (e, ae′)

• Definitions: Here we need a rule which ensures that the actual expressions are evaluated in
  the right mode. Otherwise the rules are as before.

  Simple: (1)    ρ ⊢α,µ,W ae → ae′
              ────────────────────────────────────────   (where µ = M(T(form)))
              ρ ⊢α,W form = ae → form = ae′

          (2)    ae ⊢ form : ρ₀
              ───────────────────────
              ρ ⊢α,W form = ae → ρ₀
              (if ae ∈ Tα,µ,W where µ = M(T(form)))

  Formals: (1) · ⊢·,W · : ∅

           (2)    ae ⊢µ,W form : ρ
               ───────────────────────────────────────────────────
               con, ae ⊢(value,µ),W (x : τ, form) : {x = con} ∪ ρ

           (3)    ae ⊢µ,W form : ρ
               ─────────────────────────────────────────────────────────────
               e, ae ⊢(name,µ),W (x : τ name, form) : {x = e : τ name} ∪ ρ

Example 32 The main difference between call-by-name and call-by-value in applicative lan-
guages is that call-by-name may terminate where call-by-value need not. For example consider
the expression

e = let f (x : nat name) : nat = 1 and rec g(x : nat) : nat = g(x) in f (g(2))

Then ρ ` e −→∗ let ρ′ in f(g(2)) where ρ′ = {f(x : nat name) : nat = 1, g(x : nat) : nat =
. . .}. So we look at

ρ′ ` f(g(2)) −→ let x : nat name = g(2) in 1
            −→∗ let {x = (let g(x : nat) : nat = . . . in g(2)) : nat name} in 1
            −→ 1

On the other hand if we change the formal parameter of f to be call-by-value instead, then, as
the reader may care to check, the evaluation does not terminate.
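To see the contrast concretely, here is a small Haskell illustration (our gloss, not part of the
notes; the names f, g, byName and byValue are invented). Haskell's lazy evaluation behaves like
call-by-name, while forcing the argument with seq mimics call-by-value parameter passing:

    f :: Integer -> Integer
    f _ = 1                          -- f(x : nat name) : nat = 1 ignores its argument

    g :: Integer -> Integer
    g x = g x                        -- rec g(x : nat) : nat = g(x) never terminates

    byName :: Integer
    byName = f (g 2)                 -- evaluates to 1: the argument is never demanded

    byValue :: Integer
    byValue = let a = g 2 in a `seq` f a   -- forcing the actual first diverges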

Call-by-Reference

We consider a variant (the simplest one!) where the actual parameter must be a variable (identi-
fier denoting a location). In other languages the actual parameter could be any of a wide variety
of expressions which are evaluated to produce a location; these might include conditionals and
function calls. This would require a number of design decisions on the permitted expressions
and on how the type-checking should work. For lack of time rather than any intrinsic difficulty
we leave such variants to exercise 17. Just note that it will certainly be necessary to rethink
expression evaluation; this should either be changed so that evaluation yields a natural value
(be it location or primitive value) or else different evaluation modes should be introduced.

Syntactically we consider an extension to our imperative language

form ::= loc x : τ, form.

Static Semantics

Clearly we have DI(loc x : τ, form) = {x} ∪ DI(form). For types we add another actual expres-
sion type

aet ::= τ loc, aet

and

T(loc x : τ, form) = τ loc, T(form)

and we have the rule

    form : β
    ──────────────────────────────────  (if x ∉ I where β : I)
    (loc x : τ, form) : {x = τ loc}, β
• Actual Expressions: These are as before with the addition

    α `V ae : aet
    ──────────────────────────  (if α(x) = τ loc)
    α `V x, ae : τ loc, aet

It is here that we insist that actual reference parameters must be variables. As in the case of
call-by-name the type of an actual expression is not determined by its environment alone, but by
its context as well. (A more honest notation might be α, aet `V ae rather than α `V ae : aet.)

Dynamic Semantics

It is not necessary to change the definitions of DVal (or Env or Dec) as locations are already
included. However, we allow locations in AcExp and ACon

ae ::= l, ae

acon ::= l, acon

and clearly FV(l, ae) = FV(ae) and we have the rule

    α `I ae : aet
    ─────────────────────────  (l ∈ Locτ)
    α `I l, ae : τ loc, aet

Transition Rules: We have relations ρ `α ⟨e, σ⟩ −→ ⟨e′, σ′⟩, ρ `α ⟨d, σ⟩ −→ ⟨d′, σ′⟩ and
ρ `α ⟨c, σ⟩ −→ ⟨c′, σ′⟩ (or σ′) and a predicate acon, L ` form : ρ, σ as before. For actual
expressions we proceed as with call-by-name and introduce a set, Modes, of evaluation modes

µ ::= · | val, µ | loc, µ

with the evident definition of M(aet) ∈ Modes and put, for α : I and µ,

Γα,µ = {⟨ae, σ⟩ | ∃aet. α `I ae : aet and µ = M(aet)}

Tα,µ = {⟨acon, σ⟩ ∈ Γα,µ}

and will define the transition relation for ρ : α and µ

ρ `α,µ ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩

Rules:

• Actual Expressions:

  Value Mode:
  (1)  ρ `α ⟨e, σ⟩ −→ ⟨e′, σ′⟩
       ──────────────────────────────────────────
       ρ `α,(val,µ) ⟨(e, ae), σ⟩ −→ ⟨(e′, ae), σ′⟩

  (2)  ρ `α,µ ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩
       ──────────────────────────────────────────────
       ρ `α,(val,µ) ⟨(con, ae), σ⟩ −→ ⟨(con, ae′), σ′⟩

  Loc. Mode:
  (1)  ρ `α,(loc,µ) ⟨(x, ae), σ⟩ −→ ⟨(l, ae), σ⟩  (if ρ(x) = l)

  (2)  ρ `α,µ ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩
       ──────────────────────────────────────────
       ρ `α,(loc,µ) ⟨(l, ae), σ⟩ −→ ⟨(l, ae′), σ′⟩

• Definitions:

  Simple:
  (1)  ρ `α,µ ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩
       ─────────────────────────────────────────  (if µ = M(T(form)))
       ρ `α ⟨form = ae, σ⟩ −→ ⟨form = ae′, σ′⟩

  (2)  acon, L ` form : ρ₀, σ₀
       ─────────────────────────────────────────  (where σ : L)
       ρ `α ⟨form = acon, σ⟩ −→ ⟨ρ₀, σ ∪ σ₀⟩

  Formals: We just add a rule for declaration-by-reference (= location)

       acon, L ` form : ρ₀, σ₀
       ────────────────────────────────────────────────────
       (l, acon), L ` loc x : τ, form : {x = l} ∪ ρ₀, σ₀

  Note: All we have done is to include the construct x == y of Chapter 3 in our simple
  declarations.

• Commands: No new rules are needed.
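The effect of the loc rules can be mimicked in any language with first-class references. A
minimal Haskell sketch (our illustration, with invented names incr and demo; Data.IORef plays
the role of Loc):

    import Data.IORef

    incr :: IORef Int -> IO ()        -- a procedure with formal loc x : nat
    incr x = modifyIORef x (+ 1)      -- x := x + 1 acts on the shared location

    demo :: IO ()
    demo = do
      v <- newIORef 5                 -- var v : nat = 5
      incr v                          -- the actual parameter is the variable itself
      readIORef v >>= print           -- prints 6: the caller sees the update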

Clearly our discussion of binding mechanisms is only a start, even granting the ground covered
in the exercises. I hope the reader will have been led to believe that a more extensive coverage
is feasible. What is missing is a good guiding framework to permit a systematic coverage.

7.4 Higher Types

Since we can define or declare abstractions, such as functions and procedures, Tennent’s Prin-
ciple of Correspondence tells us that we can allow abstractions themselves as parameters of
(other) abstractions. The resulting abstractions are said to be of higher types (the resulting
functions are often called functionals). For example the following recursive definition is of a
function to apply a given function, f , to a given argument, x, a given number, t, of times:

rec Apply(f : nat −→ nat, x : nat, t : nat) : nat =


if t = 0 then x else Apply(f, f (x), t − 1)

We will illustrate this idea by considering a suitable extension of the imperative language of this
chapter (but neglecting call-by-reference). Another principle would be to allow any denotable
type to be an expressible type; this principle would allow locations or functions and procedures
as expressions and, in particular, as results of functions (by the Principle of Abstraction). For
example we could define an expression (naturally, called an abstraction)

λform. e

that would be an abbreviation for the expression let f (form) : τ = e in f . For a suitable
τ , depending on the context, it might, more naturally, be written as: function form. e; such
functions (and other similar abstractions) are often termed anonymous. Then the following
function would output the composition of two given functions

Compose(f : nat → nat, g : nat → nat) : nat → nat = λx : nat. f (g(x))
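Both definitions transcribe directly into any functional language with higher types. A Haskell
rendering (ours, with nat read as Int for brevity):

    apply :: (Int -> Int) -> Int -> Int -> Int
    apply f x t = if t == 0 then x else apply f (f x) (t - 1)

    compose :: (Int -> Int) -> (Int -> Int) -> (Int -> Int)
    compose f g = \x -> f (g x)       -- the anonymous abstraction λx : nat. f(g(x))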

In this way we obtain (many) versions of the typed λ-calculus. A number of problems arise in
imperative languages where functions are not denotable, but only references to them. In the
definition of Compose one will have locally declared references to functions as the denotations
of f and g; if these are disposed of upon termination of the function call one will have a
dangling reference. Just the same thing happens, but in an even more bare-faced way, if we
allow locations as outputs

function f () : nat loc = let var x : nat = 5 in x

At any rate we will leave these issues to exercises, being moderately confident they can be
handled along the lines we have developed.

Now, let us turn to our language with higher types. We extend the syntax by including the
category AcETypes of actual expression types:

aet ::= · | τ, aet | (aet −→ τ ), aet | aet proc, aet

and then add to the stock of formals

form ::= function f : aet → τ, form | procedure p : aet, form

It is clear how this allows functions and procedures of higher type to be defined; they are passed
as arguments via identifiers that denote them.

Static Semantics

Clearly

DI(function f : aet −→ τ, form) = {f } ∪ DI(form) and


DI(procedure p : aet, form) = {p} ∪ DI(form)

The definition of T (form) in AcETypes is also evident and we note

T (function f : aet → τ, form) = (aet → τ ), T (form)


T (procedure p : aet, form) = aet proc, T (form)

As for the predicate form : β we first note the definition of the set, DTypes, of denotable types:

dt ::= et | et loc | aet −→ et | aet proc

The rules are fairly clear and we just note the procedure case:

    form : β
    ─────────────────────────────────────────────  (if p ∉ I where β : I)
    procedure p : aet, form : {p = aet proc}, β

Turning to the other predicates we only need to add a rule for actuals:

    α `I ae : aet
    ────────────────────────  (where dt = α(x) is either of the form aet → et or aet proc)
    α `I x, ae : dt, aet
Example 33 Try type-checking the following imperative version of Apply in the environment
{x = nat}

function double(x : nat) : nat = 2 ∗ x


rec function apply(function f : nat → nat, x : nat, t : nat) : nat =
    let var result : nat = x in
    begin while t > 0 do begin x := f(x); t := t − 1 end;
          result result
    end;
x := apply(double, x, x)

Dynamic Semantics

Once more there is no need to change (the form of) the definitions of DVal or Env or Dec. We
must now allow abstracts within actual expressions and also ACon

ae ::= (λform. e : τ ), ae | (λform. c), ae


acon ::= (λform. e : τ ), acon | (λform. c), acon

with the evident extensions to the definitions of FV(ae) and α `I ae : aet.

Transition Relations: In the following α : I and J ⊆ I.

• Expressions: We define configurations and terminal configurations as usual; for the transi-
tion relation we define for ρ : α ↾ J

ρ `α,J ⟨e, σ⟩ −→ ⟨e′, σ′⟩

• Actual Expressions: We take

Γα,J = {⟨ae, σ⟩ | FI(ae) ⊆ I}

and

Tα,J = {⟨acon, σ⟩ | FI(acon) ∩ J = ∅}

and for ρ : α ↾ J the relation

ρ `α,J ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩

• Declarations: We define Γα,J, Tα,J in the evident way, and the transition relation ρ `α,J
⟨d, σ⟩ −→ ⟨d′, σ′⟩ is of the evident form.
• Commands: Again the configurations, the terminal configurations and the transition rela-
tion are of the evident forms.
• Formals: We will define the predicate acon, L `J form : ρ₀, σ₀ where FI(acon) ∩ J = ∅.

Rules: Expressions, Declarations, Commands as before.

• Actual Expressions: As before, plus

  (1)  ρ `α,J ⟨(x, ae), σ⟩ −→ ⟨(abs, ae), σ⟩  (if ρ(x) = abs ∈ Abstracts)

  (2)  ρ `α,J ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩
       ──────────────────────────────────────────
       ρ `α,J ⟨(abs, ae), σ⟩ −→ ⟨(abs, ae′), σ′⟩

• Formals: We just need two more rules

       acon, L `J form : β
       ──────────────────────────────────────────────────────────────────────────────  (if f ∉ I where β : I)
       ((λform. e : τ), acon), L `J function f : aet → τ, form : {f = λform. e : τ}, β

       acon, L `J form : β
       ──────────────────────────────────────────────────────────────────────  (if p ∉ I where β : I)
       ((λform. c), acon), L `J procedure p : aet, form : {p = λform. c}, β

As a matter of fact the J’s are not needed, but we obtain finer control over the allowable actual
expression configurations. This can be useful in extensions of our language where abstractions
are allowed.
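For readers who like to see the extension programmed, here is one hedged rendering of the
enlarged set DVal as a Haskell datatype (our sketch; the constructor names are invented and
Store is simplified to an association list):

    type Store = [(Int, Int)]                  -- locations to storeable values

    data DVal
      = DCon Int                               -- constants
      | DLoc Int                               -- locations
      | DFun  ([DVal] -> Store -> Int)         -- function abstracts, λform. e : τ
      | DProc ([DVal] -> Store -> Store)       -- procedure abstracts, λform. c

    type Env = [(String, DVal)]                -- environments: Id →fin DVal

Passing an abstract of higher type is then just an environment lookup that happens to yield a
DFun or a DProc value.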

7.5 Modules and Classes

There is a certain confusion of terminology in the area of modules and classes. Rather than
enumerate the possibilities let me say what I mean here. First there is a Principle of Denotation
which says that one can in principle use an identifier to denote the value of any syntactic phrase
– where “value” is deliberately ambiguous and may indicate various degrees of “evaluation”.
For expressions this says we can declare constants (in imperative languages) but also allows
declaration by name or by text and so on; for commands it means we can have parameterless
subroutines. For declarations we take it as meaning one can declare identifiers as modules, and
they will denote the environment resulting from the elaboration. (There is a corresponding
Principle of Storeability which the reader will spot for himself; it is anything but clear how
useful these principles are!)

Applying the Principle of Abstraction to declarations on the other hand we obtain what we call
classes. Applying a class to actual arguments gives a declaration which can be used to supply
a denotation to a module identifier; then we say the module is an instance of the class. (Of
course everything we say here applies just as well to applicative languages; by now, however, it
is enough just to consider one case!)

A typical example is providing a random natural number facility. Let drand be the declaration

private
var a = seed mod d
within
function draw () : nat
begin a := a ∗ m mod d
result a/d end

where seed, d and m are assumed declared previously. This would declare a function, draw,
providing a random natural number with its own private variable – inaccessible from the outside.
If one wanted to declare and use two random natural numbers, just declare two modules

module X : draw : · → nat = drand
module Y : draw : · → nat = drand
begin . . . X.draw() . . . Y.draw() . . . end

Thus draw is an attribute of both X and Y and the syntax X.draw selects the attribute (in
general there is more than one).

When one wants some parameterisation and/or desires to avoid writing out drand several times,
one can declare and use a class

class random(const seed : nat, const d : nat) : draw : · → nat; drand ;


begin
..
.
module X : draw : · → nat = random(5, 2);
module Y : draw : · → nat = random(2, 3);
begin . . . X.draw () . . . Y.draw () . . . end
..
.
end
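A hedged Haskell analogue may help (ours, not the notes'; we pass the multiplier m explicitly
since it is assumed declared previously, and we return a rather than a/d so that the result stays
a natural number):

    import Data.IORef

    newtype Random = Random { draw :: IO Int }     -- the module's one attribute

    mkRandom :: Int -> Int -> Int -> IO Random     -- class random(seed, d), multiplier m
    mkRandom m seed d = do
      a <- newIORef (seed `mod` d)                 -- private var a: invisible outside
      pure (Random (do modifyIORef a (\v -> (v * m) `mod` d)   -- a := a * m mod d
                       readIORef a))               -- result a

    twoModules :: IO ()
    twoModules = do
      x <- mkRandom 7 5 2                          -- module X = random(5, 2)
      y <- mkRandom 7 2 3                          -- module Y = random(2, 3)
      nx <- draw x                                 -- X.draw()
      ny <- draw y                                 -- Y.draw()
      print (nx, ny)

Each instantiation allocates its own private variable, which is exactly the point of the class
construct.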

Finally we note that it is possible to use the compound forms of declarations to produce similar
effects on classes. For example a version of the SIMULA class-prefixing idea is available.

class CLASS1(form1) . . . ; . . . ;
class CLASS2(form2)—; —;
class PREFIXCLASS(form1, form2) . . . —;
CLASS1(form1); CLASS2(form2)

Naturally we will also be able to use simultaneous and private and recursive class declarations
(can you tell me some good examples of the use of these?). One can also easily envisage classes
of higher types (classicals?), but we do not investigate this idea.

Here is our extension of the syntax of the imperative language of the present chapter (but no
call-by-reference, or higher types).

• Types: We need the categories DTSpecs, AcETypes and DecTSpecs of denotable type spec-
ifications, actual expression types and declaration type specifications

dts ::= τ | τ loc | aet → τ | aet proc | dects | aet → dects


aet ::= · | τ, aet
dects ::= x : dts | x : dts, dects

Clearly dects will be the type of a module identifier and aet → dects will be the type of a class
identifier.
• Expressions: We add five(!!) new categories of expressions, function, procedure, variable,
module and class expressions, called FExp, PExp, VExp, MExp, CExp and ranged over by
f e, pe, ve, me, cle and given by the following productions (where we also allow f , p, v, m,
cl as metavariables over the set, Id, of identifiers)

f e ::= f | me.f
pe ::= p | me.p
ve ::= v | me.v
me ::= m | me.m | cle(ae)
cle ::= cl | me.cl

The definition of the set of expressions is extended by

e ::= me.x | f e(ae)

(and the second possibility generalises expressions of the form f (ae)). The set of actual
expressions is defined as before.
• Commands: We generalize commands of the forms p(ae) and x := e (i.e., procedure calls
and assignment statements) by

c ::= pe(ae) | ve := e

• Declarations: We add the following productions to the definition

d ::= module m : dects = d | class cl(form) : dects; d

Note that declaration types are used here to specify the types of the attributes of modules
and classes. If we except recursive declarations this information is redundant, but it could
be argued that it increases readability as the attribute types may be buried deep inside the
declarations.
• Formals: The definition of these remains the same as we do not want class or module
parameters.

Note: In this chapter we have essentially been following a philosophy of different expressions
for different uses. This is somewhat inconsistent with previous chapters where we have merged
different kinds of expressions (e.g., natural number and boolean) and been content to separate
them out again via the static semantics. By now the policy of this chapter looks a little ridicu-
lous and it could well be better to merge everything together. However, the reader may have
appreciated the variation.

Static Semantics

For the definitions of FI(f e), . . . , FI(cle) we do not regard the attribute identifiers as free (but
rather as a different use of identifiers from all previous ones; their occurrences are the same
as constant occurrences and they are thought of as standing for themselves). So for example
FI(me) is given by the table

         m      me.m      cle(ae)
    FI   {m}    FI(me)    FI(cle) ∪ FI(ae)

For the definitions of FI(e), FI(c) we put

FI(me.x) = FI(me)
FI(f e(ae)) = FI(f e) ∪ FI(ae)
FI(pe(ae)) = FI(pe) ∪ FI(ae)
FI(ve := e) = FI(ve) ∪ FI(e)

and for FI(d) and DI(d) we have

          module m : dects = d     class cl(form) : dects; d
    FI    FI(d)                    FI(d)\DI(form)
    DI    {m}                      {cl}

(We are really cheating somewhere here. For example the above scheme would not work if we
added the reasonable production

d ::= me

as then with, for example, a command m; begin . . . x . . . end the x can be in the scope of
the m if the command is in the scope of a declaration of the form module m : dect = var x :
nat = . . . ; . . .

Thus it is no longer possible to define the free identifiers of a phrase in a context-free way. Let
us agree to ignore the problem.)

• Types: We define (mutually recursively) the sets ETypes, FETypes, . . . , ClETypes, DTypes,
TEnv of expression types, function expression types, . . . , class expression types, denotable
types and type environments by

et ::= τ
f et ::= aet → τ
pet ::= aet proc
vet ::= τ loc
met ::= α
clet ::= aet −→ α
dt ::= et | vet | f et | pet | met | clet

TEnv = Id −→fin DTypes (with α ranging over TEnv)

To see how the sets DTSpecs and DecTSpecs of denotable and declaration type specifications
specify denotable and declaration types respectively, we define predicates

dts : dt and dects : dect

by the formulae

- DTSpecs:
  (1)  τ : τ
  (2)  τ loc : τ loc
  (3)  aet → τ : aet → τ
  (4)  aet proc : aet proc

  (5)  dects : α
       ─────────  (where the premise means: proved from the rules for DecTSpecs)
       dects : α

  (6)  dects : α
       ──────────────────────
       aet → dects : aet → α

- DecTSpecs:
  (1)  dts : α
       ───────────────────
       (x : dts) : {x = α}

  (2)  dts : α    dects : β
       ──────────────────────────────  (if x ∉ I for β : I)
       (x : dts, dects) : {x = α} ∪ β
Next T(form) ∈ AcETypes is defined as before. Now we must define the predicates

α `I e : et, α `I f e : f et, . . . , α `I cle : clet, α `I c, d : β, α `I d and form : β

The old rules are retained and we add new ones as indicated by the following examples.

• Expressions:
  (1)  α `I me : β
       ───────────────  (if β(x) = dt)
       α `I me.x : dt

  (2)  α `I f e : aet → et    α `I ae : aet
       ─────────────────────────────────────
       α `I f e(ae) : et

• Function Expressions:
  (1)  α `I f : f et  (if α(f) = f et ∈ FETypes)

  (2)  α `I me : β
       ────────────────  (if β(f) = f et ∈ FETypes)
       α `I me.f : f et

• Class Expressions:
  (1)  α `I cl : clet  (if α(cl) = clet ∈ ClETypes)

  (2)  α `I me : β
       ──────────────────  (if β(cl) = clet ∈ ClETypes)
       α `I me.cl : clet

• Commands:
  (1)  α `I pe : aet proc    α `I ae : aet
       ────────────────────────────────────
       α `I pe(ae)

  (2)  α `I ve : τ loc    α `I e : τ
       ──────────────────────────────
       α `I (ve := e)

• Declarations:
  - Modules:
    (1)  dects : β
         ─────────────────────────────────
         (module m : dects = d) : {m = β}

    (2)  dects : β    α `I d : β
         ────────────────────────────
         α `I module m : dects = d

  - Classes:
    (1)  dects : β
         ──────────────────────────────────────────────────
         (class cl(form) : dects; d) : {cl = T(form) −→ β}

    (2)  dects : β    form : α₀    α[α₀] `I∪I₀ d : β
         ─────────────────────────────────────────────  (where α₀ : I₀)
         α `I class cl(form) : dects; d

Dynamic Semantics

First we define the sets FECon, . . . , ClECon of function expression constants, . . . , class
expression constants by

f econ ::= λform. e : et


pecon ::= λform. c
vecon ::= l
mecon ::= ρ
clecon ::= λform. d : β

and also add the productions

f e ::= f econ, . . . , me ::= mecon | d, cle ::= clecon

and define the sets DVal and Env of denotable values and environments by

dval ::= con | vecon | f econ | pecon | clecon | mecon


Env = Id −→fin DVal

and extend the definition of declarations by the production

d ::= ρ

These are mutually recursive definitions of a harmless kind. The extensions to the definition of
FI(f e), . . . , FI(cle), FI(d), DI(d) are evident; for example FI(λform. d : β) = FI(d)\DI(form).

We must also extend the definitions of α `I f e : f et, . . . , α `I cle : clet and `I d : β and α `I d
(the latter two in the case d = ρ). The former extensions are obvious; for example

• Class Abstracts:

    form : α₀    α[α₀] `I∪I₀ d : β
    ──────────────────────────────  (where α₀ : I₀)
    α `I (λform. d : β)

For the latter we have to define `I dval : dt and this also presents little difficulty; for example

• Class Abstracts:

    `I (λform. d : β) : T(form) → β

Then we have the two rules

(1)  ∀x ∈ I₀. `I ρ(x) : β(x)
     ───────────────────────  (where ρ : I₀)
     `I ρ : β

(2)  ∀x ∈ I₀. α `I ρ(x)
     ──────────────────  (where ρ : I₀)
     α `I ρ

Transition Relations: The set, Stores, is as before.

The configurations, final configurations and the transition relations for expressions, actual ex-
pressions and declarations are as before; for formals we have the same predicate as before. Now
fix α : I and ρ : α ↾ J (for some J ⊆ I).

• Function Expressions: We take Γα = {⟨f e, σ⟩ | ∃f et. α `I f e : f et}, Tα = {⟨f econ, σ⟩ |
∃f et. α `I f econ : f et} and the transition relation has the form ρ `I γ −→ γ′.

The definitions for PExp, . . . , CExp are the analogues of that for function expressions

Rules:

• Class Expressions:
  (1)  ρ `α ⟨cl, σ⟩ −→ ⟨clecon, σ⟩  (if ρ(cl) = clecon)

  (2)  ρ `α ⟨me, σ⟩ −→ ⟨me′, σ′⟩
       ────────────────────────────────
       ρ `α ⟨me.cl, σ⟩ −→ ⟨me′.cl, σ′⟩

  (3)  ρ `α ⟨ρ′.cl, σ⟩ −→ ⟨clecon, σ⟩  (if ρ′(cl) = clecon)

The rules for FExp, . . . , MExp are similar except that in the last case we need also

  (1)  ρ `α ⟨cle, σ⟩ −→ ⟨cle′, σ′⟩
       ──────────────────────────────────────
       ρ `α ⟨cle(ae), σ⟩ −→ ⟨cle′(ae), σ′⟩

  (2)  ρ `α ⟨(λform. d : β)(ae), σ⟩ −→ ⟨private form = ae within d, σ⟩

  (3)  ρ `α ⟨d, σ⟩ −→ ⟨d′, σ′⟩
       ────────────────────────  (where in the top line we mean a transition of Decl)
       ρ `α ⟨d, σ⟩ −→ ⟨d′, σ′⟩

The new rules for expressions and commands should be clear; for example

• Assignment:
  (1)  ρ `α ⟨ve, σ⟩ −→ ⟨ve′, σ′⟩
       ─────────────────────────────────────
       ρ `α ⟨ve := e, σ⟩ −→ ⟨ve′ := e, σ′⟩

  (2)  ρ `α ⟨e, σ⟩ −→ ⟨e′, σ′⟩
       ───────────────────────────────────
       ρ `α ⟨l := e, σ⟩ −→ ⟨l := e′, σ′⟩

  (3)  ρ `α ⟨l := con, σ⟩ −→ σ[l = con]

For declarations the new rules are

• Modules:
  (1)  ρ `α ⟨d, σ⟩ −→ ⟨d′, σ′⟩
       ───────────────────────────────────────────────────────────────
       ρ `α ⟨module m : dects = d, σ⟩ −→ ⟨module m : dects = d′, σ′⟩

  (2)  ρ `α ⟨module m : dects = ρ′, σ⟩ −→ ⟨{m = ρ′}, σ⟩

• Classes:
       ρ `α ⟨class cl(form) : dects; d, σ⟩ −→ ⟨{cl = λform. (ρ\I) in d}, σ⟩  (where I = DI(form))

7.6 Exercises

1. Consider dynamic binding in the context of a simple applicative language so that, for
example,

let x = 1; f (y) = x + y
in let x = 2 in f (3)

has value 5. What issues arise with type-checking? Can you program iterations (e.g.,
factorial) without using recursive function definitions?

2. In a maximalist solution to the problem (in the applicative language) of neatly specifying
functions of several arguments one could define the class of formal parameters by

form ::= · | x : τ | form, form

and merge expressions and actual expressions, putting

e ::= · | e, e | f (e)

and amending the definition of definitions

d ::= form = e | f (form) : τ = e

a) Do this, but effectively restrict the extension to the minimalist case by a suitable choice
of static semantics.

b) Allow the full extension.
c) Go further and extend the types available in the language by putting

τ ::= nat | bool | τ, τ | ·

thus allowing tuples to be denotable.

3. Consider the maximalist position in a simple imperative programming language.

4. Consider in a simple imperative language how to allow expressions on the left-hand side of
assignments:

e0 := e1

and even the boolean expression e0 ≡ e1 which is true precisely when e0 and e1 evaluate
to the same reference. As well as discussing type-checking issues, try the two following
approaches to expression evaluation:
a) Expressions are evaluated to their natural values which will be either locations or basic
values.
b) Modes of evaluation are introduced, as in the text.
Extend the work to the maximalist position where actual expressions and expressions are
merged, thus allowing simultaneous assignments.

5. Just as expressions are evaluated, and so on, formals are matched (to given actual values)
to produce environments (= matchings). The semantics given above can be criticised as
not being dynamic enough as the matching process is not displayed. Provide an answer
to this; you may find configurations of the form

⟨form, con, ρ⟩

useful where form is the formal being matched, con is the actual value and ρ is the
matching produced so far. A typical rule could be

⟨x : τ, con, ρ⟩ −→ ρ ∪ {x = con}

This is all for the applicative case; what about the imperative one? Investigate dynamic er-
rors, allowing constants and repeated variables in the formals (dynamic error = matching
failure).

6. In the phrase rec d all identifiers in R = FV(d) ∩ DV(d) are taken to be recursively
defined. Investigate the alternative rec x1 , . . . , xn .d where {x1 , . . . xn } ⊆ R.

7. In some treatments of recursion to evaluate an expression of the form

let rec (f (x) = . . . f . . . g . . . and g(x) = — f — g —) in f (5)

one evaluates f (5) in the environment

ρ = {f (x) = . . . f . . . g . . . , g(x) = — f — g —}

(ignoring free variables) and uses the simple transition:

ρ ` f (5) −→ let x = 5 in . . . f . . . g . . .

I could not see how to make this simple and nice idea (leave the recursively defined
variables free) work in the present setting where one has nested definitions and binary
operations on declarations. Can you make it work?

8. Try some examples of the form

let rec (f (x) = . . . f . . . g . . . & g(x) = — f — g —) in e

where & is any of ;, and or in.

9. Consider the following recursive definitions of constants:

a) rec x : nat = 1
b) rec (y : nat = 1 and x : nat = y)
c) rec (x : nat = y and y : nat = 1)
d) rec (x : nat = x)
e) rec (x : nat = y and y : nat = x)

How are these treated using the above static and dynamic semantics? What do you
think should happen? Specify suitable static and dynamic semantics with any needed
error rules. Justify your decisions, considering how your ideas will extend to imperative
languages with side-effects (which might result in non-determinism).

10. Find definitions d0 and d1 that make as many as possible of the following definitions differ:

a) (rec) (d0 ; d1 )
b) (rec) (rec d0 ; d1 )
c) (rec) (d0 ; rec d1 )
d) (rec) (rec d0 ; rec d1 )

where (rec) d indicates the two possibilities, with and without rec.

11. Check that the first alternative for type-checking recursive definitions would work in the
sense that

α `V d : β iff `V d : β and α `V d

12. Programming languages like PASCAL often adopt the following idea for function defini-
tion:

function f (form) : τ
begin
c
end

where within c the identifier f as well as possibly denoting a function also denotes a
location, created on function entry and destroyed on exit; the result of a function call is
the final value of this location on exit. For example the following is an obscure definition
of the identity function:

rec function f (x : nat) : nat


begin
f := 1;
if x = 0 then f := 0
else f := f + f (x − 1)
end

Give this idea a semantics.

13. Call-by-need. In applicative languages this is a “delayed evaluation” version of call-by-


name. As in call-by-name the formal is bound to the unevaluated actual, with the local
environment bound in the closure. However, when it is necessary for the first time to
evaluate the actual, the formal is then bound to the result of the evaluation. Give this idea
a semantics. One possibility is to put (some of) the environment into the configurations,
treating it like a store. Another is to bind the actual to a new location and make the
actual the value of that location in a store. Prove call-by-need equivalent to call-by-
name. Consider delayed evaluation variants of parameter mechanisms found in imperative
languages.
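Along the lines of the exercise's second suggestion, here is one hedged Haskell sketch of the
memoising behaviour (ours; the names Thunk, newThunk and force are invented):

    import Data.IORef

    data Thunk a = Unevaluated (IO a) | Evaluated a

    newThunk :: IO a -> IO (IORef (Thunk a))   -- bind the formal to the unevaluated actual
    newThunk m = newIORef (Unevaluated m)

    force :: IORef (Thunk a) -> IO a           -- the first use evaluates and memoises
    force r = do
      t <- readIORef r
      case t of
        Evaluated v   -> pure v                -- later uses are free
        Unevaluated m -> do v <- m
                            writeIORef r (Evaluated v)
                            pure v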

14. Call-by-name. Consider (minimalist & maximalist) versions of call-by-name in imperative
programming languages. Look out for the dangers inherent in

procedure f (x : nat name) =


begin
..
.
x := · · ·
..
.
end

15. Discover the official ALGOL 60 definition of call-by-name (it works via a substitution
process); give a semantics following the idea and prove it equivalent to one following the
idea in these notes (substitution = binding a closure).

16. Call-by-text. Give a semantics for call-by-text where the formal is bound to the actual
(not binding in the current environment); when a value is desired the actual is evaluated
in the then current environment. Consider also more “concrete” languages in which the
abstract syntax (of the text) is available to the programmer, or even the concrete syntax:
does the latter possibility lead to any alteration of the current framework?

17. Call-by-reference. Give a maximalist discussion of call-by-reference, still only allowing ac-
tual reference parameters to be variables. Extend this to allow a wider class of expressions
which (must) evaluate to a reference. Extend that in turn to allow any expression as an
actual; if it does not evaluate to a reference the formal should be bound to a new reference
and that should have the value of the actual.

18. Call-by-result. Discuss this mechanism where first the actual is evaluated to a reference, l;
second the formal is bound to a new reference l′ (not initialised); third, after computation
of the body of the abstract, the value of l is set to the value of l′ in the then current store.
Discuss too a variant where the actual is not evaluated at all until after the body of the
abstract. [Hint: Use declaration finalisation.]

19. Call-by-value-result. Discuss this mechanism where first the actual is evaluated to a refer-
ence l; second the formal is bound to a new reference l′ which is initialised to the current
value of l; third, after the computation of the body of the abstract, the value of l is set
to the value of l′ in the then current store.

20. Discuss selectors which are really just functions returning references. A suitable syntax
might be

selector f (form) : τ = e

which means that f returns a reference to a τ value. First consider the case where all
lifetimes are semi-infinite (extending beyond block execution). Second consider the case
where lifetimes do not persist beyond the block where they were created; in this case
interesting questions arise in the static semantics.

21. Consider higher-order functions in programming languages which may return abstracts
such as functions or procedures. Thus we add the syntax:

e ::= λform. e | λform. c

The issues that arise include those of lifetime addressed in exercise 20.

22. Here is a version of the typed λ-calculus

τ ::= nat | bool | τ −→ τ


e ::= m | t | x | e bop e | if e then e else e |
let x : τ = e in e | e(e) | λx : τ. e

Give a static semantics and two dynamic semantics where the first one is a standard
one using environments and where the second one is for closed expressions only and uses
substitutions as discussed in the exercises of Chapter 3. Prove these equivalent. Add a
recursion operator expression

e ::= Y

with the static semantics α `V Y : (τ −→ τ) −→ τ (τ ≠ nat, bool) and a rule something
like ρ ` Y e₀ −→ e₀(Y e₀). What does this imply about formals which are of functional
type and their evaluation, and why is that important?
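In Haskell the rule ρ ` Y e₀ −→ e₀(Y e₀) is literally the definition of a fixed-point
combinator, and call-by-name evaluation is what makes it usable (our example):

    fix :: (a -> a) -> a
    fix f = f (fix f)                -- Y e −→ e (Y e)

    factorial :: Integer -> Integer
    factorial = fix (\fact n -> if n == 0 then 1 else n * fact (n - 1))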

A A Guide to the Notation

Syntactic Categories
Truthvalues t∈T
Numbers m, n ∈ N
Constants con ∈ Con
Actual Constants acon ∈ ACon
Unary Operations uop ∈ Uop
Binary Operations bop ∈ Bop

Variables v, f ∈ Var V ⊆fin Var


Identifiers x, f, p, m, cl ∈ Id I ⊆fin Id

Expressions e ∈ Exp
Boolean b ∈ BExp
Actual ae ∈ AExp
Variable ve ∈ VExp
Function f e ∈ FExp
Procedure pe ∈ PExp
Module me ∈ MExp
Class cle ∈ CExp

Commands
(=Statements) c ∈ Com

Definitions/
Declarations d ∈ Def/Dec

Formals form ∈ Forms


Types τ ∈ Types
Expression et ∈ ETypes
Actual Expr. aet ∈ AETypes
Denotable
Type Spec. dts ∈ DTSpecs
Declaration
Type Spec. dects ∈ DecTSpecs

Static Semantics
Free Variables/
Identifiers FV/I(e), FI(c), FV/I(d) etc.
Defined Variables/
Identifiers DV/I(d) DV/I(form)

Denotable Types dt ∈ DTypes
Type Environments α, β ∈ TEnv (e.g., = Id −→fin DTypes)
Example Formulae α `V e : et α `I c α `I d : β
form : β T (form) = aet

Dynamic Semantics
Denotable Values dval ∈ DVal
Environments ρ ∈ Env (e.g., = Id −→fin DVal)
Storeable Types    st ∈ STypes
Locations          l ∈ Loc = Σ_st Loc_st     L ⊆fin Loc
Storeable Values   sval ∈ SVal = Σ_st Val_st
Stores             σ ∈ Stores (e.g., = {σ ∈ Loc −→fin SVal | ∀st ∈ STypes. σ(Loc_st) ⊆ SVal_st})
Evaluation Modes µ ∈ Modes
Transition Systems   ⟨Γ, T, −→⟩    γ ∈ Γ
  where Γ is the set of configurations
        T ⊆ Γ is the set of final configurations
        γ −→ γ′ is the transition relation
Example Configurations:        ⟨e, σ⟩; ⟨c, σ⟩, σ; ⟨d, σ⟩
Example Final Configurations:  ⟨con, σ⟩; σ; ⟨ρ, σ⟩
Example Transition Relations:
  ρ `I,µ ⟨e, σ⟩ −→ ⟨e′, σ′⟩
  ρ `I ⟨c, σ⟩ −→ ⟨c′, σ′⟩ / σ′
  ρ `I ⟨d, σ⟩ −→ ⟨d′, σ′⟩ / ρ′

B Notes on Sets

We use several relations over and operations on sets as well as the (very) standard ones. For
example X ⊆fin Y means X is finite and a subset of Y .

Definition 34 Let Op(X1, . . . , Xn) be an operation on sets. It is monotonic if whenever X1 ⊆
X1′, . . . , Xn ⊆ Xn′ we have Op(X1, . . . , Xn) ⊆ Op(X1′, . . . , Xn′). It is continuous if whenever
X1^1 ⊆ X1^2 ⊆ . . . ⊆ X1^m ⊆ . . . is an increasing sequence and . . . and Xn^1 ⊆ Xn^2 ⊆ . . . ⊆ Xn^m ⊆ . . . is
an increasing sequence then

(∗) Op(⋃_m X1^m, . . . , ⋃_m Xn^m) = ⋃_m Op(X1^m, . . . , Xn^m)

Note: Continuity implies monotonicity. Conversely, to prove continuity, first prove monotonic-
ity. This establishes the "⊇" half of (∗); then prove the "⊆" half.

Example 35

• Cartesian Product:

X1 × . . . × Xn = {⟨x1, . . . , xn⟩ | x1 ∈ X1 and . . . and xn ∈ Xn}

is monotonic and continuous. Prove this yourself.


• Disjoint Sum:

X1 + . . . + Xn =def ({1} × X1) ∪ . . . ∪ ({n} × Xn)

Σ_{i∈A} Xi =def ⋃_{i∈A} ({i} × Xi)

Show that the finite sum operation is continuous. (Finite Sum is just union, but forced to be
disjoint.)
• Finite Functions: The class of finite functions from X to Y is

X →fin Y = Σ_{A ⊆fin X} (A → Y)

Note that the union is necessarily disjoint. Show that →fin is continuous.

For A ⊆fin X, if f ∈ A → Y ⊆ X →fin Y (we identify f with ⟨A, f⟩) we write f : A. This is
used for environments (including type environments) and stores. There are two useful unary
operations on finite functions. Suppose that f : A and B ⊆ A. Then the restriction of f to B
is written f ↾ B, and defined by:

(f ↾ B)(b) = f(b)  (for b in B)

Note that f ↾ B : B. For any C ⊆ X we also define f ↾ C = f ↾ (A ∩ C).

There are also two useful binary operations. For f : A and g : B in X →fin Y we define
f[g] : A ∪ B by

f[g](c) = g(c)  (c ∈ B)
f[g](c) = f(c)  (c ∈ A\B)

and in case A ∩ B = ∅ we define f, g : A ∪ B (also written f ∪ g) by:

(f, g)(c) = f(c)  (c ∈ A)
(f, g)(c) = g(c)  (c ∈ B)

Note this is a special case of the first definition, but it is very useful and worth separate mention.
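These operations correspond to standard finite-map operations. A Haskell sketch using Data.Map
(ours; note that M.union is left-biased, so f[g] is obtained by flipping the arguments):

    import qualified Data.Map as M
    import qualified Data.Set as S

    restrict :: Ord k => M.Map k v -> S.Set k -> M.Map k v      -- f ↾ B
    restrict = M.restrictKeys

    update :: Ord k => M.Map k v -> M.Map k v -> M.Map k v      -- f[g]: g wins on overlaps
    update f g = M.union g f

    disjointUnion :: Ord k => M.Map k v -> M.Map k v -> M.Map k v  -- f, g (domains disjoint)
    disjointUnion = M.union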

The Importance of Continuity

Suppose Op(X) is continuous and we want to find an X solving the equation

X = Op(X)

Put X^0 = ∅ and X^{m+1} = Op(X^m). Then (by induction on m) we have X^m ⊆ X^{m+1} for
all m, and putting X = ⋃_m X^m:

Op(X) = Op(⋃_m X^m)
      = ⋃_m Op(X^m)    (by continuity)
      = ⋃_m X^{m+1}
      = X

And one can show (do so!) that X is the least solution; that is, if Y is any other solution then
X ⊆ Y. Indeed X is even the least set such that Op(X) ⊆ X.

This can be generalised: suppose Op1(X1, . . . , Xn), . . . , Opn(X1, . . . , Xn) are all continuous and
we want to solve the n equations

X1 = Op1(X1, . . . , Xn)
     ...
Xn = Opn(X1, . . . , Xn)

Put Xi^0 = ∅ for i = 1, . . . , n and define

Xi^{m+1} = Opi(X1^m, . . . , Xn^m)

Then for all m and i, Xi^m ⊆ Xi^{m+1} (prove this) and putting

Xi = ⋃_m Xi^m

we obtain the least solutions to the equations: if Yi are also solutions then for all i, Xi ⊆ Yi.
Indeed the Xi are even the least sets such that Opi(X1, . . . , Xn) ⊆ Xi (i = 1, . . . , n). This is
used in the example below. Prove this.
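When the operator is effectively computable and the iteration closes off after finitely many
steps, it can be run directly. A small Haskell sketch (ours):

    import qualified Data.Set as S

    lfp :: Ord a => (S.Set a -> S.Set a) -> S.Set a
    lfp op = go S.empty                          -- X^0 = ∅, X^{m+1} = Op(X^m)
      where go x = let x' = op x in if x' == x then x else go x'

    -- Example: the least set containing 0 and closed under +2 below 10.
    evens :: S.Set Int
    evens = lfp (\x -> S.insert 0 (S.map (+ 2) (S.filter (< 10) x)))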

Example 36 Suppose we are given sets Num, Id, Bop and wish to define sets Exp and Com
by the abstract syntax

e ::= m | x | e0 bop e1
c ::= x := e | c0 ; c1 | if e0 = e1 then c0 else c1 | while e0 = e1 do c

Then we regard this definition as giving us set equations

Exp = Num + Id + (Exp × Bop × Exp)

Com = (Id × Exp) + (Com × Com) + (Exp × Exp × Com × Com) + (Exp × Exp × Com)

and also giving us a notation for working with the solution to the equations. First m is identified
with ⟨1, m⟩ ∈ Exp and x is identified with ⟨2, x⟩ in Exp. Next

e0 bop e1 = ⟨3, ⟨e0, bop, e1⟩⟩

x := e = ⟨1, ⟨x, e⟩⟩
c0 ; c1 = ⟨2, ⟨c0, c1⟩⟩
if e0 = e1 then c0 else c1 = ⟨3, ⟨e0, e1, c0, c1⟩⟩
while e0 = e1 do c0 = ⟨4, ⟨e0, e1, c0⟩⟩

Now the set equations are easily solved using the above techniques as they are in the form

Exp = Op1 (Exp, Com)


Com = Op2 (Exp, Com)

where Op1 (Exp, Com) = Num + Id + (Exp × Bop × Exp) and Op2 is defined similarly. Clearly
Op1 and Op2 are continuous as they are built up out of (composed from) the continuous disjoint
sum and product operations (prove they are continuous). Therefore we can apply the above
techniques to find a least solution Exp, Com. Note that Exp and Com are therefore the least
sets such that

1. Num ⊆ Exp and Id ⊆ Exp (using the above identifications).


2. If e0 , e1 are in Exp and bop is in Bop then e0 bop e1 is in Exp.
3. If x is in Id and e is in Exp then x := e is in Com.
4. . . .
5. . . .
6. If e0 , e1 are in Exp and c is in Com then while e0 = e1 do c is in Com.

At some points in the text environments (and similar things) were mutually recursively defined
with commands and so on. This is justified using our apparatus of continuous set operators
employing, in particular, the finite function operator.
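In a modern functional language the least solution of such equations is exactly what mutually
recursive datatype declarations denote; for instance, in Haskell (our transcription of the grammar
above):

    data Exp = Num Int                   -- m
             | Ident String              -- x
             | Bop Exp String Exp        -- e0 bop e1

    data Com = Assign String Exp         -- x := e
             | Seq Com Com               -- c0 ; c1
             | If Exp Exp Com Com        -- if e0 = e1 then c0 else c1
             | While Exp Exp Com         -- while e0 = e1 do c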

Programming Languages
T.A. Standish, Editor

Guarded Commands, Nondeterminacy and Formal Derivation of Programs

Edsger W. Dijkstra
Burroughs Corporation

So-called "guarded commands" are introduced as a building block for alternative and repetitive constructs that allow nondeterministic program components for which at least the activity evoked, but possibly even the final state, is not necessarily uniquely determined by the initial state. For the formal derivation of programs expressed in terms of these constructs, a calculus will be shown.

Key Words and Phrases: programming languages, sequencing primitives, program semantics, programming language semantics, nondeterminacy, case-construction, repetition, termination, correctness proof, derivation of programs, programming methodology

CR Categories: 4.20, 4.22

Copyright © 1975, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Author's address: Burroughs, Plataanstraat 5, Nuenen-4565, The Netherlands.

1. Introduction

In Section 2, two statements, an alternative construct and a repetitive construct, are introduced, together with an intuitive (mechanistic) definition of their semantics. The basic building block for both of them is the so-called "guarded command," a statement list prefixed by a boolean expression: only when this boolean expression is initially true, is the statement list eligible for execution. The potential nondeterminacy allows us to map otherwise (trivially) different programs on the same program text, a circumstance that seems largely responsible for the fact that programs can now be derived in a manner more systematic than before.

In Section 3, after a prelude defining the notation, a formal definition of the semantics of the two constructs is given, together with two theorems for each of the constructs (without proof).

In Section 4, it is shown how, based upon the above, a formal calculus for the derivation of programs can be founded. We would like to stress that we do not present "an algorithm" for the derivation of programs: we have used the term "a calculus" for a formal discipline--a set of rules--such that, if applied successfully: (1) it will have derived a correct program; and (2) it will tell us that we have reached such a goal. (We use the term as in "integral calculus.")

2. Two Statements Made from Guarded Commands

If the reader accepts "other statements" as indicating, say, assignment statements and procedure calls, we can give the relevant syntax in BNF [2]. In the following we have extended BNF with the convention that the braces {...} should be read as "followed by zero or more instances of the enclosed."

⟨guarded command⟩ ::= ⟨guard⟩ → ⟨guarded list⟩
⟨guard⟩ ::= ⟨boolean expression⟩
⟨guarded list⟩ ::= ⟨statement⟩ {; ⟨statement⟩}
⟨guarded command set⟩ ::= ⟨guarded command⟩ {□ ⟨guarded command⟩}
⟨alternative construct⟩ ::= if ⟨guarded command set⟩ fi
⟨repetitive construct⟩ ::= do ⟨guarded command set⟩ od
⟨statement⟩ ::= ⟨alternative construct⟩ | ⟨repetitive construct⟩ | "other statements"

The semicolons in the guarded list have the usual meaning: when the guarded list is selected for execution its statements will be executed successively in the order from left to right; a guarded list will only be

selected for execution in a state such that its guard is true. Note that a guarded command by itself is not a statement: it is a component of a guarded command set from which statements can be constructed. If the guarded command set consists of more than one guarded command, they are mutually separated by the separator □; our text is then an arbitrarily ordered enumeration of an unordered set; i.e. the order in which the guarded commands of a set appear in our text is semantically irrelevant.

Our syntax gives two ways for constructing a statement out of a guarded command set. The alternative construct is written by enclosing it by the special bracket pair if . . . fi. If in the initial state none of the guards is true, the program will abort; otherwise an arbitrary guarded list with a true guard will be selected for execution.

Note. If the empty guarded command set were allowed, if fi would be semantically equivalent to "abort". (End of note.)

An example--illustrating the nondeterminacy in a very modest fashion--would be the program that for fixed x and y assigns to m the maximum value of x and y:

if x ≥ y → m := x
□ y ≥ x → m := y
fi.

The repetitive construct is written down by enclosing a guarded command set by the special bracket pair do . . . od. Here a state in which none of the guards is true will not lead to abortion but to proper termination; the complementary rule, however, is that it will only terminate in a state in which none of the guards is true: when initially or upon completed execution of a selected guarded list one or more guards are true, a new selection for execution of a guarded list with a true guard will take place, and so on. When the repetitive construct has terminated properly, we know that all its guards are false.

Note. If the empty guarded command set were allowed, do od would be semantically equivalent to "skip". (End of note.)

An example--showing the nondeterminacy in somewhat greater glory--is the program that assigns to the variables q1, q2, q3, and q4 a permutation of the values Q1, Q2, Q3, and Q4, such that q1 ≤ q2 ≤ q3 ≤ q4. Using concurrent assignment statements for the sake of convenience, we can program

q1, q2, q3, q4 := Q1, Q2, Q3, Q4;
do q1 > q2 → q1, q2 := q2, q1
□ q2 > q3 → q2, q3 := q3, q2
□ q3 > q4 → q3, q4 := q4, q3
od.

To conclude this section, we give a program where not only the computation but also the final state is not necessarily uniquely determined. The program should determine k such that for fixed value n (n > 0) and a fixed function f(i) defined for 0 ≤ i < n, k will eventually satisfy: 0 ≤ k < n and (∀i : 0 ≤ i < n : f(k) ≥ f(i)). (Eventually k should be the place of a maximum.)

k := 0; j := 1;
do j ≠ n → if f(j) ≤ f(k) → j := j + 1
           □ f(j) ≥ f(k) → k := j; j := j + 1
           fi
od.

Only permissible final states are possible and each permissible final state is possible.

3. Formal Definition of the Semantics

3.1 Notational Prelude

In the following sections we shall use the symbols P, Q, and R to denote (predicates defining) boolean functions defined on all points of the state space; alternatively we shall refer to them as "conditions," satisfied by all states for which the boolean function is true. Two special predicates that we denote by the reserved names T and F play a special role: T denotes the condition that, by definition, is satisfied by all states; F denotes, by definition, the condition that is satisfied by no state at all.

The way in which we use predicates (as a tool for defining sets of initial or final states) for the definition of the semantics of programming language constructs has been directly inspired by Hoare [1], the main difference being that we have tightened things up a bit: while Hoare introduces sufficient pre-conditions such that the mechanisms will not produce the wrong result (but may fail to terminate), we shall introduce necessary and sufficient--i.e. so-called "weakest"--pre-conditions such that the mechanisms are guaranteed to produce the right result.

More specifically: we shall use the notation wp(S, R), where S denotes a statement list and R some condition on the state of the system, to denote the weakest pre-condition for the initial state of the system such that activation of S is guaranteed to lead to a properly terminating activity leaving the system in a final state satisfying the post-condition R. Such a wp--which is called "a predicate transformer" because it associates a pre-condition to any post-condition R--has, by definition, the following properties.

1. For any S, we have for all states: wp(S, F) = F (the so-called Law of the Excluded Miracle).
2. For any S and any two post-conditions, such that for all states P ⇒ Q, we have for all states: wp(S, P) ⇒ wp(S, Q).
3. For any S and any two post-conditions P and Q, we have for all states (wp(S, P) and wp(S, Q)) = wp(S, P and Q).
4. For any deterministic S and any post-conditions P
and Q, we have for all states (wp(S, P) or wp(S, Q)) = wp(S, P or Q).

For nondeterministic mechanisms S, the equality has to be replaced by an implication; the resulting formula follows from the second property.

Together with the rules of propositional calculus and the semantic definitions to be given below, the above four properties take over the role of the "rules of inference" as introduced by Hoare [1].

We take the position that we know the semantics of a mechanism S sufficiently well if we know its predicate transformer, i.e. can derive wp(S, R) for any post-condition R.

Note. We consider the semantics of S only defined for those initial states for which it has been established a priori that they satisfy wp(S, T), i.e. for which proper termination is guaranteed (even in the face of possibly non-deterministic behavior); for other initial states we don't care. By suitably changing S, if necessary, we can always see to it that wp(S, T) is decidable. (End of note.)

Example 1. The semantics of the empty statement, denoted by "skip", are given by the definition that for any post-condition R, we have wp("skip", R) = R.

Example 2. The semantics of the assignment statement "x := E" are given by wp("x := E", R) = R^x_E, in which R^x_E denotes a copy of the predicate defining R in which each occurrence of the variable x is replaced by (E).

Example 3. The semantics of the semicolon ";" as concatenation operator are given by wp("S1; S2", R) = wp(S1, wp(S2, R)).

3.2 The Alternative Construct

In order to define the semantics of the alternative construct we define two abbreviations.

Let IF denote

if B1 → SL1 □ . . . □ Bn → SLn fi;

let BB denote

(∃i : 1 ≤ i ≤ n : Bi);

then, by definition

wp(IF, R) = (BB and (∀i : 1 ≤ i ≤ n : Bi ⇒ wp(SLi, R))).

(The first term BB requires that the alternative construct as such will not lead to abortion on account of all guards false; the second term requires that each guarded list eligible for execution will lead to an acceptable final state.) From this definition we can derive--by simple substitutions:

THEOREM 1. From (∀i : 1 ≤ i ≤ n : (Q and Bi) ⇒ wp(SLi, R)) for all states we can conclude that (Q and BB) ⇒ wp(IF, R) holds for all states.

Let t denote some integer function, defined on the state space, and let wdec(S, t) denote the weakest pre-condition such that activation of S is guaranteed to lead to a properly terminating activity leaving the system in a final state such that the value of t is decreased by at least 1 (compared to its initial value). In terms of wdec we can formulate the very similar:

THEOREM 2. From (∀i : 1 ≤ i ≤ n : (Q and Bi) ⇒ wdec(SLi, t)) for all states we can conclude that (Q and BB) ⇒ wdec(IF, t) holds for all states.

Note (which can be skipped at first reading). The relation between wp and wdec is as follows. For any point X in state space we can regard wp(S, t ≤ t0) as an equation with t0 as the unknown. Let its smallest solution for t0 be tmin(X). (Here we have added the explicit dependence on the state X.) Then tmin(X) can be interpreted as the lowest upper bound for the final value of t if the mechanism S is activated with X as initial state. Then, by definition, wdec(S, t) = (tmin(X) ≤ t(X) − 1) = (tmin(X) < t(X)). (End of note.)

3.3 The Repetitive Construct

As is to be expected, the definition of the repetitive construct

do B1 → SL1 □ . . . □ Bn → SLn od,

that we denote by DO, is more complicated. Let

H0(R) = (R and non BB)

and for k > 0,

Hk(R) = (wp(IF, Hk−1(R)) or H0(R))

(where IF denotes the same guarded command set enclosed by "if fi"). Then, by definition

wp(DO, R) = (∃k : k ≥ 0 : Hk(R)).

(Intuitively, Hk(R) can be interpreted as the weakest pre-condition guaranteeing proper termination after at most k selections of a guarded list, leaving the system in a final state satisfying R.) Via mathematical induction we can prove:

THEOREM 3. If we have for all states (P and BB) ⇒ (wp(IF, P) and wdec(IF, t) and t ≥ 0) we can conclude that we have for all states P ⇒ wp(DO, P and non BB).

Note. The antecedent of Theorem 3 is of the form of the consequents of Theorems 1 and 2. (End of note.)

Because T is the condition by definition satisfied by all states, wp(S, T) is the weakest pre-condition guaranteeing proper termination for S. This allows us to formulate an alternative theorem about the repetitive construct, viz.:

THEOREM 4. From (P and BB) ⇒ wp(IF, P) for all states, we can conclude that we have for all states (P and wp(DO, T)) ⇒ wp(DO, P and non BB).

Note. In connection with the above theorems, P is called "the invariant relation" and t is called "the variant function." Theorems 3 and 4 are easily proved by mathematical induction, with k as the induction variable. (End of note.)

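Before following the derivations of Section 4, it may help to see the definitions of Section 3 made executable. The Haskell sketch below is our own, not the paper's: predicates are boolean functions on states, and the unbounded ∃k in wp(DO, R) is approximated by searching k up to an arbitrary bound (here 100), so it is only a finite approximation of the definition.

    import qualified Data.Map as M

    type State = M.Map String Int
    type Pred  = State -> Bool

    data Stmt = Skip
              | Assign String (State -> Int)
              | Seq Stmt Stmt
              | Alt [(Pred, Stmt)]                   -- if ... fi
              | Rep [(Pred, Stmt)]                   -- do ... od

    wp :: Stmt -> Pred -> Pred
    wp Skip         r = r
    wp (Assign x e) r = \s -> r (M.insert x (e s) s)   -- R with x replaced by E
    wp (Seq s1 s2)  r = wp s1 (wp s2 r)
    wp (Alt gs)     r = \s -> any (\(b, _) -> b s) gs  -- BB, and every open branch works
                           && all (\(b, t) -> not (b s) || wp t r s) gs
    wp (Rep gs)     r = \s -> any ($ s) (take 100 hs)  -- ∃k ≤ 100 : Hk(R)
      where h0 s = r s && not (any (\(b, _) -> b s) gs)    -- H0(R) = R and non BB
            hs   = iterate (\h s -> wp (Alt gs) h s || h0 s) h0

    -- The maximum program of Section 2:
    maxProg :: Stmt
    maxProg = Alt [ (\s -> s M.! "x" >= s M.! "y", Assign "m" (\s -> s M.! "x"))
                  , (\s -> s M.! "y" >= s M.! "x", Assign "m" (\s -> s M.! "y")) ]

On any state binding x and y, wp maxProg (\s -> s M.! "m" == max (s M.! "x") (s M.! "y")) evaluates to True, matching the calculation carried out in the derivation that follows.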
4. Formal Derivation of Programs

The formal requirement of our program performing m := max(x, y)--see above--is that for fixed x and y it establishes the relation

R: (m = x or m = y) and m ≥ x and m ≥ y.

Now the Axiom of Assignment tells us that "m := x" is the standard way of establishing the truth of m = x for fixed x, which is a way of establishing the truth of the first term of R. Will "m := x" do the job? In order to investigate this, we derive and simplify:

wp("m := x", R) = (x = x or x = y) and x ≥ x and x ≥ y
                = x ≥ y.

Taking this weakest pre-condition as its guard, Theorem 1 tells us that

if x ≥ y → m := x fi

will produce the correct result if it terminates successfully. The disadvantage of this program is that BB ≠ T; i.e. it might lead to abortion; weakening BB means looking for alternatives which might introduce new guards. The obvious alternative is the assignment "m := y" with the guard wp("m := y", R) = y ≥ x; thus we are led to our program

if x ≥ y → m := x
□ y ≥ x → m := y
fi

and by this time BB = T, and therefore we have solved the problem. (In the meantime we have proved that the maximum of two values is always defined, viz. that R considered as equation for m has always a solution.)

As an example of the derivation of a repetitive construct we shall derive a program for the greatest common divisor of two positive numbers; i.e. for fixed, positive X and Y we have to establish the final relation

x = gcd(X, Y).

The formal machinery only gets in motion, once we have chosen our invariant relation and our variant function. The program then gets the structure

"establish the relation P to be kept invariant";
do "decrease t as long as possible under invariance of P" od.

Suppose that we choose for the invariant relation

P: gcd(X, Y) = gcd(x, y) and x > 0 and y > 0,

a relation that has the advantage of being easily established by x := X; y := Y.

The most general "something" to be done under invariance of P is of the form x, y := E1, E2, and we are interested in a guard B such that

(P and B) ⇒ wp("x, y := E1, E2", P)
          = (gcd(X, Y) = gcd(E1, E2) and E1 > 0 and E2 > 0).

Because the guard must be a computable boolean expression and should not contain the computation of gcd(X, Y)--for that was the whole problem--we must see to it that the expressions E1 and E2 are so chosen, that the first term gcd(X, Y) = gcd(E1, E2) is implied by P, which is true if gcd(x, y) = gcd(E1, E2). In other words we are invited to massage the value pair (x, y) in such a fashion that their gcd is not changed. Because--and this is the place at which to mobilize our mathematical knowledge about the gcd-function--gcd(x, y) = gcd(x − y, y), a possible guarded list would be x := x − y. Deriving wp("x := x − y", P) = (gcd(X, Y) = gcd(x − y, y) and x − y > 0 and y > 0) and omitting all terms of the conjunction implied by P, we find the guard x > y as far as the invariance of P is concerned. Besides that we must require guaranteed decrease of the variant function t. Let us investigate the consequences of the choice t = x + y. From

wp("x := x − y", t ≤ t0) = wp("x := x − y", x + y ≤ t0) = (x ≤ t0),

we conclude that tmin = x; therefore wdec("x := x − y", t) = (x < x + y) = (y > 0).

The requirement of monotonic decrease of t imposes no further restriction of the guard because wdec("x := x − y", t) is fully implied by P, and at our first effort we come to

x := X; y := Y;
do x > y → x := x − y od.

Alas, this single guard is insufficient: from P and non BB we are not allowed to conclude x = gcd(X, Y). In a completely analogous manner, the alternative y := y − x will require as its guard y > x, and our next effort is

x := X; y := Y;
do x > y → x := x − y
□ y > x → y := y − x
od.

Now the job is done, because with this last program non BB = (x = y) and (P and x = y) ⇒ (x = gcd(X, Y)), because gcd(x, x) = x.

Note. The choice of t = x + 2y and the knowledge of the fact that the gcd is a symmetric function could have led to the program

x := X; y := Y;
do x > y → x := x − y
□ y > x → x, y := y, x
od.

The swap x, y := y, x can never destroy P: the guard of the last guarded list is fully caused by the requirement that t is effectively decreased. (End of note.)

In both cases the final game has been to find a large enough set of such guarded lists that BB, the disjunction of their guards, was sufficiently weak: in the case

of the alternative construct the purpose is avoiding abortion, in the case of the repetitive construct the goal is getting BB weak enough such that P and non BB is strong enough to imply the desired post-condition R.

It is illuminating to compare our first version of Euclid's Algorithm with what we would have written down with the traditional clauses:

x := X; y := Y;                          (version A)
while x ≠ y do if x > y then x := x − y
               else y := y − x fi od

and

x := X; y := Y;                          (version B)
while x ≠ y do while x > y do x := x − y od;
               while y > x do y := y − x od
od.

In the fully symmetric version with the guarded commands the algorithm has been reduced to its bare essentials, while the traditional clauses force us to choose between versions A and B (and others), a choice that can only be justified by making assumptions about the time taken for tests and about expectation values for traversal frequencies. (But even taking the time taken for tests into account, it is not clear that we have lost: the average number of necessary tests per assignment ranges with guarded commands from 1 to 2, equals 2 for version A and ranges from 1 to 2.5 for version B. If the guards of a guarded command set are evaluated concurrently--nothing in our semantics excludes that--the new version is time-wise superior to all the others.) The virtues of the case-construction have been extended to repetition as well.

5. Concluding Remarks

The research, the outcome of which is reported in this article, was triggered by the observation that Euclid's Algorithm could also be regarded as synchronizing the two cyclic processes "do x := x − y od" and "do y := y − x od" in such a way that the relation x > 0 and y > 0 would be kept invariantly true. It was only after this observation that we saw that the formal techniques we had already developed for the derivation of the synchronizing conditions that ensure the harmonious cooperation of (cyclic) sequential processes, such as can be identified in the total activity of operating systems, could be transferred lock, stock, and barrel to the development of sequential programs as shown in this article. The main difference is that while for sequential programs the situation "all guards false" is a desirable goal--for it means termination of a repetitive construct--one tries to avoid it in operating systems--for there it means deadlock.

The second reason to pursue these investigations was my personal desire to get a better appreciation of which part of the programming activity can be regarded as a formal routine and which part of it seems to require "invention." While the design of an alternative construct now seems to be a reasonably straightforward activity, that of a repetitive construct requires what I regard as "the invention" of an invariant relation and a variant function. My presentation of this calculus should, however, not be interpreted as my suggestion that all programs should be developed in this way: it just gives us another handle.

The calculus does, however, explain my preference for the axiomatic definition of programming language semantics via predicate transformers above other definition techniques: the definition via predicate transformers seems to lend itself most readily to being forged into a tool for the goal-directed activity of program composition.

Finally, I would like to add a word or two about the potential nondeterminacy. Having worked mainly with hardly self-checking hardware, with which nonreproducing behavior of user programs is a very strong indication of a machine malfunctioning, I had to overcome a considerable mental resistance before I found myself willing to consider nondeterministic programs seriously. It is, however, fair to say that I could never have discovered the calculus before having taken that hurdle: the simplicity and elegance of the above would have been destroyed by requiring the derivation of deterministic programs only. Whether nondeterminacy is eventually removed mechanically--in order not to mislead the maintenance engineer--or (perhaps only partly) by the programmer himself because, at second thought, he does care--e.g. for reasons of efficiency--which alternative is chosen is something I leave entirely to the circumstances. In any case we can appreciate the nondeterministic program as a helpful stepping stone.

Acknowledgments. In the first place my acknowledgments are due to the members of the IFIP Working Group W.G.2.3 on "Programming Methodology." Besides them, W.H.J. Feijen, D.E. Knuth, M. Rem, and C.S. Scholten have been directly helpful in one way or another. I should also thank the various audiences--in Albuquerque (courtesy NSF), in San Diego and Luxembourg (courtesy Burroughs Corporation)--that have played their role of critical sounding board beyond what one is entitled to hope.

Received July 1974; revised January 1975

References
1. Hoare, C.A.R. An axiomatic basis for computer programming. Comm. ACM 12, 10 (Oct. 1969), 576-583.
2. Naur, Peter (Ed.). Report on the algorithmic language ALGOL 60. Comm. ACM 3, 5 (May 1960), 299-314.

457 Communications August 1975


of Volume 18
the ACM Number 8
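As an editorial aside, not part of Dijkstra's paper: the guarded-command
gcd programs above can be simulated directly by collecting the enabled
guards and choosing one arbitrarily at each step. Below is a minimal
sketch in Haskell, assuming the standard random package; in this
particular program at most one guard is enabled at a time, but the same
scheme covers genuinely nondeterministic guard sets, and termination with
x = gcd(X, Y) is independent of the choices made.

    import System.Random (randomRIO)

    -- Simulates  do x > y -> x := x - y  □  y > x -> x, y := y, x  od
    -- for X, Y > 0.  "All guards false" means x = y, and then
    -- x = gcd(X, Y), because gcd(x, x) = x.
    euclid :: Integer -> Integer -> IO Integer
    euclid x y
      | null enabled = return x
      | otherwise    = do
          i <- randomRIO (0, length enabled - 1)  -- pick any enabled guard
          uncurry euclid (enabled !! i)
      where
        enabled = [(x - y, y) | x > y] ++ [(y, x) | y > x]

For example, euclid 111 259 evaluates to 37 however the choices fall.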
Per Martin-Löf

ON THE MEANINGS OF THE LOGICAL


CONSTANTS AND THE JUSTIFICATIONS
OF THE LOGICAL LAWS

Preface

The following three lectures were given in the form of a short course
at the meeting Teoria della Dimostrazione e Filosofia della Logica, or-
ganized in Siena, 6–9 April 1983, by the Scuola di Specializzazione in
Logica Matematica of the Università degli Studi di Siena. I am very
grateful to Giovanni Sambin and Aldo Ursini of that school, not only
for recording the lectures on tape, but, above all, for transcribing the
tapes produced by the recorder: no machine could have done that work.
This written version of the lectures is based on their transcription. The
changes that I have been forced to make have mostly been of a stylistic
nature, except at one point. In the second lecture, as I actually gave
it, the order of conceptual priority between the notions of proof and
immediate inference was wrong. Since I discovered my mistake later
the same month as the meeting was held, I thought it better to let
the written text diverge from the oral presentation rather than possi-
bly confusing others by letting the mistake remain. The oral origin of
these lectures is the source of the many redundancies of the written
text. It is also my sole excuse for the lack of detailed references.

First lecture

When I was asked to give these lectures about a year ago, I sug-
gested the title On the Meanings of the Logical Constants and the
Justifications of the Logical Laws. So that is what I shall talk about,
eventually, but, first of all, I shall have to say something about, on
the one hand, the things that the logical operations operate on, which
we normally call propositions and propositional functions, and, on the
other hand, the things that the logical laws, by which I mean the rules
of inference, operate on, which we normally call assertions. We must
remember that, even if a logical inference, for instance, a conjunction
introduction, is written
A   B
A & B
which is the way in which we would normally write it, it does not take
us from the propositions A and B to the proposition A & B. Rather, it
takes us from the affirmation of A and the affirmation of B to the affir-
mation of A & B, which we may make explicit, using Frege’s notation,
by writing it
⊢ A   ⊢ B
⊢ A & B
instead. It is always made explicit in this way by Frege in his writings,
and in Principia, for instance. Thus we have two kinds of entities here:
we have the entities that the logical operations operate on, which we
call propositions, and we have those that we prove and that appear
as premises and conclusion of a logical inference, which we call asser-
tions. It turns out that, in order to clarify the meanings of the logical
constants and justify the logical laws, a considerable portion of the
philosophical work lies already in clarifying the notion of proposition
and the notion of assertion. Accordingly, a large part of my lectures
will be taken up by a philosophical analysis of these two notions.
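The distinction can be made tangible in a modern proof assistant. What
follows is a minimal sketch in Lean, an editorial illustration rather than
part of the lecture: a proposition is rendered as an object of type Prop,
and the assertion ⊢ A as possession of a proof term of A, so that
conjunction introduction visibly takes us from two affirmations to a
third, while the connective itself combines the propositions.

    -- A and B are propositions; the hypotheses ha and hb play the role of
    -- the affirmations ⊢ A and ⊢ B; the conclusion is the affirmation ⊢ A ∧ B.
    example (A B : Prop) (ha : A) (hb : B) : A ∧ B := And.intro ha hb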
Let us first look at the term proposition. It has its origin in the Gr.
πρότασις, used by Aristotle in the Prior Analytics, the third part of the
Organon. It was translated, apparently by Cicero, into Lat. propositio,
which has its modern counterparts in It. proposizione, Eng. proposi-
tion and Ger. Satz. In the old, traditional use of the word proposition,
propositions are the things that we prove. We talk about proposition
and proof, of course, in mathematics: we put up a proposition and let
it be followed by its proof. In particular, the premises and conclusion
of an inference were propositions in this old terminology. It was the
standard use of the word up to the last century. And it is this use
which is retained in mathematics, where a theorem is sometimes called
a proposition, sometimes a theorem. Thus we have two words for the
things that we prove, proposition and theorem. The word proposition,
Gr. πρότασις, comes from Aristotle and has dominated the logical tra-
dition, whereas the word theorem, Gr. θεώρημα, is in Euclid, I believe,
and has dominated the mathematical tradition.
With Kant, something important happened, namely, that the
term judgement, Ger. Urteil, came to be used instead of proposition.
Perhaps one reason is that proposition, or a word with that stem, at
least, simply does not exist in German: the corresponding German
word would be Lehrsatz, or simply Satz. Be that as it may, what hap-
pened with Kant and the ensuing German philosophical tradition was
that the word judgement came to replace the word proposition. Thus,
in that tradition, a proof, Ger. Beweis, is always a proof of a judge-
ment. In particular, the premises and conclusion of a logical inference
are always called judgements. And it was the judgements, or the cat-
egorical judgements, rather, which were divided into affirmations and
denials, whereas earlier it was the propositions which were so divided.
The term judgement also has a long history. It is the Gr. κρίσις,
translated into Lat. judicium, It. giudizio, Eng. judgement, and Ger.
Urteil. Now, since it has as long a history as the word proposition,
these two were also previously used in parallel. The traditional way of
relating the notions of judgement and proposition was by saying that a
proposition is the verbal expression of a judgement. This is, as far as I
know, how the notions of proposition and judgement were related dur-
ing the scholastic period, and it is something which is repeated in the
Port Royal Logic, for instance. You still find it repeated by Brentano
in this century. Now, this means that, when, in German philosophy
beginning with Kant, what was previously called a proposition came
to be called a judgement, the term judgement acquired a double mean-
ing. It came to be used, on the one hand, for the act of judging, just
as before, and, on the other hand, it came to be used instead of the old
proposition. Of course, when you say that a proposition is the verbal
expression of a judgement, you mean by judgement the act of judging,
the mental act of judging in scholastic terms, and the proposition is the
verbal expression by means of which you make the mental judgement
public, so to say. That is, I think, how one thought about it. Thus,
with Kant, the term judgement became ambiguous between the act of
judging and that which is judged, or the judgement made, if you prefer.
German has here the excellent expression gefälltes Urteil, which has no
good counterpart in English.

                   |---------------- judgement ----------------|
                   the act of judging     that which is judged
    old tradition  judgement              proposition
    Kant           Urteil(sakt)           (gefälltes) Urteil

This ambiguity is not harmful, and sometimes it is even convenient,
because, after all, it is a kind of ambiguity that the word judgement
shares with other nouns of action. If you take the word proposition,
for instance, it is just as ambiguous between the act of propounding
and that which is propounded. Or, if you take the word affirmation, it
is ambiguous between the act of affirming and that which is affirmed,
and so on.
It should be clear, from what I said in the beginning, that there is
a difference between what we now call a proposition and a proposition
in the old sense. In order to trace the emergence of the modern notion
of proposition, I first have to consider the division of propositions in
the old sense into affirmations and denials. Thus the propositions, or
the categorical propositions, rather, were divided into affirmations and
denials.
    |------ (categorical) proposition ------|
    affirmation                       denial
And not only were the categorical propositions so divided: the very
definition of a categorical proposition was that a categorical proposition
is an affirmation or a denial. Correlatively, to judge was traditionally,
by which I mean before Kant, defined as to combine or separate ideas
in the mind, that is, to affirm or deny. Those were the traditional
definitions of the notions of proposition and judgement.
The notions of affirmation and denial have fortunately remained
stable, like the notion of proof, and are therefore easy to use without
ambiguity. Both derive from Aristotle. Affirmation is Gr. κατάφασις,
Lat. affirmatio, It. affermazione, and Ger. Bejahung, whereas denial
is Gr. ἀπόφασις, Lat. negatio, It. negazione, and Ger. Verneinung. In
Aristotelian logic, an affirmation was defined as a proposition in which
something, called the predicate, is affirmed of something else, called
the subject, and a denial was defined as a proposition in which the
predicate is denied of the subject. Now, this is something that we have
certainly abandoned in modern logic. Neither do we take categorical
judgements to have subject-predicate form, nor do we treat affirmation
and denial symmetrically. It seems to have been Bolzano who took the
crucial step of replacing the Aristotelian forms of judgement by the
single form

A is, A is true, or A holds.

In this, he was followed by Brentano, who also introduced the opposite


form

A is not, or A is false,
and Frege. And, through Frege’s influence, the whole of modern logic
has come to be based on the single form of judgement, or assertion, A
is true.
Once this step was taken, the question arose, What sort of thing
is it that is affirmed in an affirmation and denied in a denial? that is,
What sort of thing is the A here? The isolation of this concept belongs
to the, if I may so call it, objectivistically oriented branch of German
philosophy in the last century. By that, I mean the tradition which you
may delimit by mentioning the names of, say, Bolzano, Lotze, Frege,
Brentano, and the Brentano disciples Stumpf, Meinong, and Husserl,
although, with Husserl, I think one should say that the split between
the objectivistic and the Kantian branches of German philosophy is
finally overcome. The isolation of this concept was a step which was
entirely necessary for the development of modern logic. Modern logic
simply would not work unless we had this concept, because it is on the
things that fall under it that the logical operations operate.
This new concept, which simply did not exist before the last cen-
tury, was variously called. And, since it was something that one had not
met before, one had difficulties with what one should call it. Among
the terms that were used, I think the least committing one is Ger.
Urteilsinhalt, content of a judgement, by which I mean that which is
affirmed in an affirmation and denied in a denial. Bolzano, who was
the first to introduce this concept, called it proposition in itself, Ger.
Satz an sich. Frege also grappled with this terminological problem.
In Begriffsschrift, he called it judgeable content, Ger. beurteilbarer In-
halt. Later on, corresponding to his threefold division into expres-
sion, sense, and reference, in the case of this kind of entity, what was
the expression, he called sentence, Ger. Satz, what was the sense, he
called thought, Ger. Gedanke, and what was the reference, he called
truth value, Ger. Wahrheitswert. So the question arises, What should
I choose here? Should I choose sentence, thought, or truth value? The
closest possible correspondence is achieved, I think, if I choose Gedanke,
that is, thought, for late Frege. This is confirmed by the fact that, in
his very late logical investigations, he called the logical operations the
Gedankengefüge. Thus judgeable content is early Frege and thought
is late Frege. We also have the term state of affairs, Ger. Sachverhalt,
which was introduced by Stumpf and used by Wittgenstein in the Trac-
tatus. And, finally, we have the term objective, Ger. Objektiv, which
was the term used by Meinong. Maybe there were other terms as well
in circulation, but these are the ones that come immediately to my
mind.
Now, Russell used the term proposition for this new notion, which
has become the standard term in Anglo-Saxon philosophy and in mod-
ern logic. And, since he decided to use the word proposition in this
new sense, he had to use another word for the things that we prove and
that figure as premises and conclusion of a logical inference. His choice
was to translate Frege’s Urteil, not by judgement, as one would expect,
but by assertion. And why, one may ask, did he choose the word as-
sertion rather than translate Urteil literally by judgement? I think it
was to avoid any association with Kantian philosophy, because Urteil
was after all the central notion of logic as it was done in the Kantian
tradition. For instance, in his transcendental logic, which forms part
of the Kritik der reinen Vernunft, Kant arrives at his categories by
analysing the various forms that a judgement may have. That was his
clue to the discovery of all pure concepts of reason, as he called it.
Thus, in Russell’s hands, Frege’s Urteil came to be called assertion,
and the combination of Frege’s Urteilsstrich, judgement stroke, and
Inhaltsstrich, content stroke, came to be called the assertion sign.
Observe now where we have arrived through this development,
namely, at a notion of proposition which is entirely different, or dif-
ferent, at least, from the old one, that is, from the Gr. πρότασις and
the Lat. propositio. To repeat, the things that we prove, in particu-
lar, the premises and conclusion of a logical inference, are no longer
propositions in Russell’s terminology, but assertions. Conversely, the
things that we combine by means of the logical operations, the con-
nectives and the quantifiers, are not propositions in the old sense, that
is, what Russell calls assertions, but what he calls propositions. And,
as I said in the very beginning, the rule of conjunction introduction,
for instance, really allows us to affirm A & B, having affirmed A and
having affirmed B,
⊢ A   ⊢ B
⊢ A & B
It is another matter, of course, that we may adopt conventions that
allow us to suppress the assertion sign, if it becomes too tedious to
write it out. Conceptually, it would nevertheless be there, whether I
write it as above or
A true B true
A & B true
as I think that I shall do in the following.
So far, I have made no attempt at defining the notions of judge-
ment, or assertion, and proposition. I have merely wanted to give a
preliminary hint at the difference between the two by showing how the
terminology has evolved.
To motivate my next step, consider any of the usual inference rules
of the propositional or predicate calculus. Let me take the rule of
disjunction introduction this time, for some change,
A
A ∨ B

or, writing out the affirmation,


A true
A ∨ B true

Now, what do the variables A and B range over in a rule like this?
That is, what are you allowed to insert into the places indicated by
these variables? The standard answer to this question, by someone
who has received the now current logical education, would be to say
that A and B range over arbitrary formulas of the language that you are
considering. Thus, if the language is first order arithmetic, say, then
A and B should be arithmetical formulas. When you start thinking
about this answer, you will see that there is something strange about
it, namely, its language dependence. Because it is clearly irrelevant for
the validity of the rule whether A and B are arithmetical formulas, cor-
responding to the language of first order arithmetic, or whether they
contain, say, predicates defined by transfinite, or generalized, induc-
tion. The unary predicate expressing that a natural number encodes
an ordinal of the constructive second number class, for instance, is cer-
tainly not expressible in first order arithmetic, and there is no reason
at all why A and B should not be allowed to contain that predicate.
Or, surely, for the validity of the rule, A and B might just as well
be set theoretical formulas, supposing that we have given such a clear
sense to them that we clearly recognize that they express propositions.
Thus what is important for the validity of the rule is merely that A and
B are propositions, that is, that the expressions which we insert into
the places indicated by the variables A and B express propositions. It
seems, then, that the deficiency of the first answer, by which I mean
the answer that A and B should range over formulas, is eliminated
by saying that the variables A and B should range over propositions
instead of formulas. And this is entirely natural, because, after all, the
notion of formula, as given by the usual inductive definition, is nothing
but the formalistic substitute for the notion of proposition: when you
divest a proposition in some language of all sense, what remains is the
mere formula. But then, supposing we agree that the natural way out
of the first difficulty is to say that A and B should range over arbitrary
propositions, another difficulty arises, because, whereas the notion of
formula is a syntactic notion, a formula being defined as an expression
that can be formed by means of certain formation rules, the notion
of proposition is a semantic notion, which means that the rule is no
longer completely formal in the strict sense of formal logic. That a rule
of inference is completely formal means precisely that there must be
no semantic conditions involved in the rule: it may only put conditions
on the forms of the premises and conclusion. The only way out of this
second difficulty seems to be to say that, really, the rule has not one
but three premises, so that, if we were to write them all out, it would
read
A prop   B prop   A true
A ∨ B true
that is, from A and B being propositions and from the truth of A, we
are allowed to conclude the truth of A ∨ B. Here I am using

A prop

as an abbreviated way of saying that


A is a proposition.
Now the complete formality of the rule has been restored. Indeed, for
the variables A and B, as they occur in this rule, we may substitute
anything we want, and, by anything, I mean any expressions. Or, to
be more precise, if we categorize the expressions, as Frege did, into
complete, or saturated, expressions and incomplete, unsaturated, or
functional, expressions, then we should say that we may substitute for
A and B any complete expressions we want, because propositions are
always expressed by complete expressions, not by functional expres-
sions. Thus A and B now range over arbitrary complete expressions.
Of course, there would be needed here an analysis of what is under-
stood by an expression, but that is something which I shall not go into
in these lectures, in the belief that it is a comparatively trivial matter,
as compared with explaining the notions of proposition and judgement.
An expression in the most general sense of the word is nothing but a
form, that is, something that we can passively recognize as the same in
its manifold occurrences and actively reproduce in many copies. But
I think that I shall have to rely here upon an agreement that we have
such a general notion of expression, which is formal in character, so
that the rule can now count as a formal rule.
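A dependently typed proof assistant makes this three-premise reading
literal. Here is a minimal sketch in Lean, again an editorial illustration
rather than part of the lecture: the formation premises A prop and B prop
appear as the explicit hypotheses A B : Prop, and only then can the truth
premise and the conclusion be stated.

    -- Premises: A prop and B prop (the hypotheses A B : Prop) together
    -- with A true (the proof term ha); conclusion: A ∨ B true.
    example (A B : Prop) (ha : A) : A ∨ B := Or.inl ha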
Now, if we stick to our previous decision to call what we prove, in
particular, the premises and conclusion of a logical inference, by the
word judgement, or assertion, the outcome of the preceding considera-
tions is that we are faced with a new form of judgement. After all, A
prop and B prop have now become premises of the rule of disjunction
introduction. Hence, if premises are always judgements,
A is a proposition
must count as a form of judgement. This immediately implies that the
traditional definition of the act of judging as an affirming or denying
and of the judgement, that is, the proposition in the terminology then
used, as an affirmation or denial has to be rejected, because A prop is
certainly neither an affirmation nor a denial. Or, rather, we are faced
with the choice of either keeping the old definition of judgement as
an affirmation or a denial, in which case we would have to invent a
new term for the things that we prove and that figure as premises and
conclusion of a logical inference, or else abandoning the old definition
of judgement, widening it so as to make room for A is a proposition
as a new form of judgement. I have chosen the latter alternative, well
aware that, in so doing, I am using the word judgement in a new way.
Having rejected the traditional definition of a judgement as an af-
firmation or a denial, by what should we replace it? How should we
now delimit the notion of judgement, so that A is a proposition, A
is true, and A is false all become judgements? And there are other
forms of judgement as well, which we shall meet in due course. Now,
the question, What is a judgement? is no small question, because the
notion of judgement is just about the first of all the notions of logic,
the one that has to be explained before all the others, before even the
notions of proposition and truth, for instance. There is therefore an
intimate relation between the answer to the question what a judgement
is and the very question what logic itself is. I shall start by giving a
very simple answer, which is essentially right: after some elaboration,
at least, I hope that we shall have a sufficiently clear understanding of
it. And the definition would simply be that, when understood as an act
of judging, a judgement is nothing but an act of knowing, and, when
understood as that which is judged, it is a piece or, more solemnly, an
object of knowledge.
    |---------------- judgement ----------------|
    the act of judging     that which is judged
    the act of knowing     the object of knowledge
Thus, first of all, we have the ambiguity of the term judgement between
the act of judging and that which is judged. What I say is that an act
of judging is essentially nothing but an act of knowing, so that to judge
is the same as to know, and that what is judged is a piece, or an object,
of knowledge. Unfortunately, the English language has no counterpart
of Ger. eine Erkenntnis, a knowledge.
This new definition of the notion of judgement, so central to logic,
should be attributed in the first place to Kant, I think, although it may
be difficult to find him ever explicitly saying that the act of judging is
the same as the act of knowing, and that what is judged is the object of
knowledge. Nevertheless, it is behind all of Kant’s analysis of the notion
of judgement that to judge amounts to the same as to know. It was
he who broke with the traditional, Aristotelian definition of judgement
as an affirmation or a denial. Explicitly, the notions of judgement
and knowledge were related by Bolzano, who simply defined knowledge
as evident judgement. Thus, for him, the order of priority was the
reverse: knowledge was defined in terms of judgement rather than the
other way round. The important thing to realize is of course that to
judge and to know, and, correlatively, judgement and knowledge, are
essentially the same. And, when the relation between judgement, or
assertion, if you prefer, and knowledge is understood in this way, logic
itself is naturally understood as the theory of knowledge, that is, of
demonstrative knowledge, Aristotle's ἐπιστήμη ἀποδεικτική. Thus logic
studies, from an objective point of view, our pieces of knowledge as
they are organized in demonstrative science, or, if you think about it
from the act point of view, it studies our acts of judging, or knowing,
and how they are interrelated.
As I said a moment ago, this is only a first approximation, because
it would actually have been better if I had not said that an act of
judging is an act of knowing, but if I had said that it is an act of, and
here there are many words that you may use, either understanding,
or comprehending, or grasping, or seeing, in the metaphorical sense
of the word see in which it is synonymous with understand. I would
prefer this formulation, because the relation between the verb to know
and the verb to understand, comprehend, grasp, or see, is given by the
equation
to know = to have understood, comprehended, grasped, seen,
which has the converse
to understand, comprehend, grasp, see = to get to know.
The reason why the first answer needs elaboration is that you may
use know in English both in the sense of having understood and in
the sense of getting to understand. Now, the first of the preceding
two equations brings to expression something which is deeply rooted
in the Indo-European languages. For instance, Gr. οἶδα, I know, is the
perfect form of the verb whose present form is Gr. εἴδω, I see. Thus
to know is to have seen merely by the way the verb has been formed
in Greek. It is entirely similar in Latin. You have Lat. nosco, I get to
know, which has present form, and Lat. novi, I know, which has perfect
form. So, in these and other languages, the verb to know has present
sense but perfect form. And the reason for the perfect form is that to
know is to have seen. Observe also the two metaphors for the act of
understanding which you seem to have in one form or the other in all
European languages: the metaphor of seeing, first of all, which was so
much used by the Greeks, and which we still use, for instance, when
saying that we see that there are infinitely many prime numbers, and,
secondly, the metaphor of grasping, which you also find in the verb to
comprehend, derived as it is from Lat. prehendere, to seize. The same
metaphor is found in Ger. fassen and begreifen, and I am sure that you
also have it in Italian. (Chorus. Afferrare!) Of course, these are two
metaphors that we use for this particular act of the mind: the mental
act of understanding is certainly as different from the perceptual act
of seeing something as from the physical act of grasping something.
Is a judgement a judgement already before it is grasped, that is,
becomes known, or does it become a judgement only through our act
of judging? And, in the latter case, what should we call a judgement
before it has been judged, that is, has become known? For example, if
you let G be the proposition that every even number is the sum of two
prime numbers, and then look at
G is true,
is it a judgement, or is it not a judgement? Clearly, in one sense, it
is, and, in another sense, it is not. It is not a judgement in the sense
that it is not known, that is, that it has not been proved, or grasped.
But, in another sense, it is a judgement, namely, in the sense that G
is true makes perfectly good sense, because G is a proposition which
we all understand, and, presumably, we understand what it means
for a proposition to be true. The distinction I am hinting at is the
distinction which was traditionally made between an enunciation and a
proposition. Enunciation is not a word of much currency in English, but
I think that its Italian counterpart has fared better. The origin is the
Gr. ἀπόφανσις as it appears in De Interpretatione, the second part of
the Organon. It has been translated into Lat. enuntiatio, It. enunciato,
and Ger. Aussage. An enunciation is what a proposition, in the old
sense of the word, is before it has been proved, or become known. Thus
it is a proposition stripped of its epistemic force. For example, in this
traditional terminology, which would be fine if it were still living, G is
true is a perfectly good enunciation, but it is not a proposition, not yet
at least. But now that we have lost the term proposition in its old sense,
having decided to use it in the sense in which it started to be used by
Russell and is now used in Anglo-Saxon philosophy and modern logic,
I think we must admit that we have also lost the traditional distinction
between an enunciation and a proposition. Of course, we still have
the option of keeping the term enunciation, but it is no longer natural.
Instead, since I have decided to replace the term proposition in its old
sense, as that which we prove and which may appear as premise or
conclusion of a logical inference, by the term judgement, as it has been
used in German philosophy from Kant and onwards, it seems better,
when there is a need of making the distinction between an enunciation
and a proposition, that is, between a judgement before and after it has
been proved, or become known, to speak of a judgement and an evident
judgement, respectively. This is a well-established usage in the writings
of Bolzano, Brentano, and Husserl, that is, within the objectivistically
oriented branch of German philosophy that I mentioned earlier. If we
adopt this terminology, then we are faced with a fourfold table, which
I shall end by writing up.

    judgement            proposition
    evident judgement    true proposition

Thus, correlated with the distinction between judgement and propo-
sition, there is the distinction between evidence of a judgement and
truth of a proposition.
So far, I have said very little about the notions of proposition and
truth. The essence of what I have said is merely that to judge is the
same as to know, so that an evident judgement is the same as a piece,
or an object, of knowledge, in agreement with Bolzano’s definition of
knowledge as evident judgement. Tomorrow’s lecture will have to be
taken up by an attempt to clarify the notion of evidence and the notions
of proposition and truth.

Second lecture

Under what condition is it right, or correct, to make a judgement,
one of the form

A is true,
which is certainly the most basic form of judgement, for instance?
When one is faced with this question for the first time, it is tempt-
ing to answer simply that it is right to say that A is true provided that
A is true, and that it is wrong to say that A is true provided that A is
not true, that is, provided that A is false. In fact, this is what Aristotle
says in his definition of truth in the Metaphysics. For instance, he says
that it is not because you rightly say that you are white that you are
white, but because you are white that what you say is correct. But a
moment’s reflection shows that this first answer is simply wrong. Even
if every even number is the sum of two prime numbers, it is wrong of
me to say that unless I know it, that is, unless I have proved it. And it
would have been wrong of me to say that every map can be coloured
by four colours before the recent proof was given, that is, before I ac-
quired that knowledge, either by understanding the proof myself, or
by trusting its discoverers. So the condition for it to be right of me to
affirm a proposition A, that is, to say that A is true, is not that A is
true, but that I know that A is true. This is a point which has been
made by Dummett and, before him, by Brentano, who introduced the
apt term blind judgement for a judgement which is made by someone
who does not know what he is saying, although what he says is correct
in the weaker sense that someone else knows it, or, perhaps, that he
himself gets to know it at some later time. When you are forced into
answering a yes or no question, although you do not know the answer,
and happen to give the right answer, right as seen by someone else, or
by you yourself when you go home and look it up, then you make a
blind judgement. Thus you err, although the teacher does not discover
your error. Not to speak of the fact that the teacher erred more greatly
by not giving you the option of giving the only answer which would
have been honest, namely, that you did not know.
The preceding consideration does not depend on the particular form
of judgement, in this case, A is true, that I happened to use as an
example. Quite generally, the condition for it to be right of you to
make a judgement is that you know it, or, what amounts to the same,
that it is evident to you. The notion of evidence is related to the notion
of knowledge by the equation
evident = known.
When you say that a judgement is evident, you merely express that
you have understood, comprehended, grasped, or seen it, that is, that
you know it, because to have understood is to know. This is reflected
in the etymology of the word evident, which comes from Lat. ex, out
of, from, and videre, to see, in the metaphorical sense, of course.
There is absolutely no question of a judgement being evident in
itself, independently of us and our cognitive activity. That would be
just as absurd as to speak of a judgement as being known, not by
somebody, you or me, but in itself. To be evident is to be evident to
somebody, as inevitably as to be known is to be known by somebody.
That is what Brouwer meant by saying, in Consciousness, Philosophy,
and Mathematics, that there are no nonexperienced truths, a basic
intuitionistic tenet. This has been puzzling, because it has been under-
stood as referring to the truth of a proposition, and clearly there are
true propositions whose truth has not been experienced, that is, propo-
sitions which can be shown to be true in the future, although they have
not been proved to be true now. But what Brouwer means here is not
that. He does not speak about propositions and truth: he speaks about
judgements and evidence, although he uses the term truth instead of
the term evidence. And what he says is then perfectly right: there is
no evident judgement whose evidence has not been experienced, and
experience it is what you do when you understand, comprehend, grasp,
or see it. There is no evidence outside our actual or possible experi-
ence of it. The notion of evidence is by its very nature subject related,
relative to the knowing subject, that is, in Kantian terminology.
As I already said, when you make, or utter, a judgement under
normal circumstances, you thereby express that you know it. There is
no need to make this explicit by saying,

I know that . . .

For example, when you make a judgement of the form

A is true

under normal circumstances, by so doing, you already express that you
know that A is true, without having to make this explicit by saying,

I know that A is true,

or the like. A judgement made under normal circumstances claims by
itself to be evident: it carries its claim of evidence automatically with
it. This is a point which was made by Wittgenstein in the Tractatus
by saying that Frege’s Urteilsstrich, judgement stroke, is logically quite
meaningless, since it merely indicates that the proposition to which it
is prefixed is held true by the author, although it would perhaps have
been better to say, not that it is meaningless, but that it is superfluous,
since, when you make a judgement, it is clear already from its form that
you claim to know it. In speech act philosophy, this is expressed by
saying that knowing is an illocutionary force: it is not an explicit part
of what you say that you know it, but it is implicit in your saying of
it. This is the case, not only with judgements, that is, acts of knowing,
but also with other kinds of acts. For instance, if you say,

Would she come tonight!

it is clear from the form of your utterance that you express a wish.
There is no need of making this explicit by saying,

I wish that she would come tonight.

Some languages, like Greek, use the optative mood to make it clear
that an utterance expresses a wish or desire.
Consider the pattern that we have arrived at now,
     act           object
    I know       A is true

Here the grammatical subject I refers to the subject, self, or ego, and
the grammatical predicate know to the act, which in this particular
case is an act of knowing, but might as well have been an act of con-
jecturing, doubting, wishing, fearing, etc. Thus the predicate know
indicates the modality of the act, that is, the way in which the subject
relates to the object, or the particular force which is involved, in this
case, the epistemic force. Observe that the function of the grammatical
moods, indicative, subjunctive, imperative, and optative, is to express
modalities in this sense. Finally, A is true is the judgement or, in gen-
eral, the object of the act, which in this case is an object of knowledge,
but might have been an object of conjecture, doubt, wish, fear, etc.
The closest possible correspondence between the analysis that I am
giving and Frege’s notation for a judgement

⊢ A

is obtained by thinking of the vertical, judgement stroke as carrying
the epistemic force

I know . . .

and the horizontal, content stroke as expressing the affirmation

. . . is true.
Then it is the vertical stroke which is superfluous, whereas the hori-
zontal stroke is needed to show that the judgement has the form of an
affirmation. But this can hardly be read out of Frege’s own account of
the assertion sign: you have to read it into his text.
What is a judgement before it has become evident, or known? That
is, of the two, judgement and evident judgement, how is the first to be
defined? The characteristic of a judgement in this sense is merely that
it has been laid down what knowledge is expressed by it, that is, what
you must know in order to have the right to make, or utter, it. And
this is something which depends solely on the form of the judgement.
For example, if we consider the two forms of judgement

A is a proposition

and

A is true,

then there is something that you must know in order to have the right
to make a judgement of the first form, and there is something else
which you must know, in addition, in order to have the right to make
a judgement of the second form. And what you must know depends
in neither case on A, but only on the form of the judgement, . . . is
a proposition or . . . is true, respectively. Quite generally, I may say
that a judgement in this sense, that is, a not yet known, and perhaps
even unknowable, judgement, is nothing but an instance of a form
of judgement, because it is for the various forms of judgement that
I lay down what you must know in order to have the right to make
a judgement of one of those forms. Thus, as soon as something has
the form of a judgement, it is already a judgement in this sense. For
example, A is a proposition is a judgement in this sense, because it
has a form for which I have laid down, or rather shall lay down, what
you must know in order to have the right to make a judgement of that
form. I think that I may make things a bit clearer by showing again
in a picture what is involved here. Let me take the first form to begin
with.
    |------------------ evident judgement ------------------|
              |------------------ judgement ----------------|
    I know      A                  is a proposition
                |                         |
            expression            form of judgement
Here is involved, first, an expression A, which should be a complete
expression. Second, we have the form . . . is a proposition, which is
the form of judgement. Composing these two, we arrive at A is a
proposition, which is a judgement in the first sense. And then, third,
we have the act in which I grasp this judgement, and through which it
becomes evident. Thus it is my act of grasping which is the source of
the evidence. These two together, that is, the judgement and my act
of grasping it, become the evident judgement. And a similar analysis
can be given of a judgement of the second form.
    |------------------ evident judgement ------------------|
              |------------------ judgement ----------------|
    I know      A                  is true
                |                     |
           proposition        form of judgement
Such a judgement has the form . . . is true, but what fills the open place,
or hole, in the form is not an expression any longer, but a proposition.
And what is a proposition? A proposition is an expression for which
the previous judgement has already been grasped, because there is no
question of something being true unless you have previously grasped it
as a proposition. But otherwise the picture remains the same here.
Now I must consider the discussion of the notion of judgement fin-
ished and pass on to the notion of proof. Proof is a good word, because,
unlike the word proposition, it has not changed its meaning. Proof ap-
parently means the same now as it did when the Greeks discovered
the notion of proof, and therefore no terminological difficulties arise.
Observe that both Lat. demonstratio and the corresponding words in
the modern languages, like It. dimostrazione, Eng. demonstration, and
Ger. Beweis, are literal translations of Gr. ἀπόδειξις, deriving as it does
from Gr. δείκνυμι, I show, which has the same meaning as Lat. mon-
strare and Ger. weisen.
If you want to have a first approximation to the notion of proof, a
first definition of what a proof is, the strange thing is that you can-
not look it up in any modern textbook of logic, because what you get
out of the standard textbooks of modern logic is the definition of what
a formal proof is, at best with a careful discussion clarifying that a
formal proof in the sense of this definition is not what we ordinarily
call a proof in mathematics. That is, you get a formal proof defined
as a finite sequence of formulas, each one of them being an immediate
consequence of some of the preceding ones, where the notion of imme-
diate consequence, in turn, is defined by saying that a formula is an
immediate consequence of some other formulas if there is an instance
of one of the figures, called rules of inference, which has the other for-
mulas as premises and the formula itself as conclusion. Now, this is not
what a real proof is. That is why you have the warning epithet formal
in front of it, and do not simply say proof.
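The textbook notion is, at least, easy to make completely precise. Here is
a minimal sketch in Haskell, an editorial illustration under the
simplifying assumption that modus ponens is the only rule of inference: a
formal proof is a finite sequence of formulas, each of which is an axiom
instance or an immediate consequence of preceding ones.

    data Formula = Var String | Imp Formula Formula
      deriving (Eq, Show)

    -- Immediate consequence by modus ponens: from p -> q and p, infer q.
    immediate :: Formula -> Formula -> Formula -> Bool
    immediate (Imp p q) p' f = p == p' && q == f
    immediate _         _  _ = False

    -- A formal proof relative to a given stock of axioms: every formula
    -- is an axiom or an immediate consequence of two preceding formulas.
    isFormalProof :: (Formula -> Bool) -> [Formula] -> Bool
    isFormalProof axiom = go []
      where
        go _    []       = True
        go prev (f : fs) =
          (axiom f || or [immediate p q f | p <- prev, q <- prev])
            && go (f : prev) fs

Nothing in this definition mentions evidence, which is exactly why a
formal proof in this sense is not what we ordinarily call a proof.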
What is a proof in the original sense of the word? The ordinary
dictionary definition says, with slight variations, that a proof is that
which establishes the truth of a statement. Thus a proof is that which
makes a mathematical statement, or enunciation, into a theorem, or
proposition, in the old sense of the word which is retained in mathe-
matics. Now, remember that I have reserved the term true for true
propositions, in the modern sense of the word, and that the things
that we prove are, in my terminology, judgements. Moreover, to avoid
terminological confusion, judgements qualify as evident rather than
true. Hence, translated into the terminology that I have decided upon,
the dictionary definition becomes simply,
A proof is what makes a judgement evident.
Accepting this, that is, that the proof of a judgement is that which
makes it evident, we might just as well say that the proof of a judge-
ment is the evidence for it. Thus proof is the same as evidence. Com-
bining this with the outcome of the previous discussion of the notion of
evidence, which was that it is the act of understanding, comprehend-
ing, grasping, or seeing a judgement which confers evidence on it, the
inevitable conclusion is that the proof of a judgement is the very act
of grasping it. Thus a proof is, not an object, but an act. This is what
Brouwer wanted to stress by saying that a proof is a mental construc-
tion, because what is mental, or psychic, is precisely our acts, and the
word construction, as used by Brouwer, is but a synonym for proof.
Thus he might just as well have said that the proof of a judgement is
the act of proving, or grasping, it. And the act is primarily the act as it
is being performed. Only secondarily, and irrevocably, does it become
the act that has been performed.
As is often the case, it might have been better to start with the verb
rather than the noun, in this case, with the verb to prove rather than
with the noun proof. If a proof is what makes a judgement evident,
then, clearly, to prove a judgement is to make it evident, or known.
To prove something to yourself is simply to get to know it. And to
prove something to someone else is to try to get him, or her, to know
it. Hence
to prove = to get to know = to understand,
comprehend, grasp, or see.
This means that prove is but another synonym for understand, com-
prehend, grasp, or see. And, passing to the perfect tense,
to have proved = to know = to have understood,
comprehended, grasped, or seen.
We also speak of acquiring and possessing knowledge. To possess
knowledge is the same as to have acquired it, just as to know something
is the same as to have understood, comprehended, grasped, or seen it.
Thus the relation between the plain verb to know and the venerable
expressions to acquire and to possess knowledge is given by the two
equations,
to get to know = to acquire knowledge
and
to know = to possess knowledge.
On the other hand, the verb to prove and the noun proof are related
by the two similar equations,
to prove = to acquire, or construct, a proof
and
to have proved = to possess a proof.
It is now manifest, from these equations, that proof and knowledge are
the same. Thus, if proof theory is construed, not in Hilbert’s sense,
as metamathematics, but simply as the study of proofs in the original
sense of the word, then proof theory is the same as theory of knowledge,
which, in turn, is the same as logic in the original sense of the word, as
the study of reasoning, or proof, not as metamathematics.
Remember that the proof of a judgement is the very act of knowing
it. If this act is atomic, or indivisible, then the proof is said to be im-
mediate. Otherwise, that is, if the proof consists of a whole sequence,
or chain, of atomic actions, it is mediate. And, since proof and knowl-
edge are the same, the attributes immediate and mediate apply equally
well to knowledge. In logic, we are no doubt more used to saying of
inferences, rather than proofs, that they are immediate or mediate, as
the case may be. But that makes no difference, because inference and
proof are the same. It does not matter, for instance, whether we say
rules of inference or proof rules, as has become the custom in program-
ming. And, to take another example, it does not matter whether we
say that a mediate proof is a chain of immediate inferences or a chain
of immediate proofs. The notion of formal proof that I referred to in
the beginning of my discussion of the notion of proof has been arrived
at by formalistically interpreting what you mean by an immediate in-
ference, by forgetting about the difference between a judgement and
a proposition, and, finally, by interpreting the notion of proposition
formalistically, that is, by replacing it by the notion of formula. But a
real proof is and remains what it has always been, namely, that which
makes a judgement evident, or simply, the evidence for it. Thus, if we
do not have the notion of evidence, we do not have the notion of proof.
That is why the notion of proof has fared so badly in those branches
of philosophy where the notion of evidence has fallen into disrepute.
We also speak of a judgement being immediately and mediately
evident, respectively. Which of the two is the case depends of course on
the proof which constitutes the evidence for the judgement. If the proof
is immediate, then the judgement is said to be immediately evident.
And an immediately evident judgement is what we call an axiom. Thus
an axiom is a judgement which is evident by itself, not by virtue of
some previously proved judgements, but by itself, that is, a self-evident
judgement, as one has always said. That is, always before the notion
of evidence became disreputed, in which case the notion of axiom and
the notion of proof simply become deflated: we cannot make sense
of the notion of axiom and the notion of proof without access to the
notion of evidence. If, on the other hand, the proof which constitutes
the evidence for a judgement is a mediate one, so that the judgement
is evident, not by itself, but only by virtue of some previously proved
judgements, then the judgement is said to be mediately evident. And
a mediately evident judgement is what we call a theorem, as opposed
to an axiom. Thus an evident judgement, that is, a proposition in the
old sense of the word which is retained in mathematics, is either an
axiom or a theorem.
Instead of applying the attributes immediate and mediate to proof,
or knowledge, I might have chosen to speak of intuitive and discursive
proof, or knowledge, respectively. That would have implied no differ-
ence of sense. The proof of an axiom can only be intuitive, which is
to say that an axiom has to be grasped immediately, in a single act.
The word discursive, on the other hand, comes from Lat. discurrere,
to run to and fro. Thus a discursive proof is one which runs, from
premises to conclusion, in several steps. It is the opposite of an intu-
itive proof, which brings you to the conclusion immediately, in a single
step. When one says that the immediate propositions in the old sense
of the word proposition, that is, the immediately evident judgements
in my terminology, are unprovable, what is meant is of course only that
they cannot be proved discursively. Their proofs have to rest intuitive.
This seems to be all that I have to say about the notion of proof at the
moment, so let me pass on to the next item on the agenda, the forms
of judgement and their semantical explanations.
The forms of judgement have to be displayed in a table, simply,
and the corresponding semantical explanations have to be given, one
for each of those forms. A form of judgement is essentially just what
is called a category, not in the sense of category theory, but in the
logical, or philosophical, sense of the word. Thus I have to say what
my forms of judgement, or categories, are, and, for each one of those
forms, I have to explain what you must know in order to have the right
to make a judgement of that form. By the way, the forms of judgement
have to be introduced in a specific order. Actually, not only the forms
of judgement, but all the notions that I am undertaking to explain
here have to come in a specific order. Thus, for instance, the notion
of judgement has to come before the notion of proposition, and the
notion of logical consequence has to be dealt with before explaining
the notion of implication. There is an absolute rigidity in this order.
The notion of proof, for instance, has to come precisely where I have
put it here, because it is needed in some other explanations further on,
where it is presupposed already. Revealing this rigid order, thereby
arriving eventually at the concepts which have to be explained prior
to all other concepts, turns out to be surprisingly difficult: you seem
to arrive at the very first concepts last of all. I do not know what
it should best be called, maybe the order of conceptual priority, one
concept being conceptually prior to another concept if it has to be
explained before the other concept can be explained.
Let us now consider the first form of judgement,
A is a proposition,
or, as I shall continue to abbreviate it,
A prop.
What I have just displayed to you is a linguistic form, and I hope that
you can recognize it. What you cannot see from the form, and which
I therefore proceed to explain to you, is of course its meaning, that is,
what knowledge is expressed by, or embodied in, a judgement of this
form. The question that I am going to answer is, in ontological terms,
What is a proposition?
This is the usual Socratic way of formulating questions of this sort. Or
I could ask, in more knowledge theoretical terminology,
What is it to know a proposition?
or, if you prefer,
What knowledge is expressed by a judgement
of the form A is a proposition?
or, this may be varied endlessly,
What does a judgement of the form A is a proposition mean?
These various ways of posing essentially the same question reflect
roughly the historical development, from a more ontological to a more
knowledge theoretical way of posing, and answering, questions of this
sort, finally ending up with something which is more linguistic in na-
ture, having to do with form and meaning.
Now, one particular answer to this question, however it be formu-
lated, is that a proposition is something that is true or false, or, to use
Aristotle’s formulation, that has truth or falsity in it. Here we have
to be careful, however, because what I am going to explain is what a
proposition in the modern sense is, whereas what Aristotle explained
was what an enunciation, being the translation of Gr. ἀπόφανσις, is.
And it was this explanation that he phrased by saying that an enun-
ciation is something that has truth or falsity in it. What he meant by
this was that it is an expression which has a form of speech such that,
when you utter it, you say something, whether truly or falsely. That
is certainly not how we now interpret the definition of a proposition as
something which is true or false, but it is nevertheless correct that it
echoes Aristotle’s formulation, especially in its symmetric treatment of
truth and falsity.
An elaboration of the definition of a proposition as something that
is true or false is to say that a proposition is a truth value, the true
or the false, and hence that a declarative sentence is an expression
which denotes a truth value, or is the name of a truth value. This
was the explanation adopted by Frege in his later writings. If a propo-
sition is conceived in this way, that is, simply as a truth value, then
there is no difficulty in justifying the laws of the classical propositional
calculus and the laws of quantification over finite, explicitly listed, do-
mains. The trouble arises when you come to the laws for forming
quantified propositions, the quantifiers not being restricted to finite
domains. That is, the trouble is to make the two laws
    A(x) prop            A(x) prop
    (∀x)A(x) prop        (∃x)A(x) prop
evident when propositions are conceived as nothing but truth values.
To my mind, at least, they simply fail to be evident. And I need not
be ashamed of the reference to myself in this connection: as I said
in my discussion of the notion of evidence, it is by its very nature
subject related. Others must make up their minds whether these laws
are really evident to them when they conceive of propositions simply
as truth values. Although we have had this notion of proposition and
these laws for forming quantified propositions for such a long time,
we still have no satisfactory explanations which serve to make them
evident on this conception of the notion of proposition. It does not
help to restrict the quantifiers, that is, to consider instead the laws
    (x ∈ A)                  (x ∈ A)
    B(x) prop                B(x) prop
    (∀x ∈ A)B(x) prop        (∃x ∈ A)B(x) prop

unless we restrict the quantifiers so severely as to take the set A here
to be a finite set, that is, to be given by a list of its elements. Then, of
course, there is no trouble with these rules. But, as soon as A is the set
of natural numbers, say, you have the full trouble already. Since, as I
said earlier, the law of the excluded middle, indeed, all the laws of the
classical propositional calculus, are doubtlessly valid on this conception
of the notion of proposition, this means that the rejection of the law
of excluded middle is implicitly also a rejection of the conception of a
proposition as something which is true or false. Hence the rejection of
this notion of proposition is something which belongs to Brouwer. On
the other hand, he did not say explicitly by what it should be replaced.
Not even the well-known papers by Kolmogorov and Heyting, in which
the formal laws of intuitionistic logic were formulated for the first time,
contain any attempt at explaining the notion of proposition in terms of
which these laws become evident. It appears only in some later papers
by Heyting and Kolmogorov from the early thirties. In the first of these,
written by Heyting in 1930, he suggested that we should think about
a proposition as a problem, Fr. problème, or expectation, Fr. attente.
And, in the well-known paper of the following year, which appeared
in Erkenntnis, he used the terms expectation, Ger. Erwartung, and
intention, Ger. Intention. Thus he suggested that one should think of
a proposition as a problem, or as an expectation, or as an intention.
And, another year later, there appeared a second paper by Kolmogorov,
in which he observed that the laws of the intuitionistic propositional
calculus become evident upon thinking of the propositional variables
as ranging over problems, or tasks. The term he actually used was
Ger. Aufgabe. On the other hand, he explicitly said that he did not
want to equate the notion of proposition with the notion of problem
and, correlatively, the notion of truth of a proposition with the notion
of solvability of a problem. He merely proposed the interpretation of
propositions as problems, or tasks, as an alternative interpretation,
validating the laws of the intuitionistic propositional calculus.
Returning now to the form of judgement

A is a proposition,

the semantical explanation which goes together with it is this, and here
I am using the knowledge theoretical formulation, that to know a propo-
sition, which may be replaced, if you want, by problem, expectation,
or intention, you must know what counts as a verification, solution,
fulfillment, or realization of it. Here verification matches with propo-
sition, solution with problem, fulfillment with expectation as well as
with intention, and realization with intention. Realization is the term
introduced by Kleene, but here I am of course not using it in his sense:
Kleene’s realizability interpretation is a nonstandard, or nonintended,
interpretation of intuitionistic logic and arithmetic. The terminology
of intention and fulfillment was taken over by Heyting from Husserl, via
Oskar Becker, apparently. There is a long chapter in the sixth, and last,
of his Logische Untersuchungen which bears the title Bedeutungsinten-
tion und Bedeutungserfüllung, and it is these two terms, intention and
fulfillment, Ger. Erfüllung, that Heyting applied in his analysis of the
notions of proposition and truth. And he did not just take the terms
from Husserl: if you observe how Husserl used these terms, you will see
that they were appropriately applied by Heyting. Finally, verification
seems to be the perfect term to use together with proposition, coming
as it does from Lat. verus, true, and facere, to make. So to verify is to
make true, and verification is the act, or process, of verifying something.
For a long time, I tried to avoid using the term verification, because
it immediately gives rise to discussions about how the present account
of the notions of proposition and truth is related to the verificationism
that was discussed so much in the thirties. But, fortunately, this is fifty
years ago now, and, since we have a word which lends itself perfectly
to expressing what needs to be expressed, I shall simply use it, without
wanting to get into discussion about how the present semantical theory
is related to the verificationism of the logical positivists.
What would an example be? If you take a proposition like,

The sun is shining,


to know that proposition, you must know what counts as a verification
of it, which in this case would be the direct seeing of the shining sun.
Or, if you take the proposition,
The temperature is 10°C,
then it would be a direct thermometer reading. What is more interest-
ing, of course, is what the corresponding explanations look like for the
logical operations, which I shall come to in my last lecture.
Coupled with the preceding explanation of what a proposition is, is
the following explanation of what a truth is, that is, of what it means
for a proposition to be true. Assume first that
A is a proposition,

and, because of the omnipresence of the epistemic force, I am really


asking you to assume that you know, that is, have grasped, that A
is a proposition. On that assumption, I shall explain to you what a
judgement of the form
A is true,
or, briefly,
A true,

means, that is, what you must know in order to have the right to
make a judgement of this form. And the explanation would be that, to
know that a proposition is true, a problem is solvable, an expectation
is fulfillable, or an intention is realizable, you must know how to verify,
solve, fulfill, or realize it, respectively. Thus this explanation equates
truth with verifiability, solvability, fulfillability, or realizability. The
important point to observe here is the change from is in A is true to
can in A can be verified, or A is verifiable. Thus what is expressed in
terms of being in the first formulation really has the modal character
of possibility.
Now, as I said earlier in this lecture, to know a judgement is the
same as to possess a proof of it, and to know a judgement of the
particular form A is true is the same as to know how, or be able, to
verify the proposition A. Thus knowledge of a judgement of this form
is knowledge how in Ryle’s terminology. On the other hand, to know
how to do something is the same as to possess a way, or method, of
doing it. This is reflected in the etymology of the word method, which
is derived from Gr. μετά, after, and ὁδός, way. Taking all into account,
we arrive at the conclusion that a proof that a proposition A is true
is the same as a method of verifying, solving, fulfilling, or realizing A.
This is the explanation for the frequent appearance of the word method
in Heyting’s explanations of the meanings of the logical constants. In
connection with the word method, notice the tendency of our language
towards hypostatization. I can do perfectly well without the concept of
method in my semantical explanations: it is quite sufficient for me to
have access to the expression know how, or knowledge how. But it is in
the nature of our language that, when we know how to do something,
we say that we possess a method of doing it.
Summing up, I have now explained the two forms of categorical
judgement,
A is a proposition
and
A is true,
respectively, and they are the only forms of categorical judgement that
I shall have occasion to consider. Observe that knowledge of a judge-
ment of the second form is knowledge how, more precisely, knowledge
how to verify A, whereas knowledge of a judgement of the first form
is knowledge of a problem, expectation, or intention, which is knowl-
edge what to do, simply. Here I am introducing knowledge what as a
counterpart of Ryle’s knowledge how. So the difference between these
two kinds of knowledge is the difference between knowledge what to do
and knowledge how to do it. And, of course, there can be no question
of knowing how to do something before you know what it is that is to
be done. The difference between the two kinds of knowledge is a cat-
egorical one, and, as you see, what Ryle calls knowledge that, namely,
knowledge that a proposition is true, is equated with knowledge how
on this analysis. Thus the distinction between knowledge how and
knowledge that evaporates on the intuitionistic analysis of the notion
of truth.

Third lecture

The reason why I said that the word verification may be dangerous
is that the principle of verification formulated by the logical positivists
in the thirties said that a proposition is meaningful if and only if it is
verifiable, or that the meaning of a proposition is its method of ver-
ification. Now that is to confuse meaningfulness and truth. I have
indeed used the word verifiable and the expression method of verifica-
tion. But what is equated with verifiability is not the meaningfulness
but the truth of a proposition, and what qualifies as a method of ver-
ification is a proof that a proposition is true. Thus the meaning of a
proposition is not its method of verification. Rather, the meaning of a
proposition is determined by what it is to verify it, or what counts as
a verification of it.
The next point that I want to bring up is the question,
Are there propositions which are true,
but which cannot be proved to be true?
And it suffices to think of mathematical propositions here, like the
Goldbach conjecture, the Riemann hypothesis, or Fermat’s last theo-
rem. This fundamental question was once posed to me outright by a
colleague of mine in the mathematics department, which shows that
even working mathematicians may find themselves puzzled by deep
philosophical questions. At first sight, at least, there seem to be two
possible answers to this question. One is simply,
No,
and the other is,
Perhaps,
although it is of course impossible for anybody to exhibit an example
of such a proposition, because, in order to do that, he would already
have to know it to be true. If you are at all puzzled by this question,
it is an excellent subject of meditation, because it touches the very
conflict between idealism and realism in the theory of knowledge, the
first answer, No, being indicative of idealism, and the second answer,
Perhaps, of realism. It should be clear, from any point of view, that
the answer depends on how you interpret the three notions in terms
of which the question is formulated, that is, the notion of proposition,
the notion of truth, and the notion of proof. And it should already be
clear, I believe, from the way in which I have explained these notions,
that the question simply ceases to be a problem, and that it is the first
answer which is favoured.
To see this, assume first of all that A is a proposition, or problem.
Then
A is true
is a judgement which gives rise to a new problem, namely, the problem
of proving that A is true. To say that that problem is solvable is pre-
cisely the same as saying that the judgement that A is true is provable.
Now, the solvability of a problem is always expressed by a judgement.
Hence
(A is true) is provable

is a new judgement. What I claim is that we have the right to make this
latter judgement if and only if we have the right to make the former
judgement, that is, that the proof rule
       A is true
-----------------------
(A is true) is provable

as well as its inverse


(A is true) is provable
-----------------------
       A is true

are both valid. This is the sense of saying that A is true if and only if
A can be proved to be true. To justify the first rule, assume that you
know its premise, that is, that you have proved that A is true. But, if
you have proved that A is true, then you can, or know how to, prove
that A is true, which is what you need to know in order to have the
right to judge the conclusion. In this step, I have relied on the principle
that, if something has been done, then it can be done. To justify the
second rule, assume that you know its premise, that is, that you know
how to prove the judgement A is true. On that assumption, I have to
explain the conclusion to you, which is to say that I have to explain
how to verify the proposition A. This is how you do it. First, put your
knowledge of the premise into practice. That yields as result a proof
that A is true. Now, such a proof is nothing but knowledge how to
verify, or a method of verifying, the proposition A. Hence, putting it,
in turn, into practice, you end up with a verification of the proposition
A, as required. Observe that the inference in this direction is essentially
a contraction of two possibilities into one: if you know how to know
how to do something, then you know how to do it.
All this is very easy to say, but, if one is at all puzzled by the ques-
tion whether there are unprovable truths, then it is not an easy thing to
make up one’s mind about. For instance, it seems, from Heyting’s writ-
ings on the semantics of intuitionistic logic in the early thirties, that
he had not arrived at this position at that time. The most forceful and
persistent criticism of the idea of a knowledge independent, or knowl-
edge transcendent, notion of truth has been delivered by Dummett,
although it seems difficult to find him ever explicitly committing him-
self in his writings to the view that, if a proposition is true, then it can
also be proved to be true. Prawitz seems to be leaning towards this
nonrealistic principle of truth, as he calls it, in his paper Intuitionistic
Logic: A Philosophical Challenge. And, in his book Det Osägbara,
printed in the same year, Stenlund explicitly rejects the idea of true
propositions that are in principle unknowable. The Swedish proof the-
orists seem to be arriving at a common philosophical position.
Next I have to say something about hypothetical judgements, be-
fore I proceed to the final piece, which consists of the explanations
of the meanings of the logical constants and the justifications of the
logical laws. So far, I have only introduced the two forms of categor-
ical judgement A is a proposition and A is true. The only forms of
judgement that I need to introduce, besides these, are forms of hypo-
thetical judgement. Hypothetical means of course the same as under
assumptions. The Gr. ὑπόθεσις, hypothesis, was translated into Lat.
suppositio, supposition, and they both mean the same as assumption.
Now, what is the rule for making assumptions, quite generally? It is
simple. Whenever you have a judgement in the sense that I am using
the word, that is, a judgement in the sense of an instance of a form of
judgement, then it has been laid down what you must know in order
to have the right to make it. And that means that it makes perfectly
good sense to assume it, which is the same as to assume that you know
it, which, in turn, is the same as to assume that you have proved it.
Why is it the same to assume it as to assume that you know it? Be-
cause of the constant tacit convention that the epistemic force, I know
. . . , is there, even if it is not made explicit. Thus, when you assume
something, what you do is that you assume that you know it, that is,
that you have proved it. And, to repeat, the rule for making assump-
tions is simply this: whenever you have a judgement, in the sense of an
instance of a form of judgement, you may assume it. That gives rise
to the notion of hypothetical judgement and the notion of hypothetical
proof, or proof under hypotheses.
The forms of hypothetical judgement that I shall need are not so
many. Many more can be introduced, and they are needed for other
purposes. But what is absolutely necessary for me is to have access to
the form
A1 true, . . . , An true | A prop,
which says that A is a proposition under the assumptions that
A1, . . . , An are all true, and, on the other hand, the form

A1 true, . . . , An true | A true,

which says that the proposition A is true under the assumptions that
A1, . . . , An are all true. Here I am using the vertical bar for the relation
of logical consequence, that is, for what Gentzen expressed by means of
the arrow → in his sequence calculus, and for which the double arrow
⇒ is also a common notation. It is the relation of logical consequence,
which must be carefully distinguished from implication. What stands
to the left of the consequence sign, we call the hypotheses, in which
case what follows the consequence sign is called the thesis, or we call
the judgements that precede the consequence sign the antecedents and
the judgement that follows after the consequence sign the consequent.
This is the terminology which Gentzen took over from the scholastics,
except that, for some reason, he changed consequent into succedent and
consequence into sequence, Ger. Sequenz, usually improperly rendered
by sequent in English.
          hypothetical judgement ((logical) consequence)
        _________________________________________
        A1 true, . . . , An true  |  A prop
        A1 true, . . . , An true  |  A true
        \______________________/     \______/
          antecedents                 consequent
          (hypotheses)                (thesis)
Since I am making the assumptions A1 true, . . . , An true, I must be
presupposing something here, because, surely, I cannot make those
assumptions unless they are judgements. Specifically, in order for A1
true to be a judgement, A1 must be a proposition, and, in order for
A2 true to be a judgement, A2 must be a proposition, but now merely
under the assumption that A1 is true, . . . , and, in order for An true
to be a judgement, An must be a proposition under the assumptions
that A1, . . . , An−1 are all true. Unlike in Gentzen’s sequence calculus,
the order of the assumptions is important here. This is because of the
generalization that something being a proposition may depend on other
things being true. Thus, for the assumptions to make sense, we must
presuppose
A1 prop,
A1 true | A2 prop,
⋮
A1 true, . . . , An−1 true | An prop.
Supposing this, that is, supposing that we know this, it makes perfectly
good sense to assume, first, that A1 is true, second, that A2 is true,
. . . , finally, that An is true, and hence
A1 true, . . . , An true | A prop
is a perfectly good judgement whatever expression A is, that is, what-
ever expression you insert into the place indicated by the variable A.
And why is it a good judgement? To answer that question, I must
explain to you what it is to know such a judgement, that is, what
constitutes knowledge, or proof, of such a judgement. Now, quite gen-
erally, a proof of a hypothetical judgement, or logical consequence, is
nothing but a hypothetical proof of the thesis, or consequent, from the
hypotheses, or antecedents. The notion of hypothetical proof, in turn,
which is a primitive notion, is explained by saying that it is a proof
which, when supplemented by proofs of the hypotheses, or antecedents,
becomes a proof of the thesis, or consequent. Thus the notion of cate-
gorical proof precedes the notion of hypothetical proof, or inference, in
the order of conceptual priority. Specializing this general explanation
of what a proof of a hypothetical judgement is to the particular form
of hypothetical judgement

A1 true, . . . , An true | A prop

that we are in the process of considering, we see that the defining


property of a proof
        A1 true  · · ·  An true
                 ⋮
              A prop

of such a judgement is that, when it is supplemented by proofs

           ⋮               ⋮
        A1 true  · · ·  An true

of the hypotheses, or antecedents, it becomes a proof

           ⋮               ⋮
        A1 true  · · ·  An true
                 ⋮
              A prop

of the thesis, or consequent.


Consider now a judgement of the second form

A1 true, . . . , An true | A true.

For it to make good sense, that is, to be a judgement, we must know,


not only
A1 prop,
A1 true | A2 prop,
⋮
A1 true, . . . , An−1 true | An prop,
as in the case of the previous form of judgement, but also

A1 true, . . . , An true | A prop.

Otherwise, it does not make sense to ask oneself whether A is true


under the assumptions A1 true, . . . , An true. As with any proof of a
hypothetical judgement, the defining characteristic of a proof
        A1 true  · · ·  An true
                 ⋮
              A true

of a hypothetical judgement of the second form is that, when supple-
mented by proofs

           ⋮               ⋮
        A1 true  · · ·  An true

of the antecedents, it becomes a categorical proof

           ⋮               ⋮
        A1 true  · · ·  An true
                 ⋮
              A true

of the consequent.
I am sorry that I have had to be so brief in my treatment of hypo-
thetical judgements, but what I have said is sufficient for the following,
except that I need to generalize the two forms of hypothetical judge-
ment so as to allow generality in them. Thus I need judgements which
are, not only hypothetical, but also general, which means that the first
form is turned into
A1 (x1 , . . . , xm) true, . . . , An (x1 , . . . , xm) true |x1,...,xm A(x1 , . . . , xm ) prop

and the second form into


A1 (x1 , . . . , xm) true, . . . , An (x1 , . . . , xm ) true |x1,...,xm A(x1 , . . . , xm ) true.

Both of these forms involve a generality, indicated by subscribing the


variables that are being generalized to the consequence sign, which
must be carefully distinguished from, and which must be explained
prior to, the generality which is expressed by means of the universal
quantifier. It was only to avoid introducing all complications at once
that I treated the case without generality first. Now, the meaning of
a hypothetico-general judgement is explained by saying that, to have
the right to make such a judgement, you must possess a free variable
proof of the thesis, or consequent, from the hypotheses, or antecedents.
And what is a free variable proof? It is a proof which remains a proof
when you substitute anything you want for its free variables, that is,
any expressions you want, of the same arities as those variables. Thus

   A1(x1, . . . , xm) true  · · ·  An(x1, . . . , xm) true
                          ⋮
                A(x1, . . . , xm) prop

is a proof of a hypothetico-general judgement of the first form provided
it becomes a categorical proof

             ⋮                           ⋮
   A1(a1, . . . , am) true  · · ·  An(a1, . . . , am) true
                          ⋮
                A(a1, . . . , am) prop

when you first substitute arbitrary expressions a1, . . . , am, of the same
respective arities as the variables x1, . . . , xm, for those variables, and
then supplement it with proofs

             ⋮                           ⋮
   A1(a1, . . . , am) true  · · ·  An(a1, . . . , am) true

of the resulting substitution instances of the antecedents. The expla-


nation of what constitutes a proof of a hypothetico-general judgement
of the second form is entirely similar.
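A rough rendering in Haskell, under the propositions-as-types correspondence (which these lectures do not themselves invoke), may make the two notions concrete: a hypothetical proof behaves like a function from proofs of the antecedents to a proof of the consequent, and a free variable proof like a polymorphic value. All names below are illustrative only.

    {-# LANGUAGE RankNTypes #-}

    -- A hypothetical proof of "C true" from the hypothesis "A true":
    -- something that, supplemented with a proof of A, becomes a
    -- proof of C.
    type Hypothetical a c = a -> c

    -- Supplementing a hypothetical proof with a proof of its
    -- hypothesis yields a categorical proof of the thesis.
    supplement :: Hypothetical a c -> a -> c
    supplement hyp proofOfA = hyp proofOfA

    -- A free variable proof: one that remains a proof under any
    -- substitution for its free variable, i.e. a polymorphic value.
    -- Here p stands for a family of propositions indexed by x.
    type FreeVariableProof p = forall x. p x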
The difference between an inference and a logical consequence, or
hypothetical judgement, is that an inference is a proof of a logical con-
sequence. Thus an inference is the same as a hypothetical proof. Now,
when we infer, or prove, we infer the conclusion from the premises.
Thus, just as a categorical proof is said to be a proof of its conclusion,
a hypothetical proof is said to be a proof, or an inference, of its con-
clusion from its premises. This makes it clear what is the connection
as well as what is the difference between an inference with its premises
and conclusion on the one hand, and a logical consequence with its
antecedents and consequent on the other hand. And the difference is
precisely that it is the presence of a proof of a logical consequence that
turns its antecedents into premises and the consequent into conclusion
of the proof in question. For example, if A is a proposition, then

A true | ⊥ true

is a perfectly good logical consequence with A true as antecedent and


⊥ true as consequent, but
A true
------
⊥ true
is not an inference, not a valid inference, that is, unless A is false. In
that case, only, may the conclusion ⊥ true be inferred from the premise
A true.
Let us now pass on to the rules of inference, or proof rules, and their
semantical explanations. I shall begin with the rules of implication.
Now, since I am treating A is a proposition as a form of judgement,
which is on a par with the form of judgement A is true, what we
ordinarily call formation rules will count as rules of inference, but that
is merely a terminological matter. So let us look at the formation rule
for implication.
⊃-formation.
          (A true)
A prop     B prop
------------------
    A ⊃ B prop
This rule says, in words, that, if A is a proposition and B is a propo-
sition provided that A is true, then A ⊃ B is a proposition. In the
second premise, I might just as well have used the notation for logical
consequence
A true | B prop
that I introduced earlier in this lecture, because to have a proof of
this logical consequence is precisely the same as to have a hypothetical
proof of B prop from the assumption A true. But, for the moment, I
shall use the more suggestive notation

(A true)
B prop

in imitation of Gentzen. It does not matter, of course, which notation


of the two that I employ. The meaning is in any case the same.
Explanation. The rule of implication formation is a rule of immedi-
ate inference, which means that you must make the conclusion evident
to yourself immediately, without any intervening steps, on the assump-
tion that you know the premises. So assume that you do know the
premises, that is, that you know the proposition A, which is to say
that you know what counts as a verification of it, and that you know
that B is a proposition under the assumption that A is true. My obli-
gation is to explain to you what proposition A ⊃ B is. Thus I have
to explain to you what counts as a verification, or solution, of this
proposition, or problem. And the explanation is that what counts as a
verification of A ⊃ B is a hypothetical proof
A true
  ⋮
B true

that B is true under the assumption that A is true. In the Kolmogorov


interpretation, such a hypothetical proof appears as a method of solving
the problem B provided that the problem A can be solved, that is,
a method which together with a method of solving the problem A
becomes a method of solving the problem B. The explanation of the
meaning of implication, which has just been given, illustrates again the
rigidity of the order of conceptual priority: the notions of hypothetical
judgement, or logical consequence, and hypothetical proof have to be
explained before the notion of implication, because, when you explain
implication, they are already presupposed.
Given the preceding explanation of the meaning of implication, it
is not difficult to justify the rule of implication introduction.
⊃-introduction.
 (A true)
  B true
----------
A ⊃ B true
As you see, I am writing it in the standard way, although, of course, it
is still presupposed that A is a proposition and that B is a proposition
under the assumption that A is true. Thus you must know the premises
of the formation rule and the premise of the introduction rule in order
to be able to grasp its conclusion.
Explanation. Again, the rule of implication introduction is a rule of
immediate inference, which means that you must make the conclusion
immediately evident to yourself granted that you know the premises,
that is, granted that you possess a hypothetical proof that B is true
from the hypothesis that A is true. By the definition of implication,
such a proof is nothing but a verification of the proposition A ⊃ B.
And what is it that you must know in order to have the right to judge
A ⊃ B to be true? You must know how to get yourself a verification
of A ⊃ B. But, since you already possess it, you certainly know how
to acquire it: just take what you already have. This is all that there
is to be seen in this particular rule. Observe that its justification rests
again on the principle that, if something has been done, then it can be
done.
Next we come to the elimination rule for implication, which I shall
formulate in the standard way, as modus ponens, although, if you want
all elimination rules to follow the same pattern, that is, the pattern
exhibited by the rules of falsehood, disjunction, and existence elimina-
tion, there is another formulation that you should consider, and which
has been considered by Schroeder-Heister. But I shall have to content
myself with the standard formulation in these lectures.
⊃-elimination.
A ⊃ B true   A true
-------------------
      B true
Here it is still assumed, of course, that A is a proposition and that B
is a proposition provided that A is true.
Explanation. This is a rule of immediate inference, so assume that
you know the premises, that is, that you possess proofs
     ⋮               ⋮
A ⊃ B true   and   A true
of them, and I shall try to make the conclusion evident to you. Now, by
the definitions of the notion of proof and the notion of truth, the proof
of the first premise is knowledge how to verify the proposition A ⊃ B.
So put that knowledge of yours into practice. What you then end up
with is a verification of A ⊃ B, and, because of the way implication
was defined, that verification is nothing but a hypothetical proof
A true
  ⋮
B true

that B is true from the assumption that A is true. Now take your proof
of the right premise and adjoin it to the verification of A ⊃ B. Then
you get a categorical proof

   ⋮
A true
   ⋮
B true
of the conclusion that B is true. Here, of course, I am implicitly using
the principle that, if you supplement a hypothetical proof with proofs
of its hypotheses, then you get a proof of its conclusion. But this is in
the nature of a hypothetical proof: it is that property which makes a
hypothetical proof into what it is. So now you have a proof that B is
true, a proof which is knowledge how to verify B. Putting it, in turn,
into practice, you end up with a verification of B. This finishes my
explanation of how the proposition B is verified.
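Under the same propositions-as-types reading, and still only as an illustrative sketch, a verification of A ⊃ B is a function from verifications of A to verifications of B: the introduction rule is abstraction, and modus ponens is application, mirroring the two puttings into practice just described.

    -- A verification of A ⊃ B: a hypothetical proof of B from A,
    -- packaged as a function from verifications of A to
    -- verifications of B.
    type Implies a b = a -> b

    -- ⊃-introduction: a hypothetical proof of B from A is already
    -- a verification of A ⊃ B ("if it has been done, it can be done").
    impliesIntro :: (a -> b) -> Implies a b
    impliesIntro proof = proof

    -- ⊃-elimination (modus ponens): put the proof of A ⊃ B into
    -- practice and adjoin the proof of A to it.
    modusPonens :: Implies a b -> a -> b
    modusPonens f proofOfA = f proofOfA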
In the course of my semantical explanation of the elimination rule
for implication, I have performed certain transformations which are
very much like an implication reduction in the sense of Prawitz. Indeed,
I have explained the semantical role of this syntactical transformation.
The place where it belongs in the meaning theory is precisely in the
semantical explanation, or justification, of the elimination rule for im-
plication. Similarly, the reduction rules for the other logical constants
serve to explain the elimination rules associated with those constants.
The key to seeing the relationship between the reduction rules and
the semantical explanations of the elimination rules is this: to verify
a proposition by putting a proof of yours that it is true into practice
corresponds to reducing a natural deduction to introductory form and
deleting the last inference. This takes for granted, as is in fact the
case, that an introduction is an inference in which you conclude, from
the possession of a verification of a proposition, that you know how to
verify it. In particular, verifying a proposition B by means of a proof
that B is true

     ⋮            ⋮
A ⊃ B true     A true
---------------------
       B true
which ends with an application of modus ponens, corresponds to re-
ducing the proof of the left premise to introductory form

 (A true)
    ⋮
  B true           ⋮
----------
A ⊃ B true      A true
----------------------
        B true

then performing an implication reduction in the sense of Prawitz, which


yields the proof
   ⋮
A true
   ⋮
B true
as result, and finally reducing the latter proof to introductory form
and deleting its last, introductory inference. This is the syntactical
counterpart of the semantical explanation of the elimination rule for
implication.
The justifications of the remaining logical laws follow the same pat-
tern. Let me take the rules of conjunction next.
& -formation.
A prop   B prop
---------------
  A & B prop
Explanation. Again, assume that you know the premises, and I
shall explain the conclusion to you, that is, I shall tell you what counts
as a verification of A & B. The explanation is that a verification of
A & B consists of a proof that A is true and a proof that B is true,
   ⋮              ⋮
A true   and   B true
that is, of a method of verifying A and a method of verifying B. In
the Kolmogorov interpretation, A & B appears as the problem which
you solve by constructing both a method of solving A and a method of
solving B.
& -introduction.
A true   B true
---------------
  A & B true
Here the premises of the formation rule are still in force, although not
made explicit, which is to say that A and B are still assumed to be
propositions.
Explanation. Assume that you know the premises, that is, that you
possess proofs

   ⋮              ⋮
A true   and   B true
of them. Because of the meaning of conjunction, just explained, this
means that you have verified A & B. Then you certainly can, or know
how to, verify the proposition A & B, by the principle that, if something
has been done, then it can be done. And this is precisely what you need
to know in order to have the right to judge A & B to be true.
If you want the elimination rule for conjunction to exhibit the same
pattern as the elimination rules for falsehood, disjunction, and exis-
tence, it should be formulated differently, but, in its standard formula-
tion, it reads as follows.
& -elimination.
A & B true     A & B true
----------     ----------
  A true         B true

Thus, in this formulation, there are two rules and not only one. Also,
it is still presupposed, of course, that A and B are propositions.
Explanation. It suffices for me to explain one of the rules, say the
first, because the explanation of the other is completely analogous. To
this end, assume that you know the premise, and I shall explain to you
the conclusion, which is to say that I shall explain how to verify A.
This is how you do it. First use your knowledge of the premise to get a
verification of A & B. By the meaning of conjunction, just explained,
that verification consists of a proof that A is true as well as a proof
that B is true,

   ⋮              ⋮
A true   and   B true
Now select the first of these two proofs. By the definitions of the
notions of proof and truth, that proof is knowledge how to verify A.
So, putting it into practice, you end up with a verification of A. This
finishes the explanations of the rules of conjunction.
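In the same illustrative Haskell sketch, a verification of A & B is a pair of proofs, and the two elimination rules are the projections that select one of them.

    -- A verification of A & B: a proof that A is true together
    -- with a proof that B is true.
    type And a b = (a, b)

    andIntro :: a -> b -> And a b
    andIntro proofOfA proofOfB = (proofOfA, proofOfB)

    -- The two elimination rules select one of the two proofs
    -- that make up the verification.
    andElimLeft :: And a b -> a
    andElimLeft (proofOfA, _) = proofOfA

    andElimRight :: And a b -> b
    andElimRight (_, proofOfB) = proofOfB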
The next logical operation to be treated is disjunction. And, as
always, the formation rule must be explained first.
∨-formation.
A prop   B prop
---------------
  A ∨ B prop
Explanation. To justify it, assume that you know the premises, that
is, that you know what it is to verify A as well as what it is to verify
B. On that assumption, I explain to you what proposition A ∨ B is by
saying that a verification of A ∨ B is either a proof that A is true or a
proof that B is true,

   ⋮             ⋮
A true   or   B true
Thus, in the wording of the Kolmogorov interpretation, a solution to
the problem A ∨ B is either a method of solving the problem A or a
method of solving the problem B.
∨-introduction.
  A true         B true
----------     ----------
A ∨ B true     A ∨ B true

In both of these rules, the premises of the formation rule, which say
that A and B are propositions, are still in force.
Explanation. Assume that you know the premise of the first rule
of disjunction introduction, that is, that you have proved, or possess a
proof of, the judgement that A is true. By the definition of disjunction,
this proof is a verification of the proposition A ∨ B. Hence, by the
principle that, if something has been done, then it can be done, you
certainly can, or know how to, verify the proposition A ∨ B. And it is
this knowledge which you express by judging the conclusion of the rule,
that is, by judging the proposition A ∨ B to be true. The explanation
of the second rule of disjunction introduction is entirely similar.
∨-elimination.
             (A true)    (B true)
A ∨ B true    C true      C true
---------------------------------
             C true

Here it is presupposed, not only that A and B are propositions, but also
that C is a proposition provided that A ∨ B is true. Observe that, in
this formulation of the rule of disjunction elimination, C is presupposed
to be a proposition, not outright, but merely on the hypothesis that
A ∨ B is true. Otherwise, it is just like the Gentzen rule.
Explanation. Assume that you know, or have proved, the premises.
By the definition of truth, your knowledge of the first premise is knowl-
edge how to verify the proposition A ∨ B. Put that knowledge of yours
into practice. By the definition of disjunction, you then end up either
with a proof that A is true or with a proof that B is true,
   ⋮             ⋮
A true   or   B true
In the first case, join the proof that A is true to the proof that you
already possess of the second premise, which is a hypothetical proof
that C is true under the hypothesis that A is true,
A true
  ⋮
C true

You then get a categorical, or nonhypothetical, proof that C is true,


   ⋮
A true
   ⋮
C true

Again, by the definition of truth, this proof is knowledge how to verify


the proposition C. So, putting this knowledge of yours into practice,
you verify C. In the second case, join the proof that B is true, which
you ended up with as a result of putting your knowledge of the first
premise into practice, to the proof that you already possess of the
third premise, which is a hypothetical proof that C is true under the
hypothesis that B is true,
B true
  ⋮
C true
You then get a categorical proof that C is true,
   ⋮
B true
   ⋮
C true

As in the first case, by the definition of truth, this proof is knowl-


edge how to verify the proposition C. So, putting this knowledge into
practice, you verify C. This finishes my explanation how to verify the
proposition C, which is precisely what you need to know in order to
have the right to infer the conclusion that C is true.
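Continuing the sketch, a verification of A ∨ B is a proof of one of the disjuncts, tagged with which one it is, and the elimination rule is exactly the case analysis traced in the explanation above.

    -- A verification of A ∨ B: either a proof that A is true or
    -- a proof that B is true, with a tag saying which.
    type Or a b = Either a b

    orIntroLeft :: a -> Or a b
    orIntroLeft = Left

    orIntroRight :: b -> Or a b
    orIntroRight = Right

    -- ∨-elimination: whichever proof you end up with is joined to
    -- the corresponding hypothetical proof of C.
    orElim :: Or a b -> (a -> c) -> (b -> c) -> c
    orElim (Left proofOfA)  fromA _     = fromA proofOfA
    orElim (Right proofOfB) _     fromB = fromB proofOfB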
⊥-formation.
⊥ prop
Explanation. This is an axiom, but not in its capacity of mere
figure: to become an axiom, it has to be made evident. And, to make
it evident, I have to explain what counts as a verification of ⊥. The
explanation is that there is nothing that counts as a verification of
the proposition ⊥. Under no condition is it true. Thinking of ⊥ as a
problem, as in the Kolmogorov interpretation, it is the problem which
is defined to have no solution.
An introduction is an inference in which you conclude that a propo-
sition is true, or can be verified, on the ground that you have verified it,
that is, that you possess a verification of it. Therefore, ⊥ being defined
by the stipulation that there is nothing that counts as a verification of
it, there is no introduction rule for falsehood.
⊥-elimination.
⊥ true
------
C true
Here, in analogy with the rule of disjunction elimination, C is pre-
supposed to be a proposition, not outright, but merely under the as-
sumption that ⊥ is true. This is the only divergence from Gentzen’s
formulation of ex falso quodlibet.
Explanation. When you infer by this rule, you undertake to verify
the proposition C when you are provided with a proof that ⊥ is true,
that is, by the definition of truth, with a method of verifying ⊥. But
this is something that you can safely undertake, because, by the defi-
nition of falsehood, there is nothing that counts as a verification of ⊥.
Hence ⊥ is false, that is, cannot be verified, and hence it is impossible
that you ever be provided with a proof that ⊥ is true. Observe the
step here from the falsity of the proposition ⊥ to the unprovability of
the judgement that ⊥ is true. The undertaking that you make when
you infer by the rule of falsehood elimination is therefore like saying,

I shall eat up my hat if you do such and such,

where such and such is something of which you know, that is, are
certain, that it cannot be done.
Observe that the justification of the elimination rule for falsehood
only rests on the knowledge that ⊥ is false. Thus, if A is a proposition,
not necessarily ⊥, and C is a proposition provided that A is true, then
the inference
A true
------
C true
is valid as soon as A is false. Choosing C to be ⊥, we can conclude, by
implication introduction, that A ⊃ ⊥ is true provided that A is false.
Conversely, if A ⊃ ⊥ is true and A is true, then, by modus ponens, ⊥
would be true, which it is not. Hence A is false if A ⊃ ⊥ is true. These
two facts together justify the nominal definition of ∼A, the negation of
A, as A ⊃ ⊥, which is commonly made in intuitionistic logic. However,
the fact that A is false if and only if ∼A is true should not tempt one
to define the notion of denial by saying that

A is false

means that

∼A is true.
That the proposition A is false still means that it is impossible to verify
A, and this is a notion which cannot be reduced to the notions of nega-
tion, negation of propositions, that is, and truth. Denial comes before
negation in the order of conceptual priority, just as logical consequence
comes before implication, and the kind of generality which a judgement
may have comes before universal quantification.
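In the sketch, ⊥ is rendered by the empty type: nothing counts as a verification of it, the elimination rule is the vacuous function out of it, and the nominal definition ∼A = A ⊃ ⊥ becomes a function into it.

    import Data.Void (Void, absurd)

    -- ⊥ has, by definition, no verification: the empty type.
    type Falsehood = Void

    -- ⊥-elimination: the undertaking to verify any C, given a proof
    -- of ⊥ that can never materialize; absurd is the vacuous case
    -- analysis on a type with no constructors.
    falsehoodElim :: Falsehood -> c
    falsehoodElim = absurd

    -- The nominal definition of negation: ∼A = A ⊃ ⊥.
    type Not a = a -> Falsehood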
As has been implicit in what I have just said,
A is false = A is not true = A is not verifiable
= A cannot be verified.
Moreover, in the course of justifying the rule of falsehood elimination,
I proved that ⊥ is false, that is, that ⊥ is not true. Now, remember
that, in the very beginning of this lecture, we convinced ourselves that a
proposition is true if and only if the judgement that it is true is provable.
Hence, negating both members, a proposition is false if and only if the
judgement that it is true cannot be proved, that is, is unprovable. Using
this in one direction, we can conclude, from the already established
falsity of ⊥, that the judgement that ⊥ is true is unprovable. This is,
if you want, an absolute consistency proof: it is a proof of consistency
with respect to the unlimited notion of provability, or knowability, that
pervades these lectures. And
(⊥ is true) is unprovable
is the judgement which expresses the absolute consistency, if I may call
it so. By my chain of explanations, I hope that I have succeeded in
making it evident.
The absolute consistency brings with it as a consequence the rel-
ative consistency of any system of correct, or valid, inference rules.
Suppose namely that you have a certain formal system, a system of
inference rules, and that you have a formal proof in that system of the
judgement that ⊥ is true. Because of the absolute consistency, that is,
the unprovability of the judgement that ⊥ is true, that formal proof, al-
though formally correct, is no proof, not a real proof, that is. How can
that come about? Since a formal proof is a chain of formally immediate
inferences, that is, instances of the inference rules of the system, that
can only come about as a result of there being some rule of inference
which is incorrect. Thus, if you have a formal system, and you have
convinced yourself of the correctness of the inference rules that belong
to it, then you are sure that the judgement that ⊥ is true cannot be
proved in the system. This means that the consistency problem is real-
ly the problem of the correctness of the rules of inference, and that, at
some stage or another, you cannot avoid having to convince yourself
of their correctness. Of course if you take any old formal system, it
may be that you can carry out a metamathematical consistency proof
for it, but that consistency proof will rely on the intuitive correctness
of the principles of reasoning that you use in that proof, which means
that you are nevertheless relying on the correctness of certain forms
of inference. Thus the consistency problem is really the problem of
the correctness of the rules of inference that you follow, consciously or
unconsciously, in your reasoning.
After this digression on consistency, we must return to the seman-
tical explanations of the rules of inference. The ones that remain are
the quantifier rules.
∀-formation.
  A(x) prop
-------------
(∀x)A(x) prop
Explanation. The premise of this rule is a judgement which has
generality in it. If I were to make it explicit, I would have to write it

|x A(x) prop.

It is a judgement which has generality in it, although it is free from


hypotheses. And remember what it is to know such a judgement: it is
to possess a free variable proof of it. Now, assume that you do know
the premise of this rule, that is, that you possess a free variable proof
of the judgement that A(x) is a proposition. On that assumption, I
explain the conclusion to you by stipulating that a verification of the
proposition (∀x)A(x) consists of a free variable proof that A(x) is true,
graphically,

    ⋮
A(x) true
By definition, that is a proof in which the variable x may be replaced
by anything you want, that is, any expression you want of the same
arity as the variable x. Thus, if x is a variable ranging over complete
expressions, then you must substitute a complete expression for it, and,
similarly, if it ranges over incomplete expressions of some arity. In the
Kolmogorov interpretation, the explanation of the meaning of the uni-
versal quantifier would be phrased by saying that (∀x)A(x) expresses
the problem of constructing a general method of solving the problem
A(x) for arbitrary x.
∀-introduction.
  A(x) true
-------------
(∀x)A(x) true
Here the premise of the formation rule, to the effect that A(x) is a
proposition for arbitrary x, is still in force.
Explanation. Again, the premise of this rule is a general judgement,
which would read
|x A(x) true
if I were to employ the systematic notation that I introduced earlier
in this lecture. Now, assume that you know this, that is, assume that
you possess a free variable proof of the judgement that A(x) is true.
Then, by the principle that, if something has been done, then it can
be done, you certainly can give such a proof, and this is precisely what
you must be able, or know how, to do in order to have the right to infer
the conclusion of the rule.
∀-elimination.
(∀x)A(x) true
-------------
  A(a) true
Here it is presupposed, of course, that A(x) is a proposition for arbi-
trary x. And, as you see, I have again chosen the usual formulation
of the elimination rule for the universal quantifier rather than the one
which is patterned upon the elimination rules for falsehood, disjunc-
tion, and existence.
Explanation. First of all, observe that, because of the tacit assump-
tion that A(x) is a proposition for arbitrary x, both (∀x)A(x) and A(a)
are propositions, where a is an expression of the same arity as the vari-
able x. Now, assume that you know the premise, that is, that you know
how to verify the proposition (∀x)A(x), and I shall explain to you how
to verify the proposition A(a). To begin with, put your knowledge of
the premise into practice. That will give you a verification of (∀x)A(x),
which, by the definition of the universal quantifier, is a free variable
proof that A(x) is true,

    ⋮
A(x) true
Now, this being a free variable proof means precisely that it remains a
proof whatever you substitute for x. In particular, it remains a proof
when you substitute a for x so as to get
    ⋮
A(a) true

So now you have acquired a proof that A(a) is true. By the definitions
of the notions of proof and truth, this proof is knowledge how to verify
the proposition A(a). Thus, putting it into practice, you end up with
a verification of A(a), as required.
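The universal quantifier strains the sketch: a verification of (∀x)A(x) is a free variable proof, which in a dependently typed language would be a dependent function. Plain Haskell can only approximate this with parametric polymorphism, so the following is a loose rendering, with A(x) modelled as a family p indexed by a type variable.

    {-# LANGUAGE RankNTypes #-}

    -- A verification of (∀x)A(x): a free variable proof of A(x),
    -- approximated here by a value polymorphic in the index x.
    type Forall p = forall x. p x

    -- ∀-elimination: instantiate the free variable proof at a
    -- particular index a, i.e. apply the polymorphic value there.
    forallElim :: Forall p -> p a
    forallElim proof = proof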
∃-formation.
  A(x) prop
-------------
(∃x)A(x) prop
Explanation. Just as in the formation rule associated with the uni-
versal quantifier, the premise of this rule is really the general judgement

|x A(x) prop,

although I have not made the generality explicit in the formulation of


the rule. Assume that you know the premise, that is, assume that you
possess a free variable proof
    ⋮
A(x) prop

guaranteeing that A(x) is a proposition, and I shall explain to you what


proposition (∃x)A(x) is, that is, what counts as a verification of it. The
explanation is that a verification of (∃x)A(x) consists of an expression
a of the same arity as the variable x and a proof
    ⋮
A(a) true

showing that the proposition A(a) is true. Observe that the knowledge
of the premise is needed in order to guarantee that A(a) is a proposition,
so that it makes sense to talk about a proof that A(a) is true. In
the Kolmogorov interpretation, (∃x)A(x) would be explained as the
problem of finding an expression a, of the same arity as the variable x,
and a method of solving the problem A(a).
∃-introduction.
  A(a) true
-------------
(∃x)A(x) true
Here, as usual, the premise of the formation rule is still in force, which
is to say that A(x) is assumed to be a proposition for arbitrary x.
Explanation. Assume that you know the premise, that is, assume
that you possess a proof that A(a) is true,
    ⋮
A(a) true
By the preceding explanation of the meaning of the existential quanti-
fier, the expression a together with this proof make up a verification of
the proposition (∃x)A(x). And, possessing a verification of the propo-
sition (∃x)A(x), you certainly know how to verify it, which is what you
must know in order to have the right to conclude that (∃x)A(x) is true.
Like in my explanations of all the other introduction rules, I have here
taken for granted the principle that, if something has been done, then
it can be done.
∃-elimination.
                (A(x) true)
(∃x)A(x) true     C true
---------------------------
          C true
Here it is presupposed, not only that A(x) is a proposition for arbitrary
x, like in the introduction rule, but also that C is a proposition provided
that the proposition (∃x)A(x) is true.
Explanation. First of all, in order to make it look familiar, I have
written the second premise in Gentzen’s notation
(A(x) true)
C true
rather than in the notation
A(x) true |x C true,
but there is no difference whatever in sense. Thus the second premise
is really a hypothetico-general judgement. Now, assume that you know
the premises. By the definition of the notion of truth, your knowledge of
the first premise is knowledge how to verify the proposition (∃x)A(x).
Put that knowledge of yours into practice. You then end up with
a verification of the proposition (∃x)A(x). By the definition of the
existential quantifier, this verification consists of an expression a of the
same arity as the variable x and a proof that the proposition A(a) is
true,

    ⋮
A(a) true
Now use your knowledge, or proof, of the second premise. Because of
the meaning of a hypothetico-general judgement, this proof
A(x) true
    ⋮
 C true
is a free variable proof that C is true from the hypothesis that A(x)
is true. Being a free variable proof means that you may substitute
anything you want, in particular, the expression a, for the variable x.
You then get a hypothetical proof

A(a) true
    ⋮
 C true

that C is true from the hypothesis that A(a) is true. Supplementing this
hypothetical proof with the proof that A(a) is true that you obtained
as a result of putting your knowledge of the first premise into practice,
you get a proof

     ⋮
A(a) true
     ⋮
  C true
that C is true, and this proof is nothing but knowledge how to verify
the proposition C. Thus, putting it into practice, you end up having
verified the proposition C, as required.
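For the existential quantifier the sketch can use a Haskell existential package: a witness together with a proof about it. One caveat: in the verification just described the witness a is recoverable, whereas a Haskell existential hides it, so the code below models only the elimination rule above, in which C may not depend on the witness.

    {-# LANGUAGE ExistentialQuantification, RankNTypes #-}

    -- A verification of (∃x)A(x): a witness x (here, a type)
    -- together with a proof that A(x) is true.
    data Exists p = forall x. Exists (p x)

    existsIntro :: p x -> Exists p
    existsIntro = Exists

    -- ∃-elimination: open the package and apply the hypothetico-
    -- general proof of C, which works for an arbitrary witness.
    existsElim :: Exists p -> (forall x. p x -> c) -> c
    existsElim (Exists proof) general = general proof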
The promise of the title of these lectures, On the Meanings of the
Logical Constants and the Justifications of the Logical Laws, has now
been fulfilled. As you have seen, the explanations of the meanings of
the logical constants are precisely the explanations belonging to the
formation rules. And the justifications of the logical laws are the ex-
planations belonging to the introduction and elimination rules, which
are the rules that we normally call rules of inference. For lack of time,
I have only been able to deal with the pure logic in my semantical ex-
planations. To develop some interesting parts of mathematics, you also
need axioms for ordinary inductive definitions, in particular, axioms of
computation and axioms for the natural numbers. And, if you need
predicates defined by transfinite, or generalized, induction, then you
will have to add the appropriate formation, introduction, and elimina-
tion rules for them.
I have already explained how you see the consistency of a formal
system of correct inference rules, that is, the impossibility of construct-
ing a proof

   ⋮
⊥ true
that falsehood is true which proceeds according to those rules, not by
studying metamathematically the proof figures divested of all sense, as
was Hilbert’s program, but by doing just the opposite: not divesting
them of sense, but endowing them with sense. Similarly, suppose that
you have a proof

   ⋮
A true
that a proposition A is true which depends, neither on any assumptions,
nor on any free variables. By the definition of truth and the identifica-
tion of proof and knowledge, such a proof is nothing but knowledge how
to verify the proposition A. And, as I remarked earlier in this lecture,
verifying the proposition A by putting that knowledge into practice is
the same as reducing the proof to introductory form and deleting the
last, introductory inference. Moreover, the way of reducing the proof
which corresponds to the semantical explanations, notably of the elim-
ination rules, is precisely the way that I utilized for the first time in
my paper on iterated inductive definitions in the Proceedings of the
Second Scandinavian Logic Symposium, although merely because of
its naturalness, not for any genuine semantical reasons, at that time.
But no longer do we need to prove anything, that is, no longer do we
need to prove metamathematically that the proof figures, divested of
sense, reduce to introductory form. Instead of proving it, we endow
the proof figures with sense, and then we see it! Thus the definition
of convertibility, or computability, and the proof of normalization have
been transposed into genuine semantical explanations which allow you
to see this, just as you can see consistency semantically. And this is
the point that I had intended to reach in these lectures.
Postscript, Feb. 1996

The preceding three lectures were originally published in the Atti


degli Incontri di Logica Matematica, Vol. 2, Scuola di Specializzazione
in Logica Matematica, Dipartimento di Matematica, Università di
Siena, 1985, pp. 203–281. Since they have been difficult to obtain,
and are now even out of print, they are reprinted here by kind per-
mission of the Dipartimento di Matematica, Università di Siena. Only
typing errors have been corrected. The reader who wishes to follow
the further development of the ideas that were brought up for the first
time in these lectures is referred to the papers listed below.

Per Martin-Löf (1987) Truth of a proposition, evidence of a judgement,


validity of a proof. Synthese, 73, pp. 407–420.
—— (1991) A path from logic to metaphysics. In Atti del Congresso
Nuovi Problemi della Logica e della Filosofia della Scienza, Viareg-
gio, 8–13 gennaio 1990, Vol. II, pp. 141–149. CLUEB, Bologna.
—— (1994) Analytic and synthetic judgements in type theory. In
Paolo Parrini (ed.), Kant and Contemporary Epistemology, pp. 87–
99. Kluwer Academic Publishers, Dordrecht/Boston/London.
—— (1995) Verificationism then and now. In W. DePauli-Schimanovich,
E. Köhler, and F. Stadler (eds.), The Foundational Debate: Com-
plexity and Constructivity in Mathematics and Physics, pp. 187–
196. Kluwer Academic Publishers, Dordrecht/Boston/London.
—— (1996) Truth and knowability: on the principles C and K of
Michael Dummett. In H. G. Dales and G. Oliveri (eds.), Truth
in Mathematics. Clarendon Press, Oxford. Forthcoming.

Department of Mathematics
University of Stockholm
Sweden
Notions of computation and monads
Eugenio Moggi∗

Abstract
The λ-calculus is considered a useful mathematical tool in the study of programming
languages, since programs can be identified with λ-terms. However, if one goes further and
uses βη-conversion to prove equivalence of programs, then a gross simplification is introduced
(programs are identified with total functions from values to values), that may jeopardise the
applicability of theoretical results. In this paper we introduce calculi based on a categorical
semantics for computations, that provide a correct basis for proving equivalence of programs,
for a wide range of notions of computation.

Introduction
This paper is about logics for reasoning about programs, in particular for proving equivalence of
programs. Following a consolidated tradition in theoretical computer science we identify programs
with the closed λ-terms, possibly containing extra constants, corresponding to some features of
the programming language under consideration. There are three semantic-based approaches to
proving equivalence of programs:
• The operational approach starts from an operational semantics, e.g. a partial function
mapping every program (i.e. closed term) to its resulting value (if any), which induces a
congruence relation on open terms called operational equivalence (see e.g. [Plo75]). Then
the problem is to prove that two terms are operationally equivalent.
• The denotational approach gives an interpretation of the (programming) language in a
mathematical structure, the intended model. Then the problem is to prove that two terms
denote the same object in the intended model.
• The logical approach gives a class of possible models for the (programming) language.
Then the problem is to prove that two terms denote the same object in all possible models.
The operational and denotational approaches give only a theory: the operational equivalence ≈
or the set Th of formulas valid in the intended model respectively. On the other hand, the logical
approach gives a consequence relation ⊢, namely Ax ⊢ A iff the formula A is true in all models
of the set of formulas Ax, which can deal with different programming languages (e.g. functional,
imperative, non-deterministic) in a rather uniform way, by simply changing the set of axioms
Ax, and possibly extending the language with new constants. Moreover, the relation ⊢ is often
semidecidable, so it is possible to give a sound and complete formal system for it, while Th and ≈
are semidecidable only in oversimplified cases.
We do not take as a starting point for proving equivalence of programs the theory of βη-
conversion, which identifies the denotation of a program (procedure) of type A → B with a
total function from A to B, since this identification wipes out completely behaviours like non-
termination, non-determinism or side-effects, that can be exhibited by real programs. Instead, we
proceed as follows:
1. We take category theory as a general theory of functions and develop on top a categorical
semantics of computations based on monads.
∗ Research partially supported by EEC Joint Collaboration Contract # ST2J-0374-C(EDB).

2. We consider simple formal systems matching the categorical semantics of computation.
3. We extend stepwise categorical semantics and formal system in order to interpret richer
languages, in particular the λ-calculus.
4. We show that w.l.o.g. one may consider only (monads over) toposes, and we exploit this fact
to establish conservative extension results.
The methodology outlined above is inspired by [Sco80]¹, and it is followed in [Ros86, Mog86] to obtain the λp-calculus. The view that “category theory comes, logically, before the λ-calculus” led us to consider a categorical semantics of computations first, rather than to modify directly the rules of βη-conversion to get a correct calculus.

Related work
The operational approach to finding correct λ-calculi w.r.t. an operational equivalence was first
considered in [Plo75] for call-by-value and call-by-name operational equivalence. This approach
was later extended, following a similar methodology, to consider other features of computations like
nondeterminism (see [Sha84]), side-effects and continuations (see [FFKD86, FF89]). The calculi
based only on operational considerations, like the λv -calculus, are sound and complete w.r.t. the
operational semantics, i.e. a program M has a value according to the operational semantics iff it
is provably equivalent to a value (not necessarily the same) in the calculus, but they are too weak
for proving equivalences of programs.
Previous work on axiom systems for proving equivalence of programs with side effects has
shown the importance of the let-constructor (see [Mas88, MT89a, MT89b]). In the framework of
the computational lambda-calculus the importance of let becomes even more apparent.
The denotational approach may suggest important principles, e.g. fix-point induction (see
[Sco69, GMW79]), that can be found only after developing a semantics based on mathematical
structures rather than term models, but it does not give clear criteria to single out the general
principles among the properties satisfied by the model. Moreover, the theory at the heart of De-
notational Semantics, i.e. Domain Theory (see [GS89, Mos89]), has focused on the mathematical
structures for giving semantics to recursive definitions of types and functions (see [SP82]), while
other structures, that might be relevant to a better understanding of programming languages, have
been overlooked. This paper identifies one such structure, i.e. monads, but probably there are
others just waiting to be discovered.
The categorical semantics of computations presented in this paper has been strongly influenced
by the reformulation of Denotational Semantics based on the category of cpos, possibly without
bottom, and partial continuous functions (see [Plo85]) and the work on categories of partial mor-
phisms in [Ros86, Mog86]. Our work generalises the categorical account of partiality to other
notions of computations, indeed partial cartesian closed categories turn out to be a special case of
λc -models (see Definition 3.9).
A type theoretic approach to partial functions and computations is proposed in [CS87, CS88]
by introducing a type-constructor Ā, whose intuitive meaning is the set of computations of type
A. Our categorical semantics is based on a similar idea. Constable and Smith, however, do not
adequately capture the general axioms for computations (as we do), since their notion of model,
based on an untyped partial applicative structure, accounts only for partial computations.

1 A categorical semantics of computations


The basic idea behind the categorical semantics below is that, in order to interpret a programming
language in a category C, we distinguish the object A of values (of type A) from the object T A of

1 “I am trying to find out where λ-calculus should come from, and the fact that the notion of a cartesian closed

category is a late developing one (Eilenberg & Kelly (1966)), is not relevant to the argument: I shall try to explain
in my own words in the next section why we should look to it first”.

computations (of type A), and take as denotations of programs (of type A) the elements of T A.
In particular, we identify the type A with the object of values (of type A) and obtain the object
of computations (of type A) by applying a unary type-constructor T to A. We call T a notion
of computation, since it abstracts away from the type of values computations may produce. There
are many choices for T A corresponding to different notions of computations.
Example 1.1 We give a few notions of computation in the category of sets.
• partiality T A = A⊥ (i.e. A + {⊥}), where ⊥ is the diverging computation
• nondeterminism T A = Pfin(A)
• side-effects T A = (A × S)^S, where S is a set of states, e.g. a set U^L of stores or a set of input/output sequences U*
• exceptions T A = (A + E), where E is the set of exceptions
• continuations T A = R^(R^A), where R is the set of results
• interactive input T A = (µγ.A + γ^U), where U is the set of characters. More explicitly T A is the set of U-branching trees with finite branches and A-labelled leaves
• interactive output T A = (µγ.A + (U × γ)). More explicitly T A is (isomorphic to) U* × A.
Further examples (in a category of cpos) could be given based on the denotational semantics for
various programming languages (see [Sch86, GS89, Mos89]).
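
As an illustration (not part of the original paper), these notions of computation can be written down directly as type constructors in a functional language; the following Haskell sketch is our own rendering, with lists standing in for finite sets.

-- Sketch: the notions of computation of Example 1.1 as Haskell type
-- constructors (our rendering; lists approximate finite sets).
newtype Partial a   = Partial (Maybe a)                 -- T A = A + {bottom}
newtype Nondet a    = Nondet [a]                        -- T A = Pfin(A)
newtype SideEff s a = SideEff (s -> (a, s))             -- T A = (A x S)^S
newtype Except e a  = Except (Either a e)               -- T A = A + E
newtype Cont r a    = Cont ((a -> r) -> r)              -- T A = R^(R^A)
data    Input u a   = Leaf a | Branch (u -> Input u a)  -- T A = mu g. A + g^U
data    Output u a  = Output [u] a                      -- T A iso to U* x A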
Rather than focusing on a specific T, we want to find the general properties common to all notions of computation; therefore we impose as the only requirement that programs should form a category.
The aim of this section is to convince the reader, with a sequence of informal arguments, that
such a requirement amounts to say that T is part of a Kleisli triple (T, η, ∗ ) and that the category
of programs is the Kleisli category for such a triple.
Definition 1.2 ([Man76]) A Kleisli triple over a category C is a triple (T, η, ∗), where T: Obj(C) → Obj(C), ηA: A → T A for A ∈ Obj(C), f∗: T A → T B for f: A → T B and the following equations hold:

• ηA∗ = idT A
• ηA; f∗ = f for f: A → T B
• f∗; g∗ = (f; g∗)∗ for f: A → T B and g: B → T C.
A Kleisli triple satisfies the mono requirement provided ηA is mono for A ∈ C.
Intuitively ηA is the inclusion of values into computations (in several cases ηA is indeed a mono) and
f ∗ is the extension of a function f from values to computations to a function from computations
to computations, which first evaluates a computation and then applies f to the resulting value. In
summary:

ηA : a: A 7−→ [a]: T A
f : a: A 7−→ f(a): T B
f∗ : c: T A 7−→ (let x⇐c in f(x)): T B
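
Computationally, a Kleisli triple is exactly what one declares when defining a monad in a functional language. The sketch below (ours, not the paper's) packages η and ( )∗ as a Haskell class and records the three equations of Definition 1.2 as laws; Haskell's standard Monad class presents the same data, with c >>= f corresponding to f∗(c).

-- Sketch: a Kleisli triple as a Haskell class; `eta` is the unit eta_A
-- and `ext f` is the extension f* : T A -> T B of f : A -> T B.
class KleisliTriple t where
  eta :: a -> t a
  ext :: (a -> t b) -> (t a -> t b)

-- The three equations of Definition 1.2, as laws instances must satisfy:
--   ext eta       == id                -- eta_A* = id_TA
--   ext f . eta   == f                 -- eta_A ; f* = f
--   ext g . ext f == ext (ext g . f)   -- f* ; g* = (f ; g*)*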
In order to justify the axioms for a Kleisli triple we first have to introduce a category CT whose
morphisms correspond to programs. We proceed by analogy with the categorical semantics for
terms, where types are interpreted by objects and terms of type B with a parameter (free variable)
of type A are interpreted by morphisms from A to B. Since the denotation of programs of type B
are supposed to be elements of T B, programs of type B with a parameter of type A ought to be

interpreted by morphisms with codomain T B, but for their domain there are two alternatives, either
A or T A, depending on whether parameters of type A are identified with values or computations
of type A. We choose the first alternative, because it entails the second. Indeed computations
of type A are the same as values of type T A. So we take CT (A, B) to be C(A, T B). It remains
to define composition and identities in CT (and show that they satisfy the unit and associativity
axioms for categories).

Definition 1.3 Given a Kleisli triple (T, η, ∗) over C, the Kleisli category CT is defined as
follows:
• the objects of CT are those of C
• the set CT (A, B) of morphisms from A to B in CT is C(A, T B)
• the identity on A in CT is ηA : A → T A
• f ∈ CT (A, B) followed by g ∈ CT (B, C) in CT is f ; g ∗ : A → T C.
It is natural to take ηA as the identity on A in the category CT , since it maps a parameter x to [x],
i.e. to x viewed as a computation. Similarly composition in CT has a simple explanation in terms
of the intuitive meaning of f ∗ , in fact
f : x: A 7−→ f(x): T B
g : y: B 7−→ g(y): T C
f; g∗ : x: A 7−→ (let y⇐f(x) in g(y)): T C

i.e. f followed by g in CT with parameter x is the program which first evaluates the program f(x) and then feeds the resulting value as parameter to g. At this point we can give also a simple
justification for the three axioms of Kleisli triples, namely they are equivalent to the unit and
associativity axioms for CT :

• f ; ηB∗ = f for f : A → T B
• ηA ; f ∗ = f for f : A → T B
• (f ; g ∗ ); h∗ = f ; (g; h∗ )∗ for f : A → T B, g: B → T C and h: C → T D.
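
In the notation of the Haskell sketch above, composition in CT is the usual Kleisli composition of effectful functions:

-- Sketch: f followed by g in the Kleisli category C_T, i.e. f ; g*.
kleisli :: KleisliTriple t => (a -> t b) -> (b -> t c) -> (a -> t c)
kleisli f g = ext g . f    -- x |-> (let y <= f(x) in g(y))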

Example 1.4 We go through the notions of computation given in Example 1.1 and show that they
are indeed part of suitable Kleisli triples.

• partiality T A = A⊥ (= A + {⊥})
ηA is the inclusion of A into A⊥
if f: A → T B, then f∗(⊥) = ⊥ and f∗(a) = f(a) (when a ∈ A)
• nondeterminism T A = Pfin(A)
ηA is the singleton map a 7→ {a}
if f: A → T B and c ∈ T A, then f∗(c) = ∪x∈c f(x)
• side-effects T A = (A × S)^S
ηA is the map a 7→ (λs: S.ha, si)
if f: A → T B and c ∈ T A, then f∗(c) = λs: S.(let ha, s′i = c(s) in f(a)(s′))
• exceptions T A = (A + E)
ηA is the injection map a 7→ inl(a)
if f: A → T B, then f∗(inr(e)) = inr(e) (when e ∈ E) and f∗(inl(a)) = f(a) (when a ∈ A)
• continuations T A = R^(R^A)
ηA is the map a 7→ (λk: R^A.k(a))
if f: A → T B and c ∈ T A, then f∗(c) = (λk: R^B.c(λa: A.f(a)(k)))
• interactive input T A = (µγ.A + γ^U)
ηA maps a to the tree consisting only of one leaf labelled with a
if f: A → T B and c ∈ T A, then f∗(c) is the tree obtained by replacing leaves of c labelled by a with the tree f(a)
• interactive output T A = (µγ.A + (U × γ))
ηA is the map a 7→ hε, ai, where ε is the empty sequence
if f: A → T B, then f∗(hs, ai) = hs ∗ s′, bi, where f(a) = hs′, bi and s ∗ s′ is the concatenation of s followed by s′.
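
Two of these Kleisli triples, written as instances of the class sketched earlier (again our own rendering, reusing the types introduced after Example 1.1):

-- Sketch: the Kleisli triples for partiality and side-effects.
instance KleisliTriple Partial where
  eta a                    = Partial (Just a)
  ext _ (Partial Nothing)  = Partial Nothing    -- f*(bottom) = bottom
  ext f (Partial (Just a)) = f a                -- f*(a)      = f(a)

instance KleisliTriple (SideEff s) where
  eta a = SideEff (\s -> (a, s))                -- eta(a) = \s. <a, s>
  ext f (SideEff c) = SideEff (\s ->
    let (a, s') = c s                           -- evaluate c in the initial state
        SideEff g = f a                         -- then run f(a) in the new state s'
    in g s')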

Kleisli triples are just an alternative description for monads. Although the former are easier to justify from a computational perspective, the latter are more widely used in the literature on Category Theory and have the advantage of being defined only in terms of functors and natural transformations, which makes them more suitable for abstract manipulation.
Definition 1.5 ([Mac71]) A monad over a category C is a triple (T, η, µ), where T: C → C is a functor, η: IdC → T and µ: T² → T are natural transformations and the following diagrams commute, i.e.

µT A; µA = T µA; µA : T³A → T A
ηT A; µA = idT A = T ηA; µA : T A → T A

Proposition 1.6 ([Man76]) There is a one-one correspondence between Kleisli triples and mon-
ads.

Proof Given a Kleisli triple (T, η, ∗), the corresponding monad is (T, η, µ), where T is the extension of the function T to an endofunctor by taking T(f) = (f; ηB)∗ for f: A → B and µA = (idT A)∗. Conversely, given a monad (T, η, µ), the corresponding Kleisli triple is (T, η, ∗), where T is the restriction of the functor T to objects and f∗ = (T f); µB for f: A → T B.
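
Both directions of this correspondence can be transcribed directly; a Haskell sketch, again using the class above:

-- Sketch of Proposition 1.6: from a Kleisli triple to a monad and back.
tmap :: KleisliTriple t => (a -> b) -> t a -> t b
tmap f = ext (eta . f)        -- T(f) = (f ; eta_B)*

mu :: KleisliTriple t => t (t a) -> t a
mu = ext id                   -- mu_A = (id_TA)*

extOf :: Functor t => (t (t b) -> t b) -> (a -> t b) -> (t a -> t b)
extOf join f = join . fmap f  -- f* = T(f) ; mu_B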

Remark 1.7 In general the categorical semantics of partial maps, based on a category C equipped with a dominion M (see [Ros86]), cannot be reformulated in terms of a Kleisli triple over C satisfying some additional properties, unless C has lifting, i.e. the inclusion functor from C into the category of partial maps P(C, M) has a right adjoint ( )⊥ characterised by the natural isomorphism

C(A, B⊥) ≅ P(C, M)(A, B)

This mismatch disappears when considering partial cartesian closed categories.

2 Simple languages for monads


In this section we consider two formal systems motivated by different objectives: reasoning about
programming languages and reasoning about programs in a fixed programming language. When
reasoning about programming languages one has different monads (for simplicity we assume that
they are over the same category), one for each programming language, and the main aim is to
study how they relate to each other. So it is natural to base a formal system on a metalanguage
for a category and treat monads as unary type-constructors. When reasoning about programs one
has only one monad, because the programming language is fixed, and the main aim is to prove
properties of programs. In this case the obvious choice for the term language is the programming
language itself, which is more naturally interpreted in the Kleisli category.

Remark 2.1 We regard the metalanguage as more fundamental. In fact, its models are more
general, as they don’t have to satisfy the mono requirement, and the interpretation of programs (of
some given programming language) can be defined simply by translation into (a suitable extension
of) the metalanguage. It should be pointed out that the mono requirement cannot be axiomatised
in the metalanguage, as we would need conditional equations [x]T = [y]T → x = y, and that
existence assertions cannot be translated into formulas of the metalanguage, as we would need

existentially quantified formulas (e ↓σ) ≜ (∃!x: σ.e◦ =T σ [x]T)².
In Section 2.3 we will explain once and for all the correspondence between theories of a simple
programming language and categories with a monad satisfying the mono requirement. For other
programming languages we will give only their translation in a suitable extension of the metalan-
guage. In this way, issues like call-by-value versus call-by-name affect the translation, but not the
metalanguage.
In Categorical Logic it is common practice to identify a theory T with a category F(T ) with
additional structure such that there is a one-one correspondence between models of T in a category
C with additional structure and structure preserving functors from F(T ) to C (see [KR77]) 3 . This
identification was originally proposed by Lawvere, who also showed that algebraic theories can be
viewed as categories with finite products.
In Section 2.2 we give a class of theories that can be viewed as categories with a monad, so that
any category with a monad is, up to equivalence (of categories with a monad), one such theory.
Such a reformulation in terms of theories is more suitable for formal manipulation and more
appealing to those unfamiliar with Category Theory. However, there are other advantages in having
an alternative presentation of monads. For instance, natural extensions of the syntax may suggest
extensions of the categorical structure that may not be immediate to motivate and justify otherwise
(we will exploit this in Section 3). In Section 2.3 we take a programming language perspective
and establish a correspondence between theories (with equivalence and existence assertions) for a
simple programming language and categories with a monad satisfying the mono requirement, i.e.
ηA mono for every A.
As a starting point we take many sorted monadic equational logic, because it is more primitive than many sorted equational logic; indeed, monadic theories are equivalent to categories without
any additional structure.

2.1 Many sorted monadic equational logic


The language and formal system of many sorted monadic equational logic are parametric in a
signature, i.e. a set of base types A and unary function symbols f: A1 → A2 . The language is made
of types ` A type, terms x: A1 ` e: A2 and equations x: A1 ` e1 =A2 e2 defined by the following
formation rules:

A A base type
` A type
` A type
var
x: A ` x: A
x: A ` e1 : A1
f f: A1 → A2
x: A ` f(e1 ): A2
x: A1 ` e1 : A2 x: A1 ` e2 : A2
eq
x: A1 ` e1 =A2 e2

2 The uniqueness of x s.t. e◦ = [x]T follows from the mono requirement.


3 In [LS86] a stronger relation is sought between theories and categories with additional structure, namely an

equivalence between the category of theories and translations and the category of small categories with additional
structure and structure preserving functors. In the case of typed λ-calculus, for instance, such an equivalence
between λ-theories and cartesian closed categories requires a modification in the definition of λ-theory, which allows
not only equations between λ-terms but also equations between type expressions.

RULE SYNTAX SEMANTICS
A
` A type = [[A]]

var
` A type = c
x: A ` x: A = idc

f: A1 → A2
x: A ` e1 : A1 = g
x: A ` f(e1 ): A2 = g; [[f]]

eq
x: A1 ` e1 : A2 = g1
x: A1 ` e2 : A2 = g2
x: A1 ` e1 =A2 e2 ⇐⇒ g1 = g2

Table 1: Interpretation of Many Sorted Monadic Equational Language

Remark 2.2 Terms of (many sorted) monadic equational logic have exactly one free variable (the
one declared in the context) which occurs exactly once, and equations are between terms with the
same free variable.
An interpretation [[ ]] of the language in a category C is parametric in an interpretation of the
symbols in the signature and is defined by induction on the derivation of well-formedness for
(types,) terms and equations (see Table 1) according to the following general pattern:
• the interpretation [[A]] of a base type A is an object of C
• the interpretation [[f]] of an unary function f: A1 → A2 is a morphism from [[A1 ]] to [[A2 ]] in
C; similarly for the interpretation of a term x: A1 ` e: A2
• the interpretation of an assertion x: A ` φ (in this case just an equation) is either true or
false.

Remark 2.3 The interpretation of equations is standard. However, if one wants to consider more
complex assertions, e.g. formulas of first order logic, then they should be interpreted by subobjects;
in particular equality = : A should be interpreted by the diagonal ∆[[A]] .

The formal consequence relation on the set of equations is generated by the inference rules for
equivalences ((refl), (symm) and (trans)), congruence and substitutivity (see Table 2). This formal
consequence relation is sound and complete w.r.t. interpretation of the language in categories, i.e.
an equation is formally derivable from a set of equational axioms if and only if all the interpretations
satisfying the axioms satisfy the equation. Soundness follows from the admissibility of the inference
rules in any interpretation, while completeness follows from the fact that any theory T (i.e. a set
of equations closed w.r.t. the inference rules) is the set of equations satisfied by the canonical
interpretation in the category F(T ), i.e. T viewed as a category.
Definition 2.4 Given a monadic equational theory T , the category F(T ) is defined as follows:
• objects are (base) types A,
• morphisms from A1 to A2 are equivalence classes [x: A1 ` e: A2 ]T of terms w.r.t. the equiv-
alence relation induced by the theory T , i.e.

(x: A1 ` e1 : A2 ) ≡ (x: A1 ` e2 : A2 ) ⇐⇒ (x: A1 ` e1 =A2 e2 ) ∈ T

x: A ` e: A1
refl
x: A ` e =A1 e

x: A ` e1 =A1 e2
symm
x: A ` e2 =A1 e1
x: A ` e1 =A1 e2 x: A ` e2 =A1 e3
trans
x: A ` e1 =A1 e3
x: A ` e1 =A1 e2
congr f: A1 → A2
x: A ` f(e1 ) =A2 f(e2 )
x: A ` e: A1 x: A1 ` φ
subst
x: A ` [e/x]φ

Table 2: Inference Rules of Many Sorted Monadic Equational Logic

• composition is substitution, i.e.

[x: A1 ` e1 : A2 ]T ; [x: A2 ` e2 : A3 ]T = [x: A1 ` [e1 /x]e2 : A3 ]T

• identity over A is [x: A ` x: A]T .


There is also a correspondence in the opposite direction, namely every category C (with additional
structure) can be viewed as a theory TC (i.e. the theory of C over the language for C), so that C and
F(TC ) are equivalent as categories (with additional structure). Actually, in the case of monadic
equational theories and categories, C and F(TC ) are isomorphic.
In the sequel we consider other equational theories. They can be viewed as categories in the
same way described above for monadic theories; moreover, these categories are equipped with
additional structure, depending on the specific nature of the theories under consideration.

2.2 The Simple metalanguage


We extend many sorted monadic equational logic to match categories equipped with a monad (or
equivalently a Kleisli triple). Although we consider only one monad, it is conceptually straightfor-
ward to have several monads at once.
The first step is to extend the language. This could be done in several ways without affecting the correspondence between theories and monads; we choose a presentation inspired by Kleisli triples: more specifically, we introduce a unary type-constructor T and the two term-constructors, [ ] and let, used informally in Section 1. The definition of signature is slightly modified, since the domain and codomain of a unary function symbol f: τ1 → τ2 can be any type, not just base types
(the fact is that in many sorted monadic logic the only types are base types). An interpretation
[[ ]] of the language in a category C with a Kleisli triple (T, η, ∗ ) is parametric in an interpretation
of the symbols in the signature and is defined by induction on the derivation of well-formedness
for types, terms and equations (see Table 3). Finally we add to many sorted monadic equational
logic appropriate inference rules capturing axiomatically the properties of the new type- and term-
constructors after interpretation (see Table 4).
Proposition 2.5 Every theory T of the simple metalanguage, viewed as a category F(T ), is
equipped with a Kleisli triple (T, η, ∗ ):
• T (τ ) = T τ ,

• ητ = [x: τ `ml [x]T : T τ ]T ,


• ([x: τ1 `ml e: T τ2 ]T )∗ = [x0 : T τ1 `ml (letT x⇐x0 in e): T τ2 ]T .

RULE SYNTAX SEMANTICS
A
`ml A type = [[A]]

T
`ml τ type = c
`ml T τ type = Tc

var
`ml τ type = c
x: τ `ml x: τ = idc

f: τ1 → τ2
x: τ `ml e1 : τ1 = g
x: τ `ml f(e1 ): τ2 = g; [[f]]

[ ]T
x: τ `ml e: τ′ = g
x: τ `ml [e]T : T τ′ = g; η[[τ′]]

let
x: τ `ml e1 : T τ1 = g1
x1 : τ1 `ml e2 : T τ2 = g2
x: τ `ml (letT x1 ⇐e1 in e2 ): T τ2 = g1 ; g2∗

eq
x: τ1 `ml e1 : τ2 = g1
x: τ1 `ml e2 : τ2 = g2
x: τ1 `ml e1 =τ2 e2 ⇐⇒ g 1 = g2

Table 3: Interpretation of the Simple Metalanguage

x: τ `ml e1 =τ1 e2
[ ].ξ
x: τ `ml [e1 ]T =T τ1 [e2 ]T

x: τ `ml e1 =T τ1 e2 x′: τ1 `ml e′1 =T τ2 e′2
let.ξ
x: τ `ml (letT x′⇐e1 in e′1) =T τ2 (letT x′⇐e2 in e′2)

x: τ `ml e1 : T τ1 x1 : τ1 `ml e2 : T τ2 x2 : τ2 `ml e3 : T τ3


ass
x: τ `ml (letT x2 ⇐(letT x1 ⇐e1 in e2 ) in e3 ) =T τ3 (letT x1 ⇐e1 in (letT x2 ⇐e2 in e3 ))
x: τ `ml e1 : τ1 x1 : τ1 `ml e2 : T τ2
T.β
x: τ `ml (letT x1 ⇐[e1 ]T in e2 ) =T τ2 [e1 /x1 ]e2
x: τ `ml e1 : T τ1
T.η
x: τ `ml (letT x1 ⇐e1 in [x1 ]T ) =T τ1 e1

Table 4: Inference Rules of the Simple Metalanguage

Proof We have to show that the three axioms for Kleisli triples are valid. The validity of each
axiom amounts to the derivability of an equation. For instance, ητ∗ = idT τ is valid provided
x′: T τ `ml (letT x⇐x′ in [x]T) =T τ x′ is derivable, indeed it follows from (T.η). The reader can
check that the equations corresponding to the axioms ητ ; f ∗ = f and f ∗ ; g ∗ = (f ; g ∗ )∗ follow from
(T.β) and (ass) respectively.
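
Rendering (letT x⇐e1 in e2) as e1 >>= \x -> e2 and [e]T as return e, the rules (T.β), (T.η) and (ass) of Table 4 are precisely the monad laws familiar from functional programming; a sketch of both sides of each law:

-- Sketch: the let-rules of Table 4 as Haskell's monad laws.
lhsBeta, rhsBeta :: Monad m => a -> (a -> m b) -> m b
lhsBeta e1 f = return e1 >>= f    -- (let_T x <= [e1] in e2)
rhsBeta e1 f = f e1               -- [e1/x]e2                     (T.beta)

lhsEta, rhsEta :: Monad m => m a -> m a
lhsEta e1 = e1 >>= return         -- (let_T x <= e1 in [x])
rhsEta e1 = e1                    --                              (T.eta)

lhsAss, rhsAss :: Monad m => m a -> (a -> m b) -> (b -> m c) -> m c
lhsAss e1 f g = (e1 >>= f) >>= g
rhsAss e1 f g = e1 >>= \x -> f x >>= g  --                        (ass)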

2.3 A Simple Programming Language


In this section we take a programming language perspective by introducing a simple programming
language, whose terms are interpreted by morphisms of the Kleisli category for a monad. Unlike
the metalanguage of Section 2.2, the programming language does not allow one to consider more than
one monad at once.
The interpretation in the Kleisli category can also be given indirectly via a translation in the
simple metalanguage of Section 2.2 mapping programs of type τ into terms of type T τ . If we try to
establish a correspondence between equational theories of the simple programming language and
categories with one monad (as done for the metalanguage), then we run into problems, since there
is no way (in general) to recover C from CT . What we do instead is to establish a correspondence
between theories with equivalence and existence assertions and categories with one monad satisfying
the mono requirement, i.e. ηA is mono for every object A (note that ηT A is always a mono, because
ηT A ; µA = idT A ). The intended extension of the existence predicate on computations of type A is
the set of computations of the form [v] for some value v of type A, so it is natural to require ηA to be mono and interpret the existence predicate as the subobject corresponding to ηA.
The simple programming language is parametric in a signature, i.e. a set of base types and
unary command symbols. To stress that the interpretation is in CT rather than C, we use unary
command symbols p: τ1 * τ2 (instead of unary function symbols f: τ1 → τ2 ), we call x: τ1 `pl e: τ2
a program (instead of a term) and write ≡τ (instead of =T τ ) as equality of computations
of type τ . Given a category C with a Kleisli triple (T, η, ∗ ) satisfying the mono requirement, an
interpretation [[ ]] of the programming language is parametric in an interpretation of the symbols
in the signature and is defined by induction on the derivation of well-formedness for types, terms
and equations (see Table 5) following the same pattern given for many sorted monadic equational
logic, but with C replaced by CT , namely:
• the interpretation [[τ ]] of a (base) type τ is an object of CT , or equivalently an object of C
• the interpretation [[p]] of an unary command p: τ1 * τ2 is a morphism from [[τ1 ]] to [[τ2 ]] in
CT , or equivalently a morphism from [[τ1 ]] to T [[τ2 ]] in C; similarly for the interpretation of a
program x: τ1 `pl e: τ2
• the interpretation of an equivalence or existence assertion is a truth value.

Remark 2.6 The let-constructor plays a fundamental role: operationally it corresponds to sequen-
tial evaluation of programs and categorically it corresponds to composition in the Kleisli category
CT (while substitution corresponds to composition in C). In the λv-calculus (let x⇐e in e′) is treated as syntactic sugar for (λx.e′)e. We think that this is not the right way to proceed, because it explains the let-constructor (i.e. sequential evaluation of programs) in terms of constructors available only in functional languages. On the other hand, (let x⇐e in e′) cannot be treated as syntactic sugar for [e/x]e′ (involving only the more primitive substitution) without collapsing computations
to values.
The existence predicate e ↓ is inspired by the logic of partial terms/elements (see [Fou77, Sco79,
Mog88]); however, there are important differences, e.g.
x: τ `pl p(e) ↓τ2
strict p: τ1 * τ2
x: τ `pl e ↓τ1
is admissible for partial computations, but not in general. For certain notions of computation there
may be other predicates on computations worth considering, or the existence predicate itself may
have a more specialised meaning, for instance:

RULE SYNTAX SEMANTICS
A
`pl A type = [[A]]

T
`pl τ type = c
`pl T τ type = Tc

var
`pl τ type = c
x: τ `pl x: τ = ηc

p: τ1 * τ2
x: τ `pl e1 : τ1 = g

x: τ `pl p(e1 ): τ2 = g; [[p]]

[]
x: τ `pl e: τ′ = g
x: τ `pl [e]: T τ′ = g; ηT [[τ′]]

µ
x: τ `pl e: T τ′ = g
x: τ `pl µ(e): τ′ = g; µ[[τ′]]

let
x: τ `pl e1 : τ1 = g1
x1 : τ1 `pl e2 : τ2 = g2
x: τ `pl (let x1 ⇐e1 in e2 ): τ2 = g 1 ; g2 ∗

eq
x: τ1 `pl e1 : τ2 = g1
x: τ1 `pl e2 : τ2 = g2
x: τ1 `pl e1 ≡τ2 e2 ⇐⇒ g1 = g2

ex
x: τ1 `pl e: τ2 = g
x: τ1 `pl e ↓τ2 ⇐⇒ ∃!h: [[τ1 ]] → [[τ2 ]] s.t. g = h; η[[τ2 ]]

Table 5: Interpretation of the Simple Programming Language

x: τ `pl e: τ1
refl
x: τ `pl e ≡τ1 e

x: τ `pl e1 ≡τ1 e2
symm
x: τ `pl e2 ≡τ1 e1
x: τ `pl e1 ≡τ1 e2 x: τ `pl e2 ≡τ1 e3
trans
x: τ `pl e1 ≡τ1 e3
x: τ `pl e1 ≡τ1 e2
congr p: τ1 * τ2
x: τ `pl p(e1 ) ≡τ2 p(e2 )
`pl τ type
E.x
x: τ `pl x ↓τ
x: τ `pl e1 ≡τ1 e2 x: τ `pl e1 ↓τ1
E.congr
x: τ `pl e2 ↓τ1
x: τ `pl e ↓τ1 x: τ1 `pl φ
subst
x: τ `pl [e/x]φ

Table 6: General Inference Rules

• a partial computation exists iff it terminates;


• a non-deterministic computation exists iff it gives exactly one result;
• a computation with side-effects exists iff it does not change the store.

Programs can be translated into terms of the metalanguage via a translation ◦ s.t. for every well-
formed program x: τ1 `pl e: τ2 the term x: τ1 `ml e◦ : T τ2 is well-formed and [[x: τ1 `pl e: τ2 ]] =
[[x: τ1 `ml e◦ : T τ2 ]] (the proof of these properties is left to the reader).
Definition 2.7 Given a signature Σ for the programming language, let Σ◦ be the signature for the
metalanguage with the same base types and a function p: τ1 → T τ2 for each command p: τ1 * τ2
in Σ. The translation ◦ from programs over Σ to terms over Σ◦ is defined by induction on raw
programs:

• x◦ ≜ [x]T
• (let x1⇐e1 in e2)◦ ≜ (letT x1⇐e1◦ in e2◦)
• p(e1)◦ ≜ (letT x⇐e1◦ in p(x))
• [e]◦ ≜ [e◦]T
• µ(e)◦ ≜ (letT x⇐e◦ in x)
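
Since the translation is purely syntactic, it can be sketched as a function on abstract syntax trees. The datatypes below are our own minimal encoding, and the bound variable "x" is picked naively instead of being generated fresh:

-- Sketch: the translation of Definition 2.7 on a minimal abstract syntax.
data Prog = PVar String            -- x
          | PLet String Prog Prog  -- let x <= e1 in e2
          | PCmd String Prog       -- p(e1)
          | PBra Prog              -- [e]
          | PMu  Prog              -- mu(e)

data Term = TVar String            -- x
          | TLet String Term Term  -- let_T x <= e1 in e2
          | TApp String Term       -- p(e), with p : t1 -> T t2
          | TBra Term              -- [e]_T

trans :: Prog -> Term
trans (PVar x)       = TBra (TVar x)                  -- x° = [x]_T
trans (PLet x e1 e2) = TLet x (trans e1) (trans e2)
trans (PCmd p e1)    = TLet "x" (trans e1) (TApp p (TVar "x"))
trans (PBra e)       = TBra (trans e)                 -- [e]° = [e°]_T
trans (PMu e)        = TLet "x" (trans e) (TVar "x")  -- mu(e)° = let_T x <= e° in x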
The inference rules for deriving equivalence and existence assertions of the simple programming
language can be partitioned as follows:
• general rules (see Table 6) for terms denoting computations, but with variables ranging over
values; these rules replace those of Table 2 for many sorted monadic equational logic
• rules capturing the properties of type- and term-constructors (see Table 7) after interpretation
of the programming language; these rules replace the additional rules for the metalanguage
given in Table 4.

x: τ `pl e1 ≡τ1 e2
[ ].ξ
x: τ `pl [e1 ] ≡T τ1 [e2 ]

x: τ `pl e1 : τ1
E.[ ]
x: τ `pl [e1 ] ↓T τ1
x: τ `pl e1 ≡T τ1 e2
µ.ξ
x: τ `pl µ(e1 ) ≡τ1 µ(e2 )
x: τ `pl e1 : τ1
µ.β
x: τ ` µ([e1 ]) ≡τ1 e1
x: τ `pl e1 ↓T τ1
µ.η
x: τ ` [µ(e1 )] ≡T τ1 e1
x: τ `pl e1 ≡τ1 e2 x′: τ1 `pl e′1 ≡τ2 e′2
let.ξ
x: τ `pl (let x′⇐e1 in e′1) ≡τ2 (let x′⇐e2 in e′2)

x: τ `pl e1 : τ1
unit
x: τ `pl (let x1 ⇐e1 in x1 ) ≡τ1 e1
x: τ `pl e1 : τ1 x1 : τ1 `pl e2 : τ2 x2 : τ2 `pl e3 : τ3
ass
x: τ `pl (let x2 ⇐(let x1 ⇐e1 in e2 ) in e3 ) ≡τ3 (let x1 ⇐e1 in (let x2 ⇐e2 in e3 ))
x: τ `pl e1 ↓τ1 x1 : τ1 `pl e2 : τ2
let.β
x: τ `pl (let x1 ⇐e1 in e2 ) ≡τ2 [e1 /x1 ]e2
x: τ `pl e1 : τ1
let.p p: τ1 * τ2
x: τ `pl p(e1 ) ≡τ1 (let x1 ⇐e1 in p(x1 ))

Table 7: Inference Rules of the Simple Programming Language

Soundness and completeness of the formal consequence relation w.r.t. interpretation of the
simple programming language in categories with a monad satisfying the mono requirement is
established in the usual way (see Section 2.1). The only step which differs is how to view a theory
T of the simple programming language (i.e. a set of equivalence and existence assertions closed
w.r.t. the inference rules) as a category F(T ) with the required structure.
Definition 2.8 Given a theory T of the simple programming language, the category F(T ) is de-
fined as follows:
• objects are types τ ,
• morphisms from τ1 to τ2 are equivalence classes [x: τ1 `pl e: τ2 ]T of existing programs x: τ1 `pl
e ↓τ2 ∈ T w.r.t. the equivalence relation induced by the theory T , i.e.

(x: τ1 `pl e1 : τ2 ) ≡ (x: τ1 `pl e2 : τ2 ) ⇐⇒ (x: τ1 `pl e1 ≡τ2 e2 ) ∈ T

• composition is substitution, i.e.

[x: τ1 `pl e1 : τ2 ]T ; [x: τ2 `pl e2 : τ3 ]T = [x: τ1 `pl [e1 /x]e2 : τ3 ]T

• identity over τ is [x: τ `pl x: τ ]T .


In order for composition in F(T ) to be well-defined, it is essential to consider only equivalence
classes of existing programs, since the simple programming language satisfies only a restricted form
of substitutivity.
Proposition 2.9 Every theory T of the simple programming language, viewed as a category F(T ),
is equipped with a Kleisli triple (T, η, ∗ ) satisfying the mono requirement:
• T (τ ) = T τ ,
• ητ = [x: τ `pl [x]: T τ ]T ,
• ([x: τ1 `pl e: T τ2]T)∗ = [x′: T τ1 `pl [(let x⇐µ(x′) in µ(e))]: T τ2]T.

Proof We have to show that the three axioms for Kleisli triples are valid. The validity of each axiom amounts to the derivability of an existence and equivalence assertion. For instance, ητ∗ = idT τ is valid provided x′: T τ `pl x′ ↓T τ and x′: T τ `pl [(let x⇐µ(x′) in µ([x]))] ≡T τ x′ are derivable. The existence assertion follows immediately from (E.x), while the equivalence is derived as follows:
• x′: T τ `pl [(let x⇐µ(x′) in µ([x]))] ≡T τ [(let x⇐µ(x′) in x)] by (µ.β), (refl) and (let.ξ)
• x′: T τ `pl [(let x⇐µ(x′) in x)] ≡T τ [µ(x′)] by (unit) and (let.ξ)
• x′: T τ `pl [µ(x′)] ≡T τ x′ by (E.x) and (µ.η)
• x′: T τ `pl [(let x⇐µ(x′) in µ([x]))] ≡T τ x′ by (trans).
We leave to the reader the derivation of the existence and equivalence assertions corresponding to the other axioms for Kleisli triples, and prove instead the mono requirement, i.e. that f1; ητ = f2; ητ implies f1 = f2. Let fi be [x: τ′ `pl ei: τ]T; we have to derive x: τ′ `pl e1 ≡τ e2 from x: τ′ `pl [e1] ≡T τ [e2] (and x: τ′ `pl ei ↓τ):
• x: τ′ `pl µ([e1]) ≡τ µ([e2]) by the first assumption and (µ.ξ)
• x: τ′ `pl µ([ei]) ≡τ ei by (µ.β)
• x: τ′ `pl e1 ≡τ e2 by (trans).

Remark 2.10 One can show that the canonical interpretation of a program x: τ1 `pl e: τ2 in the category F(T) is the morphism [x: τ1 `pl [e]: T τ2]T. This interpretation establishes a one-one correspondence between morphisms from τ1 to T τ2 in the category F(T), i.e. morphisms from τ1 to τ2 in the Kleisli category, and equivalence classes of programs x: τ1 `pl e: τ2 (not necessarily existing). The inverse correspondence maps a morphism [x: τ1 `pl e′: T τ2]T to the equivalence class of x: τ1 `pl µ(e′): τ2. Indeed, x: τ1 `pl e ≡τ2 µ([e]) and x: τ1 `pl e′ ≡T τ2 [µ(e′)] are derivable provided x: τ1 `pl e′ ↓T τ2.

3 Extending the simple metalanguage


So far we have considered only languages and formal systems for monadic terms x: τ1 ` e: τ2, having
exactly one free variable (occurring once). In this section we want to extend these languages (and
formal systems) by allowing algebraic terms x1 : τ1 , . . . , xn : τn ` e: τ , having a finite number of free
variables (occurring finitely many times) and investigate how this affects the interpretation and
the structure on theories viewed as categories. For convenience in relating theories and categories
with additional structure, we also allow types to be closed w.r.t. finite products4 , in particular
a typing context x1 : τ1 , . . . , xn : τn can be identified with a type. In general, the interpretation of
an algebraic term x1 : τ1 , . . . , xn : τn ` e: τ in a category (with finite products) is a morphism from
([[τ1 ]] × . . . × [[τn ]]) to [[τ ]].
The extension of monadic equational logic to algebraic terms is equational logic, whose theories
correspond to categories with finite products. We will introduce the metalanguage, i.e. the ex-
tension of the simple metalanguage described in Section 2.2 to algebraic terms, and show that its
theories correspond to categories with finite products and a strong monad , i.e. a monad and a natu-
ral transformation tA,B : A×T B → T (A×B). Intuitively tA,B transforms a pair value-computation
into a computation of a pair of values, as follows
tA,B : a: A, c: T B 7−→ (let y⇐c in [ha, yi]): T (A × B)

Remark 3.1 To understand why a category with finite products and a monad is not enough to
interpret the metalanguage (and where the natural transformation t is needed), one has to look at
the interpretation of a let-expression
Γ `ml e1 : T τ1 Γ, x: τ1 `ml e2 : T τ2
let
Γ `ml (letT x⇐e1 in e2 ): T τ2

where Γ is a typing context. Let g1: c → T c1 and g2: c × c1 → T c2 be the interpretations of Γ `ml e1: T τ1 and Γ, x: τ1 `ml e2: T τ2 respectively, where c is the interpretation of the typing context Γ
and ci is the interpretation of the type τi , then the interpretation of Γ `ml (letT x⇐e1 in e2 ): T τ2
ought to be a morphism g: c → T c2. If (T, η, µ) is the identity monad, i.e. T is the identity functor over C and η and µ are the identity natural transformations over T, then computations get identified with values. In this case (letT x⇐e1 in e2) can be replaced by [e1/x]e2, so g is simply hidc, g1i; g2: c → c2. In the general case Table 3 suggests that ; above is indeed composition in
the Kleisli category, therefore hidc, g1i; g2 should be replaced by hidc, g1i; g2∗. But in hidc, g1i; g2∗ there is a type mismatch, since the codomain of hidc, g1i is c × T c1, while the domain of g2∗ is T(c × c1). The natural transformation tA,B: A × T B → T(A × B) mediates between these two objects, so that g can be defined as hidc, g1i; tc,c1; g2∗.

4 If the metalanguage does not have finite products, we conjecture that its theories would no longer correspond to

categories with finite products and a strong monad (even by taking as objects contexts and/or the Karoubi envelope,
used in [Sco80] to associate a cartesian closed category to an untyped λ-theory), but instead to multicategories with
a Kleisli triple. We felt the greater generality (of not having products in the metalanguage) was not worth the
mathematical complications.

Definition 3.2 A strong monad over a category C with (explicitly given) finite products is a monad (T, η, µ) together with a natural transformation tA,B from A × T B to T(A × B) s.t. the following diagrams commute, i.e.

t1,A; T rA = rT A
tA×B,C; T αA,B,C = αA,B,T C; (idA × tB,C); tA,B×C
(idA × ηB); tA,B = ηA×B
(idA × µB); tA,B = tA,T B; T tA,B; µA×B

where r and α are the natural isomorphisms

rA: (1 × A) → A ,  αA,B,C: ((A × B) × C) → A × (B × C)

Remark 3.3 The diagrams above are taken from [Koc72], where a characterisation of strong monads is given in terms of C-enriched categories (see [Kel82]). Kock fixes a commutative monoidal closed category C (in particular a cartesian closed category), and in this setup he establishes a one-one correspondence between strengths stA,B: B^A → (T B)^(T A) and tensorial strengths tA,B: A ⊗ T B → T(A ⊗ B) for an endofunctor T over C (see Theorem 1.3 in [Koc72]). Intuitively a strength stA,B internalises the action of T on morphisms from A to B, and more precisely it makes (T, st) a C-enriched endofunctor on C enriched over itself (i.e. the hom-object C(A, B) is B^A). In this setting the diagrams of Definition 3.2 have the following meaning:
• the first two diagrams are (1.7) and (1.8) in [Koc72], saying that t is a tensorial strength of
T . So T can be made into a C-enriched endofunctor.
• the last two diagrams say that η: IdC → T and µ: T² → T are C-enriched natural transformations, where IdC, T and T² are enriched in the obvious way (see Remark 1.4 in [Koc72]).
There is another purely categorical characterisation of strong monads, suggested to us by G.
Plotkin, in terms of C-indexed categories (see [JP78]). Both characterisations are instances of a
general methodological principle for studying programming languages (or logics) categorically (see
[Mog89b]):
when studying a complex language the 2-category Cat of small categories, functors and natural transformations may not be adequate; however, one may replace Cat with a different 2-category, whose objects capture better some fundamental structure of the language, while less fundamental structure can be modelled by 2-categorical concepts.

Monads are a 2-categorical concept, so we expect notions of computations for a complex language
to be modelled by monads in a suitable 2-category.
The first characterisation takes a commutative monoidal closed structure on C (used in [Laf88,
See87] to model a fragment of linear logic), so that C can be enriched over itself. Then a strong
monad over a cartesian closed category C is just a monad over C in the 2-category of C-enriched
categories.
The second characterisation takes a class D of display maps over C (used in [HP87] to model
dependent types), and defines a C-indexed category C/D . Then a strong monad over a category
C with finite products amounts to a monad over C/D in the 2-category of C-indexed categories,
where D is the class of first projections (corresponding to constant type dependency).

In general the natural transformation t has to be given explicitly as part of the additional structure. However, t is uniquely determined (but it may not exist) by T and the cartesian structure on C, when C has enough points.
Proposition 3.4 (Uniqueness) If (T, η, µ) is a monad over a category C with finite products and
enough points (i.e. ∀h: 1 → A.h; f = h; g implies f = g for any f, g: A → B), then (T, η, µ, t) is
a strong monad over C if and only if tA,B is the unique family of morphisms s.t. for all points
a: 1 → A and b: 1 → T B
ha, bi; tA,B = b; T (h!B ; a, idB i)
where !B : B → 1 is the unique morphism from B to the terminal object.

Proof Note that there is at most one tA,B s.t. ha, bi; tA,B = b; T (h!B ; a, idB i) for all points a: 1 → A
and b: 1 → T B, because C has enough points.
First we show that if (T, η, µ, t) is a strong monad, then tA,B satisfies the equation above. By
naturality of t and by the first diagram in Definition 3.2 the following equations hold:

ha, bi; tA,B = hid1, bi; (a × idT B); tA,B = hid1, bi; t1,B; T(a × idB)
t1,B; T rB = rT B ,  hid1, bi; rT B = b

Since rB is an isomorphism (with inverse h!B, idBi), the two composite morphisms ha, bi; tA,B and hid1, bi; rT B; T((rB)⁻¹); T(a × idB) from 1 to T(A × B) must coincide. But the second composition can be rewritten as b; T(h!B; a, idBi).
Second we have to show that if t is the unique family of morphisms satisfying the equation
above, then (T, η, µ, t) is a strong monad. This amounts to proving that t is a natural transformation and that the diagrams in Definition 3.2 commute. The proof is a tedious exercise in diagram chasing,
which relies on C having enough points. For instance, to prove that t1,A ; T rA = rT A it is enough
to show that hid1 , ai; t1,A ; T rA = hid1 , ai; rT A for all points a: 1 → A.

Example 3.5 We go through the monads given in Example 1.4 and show that they have a tensorial strength.
• partiality T A = A⊥ (= A + {⊥})
tA,B(a, ⊥) = ⊥ and tA,B(a, b) = ha, bi (when b ∈ B)
• nondeterminism T A = Pfin(A)
tA,B(a, c) = {ha, bi | b ∈ c}
• side-effects T A = (A × S)^S
tA,B(a, c) = (λs: S.(let hb, s′i = c(s) in hha, bi, s′i))
• exceptions T A = (A + E)
tA,B(a, inr(e)) = inr(e) (when e ∈ E) and tA,B(a, inl(b)) = inl(ha, bi) (when b ∈ B)
• continuations T A = R^(R^A)
tA,B(a, c) = (λk: R^(A×B).c(λb: B.k(ha, bi)))
• interactive input T A = (µγ.A + γ^U)
tA,B(a, c) is the tree obtained by replacing leaves of c labelled by b with the leaf labelled by ha, bi
• interactive output T A = (µγ.A + (U × γ))
tA,B(a, hs, bi) = hs, ha, bii.

Remark 3.6 The tensorial strength t induces a natural transformation ψA,B from T A × T B to
T (A × B), namely
ψA,B = cT A,T B ; tT B,A ; (cT B,A ; tA,B )∗
where c is the natural isomorphism cA,B : A × B → B × A.
The morphism ψA,B has the correct domain and codomain to interpret the pairing of a com-
putation of type A with one of type B, obtained by first evaluating the first argument and then
the second, namely
ψA,B : c1: T A, c2: T B 7−→ (let x⇐c1 in (let y⇐c2 in [hx, yi])): T (A × B)

There is also a dual notion of pairing, ψ̃A,B = cT A,T B ; ψB,A ; T cB,A (see [Koc72]), which amounts
to first evaluating the second argument and then the first.
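
In the category of sets, and likewise in Haskell where the ambient λ-abstraction is available, every monad carries this strength, and the two pairings are definable uniformly; a sketch:

-- Sketch: the strength t and the pairings psi / psi-tilde of Remark 3.6
-- for an arbitrary Haskell Monad.
strength :: Monad m => (a, m b) -> m (a, b)
strength (a, c) = c >>= \y -> return (a, y)   -- (let y <= c in [<a, y>])

psi :: Monad m => (m a, m b) -> m (a, b)      -- first argument first
psi (c1, c2) = c1 >>= \x -> c2 >>= \y -> return (x, y)

psiTilde :: Monad m => (m a, m b) -> m (a, b) -- second argument first
psiTilde (c1, c2) = c2 >>= \y -> c1 >>= \x -> return (x, y)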

3.1 Interpretation and formal system


We are now in a position to give the metalanguage for algebraic terms, its interpretation and
inference rules.
Definition 3.7 (metalanguage) An interpretation [[ ]] of the metalanguage in a category C with terminal object !A: A → 1, binary products πi^{A1,A2}: A1 × A2 → Ai and a strong monad (T, η, µ, t) is parametric in an interpretation of the symbols in the signature and is defined by induction on the derivation of well-formedness for types (see Table 8), terms and equations (see Table 9).
Finite products πi^{A1,...,An}: A1 × . . . × An → Ai, used to interpret contexts and variables, are defined by induction on n:
• n = 0: A1 × . . . × A0 = 1
• n + 1: A1 × . . . × An+1 = (A1 × . . . × An) × An+1, with projections
– πn+1^{A1,...,An+1} = π2^{(A1×...×An),An+1}
– πi^{A1,...,An+1} = π1^{(A1×...×An),An+1}; πi^{A1,...,An} for 1 ≤ i ≤ n
The inference rules for the metalanguage (see Table 10) are divided into three groups:
• general rules for many sorted equational logic
• rules for finite products
• rules for T

RULE SYNTAX SEMANTICS
A
`ml A type = [[A]]

T
`ml τ type = c
`ml T τ type = Tc

1
`ml 1 type = 1

×
`ml τ1 type = c1
`ml τ2 type = c2
`ml τ1 × τ2 type = c 1 × c2


`ml τi type (1 ≤ i ≤ n) = ci
x1: τ1, . . . , xn: τn ` = c1 × . . . × cn

Table 8: Interpretation of types in the Metalanguage

Proposition 3.8 Every theory T of the metalanguage, viewed as a category F(T ), is equipped
with finite products and a strong monad whose tensorial strength is

tτ1 ,τ2 = [x: τ1 × T τ2 `ml (letT x2 ⇐π2 x in [hπ1 x, x2 i]T ): T (τ1 × τ2 )]T

Proof Similar to that of Proposition 2.5

Once we have a metalanguage for algebraic terms it is straightforward to add data-types characterised by universal properties and extend the categorical semantics accordingly⁵. For instance, if we want to have function spaces, then we simply require the category C (where the metalanguage is interpreted) to have exponentials B^A and add the inference rules for the simply typed λ-calculus (see Table 11) to those for the metalanguage. From a programming language perspective the situation is more delicate. For instance, the semantics of functional types should reflect the choice of calling mechanism⁶:
• in call-by-value a procedure of type A → B expects a value of type A and computes a result of type B, so the interpretation of A → B is (T B)^A;
• in call-by-name a procedure of type A → B expects a computation of type A, which is evaluated only when needed, and computes a result of type B, so the interpretation of A → B is (T B)^(T A).
In both cases the only exponentials needed to interpret the functional types of a programming language are of the form (T B)^A. By analogy with partial cartesian closed categories (pccc), where only p-exponentials are required to exist (see [Mog86, Ros86]), we adopt the following definition of λc-model:

5 The next difficult step in extending the metalanguage is the combination of dependent types and computations,

which is currently under investigation.


6 call-by-need does not have a simple categorical semantics, since the environment in which an expression is

evaluated may itself undergo evaluation.

RULE SYNTAX SEMANTICS
vari
`ml τi type (1 ≤ i ≤ n) = ci
x1: τ1, . . . , xn: τn ` xi: τi = πi^{c1,...,cn}


Γ ` ∗: 1 = ![[Γ]]

hi
Γ ` e 1 : τ1 = g1
Γ ` e 2 : τ2 = g2
Γ ` he1 , e2 i: τ1 × τ2 = hg1 , g2 i

πi
Γ ` e: τ1 × τ2 = g
[[τ ]],[[τ ]]
Γ ` πi (e): τ1 = g; πi 1 2

f: τ1 → τ2
Γ `ml e1 : τ1 = g
Γ `ml f(e1 ): τ2 = g; [[f]]

[ ]T
Γ `ml e: τ = g
Γ `ml [e]T : T τ = g; η[[τ ]]

let
Γ `ml e1 : T τ1 = g1
Γ, x: τ1 `ml e2 : T τ2 = g2
Γ `ml (letT x⇐e1 in e2 ): T τ2 = hid[[Γ]] , g1 i; t[[Γ]],[[τ1]] ; g2∗

eq
Γ `ml e1 : τ = g1
Γ `ml e2 : τ = g2
Γ `ml e1 =τ e2 ⇐⇒ g1 = g2

Table 9: Interpretation of terms in the Metalanguage

Γ ` e: τ
refl
Γ ` e =τ e
Γ ` e 1 =τ e2
symm
Γ ` e 2 =τ e1
Γ ` e 1 =τ e2 Γ ` e 2 =τ e3
trans
Γ ` e1 =τ e3
Γ ` e 1 = τ1 e 2
congr f: τ1 → τ2
Γ ` f(e1 ) =τ2 f(e2 )
Γ ` e: τ Γ, x: τ ` φ
subst
Γ ` [e/x]φ
Inference Rules of Many Sorted Equational Logic
1.η Γ ` ∗ =1 x
Γ ` e1 =τ1 e′1 Γ ` e2 =τ2 e′2
hi.ξ
Γ ` he1, e2i =τ1×τ2 he′1, e′2i
Γ ` e 1 : τ1 Γ ` e 2 : τ2
×.β
Γ ` πi (he1 , e2 i) =τi ei
Γ ` e: τ1 × τ2
×.η
Γ ` hπ1 (e), π2 (e)i =τ1 ×τ2 e
rules for product types
Γ `ml e1 =τ e2
[ ].ξ
Γ `ml [e1 ]T =T τ [e2 ]T
Γ `ml e1 =T τ1 e2 Γ, x: τ1 `ml e′1 =T τ2 e′2
let.ξ
Γ `ml (letT x⇐e1 in e′1) =T τ2 (letT x⇐e2 in e′2)
Γ `ml e1 : T τ1 Γ, x1 : τ1 `ml e2 : T τ2 Γ, x2 : τ2 `ml e3 : T τ3
ass
Γ `ml (letT x2 ⇐(letT x1 ⇐e1 in e2 ) in e3 ) =T τ3 (letT x1 ⇐e1 in (letT x2 ⇐e2 in e3 ))
Γ `ml e1 : τ1 Γ, x1 : τ1 `ml e2 : T τ2
T.β
Γ `ml (letT x1 ⇐[e1 ]T in e2 ) =T τ2 [e1 /x1 ]e2
Γ `ml e1 : T τ1
T.η
Γ `ml (letT x1 ⇐e1 in [x1 ]T ) =T τ1 e1

Table 10: Inference Rules of the Metalanguage

Γ ` e1 =τ1 e′1 Γ ` e =τ1→τ2 e′
app.ξ
Γ ` ee1 =τ2 e′e′1

Γ, x: τ1 ` e1 =τ2 e2
λ.ξ
Γ ` (λx: τ1 .e1 ) =τ1 →τ2 (λx: τ1 .e2 )
Γ ` e 1 : τ1 Γ, x: τ1 ` e2 : τ2
→ .β
Γ ` (λx: τ1 .e2 )e1 =τ2 [e1 /x]e2
Γ ` e: τ1 → τ2
→ .η x ∉ DV(Γ)
Γ ` (λx: τ1 .ex) =τ1 →τ2 e

Table 11: rules for function spaces

Definition 3.9 A λc-model is a category C with finite products, a strong monad (T, η, µ, t) satisfying the mono requirement (i.e. ηA mono for every A ∈ C) and T-exponentials (T B)^A for every A, B ∈ C.

Remark 3.10 The definition of λc-model generalises that of pccc, in the sense that every pccc can be viewed as a λc-model. By analogy with p-exponentials, a T-exponential can be defined by giving an isomorphism CT(C × A, B) ≅ C(C, (T B)^A) natural in C ∈ C. We refer to [Mog89c] for the interpretation of a call-by-value programming language in a λc-model and the corresponding formal system, the λc-calculus.
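
The difference between the two calling mechanisms is thus visible in the domain of the exponential alone; as Haskell type synonyms (our shorthand):

-- Sketch: call-by-value vs call-by-name readings of A -> B over a monad t.
type CBV t a b = a   -> t b   -- (T B)^A    : the argument is a value
type CBN t a b = t a -> t b   -- (T B)^(T A): the argument is a computation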

4 Strong monads over a topos


In this section we show that, as far as monads or strong monads are concerned, we can assume
w.l.o.g. that they are over a topos (see Theorem 4.9). The proof of Theorem 4.9 involves non-
elementary notions from Category Theory, and we postpone it until after discussing some applications,
with particular emphasis on further extensions of the metalanguage and on conservative extension
results.
Let us take as formal system for toposes the type theory described in [LS86]; this is a many sorted intuitionistic higher order logic with equality and with a set of types satisfying the following closure properties⁷:
• the terminal object 1, the natural number object N and the subobject classifier Ω are types
• if A is a type, then the power object P A is a type
• if A and B are types, then the binary product A × B and the function space A → B are
types
• if A is a type and φ: A → Ω is a predicate, then {x ∈ A|φ(x)} is a type.

Notation 4.1 We introduce some notational conventions for formal systems:


• MLT is the metalanguage for algebraic terms, whose set of types is closed under terminal
object, binary products and T A;
• λMLT is the extension of MLT with function spaces A → B (interpreted as exponentials);
• HMLT is the type theory described above (see [LS86]) extended with objects of computations
T A;
• PL is the programming language for algebraic terms (see [Mog89c]);
• λc PL is the extension of PL with function spaces A * B (interpreted as T -exponentials),
called λc -calculus in [Mog89c].

Definition 4.2 We say that a formal system (L2, `2), where `2 ⊆ P(L2) × L2 is a formal consequence relation⁸ over L2, is a conservative extension of (L1, `1) provided L1 ⊆ L2 and `1 is the restriction of `2 to P(L1) × L1.

Theorem 4.3 HMLT is a conservative extension of MLT and λMLT . In particular λMLT is a
conservative extension of MLT .

⁷ Lambek and Scott do not require closure under function spaces and subsets {x ∈ A | φ(x)}.
⁸ For instance, in the case of MLT the elements of L are well-formed equality judgements Γ `ml e1 =τ e2, and P ` C iff there exists a derivation of C where all assumptions are in P.

Proof The first result follows from Theorem 4.9, which implies that for every model C of MLT the Yoneda embedding maps the interpretation of an MLT-term in C to its interpretation in Ĉ, and from the faithfulness of the Yoneda embedding, which implies that two MLT-terms have the same interpretation in C iff they have the same interpretation in Ĉ. The second result follows because the Yoneda embedding preserves function spaces. The third conservative extension result follows immediately from the first two.

The above result means that we can think of computations naively in terms of sets and func-
tions, provided we treat them intuitionistically, and can use the full apparatus of higher-order
(intuitionistic) logic instead of the less expressive many sorted equational logic.
Before giving a conservative extension result for the programming language, we have to express
the mono requirement, equivalence and existence in HMLT . The idea is to extend the translation
from PL-terms to MLT -terms given in Definition 2.7 and exploit the increased expressiveness of
HMLT over MLT to axiomatise the mono requirement and translate existence and equivalence
assertions (see Remark 2.1):
• the mono requirement for τ , i.e. ητ is mono, is axiomatised by

mono.τ (∀x, y: τ.[x]T =T τ [y]T → x =τ y)

• the equalising requirement for τ , i.e. ητ is the equaliser of T (ητ ) and ηT τ , is axiomatised
by (mono.τ ) and the axiom

eqls.τ (∀x: T τ.[x]T =T²τ (letT y⇐x in [[y]T]T) → (∃!y: τ.x =T τ [y]T))


• the translation is extended to assertions and functional types as follows:
– (e1 ≡τ e2)◦ ≜ e1◦ =T τ e2◦
– (e1 ↓τ)◦ ≜ (∃!x: τ.e1◦ =T τ [x]T)
– (τ1 * τ2)◦ ≜ τ1◦ → T τ2◦

Theorem 4.4 HMLT +{(mono.τ )| τ type of PL} (i.e. τ is built using only base types, 1, T A, and
A × B) is a conservative extension of PL (after translation). Similarly, HMLT + {(mono.τ)| τ type of λcPL}
(i.e. τ is built using only base types, 1, T A, A × B and A → B) is a conservative extension of
λc PL (after translation).

Proof The proof proceeds as in the previous theorem. The only additional step is to show that for every type τ of PL (or λcPL) the axiom (mono.τ) holds in Ĉ, under the assumption that C satisfies the mono requirement. Let c be the interpretation of τ in C (therefore Yc is the interpretation of τ in Ĉ); then the axiom (mono.τ) holds in Ĉ provided η̂Yc is a mono. ηc is mono (by the mono requirement), so η̂Yc = Y(ηc) is mono (as Y preserves monos).

In the theorem above only types from the programming language have to satisfy the mono require-
ment. Indeed, HMLT + {(mono.τ )| τ type of HMLT } is not a conservative extension of PL (or
λc PL).
Lemma 4.5 If (T, η, µ) is a monad over a topos C satisfying the mono requirement, then it satisfies
also the equalising requirement.

Proof See Lemma 6 on page 110 of [BW85].

In other words, for any type τ the axiom (eqls.τ) is derivable in HMLT from the set of axioms {(mono.τ)| τ type of HMLT}. In general, when C is not a topos, the mono requirement does not entail the equalising requirement; one can easily define strong monads (over a Heyting algebra) that satisfy the mono but not the equalising requirement (just take T(A) = A ∨ B, for some element B ≠ ⊥ of the Heyting algebra). In terms of formal consequence relation this means that in HMLT + mono requirement the existence assertion Γ `pl e ↓τ is derivable from Γ `pl [e] ≡T τ (let x⇐e in [x]), while such derivation is not possible in λcPL. We do not know whether HMLT + equalising requirement is a conservative extension of PL + equalising requirement, or whether λcPL is a conservative extension of PL.
A language which combines computations and higher order logic, like HMLT, seems to be the ideal framework for program logics that go beyond proving equivalence of programs, like Hoare's logic for partial correctness of imperative languages. In HMLT (as well as MLT and PL) one can describe a programming language by introducing additional constants and axioms. In λMLT or
λc PL such constants correspond to program-constructors, for instance:
• lookup: L → T U, which given a location l ∈ L produces the value of that location in the
current store, and update: L × U → T 1, which changes the current store by assigning to l ∈ L
the value u ∈ U ;
• if : Bool × T A × T A → T A and while: T (Bool) × T 1 → T 1;
• new: 1 → T L, which returns a newly created location;
• read: 1 → T U , which computes a value by reading it from the input, and write: U → T 1,
which writes a value u ∈ U on the output.
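
For the side-effect monad the store constants have an evident implementation. The following Haskell sketch is our own: the store S is a finite map from locations to values, and the concrete types for L and U, as well as the default value 0 for unallocated locations, are simplifying assumptions:

import qualified Data.Map as Map

-- Sketch: lookup and update for the side-effect monad T A = (A x S)^S.
type L = String             -- locations (an assumption of this sketch)
type U = Int                -- storable values (likewise an assumption)
type S = Map.Map L U        -- stores

newtype St a = St { runSt :: S -> (a, S) }

instance Functor St where
  fmap f (St c) = St (\s -> let (a, s') = c s in (f a, s'))
instance Applicative St where
  pure a = St (\s -> (a, s))
  St cf <*> St ca = St (\s -> let (f, s')  = cf s
                                  (a, s'') = ca s'
                              in (f a, s''))
instance Monad St where
  St c >>= f = St (\s -> let (a, s') = c s in runSt (f a) s')

lookupL :: L -> St U        -- lookup : L -> T U
lookupL l = St (\s -> (Map.findWithDefault 0 l s, s))

update :: L -> U -> St ()   -- update : L x U -> T 1
update l u = St (\s -> ((), Map.insert l u s))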
In HMLT one can describe also a program logic, by adding constants p: T A → Ω corresponding to
properties of computations.
Example 4.6 Let T be the monad for non-deterministic computations (see Example 1.4), then we can define a predicate may: A × T A → Ω such that may(a, c) is true iff the value a is a possible outcome of the computation c (i.e. a ∈ c). However, there is a more uniform way of defining the may predicate for any type. Let ◇: T Ω → Ω be the predicate such that ◇(X) = ⊤ iff ⊤ ∈ X, where Ω is the set {⊥, ⊤} (note that ◇( ) = may(⊤, )). Then, may(a, c) can be defined as ◇(letT x⇐c in [a =τ x]T).
The previous example suggests that predicates defined uniformly on computations of any type
can be better described in terms of modal operators γ: T Ω → Ω, relating a computation of truth
values to a truth value. This possibility has not been investigated in depth, so we will give only a
tentative definition.
Definition 4.7 If (T, η, µ) is a monad over a topos C, then a T-modal operator is a T-algebra
γ: TΩ → Ω, i.e. a morphism satisfying the two commuting diagrams (written here as equations)

    γ ∘ ηΩ = idΩ    and    γ ∘ µΩ = γ ∘ Tγ

where Ω is the subobject classifier in C.
The commutativity of the two diagrams above can be expressed in the metalanguage:
• x: Ω ⊢ γ([x]T) ↔ x
• c: T²Ω ⊢ γ(let x⇐c in x) ↔ γ(let x⇐c in [γ(x)]T)
We consider some examples and non-examples of modal operators.
Example 4.8 For the monad T of non-deterministic computations (see Example 1.4) there are
only two modal operators □ and ◇:
• □(X) = ⊥ iff ⊥ ∈ X;
• ◇(X) = ⊤ iff ⊤ ∈ X.
Given a nondeterministic computation e of type τ and a predicate A(x) over τ, i.e. a term of type
Ω, then □(letT x⇐e in [A(x)]T) is true iff all possible results of e satisfy A(x).
For the monad T of computations with side-effects (see Example 1.4) there is an operator
□: (Ω × S)^S → Ω that can be used to express Hoare's triples:
• □f = ⊤ iff for all s ∈ S there exists s′ ∈ S s.t. f s = ⟨⊤, s′⟩
This operator does not satisfy the second equivalence, as only one direction is valid, namely
c: T²Ω ⊢ γ(let x⇐c in [γ(x)]T) → γ(let x⇐c in x)
Let P: U → Ω and Q: U × U → Ω be predicates over storable values, e ∈ T1 a computation of type
1 and x, y ∈ L locations. The intended meaning of the triple {P(x)}e{Q(x, y)} is "if in the initial
state the content u of x satisfies P(u), then in the final state (i.e. after executing e) the content
v of y satisfies Q(u, v)". This intended meaning can be expressed formally in terms of the modal
operator □ and the program-constructors lookup and update as follows:

    ∀u: U. P(u) → □(letT v⇐(update(x, u); e; lookup(y)) in [Q(u, v)]T)

where ; : TA × TB → TB is the derived operation e1; e2 ≡ (letT x⇐e1 in e2) with x not free in e2.
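Continuing the OCaml sketch of the side-effects monad above (our encoding): since the universal quantification over all stores is not computable, the sketch checks the triple at one given initial store:

    (* box_at s f approximates "box f" at a single store s: f s yields true *)
    let box_at (s : store) (f : bool t) : bool = fst (f s)

    (* {P(x)} e {Q(x,y)} checked at store s with initial content u for x *)
    let triple_at (s : store) (p : int -> bool) (q : int -> int -> bool)
        (e : unit t) (x : int) (y : int) (u : int) : bool =
      (not (p u))
      || box_at s
           (let* () = update (x, u) in
            let* () = e in
            let* v = lookup y in
            return (q u v))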
Finally, we state the main theorem and outline its proof. In doing so we assume that the reader
is familiar with non-elementary concepts from Category Theory.
Theorem 4.9 Let C be a small category, Ĉ the topos of presheaves over C and Y the Yoneda
embedding of C into Ĉ. Then for every monad (T, η, µ) over C, there exists a monad (T̂, η̂, µ̂) over
Ĉ such that the following diagram commutes⁹

           T
      C -------> C
      |          |
      Y          Y
      ↓          ↓
      Ĉ -------> Ĉ
           T̂

and for all a ∈ C the following equations hold:

    η̂Ya = Y(ηa) ,  µ̂Ya = Y(µa)

Moreover, for every strong monad (T, η, µ, t) over C, there exists a natural transformation t̂ such
that (T̂, η̂, µ̂, t̂) is a strong monad over Ĉ and for all a, b ∈ C the following equation holds:

    t̂Ya,Yb = Y(ta,b)

where we have implicitly assumed that the Yoneda embedding preserves finite products on the nose,
i.e. the following diagrams commute

           1            ×
      1 -------> C <--------- C × C
       \         |              |
        \ 1      Y            Y × Y
         \       ↓              ↓
          `----> Ĉ <--------- Ĉ × Ĉ
                        ×̂

and for all a, b ∈ C the following equations hold:

    !Ya = Y(!a) ,  πiYa,Yb = Y(πia,b)

⁹ This is a simplifying assumption. For our purposes it would be enough to have a natural isomorphism σ: T; Y →
Y; T̂, but then the remaining equations have to be patched. For instance, the equation relating η and η̂ would become
η̂Ya = Y(ηa); σa.
Definition 4.10 ([Mac71]) Let T: C → D be a functor between two small categories and A a
cocomplete category. Then, the left Kan extension L^A_T : A^C → A^D is the left adjoint of A^T and
can be defined as follows:

    L^A_T(F)(d) = Colim^A_{T↓d}(π; F)

where F: C → A, d ∈ D, T↓d is the comma category whose objects are pairs ⟨c ∈ C, f: Tc → d⟩,
π: T↓d → C is the projection functor (mapping a pair ⟨c, f⟩ to c) and Colim^A_I : A^I → A (with I a
small category) is the functor mapping an I-diagram in A to its colimit.
The following proposition is a 2-categorical reformulation of Theorem 1.3.10 of [MR77]. For the
sake of simplicity, we use the strict notions of 2-functor and 2-natural transformation, although we
should have used pseudo-functors and pseudo-natural transformations.
Proposition 4.11 Let Cat be the 2-category of small categories, CAT the 2-category of locally
small categories and _ : Cat → CAT the inclusion 2-functor. Then, the following ˆ : Cat → CAT
is a 2-functor:
• if C is a small category, then Ĉ is the topos of presheaves Set^(C^op);
• if T: C → D is a functor, then T̂ is the left Kan extension L^Set_(T^op);
• if σ: S → T: C → D is a natural transformation and F ∈ Ĉ, then σ̂F is the natural transfor-
mation corresponding to idT̂F via the following sequence of steps:

    Ĉ(F, T^op; T̂F)  ≅  D̂(T̂F, T̂F)
         |
         |  Ĉ(F, σ^op; T̂F)
         ↓
    Ĉ(F, S^op; T̂F)  ≅  D̂(ŜF, T̂F)

Moreover, Y: _ → ˆ is a 2-natural transformation.
Since monads are a 2-categorical concept (see [Str72]), the 2-functor ˆ maps monads in Cat to
monads in CAT. Then, the statement of Theorem 4.9 about lifting of monads follows immediately
from Proposition 4.11. It remains to define the lifting t̂ of a tensorial strength t for a monad (T, η, µ)
over a small category C.
Proposition 4.12 If C is a small category with finite products and T is an endofunctor over
C, then for every natural transformation ta,b: a × Tb → T(a × b) there exists a unique natural
transformation t̂F,G: F × T̂G → T̂(F × G) s.t. t̂Ya,Yb = Y(ta,b) for all a, b ∈ C.

Proof Every F ∈ Ĉ is isomorphic to the colimit Colim_{Y↓F}(π; Y) (shortly Colim_i Yi), where Y is
the Yoneda embedding of C into Ĉ. Similarly G is isomorphic to Colim_j Yj. Both functors (_ × T̂_)
and T̂(_ × _) from Ĉ × Ĉ to Ĉ preserve colimits (as T̂ and _ × F are left adjoints) and commute
with the Yoneda embedding (as Y(a × b) = Ya × Yb and T̂(Ya) = Y(Ta)). Therefore, F × T̂G and
T̂(F × G) are isomorphic to the colimits Colim_{i,j} Yi × T̂(Yj) and Colim_{i,j} T̂(Yi × Yj) respectively.
Let t̂ be the natural transformation we are looking for; then the square

                  Y(ti,j)
    Yi × T̂(Yj) ------------> T̂(Yi × Yj)
        |                         |
    f × T̂g                    T̂(f × g)
        ↓                         ↓
    F × T̂(G) --------------> T̂(F × G)
                   t̂F,G

commutes for all f: Yi → F and g: Yj → G (by naturality of t̂ and t̂Yi,Yj = Y(ti,j)). But there exists
exactly one morphism t̂F,G making the diagram above commute, as ⟨ti,j | i, j⟩ is a morphism between
diagrams in Ĉ of the same shape, and these diagrams have colimit cones ⟨f × T̂g | f, g⟩ and ⟨T̂(f ×
g) | f, g⟩ respectively.
Remark 4.13 If T is a monad of partial computations, i.e. it is induced by a dominion M on C
s.t. P(C, M)(a, b) ≅ C(a, Tb), then the lifting T̂ is the monad of partial computations induced by
the dominion M̂ on Ĉ, obtained by lifting M to the topos of presheaves, as described in [Ros86].
For other monads, however, the lifting is not the expected one. For instance, if T is the monad
of side-effects (_ × S)^S, then T̂ is not (in general) the endofunctor (_ × YS)^(YS) on the topos of
presheaves.

Conclusions and further research


The main contribution of this paper is the category-theoretic semantics of computations and the
general principle for extending it to more complex languages (see Remark 3.3 and Section 4), while
the formal systems presented are a straightforward fallout, easy to understand and relate to other
calculi.
Our work is just an example of what can be achieved in the study of programming languages
by using a category-theoretic methodology, which avoids irrelevant syntactic detail and focuses in-
stead on the important structures underlying programming languages. We believe that there is a
great potential to be exploited here. Indeed, in [Mog89b] we give a categorical account of phase
distinction and program modules, which could lead to the introduction of higher order modules in
programming languages like ADA or ML (see [HMM90]), while in [Mog89a] we propose a "modular
approach" to Denotational Semantics based on the idea of monad-constructor (i.e. an endofunctor
on the category of monads over a category C).
The metalanguage also opens the possibility of developing a new Logic of Computable Functions
(see [Sco69]), based on an abstract semantics of computations rather than domain theory, for
studying axiomatically different notions of computation and their relations. Some recent work by
Crole and Pitts (see [CP90]) has considered an extension of the metalanguage equipped with a logic
for inductive predicates, which goes beyond equational reasoning. A more ambitious goal would
be to try exploiting the capabilities offered by higher-order logic in order to give a uniform account
of various program logics, based on the idea of “T -modal operator” (see Definition 4.7).
The semantics of computations corroborates the view that (constructive) proofs and programs
are rather unrelated, although both of them can be understood in terms of functions. Indeed,
monads (and comonads) used to model logical modalities, e.g. possibility and necessity in modal
logic or why not and of course of linear logic, usually do not have a tensorial strength. In general,
one should expect types suggested by logic to provide a more fine-grained type system without
changing the nature of computations.
We have identified monads as important to model notions of computations, but computational
monads seem to have additional properties, e.g. they have a tensorial strength and may satisfy the
mono requirement. It is likely that there are other properties of computational monads still to be
identified, and there is no reason to believe that such properties have to be found in the literature
on monads.

Acknowledgements
I have to thank many people for advice, suggestions and criticisms, in particular: R. Amadio, R.
Burstall, M. Felleisen, R. Harper, F. Honsell, M. Hyland, B. Jay, A. Kock, Y. Lafont, G. Longo,
R. Milner, A. Pitts, G. Plotkin, J. Power and C. Talcott.

References
[BW85] M. Barr and C. Wells. Toposes, Triples and Theories. Springer Verlag, 1985.

[CP90] R.L. Crole and A.M. Pitts. New foundations for fixpoint computations. In 4th LICS
Conf. IEEE, 1990.
[CS87] R.L. Constable and S.F. Smith. Partial objects in constructive type theory. In 2nd
LICS Conf. IEEE, 1987.
[CS88] R.L. Constable and S.F. Smith. Computational foundations of basic recursive function
theory. In 3rd LICS Conf. IEEE, 1988.
[FF89] M. Felleisen and D.P. Friedman. A syntactic theory of sequential state. Theoretical
Computer Science, 69(3), 1989.
[FFKD86] M. Felleisen, D.P. Friedman, E. Kohlbecker, and B. Duba. Reasoning with continua-
tions. In 1st LICS Conf. IEEE, 1986.
[Fou77] M.P. Fourman. The logic of topoi. In J. Barwise, editor, Handbook of Mathematical
Logic, volume 90 of Studies in Logic. North Holland, 1977.
[GMW79] M.J.C. Gordon, R. Milner, and C.P. Wadsworth. Edinburgh LCF: A Mechanized Logic
of Computation, volume 78 of Lecture Notes in Computer Science. Springer Verlag,
1979.
[GS89] C. Gunter and S. Scott. Semantic domains. Technical Report MS-CIS-89-16, Dept.
of Comp. and Inf. Science, Univ. of Pennsylvania, 1989. to appear in North Holland
Handbook of Theoretical Computer Science.
[HMM90] R. Harper, J. Mitchell, and E. Moggi. Higher-order modules and the phase distinction.
In 17th POPL. ACM, 1990.
[HP87] J.M.E. Hyland and A.M. Pitts. The theory of constructions: Categorical semantics and
topos-theoretic models. In Proc. AMS Conf. on Categories in Comp. Sci. and Logic
(Boulder 1987), 1987.
[JP78] P.T. Johnstone and R. Pare, editors. Indexed Categories and their Applications, volume
661 of Lecture Notes in Mathematics. Springer Verlag, 1978.
[Kel82] G.M. Kelly. Basic Concepts of Enriched Category Theory. Cambridge University Press,
1982.
[Koc72] A. Kock. Strong functors and monoidal monads. Archiv der Mathematik, 23, 1972.
[KR77] A. Kock and G.E. Reyes. Doctrines in categorical logic. In J. Barwise, editor, Handbook
of Mathematical Logic, volume 90 of Studies in Logic. North Holland, 1977.
[Laf88] Y. Lafont. The linear abstract machine. Theoretical Computer Science, 59, 1988.
[LS86] J. Lambek and P.J. Scott. Introduction to Higher-Order Categorical Logic, volume 7 of
Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1986.
[Mac71] S. MacLane. Categories for the Working Mathematician. Springer Verlag, 1971.
[Man76] E. Manes. Algebraic Theories, volume 26 of Graduate Texts in Mathematics. Springer
Verlag, 1976.
[Mas88] I.A. Mason. Verification of programs that destructively manipulate data. Science of
Computer Programming, 10, 1988.
[Mog86] E. Moggi. Categories of partial morphisms and the partial lambda-calculus. In Pro-
ceedings Workshop on Category Theory and Computer Programming, Guildford 1985,
volume 240 of Lecture Notes in Computer Science. Springer Verlag, 1986.

[Mog88] E. Moggi. The Partial Lambda-Calculus. PhD thesis, University of Edinburgh, 1988.
[Mog89a] E. Moggi. An abstract view of programming languages. Technical Report ECS-LFCS-
90-113, Edinburgh Univ., Dept. of Comp. Sci., 1989. Lecture Notes for course CS 359,
Stanford Univ.
[Mog89b] E. Moggi. A category-theoretic account of program modules. In Proceedings of the
Conference on Category Theory and Computer Science, Manchester, UK, Sept. 1989,
volume 389 of Lecture Notes in Computer Science. Springer Verlag, 1989.
[Mog89c] E. Moggi. Computational lambda-calculus and monads. In 4th LICS Conf. IEEE, 1989.
[Mos89] P. Mosses. Denotational semantics. Technical Report MS-CIS-89-16, Dept. of Comp.
and Inf. Science, Univ. of Pennsylvania, 1989. to appear in North Holland Handbook
of Theoretical Computer Science.
[MR77] M. Makkai and G. Reyes. First Order Categorical Logic. Springer Verlag, 1977.
[MT89a] I. Mason and C. Talcott. Programming, transforming, and proving with function ab-
stractions and memories. In 16th Colloquium on Automata, Languages and Program-
ming. EATCS, 1989.
[MT89b] I. Mason and C. Talcott. A sound and complete axiomatization of operational equiva-
lence of programs with memory. In POPL 89. ACM, 1989.
[Plo75] G.D. Plotkin. Call-by-name, call-by-value and the λ-calculus. Theoretical Computer
Science, 1, 1975.
[Plo85] G.D. Plotkin. Denotational semantics with partial functions. Lecture Notes at C.S.L.I.
Summer School, 1985.
[Ros86] G. Rosolini. Continuity and Effectiveness in Topoi. PhD thesis, University of Oxford,
1986.
[Sch86] D.A. Schmidt. Denotational Semantics: a Methodology for Language Development.
Allyn & Bacon, 1986.
[Sco69] D.S. Scott. A type-theoretic alternative to CUCH, ISWIM, OWHY. Oxford notes,
1969.
[Sco79] D.S. Scott. Identity and existence in intuitionistic logic. In M.P. Fourman, C.J. Mul-
vey, and D.S. Scott, editors, Applications of Sheaves, volume 753 of Lecture Notes in
Mathematics. Springer Verlag, 1979.
[Sco80] D.S. Scott. Relating theories of the λ-calculus. In R. Hindley and J. Seldin, editors, To
H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism. Academic
Press, 1980.
[See87] R.A.G. Seely. Linear logic, ∗-autonomous categories and cofree coalgebras. In Proc.
AMS Conf. on Categories in Comp. Sci. and Logic (Boulder 1987), 1987.
[Sha84] K. Sharma. Syntactic aspects of the non-deterministic lambda calculus. Master’s thesis,
Washington State University, September 1984. available as internal report CS-84-127
of the comp. sci. dept.
[SP82] M. Smith and G. Plotkin. The category-theoretic solution of recursive domain equa-
tions. SIAM Journal of Computing, 11, 1982.
[Str72] R. Street. The formal theory of monads. Journal of Pure and Applied Algebra, 2, 1972.

Principal type-schemes for functional programs

Luis Damas* and Robin Milner

Edinburgh University

1. Introduction

This paper is concerned with the polymorphic type discipline of ML, which is a general purpose functional programming language, although it was first introduced as a metalanguage (whence its name) for conducting proofs in the LCF proof system [GMW]. The type discipline was studied in [Mil], where it was shown to be semantically sound, in a sense made precise below, but where one important question was left open: does the type-checking algorithm - or more precisely, the type assignment algorithm (since types are assigned by the compiler, and need not be mentioned by the programmer) - find the most general type possible for every expression and declaration? Here we answer the question in the affirmative, for the purely applicative part of ML. It follows immediately that it is decidable whether a program is well-typed, in contrast with the elegant and slightly more permissive type discipline of Coppo [Cop]. After several years of successful use of the language, both in LCF and other research and in teaching to undergraduates, it has become important to answer these questions - particularly because the combination of flexibility (due to polymorphism), robustness (due to semantic soundness) and detection of errors at compile time has proved to be one of the strongest aspects of ML.

The discipline can be well illustrated by a small example. Let us define in ML the function "map", which maps a given function over a given list - that is,

    map f [x1;...;xn] = [f(x1); ...; f(xn)]

The required declaration is

    letrec map f s = if null s then nil
                     else cons(f(hd s)) (map f (tl s))

The type-checker will deduce a type-scheme for "map" from existing type-schemes for "null", "nil", "cons", "hd" and "tl"; the term "type-scheme" is appropriate since all these objects are polymorphic. In fact, from

    null : ∀α(α list → bool)
    nil  : ∀α(α list)
    cons : ∀α(α → (α list → α list))
    hd   : ∀α(α list → α)
    tl   : ∀α(α list → α list)

will be deduced

    map : ∀α∀β((α → β) → (α list → β list)).

* The work of this author is supported by the Portuguese Instituto Nacional de Investigação Científica.
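As a sketch, the same declaration in OCaml (a descendant of ML), where the compiler infers the principal type-scheme with no annotations:

    let rec map f s = if s = [] then [] else f (List.hd s) :: map f (List.tl s)
    (* inferred: val map : ('a -> 'b) -> 'a list -> 'b list,
       i.e. the principal type-scheme ∀α∀β((α → β) → (α list → β list)) *)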

Types are built from type constants (bool, ...) and type variables (α, β, ...) using type operators (such as infixed → for functions and postfixed "list" for lists); a type-scheme is a type with (possibly) quantification over type variables at the outermost.

Thus, the main result of this paper is that the type-scheme deduced for such a declaration (and more generally, for any ML expression) is a principal type-scheme, i.e. that any other type-scheme for the declaration is a generic instance of it. This is a generalisation of Hindley's result for Combinatory Logic [Hin].

ML may be contrasted with ALGOL 68, in which there is no polymorphism, and with Russell [DD], in which parametric types appear explicitly as arguments to polymorphic functions. The generic types of Ada may be compared with type-schemes. For simplicity, our definitions and results here are formulated for a skeletal language, since their extension to ML is a routine matter. For example, recursion is omitted since it can be introduced by simply adding the polymorphic fixed-point operator

    fix : ∀α((α → α) → α)

and likewise for conditional expressions.

2. The language

Assuming a set Id of identifiers x, the language Exp of expressions e is given by the syntax

    e ::= x | e e′ | λx.e | let x = e in e′

(where parentheses may be used to avoid ambiguity). Only the last clause extends the λ-calculus. Indeed, for type checking purposes every let expression could be eliminated (by replacing x by e everywhere in e′), except for the important consideration that in on-line use of ML declarations

    let x = e

are allowed, whose scope (e′) is the remainder of the on-line session. As illustrated in the introduction, it must be possible to assign type-schemes to identifiers thus declared.

Note that types are absent from the language Exp. Assuming a set of type variables α and of primitive types ι, the syntax of types τ and of type-schemes σ is given by

    τ ::= α | ι | τ → τ
    σ ::= τ | ∀α σ

A type-scheme ∀α1...∀αn τ (which we may write ∀α1...αn τ) has generic type variables α1,...,αn. A monotype µ is a type containing no type variables.

3. Type Instantiation

If S is a substitution of types for type variables, often written [τ1/α1, ..., τn/αn] or [τi/αi], and σ is a type-scheme, then Sσ is the type-scheme obtained by replacing each free occurrence of αi in σ by τi, renaming the generic variables of σ if necessary. Then Sσ is called an instance of σ; the notions of substitution and instance extend naturally to larger syntactic constructs containing type-schemes.

By contrast, a type-scheme σ = ∀α1...αm τ has a generic instance σ′ = ∀β1...βn τ′ if τ′ = [τi/αi]τ for some types τ1,...,τm and the βj are not free in σ. In this case we shall write σ > σ′. Note that instantiation acts on free variables, while generic instantiation acts on bound variables. It follows that σ > σ′ implies Sσ > Sσ′.
4. Semantics

The semantic domain V for Exp is a complete partial order satisfying the following equations up to isomorphism, where Bi is a cpo corresponding to primitive type ιi:

    V = B0 + B1 + ... + F + W   (disjoint sum)
    F = V → V                   (function space)
    W = {·}                     (error element)

To each monotype µ corresponds a subset of V, as detailed in [Mil]; if v ∈ V is in the subset for µ, we write v:µ. Further, we write v:τ if v:µ for every monotype instance µ of τ, and we write v:σ if v:τ for every τ which is a generic instance of σ.

Now let Env = Id → V be the domain of environments η. The semantic function ℰ: Exp → Env → V is given in [Mil]. Using it, we wish to attach meaning to assertions of the form

    A ⊨ e:σ

where e ∈ Exp and A is a set of assumptions of the form x:σ′, x ∈ Id. If the assertion is closed, i.e. if A and σ contain no free type variables, then the sentence is said to hold iff, for every environment η, whenever η⟦x⟧:σ′ for each member x:σ′ of A, it follows that ℰ⟦e⟧η:σ. Further, an assertion holds iff all its closed instances hold. Thus, to verify the assertion

    x:α, f:∀β(β → β) ⊨ (f x):α

it is enough to verify it for every monotype µ in place of α. This example illustrates that free type-variables in an assertion are implicitly quantified over the whole assertion, while explicit quantification in a type-scheme has restricted scope.

The remainder of this paper proceeds as follows. First we present an inference system for inferring valid assertions. Next we present an algorithm W for computing a type-scheme for any expression, under assumptions A. We then show that W is sound, in the sense that any type-scheme which it yields is derivable in the inference system. Finally we show that W is complete, in the sense that any derivable type-scheme is an instance of that computed by W.

5. Type Inference

From now on we shall assume that A contains at most one assumption about each identifier x. Ax stands for the result of removing any assumption about x from A.

For assumptions A, expression e and type-scheme σ we write

    A ⊢ e:σ

if this sentence may be derived from the following inference rules:

    TAUT:  A ⊢ x:σ   (x:σ in A)

    INST:  A ⊢ e:σ
           --------   (σ > σ′)
           A ⊢ e:σ′

    GEN:   A ⊢ e:σ
           ----------   (α not free in A)
           A ⊢ e:∀α σ

    COMB:  A ⊢ e:τ′ → τ    A ⊢ e′:τ′
           --------------------------
           A ⊢ (e e′):τ

    ABS:   Ax ∪ {x:τ′} ⊢ e:τ
           -------------------
           A ⊢ (λx.e):τ′ → τ

    LET:   A ⊢ e:σ    Ax ∪ {x:σ} ⊢ e′:τ
           -----------------------------
           A ⊢ (let x = e in e′):τ
The following example of a derivation is organised as a tree, in which each node follows from those immediately above it by an inference rule (we display the tree linearised, with the rule used for each judgment on the left):

    TAUT:  x:α ⊢ x:α
    ABS:   ⊢ (λx.x) : α → α
    GEN:   ⊢ (λx.x) : ∀α(α → α)
    TAUT:  i:∀α(α→α) ⊢ i : ∀α(α→α)
    INST:  i:∀α(α→α) ⊢ i : (α→α) → (α→α)
    INST:  i:∀α(α→α) ⊢ i : α → α
    COMB:  i:∀α(α→α) ⊢ (i i) : α → α
    LET:   ⊢ (let i = λx.x in i i) : α → α

The following proposition, stating the semantic soundness of inference, can be proved by induction on e.

Proposition 1 (soundness of inference) If A ⊢ e:σ then A ⊨ e:σ.

We will also require later the two following properties of the inference system.

Proposition 2 If S is a substitution and A ⊢ e:σ then SA ⊢ e:Sσ. Moreover if there is a derivation of A ⊢ e:σ of height n then there is also a derivation of SA ⊢ e:Sσ of height less or equal to n.

Proof By induction on n. ∎

Lemma 1 If σ > σ′ and Ax ∪ {x:σ′} ⊢ e:σ0 then also Ax ∪ {x:σ} ⊢ e:σ0.

Proof We construct a derivation of Ax ∪ {x:σ} ⊢ e:σ0 from that of Ax ∪ {x:σ′} ⊢ e:σ0 by substituting each use of TAUT for x:σ′ with x:σ followed by an INST step to derive x:σ′. Note that GEN steps remain valid since if α occurs free in σ then it also occurs free in σ′. ∎

6. The type assignment algorithm W

The type inference system by itself does not provide an easy method for finding, given A and e, a type-scheme σ such that A ⊢ e:σ. We now present an algorithm W for this purpose. In fact, W goes a step further. Given A and e, if W succeeds it finds a substitution S and a type τ, which are most general in a sense to be made precise below, such that

    SA ⊢ e:τ

To define W we require the unification algorithm of Robinson [Rob].

Proposition 3 (Robinson) There is an algorithm U which, given a pair of types, either returns a substitution V or fails; further

(i) If U(τ, τ′) returns V, then V unifies τ and τ′, i.e. Vτ = Vτ′.

(ii) If S unifies τ and τ′, then U(τ, τ′) returns some V and there is another substitution R such that S = RV.

Moreover, V involves only variables in τ and τ′.

We also need to define the closure of a type τ with respect to assumptions A:

    Ā(τ) = ∀α1 ... αn τ

where α1, ..., αn are the type variables occurring free in τ but not in A (for a compound set of assumptions such as SA we write (SA)‾(τ) for this closure).
Algorithm W

W(A, e) = (S, τ) where

(i) If e is x and there is an assumption x : ∀α1 ... αn τ′ in A then S = Id and τ = [βi/αi]τ′ where the βi's are new type variables.

(ii) If e is e1 e2 then let W(A, e1) = (S1, τ1) and W(S1A, e2) = (S2, τ2) and U(S2τ1, τ2 → β) = V where β is new; then S = V S2 S1 and τ = Vβ.

(iii) If e is λx.e1 then let β be a new type variable and W(Ax ∪ {x:β}, e1) = (S1, τ1); then S = S1 and τ = S1β → τ1.

(iv) If e is let x = e1 in e2 then let W(A, e1) = (S1, τ1) and W(S1Ax ∪ {x: (S1A)‾(τ1)}, e2) = (S2, τ2); then S = S2 S1 and τ = τ2.

NOTE: When any of the conditions above is not met W fails.

The following proposition proves that W meets our requirements.

Proposition 4 (Soundness of W) If W(A, e) succeeds with (S, τ) then there is a derivation of SA ⊢ e:τ.

Proof By induction on e using Proposition 2. ∎

It follows that there is also a derivation of SA ⊢ e:(SA)‾(τ). We refer to (SA)‾(τ) as a type-scheme computed by W for e under SA.

7. Completeness of W

Given A and e, we will call σp a principal type-scheme of e under assumptions A iff

(i) A ⊢ e:σp

(ii) any other σ for which A ⊢ e:σ is a generic instance of σp.

Our main result, restricted to the simple case in which A contains no free type-variables, may be stated as follows:

    If A ⊢ e:σ, for some σ, then W computes a principal type-scheme for e under A.

This is a direct corollary of the following general theorem, which is a stronger result suited to inductive proof:

Theorem (Completeness of W). Given A and e, let A′ be an instance of A and σ a type-scheme such that

    A′ ⊢ e:σ

Then (i) W(A, e) succeeds

(ii) If W(A, e) = (S, τ) then, for some substitution R,

    A′ = RSA  and  R(SA)‾(τ) > σ

In fact, from the theorem one also derives as corollaries that it is decidable whether e has any type at all under assumptions A, and that, if so, it has a principal type-scheme under A.

The detailed proofs of results in this paper, and related results, will appear in the first author's forthcoming Ph.D. Thesis.
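A sketch (our reading, not the authors' code) of algorithm W over the datatypes and the unify function above; fresh, instantiate and closure are hypothetical helper names:

    (* new type variables, instantiation of a type-scheme with fresh generic
       variables (case (i)), and the closure Ā(τ) of Section 6 *)
    let fresh =
      let n = ref 0 in
      fun () -> incr n; TVar ("b" ^ string_of_int !n)

    let instantiate (Forall (vars, t)) =
      apply (List.map (fun a -> (a, fresh ())) vars) t

    let rec ftv = function
      | TVar a -> [ a ]
      | TCon _ -> []
      | Arrow (t1, t2) -> ftv t1 @ ftv t2

    let closure (env : (string * scheme) list) (t : ty) : scheme =
      let in_env a =
        List.exists
          (fun (_, Forall (vs, u)) -> List.mem a (ftv u) && not (List.mem a vs))
          env
      in
      Forall (List.sort_uniq compare (List.filter (fun a -> not (in_env a)) (ftv t)), t)

    let apply_env s env =
      (* freshness keeps substitution domains away from quantified variables *)
      List.map (fun (x, Forall (vs, u)) -> (x, Forall (vs, apply s u))) env

    let rec w (env : (string * scheme) list) : exp -> subst * ty = function
      | Var x -> ([], instantiate (List.assoc x env))                  (* (i) *)
      | App (e1, e2) ->                                                (* (ii) *)
          let s1, t1 = w env e1 in
          let s2, t2 = w (apply_env s1 env) e2 in
          let b = fresh () in
          let v = unify (apply s2 t1) (Arrow (t2, b)) in
          (compose v (compose s2 s1), apply v b)
      | Lam (x, e1) ->                                                 (* (iii) *)
          let b = fresh () in
          let s1, t1 = w ((x, Forall ([], b)) :: env) e1 in
          (s1, Arrow (apply s1 b, t1))
      | Let (x, e1, e2) ->                                             (* (iv) *)
          let s1, t1 = w env e1 in
          let env1 = apply_env s1 env in
          let s2, t2 = w ((x, closure env1 t1) :: env1) e2 in
          (compose s2 s1, t2)

For instance, w [] (Let ("i", Lam ("x", Var "x"), App (Var "i", Var "i"))) succeeds with a type of the shape β → β, matching the derivation displayed earlier.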

References

[LNCS n stands for Vol n, Lecture Notes in Computer Science, Springer-Verlag]

[Cop] M. Coppo, An extended polymorphic type system for applicative languages, (1980), LNCS 88, pp 194-204.

[DD] A. Demers and J. Donahue, Report on the programming language Russell, (1979), Report No. TR 79-371, Computer Science Department, Cornell University.

[GMW] M. Gordon, R. Milner and C. Wadsworth, (1979), Edinburgh LCF, LNCS 78.

[Hin] R. Hindley, The principal type-scheme of an object in Combinatory Logic, (1969), Trans. AMS 146, pp 29-60.

[Mil] R. Milner, A theory of type polymorphism in programming, (1978), JCSS 17,3, pp 348-375.

[Rob] J.A. Robinson, A machine-oriented logic based on the resolution principle, JACM 12,1 (1965), pp 23-41.

Recursive Functions of Symbolic Expressions
and Their Computation by Machine, Part I

John McCarthy, Massachusetts Institute of Technology, Cambridge, Mass.
April 1960

1 Introduction
A programming system called LISP (for LISt Processor) has been developed
for the IBM 704 computer by the Artificial Intelligence group at M.I.T. The
system was designed to facilitate experiments with a proposed system called
the Advice Taker, whereby a machine could be instructed to handle declarative
as well as imperative sentences and could exhibit “common sense” in carrying
out its instructions. The original proposal [1] for the Advice Taker was made
in November 1958. The main requirement was a programming system for
manipulating expressions representing formalized declarative and imperative
sentences so that the Advice Taker system could make deductions.
In the course of its development the LISP system went through several
stages of simplification and eventually came to be based on a scheme for rep-
resenting the partial recursive functions of a certain class of symbolic expres-
sions. This representation is independent of the IBM 704 computer, or of any
other electronic computer, and it now seems expedient to expound the system
by starting with the class of expressions called S-expressions and the functions
called S-functions.
Putting this paper in LaTeX partly supported by ARPA (ONR) grant N00014-94-1-0775
to Stanford University, where John McCarthy has been since 1962. Copied with minor nota-
tional changes from CACM, April 1960. If you want the exact typography, look there. Cur-
rent address: John McCarthy, Computer Science Department, Stanford, CA 94305, (email:
jmc@cs.stanford.edu), (URL: http://www-formal.stanford.edu/jmc/ )

In this article, we first describe a formalism for defining functions recur-
sively. We believe this formalism has advantages both as a programming
language and as a vehicle for developing a theory of computation. Next, we
describe S-expressions and S-functions, give some examples, and then describe
the universal S-function apply which plays the theoretical role of a universal
Turing machine and the practical role of an interpreter. Then we describe the
representation of S-expressions in the memory of the IBM 704 by list structures
similar to those used by Newell, Shaw and Simon [2], and the representation
of S-functions by program. Then we mention the main features of the LISP
programming system for the IBM 704. Next comes another way of describ-
ing computations with symbolic expressions, and finally we give a recursive
function interpretation of flow charts.
We hope to describe some of the symbolic computations for which LISP
has been used in another paper, and also to give elsewhere some applications
of our recursive function formalism to mathematical logic and to the problem
of mechanical theorem proving.

2 Functions and Function Definitions


We shall need a number of mathematical ideas and notations concerning func-
tions in general. Most of the ideas are well known, but the notion of conditional
expression is believed to be new¹, and the use of conditional expressions per-
mits functions to be defined recursively in a new and convenient way.

a. Partial Functions. A partial function is a function that is defined only


on part of its domain. Partial functions necessarily arise when functions are
defined by computations because for some values of the arguments the com-
putation defining the value of the function may not terminate. However, some
of our elementary functions will be defined as partial functions.

b. Propositional Expressions and Predicates. A propositional expression is


an expression whose possible values are T (for truth) and F (for falsity). We
shall assume that the reader is familiar with the propositional connectives ∧
(“and”), ∨ (“or”), and ¬ (“not”). Typical propositional expressions are:
¹ reference Kleene

x<y
(x < y) ∧ (b = c)
x is prime

A predicate is a function whose range consists of the truth values T and F.

c. Conditional Expressions. The dependence of truth values on the values


of quantities of other kinds is expressed in mathematics by predicates, and the
dependence of truth values on other truth values by logical connectives. How-
ever, the notations for expressing symbolically the dependence of quantities of
other kinds on truth values is inadequate, so that English words and phrases
are generally used for expressing these dependences in texts that describe other
dependences symbolically. For example, the function |x| is usually defined in
words. Conditional expressions are a device for expressing the dependence of
quantities on propositional quantities. A conditional expression has the form

(p1 → e1 , · · · , pn → en )

where the p’s are propositional expressions and the e’s are expressions of any
kind. It may be read, “If p1 then e1 otherwise if p2 then e2 , · · · , otherwise if
pn then en ,” or “p1 yields e1 , · · · , pn yields en .”²
We now give the rules for determining whether the value of

(p1 → e1 , · · · , pn → en )

is defined, and if so what its value is. Examine the p’s from left to right. If
a p whose value is T is encountered before any p whose value is undefined is
encountered then the value of the conditional expression is the value of the
corresponding e (if this is defined). If any undefined p is encountered before
2
I sent a proposal for conditional expressions to a CACM forum on what should be
included in Algol 60. Because the item was short, the editor demoted it to a letter to the
editor, for which CACM subsequently apologized. The notation given here was rejected for
Algol 60, because it had been decided that no new mathematical notation should be allowed
in Algol 60, and everything new had to be English. The if . . . then . . . else that Algol 60
adopted was suggested by John Backus.

a true p, or if all p’s are false, or if the e corresponding to the first true p is
undefined, then the value of the conditional expression is undefined. We now
give examples.

(1 < 2 → 4, 1 > 2 → 3) = 4

(2 < 1 → 4, 2 > 1 → 3, 2 > 1 → 2) = 3

(2 < 1 → 4, T → 3) = 3

(2 < 1 → 0/0, T → 3) = 3

(2 < 1 → 3, T → 0/0) is undefined

(2 < 1 → 3, 4 < 1 → 4) is undefined


Some of the simplest applications of conditional expressions are in giving
such definitions as

|x| = (x < 0 → −x, T → x)

δij = (i = j → 1, T → 0)

sgn(x) = (x < 0 → −1, x = 0 → 0, T → 1)

d. Recursive Function Definitions. By using conditional expressions we


can, without circularity, define functions by formulas in which the defined
function occurs. For example, we write

n! = (n = 0 → 1, T → n · (n − 1)!)
When we use this formula to evaluate 0! we get the answer 1; because of the
way in which the value of a conditional expression was defined, the meaningless

expression 0 · (0 - 1)! does not arise. The evaluation of 2! according to this
definition proceeds as follows:

2! = (2 = 0 → 1, T → 2 · (2 − 1)!)
   = 2 · 1!
   = 2 · (1 = 0 → 1, T → 1 · (1 − 1)!)
   = 2 · 1 · 0!
   = 2 · 1 · (0 = 0 → 1, T → 0 · (0 − 1)!)
   = 2 · 1 · 1
   = 2

We now give two other applications of recursive function definitions. The


greatest common divisor, gcd(m,n), of two positive integers m and n is com-
puted by means of the Euclidean algorithm. This algorithm is expressed by
the recursive function definition:

gcd(m, n) = (m > n → gcd(n, m), rem(n, m) = 0 → m, T → gcd(rem(n, m), m))

where rem(n, m) denotes the remainder left when n is divided by m.


The Newtonian algorithm for obtaining an approximate square root of a
number a, starting with an initial approximation x and requiring that an
acceptable approximation y satisfy |y² − a| < ε, may be written as

sqrt(a, x, ε) = (|x² − a| < ε → x, T → sqrt(a, ½(x + a/x), ε))
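A sketch of these two recursive definitions in OCaml, with rem rendered as mod (names are ours):

    let rec gcd m n =
      if m > n then gcd n m
      else if n mod m = 0 then m
      else gcd (n mod m) m

    let rec sqrt_approx a x eps =
      if abs_float ((x *. x) -. a) < eps then x
      else sqrt_approx a (0.5 *. (x +. (a /. x))) eps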


The simultaneous recursive definition of several functions is also possible,
and we shall use such definitions if they are required.
There is no guarantee that the computation determined by a recursive
definition will ever terminate and, for example, an attempt to compute n!
from our definition will only succeed if n is a non-negative integer. If the
computation does not terminate, the function must be regarded as undefined
for the given arguments.
The propositional connectives themselves can be defined by conditional
expressions. We write

p∧q = (p → q, T → F )
p∨q = (p → T, T → q)
¬p = (p → F, T → T )
p⊃q = (p → q, T → T )
It is readily seen that the right-hand sides of the equations have the correct
truth tables. If we consider situations in which p or q may be undefined, the
connectives ∧ and ∨ are seen to be noncommutative. For example if p is false
and q is undefined, we see that according to the definitions given above p ∧ q
is false, but q ∧ p is undefined. For our applications this noncommutativity is
desirable, since p ∧ q is computed by first computing p, and if p is false q is not
computed. If the computation for p does not terminate, we never get around
to computing q. We shall use propositional connectives in this sense hereafter.

e. Functions and Forms. It is usual in mathematics—outside of mathe-


matical logic—to use the word “function” imprecisely and to apply it to forms
such as y 2 + x. Because we shall later compute with expressions for functions,
we need a distinction between functions and forms and a notation for express-
ing this distinction. This distinction and a notation for describing it, from
which we deviate trivially, is given by Church [3].
Let f be an expression that stands for a function of two integer variables.
It should make sense to write f (3, 4) and the value of this expression should be
determined. The expression y 2 + x does not meet this requirement; y 2 + x(3, 4)
is not a conventional notation, and if we attempted to define it we would be
uncertain whether its value would turn out to be 13 or 19. Church calls an
expression like y 2 + x, a form. A form can be converted into a function if we
can determine the correspondence between the variables occurring in the form
and the ordered list of arguments of the desired function. This is accomplished
by Church’s λ-notation.
If E is a form in variables x1 , · · · , xn , then λ((x1 , · · · , xn ), E) will be taken
to be the function of n variables whose value is determined by substituting
the arguments for the variables x1 , · · · , xn in that order in E and evaluating
the resulting expression. For example, λ((x, y), y 2 + x) is a function of two
variables, and λ((x, y), y 2 + x)(3, 4) = 19.
The variables occurring in the list of variables of a λ-expression are dummy
or bound, like variables of integration in a definite integral. That is, we may

change the names of the bound variables in a function expression without
changing the value of the expression, provided that we make the same change
for each occurrence of the variable and do not make two variables the same
that previously were different. Thus λ((x, y), y 2 + x), λ((u, v), v 2 + u) and
λ((y, x), x2 + y) denote the same function.
We shall frequently use expressions in which some of the variables are
bound by λ’s and others are not. Such an expression may be regarded as
defining a function with parameters. The unbound variables are called free
variables.
An adequate notation that distinguishes functions from forms allows an
unambiguous treatment of functions of functions. It would involve too much
of a digression to give examples here, but we shall use functions with functions
as arguments later in this report.
Difficulties arise in combining functions described by λ-expressions, or by
any other notation involving variables, because different bound variables may
be represented by the same symbol. This is called collision of bound variables.
There is a notation involving operators that are called combinators for com-
bining functions without the use of variables. Unfortunately, the combinatory
expressions for interesting combinations of functions tend to be lengthy and
unreadable.

f. Expressions for Recursive Functions. The λ-notation is inadequate for


naming functions defined recursively. For example, using λ’s, we can convert
the definition
sqrt(a, x, ε) = (|x² − a| < ε → x, T → sqrt(a, ½(x + a/x), ε))

into

sqrt = λ((a, x, ε), (|x² − a| < ε → x, T → sqrt(a, ½(x + a/x), ε))),
but the right-hand side cannot serve as an expression for the function be-
cause there would be nothing to indicate that the reference to sqrt within the
expression stood for the expression as a whole.
In order to be able to write expressions for recursive functions, we intro-
duce another notation. label(a, E) denotes the expression E, provided that
occurrences of a within E are to be interpreted as referring to the expression

as a whole. Thus we can write

label(sqrt, λ((a, x, ε), (|x² − a| < ε → x, T → sqrt(a, ½(x + a/x), ε))))

as a name for our sqrt function.

The symbol a in label (a, E) is also bound, that is, it may be altered
systematically without changing the meaning of the expression. It behaves
differently from a variable bound by a λ, however.

3 Recursive Functions of Symbolic Expressions


We shall first define a class of symbolic expressions in terms of ordered pairs
and lists. Then we shall define five elementary functions and predicates, and
build from them by composition, conditional expressions, and recursive def-
initions an extensive class of functions of which we shall give a number of
examples. We shall then show how these functions themselves can be ex-
pressed as symbolic expressions, and we shall define a universal function apply
that allows us to compute from the expression for a given function its value
for given arguments. Finally, we shall define some functions with functions as
arguments and give some useful examples.

a. A Class of Symbolic Expressions. We shall now define the S-expressions


(S stands for symbolic). They are formed by using the special characters

·  (  )
and an infinite set of distinguishable atomic symbols. For atomic symbols,
we shall use strings of capital Latin letters and digits with single imbedded

blanks.³ Examples of atomic symbols are

A
ABA
APPLE PIE NUMBER 3

There is a twofold reason for departing from the usual mathematical prac-
tice of using single letters for atomic symbols. First, computer programs fre-
quently require hundreds of distinguishable symbols that must be formed from
the 47 characters that are printable by the IBM 704 computer. Second, it is
convenient to allow English words and phrases to stand for atomic entities for
mnemonic reasons. The symbols are atomic in the sense that any substructure
they may have as sequences of characters is ignored. We assume only that dif-
ferent symbols can be distinguished. S-expressions are then defined as follows:

1. Atomic symbols are S-expressions.


2. If e1 and e2 are S-expressions, so is (e1 · e2 ).
Examples of S-expressions are

AB
(A · B)
((AB · C) · D)

An S-expression is then simply an ordered pair, the terms of which may be


atomic symbols or simpler S-expressions. We can represent a list of arbi-
trary length in terms of S-expressions as follows. The list

(m1 , m2 , · · · , mn )
is represented by the S-expression

(m1 · (m2 · (· · · (mn · NIL) · · ·)))


Here NIL is an atomic symbol used to terminate lists. Since many of the
symbolic expressions with which we deal are conveniently expressed as lists,
we shall introduce a list notation to abbreviate certain S-expressions. We have

³ 1995 remark: Imbedded blanks could be allowed within symbols, because lists were then
written with commas between elements.

l. (m) stands for (m ·NIL).
2. (m1 , · · · , mn ) stands for (m1 · (· · · (mn · NIL) · · ·)).
3. (m1 , · · · , mn · x) stands for (m1 · (· · · (mn · x) · · ·)).

Subexpressions can be similarly abbreviated. Some examples of these ab-


breviations are
((AB, C), D) for ((AB · (C · NIL)) · (D · NIL))
((A, B), C, D · E) for ((A · (B · NIL)) · (C · (D · E)))

Since we regard the expressions with commas as abbreviations for those


not involving commas, we shall refer to them all as S-expressions.
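A sketch (ours) of S-expressions as an OCaml datatype, which the later sketches in this section reuse:

    type sexp =
      | Atom of string                 (* atomic symbols *)
      | Cons of sexp * sexp            (* (e1 . e2) *)

    let nil = Atom "NIL"
    (* the list (m1, ..., mn) as (m1 . (m2 . ... (mn . NIL))) *)
    let rec list_of = function [] -> nil | m :: ms -> Cons (m, list_of ms)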

b. Functions of S-expressions and the Expressions That Represent Them.


We now define a class of functions of S-expressions. The expressions represent-
ing these functions are written in a conventional functional notation. However,
in order to clearly distinguish the expressions representing functions from S-
expressions, we shall use sequences of lower-case letters for function names
and variables ranging over the set of S-expressions. We also use brackets and
semicolons, instead of parentheses and commas, for denoting the application
of functions to their arguments. Thus we write

car[x]
car[cons[(A · B); x]]
In these M-expressions (meta-expressions) any S-expressions that occur stand
for themselves.

c. The Elementary S-functions and Predicates. We introduce the following


functions and predicates:
1. atom. atom[x] has the value of T or F according to whether x is an
atomic symbol. Thus

atom [X] = T
atom [(X · A)] = F

2. eq. eq [x;y] is defined if and only if both x and y are atomic. eq [x; y]
= T if x and y are the same symbol, and eq [x; y] = F otherwise. Thus

eq [X; X] = T
eq [X; A] = F
eq [X; (X · A)] is undefined.

3. car. car[x] is defined if and only if x is not atomic. car [(e1 · e2 )] = e1 .


Thus car [X] is undefined.

car [(X · A)] = X


car [((X · A) · Y )] = (X · A)

4. cdr. cdr [x] is also defined when x is not atomic. We have cdr
[(e1 · e2 )] = e2 . Thus cdr [X] is undefined.

cdr [(X · A)] = A cdr [((X · A) · Y )] = Y

5. cons. cons [x; y] is defined for any x and y. We have cons [e1 ; e2 ] =
(e1 · e2 ). Thus

cons [X; A] = (X · A)
cons [(X · A); Y ] = ((X · A) · Y )

car, cdr, and cons are easily seen to satisfy the relations

car [cons [x; y]] = x


cdr [cons [x; y]] = y
cons [car [x]; cdr [x]] = x, provided that x is not atomic.

The names “car” and “cons” will come to have mnemonic significance only
when we discuss the representation of the system in the computer. Composi-
tions of car and cdr give the subexpressions of a given expression in a given
position. Compositions of cons form expressions of a given structure out of
parts. The class of functions which can be formed in this way is quite limited
and not very interesting.
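A sketch of the five elementary S-functions over the sexp type above, rendering partiality with exceptions (our encoding, not the paper's machine representation):

    let atom = function Atom _ -> true | Cons _ -> false
    let eq x y =
      match (x, y) with
      | Atom a, Atom b -> a = b
      | _ -> failwith "eq: undefined unless both arguments are atomic"
    let car = function Cons (e1, _) -> e1 | Atom _ -> failwith "car: undefined"
    let cdr = function Cons (_, e2) -> e2 | Atom _ -> failwith "cdr: undefined"
    let cons x y = Cons (x, y)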

d. Recursive S-functions. We get a much larger class of functions (in fact,


all computable functions) when we allow ourselves to form new functions of
S-expressions by conditional expressions and recursive definition. We now give

some examples of functions that are definable in this way.
1. ff[x]. The value of ff[x] is the first atomic symbol of the S-expression x
with the parentheses ignored. Thus

ff[((A · B) · C)] = A
We have

ff[x] = [atom[x] → x; T → ff[car[x]]]


We now trace in detail the steps in the evaluation of

ff [((A · B) · C)]:
ff [((A · B) · C)]

= [atom[((A · B) · C)] → ((A · B) · C); T → ff[car[((A · B) · C)]]]

= [F → ((A · B) · C); T → ff[car[((A · B) · C)]]]

= [T → ff[car[((A · B) · C)]]]

= ff[car[((A · B) · C)]]

= ff[(A · B)]

= [atom[(A · B)] → (A · B); T → ff[car[(A · B)]]]

= [F → (A · B); T → ff[car[(A · B)]]]

= [T → ff[car[(A · B)]]]

= ff[car[(A · B)]]

= ff[A]

= [atom[A] → A; T → ff[car[A]]]

= [T → A; T → ff[car[A]]]

= A

2. subst [x; y; z]. This function gives the result of substituting the S-
expression x for all occurrences of the atomic symbol y in the S-expression z.
It is defined by

subst [x; y; z] = [atom [z] → [eq [z; y] → x; T → z];


T → cons [subst [x; y; car [z]]; subst [x; y; cdr [z]]]]

As an example, we have

subst[(X · A); B; ((A · B) · C)] = ((A · (X · A)) · C)


3. equal [x; y]. This is a predicate that has the value T if x and y are the
same S-expression, and has the value F otherwise. We have
equal [x; y] = [atom [x] ∧ atom [y] ∧ eq [x; y]]

∨[¬ atom [x] ∧¬ atom [y] ∧ equal [car [x]; car [y]]

∧ equal [cdr [x]; cdr [y]]]
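A sketch of ff, subst and equal in OCaml over sexp, mirroring the three definitions above:

    let rec ff x = if atom x then x else ff (car x)

    let rec subst x y z =
      if atom z then (if eq z y then x else z)
      else cons (subst x y (car z)) (subst x y (cdr z))

    let rec equal x y =
      (atom x && atom y && eq x y)
      || ((not (atom x)) && (not (atom y))
          && equal (car x) (car y) && equal (cdr x) (cdr y))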

It is convenient to see how the elementary functions look in the abbreviated


list notation. The reader will easily verify that

(i) car[(m1 , m2 , · · · , mn )] = m1
(ii) cdr[(m1 , m2 , · · · , mn )] = (m2 , · · · , mn )
(iii) cdr[(m)] = NIL
(iv) cons[m1 ; (m2 , · · · , mn )] = (m1 , m2 , · · · , mn )
(v) cons[m; NIL] = (m)

We define

null[x] = atom[x] ∧ eq[x; NIL]
This predicate is useful in dealing with lists.
Compositions of car and cdr arise so frequently that many expressions can
be written more concisely if we abbreviate

cadr[x] for car[cdr[x]],


caddr[x] for car[cdr[cdr[x]]], etc.

Another useful abbreviation is to write list [e1 ; e2 ; · · · ; en ]


for cons[e1 ; cons[e2 ; · · · ; cons[en ; NIL] · · ·]].

This function gives the list, (e1 , · · · , en ), as a function of its elements.

The following functions are useful when S-expressions are regarded as lists.

1. append [x;y].

append [x; y] = [null[x] → y; T → cons [car [x]; append [cdr [x]; y]]]

An example is

append [(A, B); (C, D, E)] = (A, B, C, D, E)

2. among [x;y]. This predicate is true if the S-expression x occurs among


the elements of the list y. We have

among[x; y] = ¬null[y] ∧ [equal[x; car[y]] ∨ among[x; cdr[y]]]


3. pair [x;y]. This function gives the list of pairs of corresponding elements
of the lists x and y. We have

pair[x; y] = [null[x] ∧ null[y] → NIL; ¬atom[x] ∧ ¬atom[y] → cons[list[car[x]; car[y]]; pair[cdr[x]; cdr[y]]]]

An example is

pair[(A, B, C); (X, (Y, Z), U)] = ((A, X), (B, (Y, Z)), (C, U)).

4. assoc [x;y]. If y is a list of the form ((u1, v1 ), · · · , (un , vn )) and x is one
of the u’s, then assoc [x; y] is the corresponding v. We have

assoc[x; y] = [eq[caar[y]; x] → cadar[y]; T → assoc[x; cdr[y]]]


An example is

assoc[X; ((W, (A, B)), (X, (C, D)), (Y, (E, F )))] = (C, D).

5. sublis[x; y]. Here x is assumed to have the form of a list of pairs


((u1 , v1 ), · · · , (un , vn )), where the u’s are atomic, and y may be any S-expression.
The value of sublis[x; y] is the result of substituting each v for the correspond-
ing u in y. In order to define sublis, we first define an auxiliary function. We
have

sub2[x; z] = [null[x] → z; eq[caar[x]; z] → cadar[x]; T → sub2[cdr[x]; z]]

and

sublis[x; y] = [atom[y] → sub2[x; y]; T → cons[sublis[x; car[y]]; sublis[x; cdr[y]]]]

We have

sublis [((X, (A, B)), (Y, (B, C))); (A, X · Y)] = (A, (A, B), B, C)

e. Representation of S-Functions by S-Expressions. S-functions have been


described by M-expressions. We now give a rule for translating M-expressions
into S-expressions, in order to be able to use S-functions for making certain
computations with S-functions and for answering certain questions about S-
functions.
The translation is determined by the following rules, in which we denote the
translation of an M-expression E by E*.
1. If E is an S-expression E* is (QUOTE, E).
2. Variables and function names that were represented by strings of lower-
case letters are translated to the corresponding strings of the corresponding
uppercase letters. Thus car* is CAR, and subst* is SUBST.
3. A form f [e1 ; · · · ; en ] is translated to (f ∗ , e∗1 · · · , e∗n ). Thus cons [car [x];
cdr [x]]∗ is (CONS, (CAR, X), (CDR, X)).
4. {[p1 → e1 ; · · · ; pn → en ]}∗ is (COND, (p∗1 , e∗1 ), · · · , (p∗n · e∗n )).

5. {λ[[x1 ; · · · ; xn ]; E]}∗ is (LAMBDA, (x∗1 , · · · , x∗n ), E ∗).
6. {label[a; E]}∗ is (LABEL, a∗ , E ∗ ).
With these conventions the substitution function whose M-expression is
label [subst; λ [[x; y; z]; [atom [z] → [eq [y; z] → x; T → z]; T → cons [subst
[x; y; car [z]]; subst [x; y; cdr [z]]]]]] has the S-expression

(LABEL, SUBST, (LAMBDA, (X, Y, Z), (COND, ((ATOM, Z), (COND,
((EQ, Y, Z), X), ((QUOTE, T), Z))), ((QUOTE, T), (CONS, (SUBST, X, Y,
(CAR, Z)), (SUBST, X, Y, (CDR, Z)))))))

This notation is writable and somewhat readable. It can be made easier


to read and write at the cost of making its structure less regular. If more
characters were available on the computer, it could be improved considerably.⁴
f. The Universal S-Function apply. There is an S-function apply with the
property that if f is an S-expression for an S-function f 0 and args is a list of
arguments of the form (arg1 , · · · , argn ), where arg1 , · · · , argn are arbitrary S-
expressions, then apply[f ; args] and f 0 [arg1 ; · · · ; argn ] are defined for the same
values of arg1 , · · · , argn , and are equal when defined. For example,

λ[[x; y]; cons[car[x]; y]][(A, B); (C, D)]

= apply[(LAMBDA, (X, Y ), (CONS, (CAR, X), Y )); ((A, B), (C, D))] = (A, C, D)

The S-function apply is defined by

apply[f ; args] = eval[cons[f ; appq[args]]; NIL],


where

appq[m] = [null[m] → NIL; T → cons[list[QUOTE; car[m]]; appq[cdr[m]]]]

and
eval[e; a] = [

⁴ 1995: More characters were made available on SAIL and later on the Lisp machines.
Alas, the world went back to inferior character sets again—though not as far back as when
this paper was written in early 1959.

atom [e] → assoc [e; a];

atom [car [e]] → [

eq [car [e]; QUOTE] → cadr [e];

eq [car [e]; ATOM] → atom [eval [cadr [e]; a]];

eq [car [e]; EQ] → [eval [cadr [e]; a] = eval [caddr [e]; a]];

eq [car [e]; COND] → evcon [cdr [e]; a];

eq [car [e]; CAR] → car [eval [cadr [e]; a]];

eq [car [e]; CDR] → cdr [eval [cadr [e]; a]];

eq [car [e]; CONS] → cons [eval [cadr [e]; a]; eval [caddr [e];

a]]; T → eval [cons [assoc [car [e]; a];

evlis [cdr [e]; a]]; a]];

eq [caar [e]; LABEL] → eval [cons [caddar [e]; cdr [e]];

cons [list [cadar [e]; car [e]]; a]];

eq [caar [e]; LAMBDA] → eval [caddar [e];

append [pair [cadar [e]; evlis [cdr [e]; a]]; a]]]

and

evcon[c; a] = [eval[caar[c]; a] → eval[cadar[c]; a]; T → evcon[cdr[c]; a]]


and

evlis[m; a] = [null[m] → NIL; T → cons[eval[car[m]; a]; evlis[cdr[m]; a]]]

We now explain a number of points about these definitions.⁵
1. apply itself forms an expression representing the value of the function
applied to the arguments, and puts the work of evaluating this expression onto
a function eval. It uses appq to put quotes around each of the arguments, so
that eval will regard them as standing for themselves.
2. eval[e; a] has two arguments, an expression e to be evaluated, and a list
of pairs a. The first item of each pair is an atomic symbol, and the second is
the expression for which the symbol stands.
3. If the expression to be evaluated is atomic, eval evaluates whatever is
paired with it first on the list a.
4. If e is not atomic but car[e] is atomic, then the expression has one of the
forms (QUOTE, e) or (ATOM, e) or (EQ, e1 , e2 ) or (COND, (p1 , e1 ), · · · , (pn , en )),
or (CAR, e) or (CDR, e) or (CONS, e1 , e2 ) or (f, e1 , · · · , en ) where f is an
atomic symbol.
In the case (QUOTE, e) the expression e, itself, is taken. In the case of
(ATOM, e) or (CAR, e) or (CDR, e) the expression e is evaluated and the
appropriate function taken. In the case of (EQ, e1 , e2 ) or (CONS, e1 , e2 ) two
expressions have to be evaluated. In the case of (COND, (p1 , e1 ), · · · (pn , en ))
the p’s have to be evaluated in order until a true p is found, and then the
corresponding e must be evaluated. This is accomplished by evcon. Finally, in
the case of (f, e1 , · · · , en ) we evaluate the expression that results from replacing
f in this expression by whatever it is paired with in the list a.
5. The evaluation of ((LABEL, f, E), e1 , · · · , en ) is accomplished by eval-
uating (E, e1 , · · · , en ) with the pairing (f, (LABEL, f, E)) put on the front of
the previous list a of pairs.
6. Finally, the evaluation of ((LAMBDA, (x1 , · · · , xn ), E), e1, · · · en ) is ac-
complished by evaluating E with the list of pairs ((x1 , e1 ), · · · , (xn , en )) put
on the front of the previous list a.
The list a could be eliminated, and LAMBDA and LABEL expressions
evaluated by substituting the arguments for the variables in the expressions
E. Unfortunately, difficulties involving collisions of bound variables arise, but
they are avoided by using the list a.
⁵ 1995: This version isn’t quite right. A comparison of this and other versions of eval
including what was actually implemented (and debugged) is given in “The Influence of the
Designer on the Design” by Herbert Stoyan and included in Artificial Intelligence and Math-
ematical Theory of Computation: Papers in Honor of John McCarthy, Vladimir Lifschitz
(ed.), Academic Press, 1991

Calculating the values of functions by using apply is an activity better
suited to electronic computers than to people. As an illustration, however, we
now give some of the steps for calculating
apply [(LABEL, FF, (LAMBDA, (X), (COND, ((ATOM, X), X), ((QUOTE,
T), (FF, (CAR, X)))))); ((A·B))] = A
The first argument is the S-expression that represents the function ff defined
in section 3d. We shall abbreviate it by using the letter φ. We have
apply [φ; ( (A·B) )]
= eval [((LABEL, FF, ψ), (QUOTE, (A·B))); NIL]

where ψ is the part of φ beginning (LAMBDA

= eval[((LAMBDA, (X), ω), (QUOTE, (A·B)));((FF, φ))]

where ω is the part of ψ beginning (COND

= eval [(COND, (π1 , ε1 ), (π2 , ε2 )); ((X, (QUOTE, (A·B))), (FF, φ))]

Denoting ((X, (QUOTE, (A·B))), (FF, φ)) by a, we obtain

= evcon [((π1 , ε1 ), (π2 , ε2 )); a]

This involves eval [π1 ; a]

= eval [( ATOM, X); a]

= atom [eval [X; a]]

= atom [eval [assoc [X; ((X, (QUOTE, (A·B))), (FF,φ))];a]]

= atom [eval [(QUOTE, (A·B)); a]]

= atom [(A·B)],

= F

Our main calculation continues with

apply [φ; ((A·B))]

= evcon [((π2 , ε2 )); a],

which involves eval [π2 ; a] = eval [(QUOTE, T); a] = T

Our main calculation again continues with

apply [φ; ((A·B))]

= eval [ε2 ; a]

= eval [(FF, (CAR, X));a]

= eval [cons [φ; evlis [((CAR, X)); a]]; a]

Evaluating evlis [((CAR, X)); a] involves

eval [(CAR, X); a]

= car [eval [X; a]]

= car [(A·B)], where we took steps from the earlier computation of atom [eval [X; a]],

= A,

and so evlis [((CAR, X)); a] then becomes

list [list [QUOTE; A]] = ((QUOTE, A)),

and our main quantity becomes

= eval [(φ, (QUOTE, A)); a]

The subsequent steps are made as in the beginning of the calculation. The
LABEL and LAMBDA cause new pairs to be added to a, which gives a new
list of pairs a1 . The π1 term of the conditional eval [(ATOM, X); a1 ] has the

value T because X is paired with (QUOTE, A) first in a1 , rather than with
(QUOTE, (A·B)) as in a.
Therefore we end up with eval [X; a1 ] from the evcon, and this is just A.

g. Functions with Functions as Arguments. There are a number of useful
functions some of whose arguments are functions. They are especially useful
in defining other functions. One such function is maplist[x; f ] with an S-
expression argument x and an argument f that is a function from S-expressions
to S-expressions. We define

maplist[x; f ] = [null[x] → NIL; T → cons[f [x]; maplist[cdr[x]; f ]]]
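
With Python lists standing in for S-expressions, maplist transcribes directly (a hedged sketch; note that f receives the whole remaining list at each step, not just its first element, a convention the diff formula below relies on; the name maplist_ is ours):

def maplist_(x, f):
    return [] if not x else [f(x)] + maplist_(x[1:], f)

print(maplist_([1, 2, 3], lambda tail: tail[0]))   # [1, 2, 3]
print(maplist_([1, 2, 3], len))                    # [3, 2, 1]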


The usefulness of maplist is illustrated by formulas for the partial derivative
with respect to x of expressions involving sums and products of x and other
variables. The S-expressions that we shall differentiate are formed as follows.
1. An atomic symbol is an allowed expression.
2. If e1 , e2 , · · · , en are allowed expressions, ( PLUS, e1 , · · · , en ) and (TIMES,
e1 , · · · , en ) are also, and represent the sum and product, respectively, of e1 , · · · , en .
This is, essentially, the Polish notation for functions, except that the in-
clusion of parentheses and commas allows functions of variable numbers of
arguments. An example of an allowed expression is (TIMES, X, (PLUS, X,
A), Y), the conventional algebraic notation for which is X(X + A)Y.
Our differentiation formula, which gives the derivative of y with respect to
x, is

diff [y; x] = [atom [y] → [eq [y; x] → ONE; T → ZERO]; eq [car [y]; PLUS] →
cons [PLUS; maplist [cdr [y]; λ[[z]; diff [car [z]; x]]]]; eq [car [y]; TIMES] →
cons [PLUS; maplist [cdr [y]; λ[[z]; cons [TIMES; maplist [cdr [y]; λ[[w];
[¬eq [z; w] → car [w]; T → diff [car [w]; x]]]]]]]]]
The derivative of the expression (TIMES, X, (PLUS, X, A), Y), as com-
puted by this formula, is
(PLUS, (TIMES, ONE, (PLUS, X, A), Y), (TIMES, X, (PLUS, ONE,
ZERO), Y), (TIMES, X, (PLUS, X, A), ZERO))
Besides maplist, another useful function with functional arguments is search,
which is defined as

search[x; p; f ; u] = [null[x] → u; p[x] → f [x]; T → search[cdr[x]; p; f ; u]]

The function search is used to search a list for an element that has the property
p, and if such an element is found, f of that element is taken. If there is no
such element, the function u of no arguments is computed.
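
A direct Python rendering of search (a sketch; as with maplist, p and f are applied to the whole remaining list, and u is a function of no arguments; all names are ours):

def search_(x, p, f, u):
    if not x:
        return u()
    return f(x) if p(x) else search_(x[1:], p, f, u)

# first tail whose head is even; f takes that head:
print(search_([3, 5, 8, 9],
              lambda t: t[0] % 2 == 0,
              lambda t: t[0],
              lambda: "NOT FOUND"))                # prints 8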

4 The LISP Programming System


The LISP programming system is a system for using the IBM 704 computer to
compute with symbolic information in the form of S-expressions. It has been
or will be used for the following purposes:
l. Writing a compiler to compile LISP programs into machine language.
2. Writing a program to check proofs in a class of formal logical systems.
3. Writing programs for formal differentiation and integration.
4. Writing programs to realize various algorithms for generating proofs in
predicate calculus.
5. Making certain engineering calculations whose results are formulas
rather than numbers.
6. Programming the Advice Taker system.
The basis of the system is a way of writing computer programs to evaluate
S-functions. This will be described in the following sections.
In addition to the facilities for describing S-functions, there are facilities
for using S-functions in programs written as sequences of statements along the
lines of FORTRAN (4) or ALGOL (5). These features will not be described
in this article.
a. Representation of S-Expressions by List Structure. A list structure is a
collection of computer words arranged as in figure 1a or 1b. Each word of the
list structure is represented by one of the subdivided rectangles in the figure.
The lef t box of a rectangle represents the address field of the word and the
right box represents the decrement field. An arrow from a box to another
rectangle means that the field corresponding to the box contains the location
of the word corresponding to the other rectangle.

[Fig. 1: list structures built from subdivided words (address field on the left, decrement field on the right); (a) and (b) show permitted structures, (c) a structure with a cycle.]

It is permitted for a substructure to occur in more than one place in a list
structure, as in figure 1b, but it is not permitted for a structure to have cycles,
as in figure 1c. An atomic symbol is represented in the computer by a list
structure of special form called the association list of the symbol. The address
field of the first word contains a special constant which enables the program to
tell that this word represents an atomic symbol. We shall describe association
lists in section 4b.
An S-expression x that is not atomic is represented by a word, the address
and decrement parts of which contain the locations of the subexpressions car[x]
and cdr[x], respectively. If we use the symbols A, B, etc. to denote the
locations of the association lists of these symbols, then the S-expression ((A ·
B) · (C · (E · F ))) is represented by the list structure of figure 2a. Turning
which is an abbreviation for (A · ((B · (C · NIL)) · (D · NIL))), is represented
by the list structure of figure 2b.

[Figure 2: (a) the list structure for ((A·B)·(C·(E·F))); (b) the list structure for (A, (B, C), D).]

When a list structure is regarded as representing a list, we see that each term
of the list occupies the address part of a word, the decrement part of which
points to the word containing the next term, while the last word has NIL in
its decrement.
An expression that has a given subexpression occurring more than once
can be represented in more than one way. Whether the list structure for
the subexpression is or is not repeated depends upon the history of the pro-
gram. Whether or not a subexpression is repeated will make no difference
in the results of a program as they appear outside the machine, although it
will affect the time and storage requirements. For example, the S-expression
((A·B)·(A·B)) can be represented by either the list structure of figure 3a or
3b.

[Figure 3: two list structures for ((A·B)·(A·B)): (a) with the subexpression (A·B) repeated, (b) with it shared.]

The prohibition against circular list structures is essentially a prohibition

against an expression being a subexpression of itself. Such an expression could
not exist on paper in a world with our topology. Circular list structures would
have some advantages in the machine, for example, for representing recursive
functions, but difficulties in printing them, and in certain other operations,
make it seem advisable not to use them for the present.
The advantages of list structures for the storage of symbolic expressions
are:
1. The size and even the number of expressions with which the program
will have to deal cannot be predicted in advance. Therefore, it is difficult to
arrange blocks of storage of fixed length to contain them.
2. Registers can be put back on the free-storage list when they are no longer
needed. Even one register returned to the list is of value, but if expressions
are stored linearly, it is difficult to make use of blocks of registers of odd sizes
that may become available.
3. An expression that occurs as a subexpression of several expressions need
be represented in storage only once.
b. Association Lists6 (1995: these were later called property lists). In the LISP programming system we put more in
the association list of a symbol than is required by the mathematical system
described in the previous sections. In fact, any information that we desire to
associate with the symbol may be put on the association list. This information
may include: the print name, that is, the string of letters and digits which
represents the symbol outside the machine; a numerical value if the symbol
represents a number; another S-expression if the symbol, in some way, serves
as a name for it; or the location of a routine if the symbol represents a function
for which there is a machine-language subroutine. All this implies that in the
machine system there are more primitive entities than have been described in
the sections on the mathematical system.
For the present, we shall only describe how print names are represented
on association lists so that in reading or printing the program can establish
a correspondence between information on punched cards, magnetic tape or
printed page and the list structure inside the machine. The association list of
the symbol DIFFERENTIATE has a segment of the form shown in figure 4.
Here pname is a symbol that indicates that the structure for the print name
of the symbol (whose association list this is) hangs from the next word on
the association list. In the second row of the figure we have a list of three
words. The address part of each of these words points to a word containing
six 6-bit characters. The last word is filled out with a 6-bit combination that
does not represent a character printable by the computer. (Recall that the
IBM 704 has a 36-bit word and that printable characters are each represented
by 6 bits.) The presence of the words with character information means that
the association lists do not themselves represent S-expressions, and that only
some of the functions for dealing with S-expressions make sense within an
association list.

[Figure 4: a segment of the association list of DIFFERENTIATE: pname points to a list of three words whose address parts point to full words packing the characters DIFFER, ENTIAT, E.]

c. Free-Storage List. At any given time only a part of the memory reserved
for list structures will actually be in use for storing S-expressions. The remain-
ing registers (in our system the number, initially, is approximately 15,000) are
arranged in a single list called the free-storage list. A certain register, FREE,
in the program contains the location of the first register in this list. When
a word is required to form some additional list structure, the first word on
the free-storage list is taken and the number in register FREE is changed to
become the location of the second word on the free-storage list. No provision
need be made for the user to program the return of registers to the free-storage
list.
This return takes place automatically, approximately as follows (it is nec-
essary to give a simplified description of this process in this report): There is
a fixed set of base registers in the program which contains the locations of list
structures that are accessible to the program. Of course, because list struc-
tures branch, an arbitrary number of registers may be involved. Each register
that is accessible to the program is accessible because it can be reached from
one or more of the base registers by a chain of car and cdr operations. When

the contents of a base register are changed, it may happen that the register
to which the base register formerly pointed cannot be reached by a car − cdr
chain from any base register. Such a register may be considered abandoned
by the program because its contents can no longer be found by any possible
program; hence its contents are no longer of interest, and so we would like to
have it back on the free-storage list. This comes about in the following way.
Nothing happens until the program runs out of free storage. When a free
register is wanted, and there is none left on the free-storage list, a reclamation7
cycle starts.
7 We already called this process “garbage collection”, but I guess I chickened out of using it in the paper—or else the Research Laboratory of Electronics grammar ladies wouldn’t let me.
First, the program finds all registers accessible from the base registers and
makes their signs negative. This is accomplished by starting from each of the
base registers and changing the sign of every register that can be reached from
it by a car − cdr chain. If the program encounters a register in this process
which already has a negative sign, it assumes that this register has already
been reached.
After all of the accessible registers have had their signs changed, the pro-
gram goes through the area of memory reserved for the storage of list structures
and puts all the registers whose signs were not changed in the previous step
back on the free-storage list, and makes the signs of the accessible registers
positive again.
This process, because it is entirely automatic, is more convenient for the
programmer than a system in which he has to keep track of and erase un-
wanted lists. Its efficiency depends upon not coming close to exhausting the
available memory with accessible lists. This is because the reclamation process
requires several seconds to execute, and therefore must result in the addition
of at least several thousand registers to the free-storage list if the program is
not to spend most of its time in reclamation.
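
The cycle just described is what is now called mark-and-sweep garbage collection. A toy Python model of it, under the assumption that a mark flag stands in for the sign bit (the Word class and all names are illustrative, not the actual 704 routines):

class Word:
    def __init__(self, addr=None, decr=None):
        self.addr = addr            # address field: car pointer or None
        self.decr = decr            # decrement field: cdr pointer or None
        self.marked = False         # stands in for the sign bit

def mark(w):
    if w is None or w.marked:       # a marked word has already been reached
        return
    w.marked = True                 # "make the sign negative"
    mark(w.addr)                    # follow every car-cdr chain
    mark(w.decr)

def reclaim(memory, base_registers):
    for b in base_registers:       # phase 1: mark all accessible words
        mark(b)
    free = [w for w in memory if not w.marked]   # phase 2: sweep the rest
    for w in memory:
        w.marked = False            # make the signs positive again
    return free                     # goes back on the free-storage list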

d. Elementary S-Functions in the Computer. We shall now describe the
computer representations of atom, =, car, cdr, and cons. An S-expression
is communicated to the program that represents a function as the location of
the word representing it, and the programs give S-expression answers in the
same form.
atom. As stated above, a word representing an atomic symbol has a special
constant in its address part: atom is programmed as an open subroutine that
tests this part. Unless the M-expression atom[e] occurs as a condition in a
conditional expression, the symbol T or F is generated as the result of the
test. In case of a conditional expression, a conditional transfer is used and the
symbol T or F is not generated.
eq. The program for eq[e; f ] involves testing for the numerical equality of
the locations of the words. This works because each atomic symbol has only
one association list. As with atom, the result is either a conditional transfer
or one of the symbols T or F .
car. Computing car[x] involves getting the contents of the address part of
register x. This is essentially accomplished by the single instruction CLA 0, i,
where the argument is in index register i, and the result appears in the address
part of the accumulator. (We take the view that the places from which a
function takes its arguments and into which it puts its results are prescribed
in the definition of the function, and it is the responsibility of the programmer
or the compiler to insert the required data-moving instructions to get the results
of one calculation in position for the next.) (“car” is a mnemonic for “contents
of the address part of register.”)
cdr. cdr is handled in the same way as car, except that the result appears
in the decrement part of the accumulator (“cdr” stands for “contents of the
decrement part of register.”)
cons. The value of cons[x; y] must be the location of a register that has x
and y in its address and decrement parts, respectively. There may not be such
a register in the computer and, even if there were, it would be time-consuming
to find it. Actually, what we do is to take the first available register from the
free-storage list, put x and y in the address and decrement parts, respectively,
and make the value of the function the location of the register taken. (“cons”
is an abbreviation for “construct.”)
It is the subroutine for cons that initiates the reclamation when the free-
storage list is exhausted. In the version of the system that is used at present
cons is represented by a closed subroutine. In the compiled version, cons is
open.

e. Representation of S-Functions by Programs. The compilation of functions
that are compositions of car, cdr, and cons, either by hand or by a
compiler program, is straightforward. Conditional expressions give no trouble
except that they must be so compiled that only the p’s and e’s that are re-

28
quired are computed. However, problems arise in the compilation of recursive
functions.
In general (we shall discuss an exception), the routine for a recursive func-
tion uses itself as a subroutine. For example, the program for subst[x; y; z] uses
itself as a subroutine to evaluate the result of substituting into the subexpres-
sions car[z] and cdr[z]. While subst[x; y; cdr[z]] is being evaluated, the result
of the previous evaluation of subst[x; y; car[z]] must be saved in a temporary
storage register. However, subst may need the same register for evaluating
subst[x; y; cdr[z]]. This possible conflict is resolved by the SAVE and UN-
SAVE routines that use the public push-down list8 (1995: now called a stack). The SAVE routine is
entered at the beginning of the routine for the recursive function with a re-
quest to save a given set of consecutive registers. A block of registers called
the public push-down list is reserved for this purpose. The SAVE routine has
an index that tells it how many registers in the push-down list are already
in use. It moves the contents of the registers which are to be saved to the
first unused registers in the push-down list, advances the index of the list, and
returns to the program from which control came. This program may then
freely use these registers for temporary storage. Before the routine exits it
uses UNSAVE, which restores the contents of the temporary registers from
the push-down list and moves back the index of this list. The result of these
conventions is described, in programming terminology, by saying that the re-
cursive subroutine is transparent to the temporary storage registers.
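
A Python sketch of the SAVE/UNSAVE convention (Python's own call stack already makes locals transparent, so the shared register and the public push-down list are simulated explicitly; all names here are ours):

pushdown = []                     # the public push-down list

def save(values):
    pushdown.extend(values)       # move contents to the first unused slots

def unsave(n):
    restored = pushdown[-n:]      # restore contents, move the index back
    del pushdown[-n:]
    return restored

REG = [None]                      # one shared temporary storage register

def walk(z):                      # a recursive routine that uses REG
    if not z:
        return "NIL"
    save(REG)                     # entry: save the caller's contents
    REG[0] = z[0]                 # the register is now ours to use
    deeper = walk(z[1:])          # recursion freely reuses the register
    mine = REG[0]                 # yet our value survives the recursive call
    REG[:] = unsave(1)            # exit: the caller's contents come back
    return (mine, deeper)

print(walk([1, 2]))               # (1, (2, 'NIL'))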

f. Status of the LISP Programming System (February 1960). A variant of
the function apply described in section 5f has been translated into a program
APPLY for the IBM 704. Since this routine can compute values of S-functions
given their descriptions as S-expressions and their arguments, it serves as an
interpreter for the LISP programming language which describes computation
processes in this way.
The program APPLY has been imbedded in the LISP programming system
which has the following features:
1. The programmer may define any number of S-functions by S-expressions.
These functions may refer to each other or to certain S-functions represented
by machine-language programs.
2. The values of defined functions may be computed.
3. S-expressions may be read and printed (directly or via magnetic tape).
4. Some error diagnostic and selective tracing facilities are included.
5. The programmer may have selected S-functions compiled into machine
language programs put into the core memory. Values of compiled functions
are computed about 60 times as fast as they would be if interpreted. Compilation
is fast enough so that it is not necessary to punch compiled programs for future
use.
6. A “program feature” allows programs containing assignment and go to
statements in the style of ALGOL.
7. Computation with floating point numbers is possible in the system, but
this is inefficient.
8. A programmer’s manual is being prepared.
The LISP programming system is appropriate for computations where the
data can conveniently be represented as symbolic expressions allowing expressions
of the same kind as subexpressions. A version of the system for the IBM 709
is being prepared.

5 Another Formalism for Functions of Symbolic Expressions
There are a number of ways of defining functions of symbolic expressions which
are quite similar to the system we have adopted. Each of them involves three
basic functions, conditional expressions, and recursive function definitions, but
the class of expressions corresponding to S-expressions is different, and so are
the precise definitions of the functions. We shall describe one of these variants
called linear LISP.
The L-expressions are defined as follows:
1. A finite list of characters is admitted.
2. Any string of admitted characters is an L-expression. This includes the
null string denoted by Λ.
There are three functions of strings:
1. first[x] is the first character of the string x.
first[Λ] is undefined. For example: first[ABC] = A
2. rest[x] is the string of characters which remains when the first character
of the string is deleted.
rest[Λ] is undefined. For example: rest[ABC] = BC.
3. combine[x; y] is the string formed by prefixing the character x to the
string y. For example: combine[A; BC] = ABC

There are three predicates on strings:
1. char[x], x is a single character.
2. null[x], x is the null string.
3. x = y, defined for x and y characters.
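
Python strings model L-expressions directly, so the three functions and three predicates can be sketched as follows (Λ is the empty string; the paper's names, adapted to Python, with character equality spelled equal_chars):

def first(x):          return x[0]      # an error on the null string
def rest(x):           return x[1:]
def combine(x, y):     return x + y     # x is a single character

def char(x):           return len(x) == 1
def null(x):           return x == ""
def equal_chars(x, y): return x == y    # defined for characters x and y

assert first("ABC") == "A"
assert rest("ABC") == "BC"
assert combine("A", "BC") == "ABC"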
The advantage of linear LISP is that no characters are given special roles,
as are parentheses, dots, and commas in LISP. This permits computations
with all expressions that can be written linearly. The disadvantage of linear
LISP is that the extraction of subexpressions is a fairly involved, rather than
an elementary, operation. It is not hard to write, in linear LISP, functions that
correspond to the basic functions of LISP, so that, mathematically, linear LISP
includes LISP. This turns out to be the most convenient way of programming,
in linear LISP, the more complicated manipulations. However, if the functions
are to be represented by computer routines, LISP is essentially faster.

6 Flowcharts and Recursion


Since both the usual form of computer program and recursive function defi-
nitions are universal computationally, it is interesting to display the relation
between them. The translation of recursive symbolic functions into computer
programs was the subject of the rest of this report. In this section we show
how to go the other way, at least in principle.
The state of the machine at any time during a computation is given by the
values of a number of variables. Let these variables be combined into a vector
ξ. Consider a program block with one entrance and one exit. It defines and is
essentially defined by a certain function f that takes one machine configuration
into another, that is, f has the form ξ 0 = f (ξ). Let us call f the associated
function of the program block. Now let a number of such blocks be combined
into a program by decision elements π that decide after each block is completed
which block will be entered next. Nevertheless, let the whole program still have
one entrance and one exit.

[Figure 5: a flowchart with one entrance and one exit; decision elements π1, π2, π3 route control among computation blocks f1, f2, f3, f4, with intermediate points S and T.]

We give as an example the flowchart of figure 5. Let us describe the function
r[ξ] that gives the transformation of the vector ξ between entrance and exit
of the whole block. We shall define it in conjunction with the functions s[ξ]
and t[ξ], which give the transformations that ξ undergoes between the points
S and T, respectively, and the exit. We have

r[ξ] = [π11 [ξ] → s[f1 [ξ]]; T → s[f2 [ξ]]]

s[ξ] = [π21 [ξ] → r[ξ]; T → t[f3 [ξ]]]
t[ξ] = [π31 [ξ] → f4 [ξ]; π32 [ξ] → r[ξ]; T → t[f3 [ξ]]]
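
The three equations transcribe directly into mutually recursive functions. In the Python sketch below, the particular blocks f1, ..., f4 and predicates are invented for illustration; only the structure of r, s, and t mirrors the equations above:

def f1(x): return x + 1
def f2(x): return x + 2
def f3(x): return x - 1
def f4(x): return x * 10

def p11(x): return x % 2 == 0     # π11: choose the f1 branch
def p21(x): return False          # π21: never loop back to the entrance
def p31(x): return x <= 0         # π31: leave through f4
def p32(x): return False          # π32: never return to the entrance

def r(x): return s(f1(x)) if p11(x) else s(f2(x))
def s(x): return r(x) if p21(x) else t(f3(x))
def t(x):
    if p31(x): return f4(x)
    if p32(x): return r(x)
    return t(f3(x))

print(r(4))   # 4 -f1-> 5 -f3-> 4 -f3-> ... -f3-> 0 -f4-> 0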

Given a flowchart with a single entrance and a single exit, it is easy to
write down the recursive function that gives the transformation of the state
vector from entrance to exit in terms of the corresponding functions for the
computation blocks and the predicates of the branch. In general, we proceed
as follows.
In figure 6, let β be an n-way branch point, and let f1 , · · · , fn be the
computations leading to branch points β1 , β2 , · · · , βn . Let φ be the function

that transforms ξ between β and the exit of the chart, and let φ1 , · · · , φn be
the corresponding functions for β1 , · · · , βn . We then write

φ[ξ] = [p1 [ξ] → φ1 [f1 [ξ]]; · · · ; pn [ξ] → φn [fn [ξ]]]

[Figure 6: an n-way branch point β (with associated function φ) and computations f1, . . . , fn leading to branch points β1, . . . , βn with functions φ1, . . . , φn.]

7 Acknowledgments
The inadequacy of the λ-notation for naming recursive functions was noticed
by N. Rochester, and he discovered an alternative to the solution involving
label which has been used here. The form of subroutine for cons which per-
mits its composition with other functions was invented, in connection with
another programming system, by C. Gerberick and H. L. Gelernter, of IBM
Corporation. The LISP programming system was developed by a group in-
cluding R. Brayton, D. Edwards, P. Fox, L. Hodes, D. Luckham, K. Maling,
J. McCarthy, D. Park, S. Russell.
The group was supported by the M.I.T. Computation Center, and by the
M.I.T. Research Laboratory of Electronics (which is supported in part by
the U.S. Army (Signal Corps), the U.S. Air Force (Office of Scientific Research,
Air Research and Development Command), and the U.S. Navy (Office of Naval
Research)). The author also wishes to acknowledge the personal financial
support of the Alfred P. Sloan Foundation.

REFERENCES

1. J. McCARTHY, Programs with common sense, Paper presented at the
Symposium on the Mechanization of Thought Processes, National Physical
Laboratory, Teddington, England, Nov. 24-27, 1958. (Published in Proceed-
ings of the Symposium by H. M. Stationery Office).
2. A. NEWELL AND J. C. SHAW, Programming the logic theory machine,
Proc. Western Joint Computer Conference, Feb. 1957.
3. A. CHURCH, The Calculi of Lambda-Conversion (Princeton University
Press, Princeton, N. J., 1941).
4. FORTRAN Programmer’s Reference Manual, IBM Corporation, New
York, Oct. 15, 1956.
5. A. J. PERLIS AND K. SAMELSON, International algebraic language,
Preliminary Report, Comm. Assoc. Comp. Mach., Dec. 1958.

Higher-Order and Symbolic Computation, 11, 363–397 (1998)
© 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Definitional Interpreters
for Higher-Order Programming Languages*
JOHN C. REYNOLDS**
Systems and Information Science, Syracuse University

Abstract. Higher-order programming languages (i.e., languages in which procedures or labels can occur as
values) are usually defined by interpreters that are themselves written in a programming language based on the
lambda calculus (i.e., an applicative language such as pure LISP). Examples include McCarthy’s definition of
LISP, Landin’s SECD machine, the Vienna definition of PL/I, Reynolds’ definitions of GEDANKEN, and recent
unpublished work by L. Morris and C. Wadsworth. Such definitions can be classified according to whether the
interpreter contains higher-order functions, and whether the order of application (i.e., call by value versus call by
name) in the defined language depends upon the order of application in the defining language. As an example,
we consider the definition of a simple applicative programming language by means of an interpreter written in a
similar language. Definitions in each of the above classifications are derived from one another by informal but
constructive methods. The treatment of imperative features such as jumps and assignment is also discussed.

Keywords: programming language, language definition, interpreter, lambda calculus, applicative language,
higher-order function, closure, order of application, continuation, LISP, GEDANKEN, PAL, SECD machine,
J-operator, reference.

* Work supported by Rome Air Force Development Center Contract No. 30602-72-C-0281 and ARPA Contract
No. DAHC04-72-C-0003. This paper originally appeared in the Proceedings of the ACM National Conference,
volume 2, August, 1972, ACM, New York, pages 717–740.
** Current address: Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
e-mail: John.Reynolds@cs.cmu.edu

1. Introduction

An important and frequently used method of defining a programming language is to give an
interpreter for the language that is written in a second, hopefully better understood language.
(We will call these two languages the defined and defining languages, respectively.) In this
paper, we will describe and classify several varieties of such interpreters, and show how
they may be derived from one another by informal but constructive methods. Although
our approach to “constructive classification” is original, the paper is basically an attempt to
review and systematize previous work in the field, and we have tried to make the presentation
accessible to readers who are unfamiliar with this previous work.
(Of course, interpretation can provide an implementation as well as a definition, but there
are large practical differences between these usages. Definitional interpreters often achieve
clarity by sacrificing all semblance of efficiency.)
We begin by noting some salient characteristics of programming languages themselves.
The features of these languages can be divided usefully into two categories: applicative
features, such as expression evaluation and the definition and application of functions,
and imperative features, such as statement sequencing, labels, jumps, assignment, and

procedural side-effects. Most user-oriented languages provide features in both categories.


Although machine languages are usually purely imperative, there are few “higher-level”
languages that are purely imperative. (IPL/V might be an example.) On the other hand,
there is at least one well-known example of a purely applicative language: LISP (i.e., the
language defined in McCarthy’s original paper [1]; most LISP implementations provide
an extended language including imperative features). There are also several more recent,
rather theoretical languages (ISWIM [2], PAL [3], and GEDANKEN [4]) that have been
designed by starting with an applicative language and adding imperative extensions.
Purely applicative languages are often said to be based on a logical system called the
lambda calculus [5, 6], or even to be “syntactically sugared” versions of the lambda calculus.
In particular, Landin [7] has shown that such languages can be reduced to the lambda calculus
by treating each type of expression as an abbreviation for some expression of the lambda
calculus. Indeed, this kind of reducibility could be taken as a precise definition of the
notion of “purely applicative.” However, as we will see, although an unsugared applicative
language is syntactically equivalent to the lambda calculus, there is a subtle semantic
difference. Essentially, the semantics of the “real” lambda calculus implies a different
“order of application” (i.e., normal-order evaluation) than most applicative programming
languages.
A second useful characterization is the notion of a higher-order programming language.
In analogy with mathematical logic, we will say that a programming language is higher-
order if procedures or labels can occur as data, i.e., if these entities can be used as arguments
to procedures, as results of functions, or as values of assignable variables. A language that
is not higher-order will be called first-order.
In ALGOL and its various descendents, procedures and labels can be used as procedure
arguments, and in more recent languages such as PL/I and ALGOL 68, they may also be
used as function results and assignable values, subject to certain “scope” restrictions (which
are imposed to preserve a stack discipline for the storage allocation of the representations
of functions and labels). However, the unrestricted use of procedures and labels as data is
permitted in only a handful of languages which sacrifice efficiency for generality: LISP
(in most of its interpretive implementations), ISWIM, PAL, GEDANKEN, and (roughly)
POP-2.
With regard to current techniques of language definition, there is a substantial disparity
between first-order and higher-order languages. As a result of work by Floyd [8], Manna
[9], Hoare [10], and others, most aspects of first-order languages can be defined logically,
i.e., one can give an effective method for transforming a program in the defined language
into a logical statement of the relation between its inputs and outputs. However, it has not
yet been possible to apply this approach to higher-order languages. (Although recent work
by Scott [12, 13, 14, 15] and Milner [16] represents a major step in this direction.)
Almost invariably, higher-order languages have been defined by the approach discussed
in this paper, i.e., by giving interpreters that are themselves written in a programming
language. (An apparent exception is the definition of ALGOL given by Burstall [17], but this
can be characterized as a logical definition of a first-order interpreter for a higher-order
language.)
defining language is usually purely applicative (probably because applicative languages are
well suited for computations with symbolic expressions). Examples include McCarthy’s

definition of LISP [1], Landin’s SECD machine [7], the Vienna definition of PL/I [18],
Reynolds’ definitions of GEDANKEN [19], and recent unpublished work by L. Morris [20]
and C. Wadsworth.
(There are a few instances of definitional interpreters that fall outside the conceptual
framework developed in this paper. A broader review of the field is given by deBakker
[21].)
These examples exhibit considerable variety, ranging from very concise and abstract
interpreters to much more elaborate and machine-like ones. To achieve a more precise
classification, we will introduce two criteria. First, we ask whether the defining language is
higher-order, or more precisely, whether any of the functions that comprise the interpreter
either accept or produce values that are themselves functions.
The second criterion involves the notion of order of application. In designing any language
that allows the use of procedures or functions, one must choose between two orders of
application which are called (following ALGOL terminology) call by value and call by
name. Even when the language is purely applicative, this choice will affect the meaning
of some, but not all, programs that can be written in the language. Remembering that an
interpreter is a specific program, we obtain our second criterion: Does the meaning of the
interpreter depend upon the order of application chosen for the defining language?
These two criteria establish four possible classes of interpreters, each of which contains
one or more of the examples cited earlier:

                          Use of higher-order functions:

Order-of-application
dependence:               yes                      no

yes                       direct interpreter       McCarthy’s
                          for GEDANKEN             definition of LISP

no                        Morris-Wadsworth         SECD machine,
                          method                   Vienna definition

The main goal of this paper is to illustrate and relate these classes of definitional inter-
preters. In the next section we will introduce a simple applicative language, which we will
use as the defining language and also, with several restrictions, as the defined language.
Then we will present a simple interpreter that uses higher-order functions and is order-of-
application dependent, and we will transform this interpreter into examples of the three
remaining classes. Finally, we will consider the problem of adding imperative features to
the defined language (while keeping the defining language purely applicative).

2. A Simple Applicative Language

In an applicative language, the meaningful phrases of a program are called expressions, the
process of executing or interpreting these expressions is called evaluation, and the result of
evaluating an expression is called a value. However, as is evident from a simple arithmetic
expression such as x + y, different evaluations of the same expression can produce different
values, so that the process of evaluation must depend upon something more than just the
expression being evaluated. It is evident that this “something more” must specify a value
for every variable that might occur in the expression (more precisely, occur free). We will
call such a specification an environment, and say that it binds variables to values.
It is also evident that the evaluation process may involve the creation of new environments
from old ones. Suppose x1 , . . . , xn are variables, v1 , . . . , vn are values, and e and e′ are
environments. If e′ specifies the value vi for each xi , and behaves the same way as e for all
other variables, then we will say that e′ is the extension of e that binds the xi ’s to the vi ’s.
The simplest expressions in our applicative language are constants and variables. The
evaluation of a constant always gives the same value, regardless of the environment. We
will not specify the set of constants precisely, but will assume that it contains the integers
and the Boolean constants true and false. The evaluation of a variable simply produces the
value that is bound to that variable by the environment. In the programs in this paper we
will denote variables by alphanumeric strings, with occasional superscripts and subscripts.
If our language is going to involve functions, then we must have a form of expression
whose evaluation will cause the application of a function to its arguments. If r0 , r1 , . . . , rn
are expressions, then r0 (r1 , . . . , rn ) is an application expression, whose operator is r0
and whose operands are r1 , . . . , rn . The evaluation of an application expression in an
environment proceeds as follows:
1. The subexpressions r0 , r1 , . . . , rn are evaluated in the same environment to obtain
values f , a1 , . . . , an .
2. If f is not a function of n arguments, then an error stop occurs.
3. Otherwise, the function f is applied to the arguments a1 , . . . , an , and if this application
produces a result, then the result is the value of the application expression.
There are several assumptions hiding behind this description that need to be made explicit:
1. A “function of n arguments” is a kind of value that can be subjected to the process of
being “applied” to a sequence of n values called “arguments”.
2. For some functions and arguments, the process of application may never produce a
result, either because the process does not terminate (i.e., it runs on forever), or because
it causes an error stop. Similarly, for some expressions and environments, the process
of evaluation may never produce a value.
3. In a purely applicative language, the application of the same function to the same
sequence of arguments will always have the same effect, i.e., both the result that is
produced, and the prior question of whether any result is produced, depend only upon
the function and its arguments. Similarly, the evaluation of the same expression in the
same environment will always have the same effect.

4. During the evaluation of an application expression, the application process does not
begin until after the operator and all of its operands have been evaluated. This is the
call-by-value order of application mentioned in the introduction. In the alternative order
of application, known as call by name, the application process would begin as soon as
the operator had been evaluated, and each operand would only be evaluated when (and
if) the function being applied actually depended upon its value. This distinction will
be clarified below.

5. Although we have specified that all of the subexpressions r0 , . . . , rn are to be evaluated
before the application process begins, we have not specified the relative order in which
these subexpressions are to be evaluated. In a purely applicative language, this choice
has no effect. (A slight exception occurs if the evaluation of one subexpression never
terminates while the evaluation of another gives an error stop.) However, the choice will
become significant when we start adding imperative features to the defined language.
In anticipation of this extension, we will assume that the subexpressions are evaluated
successively from left to right.

Next, we must have a form of expression whose evaluation will produce a function.
If x1 , . . . , xn are variables and r is an expression, then λ(x1 , . . . , xn ). r is a lambda
expression, whose formal parameters are x1 , . . . , xn and whose body is r. (The parentheses
may be omitted if there is only one formal parameter.) The evaluation of a lambda expression
with n formal parameters always terminates and always produces a function of n arguments.
To describe this function, we must specify what will happen when it is applied to its
arguments.
Suppose that f is the function obtained by evaluating λ(x1 , . . . , xn ). r in an environment
e. Then the application of f to the arguments a1 , . . . , an will cause the evaluation of the
body r in the environment that is the extension of e that binds each xi to the corresponding
ai . If this evaluation produces a value, then the value becomes the result of the application
of f .
The key point is that the environment in which the body is evaluated during application is
an extension of the earlier environment in which the lambda expression was evaluated (rather
than the more recent environment in which the application takes place). As a consequence, if
a lambda expression contains global variables (i.e., variables that are not formal parameters),
its evaluation in different environments can produce different functions. For example, the
lambda expression λx. x + y can produce an incrementing function, an identity function
(for the integers), or a decrementing function, when evaluated in environments that bind y
to the values 1, 0, or −1 respectively.
Nowadays, it is generally accepted that this behavior of lambda expressions and environ-
ments is a basic characteristic of a well-designed higher-order language. Its importance is
that it permits functional data to depend upon the partial results of a program.
Having introduced application and lambda expressions, we may now clarify the distinc-
tion between call by value and call by name. Consider the evaluation of an application
expression r0 (r1 , . . . , rn ) in an environment ea , and suppose that the value of the oper-
ator r0 is a function f that was originally created by evaluating the lambda expression
λ(x1 , . . . , xn ). rλ in an environment eλ . (Possibly this lambda expression is r0 itself, but
more generally r0 may be a non-lambda expression whose functional value was created

earlier in the computation.) When call by value is used, the following steps will occur
during the evaluation of the application expression:
1. r0 is evaluated in the environment ea to obtain the function value f .
2. r1 , . . . , rn are evaluated in the environment ea to obtain arguments a1 , . . . , an .
3. rλ is evaluated in the extension of eλ that binds each xi to the corresponding ai , to
obtain the value of the application expression.
When call by name is used, the same expressions are evaluated in the same environments.
But the evaluations of the operands r1 , . . . , rn will occur at a later time and may occur a
different number of times. Specifically, instead of being evaluated before step (3), each
operand ri is repeatedly evaluated during step (3), each time that its value ai is actually
used (as a function to be applied, a Boolean value determining a branch, or an argument of
a primitive operation).
At first sight, since the evaluation of the same expression in the same environment al-
ways produces the same effect, it would appear that the result of a program in a purely
applicative language should be unaffected by changing the order of application (although
it is evident that the repeated evaluation of operands occurring with call by name can be
grossly inefficient). But this overlooks the possibility that “repeatedly” may mean “never”.
During step (3) of the evaluation of r0 (r1 , . . . , rn ), it may happen that certain arguments
ai are never used, so that the corresponding operands ri will never be evaluated under call
by name. Now suppose that the evaluation of one of these ri never terminates (or gives an
error stop). Then the evaluation of the original application expression will terminate under
call by name but not call by value. In brief, changing the order of application can affect the
value of an application expression when the function being applied is independent of some
of its arguments and the corresponding operands are nonterminating.
(In ALGOL the distinction between call by value and call by name also involves a change
in “coercion conventions”. However, this change is irrelevant in the absence of assignment.)
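
The distinction can be simulated in Python, where call by name is modeled by passing a zero-argument function (a thunk) that is evaluated only when its value is actually used (a sketch; the names are ours):

def k_value(a, b):        # call by value: b arrives already evaluated
    return a

def k_name(a, b):         # call by name: b is a thunk, forced only on use
    return a              # b is never used, hence never evaluated

def loop():               # a nonterminating operand
    while True:
        pass

print(k_name(1, lambda: loop()))   # terminates and prints 1
# print(k_value(1, loop()))        # would never terminate: loop() is
                                   # evaluated before the application begins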
In the defined language, we will consider only the use of call by value, but in the defin-
ing language we will consider both orders of application. In particular, we will inquire
whether the above-described situation occurs in our interpreters, so that changing the order
of application in the defining language can affect the meaning of the defined language.
We now introduce some additional kinds of expressions. If rp , rc and ra are expressions,
then if rp then rc else ra is a simple conditional expression, whose premiss is rp , whose
conclusion is rc , and whose alternative is ra . The evaluation of a conditional expression
in an environment e begins with the evaluation of its premiss rp in the same environment.
Then, depending upon whether the value of the premiss is true or false, the value of the
conditional expression is obtained by evaluating either the conclusion rc or the alternative
ra in e. Any other value of the premiss causes an error stop.
It is also convenient to use a LISP-like notation for “multiple” conditional expressions.
If rp1 , . . . , rpn and rc1 , . . . , rcn are expressions, then

(rp1 → rc1 , rp2 → rc2 , . . . , rpn → rcn )

is a multiple conditional expression, with the same meaning as the following sequence of
simple conditional expressions:

if rp1 then rc1 else if rp2 then rc2 else · · · if rpn then rcn else error.

Next, we introduce a form of expression (due to Landin [7]) that is analogous to the block
in ALGOL. If x1 , . . . , xn are variables, and r1 , . . . , rn and rb are expressions, then

let x1 = r1 and · · · and xn = rn in rb

is a let expression, whose declared variables are x1 , . . . , xn , whose declaring expressions
are r1 , . . . , rn , and whose body is rb . (We will call each pair xi = ri a declaration.) The
evaluation of a let expression in an environment e begins with the evaluation of its declaring
expressions ri in the same environment. Then the value of the let expression is obtained by
evaluating its body rb in the environment that is the extension of e that binds each declared
variable xi to the value of the corresponding declaring expression ri .
It should be noted that the extended environment only affects the evaluation of the body,
not the declaring expressions. For example, in an environment that binds x to 4, the value
of let x = x + 1 and y = x − 1 in x × y is 15. As a consequence, let expressions cannot be
used (at least directly) to define recursive functions. One might expect, for instance, that

let f = λx. if x = 0 then 1 else x × f (x − 1) in · · ·

would create an extended environment in which f was bound to a recursive function (for
computing the factorial). But in fact, the occurrence of f inside the declaring expression
will not “feel” the binding of f to the value of the declaring expression, so that the resulting
function will not call itself recursively.
To overcome this problem, we introduce a second kind of block-like expression. If
x1 , . . . , xn are variables, `1 , . . . , `n are lambda expressions, and rb is an expression, then

letrec x1 = `1 and · · · and xn = `n in rb

is a recursive let expression, whose declared variables are x1 , . . . , xn , whose declaring
expressions are `1 , . . . , `n , and whose body is rb . The value of a recursive let expression
in an environment e is obtained by evaluating its body in an environment e′ which satisfies
the following property: e′ is the extension of e that binds each declared variable xi to the
function obtained by evaluating the corresponding declaring lambda expression `i in the
environment e′.
There is a circularity in the property “e′ is the . . . in the environment e′ ” that is char-
acteristic of recursion, and that prevents this property from being an explicit definition of
e′. To be rigorous, we would have to show that there actually exists an environment that
satisfies this property, and also deal with the possibility that this environment might not be
unique. The mathematical techniques needed to achieve this rigor are beyond the scope
of this paper [22, 12, 13, 14, 15]. However, we will eventually derive an interpreter that
defines recursive let expressions more explicitly.
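
Python's def exhibits the same circularity, which suggests how an interpreter can tie the knot: the function value is created first, and its name is then bound in the very environment the function captures, so the free occurrence of the name in the body "feels" the binding (a sketch, not one of the interpreters derived later in the paper):

def make_factorial():
    def f(x):                   # the declaring lambda expression
        return 1 if x == 0 else x * f(x - 1)
    return f                    # f's environment already binds f to f itself

print(make_factorial()(5))      # prints 120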
(It is possible to generalize recursive let expressions by allowing arbitrary declaring
expressions. We have chosen not to do so, since the generalization would considerably
complicate some of the definitional interpreters, and is not unique.)
To maintain generality, we have avoided specifying the set of data that can occur as the
result of expression evaluation (beyond asserting that this set should contain functions and

the Boolean values true and false). However, it is evident that our language must contain
basic (i.e., built-in) operations and tests for manipulating this data. For example, if integers
are to occur as data, we will need at least an incrementing operation and a test for integer
equality. More likely, we will want all of the usual arithmetic operations and tests. If some
form of structured data is to be used, we will need operations for constructing and analyzing
the structures, and tests for classifying them.
Regardless of the specific nature of the data, there are three ways to introduce basic
operations and tests into our applicative language:
1. We may introduce constants denoting the basic functions (whose application will per-
form the basic operations and tests).
2. We may introduce predefined variables denoting the basic functions. These variables
differ from constants in that the programmer can redefine them with his own decla-
rations. They are specified by introducing an initial environment, to be used for the
evaluation of the entire program, that binds the predefined variables to their functional
values.
3. We may introduce special expressions whose evaluation will perform the basic oper-
ations and tests. Since this approach is used in most programming languages (and in
mathematical notation), we will frequently use the common forms of arithmetic and
Boolean expressions without explanation.

3. The Defined Language

Although our defining language will use all of the features described in the previous section,
along with appropriate basic operations and tests, the defined language will be considerably
more limited, in order to avoid complications that would be out of place in an introductory
paper. Specifically:
1. Functions will be limited to a single argument. Thus all applicative expressions will
have a single operand, and all lambda expressions will have a single formal parameter.
2. Only call by value will be used.
3. Only simple conditional expressions will be used.
4. Nonrecursive let expressions will be excluded.
5. All recursive let expressions will contain a single declaration.
6. Values will be integers, booleans, and functions. The only basic operations and tests
will be functions for incrementing integers and for testing integer equality, denoted by
the predefined variables succ and equal, respectively.
The reader may accept an assurance that these limitations will eliminate a variety of
tedious complications without evading any intellectually significant problems. Indeed,
with slight exceptions, the eliminated features can be regarded as syntactic sugar, i.e., they
can be defined as abbreviations for expressions in the restricted language [7, 4].

4. Abstract Syntax

We now turn our attention to the defining language. To permit the writing of interpreters, the
values used in the defining language must include expressions of the defined language. At
first sight, this suggests that we should use character strings as values denoting expressions,
but this approach would enmesh us in questions of grammar and parsing that are beyond the
scope of this paper. (An excellent review of these matters is contained in Reference [23].)
Instead, we use the approach of abstract syntax, originally suggested by McCarthy [24].
In this approach, it is assumed that programs are “really” abstract, hierarchically structured
data objects, and that the character strings that one actually reads into the computer are
simply representations of these abstract objects (in the same sense that digit strings are
representations of integers). Thus the problems of grammar and parsing can be set aside as
“input editing”. (Of course, this does not eliminate these problems, but it separates them
clearly from semantic considerations. See, for example, Wozencraft and Evans [25].)
We are left with two closely related problems: how to define sets of abstract expressions
(and other structured data to be used by the interpreters), and how to define the basic
functions for constructing, analyzing, and classifying these objects. Both problems are
solved by introducing three forms of abstract-syntax equations. (A more elaborate defined
language would require a more complex treatment of abstract syntax, as given in Reference
[18], for example.) Within these equations, upper-case letter strings denote sets, and lower-
case letter strings denote basic functions.
Let S0 , S1 , . . . , Sn be upper-case letter strings and a1 , . . . , an be lowercase letter strings.
Then a record equation of the form

S0 = [a1 : S1 , . . . , an : Sn ]

implies that:
1. S0 is a set, disjoint from any other set defined by a record equation, whose members
are records with n fields in which the value of the ith field belongs to the set Si .
(Mathematically, S0 is a disjoint set in one-to-one correspondence with the Cartesian
product S1 × · · · × Sn .)
2. Each ai (is a predefined variable which) denotes the selector function that accepts a
member of S0 and produces its ith field value.
3. Let s0 be the string obtained from S0 by lowering the case of each character. Then s0 ?
denotes the classifier function that tests whether its argument belongs to S0 , and mk-s0
denotes the constructor function of n arguments (belonging to the sets S1 , . . . , Sn ) that
creates a record in S0 from its field values.
For example, the record equation

APPL = [opr: EXP, opnd: EXP]

implies that an application expression (i.e., a member of APPL) is a two-field record whose
field values are both expressions (i.e., members of EXP). It also implies that opr and
opnd are selector functions that produce the first and second field values of an application

expression, that appl? is a classifier function that tests whether a value is an application
expression, and that mk-appl is a two-argument constructor function that constructs an
application expression from its field values. It is evident that if r1 and r2 are expressions,
opr(mk-appl(r1 , r2 )) = r1
opnd(mk-appl(r1 , r2 )) = r2 ,

and if appl?(r) is true,

mk-appl(opr(r), opnd(r)) = r.
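
One way to realize such a record equation, sketched in Python (the paper treats the constructor, selectors, and classifier as primitives; the dataclass and the Python-legal spellings mk_appl and is_appl are our own choices):

from dataclasses import dataclass

@dataclass(frozen=True)
class Appl:                       # the set APPL, disjoint by construction
    opr_field: object             # an EXP
    opnd_field: object            # an EXP

def mk_appl(r1, r2): return Appl(r1, r2)          # the constructor
def opr(r):          return r.opr_field           # the selectors
def opnd(r):         return r.opnd_field
def is_appl(r):      return isinstance(r, Appl)   # the classifier appl?

r = mk_appl("f", "x")
assert opr(r) == "f" and opnd(r) == "x" and is_appl(r)
assert mk_appl(opr(r), opnd(r)) == r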

The remaining forms of abstract syntax equations are the union equation:

S0 = S1 ∪ · · · ∪ Sn ,

which implies that S0 is the union of sets S1 , . . . , Sn , and the function equation:

S0 = S1 , . . . , Sn → Sr ,

which implies that S0 is the set of n-argument functions that accept arguments in S1 , . . . , Sn
and produce results in Sr . (More precisely, S0 is the set of n-argument functions f with
the property that if f is applied to arguments in the sets S1 , . . . , Sn , and if f terminates
without an error stop, then the result of f belongs to Sr .)
We may now use these forms of abstract syntax equations to define the principal set of
data used by our interpreters, i.e., the set EXP of expressions of the defined language:

EXP = CONST ∪ VAR ∪ APPL ∪ LAMBDA ∪ COND ∪ LETREC


APPL = [opr: EXP, opnd: EXP]
LAMBDA = [fp: VAR, body: EXP]
COND = [prem: EXP, conc: EXP, altr: EXP]
LETREC = [dvar: VAR, dexp: LAMBDA, body: EXP].
A cumbersome but fairly accurate translation into English is that an expression (i.e., a
member of EXP) is one of the following:

1. A constant (a member of CONST),

2. A variable (a member of VAR),

3. An application expression (a member of APPL), which consists of an expression called


its operator (selected by the basic function opr) and an expression called its operand
(selected by opnd),

4. A lambda expression (a member of LAMBDA), which consists of a variable called its


formal parameter (selected by fp) and an expression called its body (selected by body),

5. A conditional expression (a member of COND), which consists of an expression called


its premiss (selected by prem) and an expression called its conclusion (selected by conc)
and an expression called its alternative (selected by altr),

6. A recursive let expression (a member of LETREC), which consists of a variable called its
declared variable (selected by dvar), a lambda expression called its declaring expression
(selected by dexp), and an expression called its body (selected by body).
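For readers who find modern notation helpful, these three forms of equation map directly onto the algebraic data types of a typed functional language. The following OCaml sketch is ours, not part of the paper's defining language; it arbitrarily fixes CONST as the integers and VAR as character strings:

(* A hedged OCaml rendering of the abstract-syntax equations: the union
   equation for EXP becomes a variant type, each record equation becomes
   one constructor (playing the role of the mk- function) with labeled
   fields (playing the role of the selectors), and pattern matching
   subsumes the classifiers such as appl?. *)
type var = string                 (* VAR, left abstract in the paper *)
type exp =
  | Const of int                                          (* CONST *)
  | Var of var                                            (* VAR *)
  | Appl of { opr : exp; opnd : exp }                     (* APPL *)
  | Lambda of lam                                         (* LAMBDA *)
  | Cond of { prem : exp; conc : exp; altr : exp }        (* COND *)
  | Letrec of { dvar : var; dexp : lam; body : exp }      (* LETREC *)
and lam = { fp : var; body : exp }

(* For example, mk-appl(mk-var("succ"), mk-const(0)) is written: *)
let example = Appl { opr = Var "succ"; opnd = Const 0 }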

We have purposely left the sets CONST and VAR unspecified. For CONST, we will
assume only that there is a basic function const? which tests whether its argument is a
constant, and a basic function evcon which maps each constant into the value that it denotes.
For VAR, we will assume that there is a basic function var? which tests whether its argument
is a variable, that variables can be tested for equality (of the variables themselves, not their
values), and that two particular variables are denoted by the quoted strings “succ” and
“equal”.
We must also define the abstract syntax of two other data sets that will be used by our
interpreter. The first is the set VAL of values of the defined language:

VAL = INTEGER ∪ BOOLEAN ∪ FUNVAL


FUNVAL = VAL → VAL.
One must be careful not to confuse values in the defined and defining languages. Strictly
speaking, VAL is a subset of the values of the defining language whose members represent
the values of the defined language. However, since the variety of values provided in the
defining language is richer than in the defined language, we have been able to represent
each defined-language value by the same defining-language value. In our later interpreters
this situation will change, and it will become more evident that VAL is a set of value
representations.
Finally, we must define the set ENV of environments. Since the purpose of an environment
is to specify the value that is bound to each variable, the simplest approach is to assume
that an environment is a function from variables to values, i.e.,

ENV = VAR → VAL.

Within the various interpreters that we will present, each variable will range over some
set defined by abstract syntax equations. For clarity, we will use different variables for
different sets, as summarized in the following table:

Variable     Range        Variable     Range

r            EXP          e, e′        ENV
x, z         VAR          c, c′        CONT
ℓ            LAMBDA       m, m′, m″    MEM
a, b         VAL          rf           REF
f            FUNVAL       n            INTEGER

(The sets CONT, MEM, and REF will be defined later.)



5. A Meta-Circular Interpreter

Our first interpreter is a straightforward transcription of the informal language definition


we have already given. Its central component is a function eval that produces the value of
an expression r in an environment e:

eval = λ(r, e).                                                          I.1
  ( const?(r) → evcon(r),                                                I.2
    var?(r) → e(r),                                                      I.3
    appl?(r) → (eval(opr(r), e))(eval(opnd(r), e)),                      I.4
    lambda?(r) → evlambda(r, e),                                         I.5
    cond?(r) → if eval(prem(r), e)                                       I.6
               then eval(conc(r), e) else eval(altr(r), e),              I.7
    letrec?(r) → letrec e′ =                                             I.8
                 λx. if x = dvar(r) then evlambda(dexp(r), e′) else e(x) I.9
                 in eval(body(r), e′) )                                  I.10
evlambda = λ(ℓ, e). λa. eval(body(ℓ), ext(fp(ℓ), a, e))                  I.11
ext = λ(z, a, e). λx. if x = z then a else e(x).                         I.12

The subsidiary function evlambda produces the value of a lambda expression ℓ in an
environment e. (We have extracted it as a separate function since it is called from two
places, in lines I.5 and I.9.) The subsidiary function ext produces the extension of an
environment e that binds the variable z to the value a. It should be noted that, in the
evaluation of a recursive let expression (lines I.8 to I.10), the circularity in the definition of
the extended environment e′ is handled by making e′ a recursive function. (However, it is
a rather unusual recursive function which, instead of calling itself, calls another function
evlambda, to which it provides itself as an argument.)
The function eval does not define the meaning of the predefined variables. For this
purpose, we introduce the “main” function interpret, which causes a complete program r
to be evaluated in an initial environment initenv that maps each predefined variable into the
corresponding basic function:

interpret = λr. eval(r, initenv)                                         I.13
initenv = λx. ( x = “succ” → λa. succ(a),                                I.14
                . . .
                x = “equal” → λa. λb. equal(a, b) ).                     I.15

In the last line we have used a trick called Currying (after the logician H. Curry) to
solve the problem of introducing a binary operation into a language where all functions
must accept a single argument. (The referee comments that although “Currying” is tastier,
“Schönfinkeling” might be more accurate.) In the defined language, equal is a function
which accepts a single argument a and returns another function, which in turn accepts a
single argument b and returns true or false depending upon whether a = b. Thus in the
defined language, one would write (equal(a))(b) instead of equal(a, b).
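In a present-day functional language this trick is the native idiom; a two-line OCaml illustration (ours, with integers standing in for defined-language values):

(* A Curried equality test: a function of one argument that returns
   another function of one argument. *)
let equal = fun (a : int) -> fun b -> a = b
let _ = (equal 2) 3   (* false *)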

(Each of our interpreters will consist of a sequence of function declarations. We will


assume that these are implicitly embedded in a recursive let expression whose body is
interpret(R), where R is the program to be interpreted.)
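As a concreteness check, Interpreter I transcribes almost line for line into OCaml. The following is a hedged sketch of ours, under the same arbitrary assumptions as before (constants are integers, variables are strings, constructor arguments positional for brevity); environments and defined-language functional values are represented by OCaml functions, which is precisely what makes the transcription meta-circular:

(* Interpreter I: ENV = VAR -> VAL and FUNVAL = VAL -> VAL are OCaml
   function types. *)
type exp = Const of int | Var of string
         | Appl of exp * exp | Lambda of string * exp
         | Cond of exp * exp * exp
         | Letrec of string * exp * exp  (* dexp is expected to be a Lambda *)
type value = Int of int | Bool of bool | Fun of (value -> value)

let ext (z, a, e) = fun x -> if x = z then a else e x

let rec eval (r, e) = match r with
  | Const n -> Int n
  | Var x -> e x
  | Appl (opr, opnd) ->
      (match eval (opr, e) with
       | Fun f -> f (eval (opnd, e))
       | _ -> failwith "operator is not a function")
  | Lambda _ -> evlambda (r, e)
  | Cond (prem, conc, altr) ->
      (match eval (prem, e) with
       | Bool true -> eval (conc, e)
       | Bool false -> eval (altr, e)
       | _ -> failwith "non-Boolean premiss")
  | Letrec (dvar, dexp, body) ->
      (* e' is a recursive function that calls evlambda, providing
         itself as an argument (cf. lines I.8-10) *)
      let rec e' x = if x = dvar then evlambda (dexp, e') else e x in
      eval (body, e')
and evlambda (l, e) = match l with
  | Lambda (fp, body) -> Fun (fun a -> eval (body, ext (fp, a, e)))
  | _ -> failwith "not a lambda expression"

let initenv x = match x with
  | "succ" -> Fun (function Int n -> Int (n + 1) | _ -> failwith "succ")
  | "equal" -> Fun (fun a -> Fun (fun b -> Bool (a = b)))   (* Curried *)
  | _ -> failwith ("unbound variable " ^ x)

let interpret r = eval (r, initenv)

For instance, interpret (Appl (Var "succ", Const 2)) yields Int 3.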
We have coined the word “meta-circular” to indicate the basic character of this interpreter:
It defines each feature of the defined language by using the corresponding feature of the
defining language. For example, when eval is applied to an application expression (lambda
expression, conditional expression, recursive let expression) of the defined language, it
evaluates an application expression (lambda expression, conditional expression, recursive
let expression) in the defining language. Similarly, the initial environment defines the basic
functions of the defined language in terms of the same functions in the defining language.
In one sense, this situation is not undesirable. For the reader who already has a thorough
and correct understanding of the defining language, a meta-circular definition will provide
a concise and complete description of the defined language. (Of course this is a rather
vacuous accomplishment when the defined language is a subset of the defining language.)
The problem is that any misunderstandings about the defining language are likely to be
carried over to the defined language intact. For example, if we were to assume that in
the defining language, the function succ decreases an integer by one, or that a conditional
expression gives the same result when the value of its premiss is non-Boolean as when
it is false, the above interpreter would lead us to the same assumptions about the defined
language.
These particular difficulties are easily overcome; we could define functions such as succ
in terms of elementary mathematics, and we could insert explicit tests for erroneous values.
But there are three objections to meta-circularity that are much more serious:

1. The meta-circular interpreter does not shed much light on the nature of higher-order
functions. For this purpose, we would prefer an interpreter of a higher-order defined
language that was written in a first-order defining language.
2. Changing the order of application used in the defining language induces a similar change
in the defined language. To see this, suppose that eval is applied to an application
expression r0(r1) of the defined language. Then the result of eval will be obtained by
evaluating the application expression (line I.4)

(eval(r0 , e))(eval(r1 , e))

in the defining language. If call by value is used in the defining language, then eval(r1 , e)
will be evaluated before the functional value of eval(r0 , e) is applied. But evaluating
eval(r1 , e) interprets the evaluation of r1 , and applying the value of eval(r0 , e) interprets
the application of the value of r0 . Thus in terms of the defined language, r1 will be
evaluated before the value of r0 is applied, i.e., call by value will be used in the defined
language.
On the other hand, if call by name is used in the defining language, then the application
of the functional value of eval(r0 , e) will begin as soon as eval(r0 , e) has been evaluated,
and the operand eval(r1 , e) will only be evaluated when and if the function being applied
depends upon its value. In terms of the defined language, the application of the value of
r0 will begin as soon as r0 has been evaluated, and the operand r1 will only be evaluated

when and if the function being applied depends upon its value, i.e., call by name will
be used in the defined language.

3. Suppose we wish to extend the defined language by introducing the imperative features
of labels and jumps (including jumps out of blocks). As far as is known, it is impossible
to extend the meta-circular definition straightforwardly to accommodate these features
(without introducing similar features into the defining language).

In the following sections we will develop transformations of the meta-circular interpreter


that will meet the first two of these objections. Then we will find that the transformation
designed to meet the second objection also meets the third.
It should be emphasized that, although these transformations are motivated by their ap-
plication to interpreters, they are actually applicable to any program written in the defining
language, and their validity depends entirely upon the properties of the defining language.

6. Elimination of Higher-Order Functions

Our first task is to modify the meta-circular interpreter so that none of the functions that
comprise this interpreter accept arguments or produce results that are functions. An exam-
ination of the abstract syntax shows that this goal will be met if we can replace the two sets
FUNVAL and ENV by sets of values that are not functions. Specifically, the new members
of these sets will be records that represent functions.
We first consider the set FUNVAL. Since the new members of this set are to be records
rather than functions, we can no longer apply these members directly to arguments. Instead
we will introduce a new function apply that will “interpret” the new members of FUNVAL.
Specifically, if fnew is a record in FUNVAL that represents a function fold and if a is any
member of VAL, then apply(fnew , a) will produce the same result as fold (a). Assuming
for the moment that we will be able to define apply, we must replace each application of
a member of FUNVAL (to an argument a) by an application of apply (to the member of
FUNVAL and the argument a). In fact, the only such application occurs in line I.4, which
must become
appl?(r) → apply(eval(opr(r), e), eval(opnd(r), e)).                     I.4′

To decide upon the form of the new members of FUNVAL, we recall that whenever a
function is obtained by evaluating a lambda expression, the function will be determined
by two items of information: (1) the lambda expression itself, and (2) the values that were
bound to the global variables of the lambda expression at the time of its evaluation. It is
evident that these items of information will be sufficient to represent the function. This
suggests that the new set FUNVAL should be a union of disjoint sets of records, one set
for each lambda expression whose value belonged to the old FUNVAL, and that the fields
of each record should contain values of the global variables of the corresponding lambda
expression.
In fact, the meta-circular interpreter contains four lambda expressions (indicated by solid
underlining) that produce members of FUNVAL. The following table gives their locations
and global variables, and the equations defining the new sets of records that will represent

their values. (The connotations of the set and selector names we have chosen will become
apparent when we discuss the role of these entities in the interpretation of the defined
language.)

Location        Global Variables    New Record Equation

I.11            ℓ, e                CLOSR = [lam: LAMBDA, en: ENV]
I.14            none                SC = [ ]
I.15 (outer)    none                EQ1 = [ ]
I.15 (inner)    a                   EQ2 = [arg1: VAL]

Thus the new set FUNVAL will be

FUNVAL = CLOSR ∪ SC ∪ EQ1 ∪ EQ2,

and the overall structure of apply will be:

apply = λ(f, a).
  ( closr?(f) → · · · ,
    sc?(f) → · · · ,
    eq1?(f) → · · · ,
    eq2?(f) → · · · ).

Our remaining task is to replace each of the four solidly underlined lambda expressions
by appropriate record-creation operations, and to insert expressions in the branches of apply
that will interpret the corresponding records. The lambda expression in line I.11 must be
replaced by an expression that creates a CLOSR-record containing the values of the global
variables ℓ and e:

evlambda = λ(ℓ, e). mk-closr(ℓ, e).                                      I.11′

Now apply(f, a) must produce the result of applying the function represented by f to
the argument a. When f is a CLOSR-record, this result may be obtained by evaluating the
body

eval(body(ℓ), ext(fp(ℓ), a, e))

of the replaced lambda expression in an appropriate environment. This environment must
bind the formal parameter a of the replaced lambda expression to the value of a and must bind
the global variables ℓ and e of the lambda expression to the same values as in the environment
in which the CLOSR-record f was created. Since the latter values are stored in the fields
of f , we have:

apply = λ(f, a).
  ( closr?(f) → let a = a and ℓ = lam(f) and e = en(f)
                in eval(body(ℓ), ext(fp(ℓ), a, e)),
    . . . ).

(In this particular case, but not in general, the declaration a = a is unnecessary, since the
formal parameter of the replaced lambda expression and the second formal parameter of
apply are the same variable. From now on, we will omit such vacuous declarations.)
A similar treatment (somewhat simplified since there are no global variables) of the
lambda expression in I.14 and the outer lambda expression in I.15 gives:

initenv = λx. ( x = “succ” → mk-sc(),                                    I.14′
                . . .
                x = “equal” → mk-eq1() )                                 I.15′

and

apply = λ(f, a).
  ( closr?(f) → let ℓ = lam(f) and e = en(f)
                in eval(body(ℓ), ext(fp(ℓ), a, e)),
    sc?(f) → succ(a),
    eq1?(f) → λb. equal(a, b),
    eq2?(f) → · · · ).
Finally, we must replace the lambda expression that originally occurred as the inner
expression in I.15. Although we have already moved this expression into the body of apply
(since it was the body of a previously replaced lambda expression), the same basic treatment
can be applied to the new occurrence, giving:

apply = λ(f, a).
  ( closr?(f) → let ℓ = lam(f) and e = en(f)
                in eval(body(ℓ), ext(fp(ℓ), a, e)),
    sc?(f) → succ(a),
    eq1?(f) → mk-eq2(a),
    eq2?(f) → let b = a and a = arg1(f) in equal(a, b) ).
(Note that the declaration relating formal parameters is not vacuous in this case.)
The entire transformation that converts FUNVAL from a set of functions to a set of records
has been informally justified by appealing to an understanding of the defining language,
without regard to the meaning or use of the particular program being transformed. But now
it is illuminating to examine the different kinds of records in FUNVAL in terms of their
role in the interpretation of the defined language. The records in the set CLOSR represent
functional values that are produced by evaluating the lambda expressions occurring in the
defined language programs. They are equivalent to the objects called FUNARG triplets
in LISP and closures in the work of Landin [7]. The unique records in the one-element
sets SC and EQ1 represent the basic functions succ and equal. Finally, the records in EQ2
represent the functions that are created by applying equal to one argument.
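The effect of the transformation is easy to exhibit in a small OCaml sketch (ours; the syntax is cut down to the constructs that matter here, and association lists stand in for the not-yet-defunctionalized environments):

(* Defunctionalized FUNVAL: each lambda expression that produced a
   functional value becomes a data constructor recording its global
   variables, and apply interprets these records. *)
type exp = Const of int | Var of string
         | Appl of exp * exp | Lambda of string * exp
type value = Int of int | Bool of bool | Fn of funval
and funval =
  | Closr of string * exp * env   (* I.11: fp and body of lam, plus en *)
  | Sc                            (* I.14: the basic function succ *)
  | Eq1                           (* I.15 outer: equal *)
  | Eq2 of value                  (* I.15 inner: equal applied to arg1 *)
and env = (string * value) list

let rec eval (r, e) = match r with
  | Const n -> Int n
  | Var x ->
      (try List.assoc x e with Not_found ->
         match x with
         | "succ" -> Fn Sc
         | "equal" -> Fn Eq1
         | _ -> failwith "unbound variable")
  | Appl (opr, opnd) ->
      (match eval (opr, e) with
       | Fn f -> apply (f, eval (opnd, e))
       | _ -> failwith "operator is not a function")
  | Lambda (fp, body) -> Fn (Closr (fp, body, e))
and apply (f, a) = match f with
  | Closr (fp, body, en) -> eval (body, (fp, a) :: en)
  | Sc -> (match a with Int n -> Int (n + 1) | _ -> failwith "succ")
  | Eq1 -> Fn (Eq2 a)
  | Eq2 a1 -> Bool (a1 = a)   (* structural equality on first-order values *)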
A similar transformation can be used to “defunctionalize” the set ENV of environments.
To interpret the new members of ENV, we will introduce an interpretive function get, with
the property that if enew represents an environment eold and x is a member of VAR, then

get(enew , x) = eold (x). Applications of get must be inserted at the three points (in lines
I.3, I.9, and I.12) in the interpreter where environments are applied to variables:

var?(r) → get(e, r),                                                     I.3′
   . . .
λx. if x = dvar(r) then evlambda(dexp(r), e′) else get(e, x)             I.9′
   . . .
ext = λ(z, a, e). λx. if x = z then a else get(e, x).                    I.12′

Next, there are three lambda expressions that produce environments; they are indicated by
broken underlining which we have carefully preserved during the previous transformations.
The following table gives their locations and global variables, and the equations defining
the new sets of records that will represent their values:

Location      Global Variables    New Record Equation

I.14′-15′     none                INIT = [ ]
I.12′         z, a, e             SIMP = [bvar: VAR, bval: VAL, old: ENV]
I.9′          r, e, e′            REC = [letx: LETREC, old: ENV, new: ENV]

Thus the new set of environment representations is:

ENV = INIT ∪ SIMP ∪ REC.

Replacement of the three environment-producing lambda expressions gives:

letrec?(r) → letrec e′ = mk-rec(r, e, e′) · · ·                          I.8-9′′
   . . .
ext = λ(z, a, e). mk-simp(z, a, e)                                       I.12′′
   . . .
initenv = mk-init(),                                                     I.14′′-15′′

and the environment-interpreting function is:

get = λ(e, x).
  ( init?(e) → ( x = “succ” → mk-sc(), x = “equal” → mk-eq1() ),
    simp?(e) → let z = bvar(e) and a = bval(e) and e = old(e)
               in if x = z then a else get(e, x),
    rec?(e) → let r = letx(e) and e = old(e) and e′ = new(e)
              in if x = dvar(r) then evlambda(dexp(r), e′) else get(e, x) ).

But now we are faced with a new problem. By eliminating the lambda expression in I.9′,
we have created a recursive let expression

letrec e′ = mk-rec(r, e, e′) · · ·

that violates the structure of the defining language, since its declaring subexpression is no
longer a lambda expression. However, there is still an obvious intuitive interpretation of
this illicit construction: it binds e′ to a “cyclic” record, whose last field is (a pointer to) the
record itself.
If we accept this interpretation, then whenever e is a member of REC, we will have
new(e) = e. This allows us to replace the only occurrence of new(e) by e, so that the
penultimate line of get becomes:

rec?(e) → let r = letx(e) and e = old(e) and e′ = e · · · .

But now our program no longer contains any references to the cyclic new fields, so that
these fields can be deleted from the records in REC. Thus the record equation for REC is
reduced to:

REC = [letx: LETREC, old: ENV],

and the offending recursive let expression becomes:

letrec?(r) → let e′ = mk-rec(r, e) · · · .                               I.8′-9′′′

At this point, once we have collected the bits and pieces produced by the various trans-
formations, we will have obtained an interpreter that no longer contains any higher-order
functions. However, it is convenient to make a few simplifications:
1. let expressions can be eliminated by substituting the declaring expressions for each
occurrence of the corresponding declared variables in the body.
2. Line I.11′ can be eliminated by replacing occurrences of evlambda by mk-closr.
3. Line I.12′′ can be eliminated by replacing occurrences of ext by mk-simp.
4. Lines I.14′′-15′′ can be eliminated by replacing occurrences of initenv by mk-init().

Thus we obtain our second interpreter:

FUNVAL = CLOSR ∪ SC ∪ EQ1 ∪ EQ2
CLOSR = [lam: LAMBDA, en: ENV]
SC = [ ]
EQ1 = [ ]
EQ2 = [arg1: VAL]
ENV = INIT ∪ SIMP ∪ REC
INIT = [ ]
SIMP = [bvar: VAR, bval: VAL, old: ENV]
REC = [letx: LETREC, old: ENV]

interpret = λr. eval(r, mk-init())                                       II.1
eval = λ(r, e).                                                          II.2
  ( const?(r) → evcon(r),                                                II.3
    var?(r) → get(e, r),                                                 II.4
    appl?(r) → apply(eval(opr(r), e), eval(opnd(r), e)),                 II.5
    lambda?(r) → mk-closr(r, e),                                         II.6
    cond?(r) → if eval(prem(r), e)                                       II.7
               then eval(conc(r), e) else eval(altr(r), e),              II.8
    letrec?(r) → eval(body(r), mk-rec(r, e)) )                           II.9
apply = λ(f, a).                                                         II.10
  ( closr?(f) →                                                          II.11
      eval(body(lam(f)), mk-simp(fp(lam(f)), a, en(f))),                 II.12
    sc?(f) → succ(a),                                                    II.13
    eq1?(f) → mk-eq2(a),                                                 II.14
    eq2?(f) → equal(arg1(f), a) )                                        II.15
get = λ(e, x).                                                           II.16
  ( init?(e) → ( x = “succ” → mk-sc(), x = “equal” → mk-eq1() ),         II.17
    simp?(e) → if x = bvar(e) then bval(e) else get(old(e), x),          II.18
    rec?(e) → if x = dvar(letx(e))                                       II.19
              then mk-closr(dexp(letx(e)), e) else get(old(e), x) ).     II.20
Just as with FUNVAL, we may examine the different kinds of records in ENV with regard
to their role in the interpretation of the defined language. The unique record in INIT has
no subfields, while the records in SIMP and REC each have one field (selected by old) that
is another member of ENV. Thus environments in our second interpreter are linear lists (in
which each element specifies the binding of a single variable), and the unique record in
INIT serves as the empty list.
It is easily seen that get(e, x) searches such a list to find the binding of the variable x.
When get encounters a record in SIMP, it compares x with the bvar field, and if a match
occurs, it returns the value stored in the bval field. When get encounters a record in REC,
it compares x with dvar(letx(e)) (the declared variable of the recursive let expression
that created the binding), and if a match occurs, it returns the value obtained by evaluating
dexp(letx(e)) (the declaring subexpression of the same recursive let expression) in the
environment e. The fact that e includes the very binding that is being “looked up” reflects
the essential recursive characteristic that the declaring subexpression should “feel” the effect
of the declaration in which it is embedded. When get encounters the empty list, it compares
x with each of the predefined variables, and if a match is found, it returns the appropriate
value.
The definition of get reveals the consequences of our restricting recursive let expressions
by requiring that their declaring subexpressions should be lambda expressions. Because of
this restriction, the declaring subexpressions are always evaluated by the trivial operation
of forming a closure. Therefore, the function get always terminates, since it never calls any
other recursive function, and can never call itself more times than the length of the list that

it is searching. (On the other hand, if we had permitted arbitrary declaring subexpressions,
line II.20 would contain eval(dexp(letx(e)), e) instead of mk-closr(dexp(letx(e)), e).
This seemingly slight modification would convert get into a function that might run on
forever, as for example, when looking up the variable k in an environment created by the
defined-language construction letrec k = k + 1 in · · · .)
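These environment records and the behavior of get are captured by the following OCaml sketch (ours; the letx field is narrowed to the dvar and dexp components that get actually consults, and the predefined variables are handled at the empty list, as in II.17):

(* Defunctionalized environments: linear lists built from INIT, SIMP,
   and REC records, searched by get. *)
type exp = Var of string | Lambda of string * exp
type value = Int of int | Fn of funval
and funval = Closr of exp * env | Sc | Eq1        (* cf. Section 6 *)
and env =
  | Init                                  (* the empty list *)
  | Simp of string * value * env          (* bvar, bval, old *)
  | Rec of (string * exp) * env           (* dvar and dexp of letx, old *)

let rec get (e, x) = match e with
  | Init ->
      if x = "succ" then Fn Sc
      else if x = "equal" then Fn Eq1
      else failwith "unbound variable"
  | Simp (z, a, e') -> if x = z then a else get (e', x)
  | Rec ((dvar, dexp), e') ->
      (* the closure is formed in the environment e itself, so the
         declaring expression feels the recursive binding; since closure
         formation is trivial, get never calls eval and always terminates *)
      if x = dvar then Fn (Closr (dexp, e)) else get (e', x)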
The second interpreter is similar in style, and in many details, to McCarthy’s definition of
LISP [1]. The main differences arise from our insistence upon FUNARG binding, the use
of recursive let expressions instead of label expressions, and the use of predefined variables
instead of variables with flagged property lists.

7. Continuations

The transition from the meta-circular interpreter to our second interpreter has not elimi-
nated order-of-application dependence. It can easily be seen that a change in the order of
application used in the defining-language expression (in II.5)

apply(eval(opr(r), e), eval(opnd(r), e))

will cause a similar change for all application expressions of the defined language.
To eliminate this dependence, we must first identify the circumstances under which an
arbitrary program in the defining language will be affected by the order of application. The
essential effect of switching from call by value to call by name is to postpone the evaluation
of the operands of application expressions (and declaring subexpressions of let expressions),
and to alter the number of times these operands are evaluated. We have already seen that in
a purely applicative language, the only way in which this change can affect the meaning of
a program is to avoid the evaluation of a nonterminating operand. Now suppose we define
an expression to be serious if there is any possibility that its evaluation might not terminate.
Then a sufficient condition for order-of-application independence is that a program should
contain no serious operands or declaring expressions.
Next, suppose that we can divide the functions that may be applied by our program into
serious functions, whose application may sometimes run on forever, and trivial functions,
whose application will always terminate. (Of course, it is well-known that one cannot
effectively decide whether an arbitrary function will always terminate, but one can still
establish this classification in a “fail-safe” manner, i.e., classify a function as serious unless
it can be shown to terminate for all arguments.) Then an expression will only be serious
if its evaluation can cause the application of a serious function, and a program will be
independent of order-of-application if no operand or declaring expression can cause such
an application.
At first sight, this condition appears to be so restrictive that it could not be met in a
nontrivial program. As can be seen with a little thought, the condition implies that whenever
some function calls a serious function, the calling function must return the same result as
the called function, without performing any further computation. But any function that
calls a serious function must be serious itself. Thus by induction, as soon as any serious
function returns a result, every function must immediately return the same result, which
must therefore be the final result of the entire program.

Nevertheless, there is a method for transforming an arbitrary program into one that meets
our apparently restrictive condition. The underlying idea has appeared in a variety of
contexts [26, 27, 28], but its application to definitional interpreters is due to L. Morris
[20] and Wadsworth. Basically, one replaces each serious function fold (except the main
program) by a new serious function fnew that accepts an additional argument c called a
continuation. The continuation will be a function itself, and fnew is expected to compute
the same result as fold , apply the continuation to this result, and then return the result of
the continuation, i.e.,
fnew (x1 , . . . , xn , c) = c(fold (x1 , . . . , xn )).

This introduction of continuations provides an additional “degree of freedom” that can


be used to meet the condition of order-of-application independence. Essentially, instead
of performing further actions after a serious function has returned, one embeds the further
actions in the continuation that is passed to the serious function.
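A minimal OCaml illustration of this transformation (ours), with factorial standing in for an arbitrary serious function fold:

(* Direct style: fact performs a multiplication after its serious
   recursive call returns. *)
let rec fact n = if n = 0 then 1 else n * fact (n - 1)

(* CPS: fact_c satisfies fact_c (n, c) = c (fact n); the pending
   multiplication is embedded in the continuation, so every serious
   call is the last thing its caller does. *)
let rec fact_c (n, c) =
  if n = 0 then c 1 else fact_c (n - 1, fun a -> c (n * a))

let _ = assert (fact_c (5, fun a -> a) = fact 5)   (* both yield 120 *)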
To transform our second interpreter, we must first classify its functions. Since the defined
language contains expressions and functions whose evaluation and application may never
terminate, the defining-language functions eval and apply are serious and must be altered
to accept continuations. On the other hand, since we have seen that get always terminates,
it is trivial and will not be altered. (Note that this situation would change if the defined
language permitted recursive let expressions with arbitrary declaring subexpressions.)
Both eval and apply produce results in the set VAL, so that the arguments of continua-
tions will belong to this set. The result of a continuation will always be the value of the
entire program being interpreted, which will also belong to the set VAL. Thus the set of
continuations is:

CONT = VAL → VAL.

(In a more complicated interpreter in which different serious functions produced different
kinds of results, we would introduce different kinds of continuations.)
The overall form of our transformed interpreter will be:

interpret = λr. eval(r, mk-init(), λa. a)                                II.1′
eval = λ(r, e, c). · · ·                                                 II.2′
apply = λ(f, a, c). · · ·                                                II.10′
get = same as in Interpreter II.                                         II.16–20

Note that the “main level” call of eval by interpret provides an identity function as the
initial continuation.
We must now alter each branch of eval and apply to apply the continuation c to the
former results of these functions. In lines II.3, 4, 6, 13, 14, and 15, the branches evaluate
expressions which are not serious, and which are therefore permissible operands. Thus in
these cases, we may simply apply the continuation c to each expression:

eval = λ(r, e, c).                                                       II.2′
  ( const?(r) → c(evcon(r)),                                             II.3′
    var?(r) → c(get(e, r)),                                              II.4′
      . . .
    lambda?(r) → c(mk-closr(r, e)), . . . )                              II.6′
apply = λ(f, a, c). ( . . . ,                                            II.10′
    sc?(f) → c(succ(a)),                                                 II.13′
    eq1?(f) → c(mk-eq2(a)),                                              II.14′
    eq2?(f) → c(equal(arg1(f), a)) ).                                    II.15′

In lines II.9 and II.12, the branches evaluate expressions that are serious themselves
but contain no serious operands. By themselves, these expressions are permissible, but
they must not be used as operands in applications of the continuation. The solution is
straightforward; instead of applying the continuation c to the result of eval, we pass c as an
argument to eval, i.e., we “instruct” eval to apply c before returning its result:
letrec?(r) → eval(body(r), mk-rec(r, e), c)                              II.9′
   . . .
closr?(f) →                                                              II.11′
    eval(body(lam(f)), mk-simp(fp(lam(f)), a, en(f)), c).                II.12′

The most complex part of our transformation occurs in the branch of eval that evaluates
application expressions in line II.5. Here we must perform four serious operations:

1. Evaluate the operator.

2. Evaluate the operand.

3. Apply the value of the operator to the value of the operand.

4. Apply the continuation c to the result of (3).

Moreover, we must specify explicitly that these operations are to be done in the above order.
This will insure that the defined language uses call by value, and also that the subexpressions
of an application expression are evaluated from left to right (operator before operand).
The solution is to call eval to perform operation (1), to give this call of eval a continuation
that will call eval to perform operation (2), to give the second call of eval a continuation that
will call apply to perform (3), and to give apply a continuation (the original continuation c)
that will perform (4). Thus we have:
appl?(r) → eval(opr(r), e, λf. eval(opnd(r), e, λa. apply(f, a, c))).    II.5′

A similar approach handles the branch that evaluates conditional expressions in lines II.7
and 8. Here there are three serious operations to be performed successively:

1. Evaluate the premiss.

2. Evaluate the conclusion or the alternative, depending on the result of (1).

3. Apply the continuation c to the result of (2).

The transformed branch is:


cond?(r) → eval(prem(r), e,                                              II.7′
             λb. if b then eval(conc(r), e, c) else eval(altr(r), e, c)).  II.8′

Combining the scattered pieces of our transformed interpreter, we have:

interpret = λr. eval(r, mk-init(), λa. a)                                II.1′
eval = λ(r, e, c).                                                       II.2′
  ( const?(r) → c(evcon(r)),                                             II.3′
    var?(r) → c(get(e, r)),                                              II.4′
    appl?(r) → eval(opr(r), e, λf. eval(opnd(r), e, λa. apply(f, a, c))), II.5′
    lambda?(r) → c(mk-closr(r, e)),                                      II.6′
    cond?(r) → eval(prem(r), e,                                          II.7′
                 λb. if b then eval(conc(r), e, c) else eval(altr(r), e, c)),  II.8′
    letrec?(r) → eval(body(r), mk-rec(r, e), c) )                        II.9′
apply = λ(f, a, c).                                                      II.10′
  ( closr?(f) →                                                          II.11′
      eval(body(lam(f)), mk-simp(fp(lam(f)), a, en(f)), c),              II.12′
    sc?(f) → c(succ(a)),                                                 II.13′
    eq1?(f) → c(mk-eq2(a)),                                              II.14′
    eq2?(f) → c(equal(arg1(f), a)) )                                     II.15′
get = same as in Interpreter II.                                         II.16–20

At this stage, since continuations are functional arguments, we have achieved order-of-
application independence at the price of re-introducing higher-order functions. Fortunately,
we can now “defunctionalize” the set CONT in the same way as FUNVAL and ENV. To
interpret the new members of CONT we introduce a function cont such that if cnew represents
the continuation cold and a is a member of VAL then cont(cnew , a) = cold (a). The
application of cont must be introduced at each point in eval and apply where a continuation
is applied to a value, i.e., in lines II.3′, 4′, 6′, 13′, 14′, and 15′.
There are four lambda expressions, indicated by solid underlining, that create continu-
ations. The following table gives their locations and global variables, and the equations
defining the new sets of records that will represent their values:

Location         Global Variables    New Record Equation

II.1′            none                FIN = [ ]
II.5′ (outer)    r, e, c             EVOPN = [ap: APPL, en: ENV, next: CONT]
II.5′ (inner)    f, c                APFUN = [fun: VAL, next: CONT]
II.8′            r, e, c             BRANCH = [cn: COND, en: ENV, next: CONT]

By replacing these lambda expressions by record-creation operations and moving their


bodies into the new function cont (within let expressions that rebind their formal parameters
and global variables appropriately), we obtain a third interpreter, which is independent of
order-of-application and does not use higher-order functions:

CONT = FIN ∪ EVOPN ∪ APFUN ∪ BRANCH
FIN = [ ]
EVOPN = [ap: APPL, en: ENV, next: CONT]
APFUN = [fun: VAL, next: CONT]
BRANCH = [cn: COND, en: ENV, next: CONT]
FUNVAL, ENV, etc. = same as in Interpreter II.

interpret = λr. eval(r, mk-init(), mk-fin())
eval = λ(r, e, c).
  ( const?(r) → cont(c, evcon(r)),
    var?(r) → cont(c, get(e, r)),
    appl?(r) → eval(opr(r), e, mk-evopn(r, e, c)),
    lambda?(r) → cont(c, mk-closr(r, e)),
    cond?(r) → eval(prem(r), e, mk-branch(r, e, c)),
    letrec?(r) → eval(body(r), mk-rec(r, e), c) )                        III
apply = λ(f, a, c).
  ( closr?(f) →
      eval(body(lam(f)), mk-simp(fp(lam(f)), a, en(f)), c),
    sc?(f) → cont(c, succ(a)),
    eq1?(f) → cont(c, mk-eq2(a)),
    eq2?(f) → cont(c, equal(arg1(f), a)) )
cont = λ(c, a).
  ( fin?(c) → a,
    evopn?(c) → let f = a and r = ap(c) and e = en(c) and c = next(c)
                in eval(opnd(r), e, mk-apfun(f, c)),
    apfun?(c) → let f = fun(c) and c = next(c) in apply(f, a, c),
    branch?(c) → let b = a and r = cn(c) and e = en(c) and c = next(c)
                 in if b then eval(conc(r), e, c) else eval(altr(r), e, c) )
get = same as in Interpreter II.

From their abstract syntax, it is evident that continuations in our third interpreter are linear
lists, with the unique record in FIN acting as the empty list, and the next fields in the other
records acting as link fields. In effect, a continuation is a list of instructions to be interpreted
by the function cont. Each instruction accepts a “current value” (the second argument of
cont) and produces a new value that will be given to the next instruction. The following list
gives approximate meanings for each type of instruction:
FIN: The current value is the final value of the program. Halt.

EVOPN: The current value is the value of an operator. Evaluate the operand of the appli-
cation expression in the ap field, using the environment in the en field. Then obtain a
new value by applying the current value to the value of the operand.

APFUN: The current value is the value of an operand. Obtain a new value by applying the
function stored in the fun field to the current value.
BRANCH: The current value is the value of a premiss. If it is true (false) obtain a new
value by evaluating the conclusion (alternative) of the conditional expression stored in
the cn field, using the environment in the en field.
Each of the three serious functions, eval, apply, and cont, does a branch on the form of
its first argument, performs trivial operations such as field selection, record creation, and
environment lookup, and then calls another serious function. Thus our third interpreter
is actually a state-transition machine, whose states each consist of the name of a serious
function plus a list of its arguments.
This interpreter is similar in style to Landin’s SECD machine [7], though there is consid-
erable difference in detailed mechanisms. (Very roughly, one can construct the continuation
by merging Landin’s stack and control and concatenating this merged stack with the dump.)
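The flavor of this state-transition machine can be captured in a short OCaml sketch (ours, with a syntax cut down to constants, variables, applications, and lambda expressions; for brevity the EVOPN instruction stores the operand expression directly rather than the whole APPL record):

(* Defunctionalized continuations: a continuation is a linear list of
   instructions, interpreted by cont. Every call below is a tail call,
   so eval/apply/cont form a state-transition machine. *)
type exp = Const of int | Var of string
         | Appl of exp * exp | Lambda of string * exp
type value = Int of int | Fn of funval
and funval = Closr of string * exp * env | Sc
and env = (string * value) list
and cont =
  | Fin                            (* the initial (identity) continuation *)
  | Evopn of exp * env * cont      (* next: evaluate the operand *)
  | Apfun of value * cont          (* next: apply the operator's value *)

let rec eval (r, e, c) = match r with
  | Const n -> cont (c, Int n)
  | Var x -> cont (c, (try List.assoc x e with Not_found ->
                         if x = "succ" then Fn Sc else failwith "unbound"))
  | Appl (opr, opnd) -> eval (opr, e, Evopn (opnd, e, c))
  | Lambda (fp, body) -> cont (c, Fn (Closr (fp, body, e)))
and apply (f, a, c) = match f with
  | Closr (fp, body, en) -> eval (body, (fp, a) :: en, c)
  | Sc -> (match a with Int n -> cont (c, Int (n + 1)) | _ -> failwith "succ")
and cont (c, a) = match c with
  | Fin -> a
  | Evopn (opnd, e, c') -> eval (opnd, e, Apfun (a, c'))
  | Apfun (f, c') ->
      (match f with Fn f -> apply (f, a, c') | _ -> failwith "not a function")

let interpret r = eval (r, [], Fin)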

8. Continuations with Higher-Order Functions

In transforming Interpreter I into Interpreter III, we have moved from a concise, abstract
definition to a more complex machine-like one. If clarity consists of the avoidance of
subtle characteristics of the defining language, then Interpreter III is certainly clearer than
Interpreter I. But if clarity consists of conciseness and the absence of unnecessary com-
plexity, then the reverse is true. The machine-like character of Interpreter III includes a
variety of “cogs and wheels” that are quite arbitrary, i.e., one can easily construct equivalent
interpreters (such as the SECD machine) with different cogs and wheels.
In fact, these “cogs and wheels” were introduced when we defunctionalized the sets
FUNVAL, ENV, and CONT, since we replaced the functions in these sets by representations
that were correct, but not unique. Had we chosen different representations, we would have
obtained an equivalent but quite different interpreter.
This suggests the desirability of retaining the use of higher-order functions, providing
these entities can be given a mathematically rigorous definition that is independent of any

specific representation. Fortunately, such a definition has recently been provided by D.


Scott’s new theory of computation [12, 13, 14, 15], which is based on concepts of lattice
theory and topology. (The central technical problem that Scott has solved is to define
functions that are not only higher-order, but also typeless, so that any function may be
applied to any other function, including itself.) Although a description of this work would
be beyond the scope of this paper, we may summarize its main implication for definitional
interpreters: Scott has developed a mathematical model of the lambda calculus, which is
thereby a model for a purely applicative higher-order defining language. But the defining
language modelled by Scott uses call by name rather than call by value. (In terms of the
lambda calculus, it uses normal order of evaluation.) Thus to apply Scott’s work to a defined
language that uses call by value, we need a definitional interpreter that retains higher-order
functions but is order-of-application independent.
An obvious approach to this goal is to introduce continuations directly into the meta-
circular interpreter. At first sight, this appears to be straightforward. Referring back to
Interpreter I, we see that the function eval is obviously serious, while evlambda, ext and
initenv are trivial. (evlambda is trivial since the evaluation of lambda expressions always
terminates.) Apparently eval is the only function that must accept continuations.
But when we transform the branch of eval that evaluates application expressions, the
construction described in the previous section seems to give:

appl?(r) → eval(opr(r), e, λf. eval(opnd(r), e, λa. c(f(a)))).

Unfortunately, the subexpression c(f (a)) is not independent of the order-of-application,


since the evaluation of the operand f (a) may never terminate, while the function c may be
independent of its argument.
The difficulty is that the class of serious functions must include every potentially nonter-
minating function that may be applied during the execution of the interpreter; in addition
to eval, this class contains the members of the set FUNVAL of defined-language functional
values. Thus we must modify the functions in FUNVAL to accept continuations:

FUNVAL = VAL, CONT → VAL,

replacing each function fold by an fnew such that fnew (a, c) = c(fold (a)). This allows
us to replace the order-dependent expression c(f (a)) by the order-independent expression
f (a, c). Of course, we must add continuations as an extra formal parameter to each lambda
expression that creates a member of FUNVAL.
(A similar modification of the functions in ENV is unnecessary, since it can be shown that
the functions in this set always terminate. Just as with get, this depends on the exclusion of
recursive let expressions with arbitrary declaring subexpressions.)
Once the necessity of altering FUNVAL has been realized, the transformation of Inter-
preter I follows the basic lines described in the previous section. We omit the details and
state the final result:

VAL = INTEGER ∪ BOOLEAN ∪ FUNVAL
FUNVAL = VAL, CONT → VAL
ENV = VAR → VAL
CONT = VAL → VAL

interpret = λr. eval(r, initenv, λa. a)
eval = λ(r, e, c).
  ( const?(r) → c(evcon(r)),
    var?(r) → c(e(r)),
    appl?(r) → eval(opr(r), e, λf. eval(opnd(r), e, λa. f(a, c))),
    lambda?(r) → c(evlambda(r, e)),                                      IV
    cond?(r) → eval(prem(r), e,
                 λb. if b then eval(conc(r), e, c) else eval(altr(r), e, c)),
    letrec?(r) → letrec e′ =
                 λx. if x = dvar(r) then evlambda(dexp(r), e′) else e(x)
                 in eval(body(r), e′, c) )
evlambda = λ(ℓ, e). λ(a, c). eval(body(ℓ), ext(fp(ℓ), a, e), c)
ext = λ(z, a, e). λx. if x = z then a else e(x)
initenv = λx. ( x = “succ” → λ(a, c). c(succ(a)),
                x = “equal” → λ(a, c). c(λ(b, c′). c′(equal(a, b))) ).

This is basically the form of interpreter devised by L. Morris [20] and Wadsworth. It is
almost as concise as the meta-circular interpreter, yet it offers the advantages of order-of-
application independence and, as we will see in the next section, extensibility to accommo-
date imperative control features.
(The zealous reader may wish to verify that defunctionalization and the introduction of
continuations are commutative, i.e., by replacing FUNVAL, ENV, and CONT by appropriate
nonfunctional representations, one can transform Interpreter IV into Interpreter III.)
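In OCaml, the shape of Interpreter IV is as follows (a hedged sketch of ours with the usual cut-down syntax; FUNVAL becomes a function type that accepts a continuation, while environments and continuations remain ordinary functions):

(* Continuations with higher-order functions: defined-language
   functions take a continuation, so the meaning of the interpreter no
   longer depends on when OCaml evaluates operands. *)
type exp = Const of int | Var of string
         | Appl of exp * exp | Lambda of string * exp
type value = Int of int | Fun of (value -> cont -> value)
and cont = value -> value

let rec eval (r, e, c) = match r with
  | Const n -> c (Int n)
  | Var x -> c (e x)
  | Appl (opr, opnd) ->
      eval (opr, e, fun f ->
        eval (opnd, e, fun a ->
          match f with
          | Fun f -> f a c
          | _ -> failwith "operator is not a function"))
  | Lambda (fp, body) ->
      c (Fun (fun a c' -> eval (body, (fun x -> if x = fp then a else e x), c')))

let initenv x =
  if x = "succ" then
    Fun (fun a c -> match a with
         | Int n -> c (Int (n + 1))
         | _ -> failwith "succ applied to a non-integer")
  else failwith ("unbound variable " ^ x)

(* interpret (Appl (Var "succ", Const 2)) = Int 3 *)
let interpret r = eval (r, initenv, fun a -> a)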

9. Escape Expressions

We now turn to the problem of adding imperative features to the defined language (while
keeping the defining language purely applicative). These features may be divided into two
classes:

1. Imperative control mechanisms, e.g., statement sequencing, labels and jumps.

2. Assignment.

We will first introduce control mechanisms and then consider assignment.


At first sight, this order of presentation seems facetious. In a language without assignment,
it seems pointless to jump to a label, since there is no significant way for the part of the
computation before the jump to influence the part afterwards. However, in Reference [29],
Landin introduced an imperative control mechanism that is more general than labels and

jumps, and that significantly enhances the power of a language without assignment. The
specific mechanism that he introduced was called a J-operator, but in this paper we will
develop a slightly simpler mechanism called an escape expression.
If (in the defined language) x is a variable and r is an expression, then

escape x in r

is an escape expression, whose escape variable is x and whose body is r. The evaluation
of an escape expression in an environment e proceeds as follows:

1. The body r is evaluated in the environment that is the extension of e that binds x to a
function called the escape function.

2. If the escape function is never applied during the evaluation of r, then the value of r
becomes the value of the escape expression.

3. If the escape function is applied to an argument a, then the evaluation of the body r is
aborted, and a immediately becomes the value of the escape expression.

Essentially, an escape function is a kind of label, and its application is a kind of jump. The
greater generality lies in the ability to pass arguments while jumping.
(Landin’s J-operator can be defined in terms of the escape expression by regarding let g =
J λx. r1 in r0 as an abbreviation for escape h in let g = λx. h(r1 ) in r0 , where h is
a new variable not occurring in r0 or r1 . Conversely, one can regard escape g in r as an
abbreviation for let g = J λx. x in r.)
In order to extend our interpreters to handle escape expressions, we begin by extending
the abstract syntax of expressions appropriately:

EXP = . . . ∪ ESCP
ESCP = [escv: VAR, body: EXP].

It is evident that in each interpreter we must add a branch to eval that evaluates the new
kind of expression.
First consider Interpreter IV. Since an escape expression is evaluated by evaluating its
body in an extended environment that binds the escape variable to the escape function, and
since the escape function must be represented by a member of the set FUNVAL = VAL,
CONT → VAL, we have
eval = λ(r, e, c).
  ( . . . ,
    escp?(r) → eval(body(r), ext(escv(r), λ(a, c′). . . . , e), c) ),

where the value of λ(a, c′). . . . must be the member of FUNVAL representing the escape
function.

Since eval is a serious function, its result, which is obtained by applying the continuation
c to the value of the escape expression, must be the final result of the entire program being
interpreted. This means that c itself must be a function that will accept the value of the
escape expression and carry out the interpretation of the remainder of the program. But the
member of FUNVAL representing the escape function is also serious, and must therefore
also produce the final result of the entire program. Thus to abort the evaluation of the body
and treat the argument a as the value of the escape expression, it is only necessary for the
escape function to ignore its own continuation c′ and to apply the higher-level continuation c
to a. Thus we have:

eval = λ(r, e, c).
  ( . . . ,
    escp?(r) → eval(body(r), ext(escv(r), λ(a, c′). c(a), e), c) ).
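The crux of this definition, namely that the escape function discards the continuation of its own application and resumes the continuation of the escape expression, fits in a few lines of OCaml (a toy of ours, with plain integers in place of VAL and the rest of the interpreter stripped away):

(* escape body k: body receives the escape function and k is the
   continuation of the whole escape expression; the escape function
   ignores the continuation c' of its own application and applies k
   instead. *)
let escape body k = body (fun a _c' -> k a) k

(* "escape x in succ(x 3)": applying x aborts the pending succ. *)
let result = escape (fun x c -> x 3 (fun v -> c (v + 1))) (fun a -> a)
let _ = assert (result = 3)   (* not 4: the pending succ was discarded *)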

The extension of Interpreter III is essentially similar. In this case, we must add to the set
FUNVAL a new kind of record that represents escape functions:

FUNVAL = . . . ∪ ESCF
ESCF = [cn: CONT].

These records are created in the new branch of eval:

eval = λ(r, e, c).
  ( . . . ,
    escp?(r) → eval(body(r), mk-simp(escv(r), mk-escf(c), e), c) ),

and are interpreted by a new branch of apply:

apply = λ(f, a, c).
  ( . . . ,
    escf?(f) → cont(cn(f), a) ).

From the viewpoint of this interpreter, it is clear that the escape expression is a signif-
icant extension of the defined language, since it introduces the possibility of embedding
continuations in values.
(The reader should be warned that either of the above interpreters is a more precise
definition of the escape expression than the informal English description given beforehand.
For example, it is possible that the evaluation of the body of an escape expression may
not cause the application of the escape function, but may produce the escape function (or
some function that can call the escape function) as its value. It is difficult to infer the
consequences of such a situation from our informal description, but it is precisely defined
by either of the interpreters. In fact, the possibility that an escape function may propagate
outside of the expression that created it is a powerful facility that can be used to construct
control-flow mechanisms such as coroutines and nondeterministic algorithms.)
When we consider Interpreters I and II, we find an entirely different situation. The ability
to “jump” by switching continuations is no longer possible. An escape function must still be
represented by a member of FUNVAL, but now this implies that, if the function terminates
without an error stop, then its result must become the value of the application expression
that applied the function. As far as is known, there is no way to define the escape expression

by adding branches to Interpreter I or II (except by the “cheat” of adding imperative control


mechanisms to the defining language, as in Reference [19]). The essential problem is that
the information that was explicitly available in the continuations of Interpreters III and IV
is implicit in the recursive structure of Interpreters I and II, and in this form it cannot be
manipulated with sufficient flexibility.
We have asserted that the escape mechanism encompasses less general control mecha-
nisms such as labels and jumps. The following description outlines the way in which these
more specialized operations can be expressed in terms of the escape expression. (A more
detailed exposition is given in Reference [29].)

1. In the next section we will introduce assignment in such a way that assignments can
be executed during the evaluation of expressions. In this situation it is unnecessary to
make a semantic distinction between expressions and statements; any statement can be
regarded as an expression whose evaluation produces a dummy value.
2. A label-free sequence of statements s1 ; · · · ; sn can be regarded as an abbreviation for
the expression
(· · ·((λx1. · · · λxn. xn)(s1)) · · ·)(sn).

The effect is to evaluate the statements sequentially from left to right, ignoring the value
of all but the last.
3. If s0 , . . . , sn are label-free statement sequences, and ℓ1 , . . . , ℓn are labels, then a block
of the form

begin s0; ℓ1: s1; · · · ; ℓn: sn end

can be regarded as an abbreviation for

escape g in letrec ℓ1 = λx. g(s1; · · · ; sn) and ℓ2 = λx. g(s2; · · · ; sn)
and · · · and ℓn = λx. g(sn) in (s0; · · · ; sn)

(where g and x are new variables not occurring in the original block). The effect is
that each label denotes a function that ignores its argument, evaluates the appropriate
sequence of statements, and then escapes out of the enclosing block.

4. An expression of the form goto r can be regarded as an abbreviation for r(0), i.e., a
jump to a label becomes an application of the function denoted by the label to a dummy
argument.

10. Assignment

Although the basic concept of assignment is well understood by any competent programmer,
a surprising degree of care is needed to combine this concept with the language features
we have discussed previously. Intuitively, the notion of assignment presupposes that the

operations that are performed during the evaluation of a program will occur in a definite
temporal order. Some of these operations will assign values to “variables”. Other operations
may be affected by these assignments; specifically, an operation may depend upon the value
most recently assigned to each “variable”, which we will call the value currently possessed
by the “variable”.
This suggests that for each instant during program execution, there should be an entity
which specifies the set of “variables” that are present and the values that they currently
possess. We will call such an entity a memory, and denote the set of possible memories by
MEM.
The main subtlety is to realize that the “variables” discussed here are distinct from the
variables used in previous sections. This is necessitated by the fact that most programming
languages permit situations (such as might arise from the use of “call by address”) in which
several variables denote the same “variable”, in the sense that assignment to one of them
will change the value possessed by all. This suggests that a “variable” is actually a new
kind of object to which a variable can be bound. Henceforth, we will call these new objects
references rather than “variables”. (Other terms used commonly in the literature are L-value
and name.) We will denote the set of references by REF.
Abstractly, the nature of references and memories can be characterized by specifying an
initial memory and four functions:

initmem: Contains no references.

nextref (m): Produces a reference not contained in the memory m.

augment(m, a): Produces a memory containing the new reference nextref (m) plus the
references already in m. The new reference possesses the value a, while the remaining
references possess the same values as in m.

update(m, rf , a): Produces a memory containing the same references as m. The refer-
ence rf (assuming it is present) possesses the value a, while the remaining references
possess the same value as in m.

lookup(m, rf ): Produces the value possessed by the reference rf in memory m.

A simple “implementation” can be obtained by numbering references in the order of their


creation [25]:

REF = [number: INTEGER]
MEM = [count: INTEGER, possess: INTEGER → VAL]
initmem = mk-mem(0, λn. 0)
nextref = λm. mk-ref(count(m) + 1)
augment = λ(m, a). mk-mem(count(m) + 1,
            λn. if n = count(m) + 1 then a else (possess(m))(n))
update = λ(m, rf, a). mk-mem(count(m),
            λn. if n = number(rf) then a else (possess(m))(n))
lookup = λ(m, rf). (possess(m))(number(rf)).
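Transcribed into OCaml, this implementation is a purely functional store (a sketch of ours; the record type is named ref_ because ref is already taken in OCaml, and VAL is narrowed to the integers), followed by a small hand-threaded check:

(* Numbered references: a memory is a count plus a possession function. *)
type ref_ = { number : int }
type mem = { count : int; possess : int -> int }

let initmem = { count = 0; possess = (fun _ -> 0) }
let nextref m = { number = m.count + 1 }
let augment (m, a) =
  { count = m.count + 1;
    possess = (fun n -> if n = m.count + 1 then a else m.possess n) }
let update (m, rf, a) =
  { m with possess = (fun n -> if n = rf.number then a else m.possess n) }
let lookup (m, rf) = m.possess rf.number

(* ref(7) followed by (set rf)(9) followed by val(rf), threading the
   memory by hand as the interpreter's continuations would: *)
let m0 = initmem
let rf = nextref m0
let m1 = augment (m0, 7)
let m2 = update (m1, rf, 9)
let _ = assert (lookup (m2, rf) = 9)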

Our next task is to introduce memories into our interpreters. Although any of our inter-
preters could be so extended, we will only consider Interpreter IV.
It is evident that the operation of evaluating a defined-language expression will now
depend upon a memory m and will produce a (possibly) altered memory m0 . Thus the
function eval will accept m as an additional argument. However, because of the use of
continuations, m0 will not be part of the result of eval. Instead, m0 will be passed on as an
additional argument to the continuation that is applied by eval to perform the remainder of
program execution.
In a similar manner, the application of a defined-language function will depend upon and
produce memories. Thus each function in the set FUNVAL will accept a memory as an
additional argument, and will also pass on a memory to its continuation.
On the other hand, there are particular kinds of expressions, specifically constants, vari-
ables, and lambda expressions, whose evaluation cannot cause assignments. For this reason,
the functions evcon and evlambda, and the functions in the set ENV, will not accept or pro-
duce memories.
These considerations lead to the following interpreter, in which memories propagate
through the various operations in a manner that correctly reflects the temporal order of
execution:

VAL = INTEGER ∪ BOOLEAN ∪ FUNVAL
FUNVAL = VAL, MEM, CONT → VAL
ENV = VAR → VAL
CONT = MEM, VAL → VAL

interpret = λr. eval(r, initenv, initmem, λ(m, a). a)
eval = λ(r, e, m, c).
  ( const?(r) → c(m, evcon(r)),
    var?(r) → c(m, e(r)),
    appl?(r) → eval(opr(r), e, m,
                 λ(m′, f). eval(opnd(r), e, m′,
                   λ(m″, a). f(a, m″, c))),
    lambda?(r) → c(m, evlambda(r, e)),
    cond?(r) → eval(prem(r), e, m,
                 λ(m′, b). if b then eval(conc(r), e, m′, c)
                           else eval(altr(r), e, m′, c)),
    letrec?(r) → letrec e′ =
                 λx. if x = dvar(r) then evlambda(dexp(r), e′) else e(x)
                 in eval(body(r), e′, m, c),
    escp?(r) → eval(body(r), ext(escv(r), λ(a, m′, c′). c(m′, a), e), m, c) )
evlambda = λ(ℓ, e). λ(a, m, c). eval(body(ℓ), ext(fp(ℓ), a, e), m, c)
ext = λ(z, a, e). λx. if x = z then a else e(x)
initenv = λx. ( x = “succ” → λ(a, m, c). c(m, succ(a)),
                x = “equal” → λ(a, m, c). c(m, λ(b, m′, c′). c′(m′, equal(a, b))) ).

At this stage, although we have “threaded” memories through the operations of our
interpreter, we have not yet introduced references, nor any operations that alter or depend
upon memories. To proceed further, however, we must distinguish between two approaches
to assignment, each of which characterizes certain programming languages.
In the “L-value” approach, in each context of the evaluation process where a value would
occur, a reference (i.e., L-value) possessing that value occurs instead. Thus, for example,
expressions evaluate to references, functional arguments and results are references, and
environments bind variables to references. (In richer languages, references would occur
instead of values in still other contexts, such as array elements.) This approach is used in the
languages PAL [3] and ISWIM [2], and in somewhat modified form (i.e., references always
occur in certain kinds of contexts, while values always occur in others) in such languages
as FORTRAN, ALGOL 60, and PL/I. Its formalization is due to Strachey [30], and is used
extensively in the Vienna definition of PL/I [18].
In the “reference” approach, references are introduced as a new kind of value, so that
either references or “normal” values can occur in any meaningful context. This approach
is used in ALGOL 68 [31], BASEL [32] and GEDANKEN [4].
The relative merits of these approaches are discussed briefly in Reference [4]. Although
either approach can be accommodated by the various styles of interpreter discussed in
this paper, we will limit ourselves to incorporating the reference approach into the above
extension of Interpreter IV. We first augment the set of values appropriately:
VAL = INTEGER ∪ BOOLEAN ∪ FUNVAL ∪ REF.
Next we introduce basic operations for creating, assigning, and evaluating references.
For simplicity, we will make these operations basic functions, denoted by the predefined
variables ref, set, and val. The following is an informal description:
ref (a): Accepts a value a and returns a new reference initialized to possess a.
(set(rf ))(a): Accepts a reference rf and a value a. The value a is assigned to rf and also
returned as the result. (Because of our restriction to functions of a single argument, this
function is Curried, i.e., set accepts rf and returns a function that accepts a.)
val(rf ): Accepts a reference rf and returns its currently possessed value.
To introduce these new functions into our interpreter, we extend the initial environment
as follows:
initenv = λx. · · ·
    x = “ref” → λ(a, m, c). c(augment(m, a), nextref(m)),
    x = “set” → λ(rf, m, c). c(m, λ(a, m′, c′). c′(update(m′, rf, a), a)),
    x = “val” → λ(rf, m, c). c(m, lookup(m, rf)).

The main shortcoming of the reference approach is the incessant necessity of using the
function val. This problem can be alleviated by introducing coercion conventions, as
discussed in Reference [4], that cause references to be replaced by their possessed values
in appropriate contexts. However, since these conventions can be treated as abbreviations,
they do not affect the basic structure of the definitional interpreters.
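To make the threading of memories concrete, the following is a runnable sketch of this extension in OCaml, building on the memory sketch given after the definitions of augment, update and lookup above. It is restricted to constants, variables, lambda expressions and applications; the conditional, letrec and escape cases thread memories in exactly the same way. The concrete syntax type exp and the error cases are our own assumptions, not part of the original interpreter.

    type exp =
      | Const of int
      | Var of string
      | Lambda of string * exp
      | Appl of exp * exp

    type value =
      | Int of int
      | Bool of bool
      | Ref of int                                (* references are values *)
      | Fun of (value -> value mem -> cont -> value)
    and cont = value mem -> value -> value        (* CONT = MEM, VAL -> VAL *)

    let ext (z, a, e) = fun x -> if x = z then a else e x

    (* eval threads the memory m through each operation and hands the
       (possibly altered) memory on to the continuation c. *)
    let rec eval (r, e, m, c) =
      match r with
      | Const k -> c m (Int k)
      | Var x -> c m (e x)
      | Lambda (x, body) ->
          c m (Fun (fun a m' c' -> eval (body, ext (x, a, e), m', c')))
      | Appl (opr, opnd) ->
          eval (opr, e, m, fun m' f ->
            eval (opnd, e, m', fun m'' a ->
              match f with
              | Fun g -> g a m'' c
              | _ -> failwith "operator is not a function"))

    (* ref, set and val as predefined variables; set is Curried, as in
       the text.  nextref names the reference augment is about to create. *)
    let initenv = function
      | "ref" -> Fun (fun a m c -> c (augment (m, a)) (Ref (nextref m)))
      | "set" ->
          Fun (fun rfv m c ->
            match rfv with
            | Ref rf -> c m (Fun (fun a m' c' -> c' (update (m', rf, a)) a))
            | _ -> failwith "set expects a reference")
      | "val" ->
          Fun (fun rfv m c ->
            match rfv with
            | Ref rf -> c m (lookup (m, rf))
            | _ -> failwith "val expects a reference")
      | x -> failwith ("unbound variable " ^ x)

    let interpret r = eval (r, initenv, initmem, fun _ a -> a)

    (* e.g. interpret (Appl (Var "val", Appl (Var "ref", Const 3)))
       evaluates to Int 3, the memory created by ref being threaded
       invisibly through the continuations. *)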
11. Directions of Future Research

Within this paper we have tried to present a systematic, self-contained, and reasonably
complete description of the current state of the art of definitional interpreters. We conclude
with a brief (and hopeful) list of possible future developments:
1. It would still be very desirable to be able to define higher-order languages logically rather
than interpretively, particularly if such an approach can lead to practical correctness
proofs for programs. A major step in this direction, based on the work of Scott [12, 13,
14, 15], has been taken by R. Milner [16]. However, Milner’s work essentially treats a
language using call by name rather than call by value.
2. It should be possible to treat languages with multiprocessing features, or other features
that involve “controlled ambiguity”. An initial step is the work of the IBM Vienna
Laboratory [18], using a nondeterministic state-transition machine.
3. It should also be possible to define languages, such as ALGOL 68 [31], with a highly
refined syntactic type structure. Ideally, such a treatment should be meta-circular, in
the sense that the type structure used in the defined language should be adequate for the
defining language.
4. The conciseness of definitional interpreters makes them powerful tools for language
design, particularly when one wishes to add new capabilities to a language with a
minimum of increased complexity. Of particular interest (at least to the author) are the
problems of devising better type systems and of generalizing assignment (for example,
by permitting memories to be embedded in values).

References

1. McCarthy, John. Recursive functions of symbolic expressions and their computation by machine, part I.
Communications of the ACM, 3(4):184–195, April 1960.
2. Landin, Peter J. The next 700 programming languages. Communications of the ACM, 9(3):157–166, March
1966.
3. Evans, Jr., Arthur. PAL – A language designed for teaching programming linguistics. In Proceedings of
23rd National ACM Conference, pages 395–403. Brandon/Systems Press, Princeton, New Jersey, 1968.
4. Reynolds, John C. GEDANKEN – A simple typeless language based on the principle of completeness and
the reference concept. Communications of the ACM, 13(5):308–319, May 1970.
5. Church, Alonzo. The Calculi of Lambda-Conversion, volume 6 of Annals of Mathematics Studies. Princeton
University Press, Princeton, New Jersey, 1941.
6. Curry, Haskell Brookes and Feys, Robert. Combinatory Logic, Volume 1. Studies in Logic and the Founda-
tions of Mathematics. North-Holland, Amsterdam, 1958. Second printing 1968.
7. Landin, Peter J. A λ-calculus approach. In Leslie Fox, editor, Advances in Programming and Non-Numerical
Computation: Proceedings of A Summer School, pages 97–141. Oxford University Computing Laboratory
and Delegacy for Extra-Mural Studies, Pergamon Press, Oxford, England, 1966.
8. Floyd, Robert W. Assigning meanings to programs. In J. T. Schwartz, editor, Mathematical Aspects of
Computer Science, volume 19 of Proceedings of Symposia in Applied Mathematics, pages 19–32, New York
City, April 5–7, 1966. American Mathematical Society, Providence, Rhode Island, 1967.
9. Manna, Zohar. The correctness of programs. Journal of Computer and System Sciences, 3(2):119–127,
May 1969.
10. Hoare, C. A. R. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576–
580 and 583, October 1969. Reprinted in [11].
11. Gries, David, editor. Programming Methodology. Springer-Verlag, New York, 1978.
12. Scott, Dana S. Outline of a mathematical theory of computation. Technical Monograph PRG–2, Program-
ming Research Group, Oxford University Computing Laboratory, Oxford, England, November 1970. A
preliminary version appeared in Proceedings of the Fourth Annual Princeton Conference on Information
Sciences and Systems (1970), 169–176.
13. Scott, Dana S. Lattice theory, data types and semantics. In Randell Rustin, editor, Formal Semantics of
Programming Languages: Courant Computer Science Symposium 2, pages 65–106, New York University,
New York, September 14–16, 1970. Prentice-Hall, Englewood Cliffs, New Jersey, 1972.
14. Scott, Dana S. Models for various type-free calculi. In Patrick Suppes, Leon Henkin, Athanase Joja,
and Gr. C. Moisil, editors, Logic, Methodology and Philosophy of Science IV: Proceedings of the Fourth
International Congress, volume 74 of Studies in Logic and the Foundations of Mathematics, pages 157–187,
Bucharest, Romania, August 29–September 4, 1971. North-Holland, Amsterdam, 1973.
15. Scott, Dana S. Continuous lattices. In F. William Lawvere, editor, Toposes, Algebraic Geometry and Logic,
volume 274 of Lecture Notes in Mathematics, Dalhousie University, Halifax, Nova Scotia, January 16–19,
1971. Springer-Verlag, Berlin, 1972.
16. Milner, Robin. Implementation and applications of Scott’s logic for computable functions. In Proceedings of
an ACM Conference on Proving Assertions about Programs, pages 1–6, Las Cruces, New Mexico, January
6–7, 1972. ACM, New York. SIGPLAN Notices Volume 7, Number 1 and SIGACT News, Number 14.
17. Burstall, Rodney M. Formal description of program structure and semantics in first order logic. In Bernard
Meltzer and Donald Michie, editors, Machine Intelligence 5, pages 79–98. Edinburgh University Press,
Edinburgh, Scotland, 1969.
18. Lucas, Peter, Lauer, Peter E., and Stigleitner, H. Method and notation for the formal definition of program-
ming languages. Technical Report TR 25.087, IBM Laboratory Vienna, June 28, 1968. Revised July 1,
1970.
19. Reynolds, John C. GEDANKEN – a simple typeless language which permits functional data structures and
coroutines. Report ANL–7621, Applied Mathematics Division, Argonne National Laboratory, Argonne,
Illinois, September 1969.
20. Morris, F. Lockwood. The next 700 formal language descriptions. Lisp and Symbolic Computation, 6(3–
4):249–257, November 1993. Original manuscript dated November 1970.
21. de Bakker, Jaco W. Semantics of programming languages. In Julius T. Tou, editor, Advances in Information
Systems Science, volume 2, chapter 3, pages 173–227. Plenum Press, New York, 1969.
22. Park, David M. R. Fixpoint induction and proofs of program properties. In Bernard Meltzer and Donald
Michie, editors, Machine Intelligence 5, pages 59–78. Edinburgh University Press, Edinburgh, 1969.
23. Feldman, Jerome and Gries, David. Translator writing systems. Communications of the ACM, 11(2):77–113,
February 1968.
24. McCarthy, John. Towards a mathematical science of computation. In Cicely M. Popplewell, editor, Infor-
mation Processing 62: Proceedings of IFIP Congress 1962, pages 21–28, Munich, August 27–September
1, 1962. North-Holland, Amsterdam, 1963.
25. Wozencraft, John M. and Evans, Jr., Arthur. Notes on programming linguistics. Technical report, Department
of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, February
1971.
26. van Wijngaarden, Adriaan. Recursive definition of syntax and semantics. In T. B. Steel, Jr., editor,
Formal Language Description Languages for Computer Programming: Proceedings of the IFIP Working
Conference on Formal Language Description Languages, pages 13–24, Baden bei Wien, Austria, September
15–18, 1964. North-Holland, Amsterdam, 1966.
27. Morris, Jr., James H. A bonus from van Wijngaarden’s device. Communications of the ACM, 15(8):773,
August 1972.
28. Fischer, Michael J. Lambda calculus schemata. In Proceedings of an ACM Conference on Proving Assertions
about Programs, pages 104–109, Las Cruces, New Mexico, January 6–7, 1972. ACM, New York.
29. Landin, Peter J. A correspondence between ALGOL 60 and Church’s lambda-notation. Communications
of the ACM, 8(2–3):89–101, 158–165, February–March 1965.
30. Barron, D.W., Buxton, John N., Hartley, D.F., Nixon, E., and Strachey, Christopher. The main features of
CPL. The Computer Journal, 6:134–143, July 1963.
31. van Wijngaarden, Adriaan, Mailloux, B.J., Peck, J.E.L., and Koster, C.H.A. Report on the algorithmic
language ALGOL 68. Numerische Mathematik, 14(2):79–218, 1969.
32. Cheatham, Jr., T.E., Fischer, Alice, and Jorrand, P. On the basis for ELF – an extensible language facility.
In 1968 Fall Joint Computer Conference, volume 33, Part Two of AFIPS Conference Proceedings, pages
937–948, San Francisco, December 9–11, 1968. Thompson Book Company, Washington, D.C.
Higher-Order and Symbolic Computation, 13, 11–49, 2000
© 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Fundamental Concepts in Programming Languages


CHRISTOPHER STRACHEY
Reader in Computation at Oxford University, Programming Research Group, 45 Banbury Road, Oxford, UK

Abstract. This paper forms the substance of a course of lectures given at the International Summer School in
Computer Programming at Copenhagen in August, 1967. The lectures were originally given from notes and the
paper was written after the course was finished. In spite of this, and only partly because of the shortage of time, the
paper still retains many of the shortcomings of a lecture course. The chief of these are an uncertainty of aim—it is
never quite clear what sort of audience there will be for such lectures—and an associated switching from formal
to informal modes of presentation which may well be less acceptable in print than it is in the lecture room.
For these (and other) faults, I apologise to the reader.
There are numerous references throughout the course to CPL [1–3]. This is a programming language which has
been under development since 1962 at Cambridge and London and Oxford. It has served as a vehicle for research
into both programming languages and the design of compilers. Partial implementations exist at Cambridge and
London. The language is still evolving so that there is no definitive manual available yet. We hope to reach another
resting point in its evolution quite soon and to produce a compiler and reference manuals for this version. The
compiler will probably be written in such a way that it is relatively easy to transfer it to another machine, and in
the first instance we hope to establish it on three or four machines more or less at the same time.
The lack of a precise formulation for CPL should not cause much difficulty in this course, as we are primarily
concerned with the ideas and concepts involved rather than with their precise representation in a programming
language.

Keywords: programming languages, semantics, foundations of computing, CPL, L-values, R-values,
parameter passing, variable binding, functions as data, parametric polymorphism, ad hoc polymorphism,
binding mechanisms, type completeness

1. Preliminaries

1.1. Introduction

Any discussion on the foundations of computing runs into severe problems right at the
start. The difficulty is that although we all use words such as ‘name’, ‘value’, ‘program’,
‘expression’ or ‘command’ which we think we understand, it often turns out on closer
investigation that in point of fact we all mean different things by these words, so that com-
munication is at best precarious. These misunderstandings arise in at least two ways. The
first is straightforwardly incorrect or muddled thinking. An investigation of the meanings
of these basic terms is undoubtedly an exercise in mathematical logic and neither to the taste
nor within the field of competence of many people who work on programming languages.
As a result the practice and development of programming languages has outrun our ability
to fit them into a secure mathematical framework so that they have to be described in ad
hoc ways. Because these start from various points they often use conflicting and sometimes
also inconsistent interpretations of the same basic terms.
A second and more subtle reason for misunderstandings is the existence of profound
differences in philosophical outlook between mathematicians. This is not the place to
discuss this issue at length, nor am I the right person to do it. I have found, however, that
these differences affect both the motivation and the methodology of any investigation like
this to such an extent as to make it virtually incomprehensible without some preliminary
warning. In the rest of the section, therefore, I shall try to outline my position and describe
the way in which I think the mathematical problems of programming languages should be
tackled. Readers who are not interested can safely skip to Section 2.

1.2. Philosophical considerations

The important philosophical difference is between those mathematicians who will not allow
the existence of an object until they have a construction rule for it, and those who admit the
existence of a wider range of objects including some for which there are no construction
rules. (The precise definition of these terms is of no importance here as the difference is
really one of psychological approach and survives any minor tinkering.) This may not seem
to be a very large difference, but it does lead to a completely different outlook and approach
to the methods of attacking the problems of programming languages.
The advantages of rigour lie, not surprisingly, almost wholly with those who require
construction rules. Owing to the care they take not to introduce undefined terms, the
better examples of the work of this school are models of exact mathematical reasoning.
Unfortunately, but also not surprisingly, their emphasis on construction rules leads them to
an intense concern for the way in which things are written—i.e., for their representation,
generally as strings of symbols on paper—and this in turn seems to lead to a preoccupation
with the problems of syntax. By now the connection with programming languages as we
know them has become tenuous, and it generally becomes more so as they get deeper into
syntactical questions. Faced with the situation as it exists today, where there is a generally
known method of describing a certain class of grammars (known as BNF or context-free),
the first instinct of these mathematicians seems to be to investigate the limits of BNF—what
can you express in BNF even at the cost of very cumbersome and artificial constructions?
This may be a question of some mathematical interest (whatever that means), but it has
very little relevance to programming languages where it is more important to discover
better methods of describing the syntax than BNF (which is already both inconvenient and
inadequate for ALGOL) than it is to examine the possible limits of what we already know to
be an unsatisfactory technique.
This is probably an unfair criticism, for, as will become clear later, I am not only tem-
peramentally a Platonist and prone to talking about abstracts if I think they throw light on a
discussion, but I also regard syntactical problems as essentially irrelevant to programming
languages at their present stage of development. In a rough and ready sort of way it seems
to me fair to think of the semantics as being what we want to say and the syntax as how
we have to say it. In these terms the urgent task in programming languages is to explore
the field of semantic possibilities. When we have discovered the main outlines and the
principal peaks we can set about devising a suitably neat and satisfactory notation for them,
and this is the moment for syntactic questions.
But first we must try to get a better understanding of the processes of computing and
their description in programming languages. In computing we have what I believe to be a
new field of mathematics which is at least as important as that opened up by the discovery
(or should it be invention?) of calculus. We are still intellectually at the stage that calculus
was at when it was called the ‘Method of Fluxions’ and everyone was arguing about how
big a differential was. We need to develop our insight into computing processes and to
recognise and isolate the central concepts—things analogous to the concepts of continuity
and convergence in analysis. To do this we must become familiar with them and give them
names even before we are really satisfied that we have described them precisely. If we
attempt to formalise our ideas before we have really sorted out the important concepts the
result, though possibly rigorous, is of very little value—indeed it may well do more harm
than good by making it harder to discover the really important concepts. Our motto should
be ‘No axiomatisation without insight’.
However, it is equally important to avoid the opposite danger of perpetual vagueness. My own
view is that the best way to do this in a rapidly developing field such as computing, is to be
extremely careful in our choice of terms for new concepts. If we use words such as ‘name’,
‘address’, ‘value’ or ‘set’ which already have meanings with complicated associations and
overtones either in ordinary usage or in mathematics, we run into the danger that these
associations or overtones may influence us unconsciously to misuse our new terms—either
in context or meaning. For this reason I think we should try to give a new concept a neutral
name at any rate to start with. The number of new concepts required may ultimately be
quite large, but most of these will be constructs which can be defined with considerable
precision in terms of a much smaller number of more basic ones. This intermediate form of
definition should always be made as precise as possible although the rigorous description
of the basic concepts in terms of more elementary ideas may not yet be available. Who
when defining the eigenvalues of a matrix is concerned with tracing the definition back to
Peano’s axioms?
Not very much of this will show up in the rest of this course. The reason for this is partly
that it is easier, with the aid of hindsight, to preach than to practice what you preach. In part,
however, the reason is that my aim is not to give an historical account of how we reached
the present position but to try to convey what the position is. For this reason I have often
preferred a somewhat informal approach even when mere formality would in fact have been
easy.

2. Basic concepts

2.1. Assignment commands

One of the characteristic features of computers is that they have a store into which it is
possible to put information and from which it can subsequently be recovered. Furthermore
the act of inserting an item into the store erases whatever was in that particular area of the
store before—in other words the process is one of overwriting. This leads to the assignment
command which is a prominent feature of most programming languages.
The simplest forms of assignments such as

x := 3
x := y + 1
x := x + 1

lend themselves to very simple explications. ‘Set x equal to 3’, ‘Set x to be the value of
y plus 1’ or ‘Add one to x’. But this simplicity is deceptive; the examples are themselves
special cases of a more general form and the first explications which come to mind will not
generalise satisfactorily. This situation crops up over and over again in the exploration of a
new field; it is important to resist the temptation to start with a confusingly simple example.
The following assignment commands show this danger.

i := a > b → j, k (See note 1)
A[i] := A[a > b → j, k]
A[a > b → j, k] := A[i]
a > b → j, k := i (See note 2)

All these commands are legal in CPL (and all but the last, apart from minor syntactic
alterations, in ALGOL also). They show an increasing complexity of the expressions written
on the left of the assignment. We are tempted to write them all in the general form

ε1 := ε2

where ε1 and ε2 stand for expressions, and to try as an explication something like ‘evaluate
the two expressions and then do the assignment’. But this clearly will not do, as the meaning
of an expression (and a name or identifier is only a simple case of an expression) on the left
of an assignment is clearly different from its meaning on the right. Roughly speaking an
expression on the left stands for an ‘address’ and one on the right for a ‘value’ which will be
stored there. We shall therefore accept this view and say that there are two values associated
with an expression or identifier. In order to avoid the overtones which go with the word
‘address’ we shall give these two values the neutral names: L-value for the address-like
object appropriate on the left of an assignment, and R-value for the contents-like object
appropriate for the right.

2.2. L-values and R-values

An L-value represents an area of the store of the computer. We call this a location rather than
an address in order to avoid confusion with the normal store-addressing mechanism of the
computer. There is no reason why a location should be exactly one machine-word in size—
the objects discussed in programming languages may be, like complex or multiple precision
numbers, more than one word long, or, like characters, less. Some locations are addressable
(in which case their numerical machine address may be a good representation) but some are
not. Before we can decide what sort of representation a general, non-addressable location
should have, we should consider what properties we require of it.
The two essential features of a location are that it has a content—i.e. an associated
R-value—and that it is in general possible to change this content by a suitable updating
operation. These two operations are sufficient to characterise general locations, which are
consequently sometimes known as ‘Load-Update Pairs’ or LUPs. They will be discussed
again in Section 4.1.
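As an illustration, here is one way to render a Load-Update Pair in OCaml (a modelling choice of ours; the type and names are not Strachey's): a location is characterised solely by its two operations, so nothing requires it to be addressable.

    (* A general location, characterised purely by its two operations. *)
    type 'a location = { load : unit -> 'a; update : 'a -> unit }

    (* A location backed by an ordinary mutable cell. *)
    let cell init =
      let r = ref init in
      { load = (fun () -> !r); update = (fun v -> r := v) }

Any addressable cell yields such a pair at once, but so would, say, a single bit packed into a larger word, which has no machine address of its own.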

2.3. Definitions

In CPL a programmer can introduce a new quantity and give it a value by an initialised
definition such as

let p = 3.5

(In ALGOL this would be done by real p; p := 3.5;). This introduces a new use of the
name p (ALGOL uses the term ‘identifier’ instead of name), and the best way of looking at
this is that the activation of the definition causes a new location not previously used to be
set up as the L-value of p and that the R-value 3.5 is then assigned to this location.
The relationship between a name and its L-value cannot be altered by assignment, and it
is this fact which makes the L-value important. However in both ALGOL and CPL one name
can have several different L-values in different parts of the program. It is the concept of
scope (sometimes called lexicographical scope) which is controlled by the block structure
which allows us to determine at any point which L-value is relevant.
In CPL, but not in ALGOL, it is also possible to have several names with the same L-value.
This is done by using a special form of definition:

let q ' p

which has the effect of giving the name q the same L-value as p (which must already exist).
This feature is generally used when the right side of the definition is a more complicated
expression than a simple name. Thus if M is a matrix, the definition

let x ' M[2,2]

gives x the same L-value as one of the elements of the matrix. It is then said to be sharing
with M[2,2], and an assignment to x will have the same effect as one to M[2,2].
It is worth noting that the expression on the right of this form of definition is evaluated in
the L-mode to get an L-value at the time the definition is obeyed. It is this L-value which
is associated with x. Thus if we have

let i = 2
let x ' M[i,i]
i := 3

the L-value of x will remain that of M[2,2].
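The following OCaml sketch, reusing the location type introduced above and flattening the matrix M to a one-dimensional array for brevity (both our assumptions), shows the L-mode evaluation happening exactly once, when the definition is obeyed.

    let m = [| 10; 20; 30; 40 |]
    let i = ref 2

    (* Build the location for m.(i) now: L-mode evaluation happens here. *)
    let x =
      let idx = !i in     (* i is dereferenced once, at definition time *)
      { load = (fun () -> m.(idx)); update = (fun v -> m.(idx) <- v) }

    let () =
      i := 3;
      x.update 99;        (* still assigns to m.(2), not m.(3) *)
      assert (x.load () = 99 && m.(2) = 99)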


M[i,i] is an example of an anonymous quantity (i.e., an expression rather than a simple
name) which has both an L-value and an R-value. There are other expressions, such as
a+b, which only have R-values. In both cases the expression has no name as such although
it does have either one value or two.

2.4. Names

It is important to be clear about this as a good deal of confusion can be caused by differing
uses of the terms. ALGOL 60 uses ‘identifier’ where we have used ‘name’, and reserves the
word ‘name’ for a wholly different use concerned with the mode of calling parameters for
a procedure. (See Section 3.4.3.) ALGOL X, on the other hand, appears likely to use the
word ‘name’ to mean approximately what we should call an L-value, (and hence something
which is a location or generalised address). The term reference is also used by several
languages to mean (again approximately) an L-value.
It seems to me wiser not to make a distinction between the meaning of ‘name’ and that
of ‘identifier’ and I shall use them interchangeably. The important feature of a name is that
it has no internal structure at any rate in the context in which we are using it as a name.
Names are thus atomic objects and the only thing we know about them is that given two
names it is always possible to determine whether they are equal (i.e., the same name) or not.

2.5. Numerals

We use the word ‘number’ for the abstract object and ‘numeral’ for its written representation.
Thus 24 and XXIV are two different numerals representing the same number. There is
often some confusion about the status of numerals in programming languages. One view
commonly expressed is that numerals are the ‘names of numbers’ which presumably means
that every distinguishable numeral has an appropriate R-value associated with it. This seems
to me an artificial point of view and one which falls foul of Occam’s razor by unnecessarily
multiplying the number of entities (in this case names). This is because it overlooks the
important fact that numerals in general do have an internal structure and are therefore not
atomic in the sense that we said names were in the last section.
An interpretation more in keeping with our general approach is to regard numerals as
R-value expressions written according to special rules. Thus for example the numeral 253
is a syntactic variant for the expression

2 × 10² + 5 × 10 + 3

while the CPL constant 8 253 is a variant of

2 × 8² + 5 × 8 + 3

Local rules for special forms of expression can be regarded as a sort of ‘micro-syntax’ and
form an important feature of programming languages. The micro-syntax is frequently used
in a preliminary ‘pre-processing’ or ‘lexical’ pass of compilers to deal with the recognition
of names, numerals, strings, basic symbols (e.g. boldface words in ALGOL) and similar
objects which are represented in the input stream by strings of symbols in spite of being
atomic inside the language.
With this interpretation the only numerals which are also names are the single digits and
these are, of course, constants with the appropriate R-value.
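A small OCaml sketch (ours) of this view of numerals: a digit string in radix r is merely shorthand for a polynomial in r, evaluated here by Horner's rule.

    let numeral_value radix s =
      String.fold_left
        (fun acc ch -> radix * acc + (Char.code ch - Char.code '0'))
        0 s

    let () =
      assert (numeral_value 10 "253" = 2 * 10 * 10 + 5 * 10 + 3);  (* 253 *)
      assert (numeral_value 8 "253" = 2 * 8 * 8 + 5 * 8 + 3)       (* 171 *)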

2.6. Conceptual model

It is sometimes helpful to have a picture showing the relationships between the various
objects in the programming language, their representations in the store of a computer
and the abstract objects to which they correspond. Figure 1 is an attempt to portray the
conceptual model which is being used in this course.

Figure 1. The conceptual model.


On the left are some of the components of the programming language. Many of these
correspond to either an L-value or an R-value and the correspondence is indicated by an
arrow terminating on the value concerned. Both L-values and R-values are in the idealised
store, a location being represented by a box and its contents by a dot inside it. R-values
without corresponding L-values are represented by dots without boxes, and R-values which
are themselves locations (as, for example, that of a vector) are given arrows which terminate
on another box in the idealised store.
R-values which correspond to numbers are given arrows which terminate in the right
hand part of the diagram which represents the abstract objects with which the program
deals.
The bottom section of the diagram, which is concerned with vectors and vector elements
will be more easily understood after reading the section on compound data structures.
(Section 3.7.)

3. Conceptual constructs

3.1. Expressions and commands

All the first and simplest programming languages—by which I mean machine codes and
assembly languages—consist of strings of commands. When obeyed, each of these causes
the computer to perform some elementary operation such as subtraction, and the more
elaborate results are obtained by using long sequences of commands.
In the rest of mathematics, however, there are generally no commands as such. Expres-
sions using brackets, either written or implied, are used to build up complicated results.
When talking about these expressions we use descriptive phrases such as ‘the sum of x and
y’ or possibly ‘the result of adding x to y’ but never the imperative ‘add x to y’.
As programming languages developed and became more powerful they came under
pressure to allow ordinary mathematical expressions as well as the elementary commands.
It is, after all, much more convenient to write as in CPL, x := a(b+c)+d than the more
elementary

CLA b
ADD c
MPY a
ADD d
STO x

and also, almost equally important, much easier to follow.


To a large extent it is true that the increase in power of programming languages has
corresponded to the increase in the size and complexity of the right hand sides of their
assignment commands for this is the situation in which expressions are most valuable.
In almost all programming languages, however, commands are still used and it is their
inclusion which makes these languages quite different from the rest of mathematics.
There is a danger of confusion between the properties of expressions, not all of which
are familiar, and the additional features introduced by commands, and in particular those
introduced by the assignment command. In order to avoid this as far as possible, the next
section will be concerned with the properties of expressions in the absence of commands.

3.2. Expressions and evaluation

3.2.1. Values. The characteristic feature of an expression is that it has a value. We have
seen that in general in a programming language, an expression may have two values—an
L-value and an R-value. In this section, however, we are considering expressions in the
absence of assignments and in these circumstances L-values are not required. Like the rest
of mathematics, we shall be concerned only with R-values.
One of the most useful properties of expressions is that called by Quine [4] referential
transparency. In essence this means that if we wish to find the value of an expression which
contains a sub-expression, the only thing we need to know about the sub-expression is its
value. Any other features of the sub-expression, such as its internal structure, the number
and nature of its components, the order in which they are evaluated or the colour of the ink
in which they are written, are irrelevant to the value of the main expression.
We are quite familiar with this property of expressions in ordinary mathematics and often
make use of it unconsciously. Thus we expect the expressions

sin(6) sin(1 + 5) sin(30/5)

to have the same value. Note, however, that we cannot replace the symbol string 1+5 by the
symbol 6 in all circumstances as, for example 21 + 52 is not equal to 262. The equivalence
only applies to complete expressions or sub-expressions and assumes that these have been
identified by a suitable syntactic analysis.

3.2.2. Environments. In order to find the value of an expression it is necessary to know the
value of its components. Thus to find the value of a + 5 + b/a we need to know the values
of a and b. Thus we speak of evaluating an expression in an environment (or sometimes
relative to an environment) which provides the values of components.
One way in which such an environment can be provided is by a where-clause.
Thus
a + 3/a where a = 2 + 3/7
a + b − 3/a where a = b + 2/b

have a self evident meaning. An alternative syntactic form which has the same effect is the
initialised definition:

let a = 2 + 3/7 . . . a + 3/a


let a = b + 2/b . . . a + b − 3/a

Another way of writing these is to use λ-expressions:

(λa. a + 3/a)(2 + 3/7)


(λa. a + b − 3/a)(b + 2/b)
All three methods are exactly equivalent and are, in fact, merely syntactic variants whose
choice is a matter of taste. In each the letter a is singled out and given a value and is known
as the bound variable. The letter b in the second expression is not bound and its value still
has to be found from the environment in which the expression is to be evaluated. Variables
of this sort are known as free variables.

3.2.3. Applicative structure. Another important feature of expressions is that it is possible
to write them in such a way as to demonstrate an applicative structure—i.e., as an operator
applied to one or more operands. One way to do this is to write the operator in front of its
operand or list of operands enclosed in parentheses. Thus

a+b corresponds to +(a, b)


a + 3/a corresponds to +(a, /(3, a))

In this scheme a λ-expression can occur as an operator provided it is enclosed in parentheses.


Thus the expression

a + a/3 where a = 2 + 3/7

can be written to show its full applicative structure as

{λa. + (a, /(3, a))}(+(2, /(3, 7))).

Expressions written in this way with deeply nesting brackets are very difficult to read.
Their importance lies only in emphasising the uniformity of applicative structure from
which they are built up. In normal use the more conventional syntactic forms which are
familiar and easier to read are much to be preferred—providing that we keep the underlying
applicative structure at the back of our minds.
In the examples so far given all the operators have been either a λ-expression or a single
symbol, while the operands have been either single symbols or sub-expressions. There is, in
fact, no reason why the operator should not also be an expression. Thus for example if we use
D for the differentiating operator, D(sin) = cos so that {D(sin)}(×(3, a)) is an expression
with a compound operator whose value would be cos(3a). Note that this is not the same as
the expression (d/dx) sin(3x) for x = a, which would be written (D(λx. sin(×(3, x))))(a).

3.2.4. Evaluation. We thus have a distinction between evaluating an operator and applying
it to its operands. Evaluating the compound operator D(sin) produces the result (or value)
cos and can be performed quite independently of the process of applying this to the operands.
Furthermore it is evident that we need to evaluate both the operator and the operands before
we can apply the first to the second. This leads to the general rule for evaluating compound
expressions in the operator-operand form viz:

1. Evaluate the operator and the operand(s) in any order.


2. After this has been done, apply the operator to the operand(s).
The interesting thing about this rule is that it specifies a partial ordering of the operations
needed to evaluate an expression. Thus for example when evaluating

(a + b)(c + d/e)

both the additions must be performed before the multiplication, and the division before the
second addition but the sequence of the first addition and the division is not specified. This
partial ordering is a characteristic of algorithms which is not yet adequately reflected in most
programming languages. In ALGOL, for example, not only is the sequence of commands
fully specified, but the left to right rule specifies precisely the order of the operations.
Although this has the advantage of precision in that the effect of any program is exactly
defined, it makes it impossible for the programmer to specify indifference about sequencing
or to indicate a partial ordering. The result is that he has to make a large number of logically
unnecessary decisions, some of which may have unpredictable effects on the efficiency of
his program (though not on its outcome).
There is a device originated by Schönfinkel [5], for reducing operators with several
operands to the successive application of single operand operators. Thus, for example,
instead of +(2, p) where the operator + takes two arguments we introduce another adding
operator say +′ which takes a single argument such that +′(2) is itself a function which
adds 2 to its argument. Thus (+′(2))(p) = +(2, p) = 2 + p. In order to avoid a large
number of brackets we make a further rule of association to the left and write +′ 2 p in
place of ((+′ 2) p) or (+′(2))(p). This convention is used from time to time in the rest of
this paper. Initially, it may cause some difficulty as the concept of functions which produce
functions as results is a somewhat unfamiliar one and the strict rule of association to the
left difficult to get used to. But the effort is well worth while in terms of the simpler and
more transparent formulae which result.
It might be thought that the remarks about partial ordering would no longer apply to
monadic operators, but in fact this makes no difference. There is still the choice of evaluating
the operator or the operand first and this allows all the freedom which was possible with
several operands. Thus, for example, if p and q are sub-expressions, the evaluation of
p + q (or +(p, q)) implies nothing about the sequence of evaluation of p and q although
both must be evaluated before the operator + can be applied. In Schönfinkel’s form this is
(+′ p)q and we have the choice of evaluating (+′ p) and q in any sequence. The evaluation
of +′ p involves the evaluation of +′ and p in either order so that once more there is no
restriction on the order of evaluation of the components of the original expression.
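In OCaml, where Schönfinkel's device is primitive, the convention reads as follows (plus' playing the role of +′; the names are ours):

    let plus' = fun a -> fun b -> a + b

    (* Application associates to the left: plus' 2 p parses as (plus' 2) p. *)
    let add2 = plus' 2          (* a function which adds 2 to its argument *)
    let () = assert (plus' 2 5 = 2 + 5 && add2 5 = 7)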

3.2.5. Conditional expressions. There is one important form of expression which appears
to break the applicative expression evaluation rule. A conditional expression such as

(x = 0) → 0, 1/x

(in ALGOL this would be written if x = 0 then 0 else 1/x) cannot be treated as an
ordinary function of three arguments. The difficulty is that it may not be possible to evaluate
both arms of the condition—in this case when x = 0 the second arm becomes undefined.
Various devices can be used to convert this to a true applicative form, and in essence
all have the effect of delaying the evaluation of the arms until after the condition has been
decided. Thus suppose that If is a function of a Boolean argument whose result is the
selector First or Second so that If (True) = First and If (False) = Second, the naive interpre-
tation of the conditional expression given above as

{If (x = 0)}(0, 1/x)

is wrong because it implies the evaluation of both members of the list (0, 1/x) before
applying the operator {If (x = 0)}. However the expression

[{If (x = 0)}({λa. 0}, {λa. 1/x})]a

will have the desired effect as the selector function If (x = 0) is now applied to the list
({λa. 0}, {λa. 1/x}) whose members are λ-expressions and these can be evaluated (but not
applied) without danger. After the selection has been made the result is applied to a and
provided a has been chosen not to conflict with other identifiers in the expression, this
produces the required effect.
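A sketch of this device in OCaml (ours), with unit thunks playing the role of the dummy λa-abstractions and if_ modelling the selector If:

    let if_ b (first, second) = if b then first else second

    (* (x = 0) -> 0, 100/x : neither arm is evaluated until selected,
       so 100/0 is never computed. *)
    let safe_div x = (if_ (x = 0) ((fun () -> 0), (fun () -> 100 / x))) ()

    let () = assert (safe_div 0 = 0 && safe_div 4 = 25)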
Recursive (self referential) functions do not require commands or loops for their defini-
tion, although to be effective they do need conditional expressions. For various reasons, of
which the principal one is lack of time, they will not be discussed in this course.

3.3. Commands and sequencing

3.3.1. Variables. One important characteristic of mathematics is our habit of using names
for things. Curiously enough mathematicians tend to call these things ‘variables’ although
their most important property is precisely that they do not vary. We tend to assume auto-
matically that the symbol x in an expression such as 3x² + 2x + 17 stands for the same
thing (or has the same value) on each occasion it occurs. This is the most important conse-
quence of referential transparency and it is only in virtue of this property that we can use
the where-clauses or λ-expressions described in the last section.
The introduction of the assignment command alters all this, and if we confine ourselves to
the R-values of conventional mathematics we are faced with the problem of variables which
actually vary, so that their value may not be the same on two occasions and we can no longer
even be sure that the Boolean expression x = x has the value True. Referential transparency
has been destroyed, and without it we have lost most of our familiar mathematical tools—for
how much of mathematics can survive the loss of identity?
If we consider L-values as well as R-values, however, we can preserve referential trans-
parency as far as L-values are concerned. This is because L-values, being generalised
addresses, are not altered by assignment commands. Thus the command x := x+1 leaves
the address of the cell representing x (L-value of x) unchanged although it does alter the
contents of this cell (R-value of x). So if we agree that the values concerned are all L-values,
we can continue to use where-clauses and λ-expressions for describing parts of a program
which include assignments.
The cost of doing this is considerable. We are obliged to consider carefully the relationship
between L and R-values and to revise all our operations which previously took R-value
operands so that they take L-values. I think these problems are inevitable and although
much of the work remains to be done, I feel hopeful that when completed it will not seem
so formidable as it does at present, and that it will bring clarification to many areas of
programming language study which are very obscure today. In particular the problems of
side effects will, I hope, become more amenable.
In the rest of this section I shall outline informally a way in which this problem can be
attacked. It amounts to a proposal for a method in which to formalise the semantics of a
programming language. The relation of this proposal to others with the same aim will be
discussed later. (Section 4.3.)

3.3.2. The abstract store. Our conceptual model of the computing process includes an
abstract store which contains both L-values and R-values. The important feature of this
abstract store is that at any moment it specifies the relationship between L-values and the
corresponding R-values. We shall always use the symbol σ to stand for this mapping from
L-values onto R-values. Thus if α is an L-value and β the corresponding R-value we shall
write (remembering the conventions discussed in the last section)

β = σ α.

The effect of an assignment command is to change the contents of the store of the machine.
Thus it alters the relationship between L-values and R-values and so changes σ . We can
therefore regard assignment as an operator on σ which produces a fresh σ. If we update
the L-value α (whose original R-value in σ was β) by a fresh R-value β′ to produce a new
store σ′, we want the R-value of α in σ′ to be β′, while the R-values of all other L-values
remain unaltered. This can be expressed by the equation

(U(α, β′))σ = σ′ where σ′x = (x = α) → β′, σx.

Thus U is a function which takes two arguments (an L-value and an R-value) and produces
as a result an operator which transforms σ into σ′ as defined.
The arguments of U are L-values and R-values and we need some way of getting these
from the expressions written in the program. Both the L-value and the R-value of an
expression such as V[i+3] depend on the R-value of i and hence on the store. Thus both
must involve σ and if ε stands for a written expression in the programming language we
shall write L ε σ and R ε σ for its L-value and R-value respectively.
Both L and R are to be regarded as functions which operate on segments of text of the
programming language. The question of how those segments are isolated can be regarded
as a matter of syntactic analysis and forms no part of our present discussion.
These functions show an application to Schönfinkel’s device which is of more than merely
notational convenience. The function R, for example, shows that its result depends on both
ε and σ , so it might be thought natural to write it as R(ε, σ ). However by writing R ε σ
and remembering that by our convention of association to the left this means (R ε)σ it
becomes natural to consider the application of R to ε separately and before the application
of R ε to σ. These two phases correspond in a very convenient way to the processes of
compilation, which involves manipulation of the text of the program, and execution which
involves using the store of the computer. Thus the notation allows us to distinguish clearly
between compile-time and execution-time processes. This isolation of the effect of σ is a
characteristic of the method of semantic description described here.
It is sometimes convenient to use the contents function C defined by C α σ = σ α.
Then if

α = L ε σ
β = R ε σ

we have β = C α σ = σ α. After updating α by β′, we have

σ′ = U(α, β′)σ

and

C α σ′ = β′.
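The store model transcribes almost verbatim into OCaml. In the sketch below (our encoding, not part of the paper) L-values are modelled as integers, a store σ is an ordinary function from L-values to R-values, u builds the store transformer U(α, β′), and c is the contents function C.

    type lvalue = int
    type 'b store = lvalue -> 'b

    (* (U(alpha, beta'))sigma = sigma'
       where sigma' x = (x = alpha) -> beta', sigma x *)
    let u (alpha, beta') = fun sigma x -> if x = alpha then beta' else sigma x

    let c alpha sigma = sigma alpha      (* C alpha sigma = sigma alpha *)

    let () =
      let sigma : int store = fun _ -> 0 in
      let sigma' = u (7, 42) sigma in
      assert (c 7 sigma' = 42 && c 8 sigma' = 0)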

3.3.3. Commands. Commands can be considered as functions which transform σ. Thus
the assignment

ε1 := ε2

has the effect of producing a store

σ′ = U(α1, β2)σ

where

α1 = L ε1 σ

and

β2 = R ε2 σ

so that

σ′ = U(L ε1 σ, R ε2 σ)σ

and if θ is the function on σ which is equivalent to the original command we have

σ′ = θσ

where

θ = λσ. U(L ε1 σ, R ε2 σ)σ
Sequences of commands imply the successive application of sequences of θ ’s. Thus, for
example, if γ1 , γ2 , γ3 are commands and θ1 , θ2 , θ3 the equivalent functions on σ , the
command sequence (or compound command)

γ1 ;γ2 ;γ3 ;

applied to a store σ will produce a store

σ′ = θ3(θ2(θ1 σ))
   = (θ3 · θ2 · θ1)σ

where f · g is the function product of f and g.
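Continuing the same OCaml sketch (and reusing u and c from it), a command denotes a store transformer θ, assignment is built from the L-mode and R-mode evaluation functions, and a command sequence is a function product; the helper names assign and compose are ours.

    let assign lmode rmode = fun sigma -> u (lmode sigma, rmode sigma) sigma

    let compose f g = fun sigma -> f (g sigma)   (* the function product f · g *)

    let () =
      (* with x at location 0, run the sequence  x := 1; x := x + 1  *)
      let sigma0 : int store = fun _ -> 0 in
      let theta1 = assign (fun _ -> 0) (fun _ -> 1) in
      let theta2 = assign (fun _ -> 0) (fun sigma -> c 0 sigma + 1) in
      assert (c 0 ((compose theta2 theta1) sigma0) = 2)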


Conditional commands now take a form similar to that of conditional expressions. Thus
the command

Test ε1 If so do γ1
If not do γ2

corresponds to the operator

λσ. If (R ε1 σ )(θ1 , θ2 )σ

where θ1 and θ2 correspond to γ1 and γ2 .


Conditional expressions can also be treated more naturally. The dummy argument in-
troduced in the last section to delay evaluation can be taken to be σ with considerable
advantages in transparency. Thus

R(ε1 → ε2, ε3)σ = If(R ε1 σ)(R ε2, R ε3)σ

and

L(ε1 → ε2, ε3)σ = If(R ε1 σ)(L ε2, L ε3)σ

Informally R ε2 and L ε2 correspond to the compiled program for evaluating ε2 in the
R-mode or L-mode respectively. The selector If(R ε1 σ) chooses between these at execu-
tion time on the basis of the R-value of ε1 while the final application to σ corresponds to
running the chosen piece of program.
If we consider commands as being functions operating on σ , loops and cycles are merely
recursive functions also operating on σ . There is, however, no time to go further into these
in this course.
An interesting feature of this approach to the semantics of programming languages is that
all concept of sequencing appears to have vanished. It is, in fact, replaced by the partially
ordered sequence of functional applications which is specified by λ-expressions.
In the remaining sections we shall revert to a slightly less formal approach, and try to
isolate some important ‘high level’ concepts in programming languages.
3.4. Definition of functions and routines

3.4.1. Functional abstractions. In order to combine programs hierarchically we need the
process of functional abstraction. That is to say that we need to be able to form functions
from expressions such as

let f[x] = 5x² + 3x + 2/x³

This could be thought of as defining f to be a function and giving it an initial value.


Thus the form of definition given above is merely a syntactic variant of the standard form
of definition (which has the quantity defined alone on the left side)

let f = λx. 5x² + 3x + 2/x³

This form makes it clear that it is f which is being defined and that x is a bound or dummy
variable and could be replaced by any other non-clashing name without altering the value
given to f.

3.4.2. Parameter calling modes. When the function is used (or called or applied) we write
f[ε] where ε can be an expression. If we are using a referentially transparent language
all we require to know about the expression ε in order to evaluate f[ε] is its value. There
are, however, two sorts of value, so we have to decide whether to supply the R-value or the
L-value of ε to the function f. Either is possible, so that it becomes a part of the definition
of the function to specify for each of its bound variables (also called its formal parameters)
whether it requires an R-value or an L-value. These alternatives will also be known as
calling a parameter by value (R-value) or reference (L-value).
Existing programming languages show a curious diversity in their modes of calling pa-
rameters. FORTRAN calls all its parameters by reference and has a special rule for providing
R-value expressions such as a + b with a temporary L-value. ALGOL 60, on the other hand,
has two modes of calling parameters (specified by the programmer): value and name. The
ALGOL call by value corresponds to call by R-value as above; the call by name (see note 3),
however, is quite different (and more complex). Only if the actual parameter (i.e., the expression ε
above) is a simple variable is the effect the same as a call by reference. This incompatibility
in their methods of calling parameters makes it difficult to combine the two languages in a
single program.

3.4.3. Modes of free variables. The obscurity which surrounds the modes of calling the
bound variables becomes much worse when we come to consider the free variables of a
function. Let us consider for a moment the very simple function

f[x] = x + a

where a is a free variable which is defined in the surrounding program. When f is defined
we want in some way to incorporate a into its definition, and the question is do we use its
R-value or its L-value? The difference is illustrated in the following pair of CPL programs.
(In CPL a function definition using = takes its free variables by R-value and one using ≡
takes them by L-value.)

Free variable by R-value            Free variable by L-value

let a = 3                           let a = 3
let f[x] = x + a                    let f[x] ≡ x + a
... (f[5] = 8) ...                  ... (f[5] = 8) ...
a := 10                             a := 10
... (f[5] = 8) ...                  ... (f[5] = 15) ...
The expressions in parentheses are all Booleans with the value true.
Thus the first example freezes the current R-value of a into the definition of f so that it
is unaffected by any future alterations (by assignment) to a, while the second does not. It
is important to realize, however, that even the second example freezes something (i.e., the
L-value of a) into the definition of f. Consider the example

let a = 3
let f[x] ≡ x + a
... (f[5] = 8),(a = 3) ...
§ let a = 100
... (f[5] = 8),(a = 100) ...
a := 10
... (f[5] = 8),(a = 10) ...
............§|
... (f[5] = 8),(a = 3) ...
Here there is an inner block enclosed in the statement brackets § ....... §| (which
correspond to begin and end in ALGOL), and inside this an entirely fresh a has been
defined. This forms a hole in the scope of the original a in which it continues to exist but
becomes inaccessible to the programmer. However as its L-value was incorporated in the
definition of f, it is the original a which is used to find f[5]. Note that assignments to a in
the inner block affect only the second a and so do not alter f.
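The two modes transcribe directly into OCaml if the CPL variable a is modelled as a mutable cell (our encoding): f_r freezes the current R-value of a into the definition, while f_l captures only its L-value, the cell itself.

    let a = ref 3

    let f_r = let a0 = !a in fun x -> x + a0    (* free variable by R-value *)
    let f_l = fun x -> x + !a                   (* free variable by L-value *)

    let () =
      assert (f_r 5 = 8 && f_l 5 = 8);
      a := 10;
      assert (f_r 5 = 8 && f_l 5 = 15)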
It is possible to imagine a third method of treating free variables (though there is nothing
corresponding for bound variables) in which the locally current meaning of the variables is
used, so that in the example above the second and third occurrences of f[5] would have
the values 105 and 15 respectively. I believe that things very close to this exist in LISP2
and are known as fluid variables. The objection to this scheme is that it appears to destroy
referential transparency irrevocably without any apparent compensating advantages.
In CPL the facilities for specifying the mode of the free variables are considerably
coarser than the corresponding facilities for bound variables. In the case of bound variables
the mode has to be specified explicitly or by default for each variable separately. For the
free variables, however, it is only possible to make a single specification which covers all
the free variables, so that they must all be treated alike. The first method is more flexible
and provides greater power for the programmer, but is also more onerous (although good
default conventions can help to reduce the burden); the second is much simpler to use but
sometimes does not allow a fine enough control. Decisions between methods of this sort
are bound to be compromises reflecting the individual taste of the language designer and
are always open to objection on grounds of convenience. It is no part of a discussion on
the fundamental concepts of programming languages to make this sort of choice—it should
rest content with pointing out the possibilities.
A crude but convenient method of specification, such as CPL uses for the mode of the
free variables of a function, becomes more acceptable if there exists an alternative method
by which the finer distinctions can be made, although at the cost of syntactic inelegance.
Such a method exists in CPL and involves using an analogue to the own variables in ALGOL
60 proposed by Landin [6].

3.4.4. Own variables. The idea behind own variables is to allow some private or secret
information which is in some way protected from outside interference. The details were
never very clearly expressed in ALGOL and at least two rival interpretations sprang up,
neither being particularly satisfactory. The reason for this was that owns were associated
with blocks whereas, as Landin pointed out, the natural association is with a procedure
body. (In the case of functions this corresponds to the expression on the right side of the
function definition.)
The purpose is to allow a variable to preserve its value from one application of a function
to the next—say to produce a pseudo-random number or to count the number of times the
function is applied. This is not possible with ordinary local variables defined within the body
of the function as all locals are redefined afresh on each application of the function. It would
be possible to preserve information in a non-local variable—i.e., one whose scope included
both the function definition and all its applications, but it would not then be protected and
would be accessible from the whole of this part of the program. What we need is a way of
limiting the scope of a variable to be the definition only. In CPL we indicate this by using
the word in to connect the definition of the own variable (which is usually an initialised
one) with the function definitions it qualifies.
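A rough present-day analogue in OCaml (an illustrative sketch, not the CPL mechanism): an own variable behaves like a reference cell captured in the closure of the function it qualifies, initialised once and preserved from one application to the next:

    (* n plays the role of an own variable: visible only inside the
       definition of next, yet preserved between applications *)
    let next : unit -> int =
      let n = ref 0 in
      fun () -> incr n; !n

    let () =
      assert (next () = 1);
      assert (next () = 2)   (* n kept its value across the two calls *)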
In order to clarify this point programs using each of the three possible scopes (non-
local, own and local) are written below in three ways viz. Normal CPL, CPL mixed with
λ-expressions to make the function definition in its standard form, and finally in pure λ-
expressions. The differences in the scope rules become of importance only when there is a
clash of names, so in each of these examples one or both of the names a and x are used
twice. In order to make it easy to determine which is which, a prime has been added to one
of them. However, the scope rules imply that if all the primes were omitted the program
would be unaltered.

1. Non-local variable

CPL      let a' = 6
         let x' = 10
         let a = 3/x'
         let f[x] ≡ x + a
         .... f[a] ....

Mixed    let a' = 6
         let x' = 10
         let a = 3/x'
         let f ≡ λx. x + a
         .... f[a] ....

Pure λ   {λa'.{λx'. {λa. {λf. f a}[λx. x + a]}[3/x']}10}6
2. Own variable

CPL      let a' = 6
         let x' = 10
         let a = 3/x'
         in f[x] ≡ x + a
         .... f[a'] ....

Mixed    let a' = 6
         let x' = 10
         let f ≡ {λa. λx. x + a}[3/x']
         .... f[a'] ....

Pure λ   {λa'. {λx'. {λf. f a'}[{λa. λx. x + a}[3/x']]}10}6
3. Local variable

CPL      let a' = 6
         let x' = 10
         let f[x] ≡ (x + a where a = 3/x)
         .... f[a'] ....

Mixed    let a' = 6
         let x' = 10
         let f ≡ λx.{λa. x + a}[3/x]
         .... f[a'] ....

Pure λ   {λa'. {λx'. {λf. f a'}[λx. {λa. x + a}[3/x]]}10}6
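The same three scope disciplines can be sketched in OCaml (a loose transcription, since OCaml has no own variables; the second is simulated by a let whose scope is limited to the definition):

    let a' = 6.  and x' = 10.

    (* 1. Non-local: a is visible to f and to the rest of the program *)
    let a = 3. /. x'
    let f x = x +. a
    let r1 = f a

    (* 2. Own: a is visible only inside the definition of g *)
    let g = let a = 3. /. x' in fun x -> x +. a
    let r2 = g a'

    (* 3. Local: a is redefined afresh on each application *)
    let h x = let a = 3. /. x in x +. a
    let r3 = h a'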

We can now return to the question of controlling the mode of calling the free variables
of a function. Suppose we want to define f[x] to be ax + b + c and use the R-value of
a and b but the L-value of c. A CPL program which achieves this effect is

let a' = a and b' = b
in f[x] ≡ a'x + b' + c
....
....

(Again the primes may be omitted without altering the effect.)


The form of definition causes the L-values of a', b' and c to be used, while the definition
of the variables a' and b' ensures that these are given fresh L-values which are initialised to
the R-values of a and b. As they are own variables, they are protected from any subsequent
assignments to a and b.
3.4.5. Functions and routines. We have so far discussed the process of functional abstrac-
tion as applied to expressions. The result is called a function and when applied to suitable
arguments it produces a value. Thus a function can be regarded as a complicated sort of
expression. The same process of abstraction can be applied to a command (or sequence of
commands), and the result is known in CPL as a routine. The application of a routine to a
suitable set of arguments is a complicated command, so that although it affects the store of
the computer, it produces no value as a result.
Functions and routines are as different in their nature as expressions and commands. It
is unfortunate, therefore, that most programming languages manage to confuse them very
successfully. The trouble comes from the fact that it is possible to write a function which
also alters the store, so that it has the effect of both a function and a routine. Such functions are
sometimes said to have side effects and their uncontrolled use can lead to great obscurity in
the program. There is no generally agreed way of controlling or avoiding the side effects
of functions, and most programming languages make no attempt to deal with the problem
at all—indeed their confusion between routines and functions adds to the difficulties.
The problem arises because we naturally expect referential transparency of R-values in
expressions, particularly those on the right of assignment commands. This is, I think, a very
reasonable expectation as without this property, the value of the expression is much harder
to determine, so that the whole program is much more obscure. The formal conditions
on expressions which have to be satisfied in order to produce this R-value referential
transparency still need to be investigated. However in special cases the question is usually
easy to decide and I suggest that as a matter of good programming practice it should always
be done. Any departure from R-value referential transparency in an R-value context should
either be eliminated by decomposing the expression into several commands and simpler
expressions, or, if this turns out to be difficult, the subject of a comment.

3.4.6. Constants and variables. There is another approach to the problem of side effects
which is somewhat simpler to apply, though it does not get round all the difficulties. This
is, in effect, to turn the problem inside out and instead of trying to specify functions and
expressions which have no side effect to specify objects which are immune from any possible
side effect of others. There are two chief forms which this protection can take which can
roughly be described as hiding and freezing. Their inaccessibility (by reason of the scope
rules) makes them safe from alteration except from inside the body of the function or routine
they qualify. We shall be concerned in this section and the next with different forms of
protection by freezing.
The characteristic thing about variables is that their R-values can be altered by an assign-
ment command. If we are looking for an object which is frozen, or invariant, an obvious
possibility is to forbid assignments to it. This makes it what in CPL we call a constant. It
has an L-value and R-value in the ordinary way, but applying the update function to it either
has no effect or produces an error message. Constancy is thus an attribute of an L-value, and
is, moreover, an invariant attribute. Thus when we create a new L-value, and in particular
when we define a new quantity, we must decide whether it is a constant or a variable.
As with many other attributes, it is convenient in a practical programming language to
have a default convention—if the attribute is not given explicitly some conventional value is
assumed. The choice of these default conventions is largely a matter of taste and judgement,
but it is an important one as they can affect profoundly both the convenience of the language
and the number of slips made by programmers. In the case of constancy, it is reasonable
that the ordinary quantities, such as numbers and strings, should be variable. It is only
rather rarely that we want to protect a numerical constant such as Pi from interference.
Functions and routines, on the other hand, are generally considered to be constants. We
tend to give them familiar or mnemonic names such as CubeRt or LCM and we would rightly
feel confused by an assignment such as CubeRt := SqRt. Routines and functions are
therefore given the default attribute of being a constant.

3.4.7. Fixed and free. The constancy or otherwise of a function has no connection with
the mode in which it uses its free variables. If we write a definition in its standard form
such as

let f ≡ λx. x + a

we see that this has the effect of initialising f with a λ-expression. The constancy of f merely
means that we are not allowed to assign to it. The mode of its free variables (indicated by
≡) is a property of the λ-expression.
Functions which call their free variables by reference (L-value) are liable to alteration
by assignments to their free variables. This can occur either inside or outside the function
body, and indeed, even if the function itself is a constant. Furthermore they cease to have
a meaning if they are removed from an environment in which their free variables exist. (In
ALGOL this would be outside the block in which their free variables were declared.) Such
functions are called free functions.
The converse of a free function is a fixed function. This is defined as a function which
either has no free variables, or if it has, whose free variables are all both constant and fixed.
The crucial feature of a fixed function is that it is independent of its environment and is
always the same function. It can therefore be taken out of the computer (e.g., by being
compiled separately) and reinserted again without altering its effect.
Note that fixity is a property of the λ-expression—i.e., a property of the R-value, while
constancy is a property of the L-value. Numbers, for example, are always fixed as are all
‘atomic’ R-values (i.e., ones which cannot be decomposed into smaller parts). It is only in
composite objects that the distinction between fixed and free has any meaning. If such an
object is fixed, it remains possible to get at its component parts, but not to alter them. Thus,
for example, a fixed vector is a look-up table whose entries will not (cannot) be altered,
while a free vector is the ordinary sort of vector in which any element may be changed if
necessary.

3.4.8. Segmentation. A fixed routine or function is precisely the sort of object which can
be compiled separately. We can make use of this to allow the segmentation of programs
and their subsequent assembly even when they do communicate with each other through
free variables. The method is logically rather similar to the FORTRAN Common variables.
Suppose R[x] is a routine which uses a, b, and c by reference as free variables. We can
define a function R'[a,b,c] which has as formal parameters all the free variables of R and
whose result is the routine R[x]. Then R' will have no free variables and will thus be a
fixed function which can be compiled separately.
The following CPL program shows how this can be done:

§ let R'[ref a,b,c] = value of
      § let R[x] be
            § ... a,b,c ...
              (body of R) §|
        result is R §|
  WriteFixedFunction [R']
  finish §|

The command WriteFixedFunction [R'] is assumed to output its argument in some form of relocatable binary or otherwise so that it can be read in later by the function ReadFixedFunction.
If we now wish to use R in an environment where its free variables are to be p, q and r
and its name is to be S we can write

§ let p,q,r = . . . (Setting up the environment)
  let S' = ReadFixedFunction
  let S = S'[p,q,r]
  .... S[u] .... §|
In this way S' becomes the same function as R' and the call S'[p,q,r], which uses the
L-values of p, q and r, produces S which is the original routine R but with p, q and r as its
free variables instead of a, b and c.
One advantage of this way of looking at segmentation is that it becomes a part of the
ordinary programming language instead of a special ad hoc device. An unfamiliar feature
will be its use of a function R' which has as its result another function or routine. This is
discussed in more detail in the next section.
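The essence of the device, serialization aside, can be sketched in OCaml (an illustrative reconstruction, not the CPL mechanism itself): abstracting the free variables turns the routine into a closed, separately compilable function that returns the original routine when re-applied:

    (* R uses a, b and c by reference; R' has no free variables at all *)
    let r' (a : int ref) (b : int ref) (c : int ref) =
      fun x -> a := x; b := !a + !c      (* stands in for the body of R *)

    (* in another segment, rebind the free variables to p, q and r *)
    let p = ref 0 and q = ref 0 and r = ref 0
    let s = r' p q r                     (* the routine R, now using p, q, r *)
    let () = s 5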

3.5. Functions and routines as data items.

3.5.1. First and second class objects. In ALGOL a real number may appear in an expression
or be assigned to a variable, and either may appear as an actual parameter in a procedure
call. A procedure, on the other hand, may only appear in another procedure call either
as the operator (the most common case) or as one of the actual parameters. There are no
other expressions involving procedures or whose results are procedures. Thus in a sense
procedures in ALGOL are second class citizens—they always have to appear in person
and can never be represented by a variable or expression (except in the case of a formal
parameter), while we can write (in ALGOL still)

(if x > 1 then a else b) + 6


when a and b are reals, we cannot correctly write

(if x > 1 then sin else cos)(x)

nor can we write a type procedure (ALGOL’s nearest approach to a function) with a result
which is itself a procedure.
Historically this second class status of procedures in ALGOL is probably a consequence
of the view of functions taken by many mathematicians: that they are constants whose
name one can always recognise. This second class view of functions is demonstrated by the
remarkable fact that ordinary mathematics lacks a systematic notation for functions. The
following example is given by Curry [7, p. 81].
Suppose P is an operator (called by some a ‘functional’) which operates on functions.
The result of applying P to a function f(x) is often written P[f(x)]. What then does
P[f(x + 1)] mean? There are two possible meanings (a) we form g(x) = f(x + 1) and
the result is P[g(x)] or (b) we form h(x) = P[f(x)] and the result is h(x + 1). In many
cases these are the same but not always. Let

    P[f(x)] = (f(x) − f(0))/x   for x ≠ 0
            = f′(x)             for x = 0

Then if f(x) = x²

    P[g(x)] = P[x² + 2x + 1] = x + 2

while

    h(x) = P[f(x)] = x

so that h(x + 1) = x + 1.
This sort of confusion is, of course, avoided by using λ-expressions or by treating func-
tions as first class objects. Thus, for example, we should prefer to write (P[f])[x] in place of
P[f(x)] above (or, using the association rule, P[f][x] or even P f x). The two alternatives
which were confused would then become

    P g x   where g(x) = f(x + 1)

and P f (x + 1).
The first of these could also be written P(λx. f(x + 1)) x.
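To see the two readings disambiguated concretely, here is a small OCaml sketch (ignoring the x = 0 branch, which would need a derivative):

    let p (f : float -> float) = fun x -> (f x -. f 0.) /. x

    let f x = x *. x                    (* f(x) = x²        *)
    let g x = f (x +. 1.)               (* g(x) = f(x + 1)  *)

    let reading_a = p g 3.              (* P[g](3)   = 5.   *)
    let reading_b = (p f) (3. +. 1.)    (* P[f](3+1) = 4.   *)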
I have spent some time on this discussion in spite of its apparently trivial nature, because
I found, both from personal experience and from talking to others, that it is remarkably
difficult to stop looking on functions as second class objects. This is particularly unfortunate
as many of the more interesting developments of programming and programming languages
come from the unrestricted use of functions, and in particular of functions which have
functions as a result. As usual with new or unfamiliar ways of looking at things, it is harder
for the teachers to change their habits of thought than it is for their pupils to follow them. The
difficulty is considerably greater in the case of practical programmers for whom an abstract
concept such as a function has little reality until they can clothe it with a representation and
so understand what it is that they are dealing with.

3.5.2. Representation of functions. If we want to make it possible to assign functions,
we must be clear what their L-values and R-values are. The L-value is simple—it is the
location where the R-value is stored—but the R-value is a little more complicated. When
a function is applied it is the R-value which is used, so that at least sufficient information
must be included in the R-value of a function to allow it to be applied. The application of a
function to its arguments involves the evaluation of its defining expression after supplying
the values of its bound variables from the argument list. To do this it is necessary to provide
an environment which supplies the values of the free variables of the function.
Thus the R-value of a function contains two parts—a rule for evaluating the expression,
and an environment which supplies its free variables. An R-value of this sort will be called
a closure. There is no problem in representing the rule in a closure, as the address of a piece
of program (i.e., a subroutine entry point) is sufficient. The most straightforward way of
representing the environment part is by a pointer to a Free Variable List (FVL) which has an
entry for each free variable of the function. This list is formed when the function is initially
defined (more precisely when the λ-expression which is the function is evaluated, usually
during a function definition) and at this time either the R-value or the L-value of each of
the free variables is copied into the FVL. The choice of R- or L-value is determined by the
mode in which the function uses its free variables. Thus in CPL functions defined by = have
R-values in their FVL while functions defined by ≡ have L-values. Own variables of the
kind discussed in the previous section can also be conveniently accommodated in the FVL.
The concept of a closure as the R-value of a function makes it easier to understand
operations such as passing a function as a parameter, assigning to a variable of type function,
or producing a function as the value of an expression or result of another function application.
In each case the value concerned, which is passed on or assigned, is a closure consisting of
a pair of addresses.
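A sketch of this representation made explicit in OCaml (the runtime normally hides it): a closure is a code part paired with its environment, and application supplies the environment back to the code:

    (* a reified closure: code plus an explicit free-variable list (FVL) *)
    type ('env, 'a, 'b) closure = { code : 'env -> 'a -> 'b; fvl : 'env }

    let apply c x = c.code c.fvl x

    (* f[x] = x + a, with the free variable a taken by R-value *)
    let make_f a = { code = (fun a x -> x + a); fvl = a }
    let f = make_f 100
    let () = assert (apply f 5 = 105)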
It is important to note that a function closure does not contain all the information associ-
ated with the function, it merely gives access (or points) to it, and that as the R-value of a
function is a closure, the same applies to it. This is in sharp distinction to the case of data
items such as reals or integers whose R-value is in some sense atomic or indivisible and
contains all the information about the data items.
This situation, where some of the information is in the FVL rather than the R-value,
is quite common and occurs not only with functions and routines, but also with labels,
arrays and all forms of compound data structure. In these cases it is meaningful to ask if the
information which is in the FVL or accessible through it is alterable or whether it cannot
be changed at all, and this property provides the distinction between free and fixed objects.
A function which has been defined recursively so that the expression representing it
includes at least one mention of its own name, can also be represented rather simply by
making use of closures. Suppose, for example, we take the non-recursive function

let f[x] = (x = 0) → 1, x*g[x-1]


This has a single free variable, the function g, which is taken by R-value. Thus the closure
for f would take the form

[Figure: the closure for f, a pointer to the code for f together with an FVL containing the R-value of g.]

If we now identify g with f, so that the function becomes the recursively defined factorial,
all we need to do is to ensure that the FVL contains the closure for f. Thus it will take the form

[Figure: the same closure, whose FVL entry is now the closure for f itself.]
so that the FVL, which now contains a copy of the closure for f, in fact points to itself. It
is a characteristic feature of recursively defined functions of all sorts that they have some
sort of a closed loop in their representation.
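The same knot-tying can be played out in OCaml, making the loop explicit (a sketch: the free variable g is held by L-value as a reference cell, then assigned the closure of f itself):

    let g : (int -> int) ref = ref (fun _ -> 0)
    let f x = if x = 0 then 1 else x * !g (x - 1)
    let () = g := f              (* the FVL of f now points back to f *)
    let () = assert (f 5 = 120)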

3.6. Types and polymorphism

3.6.1. Types. Most programming languages deal with more than one sort of object—for
example with integers and floating point numbers and labels and procedures. We shall call
each of these a different type and spend a little time examining the concept of type and
trying to clarify it.
A possible starting point is the remark in the CPL Working Papers [3] that “The Type of
an object determines its representation and constrains the range of abstract object it may be
used to represent. Both the representation and the range may be implementation dependent”.
This is true, but not particularly helpful. In fact the two factors mentioned—representation
and range—have very different effects. The most important feature of a representation
is the space it occupies and it is perfectly possible to ignore types completely as far as
representation and storage is concerned if all types occupy the same size of storage. This
is in fact the position of most assembly languages and machine code—the only differences
of type encountered are those of storage size.
In more sophisticated programming languages, however, we use the type to tell us what
sort of object we are dealing with (i.e., to restrict its range to one sort of object). We
also expect the compiling system to check that we have not made silly mistakes (such as
multiplying two labels) and to interpret correctly ambiguous symbols (such as +) which
mean different things according to the types of their operands. We call ambiguous operators
of this sort polymorphic as they have several forms depending on their arguments.
The problem of dealing with polymorphic operators is complicated by the fact that the
range of types sometimes overlap. Thus for example 3 may be an integer or a real and it
may be necessary to change it from one type to the other. The functions which perform
this operation are known as transfer functions and may either be used explicitly by the
programmer, or, in some systems, inserted automatically by the compiling system.
3.6.2. Manifest and latent. It is natural to ask whether type is an attribute of an L-value
or of an R-value—of a location or of its content. The answer to this question turns out to be
a matter of language design, and the choice affects the amount of work, which can be done
when a program is compiled as opposed to that which must be postponed until it is run.
In CPL the type is a property of an expression and hence an attribute of both its L-value and
its R-value. Moreover L-values are invariant under assignment and this invariance includes
their type. This means that the type of any particular written expression is determined solely
by its position in the program. This in turn determines from their scopes which definitions
govern the variables of the expression, and hence give their types. An additional rule states
that the type of the result of a polymorphic operator must be determinable from a knowledge
of the types of its operands without knowing their values. Thus we must be able to find the
type of a + b without knowing the value of either a or b provided only that we know both
their types.4
The result of these rules is that the type of every expression can be determined at compile
time so that the appropriate code can be produced both for performing the operations and
for storing the results.
We call attributes which can be determined at compile time in this way manifest; attributes
that can only be determined by running the program are known as latent. The distinction
between manifest and latent properties is not very clear cut and depends to a certain extent
on questions of taste. Do we, for example, take the value of 2 + 3 to be manifest or latent?
There may well be a useful and precise definition—on the other hand there may not. In
either case at present we are less interested in the demarcation problem than in properties
which are clearly on one side or other of the boundary.

3.6.3. Dynamic type determination. The decision in CPL to make types a manifest prop-
erty of expressions was a deliberate one of language design. The opposite extreme is also
worth examining. We now decide that types are to be attributes of R-values only and that
any type of R-value may be assigned to any L-value. We can settle difficulties about stor-
age by requiring that all types occupy the same storage space, but how do we ensure that
the correct operations are performed for polymorphic operators? Assembly languages and
other ‘simple’ languages merely forbid polymorphism. An alternative, which has interest-
ing features, is to carry around with each R-value an indication of its type. Polymorphic
operators will then be able to test this dynamically (either by hardware or program) and
choose the appropriate version.
This scheme of dynamic type determination may seem to involve a great deal of extra
work at run time, and it is true that in most existing computers it would slow down pro-
grams considerably. However the design of central processing units is not immutable and
logical hardware of the sort required to do a limited form of type determination is relatively
cheap. We should not reject a system which is logically satisfactory merely because today’s
computers are unsuitable for it. If we can prove a sufficient advantage for it machines
with the necessary hardware will ultimately appear even if this is rather complicated; the
introduction of floating-point arithmetic units is one case when this has already happened.
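A minimal sketch of dynamic type determination in OCaml: each R-value carries a tag, and a polymorphic operator tests the tags at run time, inserting transfer functions where the types overlap:

    type value = Int of int | Real of float    (* tagged R-values *)

    let plus a b = match a, b with
      | Int x,  Int y  -> Int (x + y)
      | Real x, Real y -> Real (x +. y)
      | Int x,  Real y -> Real (float_of_int x +. y)   (* transfer function *)
      | Real x, Int y  -> Real (x +. float_of_int y)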

3.6.4. Polymorphism. The difficulties of dealing with polymorphic operators are not re-
moved by treating types dynamically (i.e., making them latent). The problems of choosing
the correct version of the operator and inserting transfer functions if required remain more
or less the same. The chief difference in treating types as manifest is that this information
has to be made available to the compiler. The desire to do this leads to an examination
of the various forms of polymorphism. There seem to be two main classes, which can be
called ad hoc polymorphism and parametric polymorphism.
In ad hoc polymorphism there is no single systematic way of determining the type of the
result from the type of the arguments. There may be several rules of limited extent which
reduce the number of cases, but these are themselves ad hoc both in scope and content. All
the ordinary arithmetic operators and functions come into this category. It seems, moreover,
that the automatic insertion of transfer functions by the compiling system is limited to this
class.
Parametric polymorphism is more regular and may be illustrated by an example. Suppose
f is a function whose argument is of type α and whose result is of type β (so that the type of
f might be written α ⇒ β), and that L is a list whose elements are all of type α (so that
the type of L is α list). We can imagine a function, say Map, which applies f in turn to
each member of L and makes a list of the results. Thus Map[f,L] will produce a β list.
We would like Map to work on all types of list provided f was a suitable function, so that
Map would have to be polymorphic. However its polymorphism is of a particularly simple
parametric type which could be written

(α ⇒ β, α list) ⇒ β list

where α and β stand for any types.
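Map is precisely the map now familiar from functional languages; in OCaml the compiler infers for it the parametric type scheme written above as (α ⇒ β, α list) ⇒ β list:

    (* map : ('a -> 'b) -> 'a list -> 'b list *)
    let rec map f = function
      | []        -> []
      | x :: rest -> f x :: map f rest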


Polymorphism of both classes presents a considerable challenge to the language designer,
but it is not one which we shall take up here.

3.6.5. Types of functions. The type of a function includes both the types and modes of
calling of its parameters and the types of its results. That is to say, in more mathematical
terminology, that it includes the domain and the range of the function. Although this seems
a reasonable and logical requirement, it makes it necessary to introduce the parametric
polymorphism discussed above as without it functions such as Map have to be redefined
almost every time they are used.
Some programming languages allow functions with a variable number of arguments;
these are particularly popular for input and output. They will be known as variadic functions,
and can be regarded as an extreme form of polymorphic function.5
A question of greater interest is whether a polymorphic function is a first class object in
the sense of Section 3.5.1. If it is, we need to know what type it is. This must clearly include
in some way the types of all its possible versions. Thus the type of a polymorphic function
includes or specifies in some way the nature of its polymorphism. If, as in CPL, the types
are manifest, all this information must be available to the compiler. Although this is not
impossible, it causes a considerable increase in the complexity of the compiler and exerts a
strong pressure either to forbid programmers to define new polymorphic functions or even
to reduce all polymorphic functions to second class status. A decision on these points has
not yet been taken for CPL.
3.7. Compound data structures

3.7.1. List processing. While programming was confined to problems of numerical anal-
ysis the need for general forms of data structure was so small that it was often ignored.
For this reason ALGOL, which is primarily a language for numerical problems, contains no
structure other than arrays. COBOL, being concerned with commercial data processing, was
inevitably concerned with larger and more complicated structures. Unfortunately, however,
the combined effect of the business man’s fear of mathematics and the mathematician’s
contempt for business ensured that this fact had no influence on the development of general
programming languages.
It was not until mathematicians began using computers for non-numerical purposes—
initially in problems connected with artificial intelligence—that any general forms of com-
pound data structure for programming languages began to be discussed. Both IPL V and
LISP used data structures built up from lists and soon a number of other ‘List Processing’
languages were devised.
The characteristic feature of all these languages is that they are designed to manipulate
more or less elaborate structures, which are built up from large numbers of components
drawn from a very limited number of types. In LISP, for instance, there are only two sorts
of object, an atom and a cons-word which is a doublet. The crucial feature is that each
member of a doublet can itself be either an atom or another cons-word. Structures are built
up by joining together a number of cons-words and atoms.
This scheme of building up complex structures from numbers of similar and much simpler
elements has a great deal to recommend it. In some sense, moreover, the doublet of LISP
is the simplest possible component from which to construct a structure and it is certainly
possible to represent any other structure in terms of doublets. However from the practical
point of view, not only for economy of implementation but also for convenience in use, the
logically simplest representation is not always the best.
The later list processing languages attempted to remedy this by proposing other forms
of basic building block with more useful properties, while still, of course, retaining the
main plan of using many relatively simple components to form a complex structure. The
resulting languages were generally very much more convenient for some classes of problems
(particularly those they had been designed for) and much less suitable (possibly on grounds
of efficiency) for others. They all, however, had an ad hoc look about them and arguments
about their relative merits seemed somewhat unreal.
In about 1965 or 1966 interest began to turn to more general schemes for compound
data structures which allowed the programmer to specify his own building blocks in some
very general manner rather than having to make do with those provided by the language
designer. Several such schemes are now around and in spite of being to a large extent
developed independently they have a great deal in common—at least as far as the structures
described in the next section as nodes are concerned. In order to illustrate these ideas, I
shall outline the scheme which will probably be incorporated in CPL.

3.7.2. Nodes and elements. The building blocks from which structures are formed are
known as nodes. Nodes may be of many types and the definition of a new node is in fact
the definition of a new programmer-defined type in the sense of section 3.6. A node may
be defined to consist of one or more components; both the number and the type of each
component are fixed by the definition of the node. A component may be of any basic or
programmer-defined type (such as a node), or may be an element. This represents a data
object of one of a limited number of types; the actual type of object being represented is
determined dynamically. An element definition also forms a new programmer-defined type
in the sense of Section 3.6 and it also specifies which particular data types it may represent.
Both node and element definitions are definitions of new types, but at the same time
they are used to form certain basic functions which can be used to operate on and construct
individual objects of these types. Compound data structures may be built up from individuals
of these types by using these functions.
The following example shows the node and element definitions which allow the lists of
LISP to be formed.

node Cons is LispList : Car
          with Cons : Cdr

element LispList is Atom
                 or Cons

node Atom is string PrintName
          with Cons : PropertyList

These definitions introduce three new types: Cons and Atom, which are nodes, and
LispList which is an element. They also define the basic selector and constructor functions
which operate on them. These functions have the following effect.
If x is an object of type Cons, it has two components associated with it; the first, which is
of manifest type LispList, is obtained by applying the appropriate selector function Car to
x, thus Car[x] is the first component of x and is of type LispList. The second component
of x is Cdr[x] and is an object of type Cons.
If p is an object of type LispList and q is an object of type Cons, we can form a fresh
node of type Cons whose first component is p and whose second component is q by using
the constructor function Cons[p,q] which always has the same name as the node type.
Thus we have the basic identities

Car[Cons[p,q]] = p
Cdr[Cons[p,q]] = q

In an exactly similar way the definition of the node Atom will also define the two selector
functions PrintName and PropertyList and the constructor function Atom.
The number of components of a node is not limited to two—any non-zero number is
allowed. There is also the possibility that any component may be the special object NIL. This
can be tested for by the system predicate Null. Thus, for example, if the end of a list is indicated
by a NIL second component, we can test for this by the predicate Null[Cdr[x]].
There is also a constructor function associated with an element type. Thus, for example
if n is an atom, LispList[n] is an object of type LispList dynamically marked as being
an atom and being in fact the atom n. There are two general system functions which apply
to elements, both are concerned with finding their dynamically current type. The function
Type[p] where p is a LispList will have the result either Atom or Cons according to the
current type of p. In a similar way the system predicate Is[Atom,p] will have the value
true if p is dynamically of type Atom.
These definitions give the basic building block of LISP using the same names with the
exception of Atom. In Lisp Atom[p] is the predicate which would be written here as
Is[Atom,p]. We use the function Atom to construct a new atom from a PrintName and a
PropertyList.
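A rough OCaml transcription of these node and element definitions; the variant constructors play the role of the element's dynamic type marks, and pattern matching plays the role of Type and Is:

    (* element LispList: dynamically marked as an Atom or a Cons *)
    type lisp_list = Atom of atom | Cons of cons

    (* node Cons: two components, Car and Cdr (None plays the role of NIL) *)
    and cons = { mutable car : lisp_list; mutable cdr : cons option }

    (* node Atom: a PrintName and a PropertyList *)
    and atom = { print_name : string; mutable property_list : cons option }

    let is_atom = function Atom _ -> true | Cons _ -> false   (* Is[Atom,p] *)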

3.7.3. Assignments. In order to understand assignments in compound data structures we need to know what the L- and R-values of nodes and their components are.
Let us suppose that A is a variable of type Cons—i.e., that A is a named quantity to which
we propose to assign objects of type Cons. The L-value of A presents no problems; like any
other L-value it is the location where the R-value of A is stored. The R-value of A must give
access to the two components of A (Car[A] and Cdr[A])—i.e., it must give their L-values
or locations. Thus, we have the diagram:

[Figure: the L-value of A, a box whose contents lead by ‘puppet strings’ to the boxes holding Car[A] and Cdr[A].]

The L-values or locations are represented by boxes. The R-value of A is represented by
the ‘puppet strings’ which lead from the inside of the L-value of A to the L-values of its
components. One can think of the ‘shape’ of the box as representing its type and hence
specifying the kind of occupant which may be put there.
Using this sort of diagram, it is now simple to determine the effect of an assignment.
Consider the structure

[Figure: an example Cons structure rooted at A.]
The effect of obeying the assignment command

Car[Car[A]] := Cdr[Cdr[A]]
can be determined by the following steps

(1) Find the L-value of Car[Car[A]]. This is the box marked (1).
(2) Find the R-value of Cdr[Cdr[A]]. This is the puppet string marked (2).

(1) and (2) may be carried out in either order as neither actually alters the structure.

(3) Replace the contents of (1) by a copy of (2).

The resulting structure is as follows

[Figure: the structure after the assignment, box (1) now containing a copy of puppet string (2).]
Notice that this assignment has changed the pattern of sharing in the structure so that
now Car[Car[Car[A]]] and Car[Cdr[Cdr[A]]] actually share the same L-value (and
hence also the same R-value). This is because the assignment statement only takes a copy
of the R-value of its right-hand side, not a copy of all the information associated with it. In
this respect, structures are similar to functions whose FVL is not copied on assignment.
Thus, as with functions, the R-value of a compound data structure gives access to all the
information in the structure but does not contain it all, so that the distinction between fixed
and free applies as much to structures as it does to functions.
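Using the OCaml types sketched earlier, the change in the pattern of sharing can be reproduced directly (illustrative only):

    let shared = { car = Atom { print_name = "x"; property_list = None };
                   cdr = None }
    let a = { car = Cons shared; cdr = Some shared }   (* two paths, one node *)

    (* an update through one path is visible through the other *)
    let () = shared.car <- Atom { print_name = "y"; property_list = None }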

3.7.4. Implementation. The discussion of R- and L-values of nodes has so far been quite
general. I have indicated what information must be available, but in spite of giving diagrams
I have not specified in any way how it should be represented. I do not propose to go into
problems of implementation in any detail—in any case many of them are very machine
dependent—but an outline of a possible scheme may help to clarify the concepts.
Suppose we have a machine with a word length which is a few bits longer than a single
address. The R-value of a node will then be an address pointing to a small block of
consecutive words, one for each component, containing the R-values of the components.
An element requires for its R-value an address (e.g., the R-value of a node) and a marker to
say which of the various possibilities is its dynamically current type. (There should be an
escape mechanism in case there are too few bits available for the marker.) The allocation
and control of storage for these nodes presents certain difficulties. A great deal of work has
been done on this problem and workable systems have been devised. Unfortunately there
is no time to discuss these here.
If we use an implementation of this sort for our example in the last section, we shall find
that nodes of type Cons will fill two consecutive words. The ‘puppet string’ R-values can
be replaced by the address of the first of these, so that we can redraw our diagram as

[Figure: the same structure drawn with word addresses in place of puppet strings.]
After the assignment

Car[Car[A]] := Cdr[Cdr[A]]

this becomes

[Figure: the redrawn structure after the assignment.]
3.7.5. Programming example. The following example shows the use of a recursively de-
fined routine which has a structure as a parameter and calls it by reference (L-value). A
tree sort takes place in two phases. During the first the items to be sorted are supplied in
sequence as arguments to the routine AddtoTree. The effect is to build up a tree structure
with an item and two branches at each node. The following node definitions define the
necessary components.

node Knot is Knot : Pre
          with Knot : Suc
          with Data : Item

node Data is integer Key
          with Body : Rest

Here the key on which the sort is to be performed is an integer and the rest of the
information is of type Body. The routine for the first phase is

rec AddtoTree[ref Knot : x, value Data : n] is
    § Test Null[x]
      If so do x := Knot[NIL,NIL,n]
      If not do AddtoTree[((Key[n] < Key[Item[x]]) → Pre[x], Suc[x]), n]
      return §|
The effect of this is to build up a tree where all the items accessible from the Pre (prede-
cessor) branch of a Knot precede (i.e., have smaller keys) than the item at the Knot itself,
and this in turn precedes all those which are accessible from the Suc (successor) branch.

The effect of AddtoTree[T,N], where N is a data-item whose Key is 4, would be to replace the circled NIL node of

[Figure: an example tree built by AddtoTree, with one NIL branch circled.]

by the node

[Figure: a new Knot with NIL Pre and Suc branches, its central branch marked 4.]

where the central branch marked 4 stands for the entire data-item N.
The second phase of a tree sort forms a singularly elegant example of the use of a
recursively defined routine. Its purpose is effectively to traverse the tree from left to right
printing out the data-items at each Knot. The way the tree has been built up ensures that
the items will be in ascending order of Keys.
We suppose that we have a routine PrintBody which will print information in a data-item
in the required format. The following routine will then print out the entire tree.

rec PrintTree[Knot : x] is
    § Unless Null[x] do
          § PrintTree[Pre[x]]
            PrintBody[Rest[Item[x]]]
            PrintTree[Suc[x]] §|
      return §|
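Both phases can be sketched in OCaml; since OCaml lacks call by reference, the ref Knot parameter is simulated by an explicit load/update pair for the cell being grown, and Body is narrowed to string for the example:

    type data = { key : int; rest : string }
    type tree = Nil | Knot of knot
    and knot = { mutable pre : tree; mutable suc : tree; item : data }

    (* phase 1: load/update stand for the L-value of the ref Knot parameter *)
    let rec add_to_tree load update n =
      match load () with
      | Nil    -> update (Knot { pre = Nil; suc = Nil; item = n })
      | Knot k ->
          if n.key < k.item.key
          then add_to_tree (fun () -> k.pre) (fun t -> k.pre <- t) n
          else add_to_tree (fun () -> k.suc) (fun t -> k.suc <- t) n

    (* phase 2: left-to-right traversal prints items in ascending key order *)
    let rec print_tree = function
      | Nil    -> ()
      | Knot k -> print_tree k.pre;
                  print_endline k.item.rest;
                  print_tree k.suc

    (* driving it from a root cell:
       let root = ref Nil in
       add_to_tree (fun () -> !root) (fun t -> root := t)
                   { key = 4; rest = "..." }  *)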

3.7.6. Pointers. There is no reason why an R-value should not represent (or be) a location;
such objects are known as pointers. Suppose X is a real variable with an L-value α. Then
if P is an object whose R-value is α, we say the type of P is real pointer and that P
‘points to’ X. Notice that the type of a pointer includes the type of the thing it points to, so
that pointers form an example of parametric type. (Arrays form another.) We could, for
example, have another pointer Q which pointed to P; in this case Q would be of type real
pointer pointer.
There are two basic (polymorphic) functions associated with pointers:
Follow[P] (also written ↓ P in CPL) calls its argument by R-value and produces as a
result the L-value of the object pointed to. This is, apart from changes of representation,
the same as its argument. Thus we have

L-value of Follow[P] = P
R-value of Follow[P] = Contents of P

The function Pointer[X] calls its argument by L-value and produces as a result an
R-value which is a pointer to X.

Follow[Pointer[X]]

has the same L-value as X.


We can assign either to P or to Follow[P], but as their types are not the same we must
be careful to distinguish which we mean.

P := Follow[Y]

will move the pointer P

↓ P := ↓ P + 2

will add 2 to the number P points to.
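A loose OCaml analogue (OCaml ref cells are already first-class locations, so ! plays Follow, and a ref holding another ref behaves as a real pointer pointer):

    let x = ref 3.0          (* X : a real variable *)
    let y = ref 7.0
    let p = ref x            (* P : float ref ref, a pointer to X's location *)
    let () =
      !p := !(!p) +. 2.0;    (* ↓P := ↓P + 2 : x now holds 5.0 *)
      p := y                 (* moves the pointer P itself *)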


Pointers are useful for operating on structures and often allow the use of loops instead of
recursive functions. (Whether this is an advantage or not may be a matter for discussion.
With current machines and compilers loops are generally faster than recursion, but the
program is sometimes harder to follow.) The following routine has the same effect as the
first routine in the previous section. (It is not nearly so easy to turn the other recursive
routine into a loop, although it can be done.)

AddtoTree'[ref Knot : x, value Data : n] is
    § let p = Pointer[x]
      until Null[↓p] do
          p := (Key[n] < Key[Item[↓p]]) → Pointer[Pre[↓p]],
                                          Pointer[Suc[↓p]]
      ↓p := Knot[NIL,NIL,n]
      return §|
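The recursion-to-loop move looks like this in OCaml, with the load/update pairs of the earlier sketch standing in for the Pointer values (illustrative only):

    let add_to_tree' load0 update0 n =
      let load = ref load0 and update = ref update0 in
      while (match !load () with Nil -> false | Knot _ -> true) do
        (match !load () with
         | Knot k ->
             if n.key < k.item.key
             then (load := (fun () -> k.pre); update := (fun t -> k.pre <- t))
             else (load := (fun () -> k.suc); update := (fun t -> k.suc <- t))
         | Nil -> ())
      done;
      !update (Knot { pre = Nil; suc = Nil; item = n })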

3.7.7. Other forms of structure. Vectors and arrays are reasonably well understood. They
are parametric types so that the type of an array includes its dimensionality (the number of
its dimensions but not their size) and also the type of its elements. Thus unlike in nodes,
all the elements of an array have to be of the same type, though their number may vary
dynamically. It is convenient, though perhaps not really necessary, to regard an n-array
(i.e., one with n dimensions) as a vector whose elements are (n − 1)-arrays.
We can then regard the R-value of a vector as something rather similar to that of a node
in that it gives access (or points to) the elements rather than containing them. Thus the
assignment of a vector does not involve copying its elements.
Clearly if this is the case we need a system function Copy (or possibly CopyVector)
which does produce a fresh copy.
There are many other possible parametric structure types which are less well understood.
The following list is certainly incomplete.

List An ordered sequence of objects all of the same type. The number is dynamically
variable.
Ntuple A fixed (manifest) number of objects all of the same type. This has many advan-
tages for the implementer.
Set In the mathematical sense. An unordered collection of objects all of which are of
the same type but different from each other. Operations on sets have been proposed for
some languages. The lack of ordering presents considerable difficulty.
Bag or Coll This is a new sort of collection for which there is, as yet, no generally
accepted name. It consists of an unordered collection of objects all of which are of the
same type and differs from a set in that repetitions are allowed. (The name bag is derived
from probability problems concerned with balls of various colours in a bag.) A bag is
frequently the collection over which an iteration is required—e.g., when averaging.

There are also structures such as rings which cannot be ‘syntactically’ defined in the
manner of nodes. They will probably have to be defined in terms of the primitive functions
which operate on them or produce them.
It is easy enough to include any selection of these in a programming language, but the
result would seem rather arbitrary. We still lack a convincing way of describing those and
any other extensions to the sort of structures that a programmer may want to use.

4. Miscellaneous topics

In this section we take up a few points whose detailed discussion would have been out of
place before.

4.1. Load-Update Pairs

A general L-value (location) has two important features: There is a function which gives the
corresponding R-value (contents) and another which will update this. If the location is not
simply addressable, it can therefore be represented by a structure with two components—a
Load part and an Update part; these two can generally share a common FVL. Such an
L-value is known as a Load-Update Pair (LUP). We can now represent any location of type
α by an element (in the sense of Section 3.7.2)

element α Location is α Address
                    or α LUP

node α LUP is α Function[ ] : Load
           with Routine[α : ∗] : Update

Note that these are parametrically polymorphic definitions. There is also a constraint on
the components of a LUP that if X is an α LUP and y is of type α

    y = value of § Update[X][y]
                   result is Load[X] §|

LUPs are of considerable practical value even when using machine code. A uniform
system which tests a general location to see if it is addressable or not (in which case it is a
LUP)—say by testing a single bit—can then use the appropriate machine instruction (e.g.
CDA or STO) or apply the appropriate part of the LUP. This allows all parts of the machine to
be treated in a uniform manner as if they were all addressable. In particular index registers,
which may need loading by special instruction, can then be used much more freely.
Another interesting example of the use of a LUP is in dealing with the registers which
set up the peripheral equipment. In some machines these registers can be set but not read
by the hardware. Supervisory programs are therefore forced to keep a copy of their settings
in normal store, and it is quite easy to fail to keep these two in step. If the L-value of the
offending register is a LUP, and it is always referred to by this, the Update part can be made
to change both the register and its copy, while the Load part reads from the copy.
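A sketch of the idea in OCaml (write_hw is a hypothetical hardware-write primitive, here just a stand-in):

    (* a general location: directly addressable, or a load-update pair *)
    type 'a location =
      | Address of 'a ref
      | Lup of { load : unit -> 'a; update : 'a -> unit }

    let load = function Address r -> !r | Lup l -> l.load ()
    let update loc v =
      match loc with Address r -> r := v | Lup l -> l.update v

    (* a write-only device register shadowed by a copy in normal store *)
    let register (write_hw : int -> unit) : int location =
      let copy = ref 0 in
      Lup { load = (fun () -> !copy);
            update = (fun v -> write_hw v; copy := v) }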
The importance of this use of LUPs is that it reduces the number of ad hoc features of the
machine and allows much greater uniformity of treatment. This in turn makes it easier for
programmers at the machine code level to avoid oversights and other errors and, possibly
more important, makes it easier to write the software programs dealing with these parts of
the machine in a high level language and to compile them.
The disadvantage in current machines is that, roughly speaking, every indirect reference
requires an extra test to see if the location is addressable. Although this may be unaccept-
able for reasons of space or time (a point of view which requires the support of much more
convincing reasons than have yet been given), it would be a relatively insignificant extra
complication to build a trap into the hardware for this test. It is the job of people investi-
gating the fundamental concepts of programming to isolate the features such as this whose
incorporation in the hardware of a machine would allow or encourage the simplification of
its software.

4.2. Macrogenerators

Throughout this course, I have adopted the point of view that programming languages are
dealing with abstract objects (such as numbers or functions) and that the details of the way
in which we represent these are of relatively secondary importance. It will not have escaped
many readers that in the computing world, and even more so in the world of mathematicians
today, this is an unfashionable if not heretical point of view. A much more conventional
view is that a program is a symbol string (with the strong implication that it is nothing more),
a programming language the set of rules for writing down legal strings, and mathematics
in general a set of rules for manipulating strings.
The outcome of this attitude is a macrogenerator whose function is to manipulate or
generate symbol strings in programming languages without any regard to their semantic
content. Typically such a macrogenerator produces ‘code’ in some language which is already
implemented on the machine and whose detailed representation must be familiar to anyone
writing further more definitions. It will be used to extend the power of the base language,
although generally at the expense of syntactic convenience and often transparency, by adding
new macrocommands.
This process should be compared with that of functional abstraction and the definition
of functions and routines. Both aim to extend the power of the language by introducing
now operations. Both put a rather severe limit on the syntactic freedom with which the
extensions can be made.
The difference lies in the fact that macrogenerators deal with the symbols which represent
the variables, values and other objects of concern to a program so that all their manipulation
is performed before the final compiling. In other words all macrogeneration is manifest.
Function and routine definitions on the other hand are concerned with the values themselves,
not with the symbols which represent them and thus, in the first instance are dynamic (or
latent) rather than manifest.
The distinction is blurred by the fact that the boundary between manifest and latent is
not very clear cut, and also by the fact that it is possible by ingenuity and at the expense of
clarity to do by a macrogenerator almost everything that can be done by a function definition
and vice versa. However the fact that it is possible to push a pea up a mountain with your
nose does not mean that this is a sensible way of getting it there. Each of these techniques
of language extension should be used in its proper place.
Macrogeneration seems to be particularly valuable when a semantic extension of the
language is required. If this is one which was not contemplated by the language designer the
only alternative to trickery with macros is to rewrite the compiler—in effect to design a new
language. This has normally been the situation with machine code and assembly languages
and also to a large extent with operating systems. The best way to avoid spending all your
time fighting the system (or language) is to use a macrogenerator and build up your own.
However with a more sophisticated language the need for a macrogenerator diminishes,
and it is a fact that ALGOL systems on the whole use macrogenerators very rarely. It is,
I believe, a proper aim for programming language designers to try to make the use of
macrogenerators wholly unnecessary.

4.3. Formal semantics

Section 3.3 gives an outline of a possible method for formalising the semantics of program-
ming languages. It is a development of an earlier proposal [8], but it is far from complete
and cannot yet be regarded as adequate.
There are at present (Oct. 1967) only three examples of the formal description of the
semantics of a real programming language, as opposed to those which deal with emasculated
versions of languages with all the difficulties removed. These are the following:

(i) Landin’s reduction of ALGOL to λ-expressions with the addition of assignments and
jumps. This requires a special form of evaluating mechanism (which is, of course, a
notional computer) to deal with the otherwise non-applicative parts of the language.
The method is described in [6] and given in full in [9].
(ii) de Bakker [10] has published a formalisation of most of ALGOL based on an extension
of Markov algorithms. This is an extreme example of treating the language as a symbol
string. It requires no special machine except, of course, the symbol string manipulator.
(iii) A team at the IBM Laboratories in Vienna have published [12, 13] a description of PL/I
which is based on an earlier evaluating mechanism for pure λ-expressions suggested
by Landin [11] and the concept of a state vector for a machine suggested by McCarthy
[14]. This method requires a special ‘PL/I machine’ whose properties and transition
function are described. The whole description is very long and complex and it is hard
to determine how much of this complexity is due to the method of semantic description
and how much to the amorphous nature of PL/I.

The method suggested in Section 3.3 has more in common with the approach of Landin
or the IBM team than it has with de Bakker’s. It differs, however, in that the ultimative
machine required (and all methods of describing semantics come to a machine ultimately)
is in no way specialised. Its only requirement is that it should be able to evaluate pure
λ-expressions. It achieves this result by explicitly bringing in the store of the computer in
an abstract form, an operation which brings with it the unexpected bonus of being able to
distinguish explicitly between manifest and latent properties. However until the whole of a
real language has been described in these terms, it must remain as a proposal for a method,
rather than a method to be recommended.

Notes

1. This is the CPL notation for a conditional expression which is similar to that used by LISP. In ALGOL the
equivalent would be if a > b then j else k.
2. The ALGOL equivalent of this would have to be if a > b then j := i else k := i.
3. ALGOL 60 call by name. Let f be an ALGOL procedure which calls a formal parameter x by name. Then a call
for f with an actual parameter expression ε will have the same effect as forming a parameterless procedure λ().ε
and supplying this by value to a procedure f* which is derived from f by replacing every written occurrence
of x in the body of f by x(). The notation λ().ε denotes a parameterless procedure whose body is ε, while
x() denotes its application (to a null parameter list).
4. The only elementary operator to which this rule does not already apply is exponentiation. Thus, for example,
if a and b are both integers a^b will be an integer if b ≥ 0 and a real if b < 0. If a and b are reals, the type of a^b
depends on the sign of a as well as that of b. In CPL this leads to a definition of a ↑ b which differs slightly in
its domain from a^b.
5. By analogy with monadic, dyadic and polyadic for functions with one, two and many arguments. Functions
with no arguments will be known as anadic. Unfortunately there appears to be no suitable Greek prefix meaning
variable.
References

1. Barron, D.W., Buxton, J.N., Hartley, D.F., Nixon, E., and Strachey, C. The main features of CPL. Comp. J.
6 (1963) 134–143.
2. Buxton, J.N., Gray, J.C., and Park, D. CPL elementary programming manual, Edition II. Technical Report,
Cambridge, 1966.
3. Strachey, C. (Ed.). CPL working papers. Technical Report, London and Cambridge Universities, 1966.
4. Quine, W.V. Word and Object. New York Technology Press and Wiley, 1960.
5. Schönfinkel, M. Über die Bausteine der mathematischen Logik. Math. Ann. 92 (1924) 305–316.
6. Landin, P.J. A formal description of ALGOL 60. In Formal Language Description Languages for Computer
Programming, T.B. Steel (Ed.). North Holland Publishing Company, Amsterdam, 1966, pp. 266–294.
7. Curry, H.B. and Feys, R. Combinatory Logic, Vol. 1, North Holland Publishing Company, Amsterdam, 1958.
8. Strachey, C. Towards a formal semantics. In Formal Language Description Languages for Computer Pro-
gramming T.B. Steel (Ed.). North Holland Publishing Company, Amsterdam, 1966, pp. 198–216.
9. Landin, P.J. A correspondence between ALGOL 60 and Church’s Lambda notation. Comm. ACM 8 (1965)
89–101, 158–165.
10. de Bakker, J.W. Mathematical Centre Tracts 16: Formal Definition of Programming Languages. Mathematisch
Centrum, Amsterdam, 1967.
11. Landin, P.J. The Mechanical Evaluation of Expressions. Comp. J. 6 (1964) 308–320.
12. PL/I—Definition Group of the Vienna Laboratory. Formal definition of PL/I. IBM Technical Report TR
25.071, 1966.
13. Alber, K. Syntactical description of PL/I text and its translation into abstract normal form. IBM Technical
Report TR 25.074, 1967.
14. McCarthy, J. Problems in the theory of computation. In Proc. IFIP Congress 1965, Vol. 1, W.A. Kalenich
(Ed.). Spartan Books, Washington, 1965, pp. 219–222.
Computational lambda-calculus and monads
Eugenio Moggi∗
Lab. for Found. of Comp. Sci.
University of Edinburgh
EH9 3JZ Edinburgh, UK
On leave from Univ. di Pisa

Abstract

The λ-calculus is considered a useful mathematical tool in the study of programming languages. However, if one uses βη-conversion to prove equivalence of programs, then a gross simplification1 is introduced. We give a calculus based on a categorical semantics for computations, which provides a correct basis for proving equivalence of programs, independent from any specific computational model.

Introduction

This paper is about logics for reasoning about programs, in particular for proving equivalence of programs. Following a consolidated tradition in theoretical computer science we identify programs with the closed λ-terms, possibly containing extra constants, corresponding to some features of the programming language under consideration. There are three approaches to proving equivalence of programs:

• The operational approach starts from an operational semantics, e.g. a partial function mapping every program (i.e. closed term) to its resulting value (if any), which induces a congruence relation on open terms called operational equivalence (see e.g. [10]). Then the problem is to prove that two terms are operationally equivalent.

• The denotational approach gives an interpretation of the (programming) language in a mathematical structure, the intended model. Then the problem is to prove that two terms denote the same object in the intended model.

• The logical approach gives a class of possible models for the language. Then the problem is to prove that two terms denote the same object in all possible models.

The operational and denotational approaches give only a theory (the operational equivalence ≈ and the set Th of formulas valid in the intended model respectively), and they (especially the operational approach) deal with programming languages on a rather case-by-case basis. On the other hand, the logical approach gives a consequence relation ⊢ (Ax ⊢ A iff the formula A is true in all models of the set of formulas Ax), which can deal with different programming languages (e.g. functional, imperative, non-deterministic) in a rather uniform way, by simply changing the set of axioms Ax, and possibly extending the language with new constants. Moreover, the relation ⊢ is often semidecidable, so it is possible to give a sound and complete formal system for it, while Th and ≈ are semidecidable only in oversimplified cases.

We do not take as a starting point for proving equivalence of programs the theory of βη-conversion, which identifies the denotation of a program (procedure) of type A → B with a total function from A to B, since this identification wipes out completely behaviours like non-termination, non-determinism or side-effects, that can be exhibited by real programs. Instead, we proceed as follows:

1. We take category theory as a general theory of functions and develop on top a categorical semantics of computations based on monads.

2. We consider how the categorical semantics should be extended to interpret λ-calculus.

At the end we get a formal system, the computational lambda-calculus (λc-calculus for short), for proving equivalence of programs, which is sound and complete w.r.t. the categorical semantics of computations.

∗ Research partially supported by EEC Joint Collaboration Contract # ST2J-0374-C(EDB).
1 Programs are identified with total functions from values to values.

1
.
The methodology outlined above is inspired by [13]², and it is followed in [11, 8] to obtain the λp-calculus. The view that "category theory comes, logically, before the λ-calculus" led us to consider a categorical semantics of computations first, rather than to modify directly the rules of βη-conversion to get a correct calculus.

A type-theoretic approach to partial functions and computations is attempted in [1] by introducing a type constructor Ā, whose intuitive meaning is the set of computations of type A. Our categorical semantics is based on a similar idea. Constable and Smith, however, do not adequately capture the general axioms for computations (as we do), since they lack a general notion of model and rely instead on operational, domain- and recursion-theoretic intuition.

(² "I am trying to find out where λ-calculus should come from, and the fact that the notion of a cartesian closed category is a late developing one (Eilenberg & Kelly (1966)), is not relevant to the argument: I shall try to explain in my own words in the next section why we should look to it first".)

1 A categorical semantics of computations

The basic idea behind the semantics of programs described below is that a program denotes a morphism from A (the object of values of type A) to TB (the object of computations of type B).

This view of programs corresponds to call-by-value parameter passing, but there is an alternative view of "programs as functions from computations to computations" corresponding to call-by-name (see [10]). In any case, the real issue is that the notions of value and computation should not be confused. By taking call-by-value we can stress better the importance of values. Moreover, call-by-name can be more easily represented in call-by-value than the other way around.

There are many possible choices for TB corresponding to different notions of computation; for instance, in the category of sets the set of partial computations (of type B) is the lifting B + {⊥} and the set of non-deterministic computations is the powerset P(B). Rather than focus on specific notions of computation, we will identify the general properties that the object TB of computations must have. The basic requirement is that programs should form a category, and the obvious choice for it is the Kleisli category for a monad.

Definition 1.1 A monad over a category C is a triple (T, η, µ), where T: C → C is a functor, η: Id_C → T and µ: T² → T are natural transformations and the following equations hold:

• µ_{TA}; µ_A = T(µ_A); µ_A

• η_{TA}; µ_A = id_{TA} = T(η_A); µ_A

A computational model is a monad (T, η, µ) satisfying the mono requirement: η_A is a mono for every A ∈ C.

There is an alternative description of a monad (see [7]), which is easier to justify computationally.

Definition 1.2 A Kleisli triple over C is a triple (T, η, _*), where T: Obj(C) → Obj(C), η_A: A → TA, f*: TA → TB for f: A → TB, and the following equations hold:

• η_A* = id_{TA}

• η_A; f* = f

• f*; g* = (f; g*)*

Every Kleisli triple (T, η, _*) corresponds to a monad (T, η, µ) where T(f: A → B) = (f; η_B)* and µ_A = (id_{TA})*.

Intuitively η_A is the inclusion of values into computations and f* is the extension of a function f from values to computations to a function from computations to computations, which first evaluates a computation and then applies f to the resulting value. The equations for Kleisli triples say that programs form a category, the Kleisli category C_T, where the set C_T(A, B) of morphisms from A to B is C(A, TB), the identity over A is η_A, and composition of f followed by g is f; g*. Although the mono requirement is very natural, there are cases in which it seems appropriate to drop it; for instance, it may not be satisfied by the monad of continuations.
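As a concrete reading of Definition 1.2, here is a minimal Haskell sketch (an editorial illustration, not the paper's notation; the class KleisliTriple and all names are ours):

-- A Kleisli triple (T, eta, _*) in Haskell: 'ret' is eta and 'ext'
-- is the extension operation f |-> f*.
class KleisliTriple t where
  ret :: a -> t a                     -- eta_A : A -> T A
  ext :: (a -> t b) -> (t a -> t b)   -- f* : T A -> T B, for f : A -> T B

-- The three equations of Definition 1.2, as laws every instance
-- is expected to satisfy:
--   ext ret       == id                 -- eta_A* = id_TA
--   ext f . ret   == f                  -- eta_A ; f* = f
--   ext g . ext f == ext (ext g . f)    -- f* ; g* = (f ; g*)*

-- The correspondence with Definition 1.1: the functor action and
-- the multiplication are definable from ret and ext.
tmap :: KleisliTriple t => (a -> b) -> t a -> t b
tmap f = ext (ret . f)                -- T(f) = (f ; eta_B)*

join :: KleisliTriple t => t (t a) -> t a
join = ext id                         -- mu_A = (id_TA)*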
Before going into more details, we consider some examples of monads over the category of sets.

Example 1.3 Non-deterministic computations:

• T(_) is the covariant powerset functor, i.e. T(A) = P(A) and T(f)(X) is the image of X along f

• η_A(a) is the singleton {a}

• µ_A(X) is the big union ∪X

Computations with side-effects:

• T(_) is the functor (_ × S)^S, where S is a nonempty set of stores. Intuitively a computation takes a store and returns a value together with the modified store.

• η_A(a) is (λs: S.⟨a, s⟩)

• µ_A(f) is (λs: S.eval(f s)), i.e. the computation that, given a store s, first computes the pair computation-store ⟨f', s'⟩ = f s and then returns the pair value-store ⟨a, s''⟩ = f' s'.

Continuations:

• T(_) is the functor R^(R^_), where R is a nonempty set of results. Intuitively a computation takes a continuation and returns a result.

• η_A(a) is (λk: R^A.k a)

• µ_A(f) is (λk: R^A.f(λh: R^(R^A).h k))

One can verify for oneself that other notions of computation (e.g. partial, probabilistic or non-deterministic with side-effects) fit in the general definition of monad.
with side-effects) fit in the general definition of monad. the logic of partial terms/elements, for instance:
• a partial computation exists iff it terminates;
1.1 A simple language
• a non-deterministic computation exists iff it gives
We introduce a programming language (with existence
exactly one result;
and equivalence assertions), where programs denote
morphisms in the Kleisli category CT corresponding • a computation with side-effects exists iff it does
to a computational model (T, η, µ) over a category C. not change the store.
The language is oversimplified (for instance terms have
exactly one free variable) in order to define its inter-
pretation in any computational model. The additional 2 Extending the language
structure required to interpret λ-terms will be intro-
duced incrementally (see Section 2), after computa- In this section we describe the additional structure re-
tions have been understood and axiomatized in isola- quired to interpret λ-terms in a computational model.
tion. It is well-known that λ-terms can be interpreted in a
The programming language is parametric in a sig- cartesian closed categories (ccc), so one expects that
a monad over a ccc would suffice, however, there are
nature (i.e. a set of base types and unary command
symbols), therefore its interpretation in a computa- two problems:
tional model is parametric in an interpretation of the • the interpretation of (let x=e in e0 ), when e0 has
symbols in the signature. To stress the fact that the other free variables beside x, and
interpretation is in CT (rather than C), we use τ1 * τ2
(instead of τ1 → τ2 ) as arities and ≡ : τ (instead of • the interpretation of functional types.
= : T τ ) as equality of computations of type τ .
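A hedged Haskell sketch of the interpretation in Table 1, restricted to the var / let / p(e) fragment (the computational types [e] and µ(e) need a typed value domain, omitted here); the Term type, the single value type v and the command-meaning parameter cmd are assumptions of this sketch, not the paper's syntax:

data Term = Var                -- the (single) free variable x
          | Let Term Term      -- let x1 = e1 in e2 (e2 sees the bound variable)
          | Cmd String Term    -- p(e1), a unary command applied to e1

-- [[x: tau |- e: tau']] is a Kleisli arrow v -> t v.
interp :: KleisliTriple t => (String -> v -> t v) -> Term -> v -> t v
interp _   Var         = ret                                  -- var: eta
interp cmd (Let e1 e2) = ext (interp cmd e2) . interp cmd e1  -- g1 ; g2*
interp cmd (Cmd p e1)  = ext (cmd p)         . interp cmd e1  -- g1 ; [[p]]*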
Remark 1.4 The let-constructor is very important semantically, since it corresponds to composition in the Kleisli category C_T, while substitution corresponds to composition in C. In the λ-calculus, (let x=e in e') is usually treated as syntactic sugar for (λx.e')e, and this can be done also in the λc-calculus. However, we think that this is not the right way to proceed, because it amounts to understanding the let-constructor, which makes sense in any computational model, in terms of constructors that make sense only in λc-models. On the other hand, (let x=e in e') cannot be reduced to the more basic substitution (i.e. e'[x: = e]) without collapsing C_T to C.
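In the Haskell sketch, the let-constructor is precisely Kleisli composition; a small assumed example with the list instance shows how it differs from substitution, which would duplicate the computation:

-- 'let x = e in e'' : evaluate e once, bind the resulting *value*
-- to x, then run e'.
letIn :: KleisliTriple t => t a -> (a -> t b) -> t b
letIn e e' = ext e' e

-- Assumed example: e = [0,1], a two-valued non-deterministic computation.
pairs :: [(Int, Int)]
pairs = letIn [0,1] (\x -> ret (x, x))
-- pairs = [(0,0),(1,1)]: e is evaluated once.  Substituting e for x,
-- i.e. letIn [0,1] (\x -> letIn [0,1] (\y -> ret (x, y))), evaluates
-- it twice and gives [(0,0),(0,1),(1,0),(1,1)]: a different program.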
The existence assertion e ↓ means that e denotes a value; it generalizes the existence predicate used in the logic of partial terms/elements. For instance:

• a partial computation exists iff it terminates;

• a non-deterministic computation exists iff it gives exactly one result;

• a computation with side-effects exists iff it does not change the store.
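For illustration only, the existence predicate can be checked pointwise for two of the example monads in the Haskell sketch (the paper defines ↓ semantically; these testable versions are our approximations):

existsList :: [a] -> Bool
existsList x = length x == 1                 -- exactly one result

existsState :: Eq s => State s a -> s -> Bool
existsState m s = snd (runState m s) == s    -- store s left unchanged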
2 Extending the language

In this section we describe the additional structure required to interpret λ-terms in a computational model. It is well known that λ-terms can be interpreted in a cartesian closed category (ccc), so one expects that a monad over a ccc would suffice. However, there are two problems:

• the interpretation of (let x=e in e'), when e' has other free variables besides x, and

• the interpretation of functional types.

Example 2.1 To show why the interpretation of the let-constructor is problematic, we try to interpret x1: τ1 ⊢ (let x2=e2 in e): τ, when both x1 and x2 are free in e. Suppose that g2: τ1 → Tτ2 and g: τ1 × τ2 → Tτ are the interpretations of x1: τ1 ⊢ e2: τ2 and x1: τ1, x2: τ2 ⊢ e: τ respectively. If T were Id_C, then [[x1: τ1 ⊢ (let x2=e2 in e): τ]] would be ⟨id_{τ1}, g2⟩; g. In the general case, Table 1 says that the ; above is in fact composition in the Kleisli category, so ⟨id_{τ1}, g2⟩; g becomes ⟨id_{τ1}, g2⟩; g*. But in ⟨id_{τ1}, g2⟩; g* there is a type mismatch, since the codomain of ⟨id_{τ1}, g2⟩ is τ1 × Tτ2, while the domain of g* is T(τ1 × τ2).

The problem is that the monad and cartesian products alone do not give us the ability to transform a pair value-computation (or computation-computation) into a computation of a pair. What is needed is a morphism t_{A,B} from A × TB to T(A × B), so that x1: τ1 ⊢ (let x2=e2 in e): τ will be interpreted by ⟨id_{τ1}, g2⟩; t_{τ1,τ2}; g*.

Similarly, for interpreting x: τ ⊢ p(e1, e2): τ', we need a morphism ψ_{A,B}: TA × TB → T(A × B), which given a pair of computations returns a computation computing a pair, so that, when g_i: τ → Tτ_i is the interpretation of x: τ ⊢ e_i: τ_i, [[x: τ ⊢ p(e1, e2): τ']] is ⟨g1, g2⟩; ψ_{τ1,τ2}; [[p]]*.
hg1 , g2 i; ψτ1 ,τ2 ; [[p]]∗ . Plotkin, and takes as fundamental structure a class D
of display maps over C, which models dependent types
Definition 2.2 A strong monad over a category C (see [2]), and induces a C-indexed category C/D . Then
with finite products is a monad (T, η, µ) together with a a strong monad over a category C with finite products
natural transformation tA,B from A × T B to T (A × B) amounts to a monad over C/D in the 2-category of
s.t. C-indexed categories, where D is the class of first pro-
t1,A ; T (rA ) = rT A jections (corresponding to constant type dependency).
tA×B,C ; T (αA,B,C ) = αA,B,T C ; (idA × tB,C ); tA,B×C In general the natural transformation t has to be
(idA × ηB ); tA,B = ηA×B given as an extra parameter for models. However, t
is uniquely determined (but it may not exists) by T
(idA × µB ); tA,B = tA,T B ; T (tA,B ); µA×B
and the cartesian structure on C, when C has enough
where r and α are the natural isomorphisms points.
• rA : 1 × A → A Proposition 2.4 If (T, η, µ) is a monad over a cat-
egory C with finite products and enough points (i.e.
• αA,B,C : (A × B) × C → A × (B × C) for any f, g: A → B if h; f = h; g for every points
Remark 2.3 The natural transformation t with the h: 1 → A, then f = g), and tA,B is a family of mor-
above properties is not the result of some ad hoc con- phisms s.t. for all points a: 1 → A and b: 1 → T B
siderations, instead it can be obtained via the following ha, bi; tA,B = b; T (h!B ; a, idB i)
general principle:
where !B is the unique morphism from B to the ter-
when interpreting a complex language the 2-
minal object 1, then (T, η, µ, t) is a strong monad over
category Cat of small categories, functors
C.
and natural transformations may not be ad-
equate and one may have to use a different Remark 2.5 The tensorial strength t induces a natu-
2-category which captures better some funda- ral transformation ψA,B from T A × T B to T (A × B),
mental structures underlying the language. namely
Since monads and adjunctions are 2-category concepts,
ψA,B = cT A,T B ; tT B,A ; (cT B,A ; tA,B )∗
the most natural way to model computations (and
datatypes) for more complex languages is simply by where c is the natural isomorphism
monads (and adjunctions) in a suitable 2-category.
Following this general principle we can give two ex- • cA,B : A × B → B × A
planations for t, one based on enriched categories (see
The morphism ψA,B has the correct domain and
[4]) and the other on indexed categories (see [3]).
codomain to interpret the pairing of a computation of
The first explanation takes as fundamental a com-
type A with one of type B (obtained by first evaluating
mutative monoidal structure on C, which models the
the first argument and then the second). There is also
tensor product of linear logic (see [6, 14]). If C is a
monoidal closed category, in particular a ccc, then it 3 A functorial strength for an endofunctor T is a natural
can be enriched over itself by taking C(A, B) to be transformation stA,B : B A → (T B)T A which internalizes the
the object B A . The equations for t are taken from [5], action of T on morphisms.
a dual notion of pairing, ψ̃A,B = cA,B ; ψB,A ; T cB,A We claim that the formal system is sound and com-
(see [5]), which amounts to first evaluating the second plete w.r.t. interpretation in λc -models. Soundness
argument and then the first. amounts to showing that the inference rules are admis-
sible in any λc -model, while completeness amounts to
The reason why a functional type A → B in a pro- showing that any λc -theory has an initial model (given
gramming language (like ML) cannot be interpreted by a term-model construction). The inference rules of
by the exponential B A (as done in a ccc) is fairly ob- the λc -calculus are partitioned as follows:
vious; in fact the application of a functional procedure
to an argument requires some computation to be per- • general rules for terms denoting computations,
formed before producing a result. By analogy with but with variables ranging over values (see Ta-
partial cartesian closed categories (see [8, 11]), we will ble 4)5
interpret functional types by exponentials of the form
A • the inference rules for let-constructor and types of
(T B) .
computations (see Table 5)
Definition 2.6 A λc -model over a category C with
• the inference rules for product and functional
finite products is a strong monad (T, η, µ, t) together
types (see Table 6)
with a T -exponential for every pair hA, Bi of objects
in C, i.e. a pair
Remark 3.1 A comparison among λc -, λv - and λp -
A A
h(T B) , evalA,T B : ((T B) × A) → T Bi calculus shows that:

• the λv -calculus proves less equivalences between


satisfying the universal property that for any object C
λ-terms, e.g. (λx.x)(yz) ≡ (yz) is provable in the
and f : (C × A) → T B there exists a unique h: C →
A λc - but not in the λv -calculus
(T B) , denoted by ΛA,T B,C (f ), s.t.
• the λp -calculus proves more equivalences between
f = (ΛA,T B,C (f ) × idA ); evalA,T B λ-terms, e.g. (λx.yz)(yz) ≡ (yz) is provable in the
A
λp - but not in the λc -calculus, because y can be
Like p-exponentials, a T -exponential (T B) can be a procedure, which modifies the store (e.g. by in-
equivalently defined by giving a natural isomorphism creasing the value contained in a local static vari-
CT (C × A, B) ∼
A
= C(C, (T B) ), where C varies over C. able) each time it is executed.
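In Haskell terms, the T-exponential (TB)^A is the ordinary type a -> t b of Kleisli arrows: a call-by-value "function" takes a value and performs a computation. A sketch of application in the spirit of the app rule of Table 3, using psi from the previous sketch (apply is our name):

apply :: KleisliTriple t => t (a -> t b) -> t a -> t b
apply tf ta = ext (\(f, a) -> f a) (psi (tf, ta))
-- Evaluate the function part, then the argument, then call eval.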
The programming language introduced in Section 1.1 and its interpretation can be extended according to the additional structure available in a λc-model as follows:

• there is a new type 1, interpreted by the terminal object of C, and two new type constructors τ1 × τ2 and τ1 ⇀ τ2, interpreted by the product [[τ1]] × [[τ2]] and the T-exponential (T[[τ2]])^[[τ1]] respectively

• the interpretation of a well-formed program Γ ⊢ e: τ, where Γ is a sequence x1: τ1, . . . , xn: τn, is a morphism in C_T from [[Γ]] (i.e. [[τ1]] × . . . × [[τn]]) to [[τ]] (see Table 3)⁴.

(⁴ In a language with products, nonunary commands can be treated as unary commands from a product type.)

3 The λc-calculus

In this section we introduce a formal system, the λc-calculus, with two basic judgements: existence (Γ ⊢ e ↓ τ) and equivalence (Γ ⊢ e1 ≡ e2: τ). We claim that the formal system is sound and complete w.r.t. interpretation in λc-models. Soundness amounts to showing that the inference rules are admissible in any λc-model, while completeness amounts to showing that any λc-theory has an initial model (given by a term-model construction). The inference rules of the λc-calculus are partitioned as follows:

• general rules for terms denoting computations, but with variables ranging over values (see Table 4)⁵

• the inference rules for the let-constructor and types of computations (see Table 5)

• the inference rules for product and functional types (see Table 6)

(⁵ The general rules of sequent calculus, more precisely those for substitution and quantifiers, have to be modified slightly, because variables range over values and types can be empty. These modifications are similar to those introduced in the logic of partial terms (see Section 2.4 in [9]).)

Remark 3.1 A comparison among the λc-, λv- and λp-calculi shows that:

• the λv-calculus proves fewer equivalences between λ-terms, e.g. (λx.x)(yz) ≡ (yz) is provable in the λc- but not in the λv-calculus

• the λp-calculus proves more equivalences between λ-terms, e.g. (λx.yz)(yz) ≡ (yz) is provable in the λp- but not in the λc-calculus, because y can be a procedure which modifies the store (e.g. by increasing the value contained in a local static variable) each time it is executed (see the sketch after this list)

• a λ-term e has a value in the λc-calculus, i.e. e is provably equivalent to some value (either a variable or a λ-abstraction), iff e has a value in the λv-calculus/λp-calculus. So all three calculi are correct w.r.t. call-by-value operational equivalence.
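A concrete rendering of the second bullet in the Haskell sketch: with the State instance, the term (λx.yz)(yz) runs the computation yz twice while yz alone runs it once, so the λc-calculus must not identify them; tick is an assumed example command, not from the paper:

tick :: State Int Int
tick = State (\s -> (s, s + 1))       -- return the counter, increment it

once, twice :: (Int, Int)             -- (result, final store), from store 0
once  = runState tick 0                         -- (0, 1): yz alone
twice = runState (letIn tick (\_x -> tick)) 0   -- (1, 2): (\x.yz)(yz)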
Conclusions and further research

The main contribution of this paper is the category-theoretic semantics of computations and the general principle for extending it to more complex languages (see Remark 2.3), while the λc-calculus is a straightforward fallout, which is easier to understand and relate to other calculi.

This semantics of computations corroborates the view that (constructive) proofs and programs are rather unrelated, although both of them can be understood in terms of functions. For instance, various logical modalities (like possibility and necessity in modal logic, or why not and of course of linear logic) are modelled by monads or comonads which cannot have a tensorial strength. In general, one should expect types suggested by logic to provide a more fine-grained type system without changing the nature of computations.

Our work is just an example of what can be achieved in the study of programming languages by using a category-theoretic methodology, which frees us from the irrelevant details of syntax and focuses our minds on the important structures underlying programming languages. We believe that there is a great potential to be exploited here. The λc-calculus also opens the possibility of developing a new Logic of Computable Functions (see [12]), based on an abstract semantics of computations rather than on domain theory, for studying axiomatically different notions of computation and their relations.

Acknowledgements

My thanks to M. Hyland, A. Kock (and other participants in the 1988 Category Theory Meeting in Sussex) for directing me towards the literature on monads. Discussions with R. Amadio, R. Burstall, J.Y. Girard, R. Harper, F. Honsell, Y. Lafont, G. Longo, R. Milner, G. Plotkin provided useful criticisms and suggestions. Thanks also to M. Tofte and P. Taylor for suggesting improvements to an early draft.

References

[1] R.L. Constable and S.F. Smith. Partial objects in constructive type theory. In 2nd LICS Conf. IEEE, 1987.

[2] J.M.E. Hyland and A.M. Pitts. The theory of constructions: Categorical semantics and topos-theoretic models. In Proc. AMS Conf. on Categories in Comp. Sci. and Logic (Boulder 1987), 1987.

[3] P.T. Johnstone and R. Pare, editors. Indexed Categories and their Applications, volume 661 of Lecture Notes in Mathematics. Springer Verlag, 1978.

[4] G.M. Kelly. Basic Concepts of Enriched Category Theory. Cambridge University Press, 1982.

[5] A. Kock. Strong functors and monoidal monads. Archiv der Mathematik, 23, 1972.

[6] Y. Lafont. The linear abstract machine. Theoretical Computer Science, 59, 1988.

[7] E. Manes. Algebraic Theories, volume 26 of Graduate Texts in Mathematics. Springer Verlag, 1976.

[8] E. Moggi. Categories of partial morphisms and the partial lambda-calculus. In Proceedings Workshop on Category Theory and Computer Programming, Guildford 1985, volume 240 of Lecture Notes in Computer Science. Springer Verlag, 1986.

[9] E. Moggi. The Partial Lambda-Calculus. PhD thesis, University of Edinburgh, 1988.

[10] G.D. Plotkin. Call-by-name, call-by-value and the λ-calculus. Theoretical Computer Science, 1, 1975.

[11] G. Rosolini. Continuity and Effectiveness in Topoi. PhD thesis, University of Oxford, 1986.

[12] D.S. Scott. A type-theoretic alternative to CUCH, ISWIM, OWHY. Oxford notes, 1969.

[13] D.S. Scott. Relating theories of the λ-calculus. In R. Hindley and J. Seldin, editors, To H.B. Curry: essays in Combinatory Logic, lambda calculus and Formalisms. Academic Press, 1980.

[14] R.A.G. Seely. Linear logic, ∗-autonomous categories and cofree coalgebras. In Proc. AMS Conf. on Categories in Comp. Sci. and Logic (Boulder 1987), 1987.
RULE          SYNTAX                                 SEMANTICS

var           x: τ ⊢ x: τ                            = η_[[τ]]

let           x: τ ⊢ e1: τ1                          = g1
              x1: τ1 ⊢ e2: τ2                        = g2
              x: τ ⊢ (let x1=e1 in e2): τ2           = g1; g2*

p: τ1 ⇀ τ2    x: τ ⊢ e1: τ1                          = g1
              x: τ ⊢ p(e1): τ2                       = g1; [[p]]*

[ ]           x: τ ⊢ e: τ'                           = g
              x: τ ⊢ [e]: Tτ'                        = g; η_{T[[τ']]}

µ             x: τ ⊢ e: Tτ'                          = g
              x: τ ⊢ µ(e): τ'                        = g; µ_[[τ']]

Table 1: Programs and their interpretation

RULE   SYNTAX                    SEMANTICS

eq     x: τ1 ⊢ e1: τ2            = g1
       x: τ1 ⊢ e2: τ2            = g2
       x: τ1 ⊢ e1 ≡ e2: τ2       ⟺ g1 = g2

ex     x: τ1 ⊢ e: τ2             = g
       x: τ1 ⊢ e ↓ τ2            ⟺ g factors through η_[[τ2]],
                                   i.e. there exists a (unique) h s.t. g = h; η_[[τ2]]

Table 2: Atomic assertions and their interpretation


RULE   SYNTAX                                  SEMANTICS

var    x1: τ1, . . . , xn: τn ⊢ xi: τi         = π_i^n; η_[[τi]]

let    Γ ⊢ e1: τ1                              = g1
       Γ, x1: τ1 ⊢ e2: τ2                      = g2
       Γ ⊢ (let x1=e1 in e2): τ2               = ⟨id_[[Γ]], g1⟩; t_{[[Γ]],[[τ1]]}; g2*

∗      Γ ⊢ ∗: 1                                = !_[[Γ]]; η_1

⟨ ⟩    Γ ⊢ e1: τ1                              = g1
       Γ ⊢ e2: τ2                              = g2
       Γ ⊢ ⟨e1, e2⟩: τ1 × τ2                   = ⟨g1, g2⟩; ψ_{[[τ1]],[[τ2]]}

πi     Γ ⊢ e: τ1 × τ2                          = g
       Γ ⊢ πi(e): τi                           = g; T(πi)

λ      Γ, x1: τ1 ⊢ e2: τ2                      = g
       Γ ⊢ (λx1: τ1.e2): τ1 ⇀ τ2               = Λ_{[[τ1]],T[[τ2]],[[Γ]]}(g); η_[[τ1⇀τ2]]

app    Γ ⊢ e1: τ1                              = g1
       Γ ⊢ e: τ1 ⇀ τ2                          = g
       Γ ⊢ e(e1): τ2                           = ⟨g, g1⟩; ψ_{(T[[τ2]])^[[τ1]],[[τ1]]}; (eval_{[[τ1]],T[[τ2]]})*

Table 3: Interpretation in a λc-model

We write [x: = e] for the substitution of x with e (so A[x: = e] is A with e substituted for x).

E.x     Γ ⊢ x ↓ τ

subst   Γ ⊢ e ↓ τ      Γ, x: τ ⊢ A
        Γ ⊢ A[x: = e]

≡ is a congruence relation

Table 4: General rules


We write (let x̄=ē in e) for (let x1=e1 in (. . . (let xn=en in e) . . .)), where n is the length of the sequence x̄ (and ē). In particular, (let ∅=∅ in e) stands for e.

unit    Γ ⊢ (let x=e in x) ≡ e: τ

ass     Γ ⊢ (let x2=(let x1=e1 in e2) in e) ≡ (let x1=e1 in (let x2=e2 in e)): τ    (x1 ∉ FV(e))

let.β   Γ ⊢ (let x1=x2 in e) ≡ e[x1: = x2]: τ

let.p   Γ ⊢ p(e) ≡ (let x=e in p(x)): τ

E.[ ]   Γ ⊢ [e] ↓ Tτ

T.β     Γ ⊢ µ([e]) ≡ e: τ

T.η     Γ ⊢ [µ(x)] ≡ x: Tτ

Table 5: rules for let and computational types
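The unit, ass and let.β rules restate the Kleisli-triple equations on programs. In the Haskell sketch they become checkable properties (letIn and ret are from the earlier sketches; let.β, which binds a value, is rendered here with ret):

unitLaw :: (KleisliTriple t, Eq (t a)) => t a -> Bool
unitLaw e = letIn e ret == e                  -- (let x=e in x) ≡ e

assLaw :: (KleisliTriple t, Eq (t c))
       => t a -> (a -> t b) -> (b -> t c) -> Bool
assLaw e1 e2 e =
  letIn (letIn e1 e2) e == letIn e1 (\x1 -> letIn (e2 x1) e)
                                              -- x1 not free in e

letBetaLaw :: (KleisliTriple t, Eq (t b)) => a -> (a -> t b) -> Bool
letBetaLaw v e = letIn (ret v) e == e v       -- binding a value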

E.∗      Γ ⊢ ∗ ↓ 1

1.η      Γ ⊢ ∗ ≡ x: 1

E.⟨ ⟩    Γ ⊢ ⟨x1, x2⟩ ↓ τ1 × τ2

let.⟨ ⟩  Γ ⊢ ⟨e1, e2⟩ ≡ (let x1, x2=e1, e2 in ⟨x1, x2⟩): τ1 × τ2

E.πi     Γ ⊢ πi(x) ↓ τi

×.β      Γ ⊢ πi(⟨x1, x2⟩) ≡ xi: τi

×.η      Γ ⊢ ⟨π1(x), π2(x)⟩ ≡ x: τ1 × τ2

E.λ      Γ ⊢ (λx: τ1.e) ↓ τ1 ⇀ τ2

β        Γ ⊢ (λx1: τ1.e2)(x1) ≡ e2: τ2

η        Γ ⊢ (λx1: τ1.x(x1)) ≡ x: τ1 ⇀ τ2

Table 6: rules for product and functional types
