Classic Papers in Programming Languages and Logic
These papers provide a breadth of information about Programming Languages and Logic that is generally useful.
The 1977 ACM Turing Award was presented to John Backus at the ACM Annual Conference in Seattle, October 17. In introducing the recipient, Jean E. Sammet, Chairman of the Awards Committee, made the following comments and read a portion of the final citation. The full announcement is in the September 1977 issue of Communications, page 681.

"Probably there is nobody in the room who has not heard of Fortran and most of you have probably used it at least once, or at least looked over the shoulder of someone who was writing a Fortran program. There are probably almost as many people who have heard the letters BNF but don't necessarily know what they stand for. Well, the B is for Backus, and the other letters are explained in the formal citation. These two contributions, in my opinion, are among the half dozen most important technical contributions to the computer field and both were made by John Backus (which in the Fortran case also involved some colleagues). It is for these contributions that he is receiving this year's Turing award.

The short form of his citation is for 'profound, influential, and lasting contributions to the design of practical high-level programming systems, notably through his work on Fortran, and for seminal publication of formal procedures for the specifications of programming languages.'

The most significant part of the full citation is as follows:

'... Backus headed a small IBM group in New York City during the early 1950s. The earliest product of this group's efforts was a high-level language for scientific and technical computations called Fortran. This same group designed the first system to translate Fortran programs into machine language. They employed novel optimizing techniques to generate fast machine-language programs. Many other compilers for the language were developed, first on IBM machines, and later on virtually every make of computer. Fortran was adopted as a U.S. national standard in 1966.

During the latter part of the 1950s, Backus served on the international committees which developed Algol 58 and a later version, Algol 60. The language Algol, and its derivative compilers, received broad acceptance in Europe as a means for developing programs and as a formal means of publishing the algorithms on which the programs are based.

In 1959, Backus presented a paper at the UNESCO conference in Paris on the syntax and semantics of a proposed international algebraic language. In this paper, he was the first to employ a formal technique for specifying the syntax of programming languages. The formal notation became known as BNF, standing for "Backus Normal Form," or "Backus Naur Form" to recognize the further contributions by Peter Naur of Denmark.

Thus, Backus has contributed strongly both to the pragmatic world of problem-solving on computers and to the theoretical world existing at the interface between artificial languages and computational linguistics. Fortran remains one of the most widely used programming languages in the world. Almost all programming languages are now described with some type of formal syntactic definition.' "
    apndl∘[f∘g, αf∘h] ≡ αf∘apndl∘[g,h]

PROOF. We show that, for every object x, both of the above functions yield the same result.

CASE 1. h:x is neither a sequence nor φ. Then both sides yield ⊥ when applied to x.

CASE 2. h:x = φ. Then

    apndl∘[f∘g, αf∘h]:x
      = apndl:<f∘g:x, φ> = <f:(g:x)>
    αf∘apndl∘[g,h]:x
      = αf∘apndl:<g:x, φ> = αf:<g:x>
      = <f:(g:x)>

Our proof will take the form of showing that the following function, R,

    Def R ≡ null∘1 → φ̄; apndl∘[αIP∘distl∘[1∘1, 2], MM′∘[tl∘1, 2]]

is, for all pairs <x,y>, the same function as MM′. R "multiplies" two matrices, when the first has more than zero rows, by computing the first row of the "product" (with αIP∘distl∘[1∘1, 2]) and adjoining it to the "product" of the tail of the first matrix and the second matrix. Thus the theorem we want is
Abstract Types Have Existential Type

J. C. MITCHELL and G. D. PLOTKIN

Abstract data type declarations appear in typed programming languages like Ada, Alphard, CLU, and
ML. This form of declaration binds a list of identifiers to a type with associated operations, a
composite "value" we call a data algebra. We use a second-order typed lambda calculus SOL to show
how data algebras may be given types, passed as parameters, and returned as results of function calls.
In the process, we discuss the semantics of abstract data type declarations and review a connection
between typed programming languages and constructive logic.
Categories and Subject Descriptors: D.3 [Software]: Programming Languages; D.3.2 [Program-
ming Languages]: Language Classifications-applicative languages; D.3.3 [Programming Lan-
guages]: Language Constructs-abstract data types; F.3 [Theory of Computation]: Logics and
Meanings of Programs; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming
Languages-denotational semantics, operational semantics; F.3.3 [Logics and Meanings of Pro-
grams]: Studies of Program Constructs-type structure
General Terms: Languages, Theory, Verification
Additional Key Words and Phrases: Abstract data types, lambda calculus, polymorphism, program-
ming languages, types
1. INTRODUCTION
Ada packages [17], Alphard forms [66, 71], CLU clusters [41, 42], and abstype
declarations in ML [23] all bind identifiers to values. Although there are minor
variations among these constructs, each allows a list of names to be bound to a
composite value consisting of a "private" type and one or more operations. For
example, the ML declaration
abstype complex = real # real
with create = . . .
with create = . . .
and plus = . . .
and re = . . .
and im = . . .
An earlier version of this paper appeared in the Proceedings of the 12th ACM Symposium on Principles
of Programming Languages (New Orleans, La., Jan. 14-16). ACM, New York, 1985.
Authors’ addresses: J. C. Mitchell, Department of Computer Science, Stanford University, Stanford,
CA 94305; G. D. Plotkin, Department of Computer Science, University of Edinburgh, Edinburgh,
Scotland EH9 3JZ.
binds the identifiers complex, create, plus, re, and im to the components of an
implementation of complex numbers. The implementation consists of the collec-
tion defined by the ML expression real # real, meaning the type of pairs of
reals, and the functions denoted by the code for create, plus, and so on. An
important aspect of this construct is that access to the representation is limited.
We cannot apply arbitrary operations on pairs of reals to elements of type
complex; only the explicitly declared operations may be used.
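The effect of this restriction can be seen concretely in a modern ML dialect. The following OCaml sketch (ours, not the ML of [23], whose syntax differs) seals a complex number implementation behind a signature, so the type checker rejects any attempt to treat a complex as a pair of reals:

    (* Sealing a structure with a signature hides the representation,
       so only the declared operations apply to values of type t. *)
    module Complex : sig
      type t                          (* representation hidden *)
      val create : float -> float -> t
      val plus   : t -> t -> t
      val re     : t -> float
      val im     : t -> float
    end = struct
      type t = float * float         (* pairs of reals, invisible outside *)
      let create x y = (x, y)
      let plus (a, b) (c, d) = (a +. c, b +. d)
      let re (a, _) = a
      let im (_, b) = b
    end

    (* fst (Complex.create 1.0 2.0) is rejected by the type checker:
       outside the module, Complex.t is not a pair type. *)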
We will call a composite value constructed from a set and one or more
operations, packaged up in a way that limits access, a data algebra. We will
discuss the typing rules associated with the formation and the use of data algebras
and observe that data algebras themselves may be given types in a straightforward
manner. This will allow us to devise a typed programming notation in which
implementations of abstract data types may be passed as parameters or returned
as the results of function calls.
The phrase “abstract data type” sometimes refers to a class of algebras (or
perhaps an initial algebra) satisfying some specification. For example, the ab-
stract type stack is sometimes regarded as the class of all algebras satisfying the
familiar logical formulas axiomatizing push and pop. Associated with this view is
the tenet that a program must rely only on the data type specification, as opposed
to properties of a particular implementation. Although this is a valuable guiding
principle, most programming languages do not contain assertions or their proofs,
and without this information it is impossible for a compiler to guarantee that a
program depends only on a data type specification. Since we are primarily
concerned with properties of the abstract data type declarations used in common
programming languages, we will focus on the limited form of information hiding
or “abstraction” provided by conventional type checking rules.
We can be more specific about how data algebras are defined by considering
the declaration of complex numbers in more detail. Using an explicitly typed
ML-like notation, the declaration sketched earlier looks something like this:
abstype complex = real # real
with create: real → real → complex = λx:real. λy:real. (x, y)
and plus: complex → complex → complex =
    λz:real # real. λw:real # real. (fst(z) + fst(w), snd(z) + snd(w))
and re: complex → real = λz:real # real. fst(z)
and im: complex → real = λz:real # real. snd(z)
The identifiers complex, create, plus, re, and im are bound to a data algebra whose
elements are represented as pairs of reals, as specified by the type expression
real # real. The operations of the data algebra are given by the function expres-
sions to the right of the equals signs.¹ Notice that the declared types of the
operations differ from the types of the implementing functions. For example, re
is declared to have type complex → real, but the implementing expression has
type real # real → real. This is because operations are defined using the concrete
representation of values, but the representation is hidden outside the declaration.
In the next section, we will discuss the type checking rules associated with
abstract data type declarations, which are designed to make complex numbers
1 In most programming languages, function definitions have the form “create(x:real, y:real) = . . .”
In the example above, we have used explicit lambda abstraction to move the formal parameters from
the left- to the right-hand sides of the equals signs.
“abstract” outside the data algebra definition. In the process, we will give types
to data algebras. These will be existential types, which were originally developed
in constructive logic and are closely related to infinite sums (as in category
theory, for example). In Section 3, we describe a statically typed language SOL.
This language is a notational variant of Girard’s system F, developed in the
analysis of constructive logic [21,22], and an extension of Reynolds’ polymorphic
lambda calculus [62]. An operational semantics of SOL, based on the work of
Girard and Reynolds, is presented using reduction rules. However, we do not
address a variety of practical implementation issues. Although the basic calculus
we use has been known for some time, we believe that the analysis of data
abstraction using existential types originates with this paper. (A preliminary
version appeared as [56].)
The use of SOL as a proof-theoretic tool is based on an analogy between types
and constructive logic. This analogy gives rise to a large family of typed languages
and suggests that our analysis of abstract data types applies to more expressive
languages involving specifications. Since the connection between constructive
proofs and typed programs does not seem to be well known in the programming
language community (at least at present), our brief discussion of specifications
will follow a review of the general analogy in Section 4. Additional SOL program-
ming examples are given in Section 5.
The design of SOL suggests new programming languages along the lines of
Ada, Alphard, CLU, and ML but with richer and more flexible type structures.
In addition, SOL seems to be a natural “kernel language” for studying the
semantics of languages with polymorphic functions and abstract data type
declarations. For this reason, we expect SOL to be useful in future studies of
current languages. It is clear that SOL provides greater flexibility in the use of
abstract data types than previous languages, since data algebras may be passed
as parameters and returned as results. We believe that this is accomplished
without any compromise in “type security.” However, since we do not have a
precise characterization of type security, we are unable to show rigorously that
SOL is secure.²
Some languages that are similar to SOL in scope and intent are Pebble [7],
designed to capture some essential features of Cedar (an extension of Mesa [57]),
and Kernel Russell, KR, of [28], based on Russell [14, 15, 16]. Martin-Löf's
constructive type theory [46] and the calculus of constructions [11] are farther
from programming language syntax but share many properties of SOL. Some
features of Martin-Löf's system have been incorporated into the Standard ML
module design [44, 54], which was formulated after the work described here was
completed. We will compare SOL with some of these languages in Section 3.8.
² Research begun after this paper was written has shed some light on the type security of SOL. See
[52] and [55] for further discussion.
common languages, there is one novel aspect that leads to additional flexibility:
We separate the names bound by a declaration from the data algebra they come
to denote. For example, the complex number example is written as follows:
abstype complex with
    create: real → real → complex,
    plus: complex → complex → complex,
    re: complex → real,
    im: complex → real
is
    pack real ∧ real
      λx:real. λy:real. (x, y)
      λz:real ∧ real. λw:real ∧ real. (fst(z) + fst(w), snd(z) + snd(w))
      λz:real ∧ real. fst(z)
      λz:real ∧ real. snd(z)
    to ∃t.[(real → real → t) ∧ (t → t → t) ∧ (t → real) ∧ (t → real)],

where the expression beginning pack and running to the end of the example is
considered to be the definition of the data algebra. (In SOL, we write real ∧ real
for the type of pairs of reals. When parentheses are omitted, the connective ∧
has higher precedence than →.) This syntax is designed to allow implementations
of abstract data types (data algebras) to be defined using expressions of any form
and to emphasize the view that abstract data type declarations commonly
combine two separable actions, defining a data algebra and binding identifiers to
its components.
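For readers who want something executable, OCaml's first-class modules give a rough present-day analogue of this separation; the sketch below is ours, with invented names, and the signature COMPLEX_ALG plays the role of the existential type. Building the module value corresponds to the pack expression, and the abstype declaration to opening it with an abstract t in scope:

    module type COMPLEX_ALG = sig
      type t
      val create : float -> float -> t
      val plus   : t -> t -> t
      val re     : t -> float
      val im     : t -> float
    end

    (* "pack real ∧ real ... to ∃t.σ": the data algebra as a value *)
    let complex_impl : (module COMPLEX_ALG) =
      (module struct
        type t = float * float
        let create x y = (x, y)
        let plus (a, b) (c, d) = (a +. c, b +. d)
        let re (a, _) = a
        let im (_, b) = b
      end : COMPLEX_ALG)

    (* "abstype t with ... is M in N": open the package, binding names
       whose types mention an abstract t, in the scope of a client N *)
    let modulus_of_one_plus_i =
      let module C = (val complex_impl : COMPLEX_ALG) in
      let z = C.create 1.0 1.0 in
      sqrt ((C.re z *. C.re z) +. (C.im z *. C.im z))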
The SOL declaration of an abstract data type t with operations x1, . . . , xn has
the general form

    abstype t with x1: σ1, . . . , xn: σn is M in N,

where σ1, . . . , σn are the types of the operations and M is a data algebra
expression. As in the complex number example above, the type identifier t often
appears in the types of the operations x1, . . . , xn. The scope of the declaration is N.
The simplest data algebra expressions in SOL are those of the form

    pack τ M1 . . . Mk to ∃t.σ.

A declaration involving a basic data algebra expression only makes sense if k = n (so that each
operation gets an implementation) and the types of M1, . . . , Mk match the
declared types of the operations x1, . . . , xn in some appropriate way. The matching
rule in SOL is that the type of Mi must be [τ/t]σi, the result of substituting τ for
t in σi (with appropriate renaming of bound type variables in σi). To see how this
works in practice, look back at the complex number declaration. The declared
type of the first operation create is real → real → complex, whereas the type of
the implementing function expression is real → real → (real ∧ real). The matching
rule is satisfied in this case because the type of the implementing code may be
obtained by substituting real ∧ real for complex in the declared type real → real
→ complex.
We can recast the matching rule using the existential types we have associated
with data algebra expressions. An appropriate type for a data algebra is an
expression that specifies how the operations may be used, without describing the
type used to represent values. If each Mi has type [τ/t]σi, then we say that

    pack τ M1 . . . Mn to ∃t.σ1 ∧ . . . ∧ σn

has type ∃t.σ1 ∧ . . . ∧ σn. This type may be read "there exists a type t with
operations of types σ1 and . . . and σn." The operator ∃ binds the type variable t
in ∃t.σ, so ∃t.σ = ∃s.[s/t]σ when s does not occur in σ. Existential types provide
just enough information to verify the matching condition stated above, without
providing any information about the representation of the carrier or the algo-
rithms used to implement the operations. The matching rule for abstype may
now be stated.

(AB.1) In abstype t with x1: σ1, . . . , xn: σn is M in N, the data algebra
expression M must have type ∃t.σ1 ∧ . . . ∧ σn.
Although it may seem unnecessarily verbose to write the type of pack . . . to
. . . as part of the expression, this is needed to guarantee that the type is unique.
Without the type designation, an expression like pack τ M could have many
types. For example, if the type of M is τ → τ, then pack τ M might have types
∃t.t → t, ∃t.t → τ, ∃t.τ → t, and ∃t.τ → τ. To avoid this, we have included the
intended type of the whole expression as part of the syntax. Something equivalent
to this is done in most other languages. In CLU, for example, types are determined
using the keyword cvt, which specifies which occurrences of the representation
type are to be viewed as abstract. ML, as documented in [23], uses keywords abs
and rep, whereas later versions [50] use type constructors and pattern matching.
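OCaml makes the same design choice for its first-class modules: a structure could match many signatures, so the intended package type must be written as part of the expression. A minimal sketch, with an invented signature name:

    module type HAS_ID = sig type t val id : t -> t end

    (* Like "pack τ M to ∃t.σ": the annotation selects, among the many
       package types this structure could receive, the intended one. *)
    let p = (module struct type t = int let id x = x end : HAS_ID)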
An important constraint in abstract type declarations is that only the explicitly
declared operations may be applied to elements of the type [58]. In SOL, this
constraint is formulated as follows:
(AB.2) In abstype t with x1: σ1, . . . , xn: σn is M in N, if y is any free identifier
in N different from x1, . . . , xn, then t must not appear free in the type of y.
In addition to accomplishing the goals put forth in [58], this condition is easily
seen to be a natural scoping rule for type identifiers. We can see why (AB.2)
makes sense and what kind of expressions it prevents by considering the following
example.
let f = λx:stack. . . . in
    abstype stack with empty: stack,
        push: int ∧ stack → stack,
        pop: stack → int ∧ stack
    is . . .
    in f(empty)
    end
and

    pack ρ P1 . . . Pn to ∃t.σ

is a data algebra expression of SOL with type ∃t.σ. Conditional algebra expres-
sions are useful for selecting between several alternative implementations of the
same abstract type. For example, a program using matrices may choose between
sparse or dense matrix implementations using a conditional data algebra expres-
sion inside an abstype declaration. Without (AB.3), the type of an abstype
expression with a data algebra conditional such as

    abstype t with x1: σ1, . . . , xn: σn
    is if B then (pack τ M1 . . . Mn to ∃t.σ)
       else (pack ρ P1 . . . Pn to ∃t.σ)
    in x1

may depend on whether the conditional test is true or false. (Specifically, the
meaning of the expression above is either M1 or P1, depending on B.) Thus,
without (AB.3), we cannot type check expressions with conditional data algebra
expressions at “compile time,” that is, without computing the values of arbitrary
tests.
Another way of describing this situation is to consider the form of type
expression we would need if we wanted to give the expression above a type
without evaluating B. Since the type of the expression actually depends on the
value of B, we would have to mention B in the type. This approach is used in
some languages (notably Martin-Löf's intuitionistic type theory), but it intro-
duces ordinary value expressions into types. Consequently, type equality depends
on equality of ordinary expressions. Some of the simplicity of SOL is due to the
separation of type expressions from “ordinary” expressions, and considerable
complication would arise from giving this up.
Finally, the termination of all recursion-free programs seems to fail if we drop
(AB.3). In other words, there is a roundabout way of writing programs that do
not halt on any input, without using any recursive declarations or iterative
constructs. This is a complex issue whose full explanation is beyond the scope of
this paper. The reader is referred to [10], [29], [49], and [54] for further discussion.
Putting all of these reasons together, it seems that dropping (AB.3) would change
the nature of SOL quite drastically. Therefore, we leave the study of abstype
without (AB.3) to future research.
With rule (AB.3) in place, we can allow very general computation with data
algebras. In addition to conditional data algebra expressions, SOL allows data
algebra parameters. An example that illustrates their use is the general tree
search routine given in Section 3.5. The usual algorithms for depth-first search
and breadth-first search may be written so that they are virtually identical,
except that depth-first search uses a stack and breadth-first search uses a queue.
The general tree-search algorithm in Section 3.5 is based on this idea, using a
formal parameter in place of a stack or queue. If a stack data algebra is supplied
as an actual parameter, then the algorithm performs depth-first search. Similarly
a queue parameter produces breadth-first search. Additional structures like
priority queues may also be passed as actual parameters, resulting in “best-first”
search algorithms.
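A hedged OCaml approximation of this idea (our sketch, not the SOL code of Figure 1; the tree type and helpers are invented) passes the container as a first-class module, so the same traversal yields depth-first or breadth-first search depending on the argument:

    type tree = Leaf of string | Node of string * tree * tree

    let label = function Leaf l | Node (l, _, _) -> l

    module type CONTAINER = sig
      type 'a t
      val empty  : 'a t
      val insert : 'a -> 'a t -> 'a t
      val delete : 'a t -> 'a * 'a t    (* raises if empty *)
    end

    module Stack : CONTAINER = struct
      type 'a t = 'a list
      let empty = []
      let insert x s = x :: s
      let delete = function x :: s -> (x, s) | [] -> failwith "empty"
    end

    module Queue_ : CONTAINER = struct
      type 'a t = 'a list
      let empty = []
      let insert x q = q @ [x]          (* enqueue at the rear *)
      let delete = function x :: q -> (x, q) | [] -> failwith "empty"
    end

    (* As in the paper's example, we assume some node is labeled with
       the goal, so there is no error test on an empty container. *)
    let search start goal (module S : CONTAINER) =
      let rec find node st =
        if label node = goal then node
        else
          let next, st' =
            match node with
            | Leaf _ -> S.delete st
            | Node (_, l, r) -> S.delete (S.insert l (S.insert r st))
          in
          find next st'
      in
      find start S.empty

    (* search t "x" (module Stack)  performs depth-first search;
       search t "x" (module Queue_) performs breadth-first search. *)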
Data algebra parameters are allowed in SOL simply because the typing rules
do not prevent them. If z is a variable with type ∃t.σ1 ∧ . . . ∧ σn, then

    abstype t with x1: σ1, . . . , xn: σn is z in N
the typing rules using type assignments, which are functions from ordinary
variables to type expressions. For each type assignment A, we define a partial
function TypeA from expressions to types. Intuitively, TypeA(M) = σ means that
the type of M is σ, given the assignment A of types to variables that may appear
free in M. Each partial function TypeA is defined by a set of deduction rules of
the form

    TypeA(M) = σ, . . .
    -------------------
    TypeA(N) = τ

meaning that if the antecedents hold, then the value of TypeA at N is defined to
be τ. The conditions on TypeA may mention other type assignments if N
binds variables that occur in subterms.
A variable of any type is a term. Formally, we have the axiom

    TypeA(x) = A(x)

saying that a variable x has whatever type it is given. We also allow term
constants, provided that each constant is assigned a type that does not contain
free type variables. One particularly useful constant is the polymorphic condi-
tional cond, which will be discussed after ∀-types are introduced.
    TypeA[x:σ](M) = τ
    ----------------------
    TypeA(λx:σ.M) = σ → τ

and

    TypeA(M) = σ → τ, TypeA(N) = σ
    ------------------------------
    TypeA(M N) = τ
Thus a typed lambda expression has a functional type and may be applied to any
argument of the correct type. An example function expression is the lambda
expression
    λx:int. x + 1
for the successor function on integers.
The semantics of SOL is described using a set of operational reduction rules.
The reduction rules use substitution, and, therefore, require the ability to rename
bound variables. For functions, we rename bound variables according to the
equational axiom
    λx:σ.M = λy:σ.[y/x]M,   y not free in M
The operational semantics of function definition and application are captured by
the reduction rule
    (λx:σ.M)N ⇒ [N/x]M,
where we assume that substitution [N/x]M includes renaming of bound variables
to avoid capture. (Technically speaking, the collection of SOL reduction rules
defines a relation on equivalence classes of SOL terms, where equivalence is
defined by the collection of all SOL axioms for renaming bound variables. See,
e.g., [2] for further discussion). Intuitively, the reduction rule above says that the
expression (λx:σ.M)N may be evaluated by substituting the argument N for
each free occurrence of the variable x in M. For example,
    (λx:int. x + 2)5 ⇒ 5 + 2.
Some readers may recognize this mechanism as the “copy rule” of ALGOL 60.
We write ⇒* for the congruent and transitive closure of ⇒.
We introduce let declarations by the abbreviation
    let x = M in N ::= (λx:σ.N)M,

where σ = TypeA(M). Note that since the assignment A of types to variables is
determined by context, the definition of let depends on the context in which it
is used. An alternative would be to write let x: σ = M in N, but since σ is always
uniquely determined, the more succinct let notation seems preferable.
The typing rules and operational semantics for let are inherited directly
from λ. For example, we have

    let f = λx:int. x + 3 in f(f(2)) ⇒* (2 + 3) + 3.
A similar declaration is the ML recursive declaration
letrec f = M in N
which declares f to be a recursive function with body M. (If f occurs in M, then
this refers recursively to the function being defined; occurrences of f in M are
bound by letrec.) Although we use letrec in programming examples, it is
technically useful to define pure SOL as a language without recursion. This pure
language has simpler theoretical properties, making it easier to study the type
structure of SOL.
The operational semantics of pairing and projection are given by the reduction
rules
    fst(M, N) ⇒ M,   snd(M, N) ⇒ N.
For example,
    let p = (1, 2) in fst(p) ⇒* 1.
Note that the type of this case statement remains int if z is declared to be
inright of a Boolean instead of inleft of an integer.
3.3 Polymorphism
Intuitively, λt.M is a polymorphic expression that can be "instantiated" to values
of various types. In an Ada-like syntax, the term λt.M would be written

    generic (type t) M

Polymorphic expressions are instantiated using type application, which we will
write using braces { } to distinguish it from an ordinary function application.
If M has type ∀t.σ, then the type of M{τ} is [τ/t]σ. The Ada-like syntax for
M{τ} is

    new M(τ).
Intuitive Semantics and Reduction Rules for λt.M. The intuitive meaning of
λt.M is the infinite product of all meanings of M as t varies over all types. In the
next section, we see that abstract data type declarations involve infinite sums.
To see the similarity between ∀-types and infinite products, we review the
general notion of product, as used in category theory [1, 27, 43]. There are two
parts to the definition: product types (corresponding to product objects in
categories) and product elements (corresponding to product arrows). Given a
collection S of types, the product type ∏S has the property that for each s ∈ S
there is a projection function proj_s from ∏S to s. Furthermore, given any
family F = {f_s} of elements indexed by S with f_s ∈ s, there is a unique product
element ∏F with the property that

    proj_s ∏F = f_s.

Uniqueness of products means that if proj_s ∏F = g_s for all s ∈ S, then
∏F = ∏G.
The correspondence with SOL is that we can think of a type expression σ and
type variable t as defining a collection of types, namely the collection S of all
substitution instances [τ/t]σ of σ. If M is a term with t not free in the type of
any free ordinary variable, then M and t determine a collection of substitution
instances [τ/t]M. It is easy to show that if t is not free in the type of any
free variable of M and TypeA(M) = σ, then TypeA([τ/t]M) = [τ/t]σ. By letting
f_{[τ/t]σ} = [τ/t]M, we may view the collection of substitution instances of M as a
family F = {f_s} indexed by elements of S. Using this indexing of instances, we
may regard ∀t.σ as a product type ∏S and λt.M as a product element ∏F, with
projection accomplished by type application. The product axiom above leads to
the reduction rule
(ht.M)bl * b/tlM
where we assume that bound variables are renamed in [r/t]M to avoid capture
of free type variables in 7. Since X binds type variables, we also have the renaming
rule
Xt.M = Xs.[s/t]M, s not free in Xt.M.
There is a third "extensionality" rule for λ-abstraction over types, stemming
from the uniqueness of products, but we are not concerned with it in this paper
(primarily because it does not seem to be a part of ordinary programming language
implementation and because it complicates the Static Typing Theorem in
Section 3.7).
where

    σ = σ1 ∧ (. . . ∧ σn . . .).
Polymorphic data algebras may be written in Ada, Alphard, CLU, and ML.
Since SOL has λ-binding of types, we can also write polymorphic representations
in SOL. For example, let t-stack be a representation of stacks of elements of t,
say,

    t-stack ::= pack (int ∧ array of t) empty push pop
                to ∃s.s ∧ (t ∧ s → s) ∧ (s → t ∧ s),

where empty represents the empty stack, and push and pop are functions
implementing the usual push and pop operations. Then the expression

    stack ::= λt.t-stack

with type

    stack: ∀t.∃s.[s ∧ (t ∧ s → s) ∧ (s → t ∧ s)]

is a polymorphic implementation of stacks. We could also define a polymorphic
implementation of queues

    queue: ∀t.∃q.[q ∧ (t ∧ q → q) ∧ (q → t ∧ q)]

similarly. Note that stack and queue have the same existential type, reflecting
the fact that as algebras, they have the same signature.
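In OCaml terms, one way to read a type of the form ∀t.∃s.[. . .] is as a functor from the element type to a sealed structure whose state type remains abstract. The following sketch is ours and purely illustrative, with invented names:

    module type STACK_ALG = sig
      type elt
      type s                            (* the existential ∃s *)
      val empty : s
      val push  : elt * s -> s          (* t ∧ s → s *)
      val pop   : s -> elt * s          (* s → t ∧ s *)
    end

    (* the ∀t is the functor parameter *)
    module Stack (T : sig type t end) : STACK_ALG with type elt = T.t =
    struct
      type elt = T.t
      type s = elt list                 (* hidden representation *)
      let empty = []
      let push (x, st) = x :: st
      let pop = function x :: st -> (x, st) | [] -> failwith "empty"
    end

    (* instantiating the ∀: stacks of ints, state type still abstract *)
    module Int_stack = Stack (struct type t = int end)
    (* Int_stack.push (3, Int_stack.empty) : Int_stack.s *)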
Abstract data type declarations are formed according to the rule
declares a type of integer stacks with three operations. Note that the names for
the stack operations are local to N, rather than defined globally by stack.
3.5 Programming with Data Algebras
One feature of SOL is that a program may select one of several data type
implementations at run time. For example, a parser that uses a symbol table
could be parameterized by the symbol table implementation and passed either a
hash table or binary tree implementation according to conditions. This ability to
manipulate data algebras makes a common feature of file systems and linkage
editors an explicit part of SOL. For example, many of the functions of the CLU
library, a design for handling multiple implementations [41], may be accom-
plished directly by programs.
In allowing programs to select representations, we also allow programs to
choose among data types that have the same signature. This flexibility accrues
from the fact that SOL types are signatures, rather than complete data type
specifications: Since we only check signature information, data types that have
the same signature have implementations of the same existential type. This is
used to advantage in the tree-search algorithm of Figure 1. It may also be argued
that this points out a deficiency in the SOL typing discipline. In a language with
specifications as types, type checking could guarantee that every actual parameter
to a function is an implementation of a stack, rather than just an implementation
with a designated element and two binary operations. Languages with this
capability will be discussed briefly in Section 4.4.
The common algorithm for depth-first search uses a stack, whereas the usual
approach to breadth-first search uses a queue. Since stack and queue implemen-
tations have the same SOL type, the program fragment in Figure 1 declares a
tree-search function with a data algebra parameter instead of a stack or queue.
If a stack is passed as a parameter, the function does depth-first search, while a
queue parameter produces breadth-first. In addition, other data algebras, such as
priority queues, could be passed as parameters. A priority queue produces a “best-
first” search; the search proceeds along paths that the priority queue deems
“best.”
The three arguments to the function search are a node start in a labeled tree,
a label goal to search for, and the data algebra parameter struct. We assume that
one tree node is labeled with the goal, so there is no error test. The result of a
call to search is the first node reached, starting from start, whose label matches
goal. The tree structure is declared at the top of the program fragment to make
the types of the tree functions explicit. The tree has a root, each node has a label
and is either a leaf or has two descendants. The function is-leaf? tests whether
a node is a leaf, while left and right return the left and right descendants of any
nonleaf.
3.6 Reduction Rules and Intuitive Semantics of Existential Types
Intuitively, the meaning of the abstype expression

    abstype t with x: σ is (pack τ M to ∃t.σ) in N

is the meaning of N in an environment where t is bound to τ, and x to M.
Operationally, we can evaluate abstype expressions using the reduction rule

    abstype t with x: σ is (pack τ M to ∃t.σ) in N ⇒ [M/x][τ/t]N,
Fig. 1. Tree search with a data algebra parameter.

    /* Search returns first node reached from start with label(node) = goal.
       The structure parameter may be a stack, queue, priority queue, etc. */
    let search(start: t, goal: string,
               struct: ∀t.∃s.[s ∧ (t ∧ s → s) ∧ (s → t ∧ s)]) =
      abstype s with empty: s, insert: t ∧ s → s, delete: s → t ∧ s
      is struct{t}
      in
        /* function to select next node; also returns updated structure */
        let next(node: t, st: s) =
          if isleaf?(node) then delete(st)
          else delete(insert(left(node), insert(right(node), st)))
        in
          /* recursive function find calls next until goal reached */
          letrec find(node: t, st: s) =
            if label(node) = goal then node else find(next(node, st))
          in
            /* call find to reach node with label(node) = goal */
            find(start, empty)
          end
        end
      end
    in
      . . . /* program using search function */
    end
    end
where substitution includes renaming of bound variables as usual. (It is not too
hard to prove that the typing rules of SOL guarantee that [M/x][τ/t]N is well-
typed.) Since abstype binds variables, we also have the renaming equivalence
It is interesting to compare abstype with case since ∨-types with inleft, inright,
and case correspond to finite categorical sums. Essentially, abstype is an
infinitary version of case.
As an aside, we note that the binding construct abstype may be replaced by a
constant sum. This treatment of abstype points out that the binding aspects of
abstype are essentially λ binding. If N is a term with type σ → ρ, and t is not
free in ρ, then both λt.N and Σt.N are well typed. Therefore, it suffices to have
a function sum ∃t.σ ρ that maps λt.N: ∀t.[σ → ρ] to Σt.N: (∃t.σ) → ρ.
Essentially, this means sum ∃t.σ ρ must satisfy the equation

    (sum ∃t.σ ρ x)(pack τ y to ∃t.σ) = x{τ}y

for any x, y of the appropriate types. In the version of SOL with sum as basic,
we use this equation, read from left to right, as the defining reduction rule for
sum. Given sum, both Σ and abstype may be defined by

    Σt.M ::= sum ∃t.σ ρ λt.M,
    abstype t with x: σ is N in M ::= (Σt.λx:σ.M)N.
The reduction rules for Σ and abstype follow the reduction rules for sum. From
a theoretical point of view, it would probably be simpler to define SOL using
sum instead of Σ or abstype, since this reduces the number of binding operators
in the language. However, for expository purposes, it makes sense to take abstype
as primitive, since this makes the connection with data abstraction more readily
apparent. The difference is really inessential since any one of Σ, abstype, and
sum may be used to define the other two (using other constructs of the language).
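The equation for sum also suggests how existentials can be programmed in a language that has only universal quantification. As a hedged illustration (our encoding, with invented names), OCaml's polymorphic record fields supply the rank-2 type ∀ρ.(∀t.σ → ρ) → ρ, and a pack becomes a function that feeds its hidden components to any sufficiently polymorphic client:

    (* a client polymorphic in the hidden type t, returning some 'r *)
    type 'r counter_client =
      { run : 't. 't -> ('t -> 't) -> ('t -> int) -> 'r }

    (* a "pack" for the signature ∃t. t ∧ (t → t) ∧ (t → int) *)
    type counter_pkg = { open_with : 'r. 'r counter_client -> 'r }

    let int_counter : counter_pkg =
      { open_with = fun client -> client.run 0 succ (fun n -> n) }

    (* "abstype t with start, next, value is int_counter in ..." *)
    let two =
      int_counter.open_with
        { run = fun start next value -> value (next (next start)) }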
Let ⇒E* be the congruent and transitive closure of ⇒E. Then we have the
following theorem:

STATIC TYPING THEOREM. Let M, N be two terms of SOL with TypeA(M) =
TypeA(N). Then M ⇒* N iff Erase(M) ⇒E* Erase(N).
Since the theorem shows that two sets of reduction rules have essentially
equivalent results, it follows that programs may be executed using any interpreter
or compiler on the basis of untyped reduction rules. Like the Type Preservation
Theorem, the proof uses induction on the length of reduction paths and is
essentially straightforward. Although easily proved, these theorems are important
since they confirm our expectations about the relationship between typing and
program execution.
It is worth mentioning the relationship between the Static Typing Theorem
and the seemingly contradictory "folk theorem" that tagged sums (in SOL
notation, σ ∨ τ types) require run-time type information. Both are correct but
based on different notions of “untyped” evaluation. The Static Typing Theorem
says that if a term M is well typed, then M can be evaluated using untyped
reduction ⇒E. However, notice that Erase does not remove inleft and inright,
only the type designations on these constructs. Therefore, in evaluating a case
statement
    case M left . . . right . . . end

the untyped evaluation rules can depend on whether M is of the form inleft M1
or inright M1. In the "folk theorem," this is considered type information, hence
the apparent contradiction.
The SOL reduction rules have several other significant properties. For example,
the reduction rules have the Church-Rosser property [22, 61].

CHURCH-ROSSER THEOREM. Suppose M is a term of SOL which reduces to
M1 and M2. Then there is a term N such that both M1 and M2 reduce to N.
In contrast to the untyped lambda calculus, no term of SOL can be reduced
infinitely many times.
STRONG NORMALIZATION THEOREM. There are no infinite reduction se-
quences.
The strong normalization theorem was first proved by Girard [22]. In light of
the strong normalization theorem, the Church-Rosser theorem follows from a
simple check of the weak Church-Rosser property (see Proposition 3.1.25 of [2]).
A consequence of Church-Rosser and Strong Normalization is that all maximal
reduction sequences (from a given term) end in the same normal form.3 As
proved in Girard’s thesis [22] and discussed in [20] and [59], the proof of the
strong normalization theorem cannot be carried out formally in either Peano
arithmetic or second-order Peano arithmetic (second-order Peano is also called
“analysis”). Furthermore, the class of number-theoretic functions that are
³ A normal form M is a term that cannot be reduced. Our use of the phrase strong normalization
follows [2]. Some authors use strong normalization for the property that all maximal reduction
sequences from a given term end in the same normal form.
representable in pure SOL without base types are precisely the recursive functions
that may be proved total in second-order Peano arithmetic [22, 68]. These and
related results are discussed in [20] at greater length.
Since Pebble does not supply projection functions for dependent products, the
dependent product of Pebble actually seems to be a sum (in the sense of category
theory), like SOL ∃-types. KR dependent products do have something that looks
like a projection function: If A is a data algebra, then Carrier(A) is a type
expression of KR. However, since Carrier(pack τ M to ∃t.σ) is not considered
equal to τ, it seems that KR dependent products are not truly products. Perhaps
further analysis will show that KR dependent products are also sums and closer
to SOL existential types than might appear at first glance.
As pointed out in [30], there are actually two reasonable notions of sum type,
“weak” and “strong” sums. The SOL existential type is a typical example of weak
sums, whereas strong sums appear as the C-types of Martin Lof’s type theory
[46]. The main difference lies in rule (AB.3), which holds for weak sums, but not
for strong. Thus, while Martin-Lof’s product types over universes give a form of
polymorphism that is similar to SOL polymorphism, Martin-Lof’s sum types
differ from our existential types. For this reason, the languages are actually quite
different. In addition, the restrictions imposed by universes simplify the seman-
tics of Martin-Lof’s language, at the cost of a slightly more complicated
syntax. (Some relatively natural programming examples, such as the Sieve of
Eratosthenes program given in Section 5.2 of this paper, are prohibited by
the universe restrictions of Martin-Lof type theory.) For further discussion of
sum and product types over universes, the reader is referred to [9], [lo], [31],
[451, [461, [491, and [541.
4. FORMULAS AS TYPES
4.1 Introduction
The language SOL exhibits an analogy between logical formulas and types that
has been used extensively in proof theory [12, 13, 22, 30, 35, 38, 39, 46, 67, 69].
The programming significance of the analogy has been stressed by Martin-Löf
[46]. We review the basic idea using propositional logic and then discuss quan-
tification briefly. In addition to giving some intuition into the connection between
computer science and constructive logic, the formulas-as-types analogy also
suggests other languages with existential types. One such language, involving
specifications as types, is discussed briefly at the end of this section. In general,
our analysis of abstype suggests that any constructive proof rules for existential
formulas provide data type declarations. For this reason, the formulas-as-types
languages provide a general framework for studying many aspects of data
abstraction.
as simply being true or false whenever we assign truth values to each variable.
While various forms of intuitionistic semantics have been developed [10, 33, 34,
70], we will not go into this topic. Instead, we will characterize intuitionistic
validity by means of a proof system.
Natural deduction is a style of proof system that is intended to mimic the
common blackboard-style argument
    Assume σ.
    By . . . we conclude τ.
    Therefore σ → τ.

We make an assumption in the first line of this argument. In the second line,
this assumption is combined with other reasoning to derive τ. At this point, we
have proved τ, but the proof depends on the assumption of σ. In the third step,
we observe that since σ leads to a proof of τ, the implication σ → τ follows. Since
the proof of σ → τ is sound without proviso, we have "discharged" the assumption
of σ in proceeding from τ to σ → τ. In a natural deduction proof, each proposition
may depend on one or more assumptions. A proposition is considered proved
only when all assumptions have been discharged.
The natural deduction proof system for implicational propositional logic
consists of three rules, given below. For technical reasons, we use labeled
assumptions. (This is useful from a proof-theoretic point of view as a means of
distinguishing between different assumptions of the same formula.) Let V be a
set, intended to be the set of labels, and let A be a mapping from labels to
formulas. We will use the notation ConseqA(M) = σ to mean that M is a proof
with consequence σ, given the association A of labels to assumptions. Proofs and
their consequences are defined as follows:
ConseqA(M) does not depend on A.) Even when → is the only propositional
connective, there are classical tautologies that are not intuitionistically provable.
For example, it is easy to check that the formula ((s → t) → s) → s is a classical
tautology just by trying all possible assignments of true and false to s and t.
However, this formula is not intuitionistically provable.
Of course, we have just defined the typed lambda calculus: The terms of typed
lambda calculus are precisely the proofs defined above and their types are the
formulas given. In fact, ConseqA and TypeA are precisely the same function, and
Assume(M) is precisely the set of free variables of M. The similarity between
natural deduction proofs and terms extends to the other connectives and quan-
tifiers. The proof rules for ∧, ∨, ∀, and ∃ are precisely the formation rules given
earlier for terms of these types.
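A concrete way to see the correspondence is to read ordinary typed functional programs as proofs. In the OCaml sketch below (ours; the disj type stands in for ∨), each term is a natural deduction proof of the propositional formula appearing as its type:

    type ('a, 'b) disj = Inleft of 'a | Inright of 'b

    (* proof of (a → b) → (b → c) → (a → c): hypothetical syllogism *)
    let compose : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c =
      fun f g x -> g (f x)

    (* proof of a ∧ b → b ∧ a: commutativity of conjunction *)
    let swap : 'a * 'b -> 'b * 'a = fun (x, y) -> (y, x)

    (* proof of a ∨ b → (a → c) → (b → c) → c: the case rule *)
    let case_ : ('a, 'b) disj -> ('a -> 'c) -> ('b -> 'c) -> 'c =
      fun d f g -> match d with Inleft x -> f x | Inright y -> g y

    (* By contrast, Peirce's law ((a → b) → a) → a has no proof term:
       no effect-free OCaml function has this type. *)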
One interesting feature of the proof rule for ∨ of [60] is that it is the
discriminating case statement of CLU [42], rather than the problematic outleft
and outright functions of ML [23]. The "out" functions of ML are undesirable
since they rely on run-time exceptions (cf. [41], p. 569). Specifically, if x: τ
in ML, then (inright x): σ ∨ τ and outleft(inright x): σ. However, we cannot
actually compute a value of type σ from x: τ, so this is not semantically sensible.
The ML solution to this problem is to raise a run-time exception when
outleft(inright x) is evaluated, which introduces a form of run-time type
checking. Since the ∨ rule leads us directly to a case statement that requires no
run-time type checking, it seems that the formulas-as-types analogy may be a
useful guide in designing programming languages.
and

    M2: (∃t.σ) → ρ.

We will say that M1 is universally parameterized and M2 is existentially
parameterized.
Generic packages are universally parameterized data algebras. For example,
given any type t with operations

    plus: t ∧ t → t
    times: t ∧ t → t,

we can write a data algebra t-matrix implementing matrix operations over t. Four
operations we might choose to include are

    create: t ∧ . . . ∧ t → mat,
    mplus: mat ∧ mat → mat,
    mtimes: mat ∧ mat → mat,
    det: mat → t.

If mbody is an expression of the form

    mbody ::= pack τ M1 . . . Mn to ∃s.[(t ∧ . . . ∧ t → s)
              ∧ (s ∧ s → s) ∧ (s ∧ s → s) ∧ (s → t)]

implementing create, mplus, mtimes, and det using plus and times, then

    matrix ::= λt. λplus: t ∧ t → t. λtimes: t ∧ t → t. mbody

is a universally parameterized data algebra. The type of matrix is

    ∀t.(t ∧ t → t) → (t ∧ t → t) → ∃s.[(t ∧ . . . ∧ t → s)
    ∧ (s ∧ s → s) ∧ (s ∧ s → s) ∧ (s → t)].

Note that mbody could not be existentially parameterized by t since t appears
free in the type of mbody.
Functions from data algebras to data algebras are existentially parameterized.
One simple manipulation of data algebras is to remove operations from the
signature. For example, a doubly ended queue, or dequeue, has two insert and
two remove operations. The type of an implementation dq of dequeues with
empty, insert1, insert2, remove1, and remove2, is

    dq-type ::= ∀t.∃d.[d ∧ (t ∧ d → d) ∧
                (t ∧ d → d) ∧ (d → t ∧ d) ∧ (d → t ∧ d)].

A function that converts dequeue implementations to queue implementations
is a simple example of an existentially parameterized structure. Given dq, we can
implement queues using the form

    Q(x, t) ::= abstype d with empty: . . . , insert1: . . . , insert2: . . . ,
                remove1: . . . , remove2: . . .
                is x{t}
                in pack d empty insert1 remove2 to ∃t.σ

so that the function dq-to-q ::= λx: dq-type. λt.Q(x, t), with type

    dq-type → ∀t.∃s.[s ∧ (t ∧ s → s) ∧ (s → t ∧ s)],

is a function from data algebras to data algebras. Suppose that queue is the data
algebra produced by applying dq-to-q to dq. Since the type of queue is a closed
type expression, the fact that queue uses the same representation type as dq
seems effectively hidden. Generally, universal parameterization may be used to
effect some kind of sharing of types, whereas existential parameterization ob-
scures the identity of representations. (See [45], which was written later, for
related discussion.)
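The dequeue-to-queue conversion has a natural reading in OCaml's module system as well; the functor below is our illustrative sketch (signature and operation names invented), hiding the shared carrier exactly as the closed type dq-type → ∀t.∃s.[. . .] suggests:

    module type DEQUEUE = sig
      type elt
      type d
      val empty   : d
      val insert1 : elt * d -> d
      val insert2 : elt * d -> d
      val remove1 : d -> elt * d
      val remove2 : d -> elt * d
    end

    module type QUEUE = sig
      type elt
      type q
      val empty  : q
      val insert : elt * q -> q
      val remove : q -> elt * q
    end

    module Dq_to_q (D : DEQUEUE) : QUEUE with type elt = D.elt = struct
      type elt = D.elt
      type q = D.d                (* same carrier, identity hidden *)
      let empty = D.empty
      let insert = D.insert1      (* insert at one end ... *)
      let remove = D.remove2      (* ... remove at the other *)
    end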
Some other useful transformations on data algebras are the analogs of the
theory building operations combine, enrich, and derive of CLEAR [5,6]. Although
a general combine operation as in CLEAR, for example, cannot be written in
SOL because of type constraints, we can write a combine operation for any pair
of existential types. For example, we can write a procedure to combine data
algebras of types ∃s.σ and ∃t.ρ into a single data algebra. The type of this
function

    Combine1 = λx: ∃s.σ. λy: ∃t.ρ.
        abstype s with z: σ is x in
        abstype t with w: ρ is y in
        pack s [pack t (z, w) to ∃t.(σ ∧ ρ)] to ∃s.∃t.(σ ∧ ρ)

is

    Combine1: ∃s.σ → ∃t.ρ → ∃s.∃t.(σ ∧ ρ).

For universally parameterized data algebras of types ∀r.∃s.σ and ∀r.∃t.ρ, we can
write combine so that in the combined data algebra, the type parameter will be
shared. The combine function with sharing

    Combine2 = λx: ∀r.∃s.σ. λy: ∀r.∃t.ρ.
        λr. abstype s with z: σ is x{r} in
            abstype t with w: ρ is y{r} in
            pack s [pack t (z, w) to ∃t.(σ ∧ ρ)] to ∃s.∃t.(σ ∧ ρ)

has type

    Combine2: ∀r.∃s.σ → ∀r.∃t.ρ → ∀r.∃s.∃t.(σ ∧ ρ).
A similar, but slightly more complicated, combine function can be written for
the case in which the two parameters are both universally parameterized by a
type and several operations on the type. For example, a polymorphic matrix
package could be combined with a polymorphic polynomial package to give a
combined package parameterized by a type t and two binary operations plus and
times providing both matrices and polynomials over t. Furthermore, the combine
function could be written to enrich the combined package by adding a function
that finds the characteristic polynomial of a matrix.
5.2 Data Structures Using Existential Types
Throughout this paper, we have viewed data algebras as implementations of
abstract data types. An alternative view is that data algebras are simply records
tagged with types. This view leads us to consider using data algebras as parts of
data structures. In many cases, these data structures do not seem directly related
to any kind of abstract data type. The following example uses existentially typed
data structures to represent streams.
Intuitively, streams are infinite lists. In an applicative language, it is convenient
to think of a stream as a kind of “process” that has a set of possible internal
states and a specific value associated with each state. Since the process imple-
ments a list, there is a designated initial state and a deterministic state transition
function. Therefore, a stream consists of a type s (of states) with a designated
individual (start state) of type s, a next-state function of type s → s, and a value
function of type s → t, for some t. An integer stream, for example, will
have a value function of type s → int, and so the type of integer streams will be

    ∃s.[s ∧ (s → s) ∧ (s → int)].
The Sieve of Eratosthenes can be used to produce an integer stream enumer-
ating all prime numbers. This stream is constructed using a sift operation on
streams. Given an integer stream s1, Sift(s1) is a stream of integers that are not
divisible by the first value of s1. If Num is the stream 2, 3, . . . , then the sequence
formed by taking the first value of each stream
Num, Sift(Num), Sift(Sift(Num)), ...
will be the sequence of all primes.
With streams represented using existential types, Sift may be written as the
following function over existential types.
Sift =
    λstream: ∃s.[s ∧ (s → s) ∧ (s → int)].
    abstype s with start: s, next: s → s, value: s → int is stream
    in let n = value(start)
       in letrec f = λstate: s.
                       if n divides value(state) then f(next(state))
                       else state
          in
            pack s f(start) λx:s.f(next(x)) value to ∃s.[s ∧ (s → s) ∧ (s → int)]
          end
       end
    end
Sieve will be the stream with states represented by integer streams, start state
the stream of all integers greater than 1, and Sift the successor function on states.
The value associated with each Sieve state is the first value of the integer stream,
so that the values of Sieve enumerate all primes.
Sieve =
    pack ∃t.[t ∧ (t → t) ∧ (t → int)]
      (pack int 2 Successor λx:int.x to ∃t.[t ∧ (t → t) ∧ (t → int)])
      Sift
      (λstate: ∃t.[t ∧ (t → t) ∧ (t → int)].
         abstype r with r-start: r, r-next: r → r, r-val: r → int is state
         in r-val(r-start))
    to ∃t.[t ∧ (t → t) ∧ (t → int)]
Expressed in terms of Sieve, the ith prime number is
    abstype s with start: s, next: s → s, value: s → int
    is Sieve
    in value(next^i start),
where "next^i start" is the expression next(next(. . . (next start) . . .)) with i occur-
rences of next.
It is worth noticing that Sieve is "circular" in the sense that the representation
type ∃t.[t ∧ (t → t) ∧ (t → int)] used to define Sieve is also the type of Sieve
itself. For this reason, this example could not have been written in a predicative
system like Martin-Löf's intuitionistic type theory [9, 46]. The typing rules of
that theory require that elements of one type be composed only of elements of
simpler types.
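To make the construction concrete, here is a hedged OCaml rendering of streams-as-data-algebras (our sketch, using first-class modules for the existential; names are invented). OCaml tolerates the circularity that predicative theories reject, since the package type (module INT_STREAM) can itself serve as a state type:

    module type INT_STREAM = sig
      type s
      val start : s
      val next  : s -> s
      val value : s -> int
    end
    type stream = (module INT_STREAM)

    (* the stream k, k+1, k+2, ... with states represented by ints *)
    let num_from k : stream =
      (module struct
        type s = int
        let start = k
        let next n = n + 1
        let value n = n
      end : INT_STREAM)

    (* Sift skips states whose value is divisible by the first value *)
    let sift (module M : INT_STREAM) : stream =
      let n = M.value M.start in
      let rec f st = if M.value st mod n = 0 then f (M.next st) else st in
      (module struct
        type s = M.s
        let start = f M.start
        let next st = f (M.next st)
        let value = M.value
      end : INT_STREAM)

    (* Sieve: states are integer streams; next is Sift; value reads the head *)
    let sieve : stream =
      (module struct
        type s = stream
        let start = num_from 2
        let next = sift
        let value (module M : INT_STREAM) = M.value M.start
      end : INT_STREAM)

    (* the ith prime, counting from i = 0 *)
    let nth_prime i =
      let module S = (val sieve : INT_STREAM) in
      let rec iter n st = if n = 0 then st else iter (n - 1) (S.next st) in
      S.value (iter i S.start)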
and type binding can be added to M to produce a well-typed term of SOL. Some
questions of this nature are discussed in [40], [48], and [53].
A general problem in the study of types is a formal characterization of type
security. We have given two theorems about typing in SOL: Expressions may be
evaluated without considering type information, and the syntactic type of an
expression is not affected by reducing the expression to simpler forms. These
theorems imply that types may be ignored when evaluating SOL expressions and
that SOL type checking is sufficient to prevent run-time type errors. The study
of representation independence (mentioned above) leads to another notion of
type security, but further research seems necessary to show that SOL programs
are “type-safe” in other ways.
One interesting aspect of SOL is that it may be derived from quantified
propositional (second-order) logic using the formulas-as-types analogy discussed
in Section 4. Our analysis of abstype demonstrates that the proof rules for
existential formulas in a variety of logical systems all correspond to declaring
and using abstract data types. Thus, the formulas-as-types languages provide a
general framework for studying abstract data types. In particular, the language
derived from first- and second-order logic seems to incorporate specifications
into programs in a very natural way. The semantics and programming properties
of this language seem worth investigating and relating to other studies of data
abstraction based on specification.
where t is any type variable and c is any type constant. (We use two sorts of
variables, type variables r, s, t, . . . and ordinary variables x, y, z, . . . .)
A type assignment A is a function from ordinary variables to type expressions.
We use A[x:σ] to denote the type assignment A1 with A1(y) = A(y) for y different
from x, and A1(x) = σ. The partial functions TypeA, for all type assignments A,
and the operational semantics of SOL are defined as follows:
    TypeA(M) = σ → τ, TypeA(N) = σ
    ------------------------------
    TypeA(M N) = τ

    λx:σ.M = λy:σ.[y/x]M,   y not free in M

    (λx:σ.M)N ⇒ [N/x]M
Products

    TypeA(M) = σ, TypeA(N) = τ
    --------------------------
    TypeA((M, N)) = σ ∧ τ

    TypeA(M) = σ ∧ τ
    --------------------------------------
    TypeA(fst M) = σ,   TypeA(snd M) = τ

    fst(M, N) ⇒ M,   snd(M, N) ⇒ N

Sums

    TypeA(M) = σ
    ------------------------------------------------------------------
    TypeA(inleft M to σ ∨ τ) = σ ∨ τ,   TypeA(inright M to τ ∨ σ) = τ ∨ σ

    TypeA(M) = σ ∨ τ, TypeA[x:σ](N) = ρ, TypeA[y:τ](P) = ρ
    ------------------------------------------------------
    TypeA(case M left x:σ.N right y:τ.P end) = ρ

    case M left x:σ.N right y:τ.P end
      = case M left u:σ.[u/x]N right v:τ.[v/y]P end,
ACKNOWLEDGMENTS
Thanks to John Guttag and Albert Meyer for helpful discussions. Mitchell thanks
IBM for a graduate fellowship while at MIT, and Plotkin acknowledges the
support of the BP Venture Research Unit.
ACM Transactions on Programming Languages and Systems, Vol. 10, No. 3, July 1988.
500 l J. C. Mitchell and G. D. Plotkin
REFERENCES
1. ARBIB, M. A., AND MANES, E. G. Arrows, Structures, and Functors: The Categorical Imperative.
Academic Press, Orlando, Fla., 1975.
2. BARENDREGT, H. P. The Lambda Calculus: Its Syntax and Semantics. North-Holland, Amster-
dam, The Netherlands, 1984 (revised edition).
3. BRUCE, K. B., AND MEYER, A. A completeness theorem for second-order polymorphic lambda
calculus. In Proceedings of the International Symposium on Semantics of Data Types. Lecture
Notes in Computer Science 173, Springer-Verlag, New York, 1984, pp. 131-144.
4. BRUCE, K. B., MEYER, A. R., AND MITCHELL, J. C. The semantics of second-order lambda
calculus. In Information and Computation (to be published).
5. BURSTALL, R. M., AND GOGUEN, J. Putting theories together to make specifications. In Fifth
International Joint Conference on Artificial Intelligence, 1977, pp. 1045-1058.
6. BURSTALL, R. M., AND GOGUEN, J. An informal introduction to specification using CLEAR.
In The Correctness Problem in Computer Science, Boyer and Moore, Eds. Academic Press,
Orlando, Fla., 1981, pp. 185-213.
7. BURSTALL, R., AND LAMPSON, B. A kernel language for abstract data types and modules. In
Proceedings of International Symposium on Semantics of Data Types. Lecture Notes in Computer
Science 173, Springer-Verlag, New York, 1984, pp. 1-50.
8. CONSTABLE, R. L. Programs and types. In 21st IEEE Symposium on Foundations of Computer
Science (Syracuse, N.Y., Oct. 1980). IEEE, New York, 1980, pp. 118-128.
9. CONSTABLE, R. L., ET AL. Implementing Mathematics With The Nuprl Proof Deuelop-
ment System. Graduate Texts in Mathematics, vol. 37, Prentice-Hall, Englewood Cliffs, N.J.,
1986.
10. COQUAND, T. An analysis of Girard’s paradox. In Proceedings of the IEEE Symposium on Logic
in Computer Science (June 1986). IEEE, New York, 1986, pp. 227-236.
11. COQUAND, T., AND HUET, G. The calculus of constructions. Znf. Comput. 76, 2/3 (Feb./Mar.
1988), 95-120.
12. CURRY, H. B., AND FEYS, R. Combinatoty Logic I. North-Holland, Amsterdam, 1958.
13. DEBRUIJN, N. G. A survey of the project Automath. In To H. Z3. Curry: Essays on Com-
binatory Logic, Lambda Calculus and Formalism. Academic Press, Orlando, Fla., 1980, pp.
579-607.
14. DEMERS, A. J., AND DONAHUE, J. E. Data types, parameters and type checking. In 7th ACM
Symposium on Principles of Programming Languages (Las Vegas, Nev., Jan. 28-30, 1980). ACM,
New York, 1980, pp. 12-23.
15. DEMERS, A. J., AND DONAHUE, J. E. ‘Type-completeness’ as a language principle. In 7th ACM
Symposium on Principles of Programming Languages (Las Vegas, Nev., Jan. 28-30, 1980). ACM,
New York, 1980, pp. 234-244.
16. DEMERS, A. J., DONAHUE, J. E., AND SKINNER, G. Data types as values: polymorphism, type-
checking, encapsulation. In 5th ACM Symposium on Principles of Programming Languages
(Tucson, Ariz., Jan. 23-25,1978). ACM, New York, 1978, pp. 23-30.
17. U.S. DEPARTMENT OF DEFENSE Reference Manual for the Ada Programming Language. GPO
008.ooo-00354-8,198O.
18. DONAHUE, J. On the semantics of data type. SIAM J. Comput. 8 (1979), 546-560.
19. FITTING, M. C. Zntuitionistic Logic, Model Theory and Forcing. North-Holland, Amsterdam,
1969.
20. FORTUNE, S., LEIVANT, D., AND O’DONNELL, M. The expressiveness of simple and second
order type structures. J. ACM 30,l (1983), 151-185.
21. GIRARD, J.-Y. Une extension de l’interpretation de Godel i l’analyse, et son application i
l’elimination des coupures dans l’analyse et la theorie des types. In 2nd Scandinavian Logic
Symposium, J. E. Fenstad, Ed. North-Holland, Amsterdam, 1971, pp. 63-92.
22. GIFWRD, J.-Y. Interpretation fonctionelle et elimination des coupures de l’arithmetique d’ordre
superieur. These D’Etat, Univ. Paris VII, Paris, 1972.
23. GORDON, M. J., MILNER, R., AND WADSWORTH, C. P. Edinburgh Lecture Notes in Computer
Science 78, Springer-Verlag, New York, 1979.
24. GRATZER G. Universal Algebra. Van Nostrand, New York, 1968.
25. GUT-TAG, J. V., HOROWITZ, E., AND MUSSER, D. R. Abstract data types and software validation.
Commun. ACM 21,12 (Dec. 1978). 10481064.
ACM Transactions on Programming Languages and Systems, Vol. lo, No. 3, July 1988.
Abstract Types Have Existential Type 501
50. MILNER, R. The standard ML core language. Polymorphism 2, 2 (1985), 28 pages. An earlier
version appeared in Proceedings of 1984 ACM Symposium on Lisp and Functional Programming.
51. MITCHELL, J. C. Semantic models for second-order Lambda calculus. In Proceedings of the
25th IEEE Symposium on Foundations of Computer Science (1984). IEEE, New York, 1984,
pp. 289-299.
52. MITCHELL, J. C. Representation independence and data abstraction. In Proceedings of the
13th ACM Symposium on Principles of Programming Languages (St. Petersburg Beach, Fla.,
Jan. 13-15, 1986). ACM, New York, 1986, pp. 263-276.
53. MITCHELL, J. C. Polymo?phic type inference and containment. Inf. Comput. 76,2/3 (Feb./Mar.
1988), 211-249.
54. MITCHELL, J. C., AND HARPER, R. The essence of ML. In Proceedings of the 15th ACM
Symposium on Principles of Programming Languages (San Diego, Calif., Jan. 13-15,1988). ACM,
New York, 1988, pp. 28-46.
55. MITCHELL, J. C., AND MEYER, A. R. Second-order logical relations. In Log& of Programs.
Lecture Notes in Computer Science 193, Springer-Verlag, New York, 1985, pp. 225-236.
56. MITCHELL, J. C., AND PLOTKIN, G. D. Abstract types have existential types. In Proceedings
of the 12th ACM Symposium on Principles of Programming Languages (New Orleans, La.,
Jan. 14-16, 1985). ACM, New York, 1985, pp. 37-51.
57. MITCHELL, J. G., MAYBERRY, W., AND SWEET, R. Mesa language manual. Tech. Rep. CSL-
79-3, Xerox PARC, Palo Alto, Calif., 1979.
58. MORRIS, J. H. Types are not sets. In 1st ACM Symposium on Principles of Programming
Languuges (Boston, Mass., Oct. l-3, 1973). ACM, New York, 1973, pp. 120-124.
59. O’DONNELL, M. A practical programming theorem which is independent of Peano arithmetic.
In 11th ACM Symposium on the Theory of Computation (Atlanta, Ga., Apr. 30-May 2, 1979).
ACM, New York, 1979, pp. 176-188.
60. PRAWITZ, D. Natural Deduction. Almquist and Wiksell, Stockholm, 1965.
61. PRAWITZ, D. Ideas and results in proof theory. In 2nd Scandinavian Logic Symposium. North-
Holland, Amsterdam, 1971, pp. 235-308.
62. REYNOLDS, J. C. Towards a theory of type structure. In Paris Colloquium on Programming.
Lecture Notes in Computer Science 19, Springer-Verlag, New York, 1974, pp. 408-425.
63. REYNOLDS, J. C. The essence of Algol. In Algorithmic Languages, J. W. de Bakker and J. C.
van Vliet, Eds. IFIP, North-Holland, Amsterdam, 1981, pp. 345-372.
64. REYNOLDS, J. C. Types, abstraction, and parametric polymorphism. In IFIP Congress (Paris,
Sept. 1983).
65. REYNOLDS, J. C. Polymorphism is not set-theoretic. In Proceedings of International Symposium
on Semantics of Data Types. Lecture Notes in Computer Science 173, Springer-Verlag, New York,
1984, pp. 145-156.
66. SHAW, M. (Ed.) ALPHARD: Form and Content. Springer-Verlag, New York, 1981.
67. STATMAN, R. Intuitionistic propositional logic is polynomial-space complete. Theor. Comput.
Sci. 9 (1979), 67-72.
68. STATMAN, R. Number theoretic functions computable by polymorphic programs. In 22nd IEEE
Symposium on Foundations of Computer Science. IEEE, New York, 1981, pp. 279-282.
69. STENLUND, S. Combinators, X-terms and Proof Theory. Reidel, Dordrecht, Holland, 1972.
70. TROELSTRA, A. S. Mathematical Investigation of Zntuitionistic Arithmetic and Analysis. Lecture
Notes in Mathematics 344, Springer-Verlag, New York, 1973.
71. WULF, W. W., LONDON, R., AND SHAW, M. An introduction to the construction and verification
of Alphard programs. IEEE Trans. Softw. Eng. SE-2 (1976), 253-264.
ACM Transactions on Programming Languages and Systems, Vol. 10, No. 3, July 1988.
Using Dependent Types to Express Modular Structure
David MacQueen
Introduction
Writing any large program poses difficult problems of organization. In many modern programming
languages these problems are addressed by special linguistic constructs, variously known as modules, packages,
or clusters, which provide for partitioning programs into manageable components and for securely combining
these components to form complete programs. Some general purpose components are able to take on a life of
their own, being separately compiled and stored in libraries of generic, reusable program units. Usually
modularity constructs also support some form of information hiding, such as "abstract data types." "Programming
in the large" is concerned with using such constructs to impose structure on large programs, in contrast
to "programming in the small," which deals with the detailed implementation of algorithms in terms of
data structures and control constructs. Our goal here is to examine some of the proposed linguistic notions
with respect to how they meet the pragmatic requirements of programming in the large.
Originally, linguistic constructs supporting modularity were introduced as a matter of pragmatic
language engineering, in response to a widely perceived need. More recently, the underlying notions have
been analyzed in terms of type systems incorporating second-order concepts. Here I use the term "second-order"
in the sense of "second-order" logic, which admits quantification over predicate variables [Pra65].
Similarly, the type systems in question introduce variables ranging over types and allow various forms of
abstraction or "quantification" over them.
Historically, these type systems are based on fundamental insights in proof theory, particularly the "formulas
as types" notion that evolved through the work of Curry and Feys [CF58], Howard [How80], de Bruijn
[deB80], and Scott [Sco70]. This notion provided the basis for Martin-Löf's formalizations of constructive logic
as Intuitionistic Type Theory (ITT) [M-L71, M-L74, M-L82], and was utilized by Girard [Gir71], who introduced
a form of second-order typed lambda calculus as a tool in his proof-theoretic work. The "formulas as
types" notion, as developed in de Bruijn's AUTOMATH system and Martin-Löf's ITT, is also central to the
"programming logics" PL/CV3 and nu-PRL developed by Constable and his coworkers [CZ84, BC85].
In the programming language area, Reynolds [Rey74] independently invented a language similar to that
used by Girard, and his version has come to be called the second-order lambda calculus. An extended form of
this language, called SOL, was used by Mitchell and Plotkin [MP85] to give an explanation of abstract data
types. The programming languages ML [GMW78, Mil78] and Russell [BDD80, Hoo84, DD85] represent two
distinctly different ways of realizing "polymorphism" by abstraction with respect to types. ML is basically a
restricted form of second-order lambda calculus, while Russell employs the more general notion of "dependent
types" (Martin-Löf's general product and sum, defined in §2). The Pebble language of Burstall and Lampson
[BL84, Bur84] also provides dependent types, but in a somewhat purer form. Finally, Huet and Coquand's
Calculus of Constructions [CH85] is another variant of typed lambda calculus using the general product dependent
type. It also provides a form of metatype (or type of types), called a "context," that characterizes the structure
of second-order types, thus making it possible to abstract not only with respect to types, but also with
respect to families of types and type constructors. The Calculus of Constructions is an explicit attempt to combine
a logic and a programming language in one system.
Among these languages, Russell and Pebble are distinguished by having "reflexive" type systems, meaning
that there is a "type of all types" that is a member of itself (Type:Type). Martin-Löf's initial version of
ITT [M-L71] was also reflexive in this sense, but he abandoned this version in favor of a "ramified"¹ system
with a hierarchy of type universes when Girard's paradox [Gir71] showed that the reflexive system was inconsistent
as a constructive logic. In terms of programming languages, the paradox implies at least the existence
of divergent expressions, but it is not yet clear whether more serious pathologies might follow from it (see
Meyer and Reinhold's paper, this proceedings [MR86]). Since types are simply values belonging to the type
Type, reflexive type systems tend to obscure the distinction between types and the values they are meant to
describe, and this in turn tends to complicate the task of type checking. It is, on the other hand, possible to
construct reasonable semantic models for reflexive type systems [McC79, Car85].
The remaining nonreflexive languages distinguish, at least implicitly, between individual types and the
universe of types to which they belong and over which type variables range. However, the second-order
lambda calculus, SOL, and the Calculus of Constructions (despite its "contexts") are "impredicative,"² meaning
that there is only one type universe and it is closed under type constructions like ∀t.σ(t) and ∃t.σ(t) that
involve quantifiers ranging over itself. The reflexive type systems of Russell and Pebble are also impredicative,
in perhaps an even stronger sense, since type variables can actually take on Type, the universe of types, as
a value. In contrast, the later versions of ITT and Constable's logics are ramified systems in which quantification
or abstraction over a type universe at one level produces an element of the next higher level, and they are
therefore predicative.
Our purpose here is not to set out the mathematical nuances of these various languages, but to look at
some of the pragmatic issues that arise when we actually attempt to use such languages as vehicles for programming
in the large. We will begin by discussing some of the consequences of the SOL type system for
modular programming. Then in §2 we briefly sketch a ramified (i.e. stratified) system of dependent types
from which we derive a small language called DL, which is a generalized and "desugared" version of the
extended ML language presented in [Mac85]. The final section uses DL to illustrate some of the stylistic
differences between ML and Pebble.
Following [MP85], abstract data types are modeled in SOL by existential types of the form

    ∃t.σ(t)

where t is a type variable and σ(t) is a type expression possibly containing free occurrences of t. Values of
such types are introduced by expressions of the form

    rep_{∃t.σ(t)} τ P

where P is an expression of type σ(τ). These values are intended to model abstract data types, and were
called data algebras in [MP85] and packages in [CW85]; we will use the term structure to agree with the terminology
of [Mac85] that we will be adopting in later sections. The type component τ will be called the witness
or representation type of the structure. Access to the components of a structure is provided by an expression
of the form

    abstype t with x is M in N : ρ

which is well-typed assuming M : ∃t.σ(t) and x:σ(t) ⇒ N : ρ, with the restriction that t does not appear free in
ρ nor in the type of any variable y appearing free in N.
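As a rough Standard ML analogue of this construct (our sketch; SML's abstype is a declaration rather than an expression, and the names are illustrative), the declaration below hides the representation of the type counter within its scope, much as "abstype t with x is M in N" makes the witness type abstract in N:

    abstype counter = C of int
    with
      val zero = C 0
      fun inc (C n) = C (n + 1)   (* operations see the representation *)
      fun value (C n) = n
    end

    val two = value (inc (inc zero))   (* clients see only the abstract type *)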
As mentioned in [MP85], and because of the impredicative nature of SOL, these existential types are
ordinary types just like int and bool, and the structures that are their values are just ordinary values. This
implies that all the standard value-manipulating constructs such as conditionals and functional abstraction
apply equally to structures. Thus a parametric module is just an ordinary function of type τ → ∃t.σ(t).
There is a tradeoff for this simplicity, however. Let us consider carefully the consequences of the restrictions
on the abstype expression. Once a structure has been constructed, say

    A = rep_{∃t.σ(t)} τ P
¹ Since Bertrand Russell introduced his "ramified type theory," the word "ramified" has been used in logic to
mean "stratified into a sequence of levels," normally an infinite ascending sequence of levels.
² Roughly speaking, a definition of a set is said to be impredicative if the set contains members defined with
reference to the entire set.
the type τ is essentially forgotten. Although we may locally "open" the structure, as in

    abstype t with x is A in N

there is absolutely no connection between the bound type variable t and the original representation type τ.
Moreover, we cannot even make a connection between the witness type names obtained from two different
openings of the same structure. For example, the types s and t will not agree within the body of

    abstype s with x is A in
    abstype t with y is A in ...
In effect, not only is the form and identity of the representation type hidden, but we are not even allowed to
assume that there is a unique witness type associated with the structure A. The witness type has been made
not only opaque, but hypothetical! This very strong restriction on our access to an abstraction goes beyond
common practice in language design, since we normally have some means of referring to an abstract type as a
definite though unrecognizable type within the scope of its definition. This indefiniteness seems to be the price
paid for being able to treat the abstract type structure as an ordinary value rather than as a type. (See
[CM85], where we use the terms "virtual witness," "abstract witness," and "transparent witness" to describe
three possible treatments of the witness type in an existential structure.)
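Standard ML's generative functors give a feel for such "hypothetical" witnesses; in the sketch below (ours, with illustrative names), each application of Fresh yields a new, distinct type, just as each opening of a SOL structure yields a fresh type variable:

    functor Fresh () = struct datatype t = Mk of int  fun mk n = Mk n end

    structure A1 = Fresh ()
    structure A2 = Fresh ()
    (* A1.t and A2.t are incompatible types, even though the two
       definitions are textually identical: A1.mk 3 is not of type A2.t. *)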
Hierarchies of structures. The consequences of SOL's treatment of the witness type become clearer
when we consider building abstractions in terms of other abstractions. Consider the following definition of a
structure representing a geometric point abstraction.

    Point = ∃p.PointWRT(p)
Now suppose that we want to define a rectangle abstraction that uses CartesianPoint. We must first open
CartesianPoint, define the rectangle structure, and then close the rectangle structure with respect to the point type.

    RectWRT(p) = ∃rect.{ point_interp : PointWRT(p),
                         mk_rect : p × p → rect,
                         topleft : rect → p,
                         botright : rect → p }

    CartesianRect = abstype point with P is CartesianPoint in
        rep_{∃p.RectWRT(p)} point
          rep_{RectWRT(point)} (point × point)
            ( point_interp = P,
              mk_rect = λ(tl : point, br : point).(tl, br),
              topleft = λr : point × point.(fst r),
              botright = λr : point × point.(snd r) )
If we (doubly) open CartesianRect we will get a new virtual point type unrelated to any existing type. We had
to incorporate an interpretation of this point type in the Rect structure as point_interp to provide the means to
create elements of that type, which in turn allows us to create rectangles.
Now suppose we also define a circle abstraction based on the same CartesianPoint structure, and we
want to allow interactions between the two abstractions, such as creating a unit circle centered at the top left-hand
corner of a given rectangle. This requires that the rectangle structure, the circle structure, and any
operations relating them all be defined within the scope of a single opening of the CartesianPoint structure. In
general, we must anticipate all abstractions that use the point structure and might possibly interact in terms of
points, and define them within a single abstype expression, as in the sketch below.
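A minimal Standard ML rendering of this workaround (ours; the definitions are illustrative) puts every abstraction that must interact through points inside one scope where the point representation is visible:

    local
      type point = real * real                 (* the single shared witness *)
      fun mk_point (x, y) : point = (x, y)
    in
      type rect = point * point
      type circle = point * real
      fun unit_circle_at_topleft ((tl, _) : rect) : circle = (tl, 1.0)
      fun origin_rect () : rect = (mk_point (0.0, 0.0), mk_point (1.0, 1.0))
    end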
It appears that when building a collection of interrelated abstractions, the lower the level of the abstraction
the wider the scope in which it must be opened. We thus have the traditional disadvantages of block-structured
languages, where low-level facilities must be given the widest visibility. (For further details, see the
examples in §6 of Cardelli and Wegner's tutorial [CW85].)
Interpreting known types. The notion of providing operations to interpret a type does not apply only to
"abstract" types. It is often useful to impose additional structure on a given type without hiding the identity
of that type. For instance, we might want to temporarily view int × bool as an ordered set with some special
ordering. To do this we might define a structure IntBoolOrd of type OrdSet = ∃t.{le : t × t → bool}, with
witness int × bool.
Under the SOL typing rules, there is no way to make use of IntBoolOrd because we could never create any
elements to which the ordering operation could be applied. In fact, no structure of type OrdSet can ever be
used, because of our inability to express values of type t. Of course, this also means that LexOrd, which maps
an ordered set to the induced ordering on lists over it, is useless.
However, if we had access to the witness types, then structures like IntBoolOrd and mappings like LexOrd
could be quite useful.
There are various ways of working around these problems within SOL. We can, for instance, delay or
avoid entirely the creation of closed structures and instead deal separately with types and their interpreting
operations. Thus, LexOrd could be rewritten to have the type ∀t. OrdSetWRT(t) → OrdSetWRT(list t), with
OrdSetWRT(t) = {le : t × t → bool}. However, our preferred solution is to abandon the restrictive SOL rule and
view structures as inherently "open" or "transparent." This is suggested by the type rules of ITT, which provide
access to both the witness and interpretation components of an existential (i.e. general sum) structure.
Intuitively, within the scope of the local declaration

    abstype t with x is M in N

we consider t to be simply an abbreviation or local name for the witness type of M. Of course, t itself should
not appear in the types of free variables or of the entire expression, because it has only local significance, but
its meaning, that is, the witness type of M, may. "Abstraction" is then achieved by other means, namely by
real or simulated functional abstraction with respect to a structure variable (see [Mac85]), which is merely an
"uncurried" form of the approach to data abstraction originally proposed by Reynolds in [Rey74]. When
structures are transparent, it is clear that they carry a particular type, together with its interpretation; in fact,
it is reasonable to think of structures as interpreted types rather than as a kind of value. Consequently we also
abandon the impredicative two-level system of SOL and move to a ramified system in which quantified types
are objects of level 2, while level 1 is occupied by ordinary monomorphic types, structures, and polymorphic
functions.
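The transparent "interpreted type" view is easy to render in Standard ML (our sketch; the particular ordering is one possibility we chose for illustration): extra ordering structure is imposed on int * bool without hiding the type, so its values remain constructible.

    structure IntBoolOrd =
    struct
      type t = int * bool
      fun le ((i, p) : t, (j, q) : t) =
        i < j orelse (i = j andalso (not p orelse q))
    end

    (* Usable precisely because t is not hidden: *)
    val usable = IntBoolOrd.le ((1, true), (2, false))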
2.1. Dependent types
There are two basic forms of dependent types, which we will call the general product and the general
sum. The general product, written Πx:A.B(x), is naively interpreted as the ordinary Cartesian product of the
family of sets {B(x)}_{x∈A} indexed by A, i.e.

    Πx:A.B(x) = { f ∈ A → ∪_{x∈A} B(x) | ∀a∈A. f(a) ∈ B(a) }

It denotes the type of functions that map an element a ∈ A into B(a), that is, functions whose result type
depends on the argument, with B specifying the dependence. Elements of Πx:A.B(x) are introduced by
lambda abstraction and eliminated by function application. In the degenerate case where B is a constant function,
e.g. when B(x) is defined by an expression not containing x free, the general product reduces to the ordinary
function space A → B.³
³ General product types are also called "indexed products," "Cartesian products," or "dependent function
spaces." Other notations include x:A → B(x) [CZ84], a similar arrow notation in [BL84], and ∀x:A.B(x) (from the formulas as
types isomorphism).
The general sum, written Σx:A.B(x), is intuitively just the disjoint union of the family {B(x)}_{x∈A}, i.e.

    Σx:A.B(x) = { (a, b) | a ∈ A and b ∈ B(a) }

Elements of the general sum have the form of pairs, where the first element, called the witness or index, determines
the type of the second element. Elements of the general sum are constructed by a primitive injection
function

    inj : Πx:A.(B(x) → Σx:A.B(x))

and they can be analyzed by using the two primitive projection functions

    witness : (Σx:A.B(x)) → A
    out : Πp:(Σx:A.B(x)). B(witness p)

Note that the existence of these projection functions (corresponding roughly to Martin-Löf's E operation) makes
the general sum an "open" construct, in contrast to the existential type of SOL or the categorical sum (see
[MP85], §2.6).⁴ In the degenerate case where B(x) is independent of x, the general sum is isomorphic to the
ordinary binary Cartesian product A × B.⁵
In the following sections we will sometimes take the liberty of saying simply "product" or "sum" when
we mean "general product" and "general sum."
⁴ A "closed" version of the general sum, analogous to SOL's existential type, can be derived from the general
product [Pra65], but the open version used here and in ITT appears to be an independent primitive notion.
⁵ General sums have also been called "indexed sums," "disjoint unions," and "dependent products" (an unfortunate
clash with the "general product" terminology). Other notations used include x:A × B(x) [BL84] and
∃x:A.B(x) (from the formulas as types isomorphism).
⁶ The simpler forms of type language will not admit variables ranging over values, and only constant functions B
will be definable. Under these circumstances the first-order general product and sum always reduce to their degenerate
forms A → B and A × B.
⁷ With, e.g., Πx:A.B(x) = Π(A)(λx:A.B(x)).
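As a worked instance (ours, not from the original text): take A = Type and B(t) = t × (t → bool). If M = inj(int)(⟨3, isEven⟩) : Σt:Type. t × (t → bool), then witness(M) = int and out(M) = ⟨3, isEven⟩ : int × (int → bool); the type of out(M) depends on witness(M), exactly as the typing of out above prescribes.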
In the ramified system, a polymorphic type is interpreted as a level-2 general product:

    ∀t.σ(t) ≈ Πt:Type.σ(t) : Type₂

We will in fact think of Σ₂-structures as a generalized form of small type.
The level 2 general sum operation Σ₂ and its associated primitive operations actually have very general
polymorphic types:

    Σ₂ : ΠA:Type₂.((A → Type₂) → Type₂)
    inj₂ : ΠA:Type₂.ΠB:(A → Type₂).Πx:A.(B(x) → Σ₂(A)(B))

The corresponding types for witness₂ and out₂ are left as exercises. The basic structure expression

    rep_{∃t.σ(t)} τ P : ∃t.σ(t)

translates into the following

    inj₂(Type₁)(λt:Type₁.σ(t))(τ)(P) : Σt:Type₁.σ(t)

which we will often abbreviate to inj₂ τ P when the polymorphic parameters Type₁ and λt.σ(t) are clear from the
context. Note that because of the generality of Σ₂, we may also create structures with structures rather than
types as witnesses (or even with polymorphic functions as witnesses, though we won't pursue this possibility
here). We will exploit this generality in the language described in the next section.
The rules for type checking in this system are conventional, consisting of the appropriate generalizations
of the usual introduction and elimination rules at each level, together with additional rules to deal with α-conversion
and definitional equality.
    texp ::= bool | int | real | tvar | texp × texp′ | {id₁:texp₁, ..., idₙ:texpₙ} | texp → texp′ | witness(svar)

where tvar ranges over type variables and svar over structure variables.⁸ The actual small types of DL
correspond to the closed (i.e. variable-free) type expressions, and this class is denoted simply by Type (short
for Type₁).
3.2. Signatures
The class of signatures is obtained by starting with Type₁ and closing with respect to the Σ₂ operator.
This gives a class of types characterizing the union of small types and "abstraction-free" Σ₂-structures (i.e.
those that do not contain any second-order lambda abstractions). Rather than use the Σ₂ operator directly, we
give a little grammar for signatures that covers the cases of interest:

    sig ::= Type | Σ svar : sig . texp | Σ svar : sig . sig′

where Σ is short for Σ₂. Typically, the texp forming the body of a signature is a labeled product type specifying
a collection of named functions and other values. Note that if sig is Type in either of the Σ forms, the
structure variable is actually a type variable, so structure variables subsume type variables. Note also that in a
signature such as Σs:A.B(s), the structure variable s can appear in B only as a component of a type subexpression.
It can appear either directly, if A = Type, or else in a subexpression "witness(...s...)," formed by
nested application of witness and out and denoting a small type.

⁸ For witness(svar) to be a proper small type, svar should be restricted to range over structures with Type
witnesses.
3.3. Structures
In DL, the term "structure" is used in a somewhat broader sense than above to match the notion of signature.
DL structures may be either small types or nested Σ₂-structures. As in the case of signatures, we
substitute some syntax for the use of the inj primitive in its full generality. The syntax of structure expressions
naturally follows that of signatures, viz.

    sexp ::= svar | texp | inj sexp exp | inj sexp sexp′

where svar ranges over structure variables and exp ranges over ordinary value expressions. We will not
further specify exp other than to say that it includes labeled tuples (called bindings in Pebble) expressing elements
of labeled products, and expressions of the form "out(...svar...)," formed by nested application of witness
and out and denoting a value of type depending on the signature of svar.
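Since DL generalizes the ML module system, a Standard ML example gives the flavor (our sketch, with illustrative names): a DL signature Σt:Type.{...} corresponds to an SML signature with an opaque type specification, and a DL structure built with inj corresponds to an SML structure matching it.

    signature STACK =
    sig
      type t
      val empty : t
      val push : int * t -> t
    end

    structure ListStack : STACK =
    struct
      type t = int list        (* the witness *)
      val empty = []
      fun push (x, s) = x :: s (* the interpretation *)
    end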
3.4. Functors
We allow (second-order) lambda-abstraction of structure expressions with respect to structure variables
to form functions from structures to structures. Following [Mac85], we will call such abstractions functors.
We will allow nested abstractions, yielding "curried" functors. The type of a functor is a general product that
we will call a functor signature.
The abstract syntax of functor signatures and functor expressions is

    fsig ::= Π svar : sig . sig′ | Π svar : sig . fsig′
    mexp ::= λ svar : sig . sexp | λ svar : sig . mexp′

where Π represents Π₂. The syntax of structure expressions must be extended to encompass functor applications
by adding

    sexp ::= mexp(sexp)

The restrictions embodied in the structure and functor syntax amount to saying that structures cannot
have functors as components, nor can functors have functors as arguments. In other words, functors are restricted
to be essentially "first-order" mappings over structures. These restrictions are partly a reflection of
certain tentative principles for programming with parametric modules, and partly an attempt to simplify implementation
of the language. Further experience with parametric modules (functors) and their implementation
should help refine our ideas about what restrictions on the full type theory are pragmatically justified.
    RectWRT(P : Point) =
        Σ rect : Type . { mk_rect : |P| × |P| → rect,
                          topleft : rect → |P|,
                          botright : rect → |P| }
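RectWRT corresponds closely to a Standard ML functor; the sketch below (ours, with an assumed POINT signature) renders it in SML, with |P| becoming P.point:

    signature POINT =
    sig
      type point
      val mk_point : real * real -> point
    end

    functor MkRect (P : POINT) =
    struct
      type rect = P.point * P.point
      fun mk_rect (tl : P.point, br : P.point) : rect = (tl, br)
      fun topleft ((tl, _) : rect) = tl
      fun botright ((_, br) : rect) = br
    end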
The structures referred to in its signature will be called its supporting structures, or more briefly, its support.
If we have an overt dependency, such as

    B = str_B(A) : sig_B(A)

where A : sig_A, there are two ways of making B self-sufficient relative to A, both of which have the effect of
closing the signature sig_B(A) with respect to A. One method is to abstract with respect to A, thus turning B
into a functor:

    B″ = λX:sig_A . str_B(X) : ΠX:sig_A . sig_B(X)

Note that the B″ closure is no longer a structure; in order to get a usable structure we have to apply it to a
structure expression, thus recreating the original situation of overt dependency, as in

    B″(F(G(X))) : sig_B(F(G(X)))

The other method is to Σ-close B with respect to A:

    B′ = inj A str_B(A) : ΣX:sig_A . sig_B(X)

B′ is truly self-contained, at least so far as A is concerned, and is usable as it stands
because it incorporates the necessary supporting structure A within itself. In ML, A is called a substructure of
B′. Now suppose

    A = str_A : sig_A
    B = str_B(A) : sig_B(A)
    C = str_C(A, B) : sig_C(A, B)
and we wish to abstract C with respect to its supporting structures. There are three different ways to do this:
(1) full abstraction with respect to all supporting structures:

    MkC₁ = λA:sig_A . λB:sig_B(A) . str_C(A, B) : ΠA:sig_A . ΠB:sig_B(A) . sig_C(A, B)

(2) abstraction with respect to the immediate supporting structure B only:

    MkC₂ = λB:sig_B(A) . str_C(A, B) : ΠB:sig_B(A) . sig_C(A, B)

(3) Σ-closure followed by abstraction:
if we both Σ-close C with respect to B′ and abstract with respect to B′ we get

    MkC′ = λB′:sig_B′ . inj B′ (str_C(|B′|, out(B′))) : ΠB′:sig_B′ . sig_C′

where sig_C′ = ΣB′:sig_B′ . sig_C(|B′|, out(B′)). The rules of type equality will ensure that for all structures
S : sig_B′, |MkC′(S)| = S, even though the relation between the argument and result of MkC′ is not manifest in
its signature.
Note that when B was Σ-closed to form B′, the support of C was coalesced into a single structure, which
made it easier to fully abstract C with respect to its support. When there are many levels of supporting structures
this efficiency of abstraction becomes a significant advantage. On the other hand, it became impossible
to abstract with respect to B′ while leaving A fixed, because A had become a component of B′.
The final example illustrates the interplay between sharing and abstraction. Suppose structures A, B, C,
and D are related as follows:

    A = str_A : sig_A
    B = str_B(A) : sig_B(A)
    C = str_C(A) : sig_C(A)
    D = str_D(A, B, C) : sig_D(A, B, C)

i.e., D depends on A, B, and C while B and C both depend on A. If we fully abstract D with respect to its support
we have

    MkD = λA:sig_A . λB:sig_B(A) . λC:sig_C(A) . str_D(A, B, C)
        : ΠA:sig_A . ΠB:sig_B(A) . ΠC:sig_C(A) . sig_D(A, B, C)

If, on the other hand, we first Σ-close B and C with respect to A and then abstract D with respect to its support,
we get

    B′ = inj A str_B(A) : sig_B′ = ΣX:sig_A . sig_B(X)
    C′ = inj A str_C(A) : sig_C′ = ΣX:sig_A . sig_C(X)

and a functor MkD′ abstracted with respect to B′ and C′. In the type of MkD′ something new has been added.
The way that B and C support the definition of D probably depends on the fact that B and C share the same
support A (think of B and C as rectangles and circles, and A as points, for example). For MkD this sharing is
directly expressed by the signature, but this is not the case for MkD′, so a special sharing constraint must be
added to the signature.
Two styles of modular programming have been illustrated here. The first, which is favored in Pebble,
expresses dependencies by allowing structure names to appear in the signatures of other structures, and tends
to abstract directly and individually on each supporting structure. The other style is representative of modules
in ML. It involves forming Σ-closures to capture dependencies and coalesce the support of structures into one
level. In fact, the ML module language goes so far as to require that all signatures be Σ-closed, even the
argument and result signatures of functors. There are several other factors involved which indirectly support
this strict closure rule. In particular, ML's "generative" declarations of datatypes and exceptions, and the fact
that structures can contain state, make it necessary to maintain fairly rigid relations between structures. In
addition, Σ-closed structures appear to be more appropriate units for separate compilation and persistent
storage.
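The ML-style sharing constraint that the MkD′ example needs can be written as follows in Standard ML (our sketch; the signatures and names are illustrative): the functor requires that its two arguments were built over the same point type.

    signature POINT = sig type point end
    signature RECT  = sig structure P : POINT
                          type rect
                          val topleft : rect -> P.point end
    signature CIRC  = sig structure P : POINT
                          type circ
                          val unit_at : P.point -> circ end

    functor MkD
      (structure R : RECT
       structure C : CIRC
       sharing type R.P.point = C.P.point) =   (* the sharing constraint *)
    struct
      (* legal only because the point types are known to coincide *)
      fun unit_circle_at_topleft (r : R.rect) : C.circ =
        C.unit_at (R.topleft r)
    end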
5. Conclusions
The main thrust of this work is that a ramified type system with general dependent type constructs is an
effective tool for the analysis and design of programming language type systems, particularly those oriented
toward programming in the large. We have explored some of the design choices that have been raised by
recently proposed languages such as Pebble, SOL, and Standard ML with modules. But many important questions
remain to be answered. For instance, we need to have precise characterizations of the relative strengths
of predicative vs impredicative type systems, and reflexive vs irreflexive systems. It would be desirable to
have a representation independence result analogous to that of Mitchell [Mit86] for the stratified system used
here. Finally, it appears that the basic polymorphic type system of ML [Mil78] is in fact a ramified system,
and that the system described in §2, rather than the second-order lambda calculus, can be viewed as its most
natural generalization.
References
[BC85] J. L. Bates and R. L. Constable, Proofs as Programs, ACM Trans. on Programming Languages and Systems, 7, 1, January 1985, pp. 113-136.
[BDD80] H. Boehm, A. Demers, and J. Donahue, An informal description of Russell, Technical Report TR 80-430, Computer Science Dept., Cornell Univ., October 1980.
[BL84] R. M. Burstall and B. Lampson, A kernel language for abstract data types and modules, in Semantics of Data Types, G. Kahn, D. B. MacQueen, and G. Plotkin, eds., LNCS, Vol. 173, Springer-Verlag, Berlin, 1984.
[Bur84] R. M. Burstall, Programming with modules as typed functional programming, Int'l Conf. on 5th Generation Computing Systems, Tokyo, Nov. 1984.
[Car85] L. Cardelli, The impredicative typed λ-calculus, unpublished manuscript, 1985.
[CF58] H. B. Curry and R. Feys, Combinatory Logic I, North-Holland, 1958.
[CH85] T. Coquand and G. Huet, A calculus of constructions, Information and Control, to appear.
[CM85] L. Cardelli and D. B. MacQueen, Persistence and type abstraction, Proceedings of the Appin Workshop on Data Types and Persistence, Aug. 1985, to appear.
[CW85] L. Cardelli and P. Wegner, On understanding types, data abstraction, and polymorphism, Technical Report No. CS-85-14, Brown University, August 1985.
[CZ84] R. L. Constable and D. R. Zlatin, The type theory of PL/CV3, ACM Trans. on Programming Languages and Systems, 6, 1, January 1984, pp. 94-117.
[deB80] N. G. de Bruijn, A survey of project AUTOMATH, in To H. B. Curry: Essays on Combinatory Logic, Lambda-Calculus and Formalism, Academic Press, 1980, pp. 579-607.
[DD85] J. Donahue and A. Demers, Data Types are Values, ACM Trans. on Programming Languages and Systems, 7, 3, July 1985, pp. 426-445.
[Gir71] J.-Y. Girard, Une extension de l'interprétation de Gödel à l'analyse, et son application à l'élimination des coupures dans l'analyse et la théorie des types, in Second Scandinavian Logic Symposium, J. E. Fenstad, ed., North-Holland, 1971, pp. 63-92.
[Hoo84] J. G. Hook, Understanding Russell—a first attempt, in Semantics of Data Types, G. Kahn, D. B. MacQueen, and G. Plotkin, eds., LNCS Vol. 173, Springer-Verlag, 1984, pp. 69-85.
[How80] W. Howard, The formulas-as-types notion of construction, in To H. B. Curry: Essays on Combinatory Logic, Lambda-Calculus and Formalism, Academic Press, 1980, pp. 476-490. (written 1969)
[Mac85] D. B. MacQueen, Modules for Standard ML (Revised), Polymorphism Newsletter, II, 2, Oct. 1985.
[McC79] N. J. McCracken, An investigation of a programming language with a polymorphic type structure, Ph.D. Thesis, Computer and Information Science, Syracuse Univ., June 1979.
[M-L71] P. Martin-Löf, A theory of types, unpublished manuscript, October 1971.
[M-L74] P. Martin-Löf, An intuitionistic theory of types: predicative part, Logic Colloquium '73, H. Rose and J. Shepherdson, eds., North-Holland, 1974, pp. 73-118.
[M-L82] P. Martin-Löf, Constructive mathematics and computer programming, in Logic, Methodology and Philosophy of Science, VI, North-Holland, Amsterdam, 1982, pp. 153-175.
[MR86] A. R. Meyer and M. B. Reinhold, 'Type' is not a type, 13th Annual ACM POPL Symposium, St. Petersburg, January 1986.
[Mil78] R. Milner, A theory of type polymorphism in programming, JCSS, 17, 3, Dec. 1978, pp. 348-375.
[Mit86] J. C. Mitchell, Representation independence and data abstraction, 13th Annual ACM POPL Symposium, St. Petersburg, January 1986.
[MP85] J. C. Mitchell and G. D. Plotkin, Abstract types have existential types, 12th ACM Symp. on Principles of Programming Languages, New Orleans, Jan. 1985, pp. 37-51.
[Pra65] D. Prawitz, Natural Deduction, Almquist and Wiksell, Stockholm, 1965.
[Rey74] J. C. Reynolds, Towards a theory of type structure, in Colloque sur la Programmation, Lecture Notes in Computer Science, Vol. 19, Springer-Verlag, Berlin, 1974, pp. 408-425.
[Sco70] D. Scott, Constructive validity, in Symposium on Automatic Demonstration, Lecture Notes in Math., Vol. 125, Springer-Verlag, 1970, pp. 237-275.
Higher-Order Modules and the Phase Distinction
Robert Harper, John C. Mitchell, and Eugenio Moggi
for the sake of simplicity, ML's concrete and abstract types (which could be modeled using existential types [MP88]), recursive types (which can be described through a λML theory), and record types. We also do not consider pattern matching, or computational aspects such as side-effects and exceptions. A promising approach toward integrating these features is described in [Mog89b].

2.1 Syntactic Preliminaries

There are four basic syntactic classes in λML: kinds, constructors, types, and terms. The kinds include T, the collection of all monotypes, and are closed under formation of products and function spaces. The constructors, which include monotypes such as int and type constructors such as list, are elements of kinds. The types of λML, whose elements are terms, include Cartesian products, function spaces, and polymorphic types. The terms of the calculus correspond to the basic expression forms of ML, but are written in an explicitly-typed syntax, following [MH88]. It is important to note that our "types" correspond roughly to ML's "type schemes," the essential difference being that we require them to be closed with respect to quantification over all kinds (not just the kind of monotypes) and function spaces. These additional closure conditions for type schemes are needed to make the category of modules for λML relatively Cartesian closed (i.e., closed under formation of dependent products and sums).

The organization of λML is a refinement of the type structure of Core-XML [MH88]. The kind T of monotypes corresponds directly to the first universe U₁ of Core-XML. However, the second universe, U₂, of Core-XML is separated into distinct collections of kinds and types. For technical reasons, the cumulativity of the Core-XML universes is replaced by the explicit "injection" of T into the collection of types, written using the keyword set.

2.2 Syntax

The syntax of λML raw expressions is given in Table 1. The collection of term variables, ranged over by x, and the collection of constructor variables, ranged over by v, are assumed to be disjoint. The metavariable τ ranges over the collection of monotypes (constructors of kind T). Contexts consist of a sequence of declarations of the form v:k and x:σ declaring the kind or type, respectively, of a constructor or term variable. In addition to the context-free syntax, we require that no variable be declared more than once in a context Φ, so that we may unambiguously regard Φ as a partial function with finite domain Dom(Φ) assigning kinds to constructor variables and types to term variables.

2.3 Judgement Forms

There are two classes of judgements in λML, the formation judgements and the equality judgements. The formation judgements are used to define the set of well-formed λML expressions. With the exception of the kind expressions, there is one formation judgement for each syntactic category. (Every raw kind expression is well-formed.) The equality judgements are used to axiomatize equivalence of expressions. (There is no equality judgement for kinds; kind equivalence is just syntactic identity.) The equality judgements are divided into two classes, the compile-time equations and the run-time equations, reflecting the intuitive phase distinction: kind and type equivalence are compile-time, term equivalence is run-time. The judgement forms of λML are summarized in Table 2. The metavariable F ranges over formation judgements, E ranges over equality judgements, and J ranges over all forms of judgement. We sometimes write Φ >> α to stand for an arbitrary judgement when we wish to make the context part explicit.

2.4 Formation Rules

The syntax of λML is specified by a set of inference rules for deriving formation judgements. These resemble rules in [MH88, Mog89a] and are essentially standard. Due to space constraints, they are omitted from this conference paper. We write λML ⊢ F to indicate that the formation judgement F is derivable using these rules. The formation rules may be summarized as follows. The constructors and kinds form a simply-typed λ-calculus (with product and unit types) with base kind T, and basic constructors 1, ×, and →. The collection of types is built from base types 1 and set(τ), where τ is a constructor of kind T, using the type constructors × and →, and quantification over an arbitrary kind. The terms amount to an explicitly-typed presentation of the ML core language, similar to that presented in [MH88]. (The let construct is omitted since it is definable here.)

2.5 Equality rules

The rules for deriving equational judgements also resemble rules in [MH88, Mog89a] and are essentially standard. We write λML ⊢ E to indicate that an equation E is derivable in accordance with these rules.
Table 1. Raw syntax of λML:

    k ∈ kind    ::= 1 | T | k₁ × k₂ | k₁ → k₂
    u ∈ constr  ::= v | 1 | × | → | * | ⟨u₁, u₂⟩ | πᵢ(u) | (λv:k.u) | u₁ u₂
    σ ∈ type    ::= set(u) | 1 | σ₁ × σ₂ | σ₁ → σ₂ | (∀v:k.σ)
    e ∈ term    ::= x | * | ⟨e₁, e₂⟩ | πᵢ(e) | (λx:σ.e) | e₁ e₂ | (Λv:k.e) | e[u]
    Φ ∈ context ::= ∅ | Φ, v:k | Φ, x:σ

Table 2. Judgement forms of λML:

    Φ context           Φ is a context
    Φ >> u : k          u is a constructor of kind k
    Φ >> σ type         σ is a type
    Φ >> e : σ          e is a term of type σ
    Φ >> u₁ = u₂ : k    equality of constructors
    Φ >> σ₁ = σ₂ type   equality of types
    Φ >> e₁ = e₂ : σ    equality of terms
The λML equational rules are formulated so as to ensure that if an equational judgement is derivable, then it is well-formed, meaning that the evident associated formation judgements are derivable. For the sake of convenience we give a brief summary of the equational rules of λML.

Types. The equivalence relation on types includes axioms expressing the interpretation of the basic ML type constructors, for example

    (1 T=)   Φ context
             --------------------------
             Φ >> set(1) = 1 type

Terms. The term equations include typed versions of the usual β- and η-rules, for example

    (→ η)    Φ >> e : σ₁ → σ₂
             --------------------------------------------
             Φ >> (λx:σ₁. e x) = e : σ₁ → σ₂    (x ∉ Dom(Φ))

    (∀ β)    Φ >> u : k    Φ, v:k >> e : σ
             --------------------------------------------
             Φ >> (Λv:k.e)[u] = [u/v]e : [u/v]σ

    (∀ η)    Φ >> e : (∀v:k.σ)
             --------------------------------------------
             Φ >> (Λv:k. e[v]) = e : (∀v:k.σ)    (v ∉ Dom(Φ))

Since the constructors and kinds form a simply-typed λ-calculus, it is a routine matter to show that equality of well-formed constructors (and, consequently, types) in λML is decidable. It is then easy to show that type checking in λML is decidable. This is a well-known property of the polymorphic lambda calculus Fω (cf. [Gir71, Gir72, Rey74, BMM89]), which may be seen as an impredicative extension of the λML calculus.

Lemma 2.2 There is a straightforward one-pass algorithm which decides, for an arbitrary well-formed theory T and formation judgement F, whether or not λML[T] ⊢ F.
2.6 Theories

The λML calculus is defined with respect to an arbitrary theory T = (Φ_T, A_T) consisting of a well-formed context Φ_T and a set A_T of run-time equational axioms of the form e₁ = e₂ : σ with Φ_T >> eᵢ : σ derivable for i = 1, 2. A theory corresponds to the programming language notion of standard prelude, and might contain declarations such as int : T and fix : ∀t:T. set((t → t) → t), and axioms expressing the fixed-point property of fix. For T = (Φ_T, A_T), we write λML[T] ⊢ J to indicate that the judgement J is derivable in λML, taking the variables declared in Φ_T as basic constructors and terms, and taking the equations in A_T as non-logical axioms. We write λML[T] ⊢ct J to indicate that the judgement J is derivable from theory T using only the compile-time equational rules and equational axioms of T.

2.7 Properties of λML

We will describe the phase distinction in λML by separating contexts into sets of "compile-time" and "run-time" declarations. If Φ is a λML context, we let Φᶜ be the context obtained by omitting all term variable declarations from Φ, and let Φʳ be the context obtained by eliminating all constructor variable declarations from Φ. The following lemma expresses the compile-time type checking property of λML:

Lemma 2.1 Let T be any theory. The following implications hold:

    If λML[T] ⊢              then λML[Φ_T, ∅] ⊢ct
    Φ context                Φᶜ context
    Φ >> u : k               Φᶜ >> u : k
    Φ >> σ type              Φᶜ >> σ type
    Φ >> σ₁ = σ₂ type        Φᶜ >> σ₁ = σ₂ type
    Φ >> e : σ               Φᶜ, Φʳ >> e : σ
    Φ >> e₁ = e₂ : σ         Φᶜ, Φʳ >> eᵢ : σ

The main technical accomplishment of this paper is to present a full calculus encompassing the module expressions of ML which has a compile-time decidable type checking problem.

3 Modules Calculus

3.1 Overview

In the λML account of Standard ML modules [Mac86, MH88] (see also [NPS88, C+86, Mar84] for related ideas), a structure is an element of a strong sum type of the form Σx:A.B. For example, a structure with one type and one value component is regarded as a pair [τ, e] of type S = Σt:T.σ. Although Standard ML structures bind names to their components, component selection in λML is simplified using the projections Fst and Snd. Functors are treated as elements of dependent function types of the form Πx:A.B. For example, a functor mapping structures with signature S to structures with the same signature would have type Πs:(Σt:T.σ).(Σt:T.σ). In λML, functors are therefore written as λ-terms mapping structures to structures. As discussed in the introduction, the standard use of dependent types conflicts with compile-time type checking, since a type expression (which we expect to evaluate at compile time) may depend on an arbitrary (possibly run-time) expression. For example, if F is a functor variable of signature S → S (where S is as above), then Fst(F [int, 3]) is an irreducible type expression involving a run-time sub-expression.

In this section we develop a calculus λML_mod of higher-order modules with a phase distinction, based on the categorical analysis of [Mog89a]. We begin with a simpler "structures-only" calculus that is primarily a technical device used in the proofs. The full calculus of higher-order modules has a standard syntax for dependent strong sums and functions, resembling
λML, but a non-standard equational theory inspired by the categorical interpretation of program modules [Mog89a]. The calculus also employs a single non-standard typing rule for structures that we conjecture is not needed for decidable typing, but which allows a more generous (and simple) type-checking algorithm without invalidating the categorical semantics. Although inspired by a categorical construction, we prove our main results directly using only standard techniques of lambda calculus. The non-standard aspects of the λML_mod calculus are justified by showing that this calculus is a definitional extension of the "structures-only" calculus, which itself bears a straightforward relationship to the core calculus. This definitional extension result is used to prove that λML_mod type equivalence is decidable and that the language therefore has a practical type checking algorithm.

3.2 The Calculus of Structures

In this section, we extend λML with structures and signatures. The resulting calculus, λML_str, has a straightforward phase distinction and forms the basis for the full calculus of modules. We assume we have some set of structure variables that are disjoint from the constructor and term variables, and use s, s′, s₁, ... as metavariables for structure variables. The additional syntax of λML_str is given in Table 3. Note that contexts are extended to include declarations of structure identifiers, but structures are required to be in "split" form [u, e]. (A variable s is not a structure and there is no need for operations to select the components of a structure.)

The judgement forms of λML are extended with two additional formation judgements, and two additional equality judgements, summarized in Table 4. The rules for deriving judgements in λML_str are obtained by extending the rules of λML (taking contexts now in the extended sense) with the obvious rules for structures in "split" form, in particular the following two rules governing the use of structure variables:

    ([] E1)   Φ context
              ------------------------------    (Φ(s) = [v:k, σ])
              Φ >> sᶜ : k

    ([] E2)   Φ context
              ------------------------------    (Φ(s) = [v:k, σ])
              Φ >> sʳ : [sᶜ/v]σ

The notion of theory and derivability with respect to a theory are the same as in λML.

The calculus of structures may be understood in terms of a translation into the core calculus, which amounts to showing that λML_str may be interpreted into the category of modules of [Mog89a]. For Φ a λML_str context, define Φ* to be the λML context obtained by replacing all structure variable declarations s : [v:k, σ] by the pair of declarations sᶜ : k and sʳ : [sᶜ/v]σ.

Lemma 3.1 Let T be a well-formed λML theory.

1. λML_str[T] ⊢ Φ >> [v:k, σ] sig iff λML[T] ⊢ Φ*, v:k >> σ type, and similarly for signature equality.

2. λML_str[T] ⊢ Φ >> [u, e] : [v:k, σ] iff λML[T] ⊢ Φ* >> u : k and λML[T] ⊢ Φ* >> e : [u/v]σ, and similarly for structure equality.

3. λML_str[T] ⊢ Φ >> α iff λML[T] ⊢ Φ* >> α, for any judgement α other than of the four forms considered in items 1 and 2 above.

It is an immediate consequence of this lemma and the decidability of λML type equivalence that λML_str type equivalence is decidable. This will be important for the decidability of type checking in the full modules calculus.

3.3 The Calculus of Modules

The relative Cartesian closure of Moggi's category of modules implies that higher-order functors are definable in λML_str. This may seem surprising, since λML_str is a rather minimal calculus of structures, with nothing syntactically resembling lambda abstraction over structures. The key idea in understanding this phenomenon is to regard all modules as "mixed-phase" entities, consisting of a compile-time part and a run-time part. For basic structures of the form [u, e], the partitioning is clear: u, a constructor, may be evaluated at compile-time, while e, a term, is left until run-time. For more complex module expressions such as functors, the separation requires further explanation.

Consider the signature S = [v:T, set(v)], and let F : S → S be a functor. Since this functor lies within the first-order fragment of λML, we may rely on Standard ML for intuition. The functor F takes a structure of signature S as argument, and returns a structure, also of signature S. On the face of it, F might compute the type component of the result as a function of both the type and term component of the argument. However, no such computation is possible in ML since there are no primitives for building types from terms. Thus we may regard F as consisting of two parts: the compile-time part, which computes the type component of the result as a function of the type component of the argument, and the run-time part, which computes the term component of the result as a function of both the type and term component of the argument.
Table 3. Additional syntax of λML_str:

    k ∈ kind    ::= ...
    u ∈ constr  ::= ... | sᶜ
    σ ∈ type    ::= ...
    e ∈ term    ::= ... | sʳ
    S ∈ sig     ::= [v:k, σ]
    M ∈ mod     ::= [u, e]
    Φ ∈ context ::= ... | Φ, s:S
(Since we are working in a typed framework with explicit polymorphism, the term component may contain type information that depends on the compile-time functor argument.) For a more concrete example, suppose I is the identity functor λs:S.s. Separated into compile-time and run-time parts, I becomes the structure

    [λsᶜ:T.sᶜ,  Λsᶜ:T.λsʳ:set(sᶜ).sʳ]

of signature

    [f:T → T,  ∀sᶜ:T. set(sᶜ → f sᶜ)].

In other words, I may be represented by the structure consisting of the identity constructor on types, and the polymorphic identity on terms. (A technical side comment is that the structure corresponding to I has more than one signature, as we shall see.)

With functors represented by structures, functor application becomes a form of "structure application." In keeping with the above discussion, structure application is computed by applying the first component of the functor to the first component of the argument, and the second component of the functor to both components of the argument. More precisely, if [u, e] is a structure of signature [f:k′ → k, ∀v′:k′. σ′ → [f v′/v]σ] and [u′, e′] is a structure of signature [v′:k′, σ′], then the application [u, e] [u′, e′] is defined to be the structure [u u′, e u′ e′] of signature [v:k, σ]. As we shall see below, the appropriate typing conditions are satisfied whenever the first structure is the image of a functor under the translation sketched in the next paragraph. Moreover, both type correctness and equality are preserved under the translation.

Although λML_str already "has" higher-order modules, the syntax for representing them forces the user to explicitly decompose every functor into distinct compile-time and run-time parts, even for the first-order functors of Standard ML. This is syntactically cumbersome. In keeping with the syntax of Standard ML, and practical programming considerations, we will consider a more natural notation based on [Mac86, MH88]. However, our calculus will nonetheless respect the phase distinction inherent in representing functors as structures. This is achieved by employing a non-standard equational theory that, when used during type checking, makes explicit the underlying "split" interpretation of module expressions, and hence eliminates apparent phase violations. For example, if A is a functor of signature [t:T, set(int)] → [t:T, 1], then the type expression σ = Fst(A [int, 3]) is equal, using the non-standard rules, to Fst(A) int, which is free of run-time subexpressions. As a result, if e is a term of type σ, then the application

    (λx : Fst(A [int, 5]). x) e

is type-correct, whereas in the absence of the non-standard equations this would not be so (assuming 3 ≠ 5 : int).

The raw syntax of λML_mod is an extension of that of λML_str; the extensions are given in Table 5. The judgement forms are the same as for λML_str, and are axiomatized by standard structure and functor rules, as in [MH88].
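The phase-splitting of the identity functor can be made concrete in Standard ML itself (our sketch, with illustrative names): a functor's action on types is separate from its action on values, exactly as in the split structure [λsᶜ:T.sᶜ, Λsᶜ:T.λsʳ:set(sᶜ).sʳ] above.

    signature S = sig type t  val x : t end

    (* The identity functor: type part t ↦ t, term part the identity. *)
    functor Id (X : S) = struct type t = X.t  val x = X.x end

    structure A = struct type t = int  val x = 3 end
    structure B = Id (A)   (* B.t = int at compile time; B.x = 3 at run time *)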
Table 5. Additional raw syntax of λML_mod:

    k ∈ kind    ::= ...
    u ∈ constr  ::= ... | Fst(M)
    σ ∈ type    ::= ...
    e ∈ term    ::= ... | Snd(M)
    S ∈ sig     ::= [v:k, σ] | 1 | (Σs:S₁.S₂) | (Πs:S₁.S₂)
    M ∈ mod     ::= s | [u, e] | * | ⟨M₁, M₂⟩ | πᵢ(M) | (λs:S.M) | M₁ M₂
    Φ ∈ context ::= ... | Φ, s:S
The λML_mod calculus is parametric in a theory, defined as in λML (i.e., we do not admit module constants, or axioms governing module expressions). The formation rules of λML_mod are essentially the standard rules for dependent strong sums and dependent function types. The equational rules include the expected rules for dependent types, together with the non-standard rules summarized in Table 6.

Besides the non-standard equational rules (and "orthogonal" to them), there is also a non-standard typing rule for structures:

    Φ ⊢ M : [v:k, σ]    Φ, v:k ⊢ σ′ type    Φ ⊢ Snd M : [Fst M/v]σ′
    ------------------------------------------------------------
    Φ ⊢ M : [v:k, σ′]

The non-standard typing rule is consistent with the interpretation in the category of modules [Mog89a], but (we conjecture that) without it the main properties of λML_mod, namely the compile-time type checking theorem and the decidability of typing judgements, would still hold. The reason for having such a rule is mainly pragmatic: to have a simple type checking algorithm (see Definition 3.9). Moreover, this additional typing rule captures a particularly natural property of Σ-types (once uniqueness of type has been abandoned), namely that a structure M should be identified with its expansion [Fst M, Snd M]. A typical example of a typing judgement derivable by the non-standard typing rule is s:[v:k, σ] ⊢ s : [v:k, [Fst s/v]σ].

3.4 Translation of λML_mod into λML_str

The non-standard equational theory used in the definition of λML_mod is justified by proving that λML_mod is a definitional extension of λML_str, in a sense to be made precise below. This definitional extension result will then play an important role in establishing the decidability and compile-time type checking property of λML_mod.

We begin by giving a translation ♭ from raw λML_mod expressions into raw λML_str expressions. This translation is defined by induction on the structure of λML_mod expressions. Apart from the cases given in Table 7, the translation is defined to commute with the expression constructors. For the basis we associate with every module variable s a constructor variable s^c and a term variable s^r in λML_str. For convenience in defining the translation we fix a constructor variable v that may occur in expressions of λML_str but not in expressions of λML_mod. Signatures of λML_mod will be translated to λML_str signatures of the form [v:k, σ]. The translation is extended "declaration-wise" to contexts: Φ♭ is obtained from Φ by replacing declarations of the form x:σ by x:σ♭, and declarations of the form s:S by s:S♭. Note that the translation leaves λML expressions fixed; consequently, the translation need not be extended to theories.

Lemma 3.2 (Substitutivity) The translation ♭ commutes with substitution. In particular, if M♭ = [u, e], then ([M/s]−)♭ = [u, e / s^c, s^r](−♭).

Theorem 3.3 (♭ interpretation) Let T be a well-formed theory, and let J be a λML_mod judgement. If λML_mod[T] ⊢ J, then λML_str[T] ⊢ J♭.

Conversely, λML_str is essentially a sub-calculus of λML_mod, differing only in the treatment of structure variables. To make this precise, define the embedding −^e of λML_str raw expressions into λML_mod raw expressions by replacing all occurrences of s^c by Fst(s), and all occurrences of s^r by Snd(s).

Theorem 3.4 (−^e interpretation) Let T be a well-formed theory, and let J be a λML_str judgement. If λML_str[T] ⊢ J, then λML_mod[T] ⊢ J^e.

Theorem 3.5 (Definitional extension) Let T be a well-formed theory.

• For any formation judgement J of λML_str, if λML_str[T] ⊢ J, then (J^e)♭ is syntactically equal to J, modulo the names of bound variables.
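The action of the translation on module variables can be pictured with a toy AST. This OCaml sketch is only a gloss on the text (the type names and the flattening functions are invented here; the real ♭ is defined over the full λML_mod syntax of Table 5):

    (* Constructor (compile-time) and term (run-time) layers, with Fst/Snd
       projecting from module expressions. *)
    type constr = CVar of string | CFst of md
    and  term   = TVar of string | TSnd of md
    and  md     = MVar of string | MPair of constr * term

    (* ♭ replaces Fst(s) by the constructor variable s^c and Snd(s) by the
       term variable s^r, and pushes projections through explicit pairs. *)
    let rec flat_constr : constr -> constr = function
      | CFst (MVar s)       -> CVar (s ^ "^c")
      | CFst (MPair (u, _)) -> flat_constr u
      | c                   -> c

    let rec flat_term : term -> term = function
      | TSnd (MVar s)       -> TVar (s ^ "^r")
      | TSnd (MPair (_, e)) -> flat_term e
      | t                   -> t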
Non-standard equational rules for signatures (Table 6):

(1 ≡)   Φ context
        ----------------------
        Φ ⊢ 1 = [v:1, 1] sig

(1 I)   Φ context
        ----------------------
        Φ ⊢ ∗ = [∗, ∗] : [v:1, 1]

(Σ E)   Φ, v₁:k₁ ⊢ σ₁ type    Φ, v₁:k₁, v₂:k₂ ⊢ σ₂ type
        Φ ⊢ u : k₁ × k₂    Φ ⊢ e : [π₁u/v₁]σ₁ × [π₁u, π₂u/v₁, v₂]σ₂
        ----------------------
        Φ ⊢ πᵢ[u, e] = [πᵢu, πᵢe] : [vᵢ:kᵢ, σᵢ′]
Table 7: expression, translation, induction hypotheses.
• If λML_mod[T] ⊢ Φ ⊢ M : S, then the following equality judgements are derivable in λML_mod[T]:

  – Φₛ ⊢ Φ(s) = (Φ(s)♭)^e sig, for all s ∈ Dom(Φ), where Φ ≡ Φₛ, s:Φ(s), Φ′ (and similarly for the other declarations in Dom(Φ))
  – Φ ⊢ S = (S♭)^e sig
  – Φ ⊢ M = (M♭)^e : S

(and similarly for the other formation judgements.)

Corollary 3.6 (Conservative extension) Let T be an arbitrary well-formed theory. For any λML_str judgement J, λML_mod[T] ⊢ J^e iff λML_str[T] ⊢ J.

Theorem 3.8 (Compile-time type checking) Given any well-formed theory T = (Φ_T, A_T), the following implications hold:

If λML_mod[T] ⊢    then λML_mod[Φ_T, ∅] ⊢
Φ context          Φ context
Φ ⊢ σ type         Φ ⊢ σ type
Φ ⊢ S sig          Φ ⊢ S sig
Φ ⊢ u : k          Φ ⊢ u : k
Φ ⊢ e : σ          Φ ⊢ e : σ
Φ ⊢ M : S          Φ ⊢ M : S
which will always have the form [v:k, σ] where v does not occur free in σ. As a notational convenience, we will usually omit explicit designation of the non-occurring variable, and write such signatures in the form [:k, σ]. The algorithm defined below takes as input a raw context Φ and, for instance, a raw module expression M of λML_mod and produces one of the following results:

• The context Φ♭ and M♭ ≡ [u, e] : [:k, σ], meaning that Φ ⊢ M : [:k, σ] is derivable in λML_mod.

• An error, meaning that Φ context is not derivable in λML_mod or that Φ ⊢ M : S is not derivable in λML_mod for any S.

Definition 3.9 (Type-checking algorithm) The type-checking algorithm TC is given by a deterministic set of inference rules to derive judgements of the form input ↠ output, for instance Φ ↠ Φ♭ context and Φ ⊢ M ↠ Φ♭ ⊢ M♭ : [:k, σ]; the output carries not only a translation, but also a kind/type/signature. A sample of the inference rules that constitute the algorithm is given in Table 8.

TC is parametric in a theory T, and we write TC[T] for the instance of the algorithm in which the constants declared in Φ_T are regarded as variables. More precisely, Φ ↠ Φ♭ context in TC[T] iff Φ_T, Φ ↠ Φ_T♭, Φ♭ context in TC.

Theorem 3.10 (Soundness) Let T be a well-formed theory. The following implications hold: whenever TC[T] derives lhs ↠ rhs, the output judgement rhs is derivable in λML_str[T] and the input judgement lhs is derivable in λML_mod[T].

Theorem 3.11 (Completeness) Let T be any well-formed theory. The following implications hold:

If λML_mod[T] ⊢     then TC[T] ⊢                        and λML_str[T] ⊢
Φ ⊢ σ type          Φ ⊢ σ ↠ Φ♭ ⊢ σ♭ type
Φ ⊢ S sig           Φ ⊢ S ↠ Φ♭ ⊢ S♭ sig
Φ ⊢ u : k           Φ ⊢ u ↠ Φ♭ ⊢ u♭ : k
Φ ⊢ e : σ           Φ ⊢ σ ↠ Φ♭ ⊢ σ♭ type
                    Φ ⊢ e ↠ Φ♭ ⊢ e♭ : σ′                Φ♭ ⊢ σ♭ = σ′ type
Φ ⊢ M : S           Φ ⊢ S ↠ Φ♭ ⊢ [v:k, σ] sig
                    Φ ⊢ M ↠ Φ♭ ⊢ [u, e] : [:k, σ′]      Φ♭ ⊢ σ′ = [u/v]σ type

If λML_mod[T] ⊢     then TC[T] ⊢                        and λML_str[T] ⊢
Φ ⊢ σ₁ = σ₂ type    Φ ⊢ σᵢ ↠ Φ♭ ⊢ σᵢ♭ type             Φ♭ ⊢ σ₁♭ = σ₂♭ type
Φ ⊢ S₁ = S₂ sig     Φ ⊢ Sᵢ ↠ Φ♭ ⊢ Sᵢ♭ sig              Φ♭ ⊢ S₁♭ = S₂♭ sig
Φ ⊢ u₁ = u₂ : k     Φ ⊢ uᵢ ↠ Φ♭ ⊢ uᵢ♭ : k              Φ♭ ⊢ u₁♭ = u₂♭ : k
Φ ⊢ e₁ = e₂ : σ     Φ ⊢ σ ↠ Φ♭ ⊢ σ♭ type
                    Φ ⊢ eᵢ ↠ Φ♭ ⊢ eᵢ♭ : σᵢ             Φ♭ ⊢ σ♭ = σᵢ type
                                                        Φ♭ ⊢ e₁♭ = e₂♭ : σ♭
Φ ⊢ M₁ = M₂ : S     Φ ⊢ Mᵢ ↠ Φ♭ ⊢ [uᵢ, eᵢ] : [:k, σᵢ]  Φ♭ ⊢ u₁ = u₂ : k
                                                        Φ♭ ⊢ σ = [uᵢ/v]σᵢ type
                                                        Φ♭ ⊢ e₁ = e₂ : σ

Theorem 3.12 (Decidability) It is decidable whether a raw type-checking judgement lhs ↠ rhs is derivable using the inference rules in Definition 3.9.

Corollary 3.13 Given any well-formed theory T, the derivability of formation judgements in λML_mod[T] is decidable and does not depend on the run-time axioms in T.
Table 8: a sample of the rules of the type-checking algorithm TC.

(Φ, s:S)   Φ ⊢ S ↠ Φ♭ ⊢ S♭ sig
           ----------------------
           Φ, s:S ↠ Φ♭, s:S♭ context    (s ∉ Dom(Φ))

(VAR)      Φ ↠ Φ♭ context
           ----------------------
           Φ ⊢ s ↠ Φ♭ ⊢ [s^c, s^r] : [:k, [s^c/v]σ]    (Φ♭(s) = [v:k, σ])

(1 I)      Φ ↠ Φ♭ context
           ----------------------
           Φ ⊢ ∗ ↠ Φ♭ ⊢ [∗, ∗] : [:1, 1]

(Σ Eᵢ)     Φ ⊢ M ↠ Φ♭ ⊢ [u, e] : [:k₁ × k₂, σ₁ × σ₂]
           ----------------------
           Φ ⊢ πᵢ M ↠ Φ♭ ⊢ [πᵢu, πᵢe] : [:kᵢ, σᵢ]
types. To address this pragmatic issue, we have developed an alternate form of the λML calculus in which there is a clear compile-time/run-time distinction. Essentially, our technique is to add equational axioms that allow us to decompose structures and functors into separate compile-time and run-time components. While the phase distinction in λML reduces to the syntactic difference between types and their elements, the general technique seems applicable to other forms of phase distinction.

The basis for our development is the "category of modules" over an indexed category, which is an instance of the Grothendieck construction. General properties of the category of modules are explained in the companion paper [Mog89a]. In the specific case of λML, our non-standard equational axioms lead to a calculus which bears a natural relationship to the category of modules. In future work, it would be interesting to explore the exact connection between our calculus and the categorical construction, and to develop phase distinctions for languages whose type expressions may contain "run-time" subexpressions in more complicated ways.

[Gir72] J.-Y. Girard. Interprétation fonctionnelle et élimination des coupures de l'arithmétique d'ordre supérieur. Thèse d'État, Université Paris VII, 1972.

[HMM86] R. Harper, D.B. MacQueen, and R. Milner. Standard ML. Technical Report ECS-LFCS-86-2, Lab. for Foundations of Computer Science, University of Edinburgh, March 1986.

[HMT87a] R. Harper, R. Milner, and M. Tofte. The semantics of Standard ML. Technical Report ECS-LFCS-87-36, Lab. for Foundations of Computer Science, University of Edinburgh, August 1987.

[HMT87b] R. Harper, R. Milner, and M. Tofte. A type discipline for program modules. In TAPSOFT '87, volume 250 of LNCS. Springer-Verlag, March 1987.

[Mac86] D.B. MacQueen. Using dependent types to express modular structure. In Proc. 13th ACM Symp. on Principles of Programming Languages, pages 277–286, 1986.
[Rey74] J.C. Reynolds. Towards a theory of type structure. In Paris Colloquium on Programming, pages 405–425. Springer-Verlag LNCS 19, 1974.
The mechanical evaluation of expressions
By P. J. Landin
This paper is a contribution to the "theory" of the activity of using computers. It shows how
some forms of expression used in current programming languages can be modelled in Church's
λ-notation, and then describes a way of "interpreting" such expressions. This suggests a
method of analyzing the things computer users write, that applies to many different problem
orientations and to different phases of the activity of using a computer. Also a technique is
introduced by which the various composite information structures involved can be formally
characterized in their essentials, without commitment to specific written or other representations.
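The style of interpretation the abstract describes can be suggested in a few lines of OCaml. This is an illustration only, not the paper's applicative-expression machinery or its SECD machine: expressions modelled on Church's λ-notation are evaluated relative to an environment of name-value bindings, with closures recording the environment of a function's definition.

    type exp = Var of string | Lam of string * exp | App of exp * exp

    type value = Closure of string * exp * env
    and  env   = (string * value) list

    (* eval env e: the value of e relative to the bindings in env. *)
    let rec eval (env : env) (e : exp) : value =
      match e with
      | Var x -> List.assoc x env
      | Lam (x, body) -> Closure (x, body, env)
      | App (f, a) ->
          let Closure (x, body, defenv) = eval env f in
          eval ((x, eval env a) :: defenv) body

For instance, eval [] (App (Lam ("x", Var "x"), Lam ("y", Var "y"))) yields the closure for λy.y.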
The Next 700 Programming Languages
P. J. Landin
Univac Division of Sperry Rand Corp., New York, New York
A family of unimplemented computing languages is described that is intended to span differences of application area by a unified framework. This framework dictates the rules about the uses of user-coined names, and the conventions about characterizing functional relationships. Within this framework the design of a specific language splits into two independent parts. One is the choice of written appearances of programs (or more generally, their physical representation). The other is the choice of the abstract entities (such as numbers, character-strings, lists of them, functional relations among them) that can be referred to in the language.

The system is biased towards "expressions" rather than "statements." It includes a nonprocedural (purely functional) subsystem that aims to expand the class of users' needs that can be met by a single print-instruction, without sacrificing the important properties that make conventional right-hand-side expressions easy to construct and understand.

1. Introduction

Most programming languages are partly a way of expressing things in terms of other things and partly a basic set of given things. The ISWIM (If you See What I Mean) system is a byproduct of an attempt to disentangle these two aspects in some current languages.

This attempt has led the author to think that many linguistic idiosyncrasies are concerned with the former rather than the latter, whereas aptitude for a particular class of tasks is essentially determined by the latter rather than the former. The conclusion follows that many language characteristics are irrelevant to the alleged problem orientation.

ISWIM is an attempt at a general purpose system for describing things in terms of other things, that can be problem-oriented by appropriate choice of "primitives." So it is not a language so much as a family of languages, of which each member is the result of choosing a set of primitives. The possibilities concerning this set and what is needed to specify such a set are discussed below.

ISWIM is not alone in being a family, even after mere syntactic variations have been discounted (see Section 4). In practice, this is true of most languages that achieve more than one implementation, and if the dialects are well disciplined, they might with luck be characterized as differences in the set of things provided by the library or operating system. Perhaps had ALGOL 60 been launched as a family instead of proclaimed as a language, it would have fielded some of the less relevant criticisms of its deficiencies.

At first sight the facilities provided in ISWIM will appear comparatively meager. This appearance will be especially misleading to someone who has not appreciated how much of current manuals are devoted to the explanation of common (i.e., problem-orientation independent) logical structure rather than problem-oriented specialties. For example, in almost every language a user can coin names, obeying certain rules about the contexts in which the name is used and their relation to the textual segments that introduce, define, declare, or otherwise constrain its use. These rules vary considerably from one language to another, and frequently even within a single language there may be different conventions for different classes of names, with near-analogies that come irritatingly close to being exact. (Note that restrictions on what names can be coined also vary, but these are trivial differences. When they have any logical significance it is likely to be pernicious, by leading to puns such as ALGOL's integer labels.)

So rules about user-coined names is an area in which we might expect to see the history of computer applications give ground to their logic. Another such area is in specifying functional relations. In fact these two areas are closely related since any use of a user-coined name implicitly involves a functional relation; e.g., compare

    x(x+a) where x = b + 2c        f(b+2c) where f(x) = x(x+a)

ISWIM is thus part programming language and part program for research. A possible first step in the research program is 1700 doctoral theses called "A Correspondence between x and Church's λ-notation."¹

Presented at an ACM Programming Languages and Pragmatics Conference, San Dimas, California, August 1965.

¹ There is no more use or mention of λ in this paper; cognoscenti will nevertheless sense an undercurrent. A not inappropriate title would have been "Church without lambda."

2. The where-Notation

In ordinary mathematical communication, these uses of 'where' require no explanation. Nor do the following:

    f(b+2c) + f(2b−c)
      where f(x) = x(x+a)

    f(b+2c) − f(2b−c)
      where f(x) = x(x+a)
      and b = u/(u+1)
      and c = v/(v+1)

    g(f where f(x) = ax² + bx + c,
      u/(u+1),
      v/(v+1))
      where g(f, p, q) = f(p+2q, 2p−q)
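Landin's where-clauses read directly as let-bindings in a modern functional language. A small OCaml rendering of the second example above (a, u and v are taken as parameters, an assumption of this sketch, since the original leaves them free):

    (* f(b+2c) − f(2b−c) where f(x) = x(x+a)
       and b = u/(u+1) and c = v/(v+1) *)
    let example a u v =
      let f x = x *. (x +. a) in
      let b = u /. (u +. 1.) in
      let c = v /. (v +. 1.) in
      f (b +. 2. *. c) -. f (2. *. b -. c)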
The texts of abstract ISWIM are composite information structures called amessage's. The following structure definition defines² the class amessage in terms of a class called identifier. It also defines several functions for manipulating amessage's. These comprise the predicates

² Writing a structure definition involves coining names for the various alternative formats of amessage's and their components. The only obscure coinage is "beet," which abbreviates "beta-redex," i.e., "an expression amenable to rule (β)"; see Section 7.

A program-point definition introduces a deviant kind of function. Applying such a function precipitates premature termination of the where-expression containing it, and causes its result to be delivered as the value of the entire where-expression.

Program-points are ISWIM's nearest thing to jumping. Assignment is covered as a particular case of an operator. For both of these the precise specification is in terms of the underlying abstract machine (see [2]).
4.4 Scanning a Set

Problem: Extend the solution to 4.3 by providing a fast method for scanning all members of the set without changing the value of the set. The user program will contain a repetitive command of the form:

4.6 Multiple Exits: Remove the Least Member

Exercise: Extend the above solution to respond to a command to yield the least member of the set and to remove it from the set. The user program will invoke the facility by a pair of commands:
6. Miscellaneous

This section contains further examples of the use of communicating sequential processes for the solution of some less familiar problems: a parallel version of the sieve of Eratosthenes, and the design of an iterative array. The proposed solutions are even more speculative than those of the previous sections, and in the second example, even the question of termination is ignored.

    [SIEVE(i:1..100)::
       p,mp:integer;
       SIEVE(i − 1)?p;
       print!p;
       mp := p; comment mp is a multiple of p;
      *[m:integer; SIEVE(i − 1)?m →
         *[m > mp → mp := mp + p];
          [m = mp → skip
          □m < mp → SIEVE(i + 1)!m
       ]  ]
    ||SIEVE(0)::print!2; n:integer; n := 3;
      *[n < 10000 → SIEVE(1)!n; n := n + 2]
    ||SIEVE(101)::*[n:integer; SIEVE(100)?n → print!n]
    ||print::*[(i:0..101) n:integer; SIEVE(i)?n → ...]
    ]

[Figure: an iterative array of processes M(i,j); each cell passes x eastward and the running sum A(i,j)∗x + sum southward.]

The WEST and SOUTH borders are processes of the user program; the remaining processes are:

    NORTH = *[true → M(1,j)!0]
    EAST = *[x:real; M(i,3)?x → skip]
    CENTER = *[x:real; M(i,j − 1)?x →
        M(i,j + 1)!x; sum:real;
        M(i − 1,j)?sum; M(i + 1,j)!(A(i,j)∗x + sum)]
Note: (1) This beautiful solution was contributed by David Gries. (2) It is algorithmically similar to the program developed in [7, pp. 27–32].

7. Discussion

A design for a programming language must necessarily involve a number of decisions which seem to be fairly arbitrary. The discussion of this section is intended to explain some of the underlying motivation and to mention some unresolved questions.

7.1 Notations

I have chosen single-character notations (e.g. !,?) to express the primitive concepts, rather than the more traditional boldface or underlined English words. As a result, the examples have an APL-like brevity, which some readers find distasteful. My excuse is that (in contrast to APL) there are only a very few primitive concepts and that it is standard practice of mathematics (and also good coding practice) to denote common primitive concepts by brief notations (e.g. +,×). When read aloud, these are replaced by words (e.g. plus, times).

Some readers have suggested the use of assignment notation for input and output:

    <target variable> := <source>
    <destination> := <expression>

I find this suggestion misleading: it is better to regard input and output as distinct primitives, justifying distinct notations.

I have used the same pair of brackets ([...]) to bracket all program structures, instead of the more familiar variety of brackets (if..fi, begin..end, case..esac, etc.). In this I follow normal mathematical practice, but I must also confess to a distaste for the pronunciation of words like fi, od, or esac.

I am dissatisfied with the fact that my notation gives the same syntax for a structured expression and a subscripted variable. Perhaps tags should be distinguished from other identifiers by a special symbol (say #).

I was tempted to introduce an abbreviation for combined declaration and input, e.g. X?(n:integer) for n:integer; X?n.

7.2 Explicit Naming

My design insists that every input or output command must name its source or destination explicitly. This makes it inconvenient to write a library of processes which can be included in subsequent programs, independent of the process names used in that program. A partial solution to this problem is to allow one process (the main process) of a parallel command to have an empty label, and to allow the other processes in the command to use the empty process name as source or destination of input or output.

For construction of large programs, some more general technique will also be necessary. This should at least permit substitution of program text for names defined elsewhere--a technique which has been used informally throughout this paper. The Cobol COPY verb also permits a substitution for formal parameters within the copied text. But whatever facility is introduced, I would recommend the following principle: Every program, after assembly with its library routines, should be printable as a text expressed wholly in the language, and it is this printed text which should describe the execution of the program, independent of which parts were drawn from a library.

Since I did not intend to design a complete language, I have ignored the problem of libraries in order to concentrate on the essential semantic concepts of the program which is actually executed.

7.3 Port Names

An alternative to explicit naming of source and destination would be to name a port through which communication is to take place. The port names would be local to the processes, and the manner in which pairs of ports are to be connected by channels could be declared in the head of a parallel command.

This is an attractive alternative which could be designed to introduce a useful degree of syntactically checkable redundancy. But it is semantically equivalent to the present proposal, provided that each port is connected to exactly one other port in another process. In this case each channel can be identified with a tag, together with the name of the process at the other end. Since I wish to concentrate on semantics, I preferred in this paper to use the simplest and most direct notation, and to avoid raising questions about the possibility of connecting more than two ports by a single channel.

7.4 Automatic Buffering

As an alternative to synchronization of input and output, it is often proposed that an outputting process should be allowed to proceed even when the inputting process is not yet ready to accept the output. An implementation would be expected automatically to interpose a chain of buffers to hold output messages that have not yet been input.

I have deliberately rejected this alternative, for two reasons: (1) It is less realistic to implement in multiple disjoint processors, and (2) when buffering is required on a particular channel, it can readily be specified using the given primitives. Of course, it could be argued equally well that synchronization can be specified when required by using a pair of buffered input and output commands.
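The point made in 7.4, that buffering can be programmed from the synchronized primitives when wanted, can be sketched outside CSP as well. The following OCaml fragment is an illustration, not part of the paper; it needs OCaml's threads library, with Event channels standing in for ! and ?. A copying process interposed between two channels behaves as a one-place buffer, and chaining n copies yields an n-place buffer.

    (* A one-place buffer process: repeatedly input a message and output it.
       Event.receive/Event.send play the roles of  in?x  and  out!x . *)
    let buffer (inp : 'a Event.channel) (out : 'a Event.channel) : Thread.t =
      let rec copy () =
        let x = Event.sync (Event.receive inp) in
        Event.sync (Event.send out x);
        copy ()
      in
      Thread.create copy ()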
7.5 Unbounded Process Activation

The notation for an array of processes permits the same program text (like an Algol recursive procedure) to have many simultaneous "activations"; however, the exact number must be specified in advance. In a conventional single-processor implementation, this can lead to inconvenience and wastefulness, similar to the fixed-length array of Fortran. It would therefore be attractive to allow a process array with no a priori bound on the number of elements; and to specify that the exact number of elements required for a particular execution of the program should be determined dynamically, like the maximum depth of recursion of an Algol procedure or the number of iterations of a repetitive command.
A Structural Approach to Operational
Semantics
Gordon D. Plotkin
Laboratory for Foundations of Computer Science, School of Informatics,
University of Edinburgh, King’s Buildings, Edinburgh EH9 3JZ, Scotland
Contents
1.1 Introduction
1.5 Exercises
2 Bibliography
3.3 L-commands
4 Bibliography
5.1 Introduction
5.5 Exercises
5.6 Remarks
6 Bibliography
1 Transition Systems and Interpreting Automata
1.1 Introduction
It is the purpose of these notes to develop a simple and direct method for specifying the seman-
tics of programming languages. Very little is required in the way of mathematical background;
all that will be involved is “symbol-pushing” of one kind or another of the sort which will al-
ready be familiar to readers with experience of either the non-numerical aspects of programming
languages or else formal deductive systems of the kind employed in mathematical logic.
Apart from a simple kind of mathematics the method is intended to produce concise com-
prehensible semantic definitions. Indeed the method is even intended as a direct formalisation
of (many aspects of) the usual informal natural language descriptions. I should really confess
here that while I have some experience, what has been expressed above is rather a pious hope
than a statement of fact. I would therefore be most grateful to readers for their comments and
particularly their criticisms.
I will follow the approach to programming languages taken by such authors as Gordon [Gor] and
Tennent [Ten] considering the main syntactic classes – expressions, commands and declarations
– and the various features found in each. The linguistic approach is that developed by the Scott-
Strachey school (together with Landin and McCarthy and others) but within an operational
rather than a denotational framework. These notes should be considered as an attempt at
showing the feasibility of such an approach. Apart from various inadequacies of the treatment
as presented many topics of importance are omitted. These include data structures and data
types; various forms of control structure from jumps to exceptions and coroutines; concurrency
including semaphores, monitors and communicating processes.
Many thanks are due to the Department of Computer Science at Aarhus University at whose
invitation I was enabled to spend a very pleasant six months developing this material. These
notes partially cover a series of lectures given at the department. I would like also to thank the
staff and students whose advice and criticism had a strong influence and also Jette Milwertz
whose typing skills made the work look better than it should.
The announced “symbol-pushing” nature of our method suggests what is the truth; it is an
operational method of specifying semantics based on syntactic transformations of programs
and simple operations on discrete data. The idea is that in general one should be interested in
computer systems whether hardware or software and for semantics one thinks of systems whose
configurations are a mixture of syntactical objects – the programs – and data – such as stores or
environments. Thus in these notes we have
One wonders if this study could be generalised to other kinds of systems, especially hardware
ones.
Clearly systems have some behaviour and it is that which we wish to describe. In an opera-
tional semantics one focuses on the operations the system can perform – whether internally
or interactively with some supersystem or the outside world. For in our discrete (digital) com-
puter systems behaviour consists of elementary steps which are occurrences of operations. Such
elementary steps are called here, (and also in many other situations in Computer Science) tran-
sitions (= moves). Thus a transition steps from one configuration to another and as a first idea
we take it to be a binary relation between configurations.
Definition 1 A Transition System (ts) is (just!) a structure ⟨Γ, −→⟩ where Γ is a set (of elements, γ, called configurations) and −→ ⊆ Γ × Γ is a binary relation (called the transition relation). Read γ −→ γ′ as saying that there is a transition from the configuration γ to the configuration γ′. (Other notations sometimes seen are ⊢ and ⇒.)
A Transition
Of course this idea is hardly new and examples can be found in any book on automata or formal
languages. Its application to the definition of programming languages can be found in the work
of Landin and the Vienna Group [Lan,Oll,Weg].
Structures of the form ⟨Γ, −→⟩ are rather simple and later we will consider several more
elaborate variants, tailored to individual circumstances. For example it is often helpful to have
an idea of terminal (= final = halting) configurations.
Definition 2 A Terminal Transition System (tts) is a structure ⟨Γ, −→, T⟩ where ⟨Γ, −→⟩ is a ts, and T ⊆ Γ (the set of final configurations) satisfies ∀γ ∈ T ∀γ′ ∈ Γ. γ ̸−→ γ′.
A point to watch is to make a distinction between internal and external behaviour. Internally
a system’s behaviour is nothing but the sum of its transitions. (We ignore here the fact that
often these transitions make sense only at a certain level; what counts as one transition for one
purpose may in fact consist of many steps when viewed in more detail. Part of the spirit of our
method is to choose steps of the appropriate “size”.) However externally many of the transitions
produce no detectable effect. It is a matter of experience to choose the right definition of external
behaviour. Often two or more definitions of behaviour (or of having the same behaviour) are
possible for a given transition system. Indeed on occasion one must turn the problem around and
look for a transition system which makes it possible to obtain an expected notion of behaviour.
We recall a few familiar and not so familiar examples from computability and formal languages.
Γ = Q × Σ∗

So any configuration, γ = ⟨q, w⟩, has a state component, q, and a control component, w, for data.

⟨q, aw⟩ ⊢ ⟨q′, w⟩    (if q′ ∈ δ(q, a))
The behaviour of a finite automaton is just the set L(M ) of strings it accepts:
Of course we could also define the terminal configurations by:
T = {⟨q, ε⟩ | q ∈ F}
and then
In fact we can even get a little more abstract. Let ⟨Γ, −→, T⟩ be a tts. An input function for it is any mapping in: I −→ Γ and the language it accepts is then L(Γ) ⊆ I where:

L(Γ) = {i ∈ I | ∃γ ∈ T. in(i) −→∗ γ}

(For finite automata as above we take I = Σ∗, and in(w) = ⟨q₀, w⟩.) Thus we can easily
formalise at least one general notion of behaviour.
A transition sequence:
We have three counters, C, namely I, J and K. There are instructions, O, of the following four
types:
• Increment: inc C : m
• Decrement: dec C : m
• Zero Test: zero C : m/n
• Stop: stop
Then programs are just sequences P = O1 , . . . , Ol of instructions. Now, fixing P , the set of
configurations is:
Γ = {⟨m, i, j, k⟩ | 1 ≤ m ≤ l; i, j, k ∈ N}
Then the transition relation is defined in terms of the various possibilities by (shown for C = I; the cases for J and K are analogous):

⟨m, i, j, k⟩ ⊢ ⟨m′, i + 1, j, k⟩    (if Oₘ = inc I : m′)
⟨m, i + 1, j, k⟩ ⊢ ⟨m′, i, j, k⟩    (if Oₘ = dec I : m′)
⟨m, 0, j, k⟩ ⊢ ⟨m′, 0, j, k⟩    (if Oₘ = zero I : m′/m′′)
⟨m, i + 1, j, k⟩ ⊢ ⟨m′′, i + 1, j, k⟩    (if Oₘ = zero I : m′/m′′)
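The transition relation above is directly executable. A small OCaml interpreter (a sketch; the instruction and configuration shapes follow the text, with program positions 1-based) computes one ⊢-step, returning None at stop or stuck configurations:

    type counter = I | J | K
    type instr =
      | Inc  of counter * int        (* inc C : m    *)
      | Dec  of counter * int        (* dec C : m    *)
      | Zero of counter * int * int  (* zero C : m/n *)
      | Stop

    let get (i, j, k) = function I -> i | J -> j | K -> k
    let set (i, j, k) c v =
      match c with I -> (v, j, k) | J -> (i, v, k) | K -> (i, j, v)

    (* One transition <m, i, j, k> |- <m', i', j', k'>. *)
    let step (prog : instr array) (m, regs) =
      match prog.(m - 1) with
      | Inc (c, m') -> Some (m', set regs c (get regs c + 1))
      | Dec (c, m') ->
          let v = get regs c in
          if v > 0 then Some (m', set regs c (v - 1)) else None
      | Zero (c, m', n) -> Some ((if get regs c = 0 then m' else n), regs)
      | Stop -> None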
This transition system is deterministic, in the sense that:

∀γ, γ′, γ′′. γ −→ γ′ ∧ γ −→ γ′′ ⇒ γ′ = γ′′

or, diagrammatically:

[Diagram: from γ, two transitions lead to γ′ and γ′′, which are then equal.]

(Exercise – prove this).
The terminal configurations are:

T = {⟨m, 0, j, 0⟩ | Oₘ = stop}

and the function f computed by the program P is given by:

f(i) = j ≡def ⟨1, i, 0, 0⟩ −→∗ ⟨m, 0, j, 0⟩ ∈ T

This can be put a little more abstractly, if we take for any tts ⟨Γ, −→, T⟩ an input function, in : I −→ Γ as before and also an output function, out : T −→ O and define a partial function f_Γ : I −→ O by

f_Γ(i) = o ≡def ∃γ ∈ T. in(i) −→∗ γ ∧ out(γ) = o
Of course for this to make sense the tts must be deterministic (why?). In the case of a three-counter machine we have

I = O = N
in(i) = ⟨1, i, 0, 0⟩
out(⟨m, i, j, k⟩) = j
[Flowchart of a counter-machine program: a loop testing zero I (yes: stop), otherwise dec I and inc J and repeat.]
Γ = (N ∪ Σ)∗
Now the behaviour is just
L(G) = {w | S ⇒∗ w}
Amusingly, this already does not fit into our abstract idea for behaviours as sets (the one which worked for finite automata). The problem is that that idea was intended for acceptance, whereas here we have to do with generation (by leftmost derivations).
S → ε
S → (S)
S → SS
Transition systems in general do not give the opportunity of saying very much about any
individual transition. By adding the possibility of such information we arrive at a definition.
Definition 6 A Labelled Transition System (lts) is a structure ⟨Γ, A, −→⟩ where Γ is a set (of configurations) and A is a set (of actions (= labels = operations)) and

−→ ⊆ Γ × A × Γ
The idea of Labelled Terminal Transition Systems ⟨Γ, A, −→, T⟩ should be clear to the reader
who will also expect the following generalisation of reflexive (resp. transitive) closure. For any
lts let γ and γ′ be configurations and take x = a₁ . . . aₖ in A⁺ (resp. A∗); then:

γ −x→⁺ (resp. −x→∗) γ′ ≡def ∃γ₁, . . . , γₖ . γ −a₁→ γ₁ −a₂→ · · · −aₖ→ γₖ = γ′
• Γ = Q
• A = Σ
• q −a→ q′ ≡ q′ ∈ δ(q, a)
• T = F

Then we have L(M) = {w ∈ A∗ | ∃q ∈ T. q₀ −w→∗ q}. The example transition sequence given above now becomes simply:

p −0→ q −1→ p −0→ q −0→ r −1→ r ∈ F
Example 8 (Petri Nets) One idea of a Petri Net is just a quadruple N = ⟨B, E, F, m⟩ where B is a set of conditions, E is a set of events, F is the flow relation between them, and m is an initial case. A configuration, m, is contact-free if whenever F⁻¹(e) ⊆ m then F(e) ∩ (m \ F⁻¹(e)) = ∅.

[Net diagram: an event e with preconditions a, b and postconditions a′, b′.]
The point of this definition is that the occurrence of an event, e, is nothing more than the
ceasing-to-hold of its preconditions (= F⁻¹(e)) and the starting-to-hold of its postconditions (= F(e)) in any given case. Here a case is a set of conditions (those that hold in the case). A
contact-situation is one where this idea does not make sense. Often one excludes this possibility
axiomatically (and imposes also other intuitively acceptable axioms). We will just (somewhat
arbitrarily) regard them as “runtime errors” and take
Γ = {m ⊆ B | m is contact-free}
If two different events share a precondition in a case, then according to the above intentions they cannot both occur at once. Accordingly we define a conflict relation between events by:

e ♯ e′ ≡def e ≠ e′ ∧ F⁻¹(e) ∩ F⁻¹(e′) ≠ ∅
An event can occur from a given case if all its preconditions hold in the case. What is (much)
more, Petri Nets model concurrency in that several events (not in conflict) can occur together
in a given case. So we put
A = {X ⊆ E | ¬∃e, e′ ∈ X. e ♯ e′}

and define

m −X→ m′ ≡ F⁻¹(X) ⊆ m ∧ m′ = [m \ F⁻¹(X)] ∪ F(X)
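The concurrent step just defined can be computed directly. A small OCaml sketch (the names pre and post are invented here, giving F⁻¹(e) and F(e) for each event; X is assumed conflict-free, i.e. X ∈ A):

    module S = Set.Make (Int)

    (* m −X→ m'  ≡  F⁻¹(X) ⊆ m  ∧  m' = [m \ F⁻¹(X)] ∪ F(X) *)
    let step ~(pre : int -> S.t) ~(post : int -> S.t) (m : S.t) (x : int list) =
      let union f = List.fold_left (fun s e -> S.union s (f e)) S.empty x in
      let pre_x = union pre and post_x = union post in
      if S.subset pre_x m
      then Some (S.union (S.diff m pre_x) post_x)
      else None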
[Figure: the case of a net before and after a concurrent transition labelled {1, 4}.]

A Transition
We give no definition of behaviour as there does not seem to be any generally accepted one in
the literature. For further information on Petri Nets see [Bra,Pet].
Of course our transitions with their actions must also be thought of as kinds of events; even
more so when we are discussing the semantics of languages for concurrency. We believe there
are very strong links between our ideas and those in Net Theory, but, alas, do not have time
here to pursue them.
Example 9 (Readers and Writers) This is a (partial) specification of a Readers and Writ-
ers problem with two agents each of whom can read and write (and do some local processing)
but where the writes should not overlap.
[Net diagram for the Readers and Writers problem: two agents with local processing LP1 and LP2; a shared condition ensures that the writes do not overlap.]
To finish Chapter 1 we give an example of how to define the operational semantics of a language
by an interpreting automaton. The reader should obtain some feeling for what is possible along
these lines (see the references given above for more information), as well as a feeling that the
method is somehow a little too indirect thus paving the way for the approach taken in the next
chapter.
We begin with the Abstract Syntax of a very simple programming language called L. What is
abstract about it will be discussed a little here and later at greater length. For us syntax is a
collection of syntactic sets of phrases; each set corresponds to a different type of phrase. Some
of these sets are very simple and can be taken as given:
e ::= m | v | e + e′ | e − e′ | e ∗ e′
b ::= t | e = e′ | b or b′ | ∼b
This specification can be taken, roughly speaking, as a context-free grammar if the reader just
ignores the use of the infinite set N and the use of primes. It can also (despite appearances!) be
taken as unambiguous if the reader just regards the author as having lazily omitted brackets as
in:
b ::= t | (e = e′) | (b or b′) | (∼b)
specifying parse trees so that rather than saying ambiguously that (for example):
while b do c; c′

[Parse trees: the two readings, (while b do c); c′ and while b do (c; c′), are distinct trees.]
So we are abstract in not worrying about some lexical matters and just using for example
integers rather than numerals and in not worrying about the exact specification of phrases.
What we are really trying to do is abstract away from the problems of parsing the token strings
that really came into the computer and considering instead the “deep structure” of programs.
Thus the syntactic categories we choose are supposed to be those with independent semantic
significance; the various program constructs – such as semicolon or while . . . do . . . – are the
constructive operations on phrases that possess semantic significance.
For example contrast the following concrete syntax for (some of) our expressions (taken from
[Ten]):
Now, however convenient it is for a parser to distinguish between ⟨expression⟩, ⟨term⟩ and ⟨factor⟩, it does not make much semantic sense!
Thus we will never give semantics directly to token strings but rather to their real structure.
However, we can always obtain the semantics of token strings via parsers which we regard as
essentially just maps:
Of course it is not really so well-defined what the abstract syntax for a given language is, and
we shall clearly make good use of the freedom of choice available.
Returning to our language L we observe the following “dependency diagram”:
[Dependency diagram: C depends on B and E, and B depends on E.]
Now we define a suitable transition system whose configurations are those of the SMC-machine.
• Value Stacks is ranged over by S and is the set (T ∪ N ∪ Var ∪ BExp ∪ Com)∗
• Memories is ranged over by M and is Var −→ N
• Control Stacks is ranged over by C and is

and so a typical configuration is γ = ⟨S, M, C⟩. The idea is that we interpret commands and
produce as our interpretation proceeds, stacks C, of control information (initially a command
but later bits of commands). Along the way we accumulate partial results (when evaluating
expressions), and bits of command text which will be needed later; this is all put (for some
reason) on the value stack, S. Finally we have a model of the store (= memory) as a function
M : Var −→ N which, given a variable, v, says what its value M(v) is in the store.
So M[m/v] is the memory resulting from updating M by changing the value of v from M(v)
to m.
The transition relation, ⇒, is defined by cases according to what is on the top of the control
stack.
• Expressions
Em      ⟨S, M, m C⟩ ⇒ ⟨m S, M, C⟩
Ev      ⟨S, M, v C⟩ ⇒ ⟨M(v) S, M, C⟩
E op I  ⟨S, M, (e op e′) C⟩ ⇒ ⟨S, M, e e′ op C⟩    (op one of +, −, ∗)
E op E  ⟨m′ m S, M, op C⟩ ⇒ ⟨n S, M, C⟩    (where n = m op m′)
Note 1 The symbols +, −, ∗, are being used both as symbols of L and to stand for the
functions addition, subtraction and multiplication.
• Boolean Expressions
Bt      ⟨S, M, t C⟩ ⇒ ⟨t S, M, C⟩
B=I     ⟨S, M, (e = e′) C⟩ ⇒ ⟨S, M, e e′ = C⟩
B=E     ⟨m′ m S, M, = C⟩ ⇒ ⟨t S, M, C⟩    (where t = (m = m′))
B or I  ⟨S, M, (b or b′) C⟩ ⇒ ⟨S, M, b b′ or C⟩
B or E  ⟨t′ t S, M, or C⟩ ⇒ ⟨t′′ S, M, C⟩    (where t′′ = (t ∨ t′))
B∼I     ⟨S, M, ∼b C⟩ ⇒ ⟨S, M, b ∼ C⟩
B∼E     ⟨t S, M, ∼ C⟩ ⇒ ⟨t′ S, M, C⟩    (where t′ = ∼t)
• Commands
Now that we have at some length defined the transition relation, the terminal configurations are defined by:

T = {⟨ε, M, ε⟩}

in(C, M) = ⟨ε, M, C⟩
out(⟨ε, M, ε⟩) = M
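The expression part of the ⟨S, M, C⟩ machine fits in a few lines of OCaml. This is a sketch: the stack-item encodings and the use of an association list for M are choices of this example, not of the notes.

    type exp = Num of int | Var of string | Op of exp * string * exp
    type ctl = CExp of exp | COp of string      (* control-stack items *)

    (* One ⇒-step on <S, M, C>: S a stack of numbers, M an association list. *)
    let step (s, m, c) =
      match c, s with
      | CExp (Num n) :: c', _ -> (n :: s, m, c')                    (* Em   *)
      | CExp (Var v) :: c', _ -> (List.assoc v m :: s, m, c')       (* Ev   *)
      | CExp (Op (e, op, e')) :: c', _ ->
          (s, m, CExp e :: CExp e' :: COp op :: c')                 (* EopI *)
      | COp op :: c', n' :: n :: s' ->
          let r = match op with
            | "+" -> n + n' | "-" -> n - n' | "*" -> n * n'
            | _ -> failwith "unknown operator"
          in
          (r :: s', m, c')                                          (* EopE *)
      | _ -> failwith "terminal or stuck"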
Example 10 (Factorial)
C₀ ≡ y := 1; C, where C ≡ while ∼(x = 0) do C′ and C′ ≡ y := y ∗ x; x := x − 1.

⇒ ⟨1 y, ⟨3, 5⟩, := C⟩ by Em
⇒ ⟨ε, ⟨3, 1⟩, C⟩ by C := E
⇒ ⟨∼(x = 0) C′, ⟨3, 1⟩, ∼(x = 0) while⟩ by C while I
⇒ ⟨∼(x = 0) C′, ⟨3, 1⟩, (x = 0) ∼ while⟩ by E∼I
⇒ ⟨∼(x = 0) C′, ⟨3, 1⟩, x 0 = ∼ while⟩ by E=I
⇒ ⟨3 ∼(x = 0) C′, ⟨3, 1⟩, 0 = ∼ while⟩ by Ev
⇒ ⟨0 3 ∼(x = 0) C′, ⟨3, 1⟩, = ∼ while⟩ by Em
⇒ ⟨ff ∼(x = 0) C′, ⟨3, 1⟩, ∼ while⟩ by E=E
⇒ ⟨tt ∼(x = 0) C′, ⟨3, 1⟩, while⟩ by E∼E
⇒ ⟨ε, ⟨3, 1⟩, C′ C⟩ by C while E1
⇒ ⟨ε, ⟨3, 1⟩, y := y ∗ x x := x − 1 C⟩ by C;
⇒∗ ⟨ε, ⟨3, 3⟩, x := x − 1 C⟩
⇒∗ ⟨ε, ⟨2, 3⟩, C⟩
⇒∗ ⟨ε, ⟨1, 6⟩, C⟩
⇒∗ ⟨ε, ⟨0, 6⟩, C⟩
⇒ ⟨∼(x = 0) C′, ⟨0, 6⟩, ∼(x = 0) while⟩ by C while I
⇒∗ ⟨ff ∼(x = 0) C′, ⟨0, 6⟩, while⟩
⇒ ⟨ε, ⟨0, 6⟩, ε⟩ by C while E2
Many other machines have been proposed along these lines. It is, perhaps, fair to say that
none of them can be considered as directly formalising the intuitive operational semantics to
be found in most language definitions. Rather they are more or less clearly correct on the basis
of this intuitive understanding. Further, although this is of less importance, they all have a
tendency to pull the syntax to pieces or at any rate to wander around the syntax creating
various complex symbolic structures which do not seem particularly forced by the demands
of the language itself. Finally, they do not in general have any great claim to being syntax-
directed in the sense of defining the semantics of compound phrases in terms of the semantics of
their components, although the definition of the transition relation does fall into natural cases
following the various syntactical possibilities.
1.5 Exercises
Finite Automata
2. Suppose that δ were changed so that the labelled transition relation had instead the form:

   q −a→ q₁, q₂

and F so that F ⊆ Q × Σ. What is the new type of δ? How can binary trees like

   [a binary tree: root a, with children b and c, and with leaves d and e below]

now be accepted by M?
4. Finite automata can be turned into transducers by taking δ to be a finite set of transitions of the form:

   q −v/w→ q′

with v, w ∈ Σ∗. Define the relation q −v/w→∗ q′ and the appropriate notion of behaviour. Show any finite-state transducer can be turned into an equivalent one, where we have in any transition that 0 ≤ |v| ≤ 1.
Various Machines
5. Define k counter machines. Show that any function computable by a k counter machine
is computable by a 3-counter machine. [Hint: First program elementary functions on the
3-counter machine including pairing, pair : N2 −→ N, and selection functions, fst, snd :
N −→ N such that:
fst(pair(m, n)) = m
snd(pair(m, n)) = n
Then simulate by coding all the registers of the k counter machine by a big tuple held in
one of the registers of the 3-counter machine.]
Show that any partial-recursive function (= one computable by a Turing Machine) can
be computed by some 3-counter machine (and vice-versa).
6. Consider stack machines where the registers hold stacks and operations on a stack (= element of Σ∗) are push_a, pop, ishd_a (for each a ∈ Σ) given by:

   push_a(w) = aw
   pop(aw) = w
   ishd_a(w) = true (if w = aw′ for some w′)
             = false (otherwise)
Show stack machines compute the same functions as Turing Machines. How many stacks
are needed at most?
8. See how your favourite machines (Turing Machines, Push-Down Automata) fit into our
framework. For a general view of machines, consult the eminently readable: [Bir] or [Sco].
Look too at [Gre].
Grammars
9. For CF grammars our notion of behaviour is adapted to generation. Define a notion that
is good for acceptance. What about mixed generation/acceptance? Change the definitions
so that you get parse trees as behaviour. What is the nicest way you can find to handle
syntax-directed translation schemes?
10. Show that for LL(1) grammars you can obtain deterministic labelled (with Σ) transitions
of the form
a
w −→ w0
with w strings of terminals and non-terminals. What can you say about LL(k), LR(k)?
11. Have another look at other kinds of grammar too, e.g., Context-Sensitive, Type 0 (=
arbitrary) grammars. Discover other ideas for Transition Systems in the literature. Ex-
amples include: Tag, Semi-Thue Systems, Markov Algorithms, λ-Calculus, Post Systems,
L-Systems, Conway’s Game of Life and other forms of Cell Automata, Kleene’s Nerve
Nets . . .
Petri Nets
12. Show that if m −X→ m′ and m −X′→ m′′, where X ∪ X′ is itself an action and X ∩ X′ = ∅, then for some m′′′ we have m′ −X′→ m′′′ and m′′ −X→ m′′′, and indeed m −Y→ m′′′ where Y = X ∪ X′. This is a so-called Church-Rosser Property.
13. Show that if we have m −X→ m′ where X = {e₁, . . . , eₖ} then for some m₁, . . . , mₖ we have:

    m −{e₁}→ m₁ −{e₂}→ · · · −{eₖ}→ mₖ = m′
14. Write some Petri Nets for a parallel situation you know well (e.g., for something you knew
at home or some computational situation).
15. How can nets accept languages (= subsets of Σ∗ )? Are they always regular?
16. Find, for the Readers and Writers net given above, all the cases you can reach by transition
sequences starting at the initial case. Draw (nicely!) the graph of cases and transitions
(this is a so-called case graph).
Interpreting Automata
⟨N, Σ′, P′, S⟩ is strongly unambiguous, where Σ′ = Σ ∪ {(, )}, where the parentheses are assumed not to be in N or Σ, and where

T −→ (w) is in P′ if T −→ w is in P.
18. See what changes you should make in the definition of the interpreting automaton when
some of the following features are added:
c ::= if b then c |
      case e of e₁ : c₁
                . . .
                eₖ : cₖ
      end |
      for v := e, e′ do c |
      repeat c until b
19. Can you handle constructions that drastically change the flow of control such as:
21. Can you add facilities to the automaton to handle run-time errors?
22. Can you produce measures of time/space complexity by adding extra components to the
automaton?
24. What about real-time? That is suppose we had the awful expression:
e ::= time
25. Treat the following PASCAL subset. The basic sets are T, N and x ∈ I × {i, r, b} – the
set of typical identifiers (which is infinite) and o ∈ O – the set { =, <>, <, <=, >, >=,
+, −, ∗, /, div, mod, and } of operations. The idea for typical identifiers is that i, r, b
are type symbols for integer, real and boolean respectively and so ⟨FRED, r⟩ is the real
identifier FRED.
The derived sets are expressions and commands where:
e ::= m | t | v | −e | not e | e o e′
c ::= nil | v := e | c; c′ | if e then c else c′ | while e do c
The point of the question is that you must think about compile-time type-checking and
the memories used in the ⟨S, M, C⟩ machine should be finite (even although there are
potentially infinitely many identifiers).
s ::= i | r | b
c ::= var v : s begin c end
2 Bibliography
3 Simple Expressions and Commands
The ⟨S, M, C⟩ machine emphasises the idea of computation as a sequence of transitions involving simple data manipulations; further the definition of the transitions falls into simple cases according to the syntactic structure of the expression or command on top of the control stack. However, many of the transitions are of little intuitive importance, contradicting our idea of the right choice of the "size" of the transitions. Further the definition of the transitions is not syntax-directed so that, for example, the transitions of c; c′ are not directly defined in terms of those for c and those for c′. Finally, and most important, the ⟨S, M, C⟩ machine is not a formalisation of intuitive operational ideas but is rather, fairly clearly, correct given these intuitive ideas.
In this chapter we develop a method designed to answer these objections, treating simple
expressions and commands as illustrated by the language L. We consider run-time errors and
say a little on how to establish properties of transition relations. Finally we take a first look at
simple type-checking.
Let us consider first the very simple subset of expressions given by:
e ::= m | e₀ + e₁
and how the ⟨S, M, C⟩ machine deals with them. For example we have the transition sequence for the expression (1 + (2 + 3)) + (4 + 5):
In these 13 transitions only the 4 additions marked (∗) are of any real interest as system events.
Further the intermediate structures generated on the stacks are also of little interest. Preferable
would be a sequence of 4 transitions on the expression itself thus:
(1 + [2 + 3]) + (4 + 5) −→ [1 + 5] + (4 + 5)
                        −→ 6 + [4 + 5]
                        −→ [6 + 9]
                        −→ 15
where we are ignoring the memory and we have marked the occurrences of the additions in each
transition. (These transition sequences of expressions are often called reduction sequences (=
derivations) and the occurrences are called redexes; this notation originates in the λ-calculus
(see, e.g., [Hin]).)
Now consider an informal specification of this kind of expression evaluation. Briefly one might
just say one evaluates from left-to-right. More pedantically one could say:
(3) Add m₀ to m₁ obtaining m₂, say, as result.
This finishes the evaluation and m₂ is the result of the evaluation.
Note that this specification is syntax-directed, and we use it to obtain rules for describing steps
(= transitions) of evaluation which we think of as nothing else than a derivation of the form:
e = e₁ −→ e₂ −→ · · · −→ eₙ₋₁ −→ eₙ = m
(where m is the result). Indeed if we just look at the first step we see from the above specification
that
(1) If e₀ is not a constant the first step of the evaluation of e₀ + e₁ is the first step of the evaluation of e₀.

(2) If e₀ is a constant, but e₁ is not, the first step of the evaluation of e₀ + e₁ is the first step of the evaluation of e₁.

(3) If e₀ and e₁ are constants the first (and last!) step of the evaluation of e₀ + e₁ is the addition of e₀ and e₁.
Clearly too the first step of evaluating an expression, e, can be taken as resulting in an expression e′ with the property that the evaluation of e is the first step followed by the evaluation of e′.
We now put all this together to obtain rules for the first step. These are rules for establishing binary relationships of the form e −→ e′.

Rules: Sum

(1)  e₀ −→ e₀′
     ----------------------
     e₀ + e₁ −→ e₀′ + e₁

(2)  e₁ −→ e₁′
     ----------------------
     m₀ + e₁ −→ m₀ + e₁′

(3)  m₀ + m₁ −→ m₂    (if m₂ is the sum of m₀ and m₁)
Thus, for example, rule 1 states what is obvious from the above discussion:
If e₀′ is the result of the first step of the evaluation of e₀ then e₀′ + e₁ is the result of the first step of the evaluation of e₀ + e₁.
We now take these rules as a definition of what relationships hold – namely exactly those we can establish from the rules. We take the above discussion as showing why this mathematical definition makes sense from an intuitive view; it is the direct formalisation referred to above.
(1 + (2 + 3)) + (4 + 5) −→ (1 + 5) + (4 + 5)
To establish this step we have
1. 2 + 3 −→ 5 (By rule 3)
2. 1 + (2 + 3) −→ 1 + 5 (By rule 2)
3. (1 + (2 + 3)) + (4 + 5) −→ (1 + 5) + (4 + 5) (By rule 1)
Rather than this unnatural “bottom-up” method we usually display these little proofs in the
“top-down” way they are actually “discovered”. The arrow is supposed to show the “direction”
of discovery:
Sum 1: (1 + (2 + 3)) + (4 + 5) −→ (1 + 5) + (4 + 5)
  discovered from
Sum 2:   1 + (2 + 3) −→ 1 + 5
  discovered from
Sum 3:     2 + 3 −→ 5
Thus, while the evaluation takes four steps, the justification (proof) of each step has a certain size of its own (which need not be displayed). In this light the ⟨S, M, C⟩ machine can be viewed as mixing-up the additions with the reasons why they should be performed into one long linear sequence.
It could well be argued that our formalisation is not really that direct. A more direct approach
would be to give rules for the transition sequences themselves (the evaluations). For the intuitive
specification refers to these evaluations rather than any hypothetical atomic actions from which
they are composed. However, axiomatising a step is intuitively simpler, and we prefer to follow
a simple approach until it leads us into such difficulties that it is better to consider whole
derivations.
Another point concerns the lack of formalisation of our ideas. The above rules are easily turned
into a formal system of formulae, axioms and rules. What we would want is a sufficiently
elastic conception of a range of such formal systems which on the one hand allows the natural
expression of all the systems of rules we wish, and on the other hand returns some profit in
the form of interesting theorems about such systems or interesting computer systems based on
such systems. However, the present work is too exploratory for us to fix our ideas, although we
may later try out one or two possibilities. We also fear that introducing such formalities could
easily lead us into obscurities in the presentation of otherwise natural ideas.
Now we try out more expressions. To evaluate variables we need the memory component of the
⟨S, M, C⟩ machines – indeed that is the only "natural" component they have! It is convenient
here to change our notation to a more generally accepted one:
OLD                       NEW
Memory                    Store
Memories = (Var −→ N)     Stores = (Var −→ N) = S
M                         σ
M[m/v]                    σ[m/v]
3.1.1 L-Expressions
e ::= m | v | (e + e′) | (e − e′) | (e ∗ e′)

Γ = {⟨e, σ⟩}

⟨e, σ⟩ −→ ⟨e′, σ⟩

meaning one step of the evaluation of e (with store σ) results in the expression e′ (with store σ). The rules are just those we already have, adapted to take account of stores, plus an obvious rule producing the value of a variable in a store.
Rules: Sum
(1)  ⟨e₀, σ⟩ −→ ⟨e₀′, σ⟩
     --------------------------------
     ⟨e₀ + e₁, σ⟩ −→ ⟨e₀′ + e₁, σ⟩

(2)  ⟨e₁, σ⟩ −→ ⟨e₁′, σ⟩
     --------------------------------
     ⟨m + e₁, σ⟩ −→ ⟨m + e₁′, σ⟩

(3)  ⟨m + m′, σ⟩ −→ ⟨n, σ⟩    (where n = m + m′)

Minus
1,2. Exercise for the reader.
3.   ⟨m − m′, σ⟩ −→ ⟨n, σ⟩    (if m ≥ m′ and n = m − m′)

Times
1,2,3. Exercise for the reader.

Variable
(1)  ⟨v, σ⟩ −→ ⟨σ(v), σ⟩
Note the two uses of the symbol, +, in rule Sum 3: one as a syntactic construct and one for the addition function. We will often overload symbols in this way relying on the context for disambiguation. So here, for example, to make sense of n = m + m′ we must be meaning addition as the left-hand-side of the equation denotes a natural number.

Of course the terminal configurations are those of the form ⟨m, σ⟩, and m is the result of the evaluation. Note that there are configurations such as:

γ = ⟨5 + (7 − 11), σ⟩

which are not terminal but from which no transition is possible (7 − 11 being undefined over the natural numbers). In most programming languages these stuck configurations result in run-time errors. These will be considered below.

The behaviour of expressions is the result of their evaluation and is defined by:

eval(e, σ) = m ≡def ⟨e, σ⟩ −→∗ ⟨m, σ⟩

The reader will see (from 2.3 below, if needed) that eval is a well-defined partial function.
b ::= t | b or b′ | e = e′ | ∼b
Here we take Γ = {⟨b, σ⟩} and consider the rules for the transition relation. There are clearly none for truth-values, t, but there are several possibilities for disjunctions, b or b′. These possibilities differ not only in the order of the transitions, but even on which transitions occur. The configurations are pairs ⟨b, σ⟩.
A. Complete Evaluation: This is just the Boolean analogue of our rules for expressions and corresponds to the method used by our SMC-machine.

(1)  ⟨b₀, σ⟩ −→ ⟨b₀′, σ⟩
     --------------------------------
     ⟨b₀ or b₁, σ⟩ −→ ⟨b₀′ or b₁, σ⟩

(2)  ⟨b₁, σ⟩ −→ ⟨b₁′, σ⟩
     --------------------------------
     ⟨t or b₁, σ⟩ −→ ⟨t or b₁′, σ⟩

(3)  t or t′ −→ t′′    (where t′′ = t ∨ t′)

B. Left-Sequential Evaluation: This takes advantage of the fact that it is not necessary to evaluate b in tt or b, as the result will be tt independently of the result of evaluating b.

(1)  ⟨b₀, σ⟩ −→ ⟨b₀′, σ⟩
     --------------------------------
     ⟨b₀ or b₁, σ⟩ −→ ⟨b₀′ or b₁, σ⟩

(2)  ⟨tt or b₁, σ⟩ −→ ⟨tt, σ⟩

(3)  ⟨ff or b₁, σ⟩ −→ ⟨b₁, σ⟩

C. Right-Sequential Evaluation: Like B but "backwards".

D. Parallel Evaluation: This tries to combine the advantages of B and C by evaluating b₀ and b₁ in parallel. In practice that would mean having two processors, one for b₀ and one for b₁, or using one but interleaving, somehow, the evaluations of b₀ and b₁. This idea is therefore not found in the usual sequential programming languages (as opposed to those making explicit provisions for concurrency). However, it may be useful for hardware specification.

(1)  ⟨b₀, σ⟩ −→ ⟨b₀′, σ⟩
     --------------------------------
     ⟨b₀ or b₁, σ⟩ −→ ⟨b₀′ or b₁, σ⟩

(2)  ⟨b₁, σ⟩ −→ ⟨b₁′, σ⟩
     --------------------------------
     ⟨b₀ or b₁, σ⟩ −→ ⟨b₀ or b₁′, σ⟩

(3)  ⟨tt or b₁, σ⟩ −→ ⟨tt, σ⟩
(4)  ⟨b₀ or tt, σ⟩ −→ ⟨tt, σ⟩
(5)  ⟨ff or b₁, σ⟩ −→ ⟨b₁, σ⟩
(6)  ⟨b₀ or ff, σ⟩ −→ ⟨b₀, σ⟩
The above evaluation mechanisms differ greatly when subexpressions can have non-terminating
evaluations, in which case we have the following relationships:
B⇐A
⇓ ⇓
D⇐C
where X ⇒ Y means that if method X terminates with result t, so does method Y. We take
method A for the semantics of our example language L.
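The difference between methods A and B is easy to see if one writes the two one-step relations
as functions. The following Haskell sketch (names ours; stores are omitted, as they play no
role here) does so for disjunctions:

    data BExp = T Bool | Or BExp BExp deriving Show

    stepA :: BExp -> Maybe BExp                      -- complete evaluation
    stepA (Or (T t) (T t')) = Just (T (t || t'))     -- rule A.3
    stepA (Or (T t) b1)     = Or (T t) <$> stepA b1  -- rule A.2
    stepA (Or b0 b1)        = (`Or` b1) <$> stepA b0 -- rule A.1
    stepA (T _)             = Nothing                -- terminal

    stepB :: BExp -> Maybe BExp                      -- left-sequential
    stepB (Or (T True)  _)  = Just (T True)          -- rule B.2: b1 ignored
    stepB (Or (T False) b1) = Just b1                -- rule B.3
    stepB (Or b0 b1)        = (`Or` b1) <$> stepB b0 -- rule B.1
    stepB (T _)             = Nothing

With this finite syntax every evaluation terminates; the relationships between the methods
only become visible in richer settings where the evaluation of b1 may fail to terminate.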
For Boolean expressions of the form e = e0 our rules depend on those for expressions, but
otherwise are normal (and for brevity we omit the σ’s).
• Equality
e0 −→ e00
(1)
e0 = e1 −→ e00 = e1
e1 −→ e01
(2)
m = e1 −→ m = e01
(3) m = n −→ t (where t is tt if m = n and ff otherwise)
• Negation
b −→ b0
(1)
∼b −→ ∼b0
(2) ∼t −→ t0 (where t0 = ¬t)
c ::= nil | v := e | c; c0
31
and see how the SMC-machine behaves on an example:
And we see that of the eleven transitions only three – the assignments – are of interest as
system events.
Preferable here would be a sequence of three transitions on configurations of the form hc, σi,
thus:
hz := x; (x := y; y := z), abci −→ h(x := y; y := z), abai
−→ hy := z, bbai
−→ baa
• Nil: To execute nil from store σ take no action and terminate with σ as the final store of
the execution.
• Assignment: To execute v := e from store σ evaluate e, and if the result is m, change σ to
σ[m/v] (the final store of the execution).
• Composition: To execute c; c0 from store σ
(1) Execute c from store σ obtaining a final store, σ 0 , say, if this execution terminates.
(2) Execute c0 from the store σ 0 . The final store of this execution is also the final store of the
execution of c; c0 .
[Diagram: execute c, then c0 , the final store of c feeding into c0 .]
As in the case of expressions one sees that this description is syntax-directed. We formalise it by
considering terminating executions of a command c from a store σ to be transition sequences
of the form hc, σi −→ · · · −→ σ 0 , where the terminal configurations are the stores:
T = {σ}
One step of execution of the command c from the store σ results in the store σ 0 , and the rest
of the execution of c is the execution of c0 from σ 0 (resp., the execution terminates).
Thus we choose c0 to represent, in as simple a way as is available, the remainder of the execution
of c after its first step. The rules are
• Nil: hnil, σi −→ σ
• Assignment:
he, σi −→∗ hm, σi
(1)
hv := e, σi −→ σ[m/v]
• Composition:
hc0 , σi −→ hc00 , σ 0 i
(1)
hc0 ; c1 , σi −→ hc00 ; c1 , σ 0 i
hc0 , σi −→ σ 0
(2)
hc0 ; c1 , σi −→ hc1 , σ 0 i
Note: In formulating the rule for assignment we have considered the entire evaluation of the
right-hand-side as part of one execution step. This corresponds to a change in view of the size
of our step when considering commands, but we could just as well have chosen otherwise.
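A transcription of these command rules into the same style may help. In this Haskell sketch
(names ours) a configuration steps either to a final store (Left) or to an intermediate
configuration (Right), and, following the Note above, the whole evaluation of the right-hand
side of an assignment is one step:

    import qualified Data.Map as Map

    type Var   = String
    type Store = Map.Map Var Integer

    data Exp = Num Integer | V Var | Add Exp Exp   -- a fragment of L-expressions
    data Com = Nil | Assign Var Exp | Seq Com Com

    evalExp :: Store -> Exp -> Integer             -- stands for <e,s> -->* <m,s>
    evalExp _ (Num n)   = n
    evalExp s (V v)     = s Map.! v
    evalExp s (Add a b) = evalExp s a + evalExp s b

    stepC :: (Com, Store) -> Either Store (Com, Store)
    stepC (Nil, s)        = Left s                                 -- Nil
    stepC (Assign v e, s) = Left (Map.insert v (evalExp s e) s)    -- Assignment 1
    stepC (Seq c0 c1, s)  = case stepC (c0, s) of
        Left s'         -> Right (c1, s')                          -- Composition 2
        Right (c0', s') -> Right (Seq c0' c1, s')                  -- Composition 1

Iterating stepC from hz := x; (x := y; y := z), σi then produces exactly the three-transition
sequence displayed earlier.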
As an example consider the first transition desired above for the execution
hz := x; (x := y; y := z), abci
Again we see, as in the case of expressions a “two-dimensional” structure consisting of a “hor-
izontal” transition sequence of the events of system significance and for each transition a “ver-
tical” explanation of why and how it occurs.
γ0 −→ γ1 −→ · · · −→ γn ∈ T
[Diagram: beneath each transition hangs a triangle representing the proof of why it occurs.]
Again we see that the SMC-machine transition sequences are more-or-less linearisations of these
structures. Note the appearance of rules for binary relations (with additional data components)
such as:
R(c, c0 , σ, σ 0 ) =def hc, σi −→ hc0 , σ 0 i
S(e, e0 , σ) =def he, σi −→ he0 , σi
Later we shall make extensive use of predicates to treat the context-sensitive aspects of syntax
(= the static aspects of semantics). As far as we can see there is no particular need for ternary
relations, although the above discussion on the indirectness of our formalisation does suggest
the possibility of needing relations of variable degree for dealing with execution sequences.
3.3 L-commands
2.2 If result was ff execute c0 from σ.
In pictures we have:
[Flowchart: evaluate the condition; the tt branch leads to c, the ff branch to c0 .]
And the rules are: ∗
hb, σi −→ htt, σi
(1)
hif b then c else c0 , σi −→ hc, σi
hb, σi −→∗ hff, σi
(2)
hif b then c else c0 , σi −→ hc0 , σi
Note: Again we are depending on the transition relation of another syntactic class – here
Boolean expressions – and a whole computation from that class becomes one step of the com-
putation.
Note: No rules for T (if b then c else c0 ) are given as that predicate never applies: a
conditional is never terminal, as one always has at least one action – namely evaluating the
condition.
[Flowchart: evaluate the condition; the tt branch leads through c and back to the test, the ff
branch exits.]
Example 12 Consider the factorial example y := 1; w from Chapter 1, where w = while ∼(x =
0) do c and c = (y := y ∗ x; x := x − 1). We start from the state h3, 5i.
hy := 1; w, h3, 5ii −→ hw, h3, 1ii (ASS1)
−→ hc; w, h3, 1ii (COMP2, via WHI)
−→ hx := x − 1; w, h3, 3ii (COMP1, via COMP2 and ASS1)
−→ hw, h2, 3ii (COMP2)
−→ hc; w, h2, 3ii (COMP2, via WHI)
−→ hx := x − 1; w, h2, 6ii (COMP1, via COMP2 and ASS1)
−→ hw, h1, 6ii (COMP1)
−→ hc; w, h1, 6ii
−→ hx := x − 1; w, h1, 6ii
−→ hw, h0, 6ii
−→ h0, 6i (COMP2, via WHI2)
w −→ c; w −→ . . . −→ w −→ c; w −→ . . . −→ w − − − . . . −→ w −→ ·
b −→∗ tt c −→ . . . −→ · b −→∗ tt c −→ . . . −→ · b − − − . . . −→ · b −→∗ ff
[Diagram: beneath each transition of this horizontal sequence hangs a proof triangle showing
how it is derived.]
One can now define the behaviour of commands by:
exec(c, σ) = σ 0 iff hc, σi −→∗ σ 0
and the equivalence of commands by:
c0 ≡ c1 iff for all σ, exec(c0 , σ) ≃ exec(c1 , σ)
where we are using Kleene equality, ≃, which means that one side is defined iff the other is, and
in that case they are both equal.
Although we have no particular intention of proving very much either about or with our oper-
ational semantics, we would like to introduce enough mathematical apparatus to enable us to
establish the truth of such obvious statements as:
The standard tool is the principle of Structural Induction (SI). It enables us to prove properties
P (p) of syntactic phrases, and it takes on different forms according to the abstract syntax of the
language. For L we have three such principles, one for expressions, one for Boolean expressions
and one for commands.
We take this principle as being intuitively obvious. It can be stated more compactly by using
standard logical notation:
[∀m. P (m) ∧ ∀v. P (v) ∧ (∀e0 , e1 . P (e0 ) ∧ P (e1 ) ⊃ P (e0 + e1 ) ∧ P (e0 − e1 ) ∧ P (e0 ∗ e1 ))] ⊃ ∀e. P (e)
As an example we prove
Now there are five cases according to the hypotheses necessary to establish the conclusion by
SI.
In the above we did not need such a strong induction hypothesis. Instead we could choose a
fixed σ and proceed by SI on Q(e) where:
However, this is just a matter of luck (here, that the evaluation of expressions does not
side-effect the state). Generally it is wise to choose one’s induction hypothesis as strong as possible.
The point is that if one’s hypothesis has the form (for example)
P (e) ≡ ∀σ. Q(e, σ)
then when proving P (e0 + e1 ) given P (e0 ) and P (e1 ) one fixes σ and tries to prove Q(e, σ). But
in this proof one is at liberty to use the facts Q(e0 , σ), Q(e0 , σ 0 ), Q(e1 , σ), Q(e1 , σ 00 ) for any σ 0
and σ 00 .
We just write down the symbolic version for a desired property P (b) of Boolean expressions.
In general when applying this principle one may need further structural inductions on expres-
sions. For example:
Continuing with case 2.1 we see that he = e0 , σi −→ he00 = e0 , σi so hb, σi is not stuck.
Case 2.2 Here e is in N but e0 is not; the proof is much like case 2.1 and also uses the
lemma.
Case 2.3 Here e, e0 are in N and we can use rule EQU. 3.
Case 3 b = (b0 or b1 ) This is like case 3 of the proof of fact 1.
Case 4 b = ∼b0 If b0 is not in T we can easily apply the induction hypothesis. Otherwise
use rule NEG. 2.
This concludes all the cases and hence the proof.
SI for Commands
We just write down the symbolic version for a (desired) property P (c) of commands:
[P (nil) ∧ ∀v ∈ Var, e ∈ E. P (v := e)
∧ (∀c, c0 ∈ C. P (c) ∧ P (c0 ) ⊃ P (c; c0 ))
∧ (∀b ∈ B.∀c, c0 ∈ C. P (c) ∧ P (c0 ) ⊃ P (if b then c else c0 ))
∧ (∀b ∈ B. ∀c ∈ C. P (c) ⊃ P (while b do c))]
⊃ ∀c ∈ C. P (c)
Fact 15 If v does not occur on the left-hand-side of an assignment in c, then the execution of
c cannot affect its value. That is if hc, σi −→∗ σ 0 then σ(v) = σ 0 (v).
PROOF. By SI on commands. The statement of the hypothesis should be apparent from the
proof, and is left to the reader.
Case 4 c = if b then c0 else c1 Here we easily use the induction hypothesis on c0 and c1
(according to the outcome of the evaluation of b).
Case 5 c = while b do c0 Here we argue on the length of the transition sequence hc, σi −→
. . . −→ σ 0 . This is just an ordinary mathematical induction. In case the sequence has length
0, we have σ 0 = σ. Otherwise there are two cases according to the result of evaluating b. We
just look at the harder one.
Case 5.1 hc, σi −→ hc0 ; c, σi −→ . . . −→ σ1 . Here we see that hc0 , σi −→∗ σ2 (and apply
the main SI hypothesis) and also that hc, σ2 i −→∗ σ1 by a shorter transition sequence, to
which the induction hypothesis can therefore be applied.
This particular lemma shows that on occasion we will use other induction principles such as
induction on the length of a derivation sequence.
Another possibility is to use induction on some measure of the size of the proof of an assertion
γ −→ γ 0 (which would, strictly speaking, require a careful definition of the size measure).
Anyway we repeat that we will not develop too much “technology” for making these proofs,
but would like the reader to be able, in principle, to check out simple facts.
Although we arranged that
γ ∈ T ⊃ ¬∃γ 0 . γ −→ γ 0
we did not ensure the converse: a configuration may be non-terminal and yet stuck.
Implementations of real programming languages will generally ensure the converse by issuing a
run-time ( = dynamic) error report and forcibly terminating the computation. It would
therefore be pleasant if we could also specify dynamic errors.
• Expressions
· Sum
he0 , σi −→ error
4.
he0 + e1 , σi −→ error
he1 , σi −→ error
5.
hm + e1 , σi −→ error
· Minus
4,5 as for Sum
6. hm − m0 , σi −→ error (if m < m0 )
· Times
4,5 as for Sum
• Boolean Expressions
· Disjunction
4,5 as for Sum
· Equality
4,5 as for Sum
· Negation
hb, σi −→ error
3.
h∼b, σi −→ error
• Commands
· Assignment
he, σi −→ error
2.
hv := e, σi −→ error
· Composition
hc0 , σi −→ error
3.
hc0 ; c1 , σi −→ error
· Conditional
hb, σi −→∗ error
3.
hif b then c else c0 , σi −→ error
· Repetition
hb, σi −→∗ error
3.
hwhile b do c, σi −→ error
So the only possibility of dynamic errors in L arises from the subtraction of a greater from a
smaller. Of course other languages can provide many other kinds of dynamic errors: division by
zero, overflow, taking the square root of a negative number, failing dynamic type-checking tests,
overstepping array bounds, following a dangling reference or reaching an uninitialised location,
and so on. But the above simple example does at least indicate a possibility.
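The error rules all have the same propagating shape, which is exactly the behaviour of the
Either monad; the following Haskell fragment (ours, not from the notes) shows the idea for
subtraction:

    data Err = Error deriving Show

    -- rule Minus 6: subtraction over N errs when m < m'
    subN :: Integer -> Integer -> Either Err Integer
    subN m m' = if m >= m' then Right (m - m') else Left Error

    -- rules Sum 4 and 5 are just the propagation that (>>=) performs:
    example :: Either Err Integer
    example = do
      x <- subN 7 11        -- raises Error
      pure (5 + x)          -- never reached: the error propagates outwards

So example = Left Error, matching the transition h5 + (7 − 11), σi −→ error.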
e ::= m | t | v | e0 bop e1 | ∼e
Note: We have taken Var to be infinite in the above in order to raise a little problem (later)
on how to avoid infinite memories.
Many expressions such as (tt + 5) or ∼6 now make no sense, and nor do such commands
as if x or 5 then c0 else c1 . To make sense an expression must have a type, and in L0 there are
exactly two possibilities:
• Types: τ ∈ Types = {nat, bool}
To see which expressions have types and what they are we will just give some rules for assertions:
Note first that the basic syntactic sets have, in a natural way, associated type information.
Clearly we will have truth-values having type bool, numbers having type nat, variables having
type nat and for each binary operation, bop, we have a partial binary function τbop on Types:
• Rules
Truth-values: t : bool
Numbers: m : nat
Variables: v : nat
e0 : τ0 e1 : τ1
Binary Operations: (where τ2 = τbop (τ0 , τ1 ))
e0 bop e1 : τ2
e : bool
Negation:
∼e : bool
Now for commands we need to sort out those commands which are well-formed in the sense that
all subexpressions have a type and are Boolean when they ought to be. The rules for commands
involve assertions:
Nil: Wfc(nil)
e : nat
Assignment:
Wfc(v := e)
Wfc(c0 ) Wfc(c1 )
Sequencing:
Wfc(c0 ; c1 )
e : bool Wfc(c0 ) Wfc(c1 )
Conditional:
Wfc(if e then c0 else c1 )
e : bool Wfc(c)
While:
Wfc(while e do c)
Of course all of this is really quite trivial and one could have separated out the Boolean ex-
pressions very easily in the first place, as was done with L. However, we will see that the
method generalises to the context-sensitive aspects, also referred to in the literature as the
static semantics.
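Because the rules are syntax-directed, they transcribe directly into a checking function. Here is
a Haskell sketch (names ours, with binary operations represented by strings) of the typing
relation and the well-formedness predicate Wfc:

    data Ty = Nat | Bool deriving (Eq, Show)

    data Exp = Num Integer | TV Bool | V String
             | Bop String Exp Exp | Not Exp
    data Com = Nil | Assign String Exp | Seq Com Com
             | If Exp Com Com | While Exp Com

    tyBop :: String -> Ty -> Ty -> Maybe Ty      -- the partial function tau_bop
    tyBop op Nat Nat | op `elem` ["+", "-", "*"] = Just Nat
                     | op == "="                 = Just Bool
    tyBop "or" Bool Bool                         = Just Bool
    tyBop _    _    _                            = Nothing

    ty :: Exp -> Maybe Ty
    ty (Num _)      = Just Nat                   -- m : nat
    ty (TV _)       = Just Bool                  -- t : bool
    ty (V _)        = Just Nat                   -- v : nat
    ty (Bop op a b) = do ta <- ty a              -- binary operations
                         tb <- ty b
                         tyBop op ta tb
    ty (Not e)      = do Bool <- ty e            -- negation
                         pure Bool

    wfc :: Com -> Bool
    wfc Nil          = True
    wfc (Assign _ e) = ty e == Just Nat
    wfc (Seq c0 c1)  = wfc c0 && wfc c1
    wfc (If e c0 c1) = ty e == Just Bool && wfc c0 && wfc c1
    wfc (While e c)  = ty e == Just Bool && wfc c

For example wfc (If (Bop "=" (V "x") (Num 0)) Nil Nil) is True, while an assignment of a
Boolean to a variable is rejected.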
Turning to the dynamic semantics we want now to avoid configurations hc, σi with σ : Var −→
N, as such stores are infinite objects. For we have more or less explicitly indicated that we are
doing (hopefully nice) finitary mathematics. The problem is easily overcome by noting that we
only need σ to give values for all the variables in C, and there are certainly only finitely many
such variables. Consequently for any finite subset V of Var we set:
StoresV = V −→ N
and, for example, ΓE,V = {he, σi | e has a type, Var(e) ⊆ V, σ ∈ StoresV } (and similarly ΓC,V
for commands), where Var(e) is the set of variables occurring in e. The rules are much the same
as before, formally speaking. That is they are the same as before but with the variables and
metavariables ranging over the appropriate sets and an added index. So for example in the rule
hc0 , σi −→V σ 0
Comp 2
hc0 ; c1 , σi −→V hc1 , σ 0 i
it is meant that c0 , c1 (and hence c0 ; c1 ) are well formed commands with their variables all in
V and all of the configurations mentioned in the rule are in ΓC,V . Similarly, in the rule
he0 , σi −→V he00 , σi
Sum 1
he0 + e1 , σi −→V he00 + e1 , σi
it is meant that all the expressions e0 , e00 , e0 + e1 , e00 + e1 have a type (which must here be
nat) and all their variables are in V and all the configurations mentioned in the rule are in
ΓE,V . Thus the rules define families of transition relations, −→V ⊆ ΓE,V × ΓE,V for expressions,
−→V ⊆ ΓC,V × ΓC,V for commands.
In the above we have taken the definition of Var(e), the variables occurring in e and also of
Var(c) for granted as it is rather obvious what is meant. However, it is easily given by a so-called
definition by structural induction.
Var(t) = Var(m) = ∅
Var(v) = {v}
Var(e0 bop e1 ) = Var(e0 ) ∪ Var(e1 )
Var(∼e) = Var(e)
With this kind of syntax-directed definition what is meant is that it can easily be shown by
SI that the above equations ensure that for any e there is only one V with Var(e) = V . The
definition for commands is similar and is left to the reader; the only point of (very slight)
interest is the definition of Var(v := e).
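Such a definition by structural induction is precisely a definition by structural recursion, as
this small Haskell rendering (ours) makes plain:

    import qualified Data.Set as Set

    data Exp = TV Bool | Num Integer | V String
             | Bop String Exp Exp | Not Exp

    vars :: Exp -> Set.Set String
    vars (TV _)      = Set.empty                    -- Var(t) = {}
    vars (Num _)     = Set.empty                    -- Var(m) = {}
    vars (V v)       = Set.singleton v              -- Var(v) = {v}
    vars (Bop _ a b) = vars a `Set.union` vars b    -- Var(e0 bop e1)
    vars (Not e)     = vars e                       -- Var(~e) = Var(e)

That each e has exactly one V with Var(e) = V corresponds to vars being a total function.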
The definition can also be cast in the form of rules for assertions of the form Var(e) = V .
Truth-values: Var(t) = ∅
Numbers: Var(m) = ∅
Variables Var(v) = {v}
Var(e0 ) = V0 Var(e1 ) = V1
Binary Operations:
Var(e0 bop e1 ) = V0 ∪ V1
Var(e) = V
Negation:
Var(∼e) = V
Finally we have a parametrical form of behaviour. For example for commands we have a partial
function taking each configuration hc, σi in ΓC,V to the final store of its execution, if any.
Turning to static errors, the point here is to specify failures in the type-checking mechanism.
Here are some rules for a very crude specification where one just adds a new predicate Error.
• Binary Operations
Error(e0 )
(1)
Error(e0 bop e1 )
Error(e1 )
(2)
Error(e0 bop e1 )
e0 : τ0 e1 : τ1
(3) (if τbop (τ0 , τ1 ) is undefined)
Error(e0 bop e1 )
• Negation
Error(e)
Error(∼e)
• Assignment
Error(e)
(1)
Error(v := e)
e : bool
(2)
Error(v := e)
• Sequencing
Error(c0 )
(1)
Error(c0 ; c1 )
Error(c1 )
(2)
Error(c0 ; c1 )
• Conditional
Error(e)
(1)
Error(if e then c0 else c1 )
Error(c0 )
(2)
Error(if e then c0 else c1 )
Error(c1 )
(3)
Error(if e then c0 else c1 )
e : nat
(4)
Error(if e then c0 else c1 )
• While
Error(e)
(1)
Error(while e do c)
Error(c)
(2)
Error(while e do c)
e : nat
(3)
Error(while e do c)
3.8 Exercises
Expressions
2. Write down rules for the right-to-left evaluation of expressions, as opposed to the left-to-
right evaluation described above.
3. Write down rules for the parallel evaluation of expressions, so that the following kind of
transition sequence is possible:
Here one transition is one action of imaginary processors situated just above the leaves
of the expressions (considered as a tree).
4. Note that in the rules if he, σi −→ he0 , σ 0 i then σ 0 = σ. This is the mathematical coun-
terpart of the fact that evaluation of L-expressions produces no side-effects. Rephrase the
rules for L-expressions in terms of relations σ ` e −→ e0 where σ ` e −→ e0 means that
he, σi −→ he0 , σi and can be read as “given σ, e reduces to e0 ”.
5. Give rules for “genuine” parallel evaluation where one or more processors as imagined in
3 can perform an action during the same transition. [Hint: Use the idea of exercise 4.]
∗∗
6. Try to develop a method of axiomatising entire derivation sequences. Can you find any
advantages for this idea?
Boolean Expressions
7. Can you find various kinds of rules analogous to those for or for conjunctions b and b0 ?
By the way, the left-sequential construct is often advantageous to avoid array subscripts
going out of range as in:
Presumably you will have given rules for the usual sequential conditional. Can you find
and give rules for a parallel conditional analogous to parallel disjunction?
9. Treat the following additions to the syntax which introduce the possibilities of side-effects
in the evaluation of expressions:
e ::= (v := e)
where the intention is that the value of (v := e) is the value of e but the assignment also
occurs, producing a side-effect in general.
10. Show that the equivalence relations on expressions and boolean expressions are respected
by the program constructs discussed above so that for example:
Commands
11. Add to the language the construct
v+ := e
and give rules so that
(v+ := e) ≡ (v := v + e)
12. Add multiple assignments
v1 := (v2 := . . . (vn := e) . . .)
13. Add simultaneous assignments
v1 := e1 and . . . and vn := en
where the vi must all be different. Execution of this command consists of first evaluating
all the expressions and then performing the assignments.
14. Consider variants of the conditional command and show they can all be eliminated (to
within equivalence) in favour of the ordinary conditional.
15. Add iterative constructs such as
do e times c
loop
c1
when b1 do c01 exit
c2
..
.
when bn do c0n exit
cn+1
repeat
where the last construct has n possible exits from the loop.
16. Show that equivalence is respected by the above constructs on commands so that, for
example
a) e ≡ e0 ⊃ (v := e) ≡ (v := e0 )
b) c0 ≡ c00 ∧ c1 ≡ c01 ⊃ c0 ; c1 ≡ c00 ; c01
c) b ≡ b0 ∧ c0 ≡ c00 ∧ c1 ≡ c01 ⊃ if b then c0 else c1 ≡ if b0 then c00 else c01
d) b ≡ b0 ∧ c ≡ c0 ⊃ while b do c ≡ while b0 do c0
17. Redefine behaviour and equivalence to take account of run-time errors. Do the statements
of exercise 16 remain valid?
∗∗
18. Try time and space complexity in the present setting. [Hint: Consider configurations of
the form, say, hc, σ, t, si where
There is lots to do. Try finding fairly general definitions, define behaviour and equivalence
(approximate equivalence?) and see which program equivalences preserve equivalence. Try
looking at measures for the parallel evaluation of expressions. Try to see what is reasonable
to incorporate from complexity literature. Can you use the benefits of our structured
languages to make standard simulation results easier/nicer for students?
∗∗
19. Try exercises 23 and 24 from Chapter 1 again.
20. Give an operational semantics for L, but where only 1 step of the evaluation of an expres-
sion or Boolean expression is needed for 1 step of execution of a command. Which of the
two possibilities – the “big steps” of the main text or the “little steps” of the exercise –
do you prefer and why?
Proof
21. Let c be any command not involving subexpressions of the form (e−e0 ) or while loops but
allowing the simple iteration command of exercise 15. Show that any execution sequence
hc, σi −→∗ . . . terminates.
22. Establish equivalences such as:
e0 + 0 ≡ e0
e0 + e1 ≡ e1 + e0
e0 + (e1 + e2 ) ≡ (e0 + e1 ) + e2
etc
23. Establish or refute each of the following suggested equivalences for the language L (and
slight extensions, as indicated):
a) nil; c ≡ c ≡ c; nil
b) c; if b then c0 else c1 ≡ if begin c result b then c0 else c1
c) (if b then c0 else c1 ); c ≡ if b then c0 ; c else c1 ; c
d) while b do c ≡ if b then (c; while b do c) else nil
e) repeat c until b ≡ c; while ∼b do c
Type-Checking
24. Make L0 a little more realistic by adding a type real, decimals, variables of all three types,
and a variety of operators. Allow nat to real conversion, but not vice-versa.
25. Show that if hc, σi −→ hc0 , σ 0 i and x ∈ Dom(σ)\Var(c) then σ(x) = σ 0 (x).
26. Show that if hc, σi −→ hc0 , σ 0 i is a transition within ΓC,V and hc, σ̄i −→ hc0 , σ̄ 0 i is a
transition within ΓC,V 0 where V ⊆ V 0 then, if σ = σ̄ ↾ V , it follows that σ 0 = σ̄ 0 ↾ V .
27. The static error specification is far too crude. Instead one should have a set M of messages
and a relation:
and similarly for commands. Design a suitable M and a specification of Error for L0 . Try
to develop a philosophy of what a nice error message should be. See [Hor] for some ideas.
28. How would you treat dynamic type-checking in L0 ? What would be the new ideas for error
messages (presumably one adds an M (see exercise 27) to the configurations).
The idea of reduction sequences originates in the λ-calculus [Hin] as does the present method
of specifying steps axiomatically where I was motivated by Barendregt’s thesis [Bar1]. I applied
the idea to λ-calculus-like programming languages in [Plo1], [Plo2] and Milner saw how to
extend it to simple imperative languages in [Mil1]. More recently the idea has been applied to
languages for concurrency and distributed systems [Hen1], [Mil2], [Hen2]. The present course
is a systematic attempt to apply the idea as generally as possible. A good deal of progress
has been made on other aspects of reduction and the λ-calculus; a partial survey and further
references can be found in [Ber], and see also [Bar2].
Related ideas can be found in work by de Bakker and de Roever. A direct precursor of our
method can be found in the work by Lauer and Hoare [Hoa], who use configurations which have
the rough form hs1 , . . . , sn , σi where the si are statements (which include commands). They define
a next-configuration function and the definition is to some extent syntax-directed. The idea of
a syntax-directed approach was independently conceived and mentioned all too briefly in the
work of Salwicki [Sal].
Somewhat more distantly various grammatical (= symbol-pushing too) approaches have been
tried. For example W-grammars [Cle] and attribute grammars [Mad]; although these defini-
tions are not syntax-directed definitions of single transitions it should be perfectly possible to
use the formalisms to write definitions which are. The question is rather how appropriate the
formalisms would be with regard to such issues as completeness, clarity (= readability), nat-
uralness, realism, modularity (= modifiability + extensibility). One good discussion of some
of these issues can be found in [Mar]. For concern with modularity consult the course notes
of Peter Mosses. Our method is clearly intended to be complete and natural and realistic, and
we try to be clear; the only point is that it is quite informal, being normal finite mathematics.
There must be many questions on good choices of formalism. As regards modularity we just
hope that if we get the other things in a reasonable state, then current ideas for imposing
modularity on specifications will prove useful.
For examples of good syntax-directed English specifications consult the excellent article by
Ledgard on ten mini-languages [Led]. These languages will provide you with mini-projects
which you should find very useful in understanding the course, and which could very well be
the basis for more extended projects. For a much more extended example see the ALGOL 68
Report [Wij]. Structural Induction seems to have been introduced to Computer Science by
Burstall in [Bur]; for a system which performs automatic proofs by Structural Induction on
lists see [Boy]. For discussions of what error messages should be see [Hor] and for remarks on
how and whether to specify them see [Mar].
4 Bibliography
[Bar1] Barendregt, H. (1971) Some Extensional Term Models for Combinatory Logic and
Lambda-calculi, PhD thesis, Department of Mathematics, Utrecht University.
[Bar2] Barendregt, H. (1981) The Lambda Calculus, Studies in Logic 103, North-Holland.
[Ber] Berry, G. and Lévy, J-J. (1979) A Survey of Some Syntactic Results in the Lambda-calculus,
Proc. MFCS’79, ed. J. Becvár, LNCS 74, pp. 552–566.
[Boy] Boyer, R.S. and Moore, J.S. (1979) A Computational Logic, Academic Press.
[Bur] Burstall, R.M.B. (1969) Proving Properties of Programs by Structural Induction, Com-
puter Journal 12(1):41–48.
[Cle] Cleaveland, J.C. and Uzgalis, R.C. (1977) Grammars for Programming Languages,
Elsevier.
[Hen1] Hennessy, M.C.B. and Plotkin, G.D. (1979) Full Abstraction for a Simple Parallel
Programming Language, Proc. MFCS’79, ed. J. Becvár, LNCS 74, pp. 108–120.
[Hen2] Hennessy, M.C.B., Li, Wei and Plotkin, G.D. (1981) A First Attempt at Translating
CSP into CCS, Proc. ICDCS’81, pp. 105–115, IEEE.
[Hin] Hindley, J.R., Lercher, B. and Seldin, J.P. (1972) Introduction to Combinatory Logic,
Cambridge University Press.
[Hoa] Hoare, C.A.R. and Lauer, P.E. (1974) Consistent and Complementary Formal Theories
of the Semantics of Programming Languages, Acta Informatica 3:135–153.
[Hor] Horning, J.J. (1974) What the Compiler Should Tell The User, Compiler Construction:
An Advanced Course, eds F.L. Bauer and J. Eickel, LNCS 21, pp. 525–548.
[Lau] Lauer, P.E. (1971) Consistent Formal Theories of The Semantics of Programming
Languages, PhD thesis, Queen’s University of Belfast, IBM Laboratories Vienna TR
25.121.
[Led] Ledgard, H.F. (1971) Ten Mini-Languages: A Study of Topical Issues in Programming
Languages, ACM Computing Surveys 3(3):115–146.
[Mad] Madsen, O.L. (1980) On Defining Semantics by Means of Extended Attribute Gram-
mars, Semantics-Directed Compiler Generation, ed. N.D. Jones, LNCS 94, pp. 259–299.
[Mar] Marcotty, M., Ledgard, H.F. and von Bochmann, G. (1976) A Sampler of Formal
Definitions, ACM Computing Surveys 8(2):191–276.
[Mil1] Milner, A.J.R.G. (1976) Program Semantics and Mechanized Proof, Foundations of
Computer Science II, eds K.R. Apt and J.W. de Bakker, Mathematical Centre Tracts
82, pp. 3–44.
[Mil2] Milner, A.J.R.G. (1980) A Calculus of Communicating Systems, LNCS 92.
[Plo1] Plotkin, G.D. (1975) Call-by-name, Call-by-value and the Lambda-calculus, Theoretical
Computer Science 1(2):125–159.
[Plo2] Plotkin, G.D. (1977) LCF Considered as a Programming Language, Theoretical Com-
puter Science 5(3):223–255.
[Sal] Salwicki, A. (1976) On Algorithmic Logic and its Applications, Mathematical Institute,
Polish Academy of Sciences.
[Wij] van Wijngaarden, A., Mailloux, B.J., Peck, J.E.L., Koster, C.H.A., Sintzoff, M., Lind-
sey, C.H., Meertens, L.G.T. and Fisker, R.G. (1975) Revised Report on the Algorithmic
Language ALGOL 68, Acta Informatica 5:1–236.
5 Definitions and Declarations
5.1 Introduction
In this chapter we begin the journey towards realistic programming languages by considering
binding mechanisms which enable the introduction of new names in local contexts. This leads to
definitions of local variables in applicative languages and declarations of constant and variable
identifiers in imperative languages. We will distinguish the semantic concepts of environments
and stores. The former concerns those aspects of identifiers which do not change throughout
the evaluation of expressions or the execution of commands and so on; the latter concerns those
aspects which do change, as with side-effects in the evaluation of expressions or the effects of the execution
of commands. In the static semantics context-free methods no longer suffice, and we show how
our rules enable the context-sensitive aspects to be handled in a natural and syntax-directed
way.
We consider a little applicative (= functional) language with simple local definitions of variables.
It can be considered as a first step towards full-scale languages like ML [Gor].
Clearly any expression contains various occurrences of variables, and in our language there are
two kinds of these occurrences. First we have defining occurrences where variables are intro-
duced; second we have applied occurrences where variables are used. For example, considering
the figure below the defining occurrences are 2, 6, 9 and the others are applied. In some lan-
guages - but not ours! - one finds other occurrences which can fairly be termed useless.
x1 ∗ ( let x2 = 5 ∗ y 3 ∗ x4
in x5 + ( let y 6 = 14 − x7
in y 8 + ( let x9 = 3 + x10 + x11
in x12 ∗ y 13 )))
Now the region of program text over which defining occurrences have an influence is known as
their scope. One often says, a little loosely, that, for example, the scope of the first occurrence
of x in e = let x = e0 in e1 is the expression e1 . But then one considers examples such as
that of the above figure, where occurrence 12 is not in the scope of 2 (as it is instead in the
scope of 9); this is called a hole in the scope of 2. It is more accurate to say that the scope
of a defining occurrence is a set of applied occurrences. In the case of let x = e0 in e1 the
scope of x is all those applied occurrences of x in e1 , which are not in the scope of any defining
occurrence of x in e1 . Thus in the case of figure 1 we have the following table showing which
applied occurrences are in the scope of which defining occurrences (equivalently which defining
occurrences bind which applied occurrences):
Defining occurrence    Applied occurrences in its scope
2                      5, 7, 10, 11
6                      8, 13
9                      12
Note that each applied occurrence is in the scope of at most one defining occurrence. Those
not in any scope are termed free (versus bound); for example occurrences 1, 3, 4 above are free.
One can picture the bindings and the free variables by means of a drawing with arrows such
as:
[Diagram: the expression let x = 5 + y in (let y = 4 + x + y in x + y + z), with arrows
linking each bound applied occurrence to its binding occurrence; the occurrences of y in 5 + y
and in 4 + x + y, and that of z, are free.]
From the point of view of semantics it is irrelevant which identifiers are chosen just so long as
the same set of bindings is generated. (Of course a sensible choice of identifiers greatly affects
readability, but that is not a semantic matter.) All we really need are the arrows, but it is
hard to accommodate them into our one-dimensional languages. In the literature on λ-calculus
one does find direct attempts to formalise the arrows and also attempts to eliminate variables
altogether; as in Combinatory Logic [Hin]; in Dataflow one sees graphical languages where the
graphs display the arrows [Ack].
Static Semantics
Free Variables: The following definition by structural induction is of FV(e), the set of free
variables (= variables with free occurrences) of e:
e :       m    x      e0 bop e1             let x = e0 in e1
FV(e) :   ∅    {x}    FV(e0 ) ∪ FV(e1 )     FV(e0 ) ∪ (FV(e1 )\{x})
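As a Haskell function (ours), mirroring the table, the only interesting clause is the deletion of
x in the let case:

    import qualified Data.Set as Set

    data Exp = Num Integer | V String | Bop String Exp Exp
             | Let String Exp Exp                -- let x = e0 in e1

    fv :: Exp -> Set.Set String
    fv (Num _)       = Set.empty
    fv (V x)         = Set.singleton x
    fv (Bop _ a b)   = fv a `Set.union` fv b
    fv (Let x e0 e1) = fv e0 `Set.union` Set.delete x (fv e1)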
Example 17
Dynamic Semantics
For the most part applicative languages have no concept of state; there is only the evaluation
of expressions in different environments (= semantic contexts). We take:
EnvV = (V −→ N)
for any finite subset V of the set Var of variables, and let ρ range over Env = ΣV EnvV and
write ρ : V to mean that ρ is in EnvV . Of course EnvV = StoresV , but we introduce a new
notation in order to emphasise the new idea.
ΓV = {e ∈ Exp | FV(e) ⊆ V }
TV = N
The transition relation is now relative to an environment and for any ρ : V and e, e0 in ΓV we
write
ρ `V e −→ e0
and read that in (= given) environment ρ one step of the evaluation of the expression e results
in the expression e0 . The use of the turnstile is borrowed from formal logic as we wish to think
of the above as an assertion of e −→ e0 conditional on ρ which in turn is thought of as an
assertion supplied by the environment on the values of the free variables of e and e0 . As this
environment will not change from step to step of the evaluation of an expression, we will often
use, fixing ρ in the transition relation, the transitive reflexive closure ρ `V e −→∗ e0 . It is left
to the reader to define relative transition systems.
Rules:
Variables: ρ `V x −→ ρ(x)
Binary Operations: (1) ρ `V e0 −→ e00 ⇒ ρ `V e0 bop e1 −→ e00 bop e1
(2) ρ `V e1 −→ e01 ⇒ ρ `V m bop e1 −→ m bop e01
(3) ρ `V m bop m0 −→ n (where n = m bop m0 )
Note: To save space we are using an evident horizontal lay-out for our rules. That is the rule:
A1 . . . . . . Ak
A
can alternatively be written in the form
A1 , . . . . . . , Ak ⇒ A.
ρ `V e0 −→ e00
(1)
ρ `V let x = e0 in e1 −→ let x = e00 in e1
ρ[m/x] `V ∪{x} e1 −→ e01
(2)
ρ `V let x = m in e1 −→ let x = m in e01
(3) ρ `V let x = m in n −→ n
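A Haskell sketch (ours) of these rules shows how the environment is extended exactly when
evaluation moves under a binder; addition stands in for the binary operations:

    import qualified Data.Map as Map

    type Env = Map.Map String Integer

    data Exp = Num Integer | V String | Add Exp Exp | Let String Exp Exp

    step :: Env -> Exp -> Maybe Exp          -- rho |- e --> e'
    step r (V x)                   = Num <$> Map.lookup x r
    step _ (Add (Num m) (Num n))   = Just (Num (m + n))
    step r (Add (Num m) e1)        = Add (Num m) <$> step r e1
    step r (Add e0 e1)             = (`Add` e1) <$> step r e0
    step _ (Let _ (Num _) (Num n)) = Just (Num n)                -- rule (3)
    step r (Let x (Num m) e1)      =                             -- rule (2):
      Let x (Num m) <$> step (Map.insert x m r) e1               --  uses rho[m/x]
    step r (Let x e0 e1)           =                             -- rule (1)
      (\e0' -> Let x e0' e1) <$> step r e0
    step _ (Num _)                 = Nothing                     -- terminal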
Of course these rules are just a clearer version of those given in Chapter 2 for expressions (as
suggested in exercise 4). Continuing the logical analogy our rules look like a Gentzen system
of natural deduction [Pra] written in a linear way. Possible definitions of behaviour are left to
the reader.
In general it is not convenient just to repeat simple definitions, and so we consider several ways
of putting definitions together. The category of expressions is now:
e ::= m | x | e0 bop e1 | let d in e
[Diagram, An Expression: a box e with incoming arrows supplying values for its free variables.]
where x is a typical free variable of e. A definition, d, imports values for its free variables and
exports values for its defining variables (those with defining occurrences). This can be pictured
as:
[Diagram, A Definition: a box d with an incoming arrow labelled x (its imports) and an
outgoing arrow labelled y (its exports).]
These are dataflow diagrams and they also help explain compound expressions and definitions.
For example a definition block let d in e imports from its environment into d and then d exports
into e with any other needed imports of e coming from the block environment. Pictorially
[Diagram, A Definition Block: a and b flow into d; b and c flow into e; d exports x and y, with
y flowing into e; e produces the result of the block.]
Here a is a typical variable imported by d but not e, and b is one imported by d and e, and
c is one imported by e and not d; again x is a variable exported by d and not imported by e
(useless but logically possible), and y is a variable exported by d and imported by e. Of course
we later give a precise explanation of all this by formal rules of an operational semantics.
[Diagram, Sequential Definition: in d0 ; d1 the definition d0 imports from the environment and
exports into d1 , which also imports from the environment; the exports of the whole are those
of d1 together with those of d0 not redefined by d1 .]
Simultaneous definition is much simpler; in d0 and d1 both d0 and d1 import from the
environment and then both export (and there must be no common defined variable).
Pictorially
[Diagram, Simultaneous Definition: d0 and d1 each import only from the environment, and
both export.]
Finally, a private definition d0 in d1 is just like a sequential one, except that the only exports
are from d1 . It can be pictured as:
[Diagram, Private Definition: d0 imports from the environment and exports into d1 ; only the
definitions of d1 are exported.]
As remarked in [Ten] many programming languages essentially force one construct to do jobs
better done by several; for instance it is common to try to get something of the effect of both
sequential and simultaneous definition. A little thought should convince the reader that there
are essentially just the three interesting ways of putting definitions together.
let x = 3
in let x = 5 & y = 6 ∗ x
in x + y
Depending on whether & is ; or and or in, the expression has the values 35 = 5 + (6 ∗ 5) or
23 = 5 + (6 ∗ 3) or 33 = 3 + (6 ∗ 5).
Static Semantics
We will define the set DV(d) of defined variables of a definition d and also FV(d/e), the set of
free variables of a definition d or expression e.
d :       nil   x = e    d0 ; d1                       d0 and d1           d0 in d1
DV(d) :   ∅     {x}      DV(d0 ) ∪ DV(d1 )             DV(d0 ) ∪ DV(d1 )   DV(d1 )
FV(d) :   ∅     FV(e)    FV(d0 ) ∪ (FV(d1 )\DV(d0 ))   FV(d0 ) ∪ FV(d1 )   FV(d0 ) ∪ (FV(d1 )\DV(d0 ))
For expressions the definition of free variables is the same as before except for the case
FV(let d in e) = FV(d) ∪ (FV(e)\DV(d))
Because of the restriction on simultaneous definitions not all expressions or definitions are well-
formed - for example consider let x = 3 and x = 6 in x. So we also define the well-formed ones
by means of rules for a predicate W(d/e) on definitions and expressions.
Rules:
• Definitions
Nil: W(nil)
Simple: W(e) ⇒ W(x = e)
Sequential: W(d0 ), W(d1 ) ⇒ W(d0 ; d1 )
Simultaneous: W(d0 ), W(d1 ) ⇒ W(d0 and d1 ) (if DV(d0 ) ∩ DV(d1 ) = ∅)
Private: W(d0 ), W(d1 ) ⇒ W(d0 in d1 )
• Expressions
Constants: W(m)
Variables: W(x)
Binary Op.: W(e0 ), W(e1 ) ⇒ W(e0 bop e1 )
Definitions: W(d), W(e) ⇒ W(let d in e)
Dynamic Semantics
It is convenient to introduce some new notation to handle environments. For purposes of dis-
playing environments consider, for example, ρ : {x, y, z}, where ρ(x) = 1, ρ(y) = 2, ρ(z) = 3.
We will also write ρ as {x = 1, y = 2, z = 3} and drop the set brackets when desired; this
notation makes it clearer that environments can be thought of as assertions.
Next for any V0 , V1 and ρ0 : V0 , ρ1 : V1 we define ρ = ρ0 [ρ1 ] : V0 ∪ V1 by:
ρ(x) = ρ1 (x) (x ∈ V1 )
ρ(x) = ρ0 (x) (x ∈ V0 \V1 )
We now have the nice ρ[x = m] to replace the less readable ρ[m/x]. Finally for any ρ0 :V0 , ρ1 :V1
with V0 ∩ V1 = ∅ we write ρ0 , ρ1 for ρ0 ∪ ρ1 . Of course this is equal to ρ0 [ρ1 ], and also to ρ1 [ρ0 ],
but the extra notation makes it clear that it is required that V0 ∩ V1 = ∅.
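With environments as finite maps, ρ0 [ρ1 ] and ρ0 , ρ1 are one-liners; a Haskell aside (ours),
exploiting that Data.Map.union is left-biased:

    import qualified Data.Map as Map

    type Env = Map.Map String Integer

    override :: Env -> Env -> Env          -- rho0[rho1]: rho1 wins on overlap
    override rho0 rho1 = Map.union rho1 rho0

    pairEnv :: Env -> Env -> Maybe Env     -- rho0,rho1: defined only if disjoint
    pairEnv rho0 rho1
      | Map.null (Map.intersection rho0 rho1) = Just (Map.union rho0 rho1)
      | otherwise                             = Nothing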
ΓV = {e | W(e), FV(e) ⊆ V }
and of course
TV = N
For definitions the idea is that just as an expression is evaluated to yield values so is a definition
elaborated to yield a “little” environment (for its defined variables). For example, given ρ =
{x = 1, y = 2, z = 3} the definition x = 5 + x + z; y = x + y + z is elaborated to yield
{x = 9, y = 14}. In order to make this work we add another clause to the definition of Def
d ::= ρ
What this means is that the abstract syntax of definition configurations allows environments;
it does not mean that the abstract syntax of definitions does so.
In a sense we slipped a similar trick in under the carpet when we allowed numbers as expressions.
Strictly speaking we should only have allowed literals and then allowed natural numbers as part
of the configurations and given rules for evaluating literals to numbers. Similar statements hold
for other kinds of literals. However, there seemed little point in forcing the reader through this
tedious procedure.
Returning to definitions we now add clauses for free and defined variables:
FV(ρ) = ∅
DV(ρ) = V (if ρ : V )
and also add for any ρ that W(ρ) holds, and for any V that
ΓV = {d | W(d), FV(d) ⊆ V }
and
TV = {ρ}
ρ `V d −→ d0
x = 1, y = 2, z = 3 ` x = (5 + x) + z; y = (x + y) + z
−→∗ {x = 9}; y = (x + y) + z
−→∗ {x = 9}; {y = 14}
−→ {x = 9, y = 14}
Rules:
Nil: ρ `V nil −→ ∅
Simple: (1) ρ `V e −→ e0 ⇒ ρ `V (x = e) −→ (x = e0 )
(2) ρ `V (x = m) −→ {x = m}
Sequential: Informally to elaborate d0 ; d1 given ρ
(1) Elaborate d0 in ρ yielding ρ0
(2) Elaborate d1 in ρ[ρ0 ] yielding ρ1
Then the elaboration of d0 ; d1 yields ρ0 [ρ1 ]. Formally we have:
ρ `V d0 −→ d00
(1)
ρ `V d0 ; d1 −→ d00 ; d1
ρ[ρ0 ] `V ∪V0 d1 −→ d01
(2) (where ρ0 : V0 )
ρ `V ρ0 ; d1 −→ ρ0 ; d01
(3) ρ `V ρ0 ; ρ1 −→ ρ0 [ρ1 ]
Simultaneous: Informally to elaborate d0 and d1 given ρ
(1) Elaborate d0 in ρ yielding ρ0 .
(2) Elaborate d1 in ρ yielding ρ1 .
Then the elaboration of d0 and d1 yields ρ0 , ρ1 if that is defined. Formally
(1) ρ `V d0 −→ d00 ⇒ ρ `V d0 and d1 −→ d00 and d1
(2) ρ `V d1 −→ d01 ⇒ ρ `V ρ0 and d1 −→ ρ0 and d01
(3) ρ `V ρ0 and ρ1 −→ ρ0 , ρ1
Private: Informally to elaborate d0 in d1 given ρ
(1) Elaborate d0 in ρ yielding ρ0 .
(2) Elaborate d1 in ρ[ρ0 ] yielding ρ1 .
Then the elaboration of d0 in d1 yields ρ1 . Formally
(1) ρ `V d0 −→ d00 ⇒ ρ `V d0 in d1 −→ d00 in d1
(2) ρ[ρ0 ] `V ∪V0 d1 −→ d01 ⇒ ρ `V ρ0 in d1 −→ ρ0 in d01
(where ρ0 : V0 )
(3) ρ `V ρ0 in ρ1 −→ ρ1
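Collapsing the small-step rules for each combinator into a single elaboration function gives the
following Haskell sketch (ours); elab returns the little environment yielded by a definition,
with Nothing for failure:

    import qualified Data.Map as Map

    type Env = Map.Map String Integer

    data Exp = Num Integer | V String | Add Exp Exp
    data Def = Simple String Exp | Seq Def Def | Sim Def Def | Priv Def Def

    evalE :: Env -> Exp -> Maybe Integer
    evalE _ (Num n)   = Just n
    evalE r (V x)     = Map.lookup x r
    evalE r (Add a b) = (+) <$> evalE r a <*> evalE r b

    elab :: Env -> Def -> Maybe Env
    elab r (Simple x e) = Map.singleton x <$> evalE r e
    elab r (Seq d0 d1)  = do r0 <- elab r d0
                             r1 <- elab (Map.union r0 r) d1   -- d1 in rho[rho0]
                             pure (Map.union r1 r0)           -- yields rho0[rho1]
    elab r (Sim d0 d1)  = do r0 <- elab r d0                  -- both in rho
                             r1 <- elab r d1
                             if Map.null (Map.intersection r0 r1)
                               then pure (Map.union r0 r1)    -- rho0,rho1
                               else Nothing
    elab r (Priv d0 d1) = do r0 <- elab r d0
                             elab (Map.union r0 r) d1         -- exports rho1 only

On the example above, elab with ρ = {x = 1, y = 2, z = 3} applied to the sequential definition
x = (5 + x) + z; y = (x + y) + z yields Just {x = 9, y = 14}.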
Example 21
x = 1, y = 2, z = 3 ` x = (5 + x) + z; y = (x + y) + z
−→ x = (5 + 1) + z; y = (x + y) + z (SEQ1, using SIM1)
−→ x = 9; y = (x + y) + z (SEQ1, using SIM1)
−→ {x = 9}; y = (x + y) + z (SEQ1, using SIM2)
−→ {x = 9}; y = (9 + y) + z (SEQ2)
−→ {x = 9}; {y = 14} (SEQ2)
−→ {x = 9, y = 14} (SEQ3)
The reader is encouraged here (and generally too) to work out examples for all the other
constructs.
5.4 Type-Checking and Definitions
New problems arise in static semantics when we consider type-checking and definitions. For
example one cannot tell whether or not such an expression as x or tt or x + x is well-typed
without knowing what the type of x is and that depends on the context of its occurrence. We
will be able to solve these problems by introducing static environments α to give this type
information and giving rules to establish properties of the form
α `V e : τ
• Basic Sets
Types: τ ∈ Types = {nat, bool}
Numbers: m, n ∈ N;
Truth-values: t ∈ T;
Variables: x, y, z ∈ Var;
Binary Operations: bop ∈ Bop = {+, −, ∗, =, or}.
• Derived Sets
Constants: con ∈ Con where con ::= m | t
Definitions: d ∈ Def where d ::= nil | x : τ = e | d0 ; d1 | d0 and d1 | d0 in d1
Static Semantics
The definitions of DV(d) and FV(d) are as before, as is FV(e), just adding that DV(x : τ = e) = {x}
and FV(x : τ = e) = FV(e). We now take
TEnvV = V −→ Types
and the set TEnv = ΣV TEnvV is ranged over by α and β; we write α : V for α ∈ TEnvV .
Of course all the notation α[β] and α, β extends without change from ordinary environments
to type environments.
Now for every V and α:V , τ and e with FV(e) ⊆ V we give rules for the relation
α `V e : τ
meaning that given α the expression e is well-formed and has type τ . This will involve us in
giving similar rules for constants and also for every V and α : V , β and definition d with
FV(d) ⊆ V , for the relation
α `V d : β
meaning that given α the definition d is well-formed and yields the type environment β.
Rules:
• Constants:
Numbers: α `V m : nat
Truth-values: α `V t : bool
• Expressions:
Constants: α `V con : τ ⇒ α `V con : τ (this makes sense!)
Variables: α `V x : α(x)
Negation: α `V e : bool ⇒ α `V ∼e : bool
α `V e0 : τ0 α `V e1 : τ1
Binary Operations: (if τ = τbop (τ0 , τ1 ))
α `V e0 bop e1 : τ
Conditional: α `V e0 : bool, α `V e1 : τ, α `V e2 : τ
⇒ α `V if e0 then e1 else e2 : τ
α `V d : β α[β] `V ∪V0 e : τ
Definition: (where β : V0 )
α `V let d in e : τ
Definition 23
Nil: α `V nil : ∅
Simple: α `V e : τ ⇒ α `V (x : τ = e) : {x = τ }
α `V d0 : β0 α[β0 ] `V ∪V0 d1 : β1
Sequential: (where β0 : V0 )
α `V (d0 ; d1 ) : β0 [β1 ]
α `V d0 : β0 α `V d1 : β1
Simultaneous: (if DV(d0 ) ∩ DV(d1 ) = ∅)
α `V (d0 and d1 ) : β0 , β1
α `V d0 : β0 α[β0 ] `V ∪V0 d1 : β1
Private: (where β0 : V0 )
α `V (d0 in d1 ) : β1
It is hoped that these rules are self-explanatory. It is useful to define for any V and α : V and
e with FV(e) ⊆ V the property of being well-formed
WV (e, α) ≡ ∃τ. α `V e : τ
and also for any V , α : V and d with FV(d) ⊆ V the property of being well-formed
WV (d, α) ≡ ∃β. α `V d : β.
Dynamic Semantics
If x has type τ in environment α then in the corresponding ρ it should be the case that ρ(x)
also has type τ ; that is if τ = nat, then we should have ρ(x) ∈ N and otherwise ρ(x) ∈ T. To
this end for any V and α : V and ρ : V −→ N + T we define:
ρ : α ≡ ∀x ∈ V. (α(x) = nat ⊃ ρ(x) ∈ N) ∧ (α(x) = bool ⊃ ρ(x) ∈ T)
and put Envα = {ρ : V −→ N + T | ρ : α}. Note that if ρ0 : α0 and ρ1 : α1 then ρ0 [ρ1 ] : α0 [α1 ]
and so too that (if it makes sense) (ρ0 , ρ1 ) : (α0 , α1 ).
Configurations: We separate out the various syntactic categories according to the possible
type environments.
Transition Relations:
ρ `α e −→ e0
ρ `α d −→ d0
Rules: The rules are much as usual but with the normal constraints that all mentioned ex-
pressions and definitions be configurations and environments be of the right type-environment.
Here are three examples which should make the others obvious.
• Expressions:
ρ[ρ0 ] `α[α0 ] e −→ e0
Definition 2: (where ρ0 : α0 )
ρ `α let ρ0 in e −→ let ρ0 in e0
• Definitions:
Simple 2: ρ `α x = con −→ {x = con}
ρ[ρ0 ] `α[α0 ] d1 −→ d01
Sequential 2: (where ρ0 : α0 )
ρ `α ρ0 ; d1 −→ ρ0 ; d01
Example 24
The ideas so far developed transfer to imperative languages where we will speak of declarations
(of identifiers) rather than definitions (of variables). Previously we have used stores for imper-
ative languages and environments for applicative ones, although mathematically they are the
same - associations of values to identifiers/variables. It now seems appropriate, however, to use
both environments and stores; the former shows what does not vary and the latter what does
vary when commands are executed.
It is also very convenient to change the definitions of stores by introducing an (arbitrary) infinite
set, Loc, of locations (= references = cells) and taking for any L ⊆ Loc
StoresL = L −→ Values
and
Stores = ΣL StoresL ( = Loc −→fin Values)
and putting
Env = Id −→fin (Values + Loc)
The idea is that if in some environment ρ we have an identifier x whose value should not
vary then ρ(x) = that value; otherwise ρ(x) is a location, l, and given a store σ : L (with l
in L) then σ(l) is the value held in the location l (its contents). In the first case we talk of
constant identifiers and in the second we talk of variable identifiers. The former are introduced
by constant declarations like
const x = 5
and the latter by variable declarations like
var x = 5
In all cases declarations will produce new (little) environments, just as before. The general form
of transitions will be:
ρ ` hd, σi −→ hd0 , σ 0 i
where ρ is the elaboration environment and σ, σ 0 are the stores. So, for example, we will have
ρ ` hconst x = 5, σi −→ h{x = 5}, σi
and
ρ ` hvar x = 5, σi −→ h{x = l}, σ[l = 5]i (l a new location) (∗)
Locations can be thought of as “abstract addresses” where we do not really want to commit
ourselves to any machine architecture, but only to the needed logical properties. A better way
to think of a location is as an individual or object which has lifetime (= extent); it is created
in a transition such as (∗) and its lifetime continues either throughout the entire computation
(execution sequence) or until it is deleted (= disposed of) (the deletion being achieved either
through such mechanisms as block exit or through explicit storage management primitives
in the language). Throughout its lifetime it has a (varying) contents, generally an ordinary
mathematical value (or perhaps other locations). It is generally referred to by some identifier
and is then said to be the L-value (or left-hand value) of the identifier and its contents, in
some state, is the R-value (right-hand value) of the identifier, in that state. The lifetime of the
location is related to, but logically distinct from, the scope of the identifier. Thus we have a
two-level picture:
[Diagram: the environment ρ sends the identifier x to a location l, and the store σ sends l to a
value v; in x := y the left-hand occurrence refers only as far as the location, the right-hand
occurrence all the way through to the value.]
where on the left we think of the variable as referring to a location and on the right as referring
to a value. Indeed we analyse the effect of assignment as changing the contents of the location
to the R-value of y:
ρ ` hx := y, σi −→ σ[ρx = σ(ρy)]
This is of course a more complicated analysis of assignment than in Chapter 2. The L/R ter-
minology is a little inappropriate in that some programming languages write their assignments
in the opposite order and also in that not all occurrences on the left of an assignment are
references to L-values.
The general idea of locations and separation of environments and stores comes from the Scott-
Strachey tradition (e.g., [Gor,Ten,Led]); it is also reminiscent of ideas of individuals in modal
logic [Hug]. In fact we do not need locations for most of the problems we encounter in the rest
of this chapter (see exercise 26) but they will provide a secure foundation for later concepts
such as
• Static binding of the same global variables in different procedure bodies (storage sharing).
• Call-by-reference (aliasing problems).
• Arrays (location expressions).
• Reference types (anonymous references).
On the other hand it would be interesting to see how far one can get without locations and to
what extent programming languages would suffer from their excision (see [Don][Rey]). One can
argue that it is the concept of location that distinguishes imperative from applicative languages.
Syntax:
• Basic Sets:
Types: τ ∈ T ypes = {bool, nat}
Numbers: m, n ∈ N
Truth-values: t∈T
Binary Operations: bop ∈ Bop
• Derived Sets
Constants: con ∈ Con where con ::= m | t
Expressions: e ∈ Exp where e ::= con | x | e0 bop e1 | ∼e | if e0 then e1 else e2
Note: On occasion we write begin c end for (c). That is begin . . . end act as command
parentheses, and have no particular semantic significance. However, their use can make scopes
more apparent.
The whole of our discussion of defining, applied, and free and bound occurrences carries over
to commands and is illustrated by the command in figure 2.
var x : bool = tt;
var y : int = if x then 0 else z;
const z : bool = if ∼(x = 0) then tt else v;
begin
y := if x then 0 else z;
x := tt or v
end
[Diagram, Bindings: arrows link each applied occurrence to its binding occurrence; the
occurrences of z in the declaration of y and of v are free.]
Note that left-hand variable occurrences in assignments are applied, not binding.
Static Semantics
Identifiers: For expressions we need the set, FI(e), of identifiers occurring freely in e (defined
as usual). For declarations we need the sets FI(d) and DI(d) of identifiers with free and defining
occurrences in d; they are defined just as in the case of definitions, and of course
DI(const x : τ = e) = DI(var x : τ = e) = {x}, FI(const x : τ = e) = FI(var x : τ = e) = FI(e).
For commands we only need FI(c), defined as usual, plus FI(d; c) = FI(d) ∪ (FI(c)\DI(d)).
Type-Checking: We take
TEnv = Id −→fin (Types + (Types × {loc}))
and write α : I for any α in TEnv with domain I ⊆ Id. The idea is that α(x) = τ means that x
denotes a value of type τ , whereas α(x) = τ loc ( =def hτ, loci) means that x denotes a location
which holds a value of type τ .
Assertions:
α `I e : τ
meaning that given α the expression e is well-formed and has type τ , and
α `I d : β
meaning that given α the declaration d is well-formed and yields the type-environment β.
• Commands: Here for each I and command c with FI(c) ⊆ I and type-environment α : I we
define:
α `I c
Rules:
α `I e : τ
Constants:
α `I const x : τ = e : {x = τ }
α `I e : τ
Variables:
α `I var x : τ = e : {x = τ loc}
• Commands: The rules are similar to those in Chapter 2. We give an illustrative sample.
Nil: α `I nil
α `I e : τ
Assignment: (if α(x) = τ loc)
α `I x := e
α `I c 0 α `I c 1
Sequencing:
α `I c 0 ; c 1
α `I d : β α[β] `I∪I0 c
Blocks: (where β : I0 )
α `I d; c
Dynamic Semantics
Following the ideas on environments and stores we consider suitably typed locations and assume
we have for each τ infinite sets
Locτ
which are disjoint and that (in order to create new locations) we have for each finite I ⊆ Locτ a
location Newτ (I) ∈ Locτ with Newτ (I) ∉ I (the new property).
Note: It is very easy to arrange these matters. Just put Locτ = N × {τ } and Newτ (I) =
hµm. hm, τ i ∉ I, τ i, taking the least m with hm, τ i ∉ I.
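In code the µ-construction is just a minimal unused number; a Haskell sketch (ours):

    data Ty = NatT | BoolT deriving (Eq, Show)

    type Loc = (Integer, Ty)               -- Loc_tau = N x {tau}

    newLoc :: Ty -> [Loc] -> Loc           -- New_tau(I), for finite I
    newLoc tau used = head [ (m, tau) | m <- [0 ..], (m, tau) `notElem` used ]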
Now putting Loc = ⋃τ Locτ we take for
Transition Relations:
• Expressions: For any α : I we set
Γα = {he, σi | ∃τ. α `I e : τ }
Tα = {hcon, σi}
ρ `α he, σi −→ he0 , σ 0 i
• Declarations: As before we allow environments in the configurations, adding the clause
d ::= ρ
and putting FI(ρ) = ∅ and DI(ρ) = I (where ρ : I), and putting α `I ρ : β (where ρ : β).
Now for any α : I we take
Rules:
• Expressions: These should be fairly obvious and we just give some examples.
Identifiers: (1) ρ `α hx, σi −→ hcon, σi (if ρ(x) = con)
(2) ρ `α hx, σi −→ hcon, σi (if ρ(x) = l and σ(l) = con)
ρ `α he0 , σi −→ he00 , σi
Conditional: (1)
ρ `α hif e0 then e1 else e2 , σi −→ hif e00 then e1 else e2 , σi
(2) ρ `α hif tt then e1 else e2 , σi −→ he1 , σi
(3) ρ `α hif ff then e1 else e2 , σi −→ he2 , σi
• Declarations:
Nil: ρ `α hnil, σi −→ h∅, σi
ρ `α he, σi −→ he0 , σ 0 i
Constants: (1)
ρ `α hconst x : τ = e, σi −→ hconst x : τ = e0 , σ 0 i
(2) ρ `α hconst x : τ = con, σi −→ h{x = con}, σi
Variables: Informally to elaborate var x : τ = e from state σ given ρ
(1) Evaluate e from state σ given ρ yielding con.
(2) Get a new location l and change σ to σ[l = con] and yield {x = l}.
Formally
ρ `α he, σi −→ he0 , σ 0 i
(1)
ρ `α hvar x : τ = e, σi −→ hvar x : τ = e0 , σ 0 i
(2) ρ `α hvar x : τ = con, σi −→ h{x = l}, σ[l = con]i
(where σ : L and l = Newτ (L ∩ Locτ ))
ρ `α hd0 , σi −→ hd00 , σ 0 i
Sequential: (1)
ρ `α hd0 ; d1 , σi −→ hd00 ; d1 , σ 0 i
ρ[ρ0 ] `α[α0 ] hd1 , σi −→ hd01 , σ 0 i
(2) (where ρ0 : α0 )
ρ `α hρ0 ; d1 , σi −→ hρ0 ; d01 , σ 0 i
(3) ρ `α hρ0 ; ρ1 , σi −→ hρ0 [ρ1 ], σi
Private: 1./2. Like Sequential.
3. ρ `α hρ0 in ρ1 , σi −→ hρ1 , σi
Simultaneous: (1) Like Sequential.
ρ `α hd1 , σi −→ hd01 , σ 0 i
(2)
ρ `α hρ0 and d1 , σi −→ hρ0 and d01 , σ 0 i
(3) ρ `α hρ0 and ρ1 , σi −→ hρ0 , ρ1 , σi
Note: These definitions follow those for definitions very closely.
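The essential difference between the two declaration forms comes out clearly in code: constants
bind values, variables bind freshly allocated locations. A Haskell sketch (ours; for brevity the
declared expression is taken to be already evaluated, and fresh locations are simply taken
above the maximum in use):

    import qualified Data.Map as Map

    type Loc   = Integer
    data DVal  = Con Integer | L Loc deriving Show   -- denotable values
    type Env   = Map.Map String DVal
    type Store = Map.Map Loc Integer

    data Dec = Const String Integer | Var String Integer | Seq Dec Dec

    elab :: Env -> Store -> Dec -> (Env, Store)
    elab _ s (Const x c) = (Map.singleton x (Con c), s)        -- Constants 2
    elab _ s (Var x c)   =                                     -- Variables 2
      let l = if Map.null s then 0 else fst (Map.findMax s) + 1
      in (Map.singleton x (L l), Map.insert l c s)
    elab r s (Seq d0 d1) =                                     -- Sequential 1-3
      let (r0, s')  = elab r s d0
          (r1, s'') = elab (Map.union r0 r) s' d1
      in (Map.union r1 r0, s'')

Elaborating var x = tt; var x = tt (with tt coded as 1, say) then leaves the first location
inaccessible, exactly as in the discussion of block exit below.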
• Commands: On the whole the rules for commands are much like those we have already seen
in Chapter 2.
Nil: ρ `α hnil, σi −→ σ
ρ `α he, σi −→∗ hcon, σ 0 i
Assignment:
ρ `α hx := e, σi −→ σ 0 [l = con]
(where ρ(x) = l, and if l ∈ L where σ : L)
Composition: 1./2. Like Chapter 2, but with ρ.
Conditional While: Like Chapter 2, but with ρ.
Blocks: Informally to execute d; c from σ given ρ
(1) Elaborate d from σ given ρ yielding ρ0 and a store σ 0 .
(2) Execute c from σ 0 given ρ[ρ0 ] yielding σ 00 . Then σ 00 is the result of the
execution.
ρ `α hd, σi −→ hd0 , σ 0 i
(1)
ρ `α hd; c, σi −→ hd0 ; c, σ 0 i
ρ[ρ0 ] `α[α0 ] hc, σi −→ hc0 , σ 0 i
(2) (ρ0 : α0 )
ρ `α hρ0 ; c, σi −→ hρ0 ; c0 , σ 0 i
ρ[ρ0 ] `α[α0 ] hc, σi −→ σ 0
(3)
ρ `α hρ0 ; c, σi −→ σ 0
In the above we have not connected up ρ and σ. In principle it could happen either that
(1) There is an l in the range of ρ but not in the domain of σ. This is an example of a dangling
reference. They are also possible in relation to a configuration such as hc, σi where l occurs
in c (via some ρ) but not in the domain of σ.
(2) There is an l not in the range of ρ but in the domain of σ. And similarly wrt c and σ, etc.
This is an example of an inaccessible reference.
However, we easily show that if, for example, we have no dangling references in ρ and σ,
or in c and σ, and if ρ ` hc, σi −→∗ hc0 , σ 0 i, then there are none either in ρ and σ 0 or in c0 and σ 0 .
One says that the language has no storage insecurities. An easy way to obtain a language
which is not secure is to add the command
c ::= dispose(x)
with the rule ρ `α hdispose(x), σi −→ σ\l (where ρ(x) = l and σ\l = σ\{hl, σ(l)i}) (and
obvious static semantics). One might wish to add an error rule for attempted assignments to
dangling references.
On the other hand, according to our semantics we do have inaccessible references. For example,
at a block exit:
ρ ` hvar x : bool = tt; begin nil end, σi −→ h{x = l}; nil, σ[l = tt]i
−→ σ[l = tt]
Similarly:
ρ ` hvar x : bool = tt; var x : bool = tt, σi −→ h{x = l1 }; var x : bool = tt, σ[l1 = tt]i
−→ h{x = l1 }; {x = l2 }, σ[l1 = tt, l2 = tt]i
−→ h{x = l2 }, σ[l1 = tt, l2 = tt]i
and again
ρ ` hvar x : bool = tt in var y : bool = tt, σi −→∗ h{x = l1 } in {y = l2 }, σ[l1 = tt, l2 = tt]i
−→ h{y = l2 }, σ[l1 = tt, l2 = tt]i
It is not clear whether inaccessible references should be allowed. They can easily be avoided,
at the cost of complicating the definitions, by “pruning” them away as they are created, a kind
of logical garbage collection. We prefer here to leave them in, for the sake of simple definitions;
they do not, unlike dangling references, cause any harm.
The semantics for expressions is a little more complicated than necessary in that if ρ ` he, σi −→
he0 , σ 0 i then σ = σ 0 ; that is there are no side-effects. However, the extra generality will prove
useful. For example suppose we had a production:
e ::= begin c result e
To evaluate begin c result e from σ given ρ one first executes c from σ given ρ yielding σ 0 and
then evaluates e from σ 0 given ρ. The transition rules would, of course, be:
ρ `α hc, σi −→ hc0 , σ 0 i
ρ `α hbegin c result e, σi −→ hbegin c0 result e, σ 0 i
ρ `α hc, σi −→ σ 0
ρ `α hbegin c result e, σi −→ he, σ 0 i
With this construct one also has now the possibility of side-effects during the elaboration of
definitions; previously we had instead that if
ρ `α hd, σi −→ hd0 , σ 0 i
then σ 0 L = σ where σ : L.
We note some other important constructs. The principle of qualification suggests we include
expression blocks:
e ::= let d in e
We may also consider declarations of the form:
d ::= x == y
meaning that x should refer to the location referred to by y (in ρ). The relevant static semantics
is α `I (x == y) : {x = τ loc} (if α(y) = τ loc), and the dynamic semantics will, of course, be:
ρ `α hx == y, σi −→ h{x = l}, σi (if ρ(y) = l)
This construct is an example where it is hard to do without locations; more complex versions
allowing the evaluation of expressions to references will be considered in the next chapter.
d ::= d initial c end
and
α `I d : β α[β] `I∪I0 c
(if β : I0 )
α `I d initial c end
However, we may wish to add other conditions (like the drastic FI(c) ⊆ DI(d)) to avoid side-
effects. The dynamic semantics is:
ρ `α hd, σi −→ hd0 , σ 0 i
ρ `α hd initial c end, σi −→ hd0 initial c end, σ 0 i
ρ[ρ0 ] `α[α0 ] hc, σi −→ hc0 , σ 0 i
(where ρ0 : α0 )
ρ `α hρ0 initial c end, σi −→ hρ0 initial c0 end, σ 0 i
ρ[ρ0 ] `α[α0 ] hc, σi −→ σ 0
ρ `α hρ0 initial c end, σi −→ hρ0 , σ 0 i
In the exercises we consider a dual idea of declaration finalisation commands which are executed
after the actions associated with the scope rather than before the scope of the declaration.
Finally, we stand back a little and look at the various classes of values associated with our
language.
• Expressible Values: These are the values of expressions. In our language this set, EVal, is
just the set, Con, of constants.
• Denotable Values: These are the values of identifiers in environments. Here the set, DVal,
is the set Con + Loc of constants and locations. Note that Env = Id −→fin DVal.
• Storeable Values: These are the values of locations in the store. Here the set, SVal, is the
set Con of constants. Note that Stores is the set of type-respecting finite maps from Loc to
SVal.
Thus we can consider the sets EVal, DVal, SVal of expressible, denotable and storeable values;
languages can differ greatly in what they are and their relationship to each other [Str]. Other
classes of values – e.g., writeable ones – may also be of interest.
5.5 Exercises
Occ(e, ε) = e
Occ(let x = e0 in e1 , m _ l) = Occ(x, l) (m = 1)
                             = Occ(e0 , l) (m = 2)
                             = Occ(e1 , l) (m = 3)
                             = undefined (otherwise)
Define Occ(e, l) in general. Define FO(x, e) = the set of free occurrences of x in e and also
the sets AO(x, e) and BO(x, e) of applied and binding occurrences of x in e. For any l in
BO(x, e) define Scope(l) = the set of applied occurrences of x in the scope of l; for any
bound occurrence, l, of x in e (i.e., l in [AO(x, e) ∪ BO(x, e)]\FO(x, e), define binder(l)
the unique occurrence in whose scope l is.
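As a starting point for the exercise, here is a sketch of Occ for the little let-language in OCaml; the AST types and the option-valued encoding of "undefined" are assumptions of the sketch:

    type exp =
      | Var of string
      | Num of int
      | Let of string * exp * exp              (* let x = e0 in e1 *)

    (* a phrase at an occurrence is either a variable or an expression *)
    type phrase = PVar of string | PExp of exp

    (* Occ(e, l): the subphrase of e at occurrence l, if defined *)
    let rec occ (e : exp) (l : int list) : phrase option =
      match e, l with
      | _, [] -> Some (PExp e)                 (* Occ(e, epsilon) = e *)
      | Let (x, _, _), [ 1 ] -> Some (PVar x)  (* m = 1: binding occurrence *)
      | Let (_, e0, _), 2 :: l' -> occ e0 l'   (* m = 2 *)
      | Let (_, _, e1), 3 :: l' -> occ e1 l'   (* m = 3 *)
      | _ -> None                              (* otherwise undefined *)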
2. Repeat exercise 1 for the other languages in Chapter 3 (and later chapters!).
3. Ordinary mathematical language also has binding constructions. Notable are such examples as integration and summation:

       ∫₀^y ∫₁^x f(y) dy dx    and    ∑_{n≥0} aₙxⁿ

   Define a mathematical expression language with these constructs and then define free variables, occurrences, etc., just as in exercise 1.
4. The language of predicate logic also contains binders. Given a syntax for arithmetic expressions (say) we can define formulae by:

   where ∧, ∨, ⊃ mean logical and, or and implies; to assert ∀x. F means that for all x we have F, and to assert ∃x. F means that we have F for some x. Repeat the work of exercise 3 for predicate logic. To what extent is it feasible to construct an operational semantics for the languages of exercises 3 and 4? How would it help to consider only finite sums, ∑_{a≤n≤b} e, bounded quantifications, ∀x ≤ b. F, and piecewise approximation?
5. Can you specify the location of dynamic errors? Thus, starting from ⟨c, σ⟩, suppose we reach ⟨c′, σ′⟩ and the next action is (for example) a division by zero; then we want to specify that the error occurred at some occurrence in the original command c. [Hint: Add a labelling facility, c ::= L :: c, and transition rules for it, and start not from c but from a labelled version in which the occurrences are used as labels.]
6. Define the behaviour and equivalence of definitions and expressions of the second language of this chapter; prove that the program constructs respect equivalence. Establish or refute each of the following suggested equivalences

   holds. What about the left-distributive law? What about other such laws?

7. Show that d₀ in (x = e) ≡ x = let d₀ in e. Show that d₀ ; d₁ ≡ d₀ in (d₁ and d_V) where V = DV(d₀)\DV(d₁) and where, for any V = {x₁, . . . , xₙ}, we put d_V = (x₁ = x₁ and . . . and xₙ = xₙ). Conclude that any d can be put, to within equivalence, in the form x₁ = e₁ and . . . and xₙ = eₙ.
8. Show that let d₀ ; d₁ in e ≡ let d₀ in (let d₁ in e). Under what general conditions do we have d₀ ; d₁ ≡ d₁ ; d₀? When do we have d₀ ; d₁ ≡ d₀ in d₁? When do we have let d₀ ; d₁ in e ≡ let d₀ in d₁ in d₀ ; e?
9. It has been said that in blocks like let d₀ in e all free variables of e should be bound by d₀ for reasons of program readability. Introduce strict blocks let d₀ in e and d₀ in d₁ where it is required that FV(e) (resp. FV(d₁)) ⊆ DV(d₀). Show that the non-strict blocks are easily defined in terms of the strict ones. [Hint: Use simultaneous definitions and the d_V of exercise 7.] Investigate equivalences for the strict constructions.
10. Two expressions (of the first language of the present chapter) e and e′ are α-equivalent – written e ≡α e′ – if they are identical “up to renaming of bound variables”. For example
11. Define, for the first language of the present chapter, the substitution of an expression e for a variable x in the expression e′ – written [e/x]e′; in the substitution process no free variable of e should be captured by a binding occurrence in e′, so that some systematic renaming of bound variables will be needed. For example we could not have

12. By using substitution we could avoid the use of environments in the dynamic semantics of the first language of the present chapter. The transition relation would have the form e −→ e′ for closed e, e′ (no free variables) and the rules would be as usual for binary operations, none (needed) for identifiers, and let x = e₀ in e₁ −→ [e₀/x]e₁. Show this gives the same notion of behaviour for closed expressions as the usual semantics.
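A sketch of exercise 12's substitution semantics in OCaml follows. The AST, the fresh-name generator and the decision to reduce e₀ to a numeral before substituting are all assumptions of the sketch, not part of the exercise text:

    type exp =
      | Num of int
      | Var of string
      | Add of exp * exp
      | Let of string * exp * exp

    let counter = ref 0
    let fresh x = incr counter; x ^ "_" ^ string_of_int !counter

    let rec free_vars = function
      | Num _ -> []
      | Var x -> [ x ]
      | Add (e0, e1) -> free_vars e0 @ free_vars e1
      | Let (x, e0, e1) ->
          free_vars e0 @ List.filter (fun y -> y <> x) (free_vars e1)

    (* [e/x]e': substitution with capture-avoiding renaming *)
    let rec subst e x = function
      | Num n -> Num n
      | Var y -> if y = x then e else Var y
      | Add (e0, e1) -> Add (subst e x e0, subst e x e1)
      | Let (y, e0, e1) when y = x -> Let (y, subst e x e0, e1)
      | Let (y, e0, e1) ->
          if List.mem y (free_vars e) then
            (* rename the binder so no free variable of e is captured *)
            let y' = fresh y in
            Let (y', subst e x e0, subst e x (subst (Var y') y e1))
          else Let (y, subst e x e0, subst e x e1)

    (* one small step for closed expressions *)
    let rec step = function
      | Add (Num n0, Num n1) -> Some (Num (n0 + n1))
      | Add (Num n0, e1) -> Option.map (fun e1' -> Add (Num n0, e1')) (step e1)
      | Add (e0, e1) -> Option.map (fun e0' -> Add (e0', e1)) (step e0)
      | Let (x, Num n, e1) -> Some (subst (Num n) x e1)
      | Let (x, e0, e1) -> Option.map (fun e0' -> Let (x, e0', e1)) (step e0)
      | _ -> None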
13. Extend the work of exercises 10, 11 and 12 to the second language of the present chapter.
14. It is possible to have iterative constructs in applicative languages. Tennent has suggested the construct

        e = for x = e₀ to e₁ op bop on e₂

    so that, for example, if e₀ = 0 and e₁ = n and bop = + and e₂ = x∗x then e = ∑_{0≤x≤n} x∗x. Give the operational semantics of this construct.
15. It is even possible to use definitions to obtain analogues of while loops. Consider the definition construct

        d = while e do d

    so that

    computes n! for n ≥ 1. Give this construct a semantics; show that the construct of exercise 14 can be defined in terms of it. Is the new construct a “good idea”?
16. Consider the third language of the present chapter. Show that the type-environments generated by definitions are determined, by defining by Structural Induction a partial function DTE : Definitions −→ TEnv and then proving that, for any α, V, d, β:
17. Give a semantics to a variant of the third language in which the types of variables are
not declared and type-checking is dynamic.
18. Change the fourth language of the present chapter so that the atomic declarations have
the more usual forms:
Can you type-check the resulting language? To what extent can you impose in the static
semantics the requirement that variables should be initialised before use? Give an op-
erational semantics following one of the obvious alternatives regarding initialisation at
declaration:
(1) The variable is initialised to a conventional value (e.g., 0/ff), or an unlikely one (e.g.,
the maximum natural number available/?).
(2) The variable is not initialised at declaration. [Hint: Use undefined maps for stores
or (equivalently) introduce a special UNDEF value into the natural numbers (and
another for truth-values).] In this case show how to specify the error of access before
initialisation. Which alternative do you prefer?
19. In PL/I identifiers can be declared to be “EXTERNAL”; as such they take their value
from an external environment - and so the declaration is an applied occurrence - but
they have local scope - and so the declaration is also a binding occurrence. For example
consider the following fragment in an extension of our fourth mini-language (not PL/I!)
(where we allow d ::= external x : τ ):
external x : nat;
begin
x := 2;
var x : nat;
begin
x := 1;
external x : nat;
begin y := x end
end
end
20. In PL/I variables can be declared without storage allocation being made until explicitly
requested. Thus a program fragment like
var x : nat
begin
x := 1; allocate(x)
end
would result in a dynamic error under that interpretation of variable declaration. Give a
semantics to this idea.
pervasive declarations and give it a static semantics. Are there any problems with its
dynamic semantics?
22. Formalise Dijkstra’s ideas on scope as presented in Section 10 of his book, A Discipline of
Programming (Prentice-Hall, 1976). To do this define and give a semantics to a variant
of the fourth mini-language which incorporates his ideas in as elegant a way as you can
manage.
(cf PL/I, ALGOL 68). From an implementation point of view local variables are allocated
space on the stack and heap ones on the heap; from a semantical point of view the locations
are disposed of on block exit (i.e., they live until the end of the variable’s scope is reached)
or never (unless explicitly disposed of). Formalise the semantics for these ideas. Does
replacing local by heap make any difference to a program’s behaviour? If not, find some
language extensions for which it does.
    static var x : τ

Here, the locations are allocated as part of the static semantics (cf. FORTRAN, COBOL, PL/I).
25. Consider the finalisation construct d = d₀ final c. Informally, to elaborate this from an environment ρ one elaborates d₀, obtaining ρ′, but then, after the actions (whether elaborations, executions or evaluations) involved in the scope of d, one executes c in the environment ρ′′ = ρ[ρ′] (equivalently, one executes ρ′; c). Give an operational semantics for an extension of the imperative language of the present chapter by a finalisation construct. [Hint: The elaboration of declarations should result in an environment and a command (with no free identifiers).] Justify your treatment of the interaction of finalisation and the various compound definition forms.
26. How far can you go in treating the constructs of the imperative language of this chapter (or later ones) without using locations? One idea would be for declarations to produce couples ⟨ρ, σ⟩ of environments and stores (in the sense of Chapter 2) where ρ : I₁, σ : I₂ and I₁ ∩ I₂ = ∅. What problems arise with the declaration x == y?
27. Formalise the notion of the accessibility of a location and of a dangling location by defining, given an environment ρ and a configuration ⟨c, σ⟩ (or ⟨d, σ⟩ or ⟨e, σ⟩), when a location, l, is accessible. Define the notion of lifetime with respect to the imperative language of the present chapter. Would it be best to define it so that the lifetime of a location ended exactly when it was no longer accessible or dangling? Using your definition, formulate and prove a theorem, for the imperative language, relating scope and lifetime.
28. Locations can be considered as “dynamic place holders” (in the execution sequence) just
as we considered identifiers as “static place holders” (in program text). Draw some arrow
diagrams for locations in execution sequences to show their creation occurrences analogous
to those drawn in this chapter to show binding occurrences.
29. Define α-equivalence for the imperative programming language of the present chapter (see exercise 10). One can consider c ≡α c′ as saying that c and c′ are equivalent up to the choice of static place holders. Define a relation of location equivalence between couples of environments and configurations, written ρ, γ ≡ₗ ρ′, γ′ (where γ is an expression, command or declaration configuration); it should mean that the couples are equivalent up to the choice of locations (dynamic place holders). For example

    holds.
30. Define the behaviour of commands, expressions and declarations and define an equivalence relation ≡ₗ between behaviours which should reflect equality of behaviours up to the choice of dynamic place holders. Prove, for example, that

    even though the two sides do not have identical behaviours. Investigate the issues of exercises 10, 11 and 12 using ≡ₗ.
5.6 Remarks
The ideas of structuring definitions and declarations seem to go back to Landin [Lan] and Milne
and Strachey [Mil]. The idea of separating environments and stores, via locations, can also be
found in [Mil]. The concepts of scope, extent, environments, stores and their mathematical
formulations seem to be due to Burstall, Landin, McCarthy, Scott and Strachey. [I do not want
to risk exact credits, or exclude others . . . ] For another account of these matters see [Sto].
The ideas of Section 5.4 on static semantics, where the constraints are in general clearly context-sensitive, were formulated in line with the general ideas on dynamic semantics. In fact, they are simpler, as one only needs to establish properties of phrases rather than relations between them. It is hoped that the method is easy to read and in line with one’s intuition. There are many other methods for the purpose; for a survey with references, see [Wil]. It is also possible to use the techniques of denotational semantics for this purpose [Gor2,Sto]. Our method seems particularly close to the production systems of Ledgard and the extended attribute grammars used by Watt; one can view, in such formulae as α ⊢V d : β, the α and V as inherited attributes and β as a synthesized attribute of the definition d; obviously too the type-environments α and β are nothing but symbol tables. It would be interesting to compare the methods on a formal basis.
As pointed out in exercise 26, one can go quite far without using locations. Donahue also tries to avoid them in [Don]. In a first version of our ideas we also avoided them, but ran into unpleasantly complicated systems when considering shared global variables of function bodies. As pointed out in exercise 12, one can try to avoid environments by using substitutions; it is not clear how far one can go in this direction (which is the usual one in syntactic studies of the λ-calculus). However, we have made a definite decision in these notes to stick to the Scott-Strachey tradition of environments. Note that in such rules as

there is no offence against the idea of syntax-directed operational semantics. It is just that substitution is a rather “heavy” primitive, and one can argue that the use of environments is closer to the intuitions normally used for understanding programming languages. (One awful exception is the ALGOL 60 call-by-name mechanism.)
6 Bibliography
[Ack] Ackerman, W.B. (1982) Data Flow Languages, IEEE Computer 15(2):15–25.
[Don] Donahue, J.E. (1977) Locations Considered Unnecessary, Acta Informatica 8:221–242.
[Gor1] Gordon, M.J., Milner, A.J.R.G. and Wadsworth, C.P. (1979) Edinburgh LCF, LNCS
78, Springer.
[Gor2] Gordon, M.J. (1979) The Denotational Description of Programming Languages,
Springer.
[Hin] Hindley, J.R., Lercher, B. and Seldin, J.P. (1972) Introduction to Combinatory Logic,
Cambridge University Press.
[Hug] Hughes, G.E. and Cresswell, M.J. (1968) An Introduction to Modal Logic, Methuen.
[Lan1] Landin, P.J. (1964) The Mechanical Evaluation of Expressions, Computer Journal
6(4):308–320.
[Lan2] Landin, P.J. (1965) A Correspondence between ALGOL 60 and Church’s Lambda-
notation, Communications of the ACM 8(2):89–101 and 8(3):158–165.
[Led] Ledgard, H.F. and Marcotty, M. (1981) The Programming Language Landscape, Science
Research Associates.
[Mil] Milne, R.E. and Strachey, C. (1976) A Theory of Programming Language Semantics,
Chapman and Hall.
[Pra] Prawitz, D. (1971) Ideas and Results in Proof Theory, Proc. 2nd Scandinavian Logic
Symposium, ed. J.E. Fenstad, p. 237–309, North Holland.
[Rey] Reynolds, J.C. (1978) Syntactic Control of Interference, Proc. POPL’78, pp. 39–46.
[Str] Strachey, C. (1973) The Varieties of Programming Language, Technical Monograph
PRG-10, Programming Research Group, Oxford University.
[Sto] Stoy, J.E. (1977) Denotational Semantics: The Scott-Strachey Approach to Program-
ming Language Theory, MIT Press.
[Wil] Williams, M.H. (1981) Methods for Specifying Static Semantics, Computer Languages 6(1):1–17.
7 Functions, Procedures and Classes
In this chapter we consider various mechanisms allowing various degrees of abbreviation and abstraction in programming languages. The idea of abbreviating the repeated use of some expressions by using definitions or declarations of identifiers was considered in Chapter 3; if we apply the same idea to commands we arrive at (parameterless) procedures (= subroutines). It is very much more useful to abstract many similar computations together, different ones being obtained by varying the values of parameters. In this way we obtain functions from expressions and procedures from commands.

Tennent’s Principle of Abstraction declares that the same thing can be done with any semantically meaningful category of phrases. Applying the idea to definitions or declarations we obtain a version of the class concept, introduced by SIMULA and recently taken up in many modern programming languages. (If we just use identifiers to stand for definitions or declarations we obtain the simpler, but still most useful, idea of a module.)
Calling (= invoking) abstractions with actual parameters (their arguments) for the formal ones appearing in their definition results in appropriate computations, whether evaluations, executions or elaborations of the bodies of their definitions. We will explain this by allowing abstraction identifiers to denote closures which record their formal parameters and bodies. Invocations will be explained in terms of computations of blocks, chosen in terms of Tennent’s Principle of Correspondence, which declares that in principle to every parameter mechanism there corresponds an appropriate definition or declaration mechanism. For example, if we define

    f (x : nat) : nat = x + 1

then f denotes the closure

    f = λx : nat. x + 1 : nat

and the call f (5) is computed via the block

    let x : nat = 5
    in x + 1
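The following OCaml sketch shows the Principle of Correspondence at work for this example; the AST and the environment representation are assumptions made for illustration:

    type exp =
      | Num of int
      | Var of string
      | Add of exp * exp
      | Let of string * exp * exp
      | Call of string * exp

    type def = Fun of string * string * exp    (* f(x : nat) : nat = e *)

    type env = (string * (string * exp)) list  (* f |-> closure (x, body) *)

    let elab (Fun (f, x, body)) : env = [ (f, (x, body)) ]

    (* one step of a call: f(e) -> let x = e in body, as Correspondence says *)
    let step_call (rho : env) = function
      | Call (f, arg) ->
          let (x, body) = List.assoc f rho in
          Some (Let (x, arg, body))
      | _ -> None

    (* f(x : nat) : nat = x + 1; then f(5) -> let x = 5 in x + 1 *)
    let rho = elab (Fun ("f", "x", Add (Var "x", Num 1)))
    let _ = step_call rho (Call ("f", Num 5))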
Below we use these ideas to consider an applicative programming language with (possibly recursive) definitions of functions of several arguments. We then consider an imperative language where we consider both functions and procedures and use the Principle of Correspondence to obtain the parameter mechanisms of call-by-constant and call-by-value. Other parameter mechanisms are easily handled using the same ideas (some explicitly in the text and others in exercises); let us mention call-by-reference, call-by-result, call-by-value-result, call-by-name and call-by-text. Next we consider higher-order functions and procedures. Finally we use the Principles of Abstraction and Correspondence to handle modules and classes; this needs no new ideas although some of the type-checking issues are interesting.
We begin with the simplest case where it is possible to define functions of one argument (unary functions). Let us consider throughout extensions of the second applicative language of Chapter 3. Add the following kind of function definitions:

    d ::= f (x : τ₀) : τ₁ = e
    e ::= f (e)

where f is another letter we will use to range over variables (but reserving its use to contexts where functions are expected).
Static Semantics

This is just as before as regards free and defining variables, with the extensions

    FV(f (x : τ₀) : τ₁ = e) = FV(e)\{x}
    DV(f (x : τ₀) : τ₁ = e) = {f}
    FV(f (e)) = {f} ∪ FV(e)

It is convenient to consider types a little more systematically than before. Just as we have expressible and denotable values (EVal and DVal) we now introduce the sets ETypes and DTypes of expressible and denotable types (ranged over by et and dt respectively) where

    et ::= τ
    dt ::= τ | τ₀ → τ₁

More complex expressible types will be needed later; denotable types of the form τ₀ → τ₁ will be used for functions which take arguments of type τ₀ and deliver results of type τ₁. Later we will also want sets of storeable types and other such sets. Now we take

    TEnv = Var −→fin DTypes

ranged over, as before, by α and β, and give rules for the predicates

    α ⊢V e : et
where α : V and FV(e) ⊆ V, and

    α ⊢V d : β

where α : V and FV(d) ⊆ V. These rules are just as before, with the evident extensions for function calls and definitions:

Function Calls:

    α ⊢V e : et₀
    ------------------   (if α(f) = et₀ → et₁)
    α ⊢V f (e) : et₁

Function Definitions:

    α[{x = τ₀}] ⊢V∪{x} e : τ₁
    ------------------------------------------
    α ⊢V (f (x : τ₀) : τ₁ = e) : {f = τ₀ → τ₁}
Dynamic Semantics

We extend the set of denotable values with closures and add the following production to the definition of the category of definitions

    d ::= ρ

It is important to note that what is meant here is that the sets Dec, Exp, Closures, DVal and Env are being defined mutually recursively. For example, the following is an expression of type nat:

    let f = λx : nat. (let {y = 3, g = λy : bool. ∼y : bool} in if g(ff) then x else y) : nat
    and w = 5
    in f (2) + w

There is no more harm in such recursions than in those found in context-free grammars; a detailed discussion is left to Appendix B.
Note too that closures have, in an obvious sense, no free variables. This raises the puzzle of what we intend to do about the free variables in function definitions. In fact, in elaborating such definitions we will bind the free variables to their values in the elaboration environment. This is known as static binding (= binding of free variables determined by their textual occurrence), and will be followed throughout these notes. The alternative of delaying binding until the function is called, and then using the calling environment, is known as dynamic binding, and is considered in the exercises.
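The contrast can be made concrete with a small OCaml sketch (assumed representations, not the notes' formal semantics): a statically bound closure records its elaboration environment, while dynamic binding consults the environment at the call.

    type exp = Num of int | Var of string | Add of exp * exp

    type value = int
    type env = (string * value) list

    (* static binding: record the elaboration environment in the closure *)
    type static_closure = { param : string; body : exp; at_def : env }

    let rec eval (rho : env) = function
      | Num n -> n
      | Var x -> List.assoc x rho
      | Add (e0, e1) -> eval rho e0 + eval rho e1

    let call_static (c : static_closure) (arg : value) =
      eval ((c.param, arg) :: c.at_def) c.body

    (* dynamic binding: free variables are looked up at call time *)
    let call_dynamic (param, body) (arg : value) (call_env : env) =
      eval ((param, arg) :: call_env) body

    (* with body x + y, y = 1 at definition and y = 10 at the call:
       static gives arg + 1, dynamic gives arg + 10 *)
    let body = Add (Var "x", Var "y")
    let s = call_static { param = "x"; body; at_def = [ ("y", 1) ] } 5
    let d = call_dynamic ("x", body) 5 [ ("y", 10) ]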
To extend the static semantics we type denotable values, defining the predicates, for dval in DVal and dt in DTypes,

    dval : dt
    ρ : α

by the rules
With all this we now easily extend the old dynamic semantics, with the usual transition relations

    ρ ⊢α e −→ e′
    ρ ⊢α d −→ d′

• Function Calls:

This rule is just a formal version of the Principle of Correspondence for the language under consideration.

• Function Definitions:
Example 25 We write f (x : τ₀) : τ₁ = e for the less readable f = λx : τ₀. e : τ₁ (and omit τ₀ and/or τ₁ when they are obvious). Consider the expression

    e =def let double(x : nat) : nat = 2 ∗ x
           in double(double(2))

We have

    ∅ ⊢∅ e −→ let ρ in double(double(2))

where ρ =def {double(x) = 2 ∗ x}, and now note the computation

and so

    ∅ ⊢ e −→∗ 8

Our function calls are call-by-value in the sense that the argument is evaluated before the body of the function. On the other hand, it is evaluated just after the function call; a slight variant effects the evaluation before. This variant has no effect on the result of our computations (prove this!), although it is not hard to define imperative languages where there could be a difference (because of side-effects). Another important possibility – call-by-name – is considered below and in the exercises.
We now consider how to extend the above to definitions of functions of several arguments, such as

Intending to use the Principle of Correspondence to account for function calls, we expect such transitions as

and therefore simultaneous simple definitions. To this end we adopt a “minimalist” approach, adding two syntactic classes to the applicative language of the last chapter.

Formals: This is the set Forms, ranged over by form, and given by

    form ::= · | x : τ, form

Actuals: This is the set AcExp of actual expressions, ranged over by ae, and given by

    ae ::= · | e, ae

Then we extend the category of definitions, allowing more simple definitions and function definitions

    d ::= form = ae | f (form) : τ = e
    e ::= f (ae)
Static Semantics
The free and defining variables are as expected; for function calls, FV(f (ae)) = {f} ∪ FV(ae).

Turning to types, we now have ETypes, AcETypes (ranged over by aet) and DTypes, where

    aet ::= · | τ, aet

Then, with TEnv = Var −→fin DTypes as always, we have the evident predicates

    α ⊢V e : et      α ⊢V ae : aet      α ⊢V d : β

and, for formals, form : β, given by

    (1) · : ∅

    (2) form : β ⇒ (x : τ, form) : {x = τ} ∪ β   (if x ∉ DV(form))

Note that it is here that the natural restriction that no variable occurs twice in a formal is made.

Function Calls:

    α ⊢V ae : aet
    -------------------   (if α(f) = aet → et)
    α ⊢V f (ae) : et

Definitions:

    form : β    α ⊢V ae : aet
    --------------------------   (where aet = T (form))
    α ⊢V (form = ae) : β

    form : β    α[β] ⊢V∪V′ e : τ
    -----------------------------------------   (where β : V′ and aet = T (form))
    α ⊢V (f (form) : τ = e) : {f = aet −→ τ}

Actual Expressions:

    α ⊢V · : ·

    α ⊢V e : et    α ⊢V ae : aet
    -----------------------------
    α ⊢V (e, ae) : (et, aet)
Dynamic Semantics

The definitions of closures and environments are extended in the evident way, with the free and defining variables of ρ as usual, and we extend the static semantics by defining the predicates dval : dt and ρ : α much as before.

As for formals, they give rise to environments in the context of a value for the corresponding actuals, and so we begin with rules for the predicate

    acon ⊢ form : ρ

    (1) · ⊢ · : ∅

    (2)  acon ⊢ form : ρ
         -------------------------------------------
         (con, acon) ⊢ (x : τ, form) : ρ ∪ {x = con}

While this is formally adequate, it does seem odd to use values rather than environments as dynamic contexts.
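For value parameters, the two matching rules amount to zipping the actual constants against the formals; a minimal OCaml sketch, with assumed representations:

    type con = int
    type formal = (string * string) list   (* (x, tau) pairs, types as strings *)
    type env = (string * con) list

    (* rule (1):  ()        |- ()            : {}
       rule (2):  con, acon |- (x:tau, form) : rho U {x = con}  *)
    let rec match_formals (acons : con list) (form : formal) : env option =
      match acons, form with
      | [], [] -> Some []
      | con :: acons', (x, _tau) :: form' ->
          Option.map (fun rho -> (x, con) :: rho) (match_formals acons' form')
      | _ -> None   (* arity mismatch: no rule applies *)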
Recursion

It will not have escaped the reader’s attention that, no matter how interesting our applicative language may be, it is useless, as there is no ability to prescribe interesting computations. For example, we do not succeed in defining the factorial function by

    d_fact =def fact(n : nat) : nat = if n = 0 then 1 else n ∗ fact(n − 1)

as the fact on the right will be taken from the environment of d_fact and not understood recursively. (Of course the imperative languages are interesting owing to the possibility of loops; note too exercises 3, 14 and 15.)
To allow recursive definitions we add the construct

    d ::= rec d

Thus rec d_fact will define the factorial function. In terms of imports and exports, rec d imports all imports of d other than its exports, which provide the rest of the imports to d; the exports of rec d are those of d. In other words, define X to be FV(d)\DV(d), Y to be DV(d) and R to be FV(d) ∩ DV(d). Then X is the set of imports of rec d and Y is the set of its exports, with R being defined recursively. Diagrammatically, d takes its imports from X together with the fed-back R, and yields its exports Y, from which R is fed back.
The unary recursion operator gives a very flexible way to make recursive definitions, since the d in rec d can take many forms other than simple function definitions like f (x : τ₁, . . .) : τ = e. Simultaneous recursive definitions are written

    rec f (. . .) = . . . f . . . g . . . ; . . . ;
    rec g(. . .) = . . . f . . . g . . .

where the g in the definition of f is taken from the environment, but the f in the definition of g is the recursively defined one. A wide-scope form is obtained by writing

    rec (f (. . .) = . . . f . . . g . . . ; . . . ;
         g(. . .) = . . . f . . . g . . .)
Static Semantics

    FV(rec d) = FV(d)\DV(d)
    DV(rec d) = DV(d)

We keep TEnv, DTypes, ETypes and AcETypes as before. The natural rule for recursive declarations is

    α[β ↾ R] ⊢V∪R d : β
    --------------------   (where R = FV(d) ∩ DV(d))
    α ⊢V rec d : β

However, this is not easy to use in a top-down fashion, as, given rec d and α, one would have to guess β. But, as covered by exercise 11, it would work. It is more convenient to use the fact that, in α ⊢V d : β, the elaborated β does not depend on α but is uniquely determined by d, the α only being used to check the validity of β. We make this explicit by defining two predicates for definitions. First, for any V and d with FV(d) ⊆ V, and β, we define

    ⊢V d : β
    α ⊢V d

The first predicate can be read as saying that if d is a valid definition then it will have type β; the second says that, given α, then d is valid. The other predicates will be as before:

    α ⊢V e : et      α ⊢V ae : aet      form : β
Rules:

• Definitions:

  Nil:
    (1) ⊢V nil : ∅
    (2) α ⊢V nil

  Simple:
    (1)  form : β
         -----------------
         ⊢V form = ae : β

    (2)  form : β    α ⊢V ae : T (form)
         ------------------------------
         α ⊢V form = ae

  Function:
    (1) ⊢V f (form) : τ = e : {f = T (form) −→ τ}

    (2)  form : β    α[β] ⊢V∪V′ e : τ
         ----------------------------   (where β : V′)
         α ⊢V f (form) : τ = e

  Sequential:
    (1)  ⊢V d₀ : β₀    ⊢V d₁ : β₁
         ------------------------
         ⊢V d₀ ; d₁ : β₀[β₁]

    (2)  α ⊢V d₀    ⊢V d₀ : β    α[β] ⊢V∪V′ d₁
         -------------------------------------   (where β : V′)
         α ⊢V d₀ ; d₁

  Simultaneous:
    (1)  ⊢V d₀ : β₀    ⊢V d₁ : β₁
         ------------------------
         ⊢V d₀ and d₁ : β₀, β₁

    (2)  α ⊢V d₀    α ⊢V d₁
         ------------------   (if DV(d₀) ∩ DV(d₁) = ∅)
         α ⊢V d₀ and d₁

  Private:
    (1)  ⊢V d₁ : β₁
         -----------------
         ⊢V d₀ in d₁ : β₁

    (2)  α ⊢V d₀    ⊢V d₀ : β₀    α[β₀] ⊢V∪V′ d₁
         ---------------------------------------   (where β₀ : V′)
         α ⊢V d₀ in d₁

  Recursion:
    (1)  ⊢V d : β
         ------------
         ⊢V rec d : β

    (2)  ⊢V d : β    α[β ↾ R] ⊢V∪R d
         ---------------------------   (where R = FV(d) ∩ DV(d))
         α ⊢V rec d

The other rules are as before, except for expression blocks:

    ⊢V d : β    α ⊢V d    α[β] ⊢V∪V′ e : et
    ---------------------------------------   (where β : V′)
    α ⊢V let d in e : et
Example 27 Consider the definition

Then, to see that ∅ ⊢∅ d, one just shows that {f = nat → nat, g = nat → nat} ⊢{f,g} d′ (where d = rec d′). This example also shows why it is necessary to mention explicitly the result (= output) types of functions.
Dynamic Semantics

Before discussing our specific proposal we should admit that this seems, owing to a certain clumsiness and a somewhat unnatural approach, to be a possible weak point in our treatment of operational semantics.
At first sight one wants to get something of the following effect with recursive definitions:

    ρ[ρ′ ↾ V′] ⊢α∪α′ d −→∗ ρ′
    --------------------------   (where ρ′ : DV(d) and for suitable α′ : V′)
    ρ ⊢α rec d −→∗ ρ′

Taken literally this is not possible. For example, put d = (f (x : nat) : nat = f (x)) and suppose ρ′(f) = cl. Then for V = ∅ and ρ = ∅ we would have

and so we would have cl = λx : nat. (let ρ′ in f (x)) : nat, which is clearly impossible, as cl cannot occur in itself (via ρ′). Of course, it is just in finding solutions to suitable analogues of this equation that the Scott-Strachey approach finds one of its major achievements.
Let us try to overcome the problem by not trying to guess ρ′ but instead elaborating d without any knowledge of the values of the recursively defined identifiers. Thus in our example we first elaborate the body

and let ρ′ be the resulting “environment”. Note that we no longer have closures, as there can be free variables in the abstractions. So we know that, for any imported value of f, ρ′ gives the corresponding export. But in rec d the imports and the exports must be the same, that is, f = ρ′(f) in some recursive sense, and we can take f = rec ρ′. To get a closure we now take the all-important step of binding f to rec ρ′ in ρ′, and take the elaboration of rec d to be

What we have done is to unwind the recursive definition by one step and bind into the body instructions for further unwinding. Indeed, it will be the case that

    ⊢ rec ρ′ −→ ρ₁

Then we will evaluate the argument e, then we will unwind the definition once more (in preparation for the next call!), then we will evaluate the body. This is perhaps not too bad; in the usual operational semantics of recursive definitions (see exercise 7) one first evaluates the argument, then unwinds the definition for the present call and then evaluates the body. Thus we have simply performed one step of the needed unwindings in advance, during the elaboration.
Let us now turn our attention to the formal details; the changes from before mostly concern allowing free variables in closures. We define
and put

and

and add

    d ::= ρ

    (1)  ∀x ∈ V. ⊢W ρ(x) : β(x)
         ----------------------
         ⊢W ρ : β

    (2)  ∀x ∈ V. α ⊢W ρ(x)
         -----------------
         α ⊢W ρ

    ρ ⊢α e −→ e′

and keep the same set Γα of terminal expressions. Similarly we define ρ ⊢α ae −→ ae′ and ρ ⊢α d −→ d′.
The rules are formally the same as before, except that, for ρ : W, conditions of the form ρ(f) = . . . are understood to mean that f ∈ W and ρ(f) = . . ., and similarly for ρ(x) = . . . (this affects looking up the values of variables and function calls). In other words, we first elaborate d without knowing anything about the values of the recursively defined variables, and then from the resulting ρ′ we yield ρ′ altered so as to bind its free variables by rec ρ′. Here are a couple of examples; more can be found in the exercises.
where ρ′ = {fact(x : nat) : nat = let ∅ in . . .} (and from now on we omit the tedious “let ∅ in”). Then we have

    ρ ⊢α rec d −→ rec ρ′ −→ ρ₁
      −→∗ let {x = 1} in let ρ₁ in 1 ∗ [let x = 0 in let ρ₁ in . . .]
      −→ 1

However, the elaboration of

does not succeed as, intuitively, we need to know the value of fact before the elaboration – which produces this value – has finished. On the other hand, simple things like the elaboration of rec x = 5 do succeed. If desired, we could have specified in the static semantics that only recursive function definitions were allowed.
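The effect of the unwinding treatment can be approximated in OCaml as follows. This is a big-step sketch in which the once-unwound environment ρ′ is simply kept available so that each call can unwind again; the AST, the representation of open closures and the factorial example are all assumptions of the sketch:

    type exp =
      | Num of int
      | Var of string
      | Sub of exp * exp
      | Mul of exp * exp
      | If0 of exp * exp * exp          (* if e = 0 then e1 else e2 *)
      | Call of string * exp

    (* open "closures": bodies may mention the recursively defined names *)
    type defs = (string * (string * exp)) list

    (* each call re-supplies rho0 itself, i.e. unwinds the definition once
       more in preparation for the next call *)
    let rec eval (rho0 : defs) (e : exp) (arg_env : (string * int) list) : int =
      match e with
      | Num n -> n
      | Var x -> List.assoc x arg_env
      | Sub (e0, e1) -> eval rho0 e0 arg_env - eval rho0 e1 arg_env
      | Mul (e0, e1) -> eval rho0 e0 arg_env * eval rho0 e1 arg_env
      | If0 (e0, e1, e2) ->
          if eval rho0 e0 arg_env = 0 then eval rho0 e1 arg_env
          else eval rho0 e2 arg_env
      | Call (f, e0) ->
          let (x, body) = List.assoc f rho0 in
          eval rho0 body [ (x, eval rho0 e0 arg_env) ]

    (* rec fact(n) = if n = 0 then 1 else n * fact(n - 1);  fact 3 = 6 *)
    let fact_defs =
      [ ("fact",
         ("n",
          If0 (Var "n", Num 1,
               Mul (Var "n", Call ("fact", Sub (Var "n", Num 1)))))) ]

    let six = eval fact_defs (Call ("fact", Num 3)) []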
In several programming languages the bodies of functions are commands, but are treated, via special syntactic devices, as expressions – see exercise 12. We take the straightforward view where the bodies are (clearly) expressions. Abstracts of commands give rise to procedures, as in:

which may also have side-effects, and indeed are often executed for their side-effects. To see why we write var in the formal parameter, let us see how the Principle of Correspondence allows us to treat a procedure call. First, the above declaration, d, will be elaborated thus

where l = ρ(y). Then the procedure call p(e) in the resulting environment ρ′ will look like this
And we see that the reason for writing var . . . is to get an easy correspondence with our previous declaration mechanism. The computation now proceeds by evaluating e, finding a new location l′, making l′ refer to the value of e in the state, and then executing the body of the procedure with x bound to l′. This is very clearly nothing else but the classical call-by-value. Constant declarations will give rise to a call-by-constant parameter mechanism.

We begin by working these ideas out in the evident extension of the imperative language of Chapter 3. Then we proceed to other parameter mechanisms by considering the corresponding declaration mechanisms. (Many real languages do not possess such a convenient correspondence; one way to deal with their parameter mechanisms would be to add the corresponding declaration mechanisms when defining the set of possible configurations.)
For the extension we drop the const x : τ = e and var x : τ = e productions and add:

Static Semantics

We have the following sets of identifiers, with the evident definitions and meanings: FI(e), FI(ae), FI(d), DI(d), DI(form), FI(c). For example
Turning to types, we define ETypes, AcETypes and DTypes; these are as before except that both locations and procedures are denotable, causing a change in DTypes:

    et ::= τ
    aet ::= · | τ, aet
    dt ::= et | et loc | aet −→ et | aet proc

and of course TEnv = Id −→fin DTypes. We also need T (form) ∈ AcETypes, with the evident definition.
Then we define the expected predicates

    α ⊢I e : et      α ⊢I ae : aet      ⊢I d : β      α ⊢I d      form : β      α ⊢I c

Dynamic Semantics

We begin with environments, abstracts and denotable values. First, the set Abstracts (ranged over by abs) is

then

    d ::= ρ

    FI(λform. c) = FI(c)\DI(form)

Then DI(ρ) and FI(ρ) are defined. Next we define the evident predicates

    ⊢I dval : dt      α ⊢I dval      ⊢I ρ : β      α ⊢I ρ
as expected; for example

• Expressions: We have

and

    Tα = {⟨con, σ⟩}
    ρ ⊢α ⟨e, σ⟩ −→ ⟨e′, σ′⟩

• Actual Expressions: We have

and

    Tα = {⟨acon, σ⟩}
    ρ ⊢α ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩

• Declarations: We have

    Γα = {⟨d, σ⟩ | α ⊢I d}   (for α : I)

and

    Tα = {⟨ρ, σ⟩ | ⟨ρ, σ⟩ ∈ Γα}
    ρ ⊢α ⟨d, σ⟩ −→ ⟨d′, σ′⟩

• Formals: We define

    acon, L ⊢ form : ρ₀, σ₀

meaning that, in the context of an actual expression constant acon and given an existing set, L, of locations, the formal (part of a declaration) form yields a new (little) environment ρ₀ and store σ₀.

• Commands: We have Γα, Tα and the transition relation as usual.
Rules: The rules are generally just those we already know, and only the new points are covered.

• Declarations:

  Simple:
    (1)  ρ ⊢α ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩
         ----------------------------------------
         ρ ⊢α ⟨form = ae, σ⟩ −→ ⟨form = ae′, σ′⟩

    (2)  acon, L ⊢ form : ρ₀, σ₀
         --------------------------------------   (where σ : L)
         ρ ⊢α ⟨form = acon, σ⟩ −→ ⟨ρ₀, σ ∪ σ₀⟩

  Procedure:
    ρ ⊢α ⟨procedure p(form) c, σ⟩ −→ ⟨{p = λform. ρ\I; c}, σ⟩
    (where I = FI(c)\DI(form))

  Recursive:
    (1)  ρ\R ⊢α[α′] d −→ d′
         --------------------
         ρ ⊢α rec d −→ rec d′
         (where, if ⊢FI(d) d : β, then R = FI(d) ∩ DI(d) and α′ = β ↾ R)

    (2) ρ ⊢α rec ρ′ −→ ρ₁

  Variable:
    acon, L ∪ {l} ⊢ form : ρ₀, σ₀
    ----------------------------------------------------------------
    (con, acon), L ⊢ (var x : τ, form) : {x = l} ∪ ρ₀, {l = con} ∪ σ₀
    (where l = Newτ(L ∩ Locτ))
Example 30 The following program demonstrates the use of private variables shared between
several procedures. This provides a nice version of ALGOL’s own variables and anticipates the
facilities provided by classes and abstract data types. Consider the command
So we see that
Other parameter mechanisms can be considered in the same manner. The general principle is to admit more ways to declare identifiers (as discussed above) and to admit more ways of evaluating expressions (and/or actual expressions). The latter is needed because actual expressions can be evaluated to various degrees when abstracts are called. One extreme is absolutely no evaluation (see exercise 16 for this call-by-text mechanism). We shall first consider call-by-name in the context of our applicative language; we regard it as evaluating the argument just to the extent of binding the call-time environment to it. This well-known idea differs from the official ALGOL 60 definition and is discussed further in exercise 15.
Then we consider call-by-reference in the context of our imperative language where the argu-
ment is evaluated to produce a reference. Other mechanisms are considered in the exercises.
Note that in call-by-name for example the actual parameter may be further evaluated during
computation of the body of the abstract. It is even possible to have mechanisms (e.g., variants
of call-by-result) where some or all of the evaluation is delayed until after the computation of
the body of the abstract.
Call-by-Name

Syntactically, it is only necessary to add another possibility for the formal parameters to the syntax of our applicative language:

    form ::= . . . | x : τ name, form

Static Semantics

The set of defining variables of (x : τ name, form) is clearly {x} ∪ DV(form). Regarding types, we add

The definition of the type T (form) of a formal needs the new clause
Example 31 Consider these two expressions

and

In the first case we need the fact that α ⊢ (u + v, u − v) : (nat, nat name) and in the second that α ⊢ (u + v, u − v) : (nat name, nat) (where α = {u = nat, v = nat}).
Dynamic Semantics

Clearly we must add a new component to the set of denotable values, corresponding to the new denotable types τ name:

where we need NExp = {e : τ name} to allow free variables in the expressions, because of the possibility of recursive definitions. For example, consider

We define the configuration sets

    Γ^exp_{α,W} = {e | ∃et. α ⊢V e : et}

    Γ^def_{α,W} = {d | α ⊢V d}

and the predicate

    ae ⊢µ,W form : ρ₀
For actual expressions the result desired will depend on the context, and we introduce an apparatus of different evaluation modes. The set Modes of modes, ranged over by µ, is given by

    µ ::= · | value, µ | name, µ

and we define M (aet) ∈ Modes by

    M (·) = ·
    M (τ, aet) = value, M (aet)
    M (τ name, aet) = name, M (aet)

We define transition relations ρ ⊢α,W,µ ae −→ ae′, which are also parameterised on modes. The set of configurations is, for α : V, W ⊆ V and mode µ,

and we define the set Tα,W,µ of terminal actual expressions by some rules of the form ⊢µ,W T (ae). It is rule 3 which introduces the need for W, insisting that all variables are bound, except, possibly, for those being recursively defined.

The transition relation is defined for ρ : α ↾ W and ae, ae′ ∈ Γα,W,µ and has the form ρ ⊢α,W,µ ae −→ ae′. The apparatus of modes gives types what might also be called metatypes, and this may be a useful general idea. The reader should not confuse this with the normal usage of the term mode as synonymous with type.
Transition Rules:

• Expressions:

    (2) ρ ⊢ x −→ e   (if ρ(x) = e : τ name)

• Actual Expressions:

  Value Mode:
    (1)  ρ ⊢α,W e −→ e′
         --------------------------------------
         ρ ⊢α,(value,µ),W (e, ae) −→ (e′, ae)

    (2)  ρ ⊢α,µ,W ae −→ ae′
         ------------------------------------------
         ρ ⊢α,(value,µ),W (con, ae) −→ (con, ae′)

  Name Mode:
    (1) ρ ⊢α,(name,µ),W (e, ae) −→ ((let ρ ↾ FV(e) in e), ae)

    (2)  ρ ⊢α,µ,W ae −→ ae′
         ----------------------------------   (if FV(e) ∩ W = ∅)
         ρ ⊢α,(name,µ),W (e, ae) −→ (e, ae′)

• Definitions: Here we need a rule which ensures that the actual expressions are evaluated in the right mode. Otherwise the rules are as before.

  Simple:
    (1)  ρ ⊢α,µ,W ae −→ ae′
         ----------------------------------   (where µ = M (T (form)))
         ρ ⊢α,W form = ae −→ form = ae′

    (2)  ae ⊢µ,W form : ρ₀
         ----------------------   (if ae ∈ Tα,µ,W where µ = M (T (form)))
         ρ ⊢α,W form = ae −→ ρ₀

• Formals:

    (1) · ⊢·,W · : ∅

    (2)  ae ⊢µ,W form : ρ
         ----------------------------------------------------
         (con, ae) ⊢(value,µ),W (x : τ, form) : {x = con} ∪ ρ

    (3)  ae ⊢µ,W form : ρ
         --------------------------------------------------------------
         (e, ae) ⊢(name,µ),W (x : τ name, form) : {x = e : τ name} ∪ ρ
Example 32 The main difference between call-by-name and call-by-value in applicative languages is that call-by-name may terminate where call-by-value need not. For example, consider the expression

    e = let f (x : nat name) : nat = 1 and rec g(x : nat) : nat = g(x) in f (g(2))

Then ρ ⊢ e −→∗ let ρ′ in f (g(2)) where ρ′ = {f (x : nat name) : nat = 1, g(x : nat) : nat = . . .}. So we look at

On the other hand, if we change the formal parameter of f to be call-by-value instead then, as the reader may care to check, the evaluation does not terminate.
Call-by-Reference

We consider a variant (the simplest one!) where the actual parameter must be a variable (an identifier denoting a location). In other languages the actual parameter can be any of a wide variety of expressions which are evaluated to produce a location; these might include conditionals and function calls. This would require a number of design decisions on the permitted expressions and on how the type-checking should work. For lack of time, rather than any intrinsic difficulty, we leave such variants to exercise 17. Just note that it will certainly be necessary to rethink expression evaluation; this should either be changed so that evaluation yields a natural value (be it a location or a primitive value), or else different evaluation modes should be introduced.
Static Semantics

Clearly we have DI(loc x : τ, form) = {x} ∪ DI(form). For types we add another actual expression type

and

Dynamic Semantics

It is not necessary to change the definitions of DVal (or Env or Dec), as locations are already included. However, we allow locations in AcExp and AcCon:

    ae ::= l, ae
    acon ::= l, acon

    α ⊢I ae : aet
    ----------------------------   (l ∈ Locτ)
    α ⊢I (l, ae) : (τ loc, aet)
with the evident definition of M (aet) ∈ Modes, and put, for α : I and µ,

Rules:

• Actual Expressions:

  Value Mode:
    (1)  ρ ⊢α ⟨e, σ⟩ −→ ⟨e′, σ′⟩
         ---------------------------------------------
         ρ ⊢α,(val,µ) ⟨(e, ae), σ⟩ −→ ⟨(e′, ae), σ′⟩

    (2)  ρ ⊢α,µ ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩
         -------------------------------------------------
         ρ ⊢α,(val,µ) ⟨(con, ae), σ⟩ −→ ⟨(con, ae′), σ′⟩

  Reference Mode:
    (1) ρ ⊢α,(ref,µ) ⟨(x, ae), σ⟩ −→ ⟨(l, ae), σ⟩   (if ρ(x) = l)

    (2)  ρ ⊢α,µ ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩
         ---------------------------------------------
         ρ ⊢α,(ref,µ) ⟨(l, ae), σ⟩ −→ ⟨(l, ae′), σ′⟩

• Definitions:

  Simple:
    (1)  ρ ⊢α,µ ⟨ae, σ⟩ −→ ⟨ae′, σ′⟩
         ----------------------------------------   (if µ = M (T (form)))
         ρ ⊢α ⟨form = ae, σ⟩ −→ ⟨form = ae′, σ′⟩

    (2)  acon, L ⊢ form : ρ₀, σ₀
         --------------------------------------   (where σ : L)
         ρ ⊢α ⟨form = acon, σ⟩ −→ ⟨ρ₀, σ ∪ σ₀⟩

• Formals: We just add a rule for declaration-by-reference (= location):

    acon, L ⊢ form : ρ₀, σ₀
    -------------------------------------------------
    (l, acon), L ⊢ (loc x : τ, form) : {x = l} ∪ ρ₀, σ₀

  Note: All we have done is to include the construct x == y of Chapter 3 in our simple declarations.

• Commands: No new rules are needed.
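Using OCaml's ref cells as a stand-in for locations, the essence of this call-by-reference mechanism is that the formal is bound to the caller's own location. A minimal sketch (swap is an illustrative example, not from the notes):

    type loc = int ref

    (* both parameters are taken by reference, so the caller's locations are
       updated, exactly as binding x == y in the declaration would do *)
    let swap (a : loc) (b : loc) =
      let t = !a in
      a := !b;
      b := t

    let x = ref 1
    let y = ref 2
    let () = swap x y          (* afterwards !x = 2 and !y = 1 *)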
Clearly our discussion of binding mechanisms is only a start, even granting the ground covered in the exercises. I hope the reader will have been led to believe that a more extensive coverage is feasible. What is missing is a good guiding framework to permit a systematic coverage.

Since we can define or declare abstractions, such as functions and procedures, Tennent’s Principle of Correspondence tells us that we can allow abstractions themselves as parameters of (other) abstractions. The resulting abstractions are said to be of higher types (the resulting functions are often called functionals). For example, the following recursive definition is of a function which applies a given function, f, to a given argument, x, a given number, t, of times:
We will illustrate this idea by considering a suitable extension of the imperative language of this chapter (but neglecting call-by-reference). Another principle would be to allow any denotable type to be an expressible type; this principle would allow locations or functions and procedures as expressions and, in particular, as results of functions (by the Principle of Abstraction). For example, we could define an expression (naturally called an abstraction)

    λform. e

that would be an abbreviation for the expression let f (form) : τ = e in f. For a suitable τ, depending on the context, it might, more naturally, be written as function form. e; such functions (and other similar abstractions) are often termed anonymous. Then the following function would output the composition of two given functions

In this way we obtain (many) versions of the typed λ-calculus. A number of problems arise in imperative languages where functions are not denotable, but only references to them. In the definition of Compose one will have locally declared references to functions as the denotations of f and g; if these are disposed of upon termination of the function call one will have a dangling reference. Just the same thing happens, but in an even more bare-faced way, if we allow locations as outputs

At any rate, we will leave these issues to the exercises, being moderately confident that they can be handled along the lines we have developed.
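For flavour, here is what the higher-type examples look like as ordinary OCaml functionals; the monomorphic int types are an assumption made to match the discussion, and Apply and Compose are the functions described above:

    (* apply f to x, t times: apply f x t = f(f(...f(x)...)) *)
    let rec apply (f : int -> int) (x : int) (t : int) : int =
      if t = 0 then x else apply f (f x) (t - 1)

    (* the composition of two given functions *)
    let compose (f : int -> int) (g : int -> int) : int -> int =
      fun x -> f (g x)

    let sixteen = apply (fun x -> 2 * x) 1 4        (* 2 ^ 4 *)
    let h = compose (fun x -> x + 1) (fun x -> 3 * x)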
Now, let us turn to our language with higher types. We extend the syntax by including the category AcETypes of actual expression types:

It is clear how this allows functions and procedures of higher type to be defined; they are passed as arguments via identifiers that denote them.
Static Semantics

Clearly

As for the predicate form : β, we first note the definition of the set, DTypes, of denotable types:

The rules are fairly clear and we just note the procedure case:

    form : β
    -----------------------------------------------   (if p ∉ I where β : I)
    (procedure p : aet, form) : {p = aet proc} ∪ β

Turning to the other predicates, we only need to add a rule for actuals:

    α ⊢I ae : aet
    --------------------------   (where dt = α(x) is either of the form aet → et or aet proc)
    α ⊢I (x, ae) : (dt, aet)

Example 33 Try type-checking the following imperative version of Apply in the environment {x = nat}:
Dynamic Semantics

Once more, there is no need to change (the form of) the definitions of DVal or Env or Dec. We must now allow abstracts within actual expressions, and also in AcCon:

• Expressions: We define configurations and terminal configurations as usual; for the transition relation we define, for ρ : α ↾ J,

and

• Declarations: We define Γα,J, Tα,J in the evident way, and the transition relation ρ ⊢α,J ⟨d, σ⟩ −→ ⟨d′, σ′⟩ is of the evident form.

• Commands: Again the configurations, the terminal configurations and the transition relation are of the evident forms.

• Formals: We will define the predicate acon, L ⊢J form : ρ₀, σ₀, where FI(acon) ∩ J = ∅. For example, for procedure parameters:

    acon, L ⊢J form : β
    --------------------------------------------------------------------   (if p ∉ I where β : I)
    ((λform. e), acon), L ⊢J (procedure p : aet, form) : {p = λform. e} ∪ β

(and similarly for function parameters, with the side condition f ∉ I where β : I).
As a matter of fact the J’s are not needed, but we obtain finer control over the allowable actual
expression configurations. This can be useful in extensions of our language where abstractions
are allowed.
There is a certain confusion of terminology in the area of modules and classes. Rather than
enumerate the possibilities let me say what I mean here. First there is a Principle of Denotation
which says that one can in principle use an identifier to denote the value of any syntactic phrase
– where “value” is deliberately ambiguous and may indicate various degrees of “evaluation”.
For expressions this says we can declare constants (in imperative languages) but also allows
declaration by name or by text and so on; for commands it means we can have parameterless
subroutines. For declarations we take it as meaning one can declare identifiers as modules, and
they will denote the environment resulting from the elaboration. (There is a corresponding
Principle of Storeability which the reader will spot for himself; it is anything but clear how
useful these principles are!)
Applying the Principle of Abstraction to declarations on the other hand we obtain what we call
classes. Applying a class to actual arguments gives a declaration which can be used to supply
a denotation to a module identifier; then we say the module is an instance of the class. (Of
course everything we say here applies just as well to applicative languages; by now, however, it
is enough just to consider one case!)
A typical example is providing a random natural number facility. Let drand be the declaration

    private
        var a = seed mod d
    within
        function draw() : nat
        begin
            a := a ∗ m mod d
            result a/d
        end

where seed, d and m are assumed declared previously. This would declare a function, draw, providing a random natural number, with its own private variable, inaccessible from the outside. If one wanted to declare and use two random natural numbers, one could just declare two modules:

    module X : (draw : · → nat) = drand;
    module Y : (draw : · → nat) = drand;
    begin . . . X.draw() . . . Y.draw() . . . end
Thus draw is an attribute of both X and Y and the syntax X.draw selects the attribute (in
general there is more than one).
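The drand idea can be sketched in OCaml with closures playing the role of class instances. Here seed, d and m stand for the previously declared constants (their values are illustrative), and draw returns the private a itself rather than a/d, for simplicity:

    let seed = 12345 and d = 2147483647 and m = 16807

    (* the "class": each application allocates a fresh private a and
       exports a draw function sharing it *)
    let drand () =
      let a = ref (seed mod d) in
      fun () -> a := !a * m mod d; !a

    (* two "module" instances, each with its own private state *)
    let x_draw = drand ()
    let y_draw = drand ()
    let r1 = x_draw ()   (* advances X's private a only *)
    let r2 = y_draw ()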
When one wants some parameterisation and/or desires to avoid writing out drand several times,
one can declare and use a class
Finally we note that it is possible to use the compound forms of declarations to produce similar
effects on classes. For example a version of the SIMULA class-prefixing idea is available.
class CLASS1(form1) . . . ; . . . ;
class CLASS2(form2)—; —;
class PREFIXCLASS(form1, form2) . . . —;
CLASS1(form1); CLASS2(form2)
Naturally we will also be able to use simultaneous and private and recursive class declarations
(can you tell me some good examples of the use of these?). One can also easily envisage classes
of higher types (classicals?), but we do not investigate this idea.
Here is our extension of the syntax of the imperative language of the present chapter (but with no call-by-reference or higher types).

• Types: We need the categories DTSpecs, AcETypes and DecTSpecs of denotable type specifications, actual expression types and declaration type specifications

Clearly dect will be the type of a module identifier and aet → dect will be the type of a class identifier.

• Expressions: We add five(!!) new categories of expressions – function, procedure, variable, module and class expressions – called FExp, PExp, VExp, MExp, CExp and ranged over by fe, pe, ve, me, cle, and given by the following productions (where we also allow f, p, v, m, cl as metavariables over the set, Id, of identifiers)

    fe ::= f | me.f
    pe ::= p | me.p
    ve ::= v | me.v
    me ::= m | me.m | cle(ae)
    cle ::= cl | me.cl

(and the possibility cle(ae) generalises expressions of the form f (ae)). The set of actual expressions is defined as before.
• Commands: We generalise commands of the forms p(ae) and x := e (i.e., procedure calls and assignment statements) by

    c ::= pe(ae) | ve := e

Note that declaration types are used here to specify the types of the attributes of modules and classes. If we except recursive declarations, this information is redundant, but it could be argued that it increases readability, as the attribute types may be buried deep inside the declarations.

• Formals: The definition of these remains the same, as we do not want class or module parameters.
Note: In this chapter we have essentially been following a philosophy of different expressions
for different uses. This is somewhat inconsistent with previous chapters where we have merged
different kinds of expressions (e.g., natural number and boolean) and been content to separate
them out again via the static semantics. By now the policy of this chapter looks a little ridicu-
lous and it could well be better to merge everything together. However, the reader may have
appreciated the variation.
Static Semantics

For the definitions of FI(fe), . . . , FI(cle) we do not regard the attribute identifiers as free (but rather as a different use of identifiers from all previous ones; their occurrences are the same as constant occurrences and they are thought of as standing for themselves). So, for example, FI(me) is given by the table

    me       m      me.m      cle(ae)
    FI(me)   {m}    FI(me)    FI(cle) ∪ FI(ae)

and

    FI(me.x) = FI(me)
    FI(fe(ae)) = FI(fe) ∪ FI(ae)
    FI(pe(ae)) = FI(pe) ∪ FI(ae)
    FI(ve := e) = FI(ve) ∪ FI(e)

(We are really cheating somewhere here. For example, the above scheme would not work if we added the reasonable production

    d ::= me

as then with, for example, a command m; begin . . . x . . . end the x can be in the scope of the m, if the command is in the scope of a declaration of the form module m : dect = (var x : nat = . . . ; . . .). Thus it would no longer be possible to define the free identifiers of a phrase in a context-free way. Let us agree to ignore the problem.)
• Types: We define (mutually recursively) the sets ETypes, FETypes, . . . , ClETypes, DTypes, TEnv of expression types, function expression types, . . . , class expression types, denotable types and type environments by

    et ::= τ
    fet ::= aet → τ
    pet ::= aet proc
    vet ::= τ loc
    met ::= α
    clet ::= aet −→ α
    dt ::= et | vet | fet | pet | met | clet

    TEnv = Id −→fin DTypes   (with α ranging over TEnv)

To see how the sets DTSpecs and DecTSpecs of denotable and declaration type specifications specify denotable and declaration types respectively, we define predicates

by the formulae

- DTSpecs:

    (1) τ : τ
    (2) τ loc : τ loc
    (3) aet → τ : aet → τ
    (4) aet proc : aet proc

    (5)  dects : α
         ----------   (where the premise means: proved from the rules for DecTSpecs)
         dects : α

    (6)  dects : α
         ----------------------
         aet → dects : aet → α

- DecTSpecs:

    (1)  dts : dt
         ---------------------
         (x : dts) : {x = dt}

    (2)  dts : dt    dects : β
         -------------------------------   (if x ∉ I for β : I)
         (x : dts, dects) : {x = dt} ∪ β

Next, T (form) ∈ AcETypes is defined as before. Now we must define the predicates

The old rules are retained and we add new ones, as indicated by the following examples.
• Expressions:

    (1)  α ⊢I me : β
         -----------------   (if β(x) = dt)
         α ⊢I me.x : dt

    (2)  α ⊢I fe : aet → et    α ⊢I ae : aet
         -----------------------------------
         α ⊢I fe(ae) : et

• Function Expressions:

    (1) α ⊢I f : fet   (if α(f) = fet ∈ FETypes)

    (2)  α ⊢I me : β
         ----------------   (if β(f) = fet ∈ FETypes)
         α ⊢I me.f : fet

• Class Expressions:

    (1) α ⊢I cle : clet   (if α(cl) = clet ∈ ClETypes)

    (2)  α ⊢I me : β
         ------------------   (if β(cl) = clet ∈ ClETypes)
         α ⊢I me.cl : clet

• Commands:

    (1)  α ⊢I pe : aet proc    α ⊢I ae : aet
         -----------------------------------
         α ⊢I pe(ae)

    (2)  α ⊢I ve : τ loc    α ⊢I e : τ
         -----------------------------
         α ⊢I (ve := e)

• Declarations:

  - Modules:

    (1)  dects : β
         ---------------------------------
         (module m : dects = d) : {m = β}

    (2)  dects : β    α ⊢I d : β
         --------------------------
         α ⊢I module m : dects = d

  - Classes:

    (1)  dects : β
         ---------------------------------------------------
         (class cl(form) : dects; d) : {cl = T (form) −→ β}

    (2)  dects : β    form : α′    α[α′] ⊢I∪I′ d : β
         -------------------------------------------   (where α′ : I′)
         α ⊢I class cl(form) : dects; d
Dynamic Semantics

First we define the sets FECon, . . . , ClECon of function expression constants, . . . , class expression constants by

and define the sets DVal and Env of denotable values and environments by

    d ::= ρ

These are mutually recursive definitions of a harmless kind. The extensions to the definitions of FI(fe), . . . , FI(de), FI(d), DI(d) are evident; for example, FI(λform. d : β) = FI(d)\DI(form). We must also extend the definitions of α ⊢I fe : fet, . . . , α ⊢I cle : clet and ⊢I d : β and α ⊢I d (the latter two in the case d = ρ). The former extensions are obvious; for example
• Class Abstracts:

    form : α′    α[α′] ⊢I∪I′ d : β
    ------------------------------   (where α′ : I′)
    α ⊢I (λform. d : β)

For the latter we have to define ⊢I decon : dt, and this also presents little difficulty; for example

• Class Abstracts:

    ⊢I (λform. d : β) : T (form) → β

The configurations, final configurations and the transition relations for expressions, actual expressions and declarations are as before; for formals we have the same predicate as before. Now fix α : I and ρ : α ↾ J (for some J ⊆ I).

The definitions for PExp, . . . , CExp are the analogues of that for function expressions.
Rules:

• Class Expressions:

    (1) ρ ⊢α ⟨cl, σ⟩ −→ ⟨clecon, σ⟩   (if ρ(cl) = clecon)

    (2)  ρ ⊢α ⟨me, σ⟩ −→ ⟨me′, σ′⟩
         ---------------------------------
         ρ ⊢α ⟨me.cl, σ⟩ −→ ⟨me′.cl, σ′⟩

    (3) ρ ⊢α ⟨ρ′.cl, σ⟩ −→ ⟨clecon, σ⟩   (if ρ′(cl) = clecon)

The rules for FExp, . . . , MExp are similar, except that in the last case we also need

    (1)  ρ ⊢α ⟨cle, σ⟩ −→ ⟨cle′, σ′⟩
         -------------------------------------
         ρ ⊢α ⟨cle(ae), σ⟩ −→ ⟨cle′(ae), σ′⟩

    (2) ρ ⊢α ⟨(λform. d : β)(ae), σ⟩ −→ ⟨private form = ae within d, σ⟩

    (3)  ρ ⊢α ⟨d, σ⟩ −→ ⟨d′, σ′⟩
         -------------------------   (where the premise means a transition of declarations)
         ρ ⊢α ⟨d, σ⟩ −→ ⟨d′, σ′⟩
The new rules for expressions and commands should be clear; for example

• Assignment:

    (1)  ρ ⊢α ⟨ve, σ⟩ −→ ⟨ve′, σ′⟩
         ------------------------------------
         ρ ⊢α ⟨ve := e, σ⟩ −→ ⟨ve′ := e, σ′⟩

    (2)  ρ ⊢α ⟨e, σ⟩ −→ ⟨e′, σ′⟩
         ----------------------------------
         ρ ⊢α ⟨l := e, σ⟩ −→ ⟨l := e′, σ′⟩

    (3) ρ ⊢α ⟨l := con, σ⟩ −→ σ[l = con]

• Modules:

    (1)  ρ ⊢α ⟨d, σ⟩ −→ ⟨d′, σ′⟩
         ---------------------------------------------------------------
         ρ ⊢α ⟨module m : dects = d, σ⟩ −→ ⟨module m : dects = d′, σ′⟩

    (2) ρ ⊢α ⟨module m : dects = ρ′, σ⟩ −→ ⟨{m = ρ′}, σ⟩

• Classes:

    ρ ⊢α class cl(form) : dects; d −→ {cl = λform. (ρ\I) in d}   (where I = DI(form))
7.6 Exercises
1. Consider dynamic binding in the context of a simple applicative language so that, for
example,
let x = 1; f (y) = x + y
in let x = 2 in f (3)
has value 5. What issues arise with type-checking? Can you program iterations (e.g.,
factorial) without using recursive function definitions?
2. In a maximalist solution to the problem (in the applicative language) of neatly specifying
functions of several arguments one could define the class of formal parameters by
e ::= · | e, e | f (e)
a) Do this, but effectively restrict the extension to the minimalist case by a suitable choice
of static semantics.
b) Allow the full extension.
c) Go further and extend the types available in the language by putting

       e₀ := e₁

   and even the boolean expression e₀ ≡ e₁, which is true precisely when e₀ and e₁ evaluate to the same reference. As well as discussing type-checking issues, try the two following approaches to expression evaluation:

   a) Expressions are evaluated to their natural values, which will be either locations or basic values.

   b) Modes of evaluation are introduced, as in the text.

   Extend the work to the maximalist position where actual expressions and expressions are merged, thus allowing simultaneous assignments.
5. Just as expressions are evaluated, and so on, formals are matched (to given actual values) to produce environments (= matchings). The semantics given above can be criticised as not being dynamic enough, as the matching process is not displayed. Provide an answer to this; you may find configurations of the form

       ⟨form, con, ρ⟩

   useful, where form is the formal being matched, con is the actual value and ρ is the matching produced so far. A typical rule could be

       ⟨x : τ, con, ρ⟩ −→ ρ ∪ {x = con}

   This is all for the applicative case; what about the imperative one? Investigate dynamic errors, allowing constants and repeated variables in the formals (dynamic error = matching failure).
6. In the phrase rec d all identifiers in R = FV(d) ∩ DV(d) are taken to be recursively defined. Investigate the alternative rec x₁, . . . , xₙ. d where {x₁, . . . , xₙ} ⊆ R.

   one evaluates f (5) in the environment

       ρ = {f (x) = . . . f . . . g . . . , g(x) = — f — g —}

       ρ ⊢ f (5) −→ let x = 5 in . . . f . . . g . . .

   I could not see how to make this simple and nice idea (leave the recursively defined variables free) work in the present setting, where one has nested definitions and binary operations on declarations. Can you make it work?
a) rec x : nat = 1
b) rec (y : nat = 1 and x : nat = y)
c) rec (x : nat = y and y : nat = 1)
d) rec (x : nat = x)
e) rec (x : nat = y and y : nat = x)
How are these treated using the above static and dynamic semantics? What do you
think should happen? Specify suitable static and dynamic semantics with any needed
error rules. Justify your decisions, considering how your ideas will extend to imperative
languages with side-effects (which might result in non-determinism).
10. Find definitions d₀ and d₁ that make as many as possible of the following definitions different:
    a) (rec d₀ ; d₁)
    b) (rec) (rec d₀ ; d₁)
    c) (rec) (d₀ ; rec d₁)
    d) (rec) (rec d₀ ; rec d₁)
    where (rec) d indicates the two possibilities, with and without rec.
11. Check that the first alternative for type-checking recursive definitions would work, in the sense that

    α ⊢V d : β    iff    ⊢V d : β and α ⊢V d
12. Programming languages like PASCAL often adopt the following idea for function defini-
tion:
function f (form) : τ
begin
c
end
where, within c, the identifier f, as well as possibly denoting a function, also denotes a location, created on function entry and destroyed on exit; the result of a function call is the final value of this location on exit. For example, the following is an obscure definition of the identity function:
programming languages. Look out for the dangers inherent in
15. Discover the official ALGOL 60 definition of call-by-name (it works via a substitution
process); give a semantics following the idea and prove it equivalent to one following the
idea in these notes (substitution = binding a closure).
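For comparison with exercise 15's "substitution = binding a closure", here is a hedged OCaml sketch of call-by-name via thunks: the formal is bound to a suspended actual, re-evaluated at every use. The function names are invented for the illustration.

    (* Call-by-name: the formal is a closure over the calling
       environment, run afresh at each use of the parameter. *)
    let by_name (actual : unit -> int) = actual () + actual ()

    let count = ref 0
    let () =
      let v = by_name (fun () -> incr count; !count) in
      Printf.printf "%d (evaluated %d times)\n" v !count
      (* prints "3 (evaluated 2 times)": the actual runs at every use *)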
16. Call-by-text. Give a semantics for call-by-text where the formal is bound to the actual
(not binding in the current environment); when a value is desired the actual is evaluated
in the then current environment. Consider also more “concrete” languages in which the
abstract syntax (of the text) is available to the programmer, or even the concrete syntax:
does the latter possibility lead to any alteration of the current framework?
17. Call-by-reference. Give a maximalist discussion of call-by-reference, still only allowing ac-
tual reference parameters to be variables. Extend this to allow a wider class of expressions
which (must) evaluate to a reference. Extend that in turn to allow any expression as an
actual; if it does not evaluate to a reference the formal should be bound to a new reference
and that should have the value of the actual.
18. Call-by-result. Discuss this mechanism where first the actual is evaluated to a reference, l; second the formal is bound to a new reference l′ (not initialised); third, after computation of the body of the abstract, the value of l is set to the value of l′ in the then current store. Discuss too a variant where the actual is not evaluated at all until after the body of the abstract. [Hint: Use declaration finalisation.]
19. Call-by-value-result. Discuss this mechanism where first the actual is evaluated to a reference l; second the formal is bound to a new reference l′ which is initialised to the current value of l; third, after the computation of the body of the abstract, the value of l is set to the value of l′ in the then current store.
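A small OCaml sketch of the value-result discipline of exercise 19, using explicit references; the helper name call_by_value_result is an invented illustration, not part of the notes.

    (* Call-by-value-result: the formal gets a fresh reference l'
       initialised from the actual l, and l is updated from l' only
       on exit from the body. *)
    let call_by_value_result (l : int ref) (body : int ref -> unit) =
      let l' = ref !l in   (* fresh reference, initialised to the value of l *)
      body l';             (* compute the body of the abstract *)
      l := !l'             (* copy back in the then current store *)

    let () =
      let x = ref 1 in
      call_by_value_result x (fun p -> p := !p + 41; assert (!x = 1));
      assert (!x = 42)   (* the update becomes visible only after the call *)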
20. Discuss selectors which are really just functions returning references. A suitable syntax
might be
selector f (form) : τ = e
which means that f returns a reference to a τ value. First consider the case where all
lifetimes are semi-infinite (extending beyond block execution). Second consider the case
where lifetimes do not persist beyond the block where they were created; in this case
interesting questions arise in the static semantics.
21. Consider higher-order functions in programming languages which may return abstracts
such as functions or procedures. Thus we add the syntax:
The issues that arise include those of lifetime addressed in exercise 20.
Give a static semantics and two dynamic semantics where the first one is a standard
one using environments and where the second one is for closed expressions only and uses
substitutions as discussed in the exercises of Chapter 3. Prove these equivalent. Add a
recursion operator expression
e ::= Y
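The recursion operator Y of exercise 21 can be modelled, in a call-by-value language, by an explicit fixpoint combinator. A minimal OCaml sketch, illustrative only:

    (* A call-by-value fixpoint combinator standing in for Y. *)
    let rec fix f x = f (fix f) x

    (* factorial via fix, without a recursive definition of fact itself *)
    let fact = fix (fun self n -> if n = 0 then 1 else n * self (n - 1))
    let () = assert (fact 5 = 120)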
A  A Guide to the Notation

Syntactic Categories

    Truthvalues                 t ∈ T
    Numbers                     m, n ∈ N
    Constants                   con ∈ Con
    Actual Constants            acon ∈ ACon
    Unary Operations            uop ∈ Uop
    Binary Operations           bop ∈ Bop
    Expressions                 e ∈ Exp
      Boolean                   b ∈ BExp
      Actual                    ae ∈ AExp
      Variable                  ve ∈ VExp
      Function                  fe ∈ FExp
      Procedure                 pe ∈ PExp
      Module                    me ∈ MExp
      Class                     cle ∈ CExp
    Commands (= Statements)     c ∈ Com
    Definitions/Declarations    d ∈ Def/Dec

Static Semantics

    Free Variables/Identifiers       FV/I(e), FI(c), FV/I(d), etc.
    Defined Variables/Identifiers    DV/I(d), DV/I(form)
    Denotable Types                  dt ∈ DTypes
    Type Environments                α, β ∈ TEnv (e.g., = Id →fin DTypes)
    Example Formulae                 α ⊢V e : et    α ⊢I c    α ⊢I d : β
                                     form : β    T(form) = aet

Dynamic Semantics

    Denotable Values        dval ∈ DVal
    Environments            ρ ∈ Env (e.g., = Id →fin DVal)
    Storeable Types         st ∈ STypes
    Locations               l ∈ Loc = Σst Locst    L ⊆fin Loc
    Storeable Values        sval ∈ SVal = Σst SValst
    Stores                  σ ∈ Stores (e.g., = {σ ∈ Loc →fin SVal |
                                ∀st ∈ STypes. σ(Locst) ⊆ SValst})
    Evaluation Modes        µ ∈ Modes
    Transition Systems      ⟨Γ, T, −→⟩    γ ∈ Γ
                            where Γ is the set of configurations,
                            T ⊆ Γ is the set of final configurations,
                            γ −→ γ′ is the transition relation
    Example Configurations          ⟨e, σ⟩; ⟨c, σ⟩, σ; ⟨d, σ⟩
    Example Final Configurations    ⟨con, σ⟩; σ; ⟨ρ, σ⟩
    Example Transition Relations    ρ ⊢I,µ ⟨e, σ⟩ −→ ⟨e′, σ′⟩
                                    ρ ⊢I ⟨c, σ⟩ −→ ⟨c′, σ′⟩/σ′
                                    ρ ⊢I ⟨d, σ⟩ −→ ⟨d′, σ′⟩/ρ′
B Notes on Sets
We use several relations over and operations on sets as well as the (very) standard ones. For
example X ⊆fin Y means X is finite and a subset of Y .
Note: Continuity implies monotonicity. Conversely, to prove continuity, first prove monotonicity. This establishes the "⊇" half of (∗); then prove the "⊆" half.
Example 35

• Cartesian Product: Show that the Cartesian product operation is continuous.

• Finite Sum: Show that the finite sum operation is continuous. (Finite sum is just union, but forced to be disjoint.)
• Finite Functions: The class of finite functions from X to Y is

    X →fin Y = ΣA⊆fin X (A → Y)

Note that the union is necessarily disjoint. Show that →fin is continuous.
There are also two useful binary operations. For f : A and g : B in X →fin Y we define f[g] : A ∪ B by

    f[g](c) = g(c)    (c ∈ B)
              f(c)    (c ∈ A\B)

Note this is a special case of the first definition, but it is very useful and worth separate mention.
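As a sanity check on the definition of f[g], here is a tiny OCaml sketch representing finite functions as association lists; the representation and the name override are assumptions made for illustration.

    (* Finite functions X ->fin Y as association lists, with the
       override operation f[g] defined in the text. *)
    type ('x, 'y) finfun = ('x * 'y) list

    (* f[g](c) = g(c) if c is in the domain of g, and f(c) otherwise *)
    let override (f : ('x, 'y) finfun) (g : ('x, 'y) finfun) : ('x, 'y) finfun =
      g @ List.filter (fun (c, _) -> not (List.mem_assoc c g)) f

    let () =
      let f = [ ("x", 1); ("y", 2) ] and g = [ ("y", 3) ] in
      assert (List.assoc "y" (override f g) = 3);
      assert (List.assoc "x" (override f g) = 1)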
The Importance of Continuity

Suppose we want to solve the equation

    X = Op(X)

Put X⁰ = ∅ and Xᵐ⁺¹ = Op(Xᵐ). Then (by induction on m) we have, for all m, Xᵐ ⊆ Xᵐ⁺¹, and, putting X = ⋃ₘ Xᵐ,

    Op(X) = Op(⋃ₘ Xᵐ)
          = ⋃ₘ Op(Xᵐ)      (by continuity)
          = ⋃ₘ Xᵐ⁺¹
          = X

And one can show (do so!) that X is the least solution – that is, if Y is any other solution then X ⊆ Y. Indeed X is even the least set such that Op(X) ⊆ X.
This can be generalised. Suppose Op₁(X₁, . . . , Xₙ), . . . , Opₙ(X₁, . . . , Xₙ) are all continuous and we want to solve the n equations

    X₁ = Op₁(X₁, . . . , Xₙ)
     ⋮
    Xₙ = Opₙ(X₁, . . . , Xₙ)

Put Xᵢ⁰ = ∅ and Xᵢᵐ⁺¹ = Opᵢ(X₁ᵐ, . . . , Xₙᵐ). Then for all m and i, Xᵢᵐ ⊆ Xᵢᵐ⁺¹ (prove this) and, putting

    Xᵢ = ⋃ₘ Xᵢᵐ

we obtain the least solutions to the equations – if the Yᵢ are also solutions then, for all i, Xᵢ ⊆ Yᵢ. Indeed the Xᵢ are even the least sets such that Opᵢ(X₁, . . . , Xₙ) ⊆ Xᵢ (i = 1, . . . , n). This is used in the example below. Prove this.
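The iteration described here is directly executable whenever the sets involved stay finite. A small OCaml sketch, using sorted integer lists as sets and an invented example operator:

    (* Least solution of X = Op(X) by iteration from the empty set,
       stopping when nothing new is added.  Works when Op is monotone
       and the least fixed point is reached in finitely many steps. *)
    let rec lfp (op : int list -> int list) (x : int list) : int list =
      let x' = List.sort_uniq compare (op x) in
      if x' = x then x else lfp op x'

    (* example: the least X with X = {0} ∪ {n + 2 | n ∈ X, n < 10} *)
    let () =
      let op x =
        0 :: List.filter_map (fun n -> if n < 10 then Some (n + 2) else None) x
      in
      assert (lfp op [] = [0; 2; 4; 6; 8; 10])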
Example 36 Suppose we are given sets Num, Id, Bop and wish to define sets Exp and Com by the abstract syntax

    e ::= m | x | e₀ bop e₁
    c ::= x := e | c₀ ; c₁ | if e₀ = e₁ then c₀ else c₁ | while e₀ = e₁ do c
Then we regard this definition as giving us the set equations

    Exp = Num + Id + (Exp × Bop × Exp)
    Com = (Id × Exp) + (Com × Com) + (Exp × Exp × Com × Com) + (Exp × Exp × Com)

and also giving us a notation for working with the solution to the equations. First m is identified with ⟨1, m⟩ ∈ Exp and x is identified with ⟨2, x⟩ in Exp. Next e₀ bop e₁ is identified with ⟨3, ⟨e₀, bop, e₁⟩⟩, and similarly for the commands. Now the set equations are easily solved using the above techniques as they are in the form

    Exp = Op₁(Exp, Com)
    Com = Op₂(Exp, Com)

where Op₁(Exp, Com) = Num + Id + (Exp × Bop × Exp) and Op₂ is defined similarly. Clearly Op₁ and Op₂ are continuous as they are built up out of (composed from) the continuous disjoint sum and product operations (prove they are continuous). Therefore we can apply the above techniques to find a least solution Exp, Com. Note that Exp and Com are therefore the least sets such that

    Op₁(Exp, Com) ⊆ Exp    and    Op₂(Exp, Com) ⊆ Com
At some points in the text environments (and similar things) were mutually recursively defined
with commands and so on. This is justified using our apparatus of continuous set operators
employing, in particular, the finite function operator.
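For readers who know a typed functional language: the least solution of such set equations is precisely what a recursive datatype declaration denotes. A sketch in OCaml, where the constructor names are invented tags playing the role of ⟨1, m⟩, ⟨2, x⟩, and so on:

    type bop = Plus | Minus | Times

    type exp =
      | Num of int                    (* m, tagged as <1, m> in the text *)
      | Id of string                  (* x, tagged as <2, x> *)
      | Bop of exp * bop * exp        (* e0 bop e1 *)

    and com =
      | Assign of string * exp                 (* x := e *)
      | Seq of com * com                       (* c0 ; c1 *)
      | If of exp * exp * com * com            (* if e0 = e1 then c0 else c1 *)
      | While of exp * exp * com               (* while e0 = e1 do c *)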
Guarded Commands, Nondeterminacy and Formal Derivation of Programs

Programming Languages, T.A. Standish, Editor

1. Introduction

In Section 2, two statements, an alternative construct and a repetitive construct, are introduced, together with an intuitive (mechanistic) definition of their semantics. The basic building block for both of them

1. For any S, we have for all states: wp(S, F) = F (the so-called Law of the Excluded Miracle).
2. For any S and any two post-conditions P and Q such that, for all states, P ⇒ Q, we have for all states: wp(S, P) ⇒ wp(S, Q).
3. For any S and any two post-conditions P and Q, we have for all states: (wp(S, P) and wp(S, Q)) = wp(S, P and Q).
4. For any deterministic S and any post-conditions P

    q1, q2, q3, q4 := Q1, Q2, Q3, Q4;
    do q1 > q2 → q1, q2 := q2, q1
     ▯ q2 > q3 → q2, q3 := q3, q2
     ▯ q3 > q4 → q3, q4 := q4, q3
    od

To conclude this section, we give a program where not only the computation but also the final state is not necessarily uniquely determined. The program should
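To see the nondeterminacy of the do-od construct operationally, here is a hedged OCaml sketch of the four-element sorting loop above: at each step any enabled guard may fire, with random choice standing in for Dijkstra's nondeterminacy. The function name guarded_sort is invented.

    (* Repeatedly pick any enabled guard and fire it; the do-od
       terminates when all guards are false. *)
    let guarded_sort q =
      let swaps = [ (0, 1); (1, 2); (2, 3) ] in
      let enabled () = List.filter (fun (i, j) -> q.(i) > q.(j)) swaps in
      let rec loop () =
        match enabled () with
        | [] -> ()   (* all guards false: the loop terminates *)
        | gs ->
            let (i, j) = List.nth gs (Random.int (List.length gs)) in
            let t = q.(i) in q.(i) <- q.(j); q.(j) <- t;
            loop ()
      in
      loop ()

    let () =
      let q = [| 4; 3; 2; 1 |] in
      guarded_sort q;
      assert (q = [| 1; 2; 3; 4 |])   (* every execution sorts q *)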
Preface
The following three lectures were given in the form of a short course
at the meeting Teoria della Dimostrazione e Filosofia della Logica, or-
ganized in Siena, 6–9 April 1983, by the Scuola di Specializzazione in
Logica Matematica of the Università degli Studi di Siena. I am very
grateful to Giovanni Sambin and Aldo Ursini of that school, not only
for recording the lectures on tape, but, above all, for transcribing the
tapes produced by the recorder: no machine could have done that work.
This written version of the lectures is based on their transcription. The
changes that I have been forced to make have mostly been of a stylistic
nature, except at one point. In the second lecture, as I actually gave
it, the order of conceptual priority between the notions of proof and
immediate inference was wrong. Since I discovered my mistake later
the same month as the meeting was held, I thought it better to let
the written text diverge from the oral presentation rather than possi-
bly confusing others by letting the mistake remain. The oral origin of
these lectures is the source of the many redundancies of the written
text. It is also my sole excuse for the lack of detailed references.
First lecture
When I was asked to give these lectures about a year ago, I sug-
gested the title On the Meanings of the Logical Constants and the
Justifications of the Logical Laws. So that is what I shall talk about,
eventually, but, first of all, I shall have to say something about, on
the one hand, the things that the logical operations operate on, which
we normally call propositions and propositional functions, and, on the
other hand, the things that the logical laws, by which I mean the rules
of inference, operate on, which we normally call assertions. We must
remember that, even if a logical inference, for instance, a conjunction
introduction, is written
     A    B
    ────────
     A & B
which is the way in which we would normally write it, it does not take
us from the propositions A and B to the proposition A & B. Rather, it
takes us from the affirmation of A and the affirmation of B to the affir-
mation of A & B, which we may make explicit, using Frege’s notation,
by writing it
     ⊢ A    ⊢ B
    ────────────
      ⊢ A & B
instead. It is always made explicit in this way by Frege in his writings,
and in Principia, for instance. Thus we have two kinds of entities here:
we have the entities that the logical operations operate on, which we
call propositions, and we have those that we prove and that appear
as premises and conclusion of a logical inference, which we call asser-
tions. It turns out that, in order to clarify the meanings of the logical
constants and justify the logical laws, a considerable portion of the
philosophical work lies already in clarifying the notion of proposition
and the notion of assertion. Accordingly, a large part of my lectures
will be taken up by a philosophical analysis of these two notions.
Let us first look at the term proposition. It has its origin in the Gr. πρότασις, used by Aristotle in the Prior Analytics, the third part of the Organon. It was translated, apparently by Cicero, into Lat. propositio,
which has its modern counterparts in It. proposizione, Eng. proposi-
tion and Ger. Satz. In the old, traditional use of the word proposition,
propositions are the things that we prove. We talk about proposition
and proof, of course, in mathematics: we put up a proposition and let
it be followed by its proof. In particular, the premises and conclusion
of an inference were propositions in this old terminology. It was the
standard use of the word up to the last century. And it is this use
which is retained in mathematics, where a theorem is sometimes called
a proposition, sometimes a theorem. Thus we have two words for the
things that we prove, proposition and theorem. The word proposition, Gr. πρότασις, comes from Aristotle and has dominated the logical tradition, whereas the word theorem, Gr. θεώρημα, is in Euclid, I believe, and has dominated the mathematical tradition.
With Kant, something important happened, namely, that the
term judgement, Ger. Urteil, came to be used instead of proposition.
Perhaps one reason is that proposition, or a word with that stem, at
least, simply does not exist in German: the corresponding German
word would be Lehrsatz, or simply Satz. Be that as it may, what hap-
pened with Kant and the ensuing German philosophical tradition was
that the word judgement came to replace the word proposition. Thus,
in that tradition, a proof, Ger. Beweis, is always a proof of a judge-
ment. In particular, the premises and conclusion of a logical inference
are always called judgements. And it was the judgements, or the cat-
egorical judgements, rather, which were divided into affirmations and
denials, whereas earlier it was the propositions which were so divided.
The term judgement also has a long history. It is the Gr. κρίσις, translated into Lat. judicium, It. giudizio, Eng. judgement, and Ger. Urteil. Now, since it has as long a history as the word proposition,
these two were also previously used in parallel. The traditional way of
relating the notions of judgement and proposition was by saying that a
proposition is the verbal expression of a judgement. This is, as far as I
know, how the notions of proposition and judgement were related dur-
ing the scholastic period, and it is something which is repeated in the
Port Royal Logic, for instance. You still find it repeated by Brentano
in this century. Now, this means that, when, in German philosophy
beginning with Kant, what was previously called a proposition came
to be called a judgement, the term judgement acquired a double mean-
ing. It came to be used, on the one hand, for the act of judging, just
as before, and, on the other hand, it came to be used instead of the old
proposition. Of course, when you say that a proposition is the verbal
expression of a judgement, you mean by judgement the act of judging,
the mental act of judging in scholastic terms, and the proposition is the
verbal expression by means of which you make the mental judgement
public, so to say. That is, I think, how one thought about it. Thus,
with Kant, the term judgement became ambiguous between the act of
judging and that which is judged, or the judgement made, if you prefer.
German has here the excellent expression gefälltes Urteil, which has no
good counterpart in English.
                              judgement
                 ┌────────────────┴────────────────┐
                 the act of judging      that which is judged
    old tradition:    judgement               proposition
    Kant:             Urteil(sakt)            (gefälltes) Urteil
A is not, or A is false,
and Frege. And, through Frege’s influence, the whole of modern logic
has come to be based on the single form of judgement, or assertion, A
is true.
Once this step was taken, the question arose, What sort of thing
is it that is affirmed in an affirmation and denied in a denial? that is,
What sort of thing is the A here? The isolation of this concept belongs
to the, if I may so call it, objectivistically oriented branch of German
philosophy in the last century. By that, I mean the tradition which you
may delimit by mentioning the names of, say, Bolzano, Lotze, Frege,
Brentano, and the Brentano disciples Stumpf, Meinong, and Husserl,
although, with Husserl, I think one should say that the split between
the objectivistic and the Kantian branches of German philosophy is
finally overcome. The isolation of this concept was a step which was
entirely necessary for the development of modern logic. Modern logic
simply would not work unless we had this concept, because it is on the
things that fall under it that the logical operations operate.
This new concept, which simply did not exist before the last cen-
tury, was variously called. And, since it was something that one had not
met before, one had difficulties with what one should call it. Among
the terms that were used, I think the least committing one is Ger.
Urteilsinhalt, content of a judgement, by which I mean that which is
affirmed in an affirmation and denied in a denial. Bolzano, who was
the first to introduce this concept, called it proposition in itself, Ger.
Satz an sich. Frege also grappled with this terminological problem.
In Begriffsschrift, he called it judgeable content, Ger. beurteilbarer In-
halt. Later on, corresponding to his threefold division into expres-
sion, sense, and reference, in the case of this kind of entity, what was
the expression, he called sentence, Ger. Satz, what was the sense, he
called thought, Ger. Gedanke, and what was the reference, he called
truth value, Ger. Wahrheitswert. So the question arises, What should
I choose here? Should I choose sentence, thought, or truth value? The
closest possible correspondence is achieved, I think, if I choose Gedanke,
that is, thought, for late Frege. This is confirmed by the fact that, in
his very late logical investigations, he called the logical operations the
Gedankengefüge. Thus judgeable content is early Frege and thought
is late Frege. We also have the term state of affairs, Ger. Sachverhalt,
which was introduced by Stumpf and used by Wittgenstein in the Trac-
tatus. And, finally, we have the term objective, Ger. Objektiv, which
was the term used by Meinong. Maybe there were other terms as well
in circulation, but these are the ones that come immediately to my
mind.
Now, Russell used the term proposition for this new notion, which
has become the standard term in Anglo-Saxon philosophy and in mod-
ern logic. And, since he decided to use the word proposition in this
new sense, he had to use another word for the things that we prove and
that figure as premises and conclusion of a logical inference. His choice
was to translate Frege’s Urteil, not by judgement, as one would expect,
but by assertion. And why, one may ask, did he choose the word as-
sertion rather than translate Urteil literally by judgement? I think it
was to avoid any association with Kantian philosophy, because Urteil
was after all the central notion of logic as it was done in the Kantian
tradition. For instance, in his transcendental logic, which forms part
of the Kritik der reinen Vernunft, Kant arrives at his categories by
analysing the various forms that a judgement may have. That was his
clue to the discovery of all pure concepts of reason, as he called it.
Thus, in Russell’s hands, Frege’s Urteil came to be called assertion,
and the combination of Frege’s Urteilsstrich, judgement stroke, and
Inhaltsstrich, content stroke, came to be called the assertion sign.
Observe now where we have arrived through this development,
namely, at a notion of proposition which is entirely different, or dif-
ferent, at least, from the old one, that is, from the Gr. prìtasij and
the Lat. propositio. To repeat, the things that we prove, in particu-
lar, the premises and conclusion of a logical inference, are no longer
propositions in Russell’s terminology, but assertions. Conversely, the
things that we combine by means of the logical operations, the con-
nectives and the quantifiers, are not propositions in the old sense, that
is, what Russell calls assertions, but what he calls propositions. And,
as I said in the very beginning, the rule of conjunction introduction,
for instance, really allows us to affirm A & B, having affirmed A and
having affirmed B,
     ⊢ A    ⊢ B
    ────────────
      ⊢ A & B
It is another matter, of course, that we may adopt conventions that
allow us to suppress the assertion sign, if it becomes too tedious to
write it out. Conceptually, it would nevertheless be there, whether I
write it as above or
    A true    B true
    ────────────────
       A & B true
as I think that I shall do in the following.
So far, I have made no attempt at defining the notions of judge-
ment, or assertion, and proposition. I have merely wanted to give a
preliminary hint at the difference between the two by showing how the
terminology has evolved.
To motivate my next step, consider any of the usual inference rules
of the propositional or predicate calculus. Let me take the rule of
disjunction introduction this time, for some change,
       A
    ───────
     A ∨ B
Now, what do the variables A and B range over in a rule like this?
That is, what are you allowed to insert into the places indicated by
these variables? The standard answer to this question, by someone
who has received the now current logical education, would be to say
that A and B range over arbitrary formulas of the language that you are
considering. Thus, if the language is first order arithmetic, say, then
A and B should be arithmetical formulas. When you start thinking
about this answer, you will see that there is something strange about
it, namely, its language dependence. Because it is clearly irrelevant for
the validity of the rule whether A and B are arithmetical formulas, cor-
responding to the language of first order arithmetic, or whether they
contain, say, predicates defined by transfinite, or generalized, induc-
tion. The unary predicate expressing that a natural number encodes
an ordinal of the constructive second number class, for instance, is cer-
tainly not expressible in first order arithmetic, and there is no reason
at all why A and B should not be allowed to contain that predicate.
Or, surely, for the validity of the rule, A and B might just as well
be set theoretical formulas, supposing that we have given such a clear
sense to them that we clearly recognize that they express propositions.
Thus what is important for the validity of the rule is merely that A and
B are propositions, that is, that the expressions which we insert into
the places indicated by the variables A and B express propositions. It
seems, then, that the deficiency of the first answer, by which I mean
the answer that A and B should range over formulas, is eliminated
by saying that the variables A and B should range over propositions
instead of formulas. And this is entirely natural, because, after all, the
notion of formula, as given by the usual inductive definition, is nothing
but the formalistic substitute for the notion of proposition: when you
divest a proposition in some language of all sense, what remains is the
mere formula. But then, supposing we agree that the natural way out
of the first difficulty is to say that A and B should range over arbitrary
propositions, another difficulty arises, because, whereas the notion of
formula is a syntactic notion, a formula being defined as an expression
that can be formed by means of certain formation rules, the notion
of proposition is a semantic notion, which means that the rule is no
longer completely formal in the strict sense of formal logic. That a rule
of inference is completely formal means precisely that there must be
no semantic conditions involved in the rule: it may only put conditions
on the forms of the premises and conclusion. The only way out of this
second difficulty seems to be to say that, really, the rule has not one
but three premises, so that, if we were to write them all out, it would
read
    A prop    B prop    A true
    ──────────────────────────
            A ∨ B true
that is, from A and B being propositions and from the truth of A, we
are allowed to conclude the truth of A ∨ B. Here I am using
A prop

as an abbreviation of A is a proposition.

    judgement                proposition
    evident judgement        true proposition
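The same point is built into modern type-theoretic proof assistants: A and B must be given as propositions before A ∨ B can even be formed. A minimal Lean sketch, offered purely as an illustration:

    -- Disjunction introduction: A and B are first assumed to be
    -- propositions (A B : Prop); only then is A ∨ B formed, and a
    -- proof of A suffices, mirroring the three-premise rule above.
    example (A B : Prop) (h : A) : A ∨ B := Or.inl h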
Second lecture
A is true,
which is certainly the most basic form of judgement, for instance?
When one is faced with this question for the first time, it is tempt-
ing to answer simply that it is right to say that A is true provided that
A is true, and that it is wrong to say that A is true provided that A is
not true, that is, provided that A is false. In fact, this is what Aristotle
says in his definition of truth in the Metaphysics. For instance, he says
that it is not because you rightly say that you are white that you are
white, but because you are white that what you say is correct. But a
moment’s reflection shows that this first answer is simply wrong. Even
if every even number is the sum of two prime numbers, it is wrong of
me to say that unless I know it, that is, unless I have proved it. And it
would have been wrong of me to say that every map can be coloured
by four colours before the recent proof was given, that is, before I ac-
quired that knowledge, either by understanding the proof myself, or
by trusting its discoverers. So the condition for it to be right of me to
affirm a proposition A, that is, to say that A is true, is not that A is
true, but that I know that A is true. This is a point which has been
made by Dummett and, before him, by Brentano, who introduced the
apt term blind judgement for a judgement which is made by someone
who does not know what he is saying, although what he says is correct
in the weaker sense that someone else knows it, or, perhaps, that he
himself gets to know it at some later time. When you are forced into
answering a yes or no question, although you do not know the answer,
and happen to give the right answer, right as seen by someone else, or
by you yourself when you go home and look it up, then you make a
blind judgement. Thus you err, although the teacher does not discover
your error. Not to speak of the fact that the teacher erred more greatly by not giving you the option of giving the only answer which would have been honest, namely, that you did not know.
The preceding consideration does not depend on the particular form
of judgement, in this case, A is true, that I happened to use as an
example. Quite generally, the condition for it to be right of you to
make a judgement is that you know it, or, what amounts to the same,
that it is evident to you. The notion of evidence is related to the notion
of knowledge by the equation
evident = known.
When you say that a judgement is evident, you merely express that
you have understood, comprehended, grasped, or seen it, that is, that
you know it, because to have understood is to know. This is reflected
in the etymology of the word evident, which comes from Lat. ex, out
of, from, and videre, to see, in the metaphorical sense, of course.
There is absolutely no question of a judgement being evident in
itself, independently of us and our cognitive activity. That would be
just as absurd as to speak of a judgement as being known, not by
somebody, you or me, but in itself. To be evident is to be evident to
somebody, as inevitably as to be known is to be known by somebody.
That is what Brouwer meant by saying, in Consciousness, Philosophy,
and Mathematics, that there are no nonexperienced truths, a basic
intuitionistic tenet. This has been puzzling, because it has been under-
stood as referring to the truth of a proposition, and clearly there are
true propositions whose truth has not been experienced, that is, propo-
sitions which can be shown to be true in the future, although they have
not been proved to be true now. But what Brouwer means here is not
that. He does not speak about propositions and truth: he speaks about
judgements and evidence, although he uses the term truth instead of
the term evidence. And what he says is then perfectly right: there is
no evident judgement whose evidence has not been experienced, and experiencing it is what you do when you understand, comprehend, grasp, or see it. There is no evidence outside our actual or possible experi-
ence of it. The notion of evidence is by its very nature subject related,
relative to the knowing subject, that is, in Kantian terminology.
As I already said, when you make, or utter, a judgement under
normal circumstances, you thereby express that you know it. There is
no need to make this explicit by saying,
I know that A is true.
it is clear from the form of your utterance that you express a wish.
There is no need of making this explicit by saying,

    I wish that . . .

Some languages, like Greek, use the optative mood to make it clear that an utterance expresses a wish or desire.
Consider the pattern that we have arrived at now,

             act            object
           ┌──┴──┐     ┌──────┴──────┐
    I       know         A is true
Here the grammatical subject I refers to the subject, self, or ego, and
the grammatical predicate know to the act, which in this particular
case is an act of knowing, but might as well have been an act of con-
jecturing, doubting, wishing, fearing, etc. Thus the predicate know
indicates the modality of the act, that is, the way in which the subject
relates to the object, or the particular force which is involved, in this
case, the epistemic force. Observe that the function of the grammatical
moods, indicative, subjunctive, imperative, and optative, is to express
modalities in this sense. Finally, A is true is the judgement or, in gen-
eral, the object of the act, which in this case is an object of knowledge,
but might have been an object of conjecture, doubt, wish, fear, etc.
The closest possible correspondence between the analysis that I am
giving and Frege’s notation for a judgement
⊢ A
I know . . .
. . . is true.
Then it is the vertical stroke which is superfluous, whereas the hori-
zontal stroke is needed to show that the judgement has the form of an
affirmation. But this can hardly be read out of Frege’s own account of
the assertion sign: you have to read it into his text.
What is a judgement before it has become evident, or known? That
is, of the two, judgement and evident judgement, how is the first to be
defined? The characteristic of a judgement in this sense is merely that
it has been laid down what knowledge is expressed by it, that is, what
you must know in order to have the right to make, or utter, it. And
this is something which depends solely on the form of the judgement.
For example, if we consider the two forms of judgement
A is a proposition
and
A is true,
then there is something that you must know in order to have the right
to make a judgement of the first form, and there is something else
which you must know, in addition, in order to have the right to make
a judgement of the second form. And what you must know depends
in neither case on A, but only on the form of the judgement, . . . is
a proposition or . . . is true, respectively. Quite generally, I may say
that a judgement in this sense, that is, a not yet known, and perhaps
even unknowable, judgement, is nothing but an instance of a form
of judgement, because it is for the various forms of judgement that
I lay down what you must know in order to have the right to make
a judgement of one of those forms. Thus, as soon as something has
the form of a judgement, it is already a judgement in this sense. For
example, A is a proposition is a judgement in this sense, because it
has a form for which I have laid down, or rather shall lay down, what
you must know in order to have the right to make a judgement of that
form. I think that I may make things a bit clearer by showing again
in a picture what is involved here. Let me take the first form to begin
with.
                           evident judgement
    ┌──────────────────────────────┴──────────────────────────────┐
                                 judgement
                    ┌────────────────┴────────────────┐
    I know          A                       is a proposition
                    │                              │
               expression                  form of judgement
Here is involved, first, an expression A, which should be a complete
expression. Second, we have the form . . . is a proposition, which is
the form of judgement. Composing these two, we arrive at A is a
proposition, which is a judgement in the first sense. And then, third,
we have the act in which I grasp this judgement, and through which it
becomes evident. Thus it is my act of grasping which is the source of
the evidence. These two together, that is, the judgement and my act
of grasping it, become the evident judgement. And a similar analysis
can be given of a judgement of the second form.
                           evident judgement
    ┌──────────────────────────────┴──────────────────────────────┐
                                 judgement
                    ┌────────────────┴────────────────┐
    I know          A                           is true
                    │                              │
              proposition                  form of judgement
Such a judgement has the form . . . is true, but what fills the open place,
or hole, in the form is not an expression any longer, but a proposition.
And what is a proposition? A proposition is an expression for which
the previous judgement has already been grasped, because there is no
question of something being true unless you have previously grasped it
as a proposition. But otherwise the picture remains the same here.
Now I must consider the discussion of the notion of judgement fin-
ished and pass on to the notion of proof. Proof is a good word, because,
unlike the word proposition, it has not changed its meaning. Proof ap-
parently means the same now as it did when the Greeks discovered
the notion of proof, and therefore no terminological difficulties arise.
Observe that both Lat. demonstratio and the corresponding words in the modern languages, like It. dimostrazione, Eng. demonstration, and Ger. Beweis, are literal translations of Gr. ἀπόδειξις, deriving as it does from Gr. δείκνυμι, I show, which has the same meaning as Lat. monstrare and Ger. weisen.
If you want to have a first approximation to the notion of proof, a
first definition of what a proof is, the strange thing is that you can-
not look it up in any modern textbook of logic, because what you get
out of the standard textbooks of modern logic is the definition of what
a formal proof is, at best with a careful discussion clarifying that a
formal proof in the sense of this definition is not what we ordinarily
call a proof in mathematics. That is, you get a formal proof defined
as a finite sequence of formulas, each one of them being an immediate
consequence of some of the preceding ones, where the notion of imme-
diate consequence, in turn, is defined by saying that a formula is an
immediate consequence of some other formulas if there is an instance
of one of the figures, called rules of inference, which has the other for-
mulas as premises and the formula itself as conclusion. Now, this is not
what a real proof is. That is why you have the warning epithet formal
in front of it, and do not simply say proof.
What is a proof in the original sense of the word? The ordinary
dictionary definition says, with slight variations, that a proof is that
which establishes the truth of a statement. Thus a proof is that which
makes a mathematical statement, or enunciation, into a theorem, or
proposition, in the old sense of the word which is retained in mathe-
matics. Now, remember that I have reserved the term true for true
propositions, in the modern sense of the word, and that the things
that we prove are, in my terminology, judgements. Moreover, to avoid
terminological confusion, judgements qualify as evident rather than
true. Hence, translated into the terminology that I have decided upon,
the dictionary definition becomes simply,
A proof is what makes a judgement evident.
Accepting this, that is, that the proof of a judgement is that which
makes it evident, we might just as well say that the proof of a judge-
ment is the evidence for it. Thus proof is the same as evidence. Com-
bining this with the outcome of the previous discussion of the notion of
evidence, which was that it is the act of understanding, comprehend-
ing, grasping, or seeing a judgement which confers evidence on it, the
inevitable conclusion is that the proof of a judgement is the very act
of grasping it. Thus a proof is, not an object, but an act. This is what
Brouwer wanted to stress by saying that a proof is a mental construc-
tion, because what is mental, or psychic, is precisely our acts, and the
word construction, as used by Brouwer, is but a synonym for proof.
Thus he might just as well have said that the proof of a judgement is
the act of proving, or grasping, it. And the act is primarily the act as it
is being performed. Only secondarily, and irrevocably, does it become
the act that has been performed.
As is often the case, it might have been better to start with the verb
rather than the noun, in this case, with the verb to prove rather than
with the noun proof. If a proof is what makes a judgement evident,
then, clearly, to prove a judgement is to make it evident, or known.
To prove something to yourself is simply to get to know it. And to
prove something to someone else is to try to get him, or her, to know
it. Hence
to prove = to get to know = to understand,
comprehend, grasp, or see.
This means that prove is but another synonym for understand, com-
prehend, grasp, or see. And, passing to the perfect tense,
to have proved = to know = to have understood,
comprehended, grasped, or seen.
We also speak of acquiring and possessing knowledge. To possess
knowledge is the same as to have acquired it, just as to know something
is the same as to have understood, comprehended, grasped, or seen it.
Thus the relation between the plain verb to know and the venerable
expressions to acquire and to possess knowledge is given by the two
equations,
to get to know = to acquire knowledge
and
to know = to possess knowledge.
On the other hand, the verb to prove and the noun proof are related
by the two similar equations,
to prove = to acquire, or construct, a proof
and
to have proved = to possess a proof.
It is now manifest, from these equations, that proof and knowledge are
the same. Thus, if proof theory is construed, not in Hilbert’s sense,
as metamathematics, but simply as the study of proofs in the original
sense of the word, then proof theory is the same as theory of knowledge,
which, in turn, is the same as logic in the original sense of the word, as
the study of reasoning, or proof, not as metamathematics.
Remember that the proof of a judgement is the very act of knowing
it. If this act is atomic, or indivisible, then the proof is said to be im-
mediate. Otherwise, that is, if the proof consists of a whole sequence,
or chain, of atomic actions, it is mediate. And, since proof and knowl-
edge are the same, the attributes immediate and mediate apply equally
well to knowledge. In logic, we are no doubt more used to saying of
inferences, rather than proofs, that they are immediate or mediate, as
the case may be. But that makes no difference, because inference and
proof are the same. It does not matter, for instance, whether we say
rules of inference or proof rules, as has become the custom in program-
ming. And, to take another example, it does not matter whether we
say that a mediate proof is a chain of immediate inferences or a chain
of immediate proofs. The notion of formal proof that I referred to in
the beginning of my discussion of the notion of proof has been arrived
at by formalistically interpreting what you mean by an immediate in-
ference, by forgetting about the difference between a judgement and
a proposition, and, finally, by interpreting the notion of proposition
formalistically, that is, by replacing it by the notion of formula. But a
real proof is and remains what it has always been, namely, that which
makes a judgement evident, or simply, the evidence for it. Thus, if we
do not have the notion of evidence, we do not have the notion of proof.
That is why the notion of proof has fared so badly in those branches
of philosophy where the notion of evidence has fallen into disrepute.
We also speak of a judgement being immediately and mediately
evident, respectively. Which of the two is the case depends of course on
the proof which constitutes the evidence for the judgement. If the proof
is immediate, then the judgement is said to be immediately evident.
And an immediately evident judgement is what we call an axiom. Thus
an axiom is a judgement which is evident by itself, not by virtue of
some previously proved judgements, but by itself, that is, a self-evident
judgement, as one has always said. That is, one always said so before the notion of evidence fell into disrepute, in which case the notion of axiom and
the notion of proof simply become deflated: we cannot make sense
of the notion of axiom and the notion of proof without access to the
notion of evidence. If, on the other hand, the proof which constitutes
the evidence for a judgement is a mediate one, so that the judgement
is evident, not by itself, but only by virtue of some previously proved
judgements, then the judgement is said to be mediately evident. And
a mediately evident judgement is what we call a theorem, as opposed
to an axiom. Thus an evident judgement, that is, a proposition in the
old sense of the word which is retained in mathematics, is either an
axiom or a theorem.
Instead of applying the attributes immediate and mediate to proof,
or knowledge, I might have chosen to speak of intuitive and discursive
proof, or knowledge, respectively. That would have implied no differ-
ence of sense. The proof of an axiom can only be intuitive, which is
to say that an axiom has to be grasped immediately, in a single act.
The word discursive, on the other hand, comes from Lat. discurrere,
to run to and fro. Thus a discursive proof is one which runs, from
premises to conclusion, in several steps. It is the opposite of an intu-
itive proof, which brings you to the conclusion immediately, in a single
step. When one says that the immediate propositions in the old sense
of the word proposition, that is, the immediately evident judgements
in my terminology, are unprovable, what is meant is of course only that
they cannot be proved discursively. Their proofs have to remain intuitive.
This seems to be all that I have to say about the notion of proof at the
moment, so let me pass on to the next item on the agenda, the forms
of judgement and their semantical explanations.
The forms of judgement have to be displayed in a table, simply,
and the corresponding semantical explanations have to be given, one
for each of those forms. A form of judgement is essentially just what
is called a category, not in the sense of category theory, but in the
logical, or philosophical, sense of the word. Thus I have to say what
my forms of judgement, or categories, are, and, for each one of those
forms, I have to explain what you must know in order to have the right
to make a judgement of that form. By the way, the forms of judgement
have to be introduced in a specific order. Actually, not only the forms
of judgement, but all the notions that I am undertaking to explain
here have to come in a specific order. Thus, for instance, the notion
of judgement has to come before the notion of proposition, and the
notion of logical consequence has to be dealt with before explaining
the notion of implication. There is an absolute rigidity in this order.
The notion of proof, for instance, has to come precisely where I have
put it here, because it is needed in some other explanations further on,
where it is presupposed already. Revealing this rigid order, thereby
arriving eventually at the concepts which have to be explained prior
to all other concepts, turns out to be surprisingly difficult: you seem
to arrive at the very first concepts last of all. I do not know what
it should best be called, maybe the order of conceptual priority, one
concept being conceptually prior to another concept if it has to be
explained before the other concept can be explained.
Let us now consider the first form of judgement,
A is a proposition,
or, as I shall continue to abbreviate it,
A prop.
What I have just displayed to you is a linguistic form, and I hope that
you can recognize it. What you cannot see from the form, and which
I therefore proceed to explain to you, is of course its meaning, that is,
what knowledge is expressed by, or embodied in, a judgement of this
form. The question that I am going to answer is, in ontological terms,
What is a proposition?
This is the usual Socratic way of formulating questions of this sort. Or
I could ask, in more knowledge theoretical terminology,
What is it to know a proposition?
or, if you prefer,
What knowledge is expressed by a judgement
of the form A is a proposition?
or, this may be varied endlessly,
What does a judgement of the form A is a proposition mean?
These various ways of posing essentially the same question reflect
roughly the historical development, from a more ontological to a more
knowledge theoretical way of posing, and answering, questions of this
sort, finally ending up with something which is more linguistic in na-
ture, having to do with form and meaning.
Now, one particular answer to this question, however it be formu-
lated, is that a proposition is something that is true or false, or, to use
Aristotle’s formulation, that has truth or falsity in it. Here we have
to be careful, however, because what I am going to explain is what a
proposition in the modern sense is, whereas what Aristotle explained
was what an enunciation, being the translation of Gr. ἀπόφανσις, is.
And it was this explanation that he phrased by saying that an enun-
ciation is something that has truth or falsity in it. What he meant by
this was that it is an expression which has a form of speech such that,
when you utter it, you say something, whether truly or falsely. That
is certainly not how we now interpret the definition of a proposition as
something which is true or false, but it is nevertheless correct that it
echoes Aristotle’s formulation, especially in its symmetric treatment of
truth and falsity.
An elaboration of the definition of a proposition as something that
is true or false is to say that a proposition is a truth value, the true
or the false, and hence that a declarative sentence is an expression
which denotes a truth value, or is the name of a truth value. This
was the explanation adopted by Frege in his later writings. If a propo-
sition is conceived in this way, that is, simply as a truth value, then
there is no difficulty in justifying the laws of the classical propositional
calculus and the laws of quantification over finite, explicitly listed, do-
mains. The trouble arises when you come to the laws for forming
quantified propositions, the quantifiers not being restricted to finite
domains. That is, the trouble is to make the two laws
      A(x) prop               A(x) prop
    ─────────────────      ─────────────────
    (∀x)A(x) prop          (∃x)A(x) prop
evident when propositions are conceived as nothing but truth values.
To my mind, at least, they simply fail to be evident. And I need not
be ashamed of the reference to myself in this connection: as I said
in my discussion of the notion of evidence, it is by its very nature
subject related. Others must make up their minds whether these laws
are really evident to them when they conceive of propositions simply
as truth values. Although we have had this notion of proposition and
these laws for forming quantified propositions for such a long time,
we still have no satisfactory explanations which serve to make them
evident on this conception of the notion of proposition. It does not
help to restrict the quantifiers, that is, to consider instead the laws
       (x ∈ A)                  (x ∈ A)
      B(x) prop                B(x) prop
    ──────────────────       ──────────────────
    (∀x ∈ A)B(x) prop        (∃x ∈ A)B(x) prop
A is a proposition,
the semantical explanation which goes together with it is this, and here
I am using the knowledge theoretical formulation, that to know a propo-
sition, which may be replaced, if you want, by problem, expectation,
or intention, you must know what counts as a verification, solution,
fulfillment, or realization of it. Here verification matches with propo-
sition, solution with problem, fulfillment with expectation as well as
with intention, and realization with intention. Realization is the term
introduced by Kleene, but here I am of course not using it in his sense:
Kleene’s realizability interpretation is a nonstandard, or nonintended,
interpretation of intuitionistic logic and arithmetic. The terminology
of intention and fulfillment was taken over by Heyting from Husserl, via
Oskar Becker, apparently. There is a long chapter in the sixth, and last,
of his Logische Untersuchungen which bears the title Bedeutungsinten-
tion und Bedeutungserfüllung, and it is these two terms, intention and
fulfillment, Ger. Erfüllung, that Heyting applied in his analysis of the
notions of proposition and truth. And he did not just take the terms
from Husserl: if you observe how Husserl used these terms, you will see
that they were appropriately applied by Heyting. Finally, verification
seems to be the perfect term to use together with proposition, coming
as it does from Lat. verus, true, and facere, to make. So to verify is to
make true, and verification is the act, or process, of verifying something.
For a long time, I tried to avoid using the term verification, because
it immediately gives rise to discussions about how the present account
of the notions of proposition and truth is related to the verificationism
that was discussed so much in the thirties. But, fortunately, this is fifty
years ago now, and, since we have a word which lends itself perfectly
to expressing what needs to be expressed, I shall simply use it, without
wanting to get into discussion about how the present semantical theory
is related to the verificationism of the logical positivists.
What would an example be? If you take a proposition like,
means, that is, what you must know in order to have the right to
make a judgement of this form. And the explanation would be that, to
know that a proposition is true, a problem is solvable, an expectation
is fulfillable, or an intention is realizable, you must know how to verify,
solve, fulfill, or realize it, respectively. Thus this explanation equates
truth with verifiability, solvability, fulfillability, or realizability. The
important point to observe here is the change from is in A is true to
can in A can be verified, or A is verifiable. Thus what is expressed in
terms of being in the first formulation really has the modal character
of possibility.
Now, as I said earlier in this lecture, to know a judgement is the
same as to possess a proof of it, and to know a judgement of the
particular form A is true is the same as to know how, or be able, to
verify the proposition A. Thus knowledge of a judgement of this form
is knowledge how in Ryle’s terminology. On the other hand, to know
how to do something is the same as to possess a way, or method, of
doing it. This is reflected in the etymology of the word method, which
is derived from Gr. μετά, after, and ὁδός, way. Taking all into account,
we arrive at the conclusion that a proof that a proposition A is true
is the same as a method of verifying, solving, fulfilling, or realizing A.
This is the explanation for the frequent appearance of the word method
in Heyting’s explanations of the meanings of the logical constants. In
connection with the word method, notice the tendency of our language
towards hypostatization. I can do perfectly well without the concept of
method in my semantical explanations: it is quite sufficient for me to
have access to the expression know how, or knowledge how. But it is in
the nature of our language that, when we know how to do something,
we say that we possess a method of doing it.
Summing up, I have now explained the two forms of categorical
judgement,
A is a proposition
and
A is true,
respectively, and they are the only forms of categorical judgement that
I shall have occasion to consider. Observe that knowledge of a judge-
ment of the second form is knowledge how, more precisely, knowledge
how to verify A, whereas knowledge of a judgement of the first form
is knowledge of a problem, expectation, or intention, which is knowl-
edge what to do, simply. Here I am introducing knowledge what as a
counterpart of Ryle’s knowledge how. So the difference between these
two kinds of knowledge is the difference between knowledge what to do
and knowledge how to do it. And, of course, there can be no question
of knowing how to do something before you know what it is that is to
be done. The difference between the two kinds of knowledge is a cat-
egorical one, and, as you see, what Ryle calls knowledge that, namely,
knowledge that a proposition is true, is equated with knowledge how
on this analysis. Thus the distinction between knowledge how and
knowledge that evaporates on the intuitionistic analysis of the notion
of truth.
Third lecture
The reason why I said that the word verification may be dangerous
is that the principle of verification formulated by the logical positivists
in the thirties said that a proposition is meaningful if and only if it is
verifiable, or that the meaning of a proposition is its method of ver-
ification. Now that is to confuse meaningfulness and truth. I have
indeed used the word verifiable and the expression method of verifica-
tion. But what is equated with verifiability is not the meaningfulness
but the truth of a proposition, and what qualifies as a method of ver-
ification is a proof that a proposition is true. Thus the meaning of a
proposition is not its method of verification. Rather, the meaning of a
proposition is determined by what it is to verify it, or what counts as
a verification of it.
The next point that I want to bring up is the question,
Are there propositions which are true,
but which cannot be proved to be true?
And it suffices to think of mathematical propositions here, like the
Goldbach conjecture, the Riemann hypothesis, or Fermat’s last theo-
rem. This fundamental question was once posed to me outright by a
colleague of mine in the mathematics department, which shows that
even working mathematicians may find themselves puzzled by deep
philosophical questions. At first sight, at least, there seem to be two
possible answers to this question. One is simply,
No,
and the other is,
Perhaps,
although it is of course impossible for anybody to exhibit an example
of such a proposition, because, in order to do that, he would already
have to know it to be true. If you are at all puzzled by this question,
it is an excellent subject of meditation, because it touches the very
conflict between idealism and realism in the theory of knowledge, the
first answer, No, being indicative of idealism, and the second answer,
Perhaps, of realism. It should be clear, from any point of view, that
the answer depends on how you interpret the three notions in terms
of which the question is formulated, that is, the notion of proposition,
the notion of truth, and the notion of proof. And it should already be
clear, I believe, from the way in which I have explained these notions,
that the question simply ceases to be a problem, and that it is the first
answer which is favoured.
To see this, assume first of all that A is a proposition, or problem.
Then
A is true
is a judgement which gives rise to a new problem, namely, the problem
of proving that A is true. To say that that problem is solvable is pre-
cisely the same as saying that the judgement that A is true is provable.
Now, the solvability of a problem is always expressed by a judgement.
Hence
(A is true) is provable
is a new judgement. What I claim is that we have the right to make this
latter judgement if and only if we have the right to make the former
judgement, that is, that the proof rule
A is true
(A is true) is provable
are both valid. This is the sense of saying that A is true if and only if
A can be proved to be true. To justify the first rule, assume that you
know its premise, that is, that you have proved that A is true. But, if
you have proved that A is true, then you can, or know how to, prove
that A is true, which is what you need to know in order to have the
right to judge the conclusion. In this step, I have relied on the principle
that, if something has been done, then it can be done. To justify the
second rule, assume that you know its premise, that is, that you know
how to prove the judgement A is true. On that assumption, I have to
explain the conclusion to you, which is to say that I have to explain
how to verify the proposition A. This is how you do it. First, put your
knowledge of the premise into practice. That yields as result a proof
that A is true. Now, such a proof is nothing but knowledge how to
verify, or a method of verifying, the proposition A. Hence, putting it,
in turn, into practice, you end up with a verification of the proposition
A, as required. Observe that the inference in this direction is essentially
a contraction of two possibilities into one: if you know how to know
how to do something, then you know how to do it.
All this is very easy to say, but, if one is at all puzzled by the ques-
tion whether there are unprovable truths, then it is not an easy thing to
make up one’s mind about. For instance, it seems, from Heyting’s writ-
ings on the semantics of intuitionistic logic in the early thirties, that
he had not arrived at this position at that time. The most forceful and
persistent criticism of the idea of a knowledge independent, or knowl-
edge transcendent, notion of truth has been delivered by Dummett,
although it seems difficult to find him ever explicitly committing him-
self in his writings to the view that, if a proposition is true, then it can
also be proved to be true. Prawitz seems to be leaning towards this
nonrealistic principle of truth, as he calls it, in his paper Intuitionistic
Logic: A Philosophical Challenge. And, in his book Det Osägbara,
printed in the same year, Stenlund explicitly rejects the idea of true
propositions that are in principle unknowable. The Swedish proof the-
orists seem to be arriving at a common philosophical position.
Next I have to say something about hypothetical judgements, be-
fore I proceed to the final piece, which consists of the explanations
of the meanings of the logical constants and the justifications of the
logical laws. So far, I have only introduced the two forms of categor-
ical judgement A is a proposition and A is true. The only forms of
judgement that I need to introduce, besides these, are forms of hypo-
thetical judgement. Hypothetical means of course the same as under
assumptions. The Gr. ὑπόθεσις, hypothesis, was translated into Lat.
suppositio, supposition, and they both mean the same as assumption.
Now, what is the rule for making assumptions, quite generally? It is
simple. Whenever you have a judgement in the sense that I am using
the word, that is, a judgement in the sense of an instance of a form of
judgement, then it has been laid down what you must know in order
to have the right to make it. And that means that it makes perfectly
good sense to assume it, which is the same as to assume that you know
it, which, in turn, is the same as to assume that you have proved it.
Why is it the same to assume it as to assume that you know it? Be-
cause of the constant tacit convention that the epistemic force, I know
. . . , is there, even if it is not made explicit. Thus, when you assume
something, what you do is that you assume that you know it, that is,
that you have proved it. And, to repeat, the rule for making assump-
tions is simply this: whenever you have a judgement, in the sense of an
instance of a form of judgement, you may assume it. That gives rise
to the notion of hypothetical judgement and the notion of hypothetical
proof, or proof under hypotheses.
The forms of hypothetical judgement that I shall need are not so
many. Many more can be introduced, and they are needed for other
purposes. But what is absolutely necessary for me is to have access to
the form
A1 true, . . . , An true | A prop,

which says that A is a proposition under the assumptions that
A1, . . . , An are all true, and, on the other hand, the form

A1 true, . . . , An true | A true,

which says that the proposition A is true under the assumptions that
A1, . . . , An are all true. Here I am using the vertical bar for the relation
of logical consequence, that is, for what Gentzen expressed by means of
the arrow → in his sequence calculus, and for which the double arrow
⇒ is also a common notation. It is the relation of logical consequence,
which must be carefully distinguished from implication. What stands
to the left of the consequence sign, we call the hypotheses, in which
case what follows the consequence sign is called the thesis, or we call
the judgements that precede the consequence sign the antecedents and
the judgement that follows after the consequence sign the consequent.
This is the terminology which Gentzen took over from the scholastics,
except that, for some reason, he changed consequent into succedent and
consequence into sequence, Ger. Sequenz, usually improperly rendered
by sequent in English.
hypothetical judgement (logical consequence):

A1 true, . . . , An true | A prop
A1 true, . . . , An true | A true

Here the judgements A1 true, . . . , An true preceding the consequence
sign are the antecedents, or hypotheses, and the judgement A prop,
respectively A true, following it is the consequent, or thesis.
Since I am making the assumptions A1 true, . . . , An true, I must be
presupposing something here, because, surely, I cannot make those
assumptions unless they are judgements. Specifically, in order for A1
true to be a judgement, A1 must be a proposition, and, in order for
A2 true to be a judgement, A2 must be a proposition, but now merely
under the assumption that A1 is true, . . . , and, in order for An true
to be a judgement, An must be a proposition under the assumptions
that A1, . . . , An−1 are all true. Unlike in Gentzen’s sequence calculus,
the order of the assumptions is important here. This is because of the
generalization that something being a proposition may depend on other
things being true. Thus, for the assumptions to make sense, we must
presuppose
A1 prop,
A1 true | A2 prop,
. . .
A1 true, . . . , An−1 true | An prop.
Supposing this, that is, supposing that we know this, it makes perfectly
good sense to assume, first, that A1 is true, second, that A2 is true,
. . . , finally, that An is true, and hence
A1 true, . . . , An true | A prop
is a perfectly good judgement whatever expression A is, that is, what-
ever expression you insert into the place indicated by the variable A.
And why is it a good judgement? To answer that question, I must
explain to you what it is to know such a judgement, that is, what
constitutes knowledge, or proof, of such a judgement. Now, quite gen-
erally, a proof of a hypothetical judgement, or logical consequence, is
nothing but a hypothetical proof of the thesis, or consequent, from the
hypotheses, or antecedents. The notion of hypothetical proof, in turn,
which is a primitive notion, is explained by saying that it is a proof
which, when supplemented by proofs of the hypotheses, or antecedents,
becomes a proof of the thesis, or consequent. Thus the notion of cate-
gorical proof precedes the notion of hypothetical proof, or inference, in
the order of conceptual priority. Specializing this general explanation
of what a proof of a hypothetical judgement is to the particular form
of hypothetical judgement

A1 true, . . . , An true | A true,

we see that such a proof is a proof which, when supplemented by proofs
of the antecedents A1 true, . . . , An true, becomes a proof of the
consequent.
I am sorry that I have had to be so brief in my treatment of hypo-
thetical judgements, but what I have said is sufficient for the following,
except that I need to generalize the two forms of hypothetical judge-
ment so as to allow generality in them. Thus I need judgements which
are, not only hypothetical, but also general, which means that the first
form is turned into
A1(x1, . . . , xm) true, . . . , An(x1, . . . , xm) true |x1,...,xm A(x1, . . . , xm) prop
that B is true from the assumption that A is true. Now take your proof
of the right premise and adjoin it to the verification of A ⊃ B. Then
you get a categorical proof

  ⋮
A true
  ⋮
B true
of the conclusion that B is true. Here, of course, I am implicitly using
the principle that, if you supplement a hypothetical proof with proofs
of its hypotheses, then you get a proof of its conclusion. But this is in
the nature of a hypothetical proof: it is that property which makes a
hypothetical proof into what it is. So now you have a proof that B is
true, a proof which is knowledge how to verify B. Putting it, in turn,
into practice, you end up with a verification of B. This finishes my
explanation of how the proposition B is verified.
In the course of my semantical explanation of the elimination rule
for implication, I have performed certain transformations which are
very much like an implication reduction in the sense of Prawitz. Indeed,
I have explained the semantical role of this syntactical transformation.
The place where it belongs in the meaning theory is precisely in the
semantical explanation, or justification, of the elimination rule for im-
plication. Similarly, the reduction rules for the other logical constants
serve to explain the elimination rules associated with those constants.
The key to seeing the relationship between the reduction rules and
the semantical explanations of the elimination rules is this: to verify
a proposition by putting a proof of yours that it is true into practice
corresponds to reducing a natural deduction to introductory form and
deleting the last inference. This takes for granted, as is in fact the
case, that an introduction is an inference in which you conclude, from
the possession of a verification of a proposition, that you know how to
verify it. In particular, verifying a proposition B by means of a proof
that B is true

     ⋮             ⋮
A ⊃ B true     A true
──────────────────────
       B true
which ends with an application of modus ponens, corresponds to re-
ducing the proof of the left premise to introductory form
 (A true)
    ⋮
  B true
──────────         ⋮
A ⊃ B true     A true
──────────────────────
       B true
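If one reads, with Curry and Howard, propositions as types and proofs as programs — a gloss which is not part of the lecture, but which fits the identification of a proof with knowledge how to verify — the transformation just described is ordinary β-reduction. A minimal Haskell sketch, with all names my own:

  -- A verification of A ⊃ B is a function taking proofs of A to proofs of B.
  type Implies a b = a -> b

  -- ⊃-elimination (modus ponens) is function application.
  modusPonens :: Implies a b -> a -> b
  modusPonens f a = f a

  -- Reducing the left premise to introductory form yields a lambda, and
  -- the contraction of the application is beta-reduction:
  example :: Int
  example = modusPonens (\x -> x + 1) 41   -- ((\x -> x+1) 41) reduces to 42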
Thus, in this formulation, there are two rules and not only one. Also,
it is still presupposed, of course, that A and B are propositions.
Explanation. It suffices for me to explain one of the rules, say the
first, because the explanation of the other is completely analogous. To
this end, assume that you know the premise, and I shall explain to you
the conclusion, which is to say that I shall explain how to verify A.
This is how you do it. First use your knowledge of the premise to get a
verification of A & B. By the meaning of conjunction, just explained,
that verification consists of a proof that A is true as well as a proof
that B is true,

   ⋮             ⋮
 A true   and  B true
Now select the first of these two proofs. By the definitions of the
notions of proof and truth, that proof is knowledge how to verify A.
So, putting it into practice, you end up with a verification of A. This
finishes the explanations of the rules of conjunction.
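Under the same propositions-as-types gloss as before (again, my addition rather than the lecture's), a verification of A & B is a pair and the two elimination rules are the projections; a small Haskell sketch:

  -- A verification of A & B is a proof of A together with a proof of B.
  type And a b = (a, b)

  -- &-introduction packages the two proofs into a pair.
  andIntro :: a -> b -> And a b
  andIntro x y = (x, y)

  -- The two &-elimination rules select one of the component proofs,
  -- exactly as in the explanation above.
  andElimFst :: And a b -> a
  andElimFst = fst

  andElimSnd :: And a b -> b
  andElimSnd = snd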
The next logical operation to be treated is disjunction. And, as
always, the formation rule must be explained first.
∨-formation.
A prop B prop
A ∨ B prop
Explanation. To justify it, assume that you know the premises, that
is, that you know what it is to verify A as well as what it is to verify
B. On that assumption, I explain to you what proposition A ∨ B is by
saying that a verification of A ∨ B is either a proof that A is true or a
proof that B is true,

   ⋮            ⋮
 A true   or  B true
Thus, in the wording of the Kolmogorov interpretation, a solution to
the problem A ∨ B is either a method of solving the problem A or a
method of solving the problem B.
∨-introduction.
  A true          B true
──────────      ──────────
A ∨ B true      A ∨ B true
In both of these rules, the premises of the formation rule, which say
that A and B are propositions, are still in force.
Explanation. Assume that you know the premise of the first rule
of disjunction introduction, that is, that you have proved, or possess a
proof of, the judgement that A is true. By the definition of disjunction,
this proof is a verification of the proposition A ∨ B. Hence, by the
principle that, if something has been done, then it can be done, you
certainly can, or know how to, verify the proposition A ∨ B. And it is
this knowledge which you express by judging the conclusion of the rule,
that is, by judging the proposition A ∨ B to be true. The explanation
of the second rule of disjunction introduction is entirely similar.
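As a propositions-as-types gloss (mine, not the lecture's): a verification of A ∨ B is a tagged proof of one of the disjuncts, which is Haskell's Either; the two introduction rules are the two injections, and the elimination rule treated next is case analysis over the tag. A sketch:

  -- A verification of A ∨ B is either a proof of A or a proof of B.
  type Or a b = Either a b

  -- The two ∨-introduction rules are the injections Left and Right.
  orIntroL :: a -> Or a b
  orIntroL = Left

  orIntroR :: b -> Or a b
  orIntroR = Right

  -- ∨-elimination, explained below, is case analysis:
  orElim :: (a -> c) -> (b -> c) -> Or a b -> c
  orElim f g (Left x)  = f x
  orElim f g (Right y) = g y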
∨-elimination.
              (A true)    (B true)
A ∨ B true     C true      C true
──────────────────────────────────
             C true
Here it is presupposed, not only that A and B are propositions, but also
that C is a proposition provided that A ∨ B is true. Observe that, in
this formulation of the rule of disjunction elimination, C is presupposed
to be a proposition, not outright, but merely on the hypothesis that
A ∨ B is true. Otherwise, it is just like the Gentzen rule.
Explanation. Assume that you know, or have proved, the premises.
By the definition of truth, your knowledge of the first premise is knowl-
edge how to verify the proposition A ∨ B. Put that knowledge of yours
into practice. By the definition of disjunction, you then end up either
with a proof that A is true or with a proof that B is true,
   ⋮            ⋮
 A true   or  B true
In the first case, join the proof that A is true to the proof that you
already possess of the second premise, which is a hypothetical proof
that C is true under the hypothesis that A is true,
A true
  ⋮
C true
where such and such is something of which you know, that is, are
certain, that it cannot be done.
Observe that the justification of the elimination rule for falsehood
only rests on the knowledge that ⊥ is false. Thus, if A is a proposition,
not necessarily ⊥, and C is a proposition provided that A is true, then
the inference
A true
C true
is valid as soon as A is false. Choosing C to be ⊥, we can conclude, by
implication introduction, that A ⊃ ⊥ is true provided that A is false.
Conversely, if A ⊃ ⊥ is true and A is true, then, by modus ponens, ⊥
would be true, which it is not. Hence A is false if A ⊃ ⊥ is true. These
two facts together justify the nominal definition of ∼A, the negation of
A, as A ⊃ ⊥, which is commonly made in intuitionistic logic. However,
the fact that A is false if and only if ∼A is true should not tempt one
to define the notion of denial by saying that
A is false
means that
∼A is true.
That the proposition A is false still means that it is impossible to verify
A, and this is a notion which cannot be reduced to the notions of nega-
tion, negation of propositions, that is, and truth. Denial comes before
negation in the order of conceptual priority, just as logical consequence
comes before implication, and the kind of generality which a judgement
may have comes before universal quantification.
As has been implicit in what I have just said,
A is false = A is not true = A is not verifiable
= A cannot be verified.
Moreover, in the course of justifying the rule of falsehood elimination,
I proved that ⊥ is false, that is, that ⊥ is not true. Now, remember
that, in the very beginning of this lecture, we convinced ourselves that a
proposition is true if and only if the judgement that it is true is provable.
Hence, negating both members, a proposition is false if and only if the
judgement that it is true cannot be proved, that is, is unprovable. Using
this in one direction, we can conclude, from the already established
falsity of ⊥, that the judgement that ⊥ is true is unprovable. This is,
if you want, an absolute consistency proof: it is a proof of consistency
with respect to the unlimited notion of provability, or knowability, that
pervades these lectures. And
(⊥ is true) is unprovable
is the judgement which expresses the absolute consistency, if I may call
it so. By my chain of explanations, I hope that I have succeeded in
making it evident.
The absolute consistency brings with it as a consequence the rel-
ative consistency of any system of correct, or valid, inference rules.
Suppose namely that you have a certain formal system, a system of
inference rules, and that you have a formal proof in that system of the
judgement that ⊥ is true. Because of the absolute consistency, that is,
the unprovability of the judgement that ⊥ is true, that formal proof, al-
though formally correct, is no proof, not a real proof, that is. How can
that come about? Since a formal proof is a chain of formally immediate
inferences, that is, instances of the inference rules of the system, that
can only come about as a result of there being some rule of inference
which is incorrect. Thus, if you have a formal system, and you have
convinced yourself of the correctness of the inference rules that belong
to it, then you are sure that the judgement that ⊥ is true cannot be
proved in the system. This means that the consistency problem is real-
ly the problem of the correctness of the rules of inference, and that, at
some stage or another, you cannot avoid having to convince yourself
of their correctness. Of course if you take any old formal system, it
may be that you can carry out a metamathematical consistency proof
for it, but that consistency proof will rely on the intuitive correctness
of the principles of reasoning that you use in that proof, which means
that you are nevertheless relying on the correctness of certain forms
of inference. Thus the consistency problem is really the problem of
the correctness of the rules of inference that you follow, consciously or
unconsciously, in your reasoning.
After this digression on consistency, we must return to the seman-
tical explanations of the rules of inference. The ones that remain are
the quantifier rules.
∀-formation.
A(x) prop
(∀x)A(x) prop
Explanation. The premise of this rule is a judgement which has
generality in it. If I were to make it explicit, I would have to write it
|x A(x) prop.
So now you have acquired a proof that A(a) is true. By the definitions
of the notions of proof and truth, this proof is knowledge how to verify
the proposition A(a). Thus, putting it into practice, you end up with
a verification of A(a), as required.
∃-formation.
A(x) prop
(∃x)A(x) prop
Explanation. Just as in the formation rule associated with the uni-
versal quantifier, the premise of this rule is really the general judgement
|x A(x) prop.

Assuming it known, a verification of the proposition (∃x)A(x) consists
of an expression a of the same arity as the variable x together with a
proof

  ⋮
A(a) true

showing that the proposition A(a) is true. Observe that the knowledge
of the premise is needed in order to guarantee that A(a) is a proposition,
so that it makes sense to talk about a proof that A(a) is true. In
the Kolmogorov interpretation, (∃x)A(x) would be explained as the
problem of finding an expression a, of the same arity as the variable x,
and a method of solving the problem A(a).
∃-introduction.
A(a) true
(∃x)A(x) true
Here, as usual, the premise of the formation rule is still in force, which
is to say that A(x) is assumed to be a proposition for arbitrary x.
Explanation. Assume that you know the premise, that is, assume
that you possess a proof that A(a) is true,
  ⋮
A(a) true
By the preceding explanation of the meaning of the existential quanti-
fier, the expression a together with this proof make up a verification of
the proposition (∃x)A(x). And, possessing a verification of the propo-
sition (∃x)A(x), you certainly know how to verify it, which is what you
must know in order to have the right to conclude that (∃x)A(x) is true.
Like in my explanations of all the other introduction rules, I have here
taken for granted the principle that, if something has been done, then
it can be done.
∃-elimination.
               (A(x) true)
(∃x)A(x) true     C true
──────────────────────────
          C true
Here it is presupposed, not only that A(x) is a proposition for arbitrary
x, like in the introduction rule, but also that C is a proposition provided
that the proposition (∃x)A(x) is true.
Explanation. First of all, in order to make it look familiar, I have
written the second premise in Gentzen’s notation
(A(x) true)
C true
rather than in the notation
A(x) true |x C true,
but there is no difference whatever in sense. Thus the second premise
is really a hypothetico-general judgement. Now, assume that you know
the premises. By the definition of the notion of truth, your knowledge of
the first premise is knowledge how to verify the proposition (∃x)A(x).
Put that knowledge of yours into practice. You then end up with
a verification of the proposition (∃x)A(x). By the definition of the
existential quantifier, this verification consists of an expression a of the
same arity as the variable x and a proof that the proposition A(a) is
true,

  ⋮
A(a) true
Now use your knowledge, or proof, of the second premise. Because of
the meaning of a hypothetico-general judgement, this proof
A(x) true
   ⋮
C true
is a free variable proof that C is true from the hypothesis that A(x)
is true. Being a free variable proof means that you may substitute
anything you want, in particular, the expression a, for the variable x.
You then get a hypothetical proof
A(a) true
   ⋮
C true
that C is true from the hypothesis that A(a) is true. Supplementing this
hypothetical proof with the proof that A(a) is true that you obtained
as a result of putting your knowledge of the first premise into practice,
you get a proof

  ⋮
A(a) true
  ⋮
C true
that C is true, and this proof is nothing but knowledge how to verify
the proposition C. Thus, putting it into practice, you end up having
verified the proposition C, as required.
The promise of the title of these lectures, On the Meanings of the
Logical Constants and the Justifications of the Logical Laws, has now
been fulfilled. As you have seen, the explanations of the meanings of
the logical constants are precisely the explanations belonging to the
formation rules. And the justifications of the logical laws are the ex-
planations belonging to the introduction and elimination rules, which
are the rules that we normally call rules of inference. For lack of time,
I have only been able to deal with the pure logic in my semantical ex-
planations. To develop some interesting parts of mathematics, you also
need axioms for ordinary inductive definitions, in particular, axioms of
computation and axioms for the natural numbers. And, if you need
predicates defined by transfinite, or generalized, induction, then you
will have to add the appropriate formation, introduction, and elimina-
tion rules for them.
I have already explained how you see the consistency of a formal
system of correct inference rules, that is, the impossibility of construct-
ing a proof

  ⋮
⊥ true
that falsehood is true which proceeds according to those rules, not by
studying metamathematically the proof figures divested of all sense, as
was Hilbert’s program, but by doing just the opposite: not divesting
them of sense, but endowing them with sense. Similarly, suppose that
you have a proof

  ⋮
A true
that a proposition A is true which depends, neither on any assumptions,
nor on any free variables. By the definition of truth and the identifica-
tion of proof and knowledge, such a proof is nothing but knowledge how
to verify the proposition A. And, as I remarked earlier in this lecture,
verifying the proposition A by putting that knowledge into practice is
the same as reducing the proof to introductory form and deleting the
last, introductory inference. Moreover, the way of reducing the proof
which corresponds to the semantical explanations, notably of the elim-
ination rules, is precisely the way that I utilized for the first time in
my paper on iterated inductive definitions in the Proceedings of the
Second Scandinavian Logic Symposium, although merely because of
its naturalness, not for any genuine semantical reasons, at that time.
But no longer do we need to prove anything, that is, no longer do we
need to prove metamathematically that the proof figures, divested of
sense, reduce to introductory form. Instead of proving it, we endow
the proof figures with sense, and then we see it! Thus the definition
of convertibility, or computability, and the proof of normalization have
been transposed into genuine semantical explanations which allow you
to see this, just as you can see consistency semantically. And this is
the point that I had intended to reach in these lectures.
Postscript, Feb. 1996
Department of Mathematics
University of Stockholm
Sweden
Notions of computation and monads
Eugenio Moggi∗
Abstract
The λ-calculus is considered a useful mathematical tool in the study of programming
languages, since programs can be identified with λ-terms. However, if one goes further and
uses βη-conversion to prove equivalence of programs, then a gross simplification is introduced
(programs are identified with total functions from values to values), which may jeopardise the
applicability of theoretical results. In this paper we introduce calculi based on a categorical
semantics for computations, which provide a correct basis for proving equivalence of programs,
for a wide range of notions of computation.
Introduction
This paper is about logics for reasoning about programs, in particular for proving equivalence of
programs. Following a consolidated tradition in theoretical computer science we identify programs
with the closed λ-terms, possibly containing extra constants, corresponding to some features of
the programming language under consideration. There are three semantic-based approaches to
proving equivalence of programs:
• The operational approach starts from an operational semantics, e.g. a partial function
mapping every program (i.e. closed term) to its resulting value (if any), which induces a
congruence relation on open terms called operational equivalence (see e.g. [Plo75]). Then
the problem is to prove that two terms are operationally equivalent.
• The denotational approach gives an interpretation of the (programming) language in a
mathematical structure, the intended model. Then the problem is to prove that two terms
denote the same object in the intended model.
• The logical approach gives a class of possible models for the (programming) language.
Then the problem is to prove that two terms denote the same object in all possible models.
The operational and denotational approaches give only a theory: the operational equivalence ≈
or the set Th of formulas valid in the intended model respectively. On the other hand, the logical
approach gives a consequence relation ⊢, namely Ax ⊢ A iff the formula A is true in all models
of the set of formulas Ax, which can deal with different programming languages (e.g. functional,
imperative, non-deterministic) in a rather uniform way, by simply changing the set of axioms
Ax, and possibly extending the language with new constants. Moreover, the relation ⊢ is often
semidecidable, so it is possible to give a sound and complete formal system for it, while Th and ≈
are semidecidable only in oversimplified cases.
We do not take as a starting point for proving equivalence of programs the theory of βη-
conversion, which identifies the denotation of a program (procedure) of type A → B with a
total function from A to B, since this identification wipes out completely behaviours like non-
termination, non-determinism or side-effects, that can be exhibited by real programs. Instead, we
proceed as follows:
1. We take category theory as a general theory of functions and develop on top of it a categorical
semantics of computations based on monads.
∗ Research partially supported by EEC Joint Collaboration Contract # ST2J-0374-C(EDB).
2. We consider simple formal systems matching the categorical semantics of computation.
3. We extend stepwise categorical semantics and formal system in order to interpret richer
languages, in particular the λ-calculus.
4. We show that w.l.o.g. one may consider only (monads over) toposes, and we exploit this fact
to establish conservative extension results.
The methodology outlined above is inspired by [Sco80]1 , and it is followed in [Ros86, Mog86] to
obtain the λp-calculus. The view that “category theory comes, logically, before the λ-calculus” led
us to consider a categorical semantics of computations first, rather than to modify directly the
rules of βη-conversion to get a correct calculus.
Related work
The operational approach to finding correct λ-calculi w.r.t. an operational equivalence was first
considered in [Plo75] for call-by-value and call-by-name operational equivalence. This approach
was later extended, following a similar methodology, to consider other features of computations like
nondeterminism (see [Sha84]), side-effects and continuations (see [FFKD86, FF89]). The calculi
based only on operational considerations, like the λv -calculus, are sound and complete w.r.t. the
operational semantics, i.e. a program M has a value according to the operational semantics iff it
is provably equivalent to a value (not necessarily the same) in the calculus, but they are too weak
for proving equivalences of programs.
Previous work on axiom systems for proving equivalence of programs with side effects has
shown the importance of the let-constructor (see [Mas88, MT89a, MT89b]). In the framework of
the computational lambda-calculus the importance of let becomes even more apparent.
The denotational approach may suggest important principles, e.g. fix-point induction (see
[Sco69, GMW79]), that can be found only after developing a semantics based on mathematical
structures rather than term models, but it does not give clear criteria to single out the general
principles among the properties satisfied by the model. Moreover, the theory at the heart of De-
notational Semantics, i.e. Domain Theory (see [GS89, Mos89]), has focused on the mathematical
structures for giving semantics to recursive definitions of types and functions (see [SP82]), while
other structures, that might be relevant to a better understanding of programming languages, have
been overlooked. This paper identifies one such structure, i.e. monads, but probably there are
others just waiting to be discovered.
The categorical semantics of computations presented in this paper has been strongly influenced
by the reformulation of Denotational Semantics based on the category of cpos, possibly without
bottom, and partial continuous functions (see [Plo85]) and the work on categories of partial mor-
phisms in [Ros86, Mog86]. Our work generalises the categorical account of partiality to other
notions of computations, indeed partial cartesian closed categories turn out to be a special case of
λc -models (see Definition 3.9).
A type theoretic approach to partial functions and computations is proposed in [CS87, CS88]
by introducing a type-constructor Ā, whose intuitive meaning is the set of computations of type
A. Our categorical semantics is based on a similar idea. Constable and Smith, however, do not
adequately capture the general axioms for computations (as we do), since their notion of model,
based on an untyped partial applicative structure, accounts only for partial computations.
1 “I am trying to find out where λ-calculus should come from, and the fact that the notion of a cartesian closed
category is a late developing one (Eilenberg & Kelly (1966)), is not relevant to the argument: I shall try to explain
in my own words in the next section why we should look to it first”.
computations (of type A), and take as denotations of programs (of type A) the elements of T A.
In particular, we identify the type A with the object of values (of type A) and obtain the object
of computations (of type A) by applying a unary type-constructor T to A. We call T a notion
of computation, since it abstracts away from the type of values computations may produce. There
are many choices for T A corresponding to different notions of computations.
Example 1.1 We give a few notions of computation in the category of sets.
• partiality T A = A⊥ (i.e. A + {⊥}), where ⊥ is the diverging computation
• nondeterminism T A = Pfin(A)
• side-effects T A = (A × S)^S, where S is a set of states, e.g. a set U^L of stores or a set of
  input/output sequences U*
• exceptions T A = (A + E), where E is the set of exceptions
• continuations T A = R^(R^A), where R is the set of results
• interactive input T A = (µγ.A + γ^U), where U is the set of characters.
  More explicitly T A is the set of U-branching trees with finite branches and A-labelled leaves
• interactive output T A = (µγ.A + (U × γ)).
  More explicitly T A is (isomorphic to) U* × A.
Further examples (in a category of cpos) could be given based on the denotational semantics for
various programming languages (see [Sch86, GS89, Mos89]).
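For readers who prefer a programming-language rendering, these notions of computation correspond to familiar type constructors. A hedged Haskell transcription (the language choice and all names are mine, not the paper's):

  import qualified Data.Set as Set   -- finite sets, from the containers package

  type Partial a      = Maybe a        -- partiality: a value or divergence
  type Nondet a       = Set.Set a      -- nondeterminism: a finite set of results
  type SideEffect s a = s -> (a, s)    -- side-effects: state-passing functions
  type Exceptions e a = Either e a     -- exceptions: a value or an exception
  type Cont r a       = (a -> r) -> r  -- continuations: r^(r^a)

  -- interactive input: u-branching trees with a-labelled leaves
  data Input u a = Leaf a | Read (u -> Input u a)

  -- interactive output: the output produced, paired with the result
  type Output u a = ([u], a)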
Rather than focusing on a specific T, we want to find the general properties common to all notions
of computation, therefore we impose as the only requirement that programs should form a category.
The aim of this section is to convince the reader, with a sequence of informal argumentations, that
such a requirement amounts to saying that T is part of a Kleisli triple (T, η, ∗) and that the category
of programs is the Kleisli category for such a triple.
Definition 1.2 ([Man76]) A Kleisli triple over a category C is a triple (T, η, ∗), where T: Obj(C) →
Obj(C), ηA: A → T A for A ∈ Obj(C), f∗: T A → T B for f: A → T B and the following equations
hold:
• ηA∗ = idT A
• ηA; f∗ = f for f: A → T B
• f∗; g∗ = (f; g∗)∗ for f: A → T B and g: B → T C.
A Kleisli triple satisfies the mono requirement provided ηA is mono for A ∈ C.
Intuitively ηA is the inclusion of values into computations (in several cases ηA is indeed a mono) and
f ∗ is the extension of a function f from values to computations to a function from computations
to computations, which first evaluates a computation and then applies f to the resulting value. In
summary
ηA:  a: A ↦ [a]: T A
f:   a: A ↦ f(a): T B
f∗:  c: T A ↦ (let x⇐c in f(x)): T B
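In Haskell terms (my gloss, not the paper's), a Kleisli triple is exactly what the Monad class packages: ηA is return and f∗ is the extension of f along a computation. A minimal sketch with a hypothetical class of my own, mirroring Definition 1.2:

  -- A Kleisli triple as a Haskell type class (a hypothetical rendering;
  -- Haskell's actual Monad class is an equivalent packaging).
  class KleisliTriple t where
    eta :: a -> t a                    -- eta_A : A -> T A
    ext :: (a -> t b) -> t a -> t b    -- f*    : T A -> T B

  -- The three equations of Definition 1.2, as laws instances must obey
  -- (not checked by the compiler):
  --   ext eta       = id               -- eta_A* = id_TA
  --   ext f . eta   = f                -- eta_A ; f* = f
  --   ext g . ext f = ext (ext g . f)  -- f* ; g* = (f ; g*)*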
In order to justify the axioms for a Kleisli triple we have first to introduce a category C T whose
morphisms correspond to programs. We proceed by analogy with the categorical semantics for
terms, where types are interpreted by objects and terms of type B with a parameter (free variable)
of type A are interpreted by morphisms from A to B. Since the denotations of programs of type B
are supposed to be elements of T B, programs of type B with a parameter of type A ought to be
interpreted by morphisms with codomain T B, but for their domain there are two alternatives, either
A or T A, depending on whether parameters of type A are identified with values or computations
of type A. We choose the first alternative, because it entails the second. Indeed computations
of type A are the same as values of type T A. So we take CT (A, B) to be C(A, T B). It remains
to define composition and identities in CT (and show that they satisfy the unit and associativity
axioms for categories).
Definition 1.3 Given a Kleisli triple (T, η, ∗) over C, the Kleisli category CT is defined as
follows:
• the objects of CT are those of C
• the set CT (A, B) of morphisms from A to B in CT is C(A, T B)
• the identity on A in CT is ηA : A → T A
• f ∈ CT (A, B) followed by g ∈ CT (B, C) in CT is f ; g ∗ : A → T C.
It is natural to take ηA as the identity on A in the category CT , since it maps a parameter x to [x],
i.e. to x viewed as a computation. Similarly composition in CT has a simple explanation in terms
of the intuitive meaning of f ∗ , in fact
f:      x: A ↦ f(x): T B        g:      y: B ↦ g(y): T C
f; g∗:  x: A ↦ (let y⇐f(x) in g(y)): T C
i.e. f followed by g in CT with parameter x is the program which first evaluates the program
f(x) and then feeds the resulting value as parameter to g. At this point we can also give a simple
justification for the three axioms of Kleisli triples, namely they are equivalent to the unit and
associativity axioms for CT:
• f; ηB∗ = f for f: A → T B
• ηA; f∗ = f for f: A → T B
• (f; g∗); h∗ = f; (g; h∗)∗ for f: A → T B, g: B → T C and h: C → T D.
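Composition in CT is what functional programmers call Kleisli composition; with the class sketched above it reads:

  -- f followed by g in the Kleisli category is f ; g*.
  kleisli :: KleisliTriple t => (a -> t b) -> (b -> t c) -> a -> t c
  kleisli f g = ext g . f

  -- The three axioms are precisely unit and associativity for kleisli:
  --   kleisli f eta = f
  --   kleisli eta f = f
  --   kleisli (kleisli f g) h = kleisli f (kleisli g h)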
Example 1.4 We go through the notions of computation given in Example 1.1 and show that they
are indeed part of suitable Kleisli triples.
• partiality T A = A⊥ (= A + {⊥})
  ηA is the inclusion of A into A⊥
  if f: A → T B, then f∗(⊥) = ⊥ and f∗(a) = f(a) (when a ∈ A)
• nondeterminism T A = Pfin(A)
  ηA is the singleton map a ↦ {a}
  if f: A → T B and c ∈ T A, then f∗(c) = ∪x∈c f(x)
• side-effects T A = (A × S)^S
  ηA is the map a ↦ (λs: S.⟨a, s⟩)
  if f: A → T B and c ∈ T A, then f∗(c) = λs: S.(let ⟨a, s′⟩ = c(s) in f(a)(s′))
• exceptions T A = (A + E)
  ηA is the injection map a ↦ inl(a)
  if f: A → T B, then f∗(inr(e)) = inr(e) (when e ∈ E) and f∗(inl(a)) = f(a) (when a ∈ A)
• continuations T A = R^(R^A)
  ηA is the map a ↦ (λk: R^A.k(a))
  if f: A → T B and c ∈ T A, then f∗(c) = (λk: R^B.c(λa: A.f(a)(k)))
• interactive input T A = (µγ.A + γ^U)
  ηA maps a to the tree consisting only of one leaf labelled with a
  if f: A → T B and c ∈ T A, then f∗(c) is the tree obtained by replacing leaves of c labelled
  by a with the tree f(a)
• interactive output T A = (µγ.A + (U × γ))
  ηA is the map a ↦ ⟨ε, a⟩, where ε is the empty output sequence
  if f: A → T B, then f∗(⟨s, a⟩) = ⟨s ∗ s′, b⟩, where f(a) = ⟨s′, b⟩ and s ∗ s′ is the concatenation
  of s followed by s′.
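Two of these triples transcribed into the class sketched earlier (again my rendering; the remaining cases go the same way):

  -- partiality: T a = Maybe a
  instance KleisliTriple Maybe where
    eta            = Just
    ext f Nothing  = Nothing    -- f*(bottom) = bottom
    ext f (Just a) = f a        -- f*(a) = f(a)

  -- side-effects: T a = s -> (a, s), wrapped in a newtype so that it can
  -- be declared an instance (the wrapper is mine)
  newtype State s a = State { runState :: s -> (a, s) }

  instance KleisliTriple (State s) where
    eta a           = State (\s -> (a, s))   -- eta_A = \s. <a, s>
    ext f (State c) =
      State (\s -> let (a, s') = c s in runState (f a) s')
      -- evaluate the computation c first, then run f on the value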
Kleisli triples are just an alternative description for monads. Although the former are easier
to justify from a computational perspective, the latter are more widely used in the literature on
Category Theory and have the advantage of being defined only in terms of functors and natural
transformations, which makes them more suitable for abstract manipulation.
Definition 1.5 ([Mac71]) A monad over a category C is a triple (T, η, µ), where T: C → C is
a functor, η: IdC → T and µ: T² → T are natural transformations and the following diagrams
commute, i.e.

µT A; µA = T µA; µA (associativity)        ηT A; µA = idT A = T ηA; µA (unit)
Proposition 1.6 ([Man76]) There is a one-one correspondence between Kleisli triples and mon-
ads.
Proof Given a Kleisli triple (T, η, ∗), the corresponding monad is (T, η, µ), where T is the extension
of the function T to an endofunctor by taking T(f) = (f; ηB)∗ for f: A → B and µA = (idT A)∗.
Conversely, given a monad (T, η, µ), the corresponding Kleisli triple is (T, η, ∗), where T is the
restriction of the functor T to objects and f∗ = (T f); µB for f: A → T B.
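The correspondence is easy to transcribe (a sketch in the notation above, names mine): the functor action and µ come from the extension operation, and conversely.

  -- From a Kleisli triple to a monad: T f = (f ; eta)* and mu = id*.
  tmap :: KleisliTriple t => (a -> b) -> t a -> t b
  tmap f = ext (eta . f)

  mu :: KleisliTriple t => t (t a) -> t a
  mu = ext id

  -- Conversely, from the monad one recovers the extension: f* = T f ; mu.
  extFromMonad :: KleisliTriple t => (a -> t b) -> t a -> t b
  extFromMonad f = mu . tmap f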
Remark 1.7 In general the categorical semantics of partial maps, based on a category C equipped
with a dominion M (see [Ros86]), cannot be reformulated in terms of a Kleisli triple over C
satisfying some additional properties, unless C has lifting, i.e. the inclusion functor from C into the
category of partial maps P(C, M) has a right adjoint ⊥ characterised by the natural isomorphism
C(A, B⊥) ≅ P(C, M)(A, B)
Remark 2.1 We regard the metalanguage as more fundamental. In fact, its models are more
general, as they don’t have to satisfy the mono requirement, and the interpretation of programs (of
some given programming language) can be defined simply by translation into (a suitable extension
of) the metalanguage. It should be pointed out that the mono requirement cannot be axiomatised
in the metalanguage, as we would need conditional equations [x]T = [y]T → x = y, and that
existence assertions cannot be translated into formulas of the metalanguage, as we would need
existentially quantified formulas (e◦ ↓σ) ≡ (∃!x: σ.e◦ = [x]T).
In Section 2.3 we will explain once and for all the correspondence between theories of a simple
programming language and categories with a monad satisfying the mono requirement. For other
programming languages we will give only their translation in a suitable extension of the metalan-
guage. In this way, issues like call-by-value versus call-by-name affect the translation, but not the
metalanguage.
In Categorical Logic it is common practice to identify a theory T with a category F(T ) with
additional structure such that there is a one-one correspondence between models of T in a category
C with additional structure and structure preserving functors from F(T ) to C (see [KR77]) 3 . This
identification was originally proposed by Lawvere, who also showed that algebraic theories can be
viewed as categories with finite products.
In Section 2.2 we give a class of theories that can be viewed as categories with a monad, so that
any category with a monad is, up to equivalence (of categories with a monad), one such theory.
Such a reformulation in terms of theories is more suitable for formal manipulation and more
appealing to those unfamiliar with Category Theory. However, there are other advantages in having
an alternative presentation of monads. For instance, natural extensions of the syntax may suggest
extensions of the categorical structure that may not be easy to motivate and justify otherwise
(we will exploit this in Section 3). In Section 2.3 we take a programming language perspective
and establish a correspondence between theories (with equivalence and existence assertions) for a
simple programming language and categories with a monad satisfying the mono requirement, i.e.
ηA mono for every A.
As a starting point we take many sorted monadic equational logic, because it is more primitive
than many sorted equational logic, indeed monadic theories are equivalent to categories without
any additional structure.
A      ──────────  (A base type)
       ⊢ A type

var    ⊢ A type
       ───────────
       x: A ⊢ x: A

f      x: A ⊢ e1: A1
       ────────────────  (f: A1 → A2)
       x: A ⊢ f(e1): A2

eq     x: A1 ⊢ e1: A2    x: A1 ⊢ e2: A2
       ─────────────────────────────────
       x: A1 ⊢ e1 =A2 e2
equivalence between the category of theories and translations and the category of small categories with additional
structure and structure preserving functors. In the case of typed λ-calculus, for instance, such an equivalence
between λ-theories and cartesian closed categories requires a modification in the definition of λ-theory, which allows
not only equations between λ-terms but also equations between type expressions.
RULE   SYNTAX                          SEMANTICS

A      ⊢ A type                        = [[A]]

var    ⊢ A type                        = c
       x: A ⊢ x: A                     = idc

f      x: A ⊢ e1: A1                   = g          (f: A1 → A2)
       x: A ⊢ f(e1): A2                = g; [[f]]

eq     x: A1 ⊢ e1: A2                  = g1
       x: A1 ⊢ e2: A2                  = g2
       x: A1 ⊢ e1 =A2 e2               ⇐⇒ g1 = g2
Remark 2.2 Terms of (many sorted) monadic equational logic have exactly one free variable (the
one declared in the context) which occurs exactly once, and equations are between terms with the
same free variable.
An interpretation [[ ]] of the language in a category C is parametric in an interpretation of the
symbols in the signature and is defined by induction on the derivation of well-formedness for
(types,) terms and equations (see Table 1) according to the following general pattern:
• the interpretation [[A]] of a base type A is an object of C
• the interpretation [[f]] of an unary function f: A1 → A2 is a morphism from [[A1 ]] to [[A2 ]] in
C; similarly for the interpretation of a term x: A1 ` e: A2
• the interpretation of an assertion x: A ` φ (in this case just an equation) is either true or
false.
Remark 2.3 The interpretation of equations is standard. However, if one wants to consider more
complex assertions, e.g. formulas of first order logic, then they should be interpreted by subobjects;
in particular equality = : A should be interpreted by the diagonal ∆[[A]] .
The formal consequence relation on the set of equations is generated by the inference rules for
equivalences ((refl), (symm) and (trans)), congruence and substitutivity (see Table 2). This formal
consequence relation is sound and complete w.r.t. interpretation of the language in categories, i.e.
an equation is formally derivable from a set of equational axioms if and only if all the interpretations
satisfying the axioms satisfy the equation. Soundness follows from the admissibility of the inference
rules in any interpretation, while completeness follows from the fact that any theory T (i.e. a set
of equations closed w.r.t. the inference rules) is the set of equations satisfied by the canonical
interpretation in the category F(T ), i.e. T viewed as a category.
Definition 2.4 Given a monadic equational theory T , the category F(T ) is defined as follows:
• objects are (base) types A,
• morphisms from A1 to A2 are equivalence classes [x: A1 ⊢ e: A2]T of terms w.r.t. the equiv-
alence relation induced by the theory T, i.e.
refl    x: A ⊢ e: A1
        ──────────────
        x: A ⊢ e =A1 e

symm    x: A ⊢ e1 =A1 e2
        ─────────────────
        x: A ⊢ e2 =A1 e1

trans   x: A ⊢ e1 =A1 e2    x: A ⊢ e2 =A1 e3
        ─────────────────────────────────────
        x: A ⊢ e1 =A1 e3

congr   x: A ⊢ e1 =A1 e2
        ────────────────────────  (f: A1 → A2)
        x: A ⊢ f(e1) =A2 f(e2)

subst   x: A ⊢ e: A1    x: A1 ⊢ φ
        ───────────────────────────
        x: A ⊢ [e/x]φ
RULE    SYNTAX                                      SEMANTICS

A       ⊢ml A type                                  = [[A]]

T       ⊢ml τ type                                  = c
        ⊢ml T τ type                                = T c

var     ⊢ml τ type                                  = c
        x: τ ⊢ml x: τ                               = idc

f       x: τ ⊢ml e1: τ1                             = g          (f: τ1 → τ2)
        x: τ ⊢ml f(e1): τ2                          = g; [[f]]

[ ]T    x: τ ⊢ml e: τ′                              = g
        x: τ ⊢ml [e]T: T τ′                         = g; η[[τ′]]

let     x: τ ⊢ml e1: T τ1                           = g1
        x1: τ1 ⊢ml e2: T τ2                         = g2
        x: τ ⊢ml (letT x1⇐e1 in e2): T τ2           = g1; g2∗

eq      x: τ1 ⊢ml e1: τ2                            = g1
        x: τ1 ⊢ml e2: τ2                            = g2
        x: τ1 ⊢ml e1 =τ2 e2                         ⇐⇒ g1 = g2

[ ].ξ   x: τ ⊢ml e1 =τ1 e2
        ──────────────────────────
        x: τ ⊢ml [e1]T =T τ1 [e2]T
Proof We have to show that the three axioms for Kleisli triples are valid. The validity of each
axiom amounts to the derivability of an equation. For instance, ητ∗ = idT τ is valid provided
x′: T τ ⊢ml (letT x⇐x′ in [x]T) =T τ x′ is derivable, indeed it follows from (T.η). The reader can
check that the equations corresponding to the axioms ητ; f∗ = f and f∗; g∗ = (f; g∗)∗ follow from
(T.β) and (ass) respectively.
Remark 2.6 The let-constructor plays a fundamental role: operationally it corresponds to sequen-
tial evaluation of programs and categorically it corresponds to composition in the Kleisli category
CT (while substitution corresponds to composition in C). In the λv-calculus (let x⇐e in e′) is treated
as syntactic sugar for (λx.e′)e. We think that this is not the right way to proceed, because it ex-
plains the let-constructor (i.e. sequential evaluation of programs) in terms of constructors available
only in functional languages. On the other hand, (let x⇐e in e′) cannot be treated as syntactic
sugar for [e/x]e′ (involving only the more primitive substitution) without collapsing computations
to values.
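In Haskell the let-constructor is the bind that do-notation sequences; a sketch in the notation introduced earlier:

  -- (let x <= e in e') evaluates e, binds its value to x, then runs e'.
  letT :: KleisliTriple t => t a -> (a -> t b) -> t b
  letT e f = ext f e

  -- In the State instance this threads the state left to right:
  tick :: State Int Int
  tick = State (\s -> (s, s + 1))

  twoTicks :: State Int (Int, Int)
  twoTicks = letT tick (\x -> letT tick (\y -> eta (x, y)))
  -- runState twoTicks 0 == ((0, 1), 2)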
The existence predicate e ↓ is inspired by the logic of partial terms/elements (see [Fou77, Sco79,
Mog88]); however, there are important differences, e.g.
strict   x: τ ⊢pl p(e) ↓τ2
         ──────────────────  (p: τ1 ⇀ τ2)
         x: τ ⊢pl e ↓τ1
is admissible for partial computations, but not in general. For certain notions of computation there
may be other predicates on computations worth considering, or the existence predicate itself may
have a more specialised meaning, for instance:
RULE   SYNTAX                                SEMANTICS

A      ⊢pl A type                            = [[A]]

T      ⊢pl τ type                            = c
       ⊢pl T τ type                          = T c

var    ⊢pl τ type                            = c
       x: τ ⊢pl x: τ                         = ηc

p      x: τ ⊢pl e1: τ1                       = g          (p: τ1 ⇀ τ2)
       x: τ ⊢pl p(e1): τ2                    = g; [[p]]∗

[ ]    x: τ ⊢pl e: τ′                        = g
       x: τ ⊢pl [e]: T τ′                    = g; ηT [[τ′]]

µ      x: τ ⊢pl e: T τ′                      = g
       x: τ ⊢pl µ(e): τ′                     = g; µ[[τ′]]

let    x: τ ⊢pl e1: τ1                       = g1
       x1: τ1 ⊢pl e2: τ2                     = g2
       x: τ ⊢pl (let x1⇐e1 in e2): τ2        = g1; g2∗

eq     x: τ1 ⊢pl e1: τ2                      = g1
       x: τ1 ⊢pl e2: τ2                      = g2
       x: τ1 ⊢pl e1 ≡τ2 e2                   ⇐⇒ g1 = g2

ex     x: τ1 ⊢pl e: τ2                       = g
       x: τ1 ⊢pl e ↓τ2                       ⇐⇒ ∃!h: [[τ1]] → [[τ2]] s.t. g = h; η[[τ2]]
refl     x: τ ⊢pl e: τ1
         ─────────────────
         x: τ ⊢pl e ≡τ1 e

symm     x: τ ⊢pl e1 ≡τ1 e2
         ────────────────────
         x: τ ⊢pl e2 ≡τ1 e1

trans    x: τ ⊢pl e1 ≡τ1 e2    x: τ ⊢pl e2 ≡τ1 e3
         ──────────────────────────────────────────
         x: τ ⊢pl e1 ≡τ1 e3

congr    x: τ ⊢pl e1 ≡τ1 e2
         ─────────────────────────  (p: τ1 ⇀ τ2)
         x: τ ⊢pl p(e1) ≡τ2 p(e2)

E.x      ⊢pl τ type
         ──────────────
         x: τ ⊢pl x ↓τ

E.congr  x: τ ⊢pl e1 ≡τ1 e2    x: τ ⊢pl e1 ↓τ1
         ───────────────────────────────────────
         x: τ ⊢pl e2 ↓τ1

subst    x: τ ⊢pl e ↓τ1    x: τ1 ⊢pl φ
         ───────────────────────────────
         x: τ ⊢pl [e/x]φ
Programs can be translated into terms of the metalanguage via a translation ◦ s.t. for every well-
formed program x: τ1 ⊢pl e: τ2 the term x: τ1 ⊢ml e◦: T τ2 is well-formed and [[x: τ1 ⊢pl e: τ2]] =
[[x: τ1 ⊢ml e◦: T τ2]] (the proof of these properties is left to the reader).
Definition 2.7 Given a signature Σ for the programming language, let Σ◦ be the signature for the
metalanguage with the same base types and a function p: τ1 → T τ2 for each command p: τ1 ⇀ τ2
in Σ. The translation ◦ from programs over Σ to terms over Σ◦ is defined by induction on raw
programs:
• x◦ ≜ [x]T
• (let x1⇐e1 in e2)◦ ≜ (letT x1⇐e1◦ in e2◦)
• p(e1)◦ ≜ (letT x⇐e1◦ in p(x))
• [e]◦ ≜ [e◦]T
• µ(e)◦ ≜ (letT x⇐e◦ in x)
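The translation is directly executable; a sketch over a made-up abstract syntax (all datatype and constructor names are mine, and the bound variable "x" is assumed fresh, as in the definition):

  -- Raw programs of the simple programming language (hypothetical syntax).
  data Prog = PVar String              -- x
            | PLet String Prog Prog    -- let x <= e1 in e2
            | PApp String Prog         -- p(e1)
            | PBracket Prog            -- [e]
            | PMu Prog                 -- mu(e)

  -- Terms of the metalanguage (hypothetical syntax).
  data MTerm = MVar String
             | MLet String MTerm MTerm -- let_T x <= e1 in e2
             | MApp String MTerm       -- p(x), with p now a function into T
             | MBracket MTerm          -- [e]_T

  -- Definition 2.7, clause by clause ("x" assumed fresh).
  translate :: Prog -> MTerm
  translate (PVar x)       = MBracket (MVar x)
  translate (PLet x e1 e2) = MLet x (translate e1) (translate e2)
  translate (PApp p e1)    = MLet "x" (translate e1) (MApp p (MVar "x"))
  translate (PBracket e)   = MBracket (translate e)
  translate (PMu e)        = MLet "x" (translate e) (MVar "x")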
The inference rules for deriving equivalence and existence assertions of the simple programming
language can be partitioned as follows:
• general rules (see Table 6) for terms denoting computations, but with variables ranging over
values; these rules replace those of Table 2 for many sorted monadic equational logic
• rules capturing the properties of type- and term-constructors (see Table 7) after interpretation
of the programming language; these rules replace the additional rules for the metalanguage
given in Table 4.
[ ].ξ   x: τ ⊢pl e1 ≡τ1 e2
        ────────────────────────
        x: τ ⊢pl [e1] ≡T τ1 [e2]

E.[ ]   x: τ ⊢pl e1: τ1
        ───────────────────
        x: τ ⊢pl [e1] ↓T τ1

µ.ξ     x: τ ⊢pl e1 ≡T τ1 e2
        ─────────────────────────
        x: τ ⊢pl µ(e1) ≡τ1 µ(e2)

µ.β     x: τ ⊢pl e1: τ1
        ────────────────────────
        x: τ ⊢pl µ([e1]) ≡τ1 e1

µ.η     x: τ ⊢pl e1 ↓T τ1
        ──────────────────────────
        x: τ ⊢pl [µ(e1)] ≡T τ1 e1

let.ξ   x: τ ⊢pl e1 ≡τ1 e2    x′: τ1 ⊢pl e′1 ≡τ2 e′2
        ────────────────────────────────────────────────────
        x: τ ⊢pl (let x′⇐e1 in e′1) ≡τ2 (let x′⇐e2 in e′2)

unit    x: τ ⊢pl e1: τ1
        ──────────────────────────────────────
        x: τ ⊢pl (let x1⇐e1 in x1) ≡τ1 e1

ass     x: τ ⊢pl e1: τ1    x1: τ1 ⊢pl e2: τ2    x2: τ2 ⊢pl e3: τ3
        ──────────────────────────────────────────────────────────────────────────────
        x: τ ⊢pl (let x2⇐(let x1⇐e1 in e2) in e3) ≡τ3 (let x1⇐e1 in (let x2⇐e2 in e3))

let.β   x: τ ⊢pl e1 ↓τ1    x1: τ1 ⊢pl e2: τ2
        ──────────────────────────────────────────
        x: τ ⊢pl (let x1⇐e1 in e2) ≡τ2 [e1/x1]e2

let.p   x: τ ⊢pl e1: τ1
        ──────────────────────────────────────────  (p: τ1 ⇀ τ2)
        x: τ ⊢pl p(e1) ≡τ2 (let x1⇐e1 in p(x1))
Soundness and completeness of the formal consequence relation w.r.t. interpretation of the
simple programming language in categories with a monad satisfying the mono requirement is
established in the usual way (see Section 2.1). The only step which differs is how to view a theory
T of the simple programming language (i.e. a set of equivalence and existence assertions closed
w.r.t. the inference rules) as a category F(T ) with the required structure.
Definition 2.8 Given a theory T of the simple programming language, the category F(T ) is de-
fined as follows:
• objects are types τ ,
• morphisms from τ1 to τ2 are equivalence classes [x: τ1 ⊢pl e: τ2]T of existing programs x: τ1 ⊢pl
e ↓τ2 ∈ T w.r.t. the equivalence relation induced by the theory T, i.e.
Proof We have to show that the three axioms for Kleisli triples are valid. The validity of each axiom
amounts to the derivability of an existence and equivalence assertion. For instance, ητ∗ = idT τ is
valid provided x′: T τ ⊢pl x′ ↓T τ and x′: T τ ⊢pl [(let x⇐µ(x′) in µ([x]))] ≡T τ x′ are derivable. The
existence assertion follows immediately from (E.x), while the equivalence is derived as follows:
• x′: T τ ⊢pl [(let x⇐µ(x′) in µ([x]))] ≡T τ [(let x⇐µ(x′) in x)]
  by (µ.β), (refl) and (let.ξ)
• x′: T τ ⊢pl [(let x⇐µ(x′) in x)] ≡T τ [µ(x′)] by (unit) and (let.ξ)
• x′: T τ ⊢pl [µ(x′)] ≡T τ x′ by (E.x) and (µ.η)
• x′: T τ ⊢pl [(let x⇐µ(x′) in µ([x]))] ≡T τ x′ by (trans).
We leave to the reader the derivation of the existence and equivalence assertions corresponding
to the other axioms for Kleisli triples, and prove instead the mono requirement, i.e. that f1; ητ =
f2; ητ implies f1 = f2. Let fi be [x: τ′ ⊢pl ei: τ]T; we have to derive x: τ′ ⊢pl e1 ≡τ e2 from
x: τ′ ⊢pl [e1] ≡T τ [e2] (and x: τ′ ⊢pl ei ↓τ):
• x: τ′ ⊢pl µ([e1]) ≡τ µ([e2]) by the first assumption and (µ.ξ)
• x: τ′ ⊢pl µ([ei]) ≡τ ei by (µ.β)
• x: τ′ ⊢pl e1 ≡τ e2 by (trans).
Remark 2.10 One can show that the canonical interpretation of a program x: τ1 ⊢pl e: τ2 in the
category F(T) is the morphism [x: τ1 ⊢pl [e]: T τ2]T. This interpretation establishes a one-one
correspondence between morphisms from τ1 to T τ2 in the category F(T), i.e. morphisms from τ1
to τ2 in the Kleisli category, and equivalence classes of programs x: τ1 ⊢pl e: τ2 (not necessarily
existing). The inverse correspondence maps a morphism [x: τ1 ⊢pl e′: T τ2]T to the equivalence class
of x: τ1 ⊢pl µ(e′): τ2. Indeed, x: τ1 ⊢pl e ≡τ2 µ([e]) and x: τ1 ⊢pl e′ ≡T τ2 [µ(e′)] are derivable provided
x: τ1 ⊢pl e′ ↓T τ2.
Remark 3.1 To understand why a category with finite products and a monad is not enough to
interpret the metalanguage (and where the natural transformation t is needed), one has to look at
the interpretation of a let-expression
let   Γ ⊢ml e1: T τ1    Γ, x: τ1 ⊢ml e2: T τ2
      ─────────────────────────────────────────
      Γ ⊢ml (letT x⇐e1 in e2): T τ2
4 If the metalanguage does not have finite products, we conjecture that its theories would no longer correspond to
categories with finite products and a strong monad (even by taking as objects contexts and/or the Karoubi envelope,
used in [Sco80] to associate a cartesian closed category to an untyped λ-theory), but instead to multicategories with
a Kleisli triple. We felt the greater generality (of not having products in the metalanguage) was not worth the
mathematical complications.
Definition 3.2 A strong monad over a category C with (explicitly given) finite products is a
monad (T, η, µ) together with a natural transformation tA,B from A × T B to T (A × B) s.t. the
following diagrams commute, i.e.

t1,A; T rA = rT A
tA×B,C; T αA,B,C = αA,B,T C; (idA × tB,C); tA,B×C
(idA × ηB); tA,B = ηA×B
(idA × µB); tA,B = tA,T B; T tA,B; µA×B
where r and α are the natural isomorphisms
rA : (1 × A) → A , αA,B,C : (A × B) × C → A × (B × C)
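In Haskell every functor carries a canonical strength, because the language is, in the sense of Remark 3.3 below, enriched over itself; a sketch (names mine):

  -- The tensorial strength t_{A,B} : A x T B -> T (A x B).
  strength :: Functor t => (a, t b) -> t (a, b)
  strength (a, tb) = fmap (\b -> (a, b)) tb

  -- On points this necessarily pairs a with the eventual value of the
  -- computation, which is the uniqueness observed in Proposition 3.4.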
Remark 3.3 The diagrams above are taken from [Koc72], where a characterisation of strong mon-
ads is given in terms of C-enriched categories (see [Kel82]). Kock fixes a commutative monoidal
closed category C (in particular a cartesian closed category), and in this setup he establishes a
one-one correspondence between strengths stA,B: B^A → (T B)^(T A) and tensorial strengths tA,B: A ⊗
T B → T (A ⊗ B) for an endofunctor T over C (see Theorem 1.3 in [Koc72]). Intuitively a strength
stA,B internalises the action of T on morphisms from A to B, and more precisely it makes (T, st)
a C-enriched endofunctor on C enriched over itself (i.e. the hom-object C(A, B) is B^A). In this
setting the diagrams of Definition 3.2 have the following meaning:
• the first two diagrams are (1.7) and (1.8) in [Koc72], saying that t is a tensorial strength of
T . So T can be made into a C-enriched endofunctor.
• the last two diagrams say that η: IdC → T and µ: T² → T are C-enriched natural transfor-
mations, where IdC, T and T² are enriched in the obvious way (see Remark 1.4 in [Koc72]).
There is another purely categorical characterisation of strong monads, suggested to us by G.
Plotkin, in terms of C-indexed categories (see [JP78]). Both characterisations are instances of a
general methodological principle for studying programming languages (or logics) categorically (see
[Mog89b]):
when studying a complex language the 2-category Cat of small categories, functors and
natural transformations may not be adequate; however, one may replace Cat with a
different 2-category, whose objects capture better some fundamental structure of the
language, while less fundamental structure can be modelled by 2-categorical concepts.
Monads are a 2-categorical concept, so we expect notions of computations for a complex language
to be modelled by monads in a suitable 2-category.
The first characterisation takes a commutative monoidal closed structure on C (used in [Laf88,
See87] to model a fragment of linear logic), so that C can be enriched over itself. Then a strong
monad over a cartesian closed category C is just a monad over C in the 2-category of C-enriched
categories.
The second characterisation takes a class D of display maps over C (used in [HP87] to model
dependent types), and defines a C-indexed category C/D . Then a strong monad over a category
C with finite products amounts to a monad over C/D in the 2-category of C-indexed categories,
where D is the class of first projections (corresponding to constant type dependency).
In general the natural transformation t has to be given explicitly as part of the additional
structure. However, t is uniquely determined (but it may not exist) by T and the cartesian
structure on C, when C has enough points.
Proposition 3.4 (Uniqueness) If (T, η, µ) is a monad over a category C with finite products and
enough points (i.e. ∀h: 1 → A.h; f = h; g implies f = g for any f, g: A → B), then (T, η, µ, t) is
a strong monad over C if and only if tA,B is the unique family of morphisms s.t. for all points
a: 1 → A and b: 1 → T B

⟨a, b⟩; tA,B = b; T (⟨!B; a, idB⟩)

where !B: B → 1 is the unique morphism from B to the terminal object.
Proof Note that there is at most one tA,B s.t. ⟨a, b⟩; tA,B = b; T (⟨!B; a, idB⟩) for all points a: 1 → A
and b: 1 → T B, because C has enough points.
First we show that if (T, η, µ, t) is a strong monad, then tA,B satisfies the equation above. By
naturality of t and by the first diagram in Definition 3.2 the following diagram commutes
ha, bi tA,B
1 > A × TB > T (A × B)
@ ∧ ∧
@ hid , bi
@ 1
a × idT B T (a × idB )
@
@
R
@ t1,B
1 × TB > T (1 × B)
@
@ r
@ TB T rB
@
@
R
@ ∨
TB
Since rB is an isomorphism (with inverse h!B , idB i), then the two composite morphisms ha, bi; tA,B
−1
and hid1 , bi; rT B ; T (rB ); T (a×idB ) from 1 to T (A×B) must coincide. But the second composition
can be rewritten as b; T (h!B ; a, idB i).
Second we have to show that if t is the unique family of morphisms satisfying the equation
above, then (T, η, µ, t) is a strong monad. This amounts to proving that t is a natural transformation
and that the three diagrams in Definition 3.2 commute. The proof is a tedious diagram chase,
which relies on C having enough points. For instance, to prove that t1,A; T rA = rT A it is enough
to show that ⟨id1, a⟩; t1,A; T rA = ⟨id1, a⟩; rT A for all points a: 1 → A.
Example 3.5 We go through the monads given in Example 1.4 and show that they have a tensorial
strength.
• partiality T A = A⊥ (= A + {⊥})
tA,B (a, ⊥) = ⊥ and tA,B (a, b) = ha, bi (when b ∈ B)
• nondeterminism T A = Pf in (A)
tA,B (a, c) = {ha, bi|b ∈ c}
17
• side-effects T A = (A × S)^S
tA,B (a, c) = (λs: S. (let ⟨b, s′⟩ = c(s) in ⟨⟨a, b⟩, s′⟩))
• exceptions T A = (A + E)
tA,B (a, inr(e)) = inr(e) (when e ∈ E) and
tA,B (a, inl(b)) = inl(⟨a, b⟩) (when b ∈ B)
• continuations T A = R^(R^A)
tA,B (a, c) = (λk: R^(A×B) . c(λb: B. k(⟨a, b⟩)))
• interactive input T A = (µγ. A + γ^U )
tA,B (a, c) is the tree obtained by replacing each leaf of c labelled by b with the leaf labelled by
⟨a, b⟩
• interactive output T A = (µγ. A + (U × γ))
tA,B (a, ⟨s, b⟩) = ⟨s, ⟨a, b⟩⟩.
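The uniform pattern behind these strengths can be made concrete. The following Haskell sketch is our
own illustration (not part of the paper; it assumes the mtl library's State monad): in a category of
types and functions every functor carries a strength, given by fmap, and the side-effects case from the
list above can also be spelled out directly.

import Control.Monad.State (State, runState, state)

-- every Functor is strong here: t_{A,B}(a, c) = fmap (\b -> (a, b)) c
strength :: Functor t => (a, t b) -> t (a, b)
strength (a, tb) = fmap (\b -> (a, b)) tb

-- partiality: T A = A + {bottom}
strengthMaybe :: (a, Maybe b) -> Maybe (a, b)
strengthMaybe = strength

-- nondeterminism: finite sets modelled as lists
strengthList :: (a, [b]) -> [(a, b)]
strengthList = strength

-- side-effects: T A = (A x S)^S, spelled out as in the bullet above
strengthState :: (a, State s b) -> State s (a, b)
strengthState (a, c) = state (\s -> let (b, s') = runState c s in ((a, b), s'))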
Remark 3.6 The tensorial strength t induces a natural transformation ψA,B from T A × T B to
T (A × B), namely
ψA,B = cT A,T B ; tT B,A ; (cT B,A ; tA,B )∗
where c is the natural isomorphism cA,B : A × B → B × A.
The morphism ψA,B has the correct domain and codomain to interpret the pairing of a com-
putation of type A with one of type B, obtained by first evaluating the first argument and then
the second, namely
(c1 : T A, c2 : T B) ↦ (let x⇐c1 in (let y⇐c2 in [⟨x, y⟩])): T (A × B)   (via ψA,B )
There is also a dual notion of pairing, ψ̃A,B = cT A,T B ; ψB,A ; T cB,A (see [Koc72]), which amounts
to first evaluating the second argument and then the first.
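As an illustration (ours, not the paper's), ψ and its dual ψ̃ are exactly the two evaluation orders for
pairing monadic computations in Haskell:

psi :: Monad t => t a -> t b -> t (a, b)
psi c1 c2 = do { x <- c1; y <- c2; return (x, y) }      -- evaluate first argument first

psiTilde :: Monad t => t a -> t b -> t (a, b)
psiTilde c1 c2 = do { y <- c2; x <- c1; return (x, y) } -- evaluate second argument first

For the list monad, psi [1,2] "ab" enumerates pairs first-component-major, while psiTilde [1,2] "ab"
enumerates them second-component-major, making the difference in evaluation order observable.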
RULE SYNTAX SEMANTICS:
A:    ⊢ml A type = [[A]]
T:    if ⊢ml τ type = c then ⊢ml T τ type = T c
1:    ⊢ml 1 type = 1
×:    if ⊢ml τ1 type = c1 and ⊢ml τ2 type = c2 then ⊢ml τ1 × τ2 type = c1 × c2
∅:    if ⊢ml τi type = ci (1 ≤ i ≤ n) then x1 : τ1 , . . . , xn : τn ⊢ = c1 × . . . × cn
Proposition 3.8 Every theory T of the metalanguage, viewed as a category F(T ), is equipped
with finite products and a strong monad whose tensorial strength is
tτ1 ,τ2 = [x: τ1 × T τ2 ⊢ml (letT x2 ⇐π2 x in [⟨π1 x, x2 ⟩]T ): T (τ1 × τ2 )]T
Once we have a metalanguage for algebraic terms it is straightforward to add data-types charac-
terised by universal properties and to extend the categorical semantics accordingly.5 For instance, if
we want to have function spaces, then we simply require the category C (where the metalanguage
is interpreted) to have exponentials B^A and add the inference rules for the simply typed λ-calculus
(see Table 11) to those for the metalanguage. From a programming language perspective the situ-
ation is more delicate. For instance, the semantics of functional types should reflect the choice of
calling mechanism:
• in call-by-value a procedure of type A → B expects a value of type A and computes a result
of type B, so the interpretation of A → B is (T B)^A ;
• in call-by-name a procedure of type A → B expects a computation of type A, which is
evaluated only when needed, and computes a result of type B, so the interpretation of
A → B is (T B)^(T A) .
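A small Haskell sketch (ours; a State-monad step counter merely stands in for some notion of
computation T) contrasts the two interpretations:

import Control.Monad.State (State, modify)

type T a = State Int a          -- a toy notion of computation: a step counter

tick :: T ()
tick = modify (+ 1)             -- an observable effect a computation may perform

cbvProc :: Bool -> T Int        -- call-by-value: A -> B becomes A -> T B
cbvProc b = return (if b then 1 else 0)

cbnProc :: T Bool -> T Int      -- call-by-name: A -> B becomes T A -> T B
cbnProc c = do { b <- c; return (if b then 1 else 0) }

cbnConst :: T Bool -> T Int
cbnConst _ = return 42          -- the argument computation is never run, so its
                                -- effects (ticks) never happen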
In both cases the only exponentials needed to interpret the functional types of a programming
language are of the form (T B)^A . By analogy with partial cartesian closed categories (pccc), where
only p-exponentials are required to exist (see [Mog86, Ros86]), we adopt the following definition
of λc -model:
5 The next difficult step in extending the metalanguage is the combination of dependent types and computations,
RULE SYNTAX SEMANTICS:
vari :       if ⊢ml τi type = ci (1 ≤ i ≤ n) then x1 : τ1 , . . . , xn : τn ⊢ xi : τi = πi^(c1 ,...,cn)
∗:           Γ ⊢ ∗: 1 = ![[Γ]]
⟨⟩:          if Γ ⊢ e1 : τ1 = g1 and Γ ⊢ e2 : τ2 = g2 then Γ ⊢ ⟨e1 , e2 ⟩: τ1 × τ2 = ⟨g1 , g2 ⟩
πi :         if Γ ⊢ e: τ1 × τ2 = g then Γ ⊢ πi (e): τi = g; πi^([[τ1 ]],[[τ2 ]])
f: τ1 → τ2 : if Γ ⊢ml e1 : τ1 = g then Γ ⊢ml f(e1 ): τ2 = g; [[f]]
[ ]T :       if Γ ⊢ml e: τ = g then Γ ⊢ml [e]T : T τ = g; η[[τ ]]
let:         if Γ ⊢ml e1 : T τ1 = g1 and Γ, x: τ1 ⊢ml e2 : T τ2 = g2 then
             Γ ⊢ml (letT x⇐e1 in e2 ): T τ2 = ⟨id[[Γ]] , g1 ⟩; t[[Γ]],[[τ1 ]] ; g2∗
eq:          if Γ ⊢ml e1 : τ = g1 and Γ ⊢ml e2 : τ = g2 then (Γ ⊢ml e1 =τ e2 ⟺ g1 = g2 )
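A minimal Haskell rendering of the let rule's semantics may help (our sketch; the names semLet and
strength are ours): given g1 interpreting Γ ⊢ e1 : T τ1 and g2 interpreting Γ, x: τ1 ⊢ e2 : T τ2 , the rule
composes ⟨id, g1 ⟩; t; g2∗ exactly as in the table.

semLet :: Monad t => (g -> t a) -> ((g, a) -> t b) -> (g -> t b)
semLet g1 g2 = \gamma -> strength (gamma, g1 gamma) >>= g2
  where
    -- the tensorial strength, pairing the environment with the computed value
    strength (x, tb) = fmap (\y -> (x, y)) tb

For instance, with t the list monad, semLet interprets a let that binds a nondeterministically
computed value and then runs the body once per choice.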
refl:    Γ ⊢ e: τ   ⟹   Γ ⊢ e =τ e
symm:    Γ ⊢ e1 =τ e2   ⟹   Γ ⊢ e2 =τ e1
trans:   Γ ⊢ e1 =τ e2 and Γ ⊢ e2 =τ e3   ⟹   Γ ⊢ e1 =τ e3
congr:   Γ ⊢ e1 =τ1 e2   ⟹   Γ ⊢ f(e1 ) =τ2 f(e2 )   (f: τ1 → τ2 )
subst:   Γ ⊢ e: τ and Γ, x: τ ⊢ φ   ⟹   Γ ⊢ [e/x]φ
Inference Rules of Many Sorted Equational Logic
1.η:     Γ ⊢ ∗ =1 x   (x: 1 in Γ)
⟨⟩.ξ:    Γ ⊢ e1 =τ1 e′1 and Γ ⊢ e2 =τ2 e′2   ⟹   Γ ⊢ ⟨e1 , e2 ⟩ =τ1 ×τ2 ⟨e′1 , e′2 ⟩
×.β:     Γ ⊢ e1 : τ1 and Γ ⊢ e2 : τ2   ⟹   Γ ⊢ πi (⟨e1 , e2 ⟩) =τi ei
×.η:     Γ ⊢ e: τ1 × τ2   ⟹   Γ ⊢ ⟨π1 (e), π2 (e)⟩ =τ1 ×τ2 e
Rules for product types
[ ].ξ:   Γ ⊢ml e1 =τ e2   ⟹   Γ ⊢ml [e1 ]T =T τ [e2 ]T
let.ξ:   Γ ⊢ml e1 =T τ1 e2 and Γ, x: τ1 ⊢ml e′1 =T τ2 e′2   ⟹
         Γ ⊢ml (letT x⇐e1 in e′1 ) =T τ2 (letT x⇐e2 in e′2 )
ass:     Γ ⊢ml e1 : T τ1 , Γ, x1 : τ1 ⊢ml e2 : T τ2 and Γ, x2 : τ2 ⊢ml e3 : T τ3   ⟹
         Γ ⊢ml (letT x2 ⇐(letT x1 ⇐e1 in e2 ) in e3 ) =T τ3 (letT x1 ⇐e1 in (letT x2 ⇐e2 in e3 ))
T.β:     Γ ⊢ml e1 : τ1 and Γ, x1 : τ1 ⊢ml e2 : T τ2   ⟹   Γ ⊢ml (letT x1 ⇐[e1 ]T in e2 ) =T τ2 [e1 /x1 ]e2
T.η:     Γ ⊢ml e1 : T τ1   ⟹   Γ ⊢ml (letT x1 ⇐e1 in [x1 ]T ) =T τ1 e1
λ.ξ:     Γ, x: τ1 ⊢ e1 =τ2 e2   ⟹   Γ ⊢ (λx: τ1 .e1 ) =τ1 →τ2 (λx: τ1 .e2 )
→.β:     Γ ⊢ e1 : τ1 and Γ, x: τ1 ⊢ e2 : τ2   ⟹   Γ ⊢ (λx: τ1 .e2 )e1 =τ2 [e1 /x]e2
→.η:     Γ ⊢ e: τ1 → τ2   ⟹   Γ ⊢ (λx: τ1 .ex) =τ1 →τ2 e   (x ∉ DV(Γ))
Definition 3.9 A λc -model is a category C with finite products, a strong monad (T, η, µ, t) satis-
fying the mono requirement (i.e. ηA mono for every A ∈ C) and T -exponentials (T B)^A for every
A, B ∈ C.
Remark 3.10 The definition of λc -model generalises that of pccc, in the sense that every pccc
can be viewed as a λc -model. By analogy with p-exponentials, a T -exponential can be defined by
giving an isomorphism CT (C × A, B) ≅ C(C, (T B)^A ) natural in C ∈ C. We refer to [Mog89c] for
the interpretation of a call-by-value programming language in a λc -model and the corresponding
formal system, the λc -calculus.
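In a category of types and functions the T -exponential isomorphism is simply currying of Kleisli
maps, as the following Haskell sketch (our illustration, not the paper's) shows:

-- C_T(C x A, B) is ((c, a) -> t b); C(C, (T B)^A) is (c -> a -> t b)
toExp :: ((c, a) -> t b) -> (c -> a -> t b)
toExp = curry

fromExp :: (c -> a -> t b) -> ((c, a) -> t b)
fromExp = uncurry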
Definition 4.2 We say that a formal system (L2 , ⊢2 ), where ⊢2 ⊆ P(L2 ) × L2 is a formal conse-
quence relation8 over L2 , is a conservative extension of (L1 , ⊢1 ) provided L1 ⊆ L2 and ⊢1 is
the restriction of ⊢2 to P(L1 ) × L1 .
Theorem 4.3 HMLT is a conservative extension of MLT and λMLT . In particular λMLT is a
conservative extension of MLT .
7 Lambek and Scott do not require closure under function spaces and subsets {x ∈ A | φ(x)}.
8 For instance, in the case of MLT the elements of L are well-formed equality judgements Γ ⊢ml e1 =τ e2 , and
P ⊢ C iff there exists a derivation of C where all assumptions are in P .
Proof The first result follows from Theorem 4.9, which implies that for every model C of MLT
the Yoneda embedding maps the interpretation of an MLT -term in C to its interpretation in Ĉ,
and from the faithfulness of the Yoneda embedding, which implies that two MLT -terms have the same
interpretation in C iff they have the same interpretation in Ĉ. The second result follows because
the Yoneda embedding preserves function spaces. The third conservative extension result follows
immediately from the first two.
The above result means that we can think of computations naively in terms of sets and func-
tions, provided we treat them intuitionistically, and can use the full apparatus of higher-order
(intuitionistic) logic instead of the less expressive many sorted equational logic.
Before giving a conservative extension result for the programming language, we have to express
the mono requirement, equivalence and existence in HMLT . The idea is to extend the translation
from PL-terms to MLT -terms given in Definition 2.7 and exploit the increased expressiveness of
HMLT over MLT to axiomatise the mono requirement and translate existence and equivalence
assertions (see Remark 2.1):
• the mono requirement for τ , i.e. ητ is mono, is axiomatised by
(mono.τ )   x, y: τ ⊢ ([x]T =T τ [y]T ) → (x =τ y)
• the equalising requirement for τ , i.e. ητ is the equaliser of T (ητ ) and ηT τ , is axiomatised
by (mono.τ ) and the axiom (eqls.τ )
• the translation is extended to assertions and functional types as follows:
– (e1 ≡τ e2 )° ≜ e1 ° =T τ e2 °
– (e1 ↓τ )° ≜ (∃!x: τ. e1 ° =T τ [x]T )
– (τ1 ⇀ τ2 )° ≜ τ1 ° → T τ2 °
Theorem 4.4 HMLT + {(mono.τ ) | τ type of PL} (i.e. τ is built using only base types, 1, T A and
A × B) is a conservative extension of PL (after translation). Similarly, HMLT + {(mono.τ ) | τ type of λc PL}
(i.e. τ is built using only base types, 1, T A, A × B and A → B) is a conservative extension of
λc PL (after translation).
Proof The proof proceeds as in the previous theorem. The only additional step is to show that for
every type τ of PL (or λc PL) the axiom (mono.τ ) holds in Ĉ, under the assumption that C satisfies
the mono requirement. Let c be the interpretation of τ in C (so that Yc is the interpretation of
τ in Ĉ); then the axiom (mono.τ ) holds in Ĉ provided η̂Yc is a mono. ηc is mono (by the mono
requirement), so η̂Yc = Y(ηc ) is mono (as Y preserves monos).
In the theorem above only types from the programming language have to satisfy the mono require-
ment. Indeed, HMLT + {(mono.τ )| τ type of HMLT } is not a conservative extension of PL (or
λc PL).
Lemma 4.5 If (T, η, µ) is a monad over a topos C satisfying the mono requirement, then it satisfies
also the equalising requirement.
In other words, for any type τ the axiom (eqls.τ ) is derivable in HMLT from the set of axioms
{(mono.τ ) | τ type of HMLT }. In general, when C is not a topos, the mono requirement does not
entail the equalising requirement; one can easily define strong monads (over a Heyting algebra)
that satisfy the mono but not the equalising requirement (just take T (A) = A ∨ B, for some
element B ≠ ⊥ of the Heyting algebra). In terms of formal consequence relations this means
that in HMLT + mono requirement the existence assertion Γ ⊢pl e ↓τ is derivable from Γ ⊢pl
[e] ≡T τ (let x⇐e in [x]), while such a derivation is not possible in λc PL. We do not know whether
HMLT + equalising requirement is a conservative extension of PL + equalising requirement, or
whether λc PL is a conservative extension of PL.
A language which combines computations and higher order logic, like HMLT , seems to be the
ideal framework for program logics that go beyond proving equivalence of programs, like Hoare's
logic for partial correctness of imperative languages. In HMLT (as well as MLT and PL) one can
describe a programming language by introducing additional constants and axioms. In λMLT or
λc PL such constants correspond to program-constructors, for instance:
• lookup: L → T U , which given a location l ∈ L produces the value of that location in the
current store, and update: L × U → T 1, which changes the current store by assigning to l ∈ L
the value u ∈ U ;
• if: Bool × T A × T A → T A and while: T (Bool) × T 1 → T 1;
• new: 1 → T L, which returns a newly created location;
• read: 1 → T U , which computes a value by reading it from the input, and write: U → T 1,
which writes a value u ∈ U on the output.
In HMLT one can describe also a program logic, by adding constants p: T A → Ω corresponding to
properties of computations.
Example 4.6 Let T be the monad for non-deterministic computations (see Example 1.4); then we
can define a predicate may: A × T A → Ω such that may(a, c) is true iff the value a is a possible
outcome of the computation c (i.e. a ∈ c). However, there is a more uniform way of defining the
may predicate for any type. Let ◇: T Ω → Ω be the predicate such that ◇(X) = ⊤ iff ⊤ ∈ X,
where Ω is the set {⊥, ⊤} (note that ◇(−) = may(⊤, −)). Then may(a, c) can be defined as
◇(letT x⇐c in [a =τ x]T ).
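With nondeterministic computations modelled as finite lists and Ω as Bool, the construction reads
as follows (a sketch of ours, not code from the paper):

diamond :: [Bool] -> Bool
diamond = or                      -- diamond(X) holds iff True is a member of X

may :: Eq a => a -> [a] -> Bool   -- may(a, c) = diamond(let x <= c in [a = x])
may a c = diamond (do { x <- c; return (a == x) })

For example, may 2 [1,2,3] evaluates diamond [False, True, False] and yields True.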
The previous example suggests that predicates defined uniformly on computations of any type
can be better described in terms of modal operators γ: T Ω → Ω, relating a computation of truth
values to a truth value. This possibility has not been investigated in depth, so we will give only a
tentative definition.
Definition 4.7 If (T, η, µ) is a monad over a topos C, then a T -modal operator is a T -algebra
γ: T Ω → Ω, i.e. a morphism satisfying
ηΩ ; γ = idΩ    and    µΩ ; γ = T γ; γ
where Ω is the subobject classifier in C.
These two equations can be expressed in the metalanguage:
• x: Ω ⊢ γ([x]T ) ←→ x
• c: T 2 Ω ⊢ γ(let x⇐c in x) ←→ γ(let x⇐c in [γ(x)]T )
We consider some examples and non-examples of modal operators.
Example 4.8 For the monad T of non-deterministic computations (see Example 1.4) there are
only two modal operators, □ and ◇:
• □(X) = ⊥ iff ⊥ ∈ X;
• ◇(X) = ⊤ iff ⊤ ∈ X.
Given a nondeterministic computation e of type τ and a predicate A(x) over τ , i.e. a term of type
Ω, then □(letT x⇐e in [A(x)]T ) is true iff all possible results of e satisfy A(x).
For the monad T of computations with side-effects (see Example 1.4) there is an operator
□: (Ω × S)^S → Ω that can be used to express Hoare's triples:
• □f = ⊤ iff for all s ∈ S there exists s′ ∈ S s.t. f s = ⟨⊤, s′⟩
This operator does not satisfy the second equivalence, as only one direction is valid, namely
c: T 2 Ω ⊢ γ(let x⇐c in [γ(x)]T ) → γ(let x⇐c in x)
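For a finite, enumerable state space the quantification over S in the definition of □ is checkable,
so the operator can be sketched directly (our illustration; box is a hypothetical name):

-- box f holds iff running f from every initial state yields True
box :: (Bounded s, Enum s) => (s -> (Bool, s)) -> Bool
box f = all (fst . f) [minBound .. maxBound]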
Let P : U → Ω and Q: U × U → Ω be predicates over storable values, e ∈ T 1 a computation of type
1 and x, y ∈ L locations. The intended meaning of the triple {P (x)}e{Q(x, y)} is "if in the initial
state the content u of x satisfies P (u), then in the final state (i.e. after executing e) the content
v of y satisfies Q(u, v)". This intended meaning can be expressed formally in terms of the modal
operator □ and the program-constructors lookup and update as follows:
□(letT u⇐lookup(x) in (letT z⇐e in (letT v⇐lookup(y) in [P (u) → Q(u, v)]T )))
Theorem 4.9 Let C be a small category with a monad (T, η, µ). Then there exists a monad
(T̂ , η̂, µ̂) over the topos of presheaves Ĉ s.t. the following diagram commutes9
T ; Y = Y; T̂
and for all a ∈ C the following equations hold
η̂Ya = Y(ηa )    ,    µ̂Ya = Y(µa )
Moreover, for every strong monad (T, η, µ, t) over C, there exists a natural transformation t̂ such
that (T̂ , η̂, µ̂, t̂) is a strong monad over Ĉ and for all a, b ∈ C the following equation holds
t̂Ya,Yb = Y(ta,b )
where we have implicitly assumed that the Yoneda embedding preserves finite products on the nose,
i.e. Y1 = 1 and Y(a × b) = Ya × Yb, and for all a, b ∈ C the following equations hold
!Ya = Y(!a )    ,    πiYa,Yb = Y(πia,b )
9 This is a simplifying assumption. For our purposes it would be enough to have a natural isomorphism σ: T ; Y →̇ Y; T̂ ,
but then the remaining equations have to be patched. For instance, the equation relating η and η̂ would become
η̂Ya = Y(ηa ); σa .
Definition 4.10 ([Mac71]) Let T : C → D be a functor between two small categories and A a
cocomplete category. Then the left Kan extension L^A_T : A^C → A^D is the left adjoint of A^T and
can be defined as follows:
L^A_T (F )(d) = ColimT ↓d (π; F )
where F : C → A, d ∈ D, T ↓ d is the comma category whose objects are pairs ⟨c ∈ C, f : T c → d⟩,
π: T ↓ d → C is the projection functor (mapping a pair ⟨c, f ⟩ to c) and ColimI : A^I → A (with I a
small category) is a functor mapping an I-diagram in A to its colimit.
The following proposition is a 2-categorical reformulation of Theorem 1.3.10 of [MR77]. For the
sake of simplicity, we use the strict notions of 2-functor and 2-natural transformation, although we
should have used pseudo-functors and pseudo-natural transformations.
Proposition 4.11 Let Cat be the 2-category of small categories, CAT the 2-category of locally
small categories and ( ): Cat → CAT the inclusion 2-functor. Then the following ( )̂ : Cat → CAT
is a 2-functor:
• if C is a small category, then Ĉ is the topos of presheaves Set^(C^op)
• if T : C → D is a functor, then T̂ is the left Kan extension L^Set_(T^op)
• if σ: S → T : C → D is a natural transformation and F ∈ Ĉ, then σ̂F is the natural transfor-
mation corresponding to id(T̂ F ) via the following sequence of steps
D̂(T̂ F, T̂ F ) ≅ Ĉ(F, T ^op ; T̂ F ) −→ Ĉ(F, S ^op ; T̂ F ) ≅ D̂(ŜF, T̂ F )
where the middle map is Ĉ(F, σ^op ; T̂ F ).
Moreover, Y: ( ) →̇ ( )̂ is a 2-natural transformation.
Since monads are a 2-categorical concept (see [Str72]), the 2-functor ( )̂ maps monads in Cat to
monads in CAT. Then the statement of Theorem 4.9 about lifting of monads follows immediately
from Proposition 4.11. It remains to define the lifting t̂ of a tensorial strength t for a monad (T, η, µ)
over a small category C.
Proposition 4.12 If C is a small category with finite products and T is an endofunctor over
C, then for every natural transformation ta,b : a × T b → T (a × b) there exists a unique natural
transformation t̂F,G : F × T̂ G → T̂ (F × G) s.t. t̂Ya,Yb = Y(ta,b ) for all a, b ∈ C.
Proof Every F ∈ Ĉ is isomorphic to the colimit ColimY↓F (π; Y) (shortly Colimi Yi), where Y is
the Yoneda embedding of C into Ĉ. Similarly G is isomorphic to Colimj Yj. Both functors (− × T̂ −)
and T̂ (− × −) from Ĉ × Ĉ to Ĉ preserve colimits (as T̂ and − × F are left adjoints) and commute
with the Yoneda embedding (as Y(a × b) = Ya × Yb and T̂ (Ya) = Y(T a)). Therefore F × T̂ G and
T̂ (F × G) are isomorphic to the colimits Colimi,j Yi × T̂ (Yj) and Colimi,j T̂ (Yi × Yj) respectively.
Let t̂ be the natural transformation we are looking for; then the square
Yi × T̂ (Yj) −−Y(ti,j )−−→ T̂ (Yi × Yj)
   f × T̂ g ↓                    ↓ T̂ (f × g)
F × T̂ (G) −−−t̂F,G −−−→ T̂ (F × G)
commutes
for all f : Yi → F and g: Yj → G (by naturality of t̂ and t̂Yi,Yj = Y(ti,j )). But there exists exactly
one morphism t̂F,G making the diagram above commute, as ⟨ti,j | i, j⟩ is a morphism between
diagrams in Ĉ of the same shape, and these diagrams have colimit cones ⟨f × T̂ g | f, g⟩ and ⟨T̂ (f ×
g) | f, g⟩ respectively.
Remark 4.13 If T is a monad of partial computations, i.e. it is induced by a dominion M on C
s.t. P(C, M)(a, b) ≅ C(a, T b), then the lifting T̂ is the monad of partial computations induced by
the dominion M̂ on Ĉ obtained by lifting M to the topos of presheaves, as described in [Ros86].
For other monads, however, the lifting is not the expected one. For instance, if T is the monad
of side-effects (− × S)^S , then T̂ is not (in general) the endofunctor (− × YS)^(YS) on the topos of
presheaves.
Acknowledgements
I have to thank many people for advice, suggestions and criticisms, in particular: R. Amadio, R.
Burstall, M. Felleisen, R. Harper, F. Honsell, M. Hyland, B. Jay, A. Kock, Y. Lafont, G. Longo,
R. Milner, A. Pitts, G. Plotkin, J. Power and C. Talcott.
References
[BW85] M. Barr and C. Wells. Toposes, Triples and Theories. Springer Verlag, 1985.
[CP90] R.L. Crole and A.M. Pitts. New foundations for fixpoint computations. In 4th LICS
Conf. IEEE, 1990.
[CS87] R.L. Constable and S.F. Smith. Partial objects in constructive type theory. In 2nd
LICS Conf. IEEE, 1987.
[CS88] R.L. Constable and S.F. Smith. Computational foundations of basic recursive function
theory. In 3rd LICS Conf. IEEE, 1988.
[FF89] M. Felleisen and D.P. Friedman. A syntactic theory of sequential state. Theoretical
Computer Science, 69(3), 1989.
[FFKD86] M. Felleisen, D.P. Friedman, E. Kohlbecker, and B. Duba. Reasoning with continua-
tions. In 1st LICS Conf. IEEE, 1986.
[Fou77] M.P. Fourman. The logic of topoi. In J. Barwise, editor, Handbook of Mathematical
Logic, volume 90 of Studies in Logic. North Holland, 1977.
[GMW79] M.J.C. Gordon, R. Milner, and C.P. Wadsworth. Edinburgh LCF: A Mechanized Logic
of Computation, volume 78 of Lecture Notes in Computer Science. Springer Verlag,
1979.
[GS89] C. Gunter and D. Scott. Semantic domains. Technical Report MS-CIS-89-16, Dept.
of Comp. and Inf. Science, Univ. of Pennsylvania, 1989. To appear in North Holland
Handbook of Theoretical Computer Science.
[HMM90] R. Harper, J. Mitchell, and E. Moggi. Higher-order modules and the phase distinction.
In 17th POPL. ACM, 1990.
[HP87] J.M.E. Hyland and A.M. Pitts. The theory of constructions: Categorical semantics and
topos-theoretic models. In Proc. AMS Conf. on Categories in Comp. Sci. and Logic
(Boulder 1987), 1987.
[JP78] P.T. Johnstone and R. Pare, editors. Indexed Categories and their Applications, volume
661 of Lecture Notes in Mathematics. Springer Verlag, 1978.
[Kel82] G.M. Kelly. Basic Concepts of Enriched Category Theory. Cambridge University Press,
1982.
[Koc72] A. Kock. Strong functors and monoidal monads. Archiv der Mathematik, 23, 1972.
[KR77] A. Kock and G.E. Reyes. Doctrines in categorical logic. In J. Barwise, editor, Handbook
of Mathematical Logic, volume 90 of Studies in Logic. North Holland, 1977.
[Laf88] Y. Lafont. The linear abstract machine. Theoretical Computer Science, 59, 1988.
[LS86] J. Lambek and P.J. Scott. Introduction to Higher-Order Categorical Logic, volume 7 of
Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1986.
[Mac71] S. MacLane. Categories for the Working Mathematician. Springer Verlag, 1971.
[Man76] E. Manes. Algebraic Theories, volume 26 of Graduate Texts in Mathematics. Springer
Verlag, 1976.
[Mas88] I.A. Mason. Verification of programs that destructively manipulate data. Science of
Computer Programming, 10, 1988.
[Mog86] E. Moggi. Categories of partial morphisms and the partial lambda-calculus. In Pro-
ceedings Workshop on Category Theory and Computer Programming, Guildford 1985,
volume 240 of Lecture Notes in Computer Science. Springer Verlag, 1986.
[Mog88] E. Moggi. The Partial Lambda-Calculus. PhD thesis, University of Edinburgh, 1988.
[Mog89a] E. Moggi. An abstract view of programming languages. Technical Report ECS-LFCS-
90-113, Edinburgh Univ., Dept. of Comp. Sci., 1989. Lecture Notes for course CS 359,
Stanford Univ.
[Mog89b] E. Moggi. A category-theoretic account of program modules. In Proceedings of the
Conference on Category Theory and Computer Science, Manchester, UK, Sept. 1989,
volume 389 of Lecture Notes in Computer Science. Springer Verlag, 1989.
[Mog89c] E. Moggi. Computational lambda-calculus and monads. In 4th LICS Conf. IEEE, 1989.
[Mos89] P. Mosses. Denotational semantics. Technical Report MS-CIS-89-16, Dept. of Comp.
and Inf. Science, Univ. of Pennsylvania, 1989. to appear in North Holland Handbook
of Theoretical Computer Science.
[MR77] M. Makkai and G. Reyes. First Order Categorical Logic. Springer Verlag, 1977.
[MT89a] I. Mason and C. Talcott. Programming, transforming, and proving with function ab-
stractions and memories. In 16th Colloquium on Automata, Languages and Program-
ming. EATCS, 1989.
[MT89b] I. Mason and C. Talcott. A sound and complete axiomatization of operational equiva-
lence of programs with memory. In POPL 89. ACM, 1989.
[Plo75] G.D. Plotkin. Call-by-name, call-by-value and the λ-calculus. Theoretical Computer
Science, 1, 1975.
[Plo85] G.D. Plotkin. Denotational semantics with partial functions. Lecture Notes at C.S.L.I.
Summer School, 1985.
[Ros86] G. Rosolini. Continuity and Effectiveness in Topoi. PhD thesis, University of Oxford,
1986.
[Sch86] D.A. Schmidt. Denotational Semantics: a Methodology for Language Development.
Allyn & Bacon, 1986.
[Sco69] D.S. Scott. A type-theoretic alternative to CUCH, ISWIM, OWHY. Oxford notes,
1969.
[Sco79] D.S. Scott. Identity and existence in intuitionistic logic. In M.P. Fourman, C.J. Mul-
vey, and D.S. Scott, editors, Applications of Sheaves, volume 753 of Lecture Notes in
Mathematics. Springer Verlag, 1979.
[Sco80] D.S. Scott. Relating theories of the λ-calculus. In R. Hindley and J. Seldin, editors, To
H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism. Academic
Press, 1980.
[See87] R.A.G. Seely. Linear logic, ∗-autonomous categories and cofree coalgebras. In Proc.
AMS Conf. on Categories in Comp. Sci. and Logic (Boulder 1987), 1987.
[Sha84] K. Sharma. Syntactic aspects of the non-deterministic lambda calculus. Master’s thesis,
Washington State University, September 1984. available as internal report CS-84-127
of the comp. sci. dept.
[SP82] M. Smyth and G. Plotkin. The category-theoretic solution of recursive domain equa-
tions. SIAM Journal on Computing, 11, 1982.
[Str72] R. Street. The formal theory of monads. Journal of Pure and Applied Algebra, 2, 1972.
Principal type-schemes for functional programs
Luis Damas and Robin Milner
Edinburgh University
This paper is concerned with the polymorphic type discipline of ML, which is a general purpose
functional programming language, although it was first introduced as a metalanguage (whence its
name) for conducting proofs in the LCF proof system [GMW]. The type discipline was studied in
[Mil], where it was shown to be semantically sound.
Through other research and in teaching to undergraduates, it has become important to answer
these questions, particularly because the combination of flexibility (due to polymorphism), robustness
(due to semantic soundness) and detection of errors at compile time has proved to be one of the
strongest aspects of ML.
Types are built from type constants (bool, . . .) and type variables, using type operators (such
as infixed → for functions and postfixed list for lists); a type-scheme is a type with (possibly)
quantification of type variables at the outermost level.
On-line use of ML declarations let x = e is allowed, whose scope (e′) is the remainder of the
on-line session. As illustrated in the introduction, it must be possible to assign a type-scheme to
such a declaration. The type-scheme deduced for such a declaration (and more generally, for any
ML expression) is a principal type-scheme, i.e. any other type-scheme for the declaration is a
generic instance of it.
Note that types are absent from the language Exp. Assuming a set of type variables α and of
primitive types ι, the syntax of types τ and of type-schemes σ is given by
τ ::= α | ι | τ → τ        σ ::= τ | ∀α σ
For simplicity, our definitions and results here are formulated for a skeletal language, since
their extension to ML is a routine matter. For example, recursion is omitted, since it can be
introduced by adding a fixed-point operator. The language Exp of expressions e is given by
e ::= x | e e′ | λx.e | let x = e in e′
(where parentheses may be used to avoid ambiguity).
3. Type Instantiation
If S is a substitution of types for type variables, often written [τ1 /α1 , . . . , τn /αn ], and σ is a
type-scheme, then Sσ is the type-scheme obtained by replacing each free occurrence of αi in σ
by τi . A type-scheme σ = ∀α1 . . . αn τ has a generic instance σ′ = ∀β1 . . . βm τ ′ if τ ′ = [τi /αi ]τ
for some types τ1 , . . . , τn and the βj are not free in σ; in this case we write σ > σ′. Substitution
acts only on the free variables of a type-scheme, not on the bound variables, which have restricted
scope. It follows that σ > σ′ implies Sσ > Sσ′.
Now let Env = Id → V be the domain of environments η, and let E: Exp → Env → V be the
semantic function. An assumption set A contains at most one assumption about each identifier x;
Ax stands for the result of removing any assumption about x from A.
ABS:    Ax ∪ {x: τ ′ } ⊢ e: τ   ⟹   A ⊢ (λx.e): τ ′ → τ
This example illustrates that free type-variables in an assertion are implicitly quantified over the
whole assertion.
Proof We construct a derivation of Ax ∪ {x: σ} ⊢ e: σ0 from that of Ax ∪ {x: σ ′ } ⊢ e: σ0 by
substituting each use of TAUT for x: σ ′ with x: σ, followed by an INST step.
The following example of a derivation is organised as a tree, in which each node follows from those
immediately above it by an inference rule:
x: α ⊢ x: α                                   TAUT
⊢ (λx.x): α → α                               ABS
⊢ (λx.x): ∀α(α → α)                           GEN
i: ∀α(α → α) ⊢ i: (α → α) → (α → α)           INST
i: ∀α(α → α) ⊢ i: α → α                       INST
6. The type assignment algorithm W
The type inference system by itself does not provide an easy method for finding, given A and e,
a type-scheme σ such that A ⊢ e: σ. We now present an algorithm W for this purpose. W uses
Robinson's unification algorithm U which, given a pair of types, either returns a substitution V or
fails; if U(τ, τ ′ ) = V then V unifies τ and τ ′ , and V involves only variables in τ and τ ′ . We write
Ā(τ ) = ∀α1 . . . αn τ for the closure of τ with respect to A, where α1 , . . . , αn are the type variables
occurring free in τ but not in A.
Proposition. If S is a substitution and A ⊢ e: σ, then SA ⊢ e: Sσ. Moreover, if there is a
derivation of A ⊢ e: σ of height n, then there is also a derivation of SA ⊢ e: Sσ of height less than
or equal to n.
Algorithm W. W(A, e) = (S, τ ), where in particular:
(ii) if e is e1 e2 : let W(A, e1 ) = (S1 , τ1 ) and W(S1 A, e2 ) = (S2 , τ2 ) and U(S2 τ1 , τ2 → β) = V ,
with β a new type variable; then S = V S2 S1 and τ = V β;
(iv) if e is (let x = e1 in e2 ): let W(A, e1 ) = (S1 , τ1 ) and W(S1 Ax ∪ {x: S1 Ā(τ1 )}, e2 ) = (S2 , τ2 );
then S = S2 S1 and τ = τ2 .
The following proposition proves that W meets its specification: if W(A, e) = (S, τ ) then SA ⊢ e: τ .
7. Completeness of W
If A ⊢ e: σ for some σ, then W computes a principal type-scheme for e under A, i.e. a σp such that
(i) A ⊢ e: σp , and
(ii) any other σ for which A ⊢ e: σ is a generic instance of σp .
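A compact Haskell sketch of W for the skeletal language follows (our reconstruction for illustration,
not the paper's presentation: the names Exp, Ty, Scheme, w, unify, and the Int counter threading the
supply of new type variables are all ours).

import qualified Data.Map as M

data Exp = Var String | App Exp Exp | Lam String Exp | Let String Exp Exp
data Ty = TVar Int | TArr Ty Ty deriving (Eq, Show)
data Scheme = Forall [Int] Ty

type Subst = M.Map Int Ty

applyT :: Subst -> Ty -> Ty
applyT s (TVar a)   = M.findWithDefault (TVar a) a s
applyT s (TArr l r) = TArr (applyT s l) (applyT s r)

applyS :: Subst -> Scheme -> Scheme              -- never touch the bound variables
applyS s (Forall as t) = Forall as (applyT (foldr M.delete s as) t)

compose :: Subst -> Subst -> Subst               -- applyT (compose s2 s1) = applyT s2 . applyT s1
compose s2 s1 = M.map (applyT s2) s1 `M.union` s2

fvT :: Ty -> [Int]
fvT (TVar a)   = [a]
fvT (TArr l r) = fvT l ++ fvT r

fvEnv :: M.Map String Scheme -> [Int]
fvEnv env = concat [ filter (`notElem` as) (fvT t) | Forall as t <- M.elems env ]

unify :: Ty -> Ty -> Maybe Subst                 -- Robinson's algorithm U
unify (TVar a) t = bind a t
unify t (TVar a) = bind a t
unify (TArr l r) (TArr l' r') = do
  s1 <- unify l l'
  s2 <- unify (applyT s1 r) (applyT s1 r')
  Just (compose s2 s1)

bind :: Int -> Ty -> Maybe Subst
bind a t
  | t == TVar a    = Just M.empty
  | a `elem` fvT t = Nothing                     -- occurs check: unification fails
  | otherwise      = Just (M.singleton a t)

-- w A e n = (S, tau, n'), with n the next unused fresh type variable
w :: M.Map String Scheme -> Exp -> Int -> Maybe (Subst, Ty, Int)
w env (Var x) n = do
  Forall as t <- M.lookup x env
  let inst = M.fromList (zip as (map TVar [n ..]))   -- new generic instance
  Just (M.empty, applyT inst t, n + length as)
w env (Lam x e) n = do
  (s1, t1, n1) <- w (M.insert x (Forall [] (TVar n)) env) e (n + 1)
  Just (s1, TArr (applyT s1 (TVar n)) t1, n1)
w env (App e1 e2) n = do
  (s1, t1, n1) <- w env e1 n
  (s2, t2, n2) <- w (M.map (applyS s1) env) e2 n1
  v <- unify (applyT s2 t1) (TArr t2 (TVar n2))      -- beta = TVar n2, a new variable
  Just (compose v (compose s2 s1), applyT v (TVar n2), n2 + 1)
w env (Let x e1 e2) n = do
  (s1, t1, n1) <- w env e1 n
  let env1 = M.map (applyS s1) env
      gen  = Forall (filter (`notElem` fvEnv env1) (fvT t1)) t1   -- the closure
  (s2, t2, n2) <- w (M.insert x gen env1) e2 n1
  Just (compose s2 s1, t2, n2)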
References
[GMW] M. Gordon, R. Milner, and C. Wadsworth. Edinburgh LCF, volume 78 of Lecture Notes
in Computer Science. Springer-Verlag, 1979.
[Mil] R. Milner. A theory of type polymorphism in programming. Journal of Computer and
System Sciences, 17(3), 1978.
Recursive Functions of Symbolic Expressions
and Their Computation by Machine, Part I
∗
John McCarthy, Massachusetts Institute of Technology, Cambridge, Mass.
April 1960
1 Introduction
A programming system called LISP (for LISt Processor) has been developed
for the IBM 704 computer by the Artificial Intelligence group at M.I.T. The
system was designed to facilitate experiments with a proposed system called
the Advice Taker, whereby a machine could be instructed to handle declarative
as well as imperative sentences and could exhibit “common sense” in carrying
out its instructions. The original proposal [1] for the Advice Taker was made
in November 1958. The main requirement was a programming system for
manipulating expressions representing formalized declarative and imperative
sentences so that the Advice Taker system could make deductions.
In the course of its development the LISP system went through several
stages of simplification and eventually came to be based on a scheme for rep-
resenting the partial recursive functions of a certain class of symbolic expres-
sions. This representation is independent of the IBM 704 computer, or of any
other electronic computer, and it now seems expedient to expound the system
by starting with the class of expressions called S-expressions and the functions
called S-functions.
∗ Putting this paper in LaTeX was partly supported by ARPA (ONR) grant N00014-94-1-0775
to Stanford University, where John McCarthy has been since 1962. Copied with minor nota-
tional changes from CACM, April 1960. If you want the exact typography, look there. Cur-
rent address: John McCarthy, Computer Science Department, Stanford, CA 94305, (email:
jmc@cs.stanford.edu), (URL: http://www-formal.stanford.edu/jmc/ )
In this article, we first describe a formalism for defining functions recur-
sively. We believe this formalism has advantages both as a programming
language and as a vehicle for developing a theory of computation. Next, we
describe S-expressions and S-functions, give some examples, and then describe
the universal S-function apply which plays the theoretical role of a universal
Turing machine and the practical role of an interpreter. Then we describe the
representation of S-expressions in the memory of the IBM 704 by list structures
similar to those used by Newell, Shaw and Simon [2], and the representation
of S-functions by program. Then we mention the main features of the LISP
programming system for the IBM 704. Next comes another way of describ-
ing computations with symbolic expressions, and finally we give a recursive
function interpretation of flow charts.
We hope to describe some of the symbolic computations for which LISP
has been used in another paper, and also to give elsewhere some applications
of our recursive function formalism to mathematical logic and to the problem
of mechanical theorem proving.
x<y
(x < y) ∧ (b = c)
x is prime
(p1 → e1 , · · · , pn → en )
where the p’s are propositional expressions and the e’s are expressions of any
kind. It may be read, “If p1 then e1 otherwise if p2 then e2 , · · · , otherwise if
pn then en ,” or “p1 yields e1 , · · · , pn yields en .” 2
We now give the rules for determining whether the value of
(p1 → e1 , · · · , pn → en )
is defined, and if so what its value is. Examine the p’s from left to right. If
a p whose value is T is encountered before any p whose value is undefined is
encountered then the value of the conditional expression is the value of the
corresponding e (if this is defined). If any undefined p is encountered before
2 I sent a proposal for conditional expressions to a CACM forum on what should be
included in Algol 60. Because the item was short, the editor demoted it to a letter to the
editor, for which CACM subsequently apologized. The notation given here was rejected for
Algol 60, because it had been decided that no new mathematical notation should be allowed
in Algol 60, and everything new had to be English. The if . . . then . . . else that Algol 60
adopted was suggested by John Backus.
a true p, or if all p's are false, or if the e corresponding to the first true p is
undefined, then the value of the conditional expression is undefined. We now
give examples.
(1 < 2 → 4, 1 > 2 → 3) = 4
(2 < 1 → 4, T → 3) = 3
(2 < 1 → 0/0, T → 3) = 3
(2 < 1 → 3, T → 0/0) is undefined
δij = (i = j → 1, T → 0)
n! = (n = 0 → 1, T → n · (n − 1)!)
When we use this formula to evaluate 0! we get the answer 1; because of the
way in which the value of a conditional expression was defined, the meaningless
expression 0 · (0 - 1)! does not arise. The evaluation of 2! according to this
definition proceeds as follows:
2! = (2 = 0 → 1, T → 2 · (2 − 1)!)
= 2 · 1!
= 2 · (1 = 0 → 1, T → 1 · (1 − 1)!)
= 2 · 1 · 0!
= 2 · 1 · (0 = 0 → 1, T → 0 · (0 − 1)!)
= 2·1·1
= 2
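Haskell's laziness reproduces this evaluation rule directly, so the factorial definition above can be
transcribed almost verbatim (a sketch of ours, not the paper's; cond is a hypothetical helper):

-- McCarthy's conditional over a lazy list of (p, e) clauses; clauses after the
-- first true p, and the e's of false clauses, are never evaluated
cond :: [(Bool, a)] -> a
cond ((p, e) : rest) = if p then e else cond rest
cond []              = error "undefined: no true p"

factorial :: Integer -> Integer        -- n! = (n = 0 -> 1, T -> n.(n-1)!)
factorial n = cond [(n == 0, 1), (True, n * factorial (n - 1))]

Evaluating factorial 0 returns 1 without ever touching the meaningless 0 · (0 − 1)! clause, exactly as
argued above.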
p∧q = (p → q, T → F )
p∨q = (p → T, T → q)
¬p = (p → F, T → T )
p⊃q = (p → q, T → T )
It is readily seen that the right-hand sides of the equations have the correct
truth tables. If we consider situations in which p or q may be undefined, the
connectives ∧ and ∨ are seen to be noncommutative. For example if p is false
and q is undefined, we see that according to the definitions given above p ∧ q
is false, but q ∧ p is undefined. For our applications this noncommutativity is
desirable, since p ∧ q is computed by first computing p, and if p is false q is not
computed. If the computation for p does not terminate, we never get around
to computing q. We shall use propositional connectives in this sense hereafter.
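Haskell's Boolean connectives have exactly this noncommutative, short-circuit behaviour, as the
following sketch (ours) illustrates:

falseAndLoop :: Bool
falseAndLoop = False && undefined   -- False: the second operand is never computed

loopAndFalse :: Bool
loopAndFalse = undefined && False   -- undefined: the first operand is demanded first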
change the names of the bound variables in a function expression without
changing the value of the expression, provided that we make the same change
for each occurrence of the variable and do not make two variables the same
that previously were different. Thus λ((x, y), y 2 + x), λ((u, v), v 2 + u) and
λ((y, x), x2 + y) denote the same function.
We shall frequently use expressions in which some of the variables are
bound by λ’s and others are not. Such an expression may be regarded as
defining a function with parameters. The unbound variables are called free
variables.
An adequate notation that distinguishes functions from forms allows an
unambiguous treatment of functions of functions. It would involve too much
of a digression to give examples here, but we shall use functions with functions
as arguments later in this report.
Difficulties arise in combining functions described by λ-expressions, or by
any other notation involving variables, because different bound variables may
be represented by the same symbol. This is called collision of bound variables.
There is a notation involving operators that are called combinators for com-
bining functions without the use of variables. Unfortunately, the combinatory
expressions for interesting combinations of functions tend to be lengthy and
unreadable.
sqrt = λ((a, x, ε), (|x² − a| < ε → x, T → sqrt(a, ½(x + a/x), ε))),
but the right-hand side cannot serve as an expression for the function be-
cause there would be nothing to indicate that the reference to sqrt within the
expression stood for the expression as a whole.
In order to be able to write expressions for recursive functions, we intro-
duce another notation. label(a, E) denotes the expression E, provided that
occurrences of a within E are to be interpreted as referring to the expression
as a whole. Thus we can write
label(sqrt, λ((a, x, ε), (|x² − a| < ε → x, T → sqrt(a, ½(x + a/x), ε)))).
The symbol a in label (a, E) is also bound, that is, it may be altered
systematically without changing the meaning of the expression. It behaves
differently from a variable bound by a λ, however.
They are formed by using the special characters "·", "(" and ")",
and an infinite set of distinguishable atomic symbols. For atomic symbols,
we shall use strings of capital Latin letters and digits with single imbedded
blanks.3 Examples of atomic symbols are
A
ABA
APPLE PIE NUMBER 3
There is a twofold reason for departing from the usual mathematical prac-
tice of using single letters for atomic symbols. First, computer programs fre-
quently require hundreds of distinguishable symbols that must be formed from
the 47 characters that are printable by the IBM 704 computer. Second, it is
convenient to allow English words and phrases to stand for atomic entities for
mnemonic reasons. The symbols are atomic in the sense that any substructure
they may have as sequences of characters is ignored. We assume only that dif-
ferent symbols can be distinguished. S-expressions are then defined as follows:
1. Atomic symbols are S-expressions.
2. If e1 and e2 are S-expressions, so is (e1 · e2 ).
Examples of S-expressions are
AB
(A · B)
((AB · C) · D)
A list (m1 , m2 , · · · , mn )
is represented by the S-expression (m1 · (m2 · (· · · (mn · NIL) · · ·))).
3 1995 remark: Imbedded blanks could be allowed within symbols, because lists were then
written with commas between elements.
1. (m) stands for (m · NIL).
2. (m1 , · · · , mn ) stands for (m1 · (· · · (mn · NIL) · · ·)).
3. (m1 , · · · , mn · x) stands for (m1 · (· · · (mn · x) · · ·)).
car[x]
car[cons[(A · B); x]]
In these M-expressions (meta-expressions) any S-expressions that occur stand
for themselves.
atom [X] = T
atom [(X · A)] = F
2. eq. eq [x;y] is defined if and only if both x and y are atomic. eq [x; y]
= T if x and y are the same symbol, and eq [x; y] = F otherwise. Thus
eq [X; X] = T
eq [X; A] = F
eq [X; (X · A)] is undefined.
4. cdr. cdr [x] is also defined when x is not atomic. We have cdr
[(e1 · e2 )] = e2 . Thus cdr [X] is undefined.
5. cons. cons [x; y] is defined for any x and y. We have cons [e1 ; e2 ] =
(e1 · e2 ). Thus
cons [X; A] = (X · A)
cons [(X · A); Y ] = ((X · A) · Y )
car, cdr, and cons are easily seen to satisfy the relations
The names “car” and “cons” will come to have mnemonic significance only
when we discuss the representation of the system in the computer. Composi-
tions of car and cdr give the subexpressions of a given expression in a given
position. Compositions of cons form expressions of a given structure out of
parts. The class of functions which can be formed in this way is quite limited
and not very interesting.
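The five elementary S-functions transcribe directly into Haskell over an algebraic data type of
S-expressions (our sketch, not the paper's):

data SExp = Atom String | Cons SExp SExp deriving (Eq, Show)

atom :: SExp -> Bool
atom (Atom _) = True
atom _        = False

eq :: SExp -> SExp -> Bool            -- defined only when both arguments are atomic
eq (Atom a) (Atom b) = a == b
eq _ _               = error "eq: undefined on non-atomic arguments"

car, cdr :: SExp -> SExp              -- defined only on non-atomic arguments
car (Cons a _) = a
car (Atom _)   = error "car: undefined on atoms"
cdr (Cons _ d) = d
cdr (Atom _)   = error "cdr: undefined on atoms"

cons :: SExp -> SExp -> SExp          -- total: cons[e1; e2] = (e1 . e2)
cons = Cons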
We now give some examples of functions that are definable in this way.
1. ff[x]. The value of ff[x] is the first atomic symbol of the S-expression x
with the parentheses ignored. Thus
ff[((A · B) · C)] = A
It is defined by
ff[x] = [atom [x] → x; T → ff [car [x]]]
We have
ff [((A · B) · C)]:
ff [((A · B) · C)]
= [T → ff[car[((A · B) · C)]]]
= ff[car[((A · B) · C)]]
= ff[(A · B)]
= [T → ff[car[(A · B)]]]
= ff[car[(A · B)]]
= ff[A]
= [atom[A] → A; T → ff[car[A]]]
= [T → A; T → ff[car[A]]]
= A
2. subst [x; y; z]. This function gives the result of substituting the S-
expression x for all occurrences of the atomic symbol y in the S-expression z.
It is defined by
subst [x; y; z] = [atom [z] → [eq [z; y] → x; T → z];
T → cons [subst [x; y; car [z]]; subst [x; y; cdr [z]]]]
As an example, we have
subst [(X · A); B; ((A · B) · C)] = ((A · (X · A)) · C)
The predicate equal [x; y], whose value is T iff x and y are the same S-expression, is defined by
equal [x; y] = [atom [x] ∧ atom [y] ∧ eq [x; y]]
∨ [¬ atom [x] ∧ ¬ atom [y] ∧ equal [car [x]; car [y]]
∧ equal [cdr [x]; cdr [y]]]
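In the Haskell setting above, subst is a two-line structural recursion (the SExp type is re-declared so
the sketch stands alone):

data SExp = Atom String | Cons SExp SExp deriving Eq

subst :: SExp -> SExp -> SExp -> SExp   -- substitute x for atomic symbol y in z
subst x y z@(Atom _) = if z == y then x else z
subst x y (Cons a d) = Cons (subst x y a) (subst x y d)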
(i) car[(m1 , m2 , · · · , mn )] = m1
(ii) cdr[(m1 , m2 , · · · , mn )] = (m2 , · · · , mn )
(iii) cdr[(m)] = NIL
(iv) cons[m1 ; (m2 , · · · , mn )] = (m1 , m2 , · · · , mn )
(v) cons[m; NIL] = (m)
We define
null[x] = atom[x] ∧ eq[x; NIL]
This predicate is useful in dealing with lists.
Compositions of car and cdr arise so frequently that many expressions can
be written more concisely if we abbreviate them; for example, cadr[x] stands for
car[cdr[x]] and caddr[x] for car[cdr[cdr[x]]].
The following functions are useful when S-expressions are regarded as lists.
1. append [x;y].
append [x; y] = [null[x] → y; T → cons [car [x]; append [cdr [x]; y]]]
An example is
append [(A, B); (C, D, E)] = (A, B, C, D, E)
3. pair [x;y]. This function gives the list of pairs of corresponding elements
of the lists x and y. An example is
pair[(A, B, C); (X, (Y, Z), U)] = ((A, X), (B, (Y, Z)), (C, U)).
4. assoc [x;y]. If y is a list of the form ((u1, v1 ), · · · , (un , vn )) and x is one
of the u’s, then assoc [x; y] is the corresponding v. We have
assoc[X; ((W, (A, B)), (X, (C, D)), (Y, (E, F )))] = (C, D).
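Over ordinary Haskell lists, append and assoc come out as follows (our transcription, for
illustration):

append :: [a] -> [a] -> [a]
append x y = if null x then y else head x : append (tail x) y

assoc :: Eq k => k -> [(k, v)] -> v
assoc x ((u, v) : rest) = if x == u then v else assoc x rest
assoc _ []              = error "assoc: no matching pair"

For example, assoc "X" [("W", 1), ("X", 2), ("Y", 3)] yields 2, matching the example above.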
5. sublis [x;y]. Here x is a list of pairs, as for assoc, and y is any S-expression;
the value of sublis [x; y] is the result of substituting each v for the corresponding u in y.
We have
sublis [((X, (A, B)), (Y, (B, C))); (A, X · Y)] = (A, (A, B), B, C)
5. {λ[[x1 ; · · · ; xn ]; E]}∗ is (LAMBDA, (x∗1 , · · · , x∗n ), E ∗).
6. {label[a; E]}∗ is (LABEL, a∗ , E ∗ ).
With these conventions the substitution function whose M-expression is
label [subst; λ [[x; y; z]; [atom [z] → [eq [y; z] → x; T → z]; T → cons [subst
[x; y; car [z]]; subst [x; y; cdr [z]]]]]] has the S-expression
apply[(LAMBDA, (X, Y ), (CONS, (CAR, X), Y )); ((A, B), (C, D))] = (A, C, D)
and
eval[e; a] = [
4 1995: More characters were made available on SAIL and later on the Lisp machines.
Alas, the world went back to inferior character sets again—though not as far back as when
this paper was written in early 1959.
atom [e] → assoc [e; a];
eq [car [e]; EQ] → [eval [cadr [e]; a] = eval [caddr [e]; a]];
eq [car [e]; CONS] → cons [eval [cadr [e]; a]; eval [caddr [e]; a]];
and
We now explain a number of points about these definitions. 5
1. apply itself forms an expression representing the value of the function
applied to the arguments, and puts the work of evaluating this expression onto
a function eval. It uses appq to put quotes around each of the arguments, so
that eval will regard them as standing for themselves.
2. eval[e; a] has two arguments, an expression e to be evaluated, and a list
of pairs a. The first item of each pair is an atomic symbol, and the second is
the expression for which the symbol stands.
3. If the expression to be evaluated is atomic, eval evaluates whatever is
paired with it first on the list a.
4. If e is not atomic but car[e] is atomic, then the expression has one of the
forms (QUOT E, e) or (AT OM, e) or (EQ, e1 , e2 ) or (COND, (p1, e1 ), · · · , (pn , en )),
or (CAR, e) or (CDR, e) or (CONS, e1 , e2 ) or (f, e1 , · · · , en ) where f is an
atomic symbol.
In the case (QUOT E, e) the expression e, itself, is taken. In the case of
(AT OM, e) or (CAR, e) or (CDR, e) the expression e is evaluated and the
appropriate function taken. In the case of (EQ, e1 , e2 ) or (CONS, e1 , e2 ) two
expressions have to be evaluated. In the case of (COND, (p1 , e1 ), · · · (pn , en ))
the p’s have to be evaluated in order until a true p is found, and then the
corresponding e must be evaluated. This is accomplished by evcon. Finally, in
the case of (f, e1 , · · · , en ) we evaluate the expression that results from replacing
f in this expression by whatever it is paired with in the list a.
5. The evaluation of ((LABEL, f, E), e1 , · · · , en ) is accomplished by eval-
uating (E, e1 , · · · , en ) with the pairing (f, (LABEL, f, E)) put on the front of
the previous list a of pairs.
6. Finally, the evaluation of ((LAMBDA, (x1 , · · · , xn ), E), e1, · · · en ) is ac-
complished by evaluating E with the list of pairs ((x1 , e1 ), · · · , ((xn , en )) put
on the front of the previous list a.
The list a could be eliminated, and LAMBDA and LABEL expressions
evaluated by substituting the arguments for the variables in the expressions
E. Unfortunately, difficulties involving collisions of bound variables arise, but
they are avoided by using the list a.
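The points above translate into a compact toy interpreter in Haskell (our sketch, simplified in that
the environment a binds names directly to values rather than to quoted expressions, and T/F are
atoms):

import Data.Maybe (fromMaybe)

data SExp = Nil | Atom String | Cons SExp SExp deriving Eq
type Env = [(String, SExp)]

true :: SExp
true = Atom "T"

car, cdr :: SExp -> SExp
car (Cons a _) = a
car _          = error "car"
cdr (Cons _ d) = d
cdr _          = error "cdr"

isAtom :: SExp -> Bool
isAtom (Cons _ _) = False
isAtom _          = True

toBool :: Bool -> SExp
toBool b = if b then Atom "T" else Atom "F"

eval :: SExp -> Env -> SExp
eval (Atom x) env = fromMaybe (error ("unbound: " ++ x)) (lookup x env)  -- point 3
eval (Cons (Atom op) args) env = case op of                              -- point 4
  "QUOTE" -> car args                                                    -- taken as is
  "ATOM"  -> toBool (isAtom (eval (car args) env))
  "EQ"    -> toBool (eval (car args) env == eval (car (cdr args)) env)
  "CAR"   -> car (eval (car args) env)
  "CDR"   -> cdr (eval (car args) env)
  "CONS"  -> Cons (eval (car args) env) (eval (car (cdr args)) env)
  "COND"  -> evcon args env
  _       -> eval (Cons (eval (Atom op) env) args) env  -- replace f by its pairing in a
eval (Cons f args) env = case f of
  Cons (Atom "LAMBDA") (Cons params (Cons body Nil)) ->                  -- point 6
    eval body (zipParams params (evlis args env) ++ env)
  Cons (Atom "LABEL") (Cons (Atom name) (Cons fun Nil)) ->               -- point 5
    eval (Cons fun args) ((name, f) : env)
  _ -> error "eval: ill-formed expression"
eval Nil _ = Nil

evcon :: SExp -> Env -> SExp                       -- find the first true p, take its e
evcon (Cons (Cons p (Cons e Nil)) rest) env =
  if eval p env == true then eval e env else evcon rest env
evcon _ _ = error "evcon: no true clause"

evlis :: SExp -> Env -> [SExp]                     -- evaluate an argument list
evlis Nil _             = []
evlis (Cons e rest) env = eval e env : evlis rest env
evlis _ _               = error "evlis: improper argument list"

zipParams :: SExp -> [SExp] -> Env                 -- pair variables with arguments,
zipParams Nil []                      = []         -- put on the front of the old a
zipParams (Cons (Atom x) rest) (v:vs) = (x, v) : zipParams rest vs
zipParams _ _                         = error "wrong number of arguments"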
5 1995: This version isn't quite right. A comparison of this and other versions of eval
including what was actually implemented (and debugged) is given in "The Influence of the
Designer on the Design" by Herbert Stoyan and included in Artificial Intelligence and Math-
ematical Theory of Computation: Papers in Honor of John McCarthy, Vladimir Lifschitz
(ed.), Academic Press, 1991
Calculating the values of functions by using apply is an activity better
suited to electronic computers than to people. As an illustration, however, we
now give some of the steps for calculating
apply [(LABEL, FF, (LAMBDA, (X), (COND, ((ATOM, X), X), ((QUOTE,
T), (FF, (CAR, X)))))); ((A·B))] = A
The first argument is the S-expression that represents the function ff defined
in section 3d. We shall abbreviate it by using the letter φ. We have
apply [φ; ( (A·B) )]
= eval [((LABEL, FF, ψ), (QUOTE, (A·B))); NIL]
= atom [(A·B)],
=F
apply [φ; ((A·B))]
= eval [2 ; a]
= car [(A·B)], where we took steps from the earlier computation of atom [eval [X; a]] = A,
The subsequent steps are made as in the beginning of the calculation. The
LABEL and LAMBDA cause new pairs to be added to a, which gives a new
list of pairs a1 . The π1 term of the conditional eval [(ATOM, X); a1 ] has the
value T because X is paired with (QUOTE, A) first in a1 , rather than with
(QUOTE, (A·B)) as in a.
Therefore we end up with eval [X; a1 ] from the evcon, and this is just A.
diff [y; x] = [atom [y] → [eq [y; x] → ONE; T → ZERO];
eq [car [y]; PLUS] → cons [PLUS; maplist [cdr [y]; λ[[z]; diff [car [z]; x]]]];
eq [car [y]; TIMES] → cons [PLUS; maplist [cdr [y]; λ[[z]; cons [TIMES;
maplist [cdr [y]; λ[[w]; [¬ eq [z; w] → car [w]; T → diff [car [w]; x]]]]]]]]]
The derivative of the expression (TIMES, X, (PLUS, X, A), Y), as com-
puted by this formula, is
(PLUS, (TIMES, ONE, (PLUS, X, A), Y), (TIMES, X, (PLUS, ONE,
ZERO), Y), (TIMES, X, (PLUS, X, A), ZERO))
Besides maplist, another useful function with functional arguments is search,
which is defined as
The function search is used to search a list for an element that has the property
p, and if such an element is found, f of that element is taken. If there is no
such element, the function u of no arguments is computed.
Fig. 1
Figure 2 (a), (b)
When a list structure is regarded as representing a list, we see that each term
of the list occupies the address part of a word, the decrement part of which
points to the word containing the next term, while the last word has NIL in
its decrement.
An expression that has a given subexpression occurring more than once
can be represented in more than one way. Whether the list structure for
the subexpression is or is not repeated depends upon the history of the pro-
gram. Whether or not a subexpression is repeated will make no difference
in the results of a program as they appear outside the machine, although it
will affect the time and storage requirements. For example, the S-expression
((A·B)·(A·B)) can be represented by either the list structure of figure 3a or
3b.
Figure 3 (a), (b)
against an expression being a subexpression of itself. Such an expression could
not exist on paper in a world with our topology. Circular list structures would
have some advantages in the machine, for example, for representing recursive
functions, but difficulties in printing them, and in certain other operations,
make it seem advisable not to use them for the present.
The advantages of list structures for the storage of symbolic expressions
are:
1. The size and even the number of expressions with which the program
will have to deal cannot be predicted in advance. Therefore, it is difficult to
arrange blocks of storage of fixed length to contain them.
2. Registers can be put back on the free-storage list when they are no longer
needed. Even one register returned to the list is of value, but if expressions
are stored linearly, it is difficult to make use of blocks of registers of odd sizes
that may become available.
3. An expression that occurs as a subexpression of several expressions need
be represented in storage only once.
b. Association Lists6 . In the LISP programming system we put more in
the association list of a symbol than is required by the mathematical system
described in the previous sections. In fact, any information that we desire to
associate with the symbol may be put on the association list. This information
may include: the print name, that is, the string of letters and digits which
represents the symbol outside the machine; a numerical value if the symbol
represents a number; another S-expression if the symbol, in some way, serves
as a name for it; or the location of a routine if the symbol represents a function
for which there is a machine-language subroutine. All this implies that in the
machine system there are more primitive entities than have been described in
the sections on the mathematical system.
For the present, we shall only describe how print names are represented
on association lists so that in reading or printing the program can establish
a correspondence between information on punched cards, magnetic tape or
printed page and the list structure inside the machine. The association list of
the symbol DIFFERENTIATE has a segment of the form shown in figure 4.
Here pname is a symbol indicating that the structure hanging from the next
word on the association list is the print name of the symbol whose association
list this is. In the second row of the figure we have a list of three
words. The address part of each of these words points to a word containing
6
1995: These were later called property lists.
six 6-bit characters. The last word is filled out with a 6-bit combination that
does not represent a character printable by the computer. (Recall that the
IBM 704 has a 36-bit word and that printable characters are each represented
by 6 bits.) The presence of the words with character information means that
the association lists do not themselves represent S-expressions, and that only
some of the functions for dealing with S-expressions make sense within an
association list.
Figure 4
c. Free-Storage List. At any given time only a part of the memory reserved
for list structures will actually be in use for storing S-expressions. The remain-
ing registers (in our system the number, initially, is approximately 15,000) are
arranged in a single list called the free-storage list. A certain register, FREE,
in the program contains the location of the first register in this list. When
a word is required to form some additional list structure, the first word on
the free-storage list is taken and the number in register FREE is changed to
become the location of the second word on the free-storage list. No provision
need be made for the user to program the return of registers to the free-storage
list.
This return takes place automatically, approximately as follows (it is nec-
essary to give a simplified description of this process in this report): There is
a fixed set of base registers in the program which contains the locations of list
structures that are accessible to the program. Of course, because list struc-
tures branch, an arbitrary number of registers may be involved. Each register
that is accessible to the program is accessible because it can be reached from
one or more of the base registers by a chain of car and cdr operations. When
the contents of a base register are changed, it may happen that the register
to which the base register formerly pointed cannot be reached by a car − cdr
chain from any base register. Such a register may be considered abandoned
by the program because its contents can no longer be found by any possible
program; hence its contents are no longer of interest, and so we would like to
have it back on the free-storage list. This comes about in the following way.
Nothing happens until the program runs out of free storage. When a free
register is wanted, and there is none left on the free-storage list, a reclamation7
cycle starts.
First, the program finds all registers accessible from the base registers and
makes their signs negative. This is accomplished by starting from each of the
base registers and changing the sign of every register that can be reached from
it by a car − cdr chain. If the program encounters a register in this process
which already has a negative sign, it assumes that this register has already
been reached.
After all of the accessible registers have had their signs changed, the pro-
gram goes through the area of memory reserved for the storage of list structures
and puts all the registers whose signs were not changed in the previous step
back on the free-storage list, and makes the signs of the accessible registers
positive again.
This process, because it is entirely automatic, is more convenient for the
programmer than a system in which he has to keep track of and erase un-
wanted lists. Its efficiency depends upon not coming close to exhausting the
available memory with accessible lists. This is because the reclamation process
requires several seconds to execute, and therefore must result in the addition
of at least several thousand registers to the free-storage list if the program is
not to spend most of its time in reclamation.
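The two phases translate into a small simulation (our sketch; Cell, mark and sweep are hypothetical
names, with a Data.Set of marked indices standing in for the sign bits):

import qualified Data.Set as S

-- each cell's address and decrement parts may point at another cell
data Cell = Cell { carIx :: Maybe Int, cdrIx :: Maybe Int }

-- phase 1: mark everything reachable from the base registers by car-cdr chains
mark :: [Cell] -> [Int] -> S.Set Int
mark heap = go S.empty
  where
    go seen []       = seen
    go seen (i : is)
      | i `S.member` seen = go seen is            -- sign already flipped: stop
      | otherwise         =
          let Cell a d = heap !! i
          in go (S.insert i seen) (maybe id (:) a (maybe id (:) d is))

-- phase 2: sweep unmarked cells back onto the free-storage list
sweep :: [Cell] -> S.Set Int -> [Int]
sweep heap live = [ i | i <- [0 .. length heap - 1], not (i `S.member` live) ]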
constant in its address part: atom is programmed as an open subroutine that
tests this part. Unless the M-expression atom[e] occurs as a condition in a
conditional expression, the symbol T or F is generated as the result of the
test. In case of a conditional expression, a conditional transfer is used and the
symbol T or F is not generated.
eq. The program for eq[e; f ] involves testing for the numerical equality of
the locations of the words. This works because each atomic symbol has only
one association list. As with atom, the result is either a conditional transfer
or one of the symbols T or F .
car. Computing car[x] involves getting the contents of the address part of
register x. This is essentially accomplished by the single instruction CLA 0, i,
where the argument is in index register i, and the result appears in the address
part of the accumulator. (We take the view that the places from which a
function takes its arguments and into which it puts its results are prescribed
in the definition of the function, and it is the responsibility of the programmer
or the compiler to insert the required data-moving instructions to get the results
of one calculation in position for the next.) (“car” is a mnemonic for “contents
of the address part of register.”)
cdr. cdr is handled in the same way as car, except that the result appears
in the decrement part of the accumulator (“cdr” stands for “contents of the
decrement part of register.”)
cons. The value of cons[x; y] must be the location of a register that has x
and y in its address and decrement parts, respectively. There may not be such
a register in the computer and, even if there were, it would be time-consuming
to find it. Actually, what we do is to take the first available register from the
free-storage list, put x and y in the address and decrement parts, respectively,
and make the value of the function the location of the register taken. (“cons”
is an abbreviation for “construct.”)
It is the subroutine for cons that initiates the reclamation when the free-
storage list is exhausted. In the version of the system that is used at present
cons is represented by a closed subroutine. In the compiled version, cons is
open.
quired are computed. However, problems arise in the compilation of recursive
functions.
In general (we shall discuss an exception), the routine for a recursive func-
tion uses itself as a subroutine. For example, the program for subst[x; y; z] uses
itself as a subroutine to evaluate the result of substituting into the subexpres-
sions car[z] and cdr[z]. While subst[x; y; cdr[z]] is being evaluated, the result
of the previous evaluation of subst[x; y; car[z]] must be saved in a temporary
storage register. However, subst may need the same register for evaluating
subst[x; y; cdr[z]]. This possible conflict is resolved by the SAVE and UN-
SAVE routines that use the public push-down list 8 . The SAVE routine is
entered at the beginning of the routine for the recursive function with a re-
quest to save a given set of consecutive registers. A block of registers called
the public push-down list is reserved for this purpose. The SAVE routine has
an index that tells it how many registers in the push-down list are already
in use. It moves the contents of the registers which are to be saved to the
first unused registers in the push-down list, advances the index of the list, and
returns to the program from which control came. This program may then
freely use these registers for temporary storage. Before the routine exits it
uses UNSAVE, which restores the contents of the temporary registers from
the push-down list and moves back the index of this list. The result of these
conventions is described, in programming terminology, by saying that the re-
cursive subroutine is transparent to the temporary storage registers.
4. Some error diagnostic and selective tracing facilities are included.
5. The programmer may have selected S-functions compiled into machine
language programs put into the core memory. Values of compiled functions
are computed about 60 times as fast as they would if interpreted. Compilation
is fast enough so that it is not necessary to punch compiled program for future
use.
6. A “program feature” allows programs containing assignment and go to
statements in the style of ALGOL.
7. Computation with floating point numbers is possible in the system, but
this is inefficient.
8. A programmer’s manual is being prepared. The LISP programming
system is appropriate for computations where the data can conveniently be
represented as symbolic expressions allowing expressions of the same kind as
subexpressions. A version of the system for the IBM 709 is being prepared.
There are three predicates on strings:
1. char[x], x is a single character.
2. null[x], x is the null string.
3. x = y, defined for x and y characters.
The advantage of linear LISP is that no characters are given special roles,
as are parentheses, dots, and commas in LISP. This permits computations
with all expressions that can be written linearly. The disadvantage of linear
LISP is that the extraction of subexpressions is a fairly involved, rather than
an elementary, operation. It is not hard to write, in linear LISP, functions that
correspond to the basic functions of LISP, so that, mathematically, linear LISP
includes LISP. This turns out to be the most convenient way of programming,
in linear LISP, the more complicated manipulations. However, if the functions
are to be represented by computer routines, LISP is essentially faster.
Figure 5 (a flowchart with computation blocks f1 , f2 , f3 , f4 and decision points π1 , π2 , π3 )
that transforms ξ between β and the exit of the chart, and let φ1 , · · · , φn be
the corresponding functions for β1 , · · · , βn . We then write
Figure 6
7 Acknowledgments
The inadequacy of the λ-notation for naming recursive functions was noticed
by N. Rochester, and he discovered an alternative to the solution involving
label which has been used here. The form of subroutine for cons which per-
mits its composition with other functions was invented, in connection with
another programming system, by C. Gerberick and H. L. Gelernter, of IBM
Corporation. The LISP programming system was developed by a group in-
cluding R. Brayton, D. Edwards, P. Fox, L. Hodes, D. Luckham, K. Maling,
J. McCarthy, D. Park, S. Russell.
The group was supported by the M.I.T. Computation Center, and by the
M.I.T. Research Laboratory of Electronics (which is supported in part by
the U.S. Army (Signal Corps), the U.S. Air Force (Office of Scientific Research,
Air Research and Development Command), and the U.S. Navy (Office of Naval
Research)). The author also wishes to acknowledge the personal financial support
of the Alfred P. Sloan Foundation.
Higher-Order and Symbolic Computation, 11, 363–397 (1998)
© 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
Definitional Interpreters
for Higher-Order Programming Languages*
JOHN C. REYNOLDS**
Systems and Information Science, Syracuse University
Abstract. Higher-order programming languages (i.e., languages in which procedures or labels can occur as
values) are usually defined by interpreters that are themselves written in a programming language based on the
lambda calculus (i.e., an applicative language such as pure LISP). Examples include McCarthy’s definition of
LISP, Landin’s SECD machine, the Vienna definition of PL/I, Reynolds’ definitions of GEDANKEN, and recent
unpublished work by L. Morris and C. Wadsworth. Such definitions can be classified according to whether the
interpreter contains higher-order functions, and whether the order of application (i.e., call by value versus call by
name) in the defined language depends upon the order of application in the defining language. As an example,
we consider the definition of a simple applicative programming language by means of an interpreter written in a
similar language. Definitions in each of the above classifications are derived from one another by informal but
constructive methods. The treatment of imperative features such as jumps and assignment is also discussed.
Keywords: programming language, language definition, interpreter, lambda calculus, applicative language,
higher-order function, closure, order of application, continuation, LISP, GEDANKEN, PAL, SECD machine,
J-operator, reference.
1. Introduction
* Work supported by Rome Air Force Development Center Contract No. 30602-72-C-0281 and ARPA Contract
No. DAHC04-72-C-0003. This paper originally appeared in the Proceedings of the ACM National Conference,
volume 2, August, 1972, ACM, New York, pages 717–740.
** Current address: Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
e-mail: John.Reynolds@cs.cmu.edu
Examples include McCarthy’s definition of LISP [1], Landin’s SECD machine [7], the Vienna definition of PL/I [18],
Reynolds’ definitions of GEDANKEN [19], and recent unpublished work by L. Morris [20]
and C. Wadsworth.
(There are a few instances of definitional interpreters that fall outside the conceptual
framework developed in this paper. A broader review of the field is given by de Bakker
[21].)
These examples exhibit considerable variety, ranging from very concise and abstract
interpreters to much more elaborate and machine-like ones. To achieve a more precise
classification, we will introduce two criteria. First, we ask whether the defining language is
higher-order, or more precisely, whether any of the functions that comprise the interpreter
either accept or produce values that are themselves functions.
The second criterion involves the notion of order of application. In designing any language
that allows the use of procedures or functions, one must choose between two orders of
application which are called (following ALGOL terminology) call by value and call by
name. Even when the language is purely applicative, this choice will affect the meaning
of some, but not all, programs that can be written in the language. Remembering that an
interpreter is a specific program, we obtain our second criterion: Does the meaning of the
interpreter depend upon the order of application chosen for the defining language?
These two criteria establish four possible classes of interpreters, each of which contains
one or more of the examples cited earlier:
The main goal of this paper is to illustrate and relate these classes of definitional inter-
preters. In the next section we will introduce a simple applicative language, which we will
use as the defining language and also, with several restrictions, as the defined language.
Then we will present a simple interpreter that uses higher-order functions and is order-of-
application dependent, and we will transform this interpreter into examples of the three
remaining classes. Finally, we will consider the problem of adding imperative features to
the defined language (while keeping the defining language purely applicative).
In an applicative language, the meaningful phrases of a program are called expressions, the
process of executing or interpreting these expressions is called evaluation, and the result of
evaluating an expression is called a value. However, as is evident from a simple arithmetic
expression such as x + y, different evaluations of the same expression can produce different
values, so that the process of evaluation must depend upon something more than just the
expression being evaluated. It is evident that this “something more” must specify a value
for every variable that might occur in the expression (more precisely, occur free). We will
call such a specification an environment, and say that it binds variables to values.
It is also evident that the evaluation process may involve the creation of new environments
from old ones. Suppose x1 , . . . , xn are variables, v1 , . . . , vn are values, and e and e′ are
environments. If e′ specifies the value vi for each xi , and behaves the same way as e for all
other variables, then we will say that e′ is the extension of e that binds the xi ’s to the vi ’s.
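A minimal OCaml sketch of these two notions (ours; variables are assumed to be strings, and values a small variant type):

    (* Environments as functions from variables to values. *)
    type value = Int of int | Bool of bool
    type env = string -> value

    (* The empty environment knows no variables. *)
    let empty : env = fun x -> failwith ("unbound variable: " ^ x)

    (* ext e xs vs is the extension of e that binds each xi to vi. *)
    let ext (e : env) (xs : string list) (vs : value list) : env =
      fun x ->
        match List.assoc_opt x (List.combine xs vs) with
        | Some v -> v
        | None -> e x

    let () =
      let e = ext empty ["x"; "y"] [Int 1; Bool true] in
      assert (e "x" = Int 1)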
The simplest expressions in our applicative language are constants and variables. The
evaluation of a constant always gives the same value, regardless of the environment. We
will not specify the set of constants precisely, but will assume that it contains the integers
and the Boolean constants true and false. The evaluation of a variable simply produces the
value that is bound to that variable by the environment. In the programs in this paper we
will denote variables by alphanumeric strings, with occasional superscripts and subscripts.
If our language is going to involve functions, then we must have a form of expression
whose evaluation will cause the application of a function to its arguments. If r0 , r1 , . . . , rn
are expressions, then r0 (r1 , . . . , rn ) is an application expression, whose operator is r0
and whose operands are r1 , . . . , rn . The evaluation of an application expression in an
environment proceeds as follows:
1. The subexpressions r0 , r1 , . . . , rn are evaluated in the same environment to obtain
values f , a1 , . . . , an .
2. If f is not a function of n arguments, then an error stop occurs.
3. Otherwise, the function f is applied to the arguments a1 , . . . , an , and if this application
produces a result, then the result is the value of the application expression.
There are several assumptions hiding behind this description that need to be made explicit:
1. A “function of n arguments” is a kind of value that can be subjected to the process of
being “applied” to a sequence of n values called “arguments”.
2. For some functions and arguments, the process of application may never produce a
result, either because the process does not terminate (i.e., it runs on forever), or because
it causes an error stop. Similarly, for some expressions and environments, the process
of evaluation may never produce a value.
3. In a purely applicative language, the application of the same function to the same
sequence of arguments will always have the same effect, i.e., both the result that is
produced, and the prior question of whether any result is produced, depend only upon
the function and its arguments. Similarly, the evaluation of the same expression in the
same environment will always have the same effect.
4. During the evaluation of an application expression, the application process does not
begin until after the operator and all of its operands have been evaluated. This is the
call-by-value order of application mentioned in the introduction. In the alternative order
of application, known as call by name, the application process would begin as soon as
the operator had been evaluated, and each operand would only be evaluated when (and
if) the function being applied actually depended upon its value. This distinction will
be clarified below.
Next, we must have a form of expression whose evaluation will produce a function.
If x1 , . . . , xn are variables and r is an expression, then λ(x1 , . . . , xn ). r is a lambda
expression, whose formal parameters are x1 , . . . , xn and whose body is r. (The parentheses
may be omitted if there is only one formal parameter.) The evaluation of a lambda expression
with n formal parameters always terminates and always produces a function of n arguments.
To describe this function, we must specify what will happen when it is applied to its
arguments.
Suppose that f is the function obtained by evaluating λ(x1 , . . . , xn ). r in an environment
e. Then the application of f to the arguments a1 , . . . , an will cause the evaluation of the
body r in the environment that is the extension of e that binds each xi to the corresponding
ai . If this evaluation produces a value, then the value becomes the result of the application
of f .
The key point is that the environment in which the body is evaluated during application is
an extension of the earlier environment in which the lambda expression was evaluated (rather
than the more recent environment in which the application takes place). As a consequence, if
a lambda expression contains global variables (i.e., variables that are not formal parameters),
its evaluation in different environments can produce different functions. For example, the
lambda expression λx. x + y can produce an incrementing function, an identity function
(for the integers), or a decrementing function, when evaluated in environments that bind y
to the values 1, 0, or −1 respectively.
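The same behavior can be observed directly in any higher-order language; in OCaml, for instance (an illustration, not part of the paper):

    (* λx. x + y evaluated in environments binding y to 1, 0, and -1
       yields three different functions; OCaml closures capture the
       environment in exactly this way. *)
    let make_adder y = fun x -> x + y

    let incr  = make_adder 1      (* an incrementing function             *)
    let ident = make_adder 0      (* an identity function on the integers *)
    let decr  = make_adder (-1)   (* a decrementing function              *)

    let () = assert (incr 5 = 6 && ident 5 = 5 && decr 5 = 4)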
Nowadays, it is generally accepted that this behavior of lambda expressions and environ-
ments is a basic characteristic of a well-designed higher-order language. Its importance is
that it permits functional data to depend upon the partial results of a program.
Having introduced application and lambda expressions, we may now clarify the distinc-
tion between call by value and call by name. Consider the evaluation of an application
expression r0 (r1 , . . . , rn ) in an environment ea , and suppose that the value of the oper-
ator r0 is a function f that was originally created by evaluating the lambda expression
λ(x1 , . . . , xn ). rλ in an environment eλ . (Possibly this lambda expression is r0 itself, but
more generally r0 may be a non-lambda expression whose functional value was created
earlier in the computation.) When call by value is used, the following steps will occur
during the evaluation of the application expression:
1. r0 is evaluated in the environment ea to obtain the function value f .
2. r1 , . . . , rn are evaluated in the environment ea to obtain arguments a1 , . . . , an .
3. rλ is evaluated in the extension of eλ that binds each xi to the corresponding ai , to
obtain the value of the application expression.
When call by name is used, the same expressions are evaluated in the same environments.
But the evaluations of the operands r1 , . . . , rn will occur at a later time and may occur a
different number of times. Specifically, instead of being evaluated before step (3), each
operand ri is repeatedly evaluated during step (3), each time that its value ai is actually
used (as a function to be applied, a Boolean value determining a branch, or an argument of
a primitive operation).
At first sight, since the evaluation of the same expression in the same environment al-
ways produces the same effect, it would appear that the result of a program in a purely
applicative language should be unaffected by changing the order of application (although
it is evident that the repeated evaluation of operands occurring with call by name can be
grossly inefficient). But this overlooks the possibility that “repeatedly” may mean “never”.
During step (3) of the evaluation of r0 (r1 , . . . , rn ), it may happen that certain arguments
ai are never used, so that the corresponding operands ri will never be evaluated under call
by name. Now suppose that the evaluation of one of these ri never terminates (or gives an
error stop). Then the evaluation of the original application expression will terminate under
call by name but not call by value. In brief, changing the order of application can affect the
value of an application expression when the function being applied is independent of some
of its arguments and the corresponding operands are nonterminating.
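OCaml itself uses call by value, so the call-by-name behavior can only be imitated by wrapping operands as thunks; the following sketch (ours) exhibits the termination difference just described.

    (* Call by name imitated by passing the operand as a thunk
       (unit -> int) and forcing it only where its value is used. *)
    let rec loop () : int = loop ()        (* a nonterminating operand *)

    let const_three (_operand : unit -> int) = 3   (* ignores its argument *)

    (* Under (simulated) call by name the operand is never forced: *)
    let ok = const_three (fun () -> loop ())       (* ok = 3, terminates *)

    (* Under call by value the operand is evaluated first, and the whole
       application fails to terminate:
         let bad = (fun _ -> 3) (loop ())                                *)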
(In ALGOL the distinction between call by value and call by name also involves a change
in “coercion conventions”. However, this change is irrelevant in the absence of assignment.)
In the defined language, we will consider only the use of call by value, but in the defin-
ing language we will consider both orders of application. In particular, we will inquire
whether the above-described situation occurs in our interpreters, so that changing the order
of application in the defining language can affect the meaning of the defined language.
We now introduce some additional kinds of expressions. If rp , rc and ra are expressions,
then if rp then rc else ra is a simple conditional expression, whose premiss is rp , whose
conclusion is rc , and whose alternative is ra . The evaluation of a conditional expression
in an environment e begins with the evaluation of its premiss rp in the same environment.
Then, depending upon whether the value of the premiss is true or false, the value of the
conditional expression is obtained by evaluating either the conclusion rc or the alternative
ra in e. Any other value of the premiss causes an error stop.
It is also convenient to use a LISP-like notation for “multiple” conditional expressions.
If rp1 , . . . , rpn and rc1 , . . . , rcn are expressions, then
(rp1 → rc1 , . . . , rpn → rcn )
is a multiple conditional expression, with the same meaning as the following sequence of
simple conditional expressions:
if rp1 then rc1 else if rp2 then rc2 else · · · if rpn then rcn else error.
Next, we introduce a form of expression (due to Landin [7]) that is analogous to the block
in ALGOL. If x1 , . . . , xn are variables, and r1 , . . . , rn and rb are expressions, then
let x1 = r1 , . . . , xn = rn in rb
is a let expression, whose declared variables are the xi ’s, whose declaring expressions are
the ri ’s, and whose body is rb . One might expect that a let expression whose declaring
expression refers to its own declared variable, such as
let f = λx. if x = 0 then 1 else x × f (x − 1) in · · · ,
would create an extended environment in which f was bound to a recursive function (for
computing the factorial). But in fact, the occurrence of f inside the declaring expression
will not “feel” the binding of f to the value of the declaring expression, so that the resulting
function will not call itself recursively.
To overcome this problem, we introduce a second kind of block-like expression. If
x1 , . . . , xn are variables, `1 , . . . , `n are lambda expressions, and rb is an expression, then
letrec x1 = `1 , . . . , xn = `n in rb
is a recursive let expression. Its evaluation differs from that of an ordinary let expression
in that each declaring expression `i is evaluated in the extended environment itself, so that
the declared variables may occur recursively in the declaring expressions.
So far, we have said little about the data that our expressions manipulate (except for
the Boolean values true and false). However, it is evident that our language must contain
basic (i.e., built-in) operations and tests for manipulating this data. For example, if integers
are to occur as data, we will need at least an incrementing operation and a test for integer
equality. More likely, we will want all of the usual arithmetic operations and tests. If some
form of structured data is to be used, we will need operations for constructing and analyzing
the structures, and tests for classifying them.
Regardless of the specific nature of the data, there are three ways to introduce basic
operations and tests into our applicative language:
1. We may introduce constants denoting the basic functions (whose application will per-
form the basic operations and tests).
2. We may introduce predefined variables denoting the basic functions. These variables
differ from constants in that the programmer can redefine them with his own decla-
rations. They are specified by introducing an initial environment, to be used for the
evaluation of the entire program, that binds the predefined variables to their functional
values.
3. We may introduce special expressions whose evaluation will perform the basic oper-
ations and tests. Since this approach is used in most programming languages (and in
mathematical notation), we will frequently use the common forms of arithmetic and
Boolean expressions without explanation.
3. The Defined Language
Although our defining language will use all of the features described in the previous section,
along with appropriate basic operations and tests, the defined language will be considerably
more limited, in order to avoid complications that would be out of place in an introductory
paper. Specifically:
1. Functions will be limited to a single argument. Thus all application expressions will
have a single operand, and all lambda expressions will have a single formal parameter.
2. Only call by value will be used.
3. Only simple conditional expressions will be used.
4. Nonrecursive let expressions will be excluded.
5. All recursive let expressions will contain a single declaration.
6. Values will be integers, booleans, and functions. The only basic operations and tests
will be functions for incrementing integers and for testing integer equality, denoted by
the predefined variables succ and equal, respectively.
The reader may accept an assurance that these limitations will eliminate a variety of
tedious complications without evading any intellectually significant problems. Indeed,
with slight exceptions, the eliminated features can be regarded as syntactic sugar, i.e., they
can be defined as abbreviations for expressions in the restricted language [7, 4].
4. Abstract Syntax
We now turn our attention to the defining language. To permit the writing of interpreters, the
values used in the defining language must include expressions of the defined language. At
first sight, this suggests that we should use character strings as values denoting expressions,
but this approach would enmesh us in questions of grammar and parsing that are beyond the
scope of this paper. (An excellent review of these matters is contained in Reference [23].)
Instead, we use the approach of abstract syntax, originally suggested by McCarthy [24].
In this approach, it is assumed that programs are “really” abstract, hierarchically structured
data objects, and that the character strings that one actually reads into the computer are
simply representations of these abstract objects (in the same sense that digit strings are
representations of integers). Thus the problems of grammar and parsing can be set aside as
“input editing”. (Of course, this does not eliminate these problems, but it separates them
clearly from semantic considerations. See, for example, Wozencraft and Evans [25].)
We are left with two closely related problems: how to define sets of abstract expressions
(and other structured data to be used by the interpreters), and how to define the basic
functions for constructing, analyzing, and classifying these objects. Both problems are
solved by introducing three forms of abstract-syntax equations. (A more elaborate defined
language would require a more complex treatment of abstract syntax, as given in Reference
[18], for example.) Within these equations, upper-case letter strings denote sets, and lower-
case letter strings denote basic functions.
Let S0 , S1 , . . . , Sn be upper-case letter strings and a1 , . . . , an be lowercase letter strings.
Then a record equation of the form
S0 = [a1 : S1 , . . . , an : Sn ]
implies that:
1. S0 is a set, disjoint from any other set defined by a record equation, whose members
are records with n fields in which the value of the ith field belongs to the set Si .
(Mathematically, S0 is a disjoint set in one-to-one correspondence with the Cartesian
product S1 × · · · × Sn .)
2. Each ai (is a predefined variable which) denotes the selector function that accepts a
member of S0 and produces its ith field value.
3. Let s0 be the string obtained from S0 by lowering the case of each character. Then s0 ?
denotes the classifier function that tests whether its argument belongs to S0 , and mk-s0
denotes the constructor function of n arguments (belonging to the sets S1 , . . . , Sn ) that
creates a record in S0 from its field values.
For example, the record equation
APPL = [opr: EXP, opnd: EXP]
implies that an application expression (i.e., a member of APPL) is a two-field record whose
field values are both expressions (i.e., members of EXP). It also implies that opr and
opnd are selector functions that produce the first and second field values of an application
expression, that appl? is a classifier function that tests whether a value is an application
expression, and that mk-appl is a two-argument constructor function that constructs an
application expression from its field values. It is evident that if r1 and r2 are expressions,
opr(mk-appl(r1 , r2 )) = r1
opnd(mk-appl(r1 , r2 )) = r2 .
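As a sketch of how such a record equation might be transcribed into a typed language (our illustration, with hyphens replaced by underscores), the APPL equation corresponds to an OCaml variant whose constructor, selectors, and classifier satisfy exactly these identities:

    (* The record equation APPL = [opr: EXP, opnd: EXP] transcribed into
       OCaml; exp is abbreviated to the two cases needed here. *)
    type exp = Const of int | Appl of { opr : exp; opnd : exp }

    let mk_appl r1 r2 = Appl { opr = r1; opnd = r2 }            (* constructor *)
    let opr  = function Appl a -> a.opr  | _ -> failwith "opr"  (* selector    *)
    let opnd = function Appl a -> a.opnd | _ -> failwith "opnd" (* selector    *)
    let appl_p = function Appl _ -> true | _ -> false           (* classifier  *)

    let () =
      let r1, r2 = Const 1, Const 2 in
      assert (opr (mk_appl r1 r2) = r1 && opnd (mk_appl r1 r2) = r2);
      assert (appl_p (mk_appl r1 r2) && not (appl_p r1))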
The remaining forms of abstract syntax equations are the union equation:
S0 = S1 ∪ · · · ∪ Sn ,
which implies that S0 is the union of sets S1 , . . . , Sn , and the function equation:
S0 = S1 , . . . , Sn → Sr ,
which implies that S0 is the set of n-argument functions that accept arguments in S1 , . . . , Sn
and produce results in Sr . (More precisely, S0 is the set of n-argument functions f with
the property that if f is applied to arguments in the sets S1 , . . . , Sn , and if f terminates
without an error stop, then the result of f belongs to Sr .)
We may now use these forms of abstract syntax equations to define the principal set of
data used by our interpreters, i.e., the set EXP of expressions of the defined language:
6. A recursive let expression (a member of LETREC), which consists of a variable called its
declared variable (selected by dvar), a lambda expression called its declaring expression
(selected by dexp), and an expression called its body (selected by body).
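Transcribing the EXP equations in the same way gives an OCaml variant type with one constructor per record equation; this is a sketch of ours, with field names taken from the selectors used by the interpreters below (opr/opnd, fp/body, prem/conc/altr, dvar/dexp/body):

    type var = string

    type exp =
      | Const of int                                     (* CONST  *)
      | Var of var                                       (* VAR    *)
      | Appl of { opr : exp; opnd : exp }                (* APPL   *)
      | Lambda of { fp : var; body : exp }               (* LAMBDA *)
      | Cond of { prem : exp; conc : exp; altr : exp }   (* COND   *)
      | Letrec of { dvar : var; dexp : exp; body : exp } (* LETREC; dexp is
                                                            required to be
                                                            a Lambda *)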
We have purposely left the sets CONST and VAR unspecified. For CONST, we will
assume only that there is a basic function const? which tests whether its argument is a
constant, and a basic function evcon which maps each constant into the value that it denotes.
For VAR, we will assume that there is a basic function var? which tests whether its argument
is a variable, that variables can be tested for equality (of the variables themselves, not their
values), and that two particular variables are denoted by the quoted strings “succ” and
“equal”.
We must also define the abstract syntax of two other data sets that will be used by our
interpreter. The first is the set VAL of values of the defined language:
VAL = INTEGER ∪ BOOLEAN ∪ FUNVAL
FUNVAL = VAL → VAL
and the second is the set ENV of environments:
ENV = VAR → VAL
Within the various interpreters that we will present, each variable will range over some
set defined by abstract syntax equations. For clarity, we will use different variables for
different sets, as summarized in the following table:
r: EXP    e: ENV    f : FUNVAL    a: VAL    x: VAR    `: LAMBDA    c: CONT
5. A Meta-Circular Interpreter
In the last line we have used a trick called Currying (after the logician H. Curry) to
solve the problem of introducing a binary operation into a language where all functions
must accept a single argument. (The referee comments that although “Currying” is tastier,
“Schönfinkeling” might be more accurate.) In the defined language, equal is a function
which accepts a single argument a and returns another function, which in turn accepts a
single argument b and returns true or false depending upon whether a = b. Thus in the
defined language, one would write (equal(a))(b) instead of equal(a, b).
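In OCaml, where every function likewise takes one argument at a time, Curried equal is simply (our illustration):

    (* Curried equal: it accepts a and returns a function awaiting b,
       so that one writes (equal a) b rather than equal (a, b). *)
    let equal = fun a -> fun b -> (a : int) = b

    let () = assert ((equal 3) 3 && not ((equal 3) 4))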
1. The meta-circular interpreter does not shed much light on the nature of higher-order
functions. For this purpose, we would prefer an interpreter of a higher-order defined
language that was written in a first-order defining language.
2. Changing the order of application used in the defining language induces a similar change
in the defined language. To see this, suppose that eval is applied to an application
expression r0 (r1 ) of the defined language. Then the result of eval will be obtained by
evaluating the application expression (line I.4)
(eval(r0 , e))(eval(r1 , e))
in the defining language. If call by value is used in the defining language, then eval(r1 , e)
will be evaluated before the functional value of eval(r0 , e) is applied. But evaluating
eval(r1 , e) interprets the evaluation of r1 , and applying the value of eval(r0 , e) interprets
the application of the value of r0 . Thus in terms of the defined language, r1 will be
evaluated before the value of r0 is applied, i.e., call by value will be used in the defined
language.
On the other hand, if call by name is used in the defining language, then the application
of the functional value of eval(r0 , e) will begin as soon as eval(r0 , e) has been evaluated,
and the operand eval(r1 , e) will only be evaluated when and if the function being applied
depends upon its value. In terms of the defined language, the application of the value of
r0 will begin as soon as r0 has been evaluated, and the operand r1 will only be evaluated
when and if the function being applied depends upon its value, i.e., call by name will
be used in the defined language.
3. Suppose we wish to extend the defined language by introducing the imperative features
of labels and jumps (including jumps out of blocks). As far as is known, it is impossible
to extend the meta-circular definition straightforwardly to accommodate these features
(without introducing similar features into the defining language).
Our first task is to modify the meta-circular interpreter so that none of the functions that
comprise this interpreter accept arguments or produce results that are functions. An exam-
ination of the abstract syntax shows that this goal will be met if we can replace the two sets
FUNVAL and ENV by sets of values that are not functions. Specifically, the new members
of these sets will be records that represent functions.
We first consider the set FUNVAL. Since the new members of this set are to be records
rather than functions, we can no longer apply these members directly to arguments. Instead
we will introduce a new function apply that will “interpret” the new members of FUNVAL.
Specifically, if fnew is a record in FUNVAL that represents a function fold and if a is any
member of VAL, then apply(fnew , a) will produce the same result as fold (a). Assuming
for the moment that we will be able to define apply, we must replace each application of
a member of FUNVAL (to an argument a) by an application of apply (to the member of
FUNVAL and the argument a). In fact, the only such application occurs in line I.4, which
must become
appl?(r) → apply(eval(opr(r), e), eval(opnd(r), e)).    I.4′
To decide upon the form of the new members of FUNVAL, we recall that whenever a
function is obtained by evaluating a lambda expression, the function will be determined
by two items of information: (1) the lambda expression itself, and (2) the values that were
bound to the global variables of the lambda expression at the time of its evaluation. It is
evident that these items of information will be sufficient to represent the function. This
suggests that the new set FUNVAL should be a union of disjoint sets of records, one set
for each lambda expression whose value belonged to the old FUNVAL, and that the fields
of each record should contain values of the global variables of the corresponding lambda
expression.
In fact, the meta-circular interpreter contains four lambda expressions (indicated by solid
underlining) that produce members of FUNVAL. The following table gives their locations
and global variables, and the equations defining the new sets of records that will represent
their values. (The connotations of the set and selector names we have chosen will become
apparent when we discuss the role of these entities in the interpretation of the defined
language.)
Our remaining task is to replace each of the four solidly underlined lambda expressions
by appropriate record-creation operations, and to insert expressions in the branches of apply
that will interpret the corresponding records. The lambda expression in line I.11 must be
replaced by an expression that creates a CLOSR-record containing the value of the global
variables ` and e:
evlambda = λ(`, e). mk-closr(`, e).    I.11′
Now apply(f, a) must produce the result of applying the function represented by f to
the argument a. When f is a CLOSR-record, this result may be obtained by evaluating the
body
let ` = lam(f ), a = a, e = en(f ) in eval(body(`), ext(fp(`), a, e)).
(In this particular case, but not in general, the declaration a = a is unnecessary, since the
formal parameter of the replaced lambda expression and the second formal parameter of
apply are the same variable. From now on, we will omit such vacuous declarations.)
A similar treatment (somewhat simplified since there are no global variables) of the
lambda expression in I.14 and the outer lambda expression in I.15 gives:
initenv = λx. (x = “succ” → mk-sc(),    I.14′
               x = “equal” → mk-eq1())    I.15′
and, analogously to apply for FUNVAL, a function get that interprets the new members of
ENV, such that if enew is a record representing an environment eold , then
get(enew , x) = eold (x). Applications of get must be inserted at the three points (in lines
I.3, I.9, and I.12) in the interpreter where environments are applied to variables:
Next, there are three lambda expressions that produce environments; they are indicated by
broken underlining which we have carefully preserved during the previous transformations.
The following table gives their locations and global variables, and the equations defining
the new sets of records that will represent their values:
But now we are faced with a new problem. By eliminating the lambda expression in I.9′,
we have created a recursive let expression
letrec e′ = mk-rec(r, e, e′ ) · · ·
that violates the structure of the defining language, since its declaring subexpression is no
longer a lambda expression. However, there is still an obvious intuitive interpretation of
this illicit construction: it binds e′ to a “cyclic” record, whose last field is (a pointer to) the
record itself.
If we accept this interpretation, then whenever e is a member of REC, we will have
new(e) = e. This allows us to replace the only occurrence of new(e) by e, so that the
penultimate line of get becomes:
But now our program no longer contains any references to the cyclic new fields, so that
these fields can be deleted from the records in REC. Thus the record equation for REC is
reduced to:
At this point, once we have collected the bits and pieces produced by the various trans-
formations, we will have obtained an interpreter that no longer contains any higher-order
functions. However, it is convenient to make a few simplifications:
1. let expressions can be eliminated by substituting the declaring expressions for each
occurrence of the corresponding declared variables in the body.
2. Line I.11′ can be eliminated by replacing occurrences of evlambda by mk-closr.
3. Line I.12′′ can be eliminated by replacing occurrences of ext by mk-simp.
4. Lines I.14′′–15′′ can be eliminated by replacing occurrences of initenv by mk-init().
interpret = λr. eval(r, mk-init())                                  II.1
eval = λ(r, e).                                                     II.2
  ( const?(r) → evcon(r),                                           II.3
    var?(r) → get(e, r),                                            II.4
    appl?(r) → apply(eval(opr(r), e), eval(opnd(r), e)),            II.5
    lambda?(r) → mk-closr(r, e),                                    II.6
    cond?(r) → if eval(prem(r), e)                                  II.7
      then eval(conc(r), e) else eval(altr(r), e),                  II.8
    letrec?(r) → eval(body(r), mk-rec(r, e)) )                      II.9
apply = λ(f, a).                                                    II.10
  ( closr?(f ) →                                                    II.11
      eval(body(lam(f )), mk-simp(fp(lam(f )), a, en(f ))),         II.12
    sc?(f ) → succ(a),                                              II.13
    eq1?(f ) → mk-eq2(a),                                           II.14
    eq2?(f ) → equal(arg1(f ), a) )                                 II.15
get = λ(e, x).                                                      II.16
  ( init?(e) → (x = “succ” → mk-sc(), x = “equal” → mk-eq1()),      II.17
    simp?(e) → if x = bvar(e) then bval(e) else get(old(e), x),     II.18
    rec?(e) → if x = dvar(letx(e))                                  II.19
      then mk-closr(dexp(letx(e)), e) else get(old(e), x) )         II.20
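A sketch of Interpreter II in OCaml (ours; the exp type repeats the transcription given earlier, and the CLOSR and REC records are flattened so that a closure holds the formal parameter and body directly):

    type var = string
    type exp =
      | Const of int
      | Var of var
      | Appl of { opr : exp; opnd : exp }
      | Lambda of { fp : var; body : exp }
      | Cond of { prem : exp; conc : exp; altr : exp }
      | Letrec of { dvar : var; dexp : exp; body : exp }

    (* FUNVAL and ENV as records: no value is a function. *)
    type value =
      | Int of int
      | Bool of bool
      | Closr of { fp : var; body : exp; en : env }  (* CLOSR *)
      | Sc                                           (* succ, unapplied  *)
      | Eq1                                          (* equal, unapplied *)
      | Eq2 of value                                 (* equal, one argument *)
    and env =
      | Init                                             (* INIT *)
      | Simp of { bvar : var; bval : value; old : env }  (* SIMP *)
      | Rec of { dvar : var; fp : var; dbody : exp; old : env }  (* REC *)

    let rec eval (r, e) = match r with
      | Const n -> Int n
      | Var x -> get (e, x)
      | Appl { opr; opnd } -> apply (eval (opr, e), eval (opnd, e))
      | Lambda { fp; body } -> Closr { fp; body; en = e }
      | Cond { prem; conc; altr } ->
          (match eval (prem, e) with
           | Bool true  -> eval (conc, e)
           | Bool false -> eval (altr, e)
           | _ -> failwith "non-boolean premiss")
      | Letrec { dvar; dexp; body } ->
          (match dexp with
           | Lambda { fp; body = dbody } ->
               eval (body, Rec { dvar; fp; dbody; old = e })
           | _ -> failwith "declaring expression must be a lambda")

    and apply (f, a) = match f with
      | Closr { fp; body; en } ->
          eval (body, Simp { bvar = fp; bval = a; old = en })
      | Sc  -> (match a with Int n -> Int (n + 1) | _ -> failwith "succ")
      | Eq1 -> Eq2 a
      | Eq2 b -> (match b, a with
                  | Int m, Int n -> Bool (m = n)
                  | _ -> failwith "equal")
      | _ -> failwith "not a function"

    and get (e, x) = match e with
      | Init ->
          if x = "succ" then Sc
          else if x = "equal" then Eq1
          else failwith ("unbound variable: " ^ x)
      | Simp { bvar; bval; old } -> if x = bvar then bval else get (old, x)
      | Rec { dvar; fp; dbody; old } ->
          if x = dvar then Closr { fp; body = dbody; en = e }
          else get (old, x)

    let interpret r = eval (r, Init)

    let () =
      assert (interpret (Appl { opr = Var "succ"; opnd = Const 2 }) = Int 3)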
Just as with FUNVAL, we may examine the different kinds of records in ENV with regard
to their role in the interpretation of the defined language. The unique record in INIT has
no subfields, while the records in SIMP and REC each have one field (selected by old) that
is another member of ENV. Thus environments in our second interpreter are linear lists (in
which each element specifies the binding of a single variable), and the unique record in
INIT serves as the empty list.
It is easily seen that get(e, x) searches such a list to find the binding of the variable x.
When get encounters a record in SIMP, it compares x with the bvar field, and if a match
occurs, it returns the value stored in the bval field. When get encounters a record in REC,
it compares x with dvar(letx(e)) (the declared variable of the recursive let expression
that created the binding), and if a match occurs, it returns the value obtained by evaluating
dexp(letx(e)) (the declaring subexpression of the same recursive let expression) in the
environment e. The fact that e includes the very binding that is being “looked up” reflects
the essential recursive characteristic that the declaring subexpression should “feel” the effect
of the declaration in which it is embedded. When get encounters the empty list, it compares
x with each of the predefined variables, and if a match is found, it returns the appropriate
value.
The definition of get reveals the consequences of our restricting recursive let expressions
by requiring that their declaring subexpressions should be lambda expressions. Because of
this restriction, the declaring subexpressions are always evaluated by the trivial operation
of forming a closure. Therefore, the function get always terminates, since it never calls any
other recursive function, and can never call itself more times than the length of the list that
it is searching. (On the other hand, if we had permitted arbitrary declaring subexpressions,
line II.20 would contain eval(dexp(letx(e)), e) instead of mk-closr(dexp(letx(e)), e).
This seemingly slight modification would convert get into a function that might run on
forever, as for example, when looking up the variable k in an environment created by the
defined-language construction letrec k = k + 1 in · · · .)
The second interpreter is similar in style, and in many details, to McCarthy’s definition of
LISP [1]. The main differences arise from our insistence upon FUNARG binding, the use
of recursive let expressions instead of label expressions, and the use of predefined variables
instead of variables with flagged property lists.
7. Continuations
The transition from the meta-circular interpreter to our second interpreter has not elimi-
nated order-of-application dependence. It can easily be seen that a change in the order of
application used in the defining-language expression (in II.5)
apply(eval(opr(r), e), eval(opnd(r), e))
will cause a similar change for all application expressions of the defined language.
To eliminate this dependence, we must first identify the circumstances under which an
arbitrary program in the defining language will be affected by the order of application. The
essential effect of switching from call by value to call by name is to postpone the evaluation
of the operands of application expressions (and declaring subexpressions of let expressions),
and to alter the number of times these operands are evaluated. We have already seen that in
a purely applicative language, the only way in which this change can affect the meaning of
a program is to avoid the evaluation of a nonterminating operand. Now suppose we define
an expression to be serious if there is any possibility that its evaluation might not terminate.
Then a sufficient condition for order-of-application independence is that a program should
contain no serious operands or declaring expressions.
Next, suppose that we can divide the functions that may be applied by our program into
serious functions, whose application may sometimes run on forever, and trivial functions,
whose application will always terminate. (Of course, it is well-known that one cannot
effectively decide whether an arbitrary function will always terminate, but one can still
establish this classification in a “fail-safe” manner, i.e., classify a function as serious unless
it can be shown to terminate for all arguments.) Then an expression will only be serious
if its evaluation can cause the application of a serious function, and a program will be
independent of order-of-application if no operand or declaring expression can cause such
an application.
At first sight, this condition appears to be so restrictive that it could not be met in a
nontrivial program. As can be seen with a little thought, the condition implies that whenever
some function calls a serious function, the calling function must return the same result as
the called function, without performing any further computation. But any function that
calls a serious function must be serious itself. Thus by induction, as soon as any serious
function returns a result, every function must immediately return the same result, which
must therefore be the final result of the entire program.
Nevertheless, there is a method for transforming an arbitrary program into one that meets
our apparently restrictive condition. The underlying idea has appeared in a variety of
contexts [26, 27, 28], but its application to definitional interpreters is due to L. Morris
[20] and Wadsworth. Basically, one replaces each serious function fold (except the main
program) by a new serious function fnew that accepts an additional argument c called a
continuation. The continuation will be a function itself, and fnew is expected to compute
the same result as fold , apply the continuation to this result, and then return the result of
the continuation, i.e.,
fnew (x1 , . . . , xn , c) = c(fold (x1 , . . . , xn )).
(In a more complicated interpreter in which different serious functions produced different
kinds of results, we would introduce different kinds of continuations.)
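The transformation is easy to see on a function smaller than an interpreter; the following OCaml sketch (ours) converts a serious function fact into fact_c satisfying fact_c (n, c) = c (fact n), so that every serious call becomes a tail call:

    (* A serious function and its continuation-passing counterpart. *)
    let rec fact n = if n = 0 then 1 else n * fact (n - 1)

    let rec fact_c (n, c) =
      if n = 0 then c 1
      else fact_c (n - 1, fun r -> c (n * r))  (* c extended with the
                                                  pending multiplication *)

    (* The "main level" call supplies an identity continuation: *)
    let () = assert (fact_c (5, fun r -> r) = fact 5)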
The overall form of our transformed interpreter will be:
Note that the “main level” call of eval by interpret provides an identity function as the
initial continuation.
We must now alter each branch of eval and apply to apply the continuation c to the
former results of these functions. In lines II.3, 4, 6, 13, 14, and 15, the branches evaluate
expressions which are not serious, and which are therefore permissible operands. Thus in
these cases, we may simply apply the continuation c to each expression:
In lines II.9 and II.12, the branches evaluate expressions that are serious themselves
but contain no serious operands. By themselves, these expressions are permissible, but
they must not be used as operands in applications of the continuation. The solution is
straightforward; instead of applying the continuation c to the result of eval, we pass c as an
argument to eval, i.e., we “instruct” eval to apply c before returning its result:
letrec?(r) → eval(body(r), mk-rec(r, e), c)    II.9′
...
closr?(f ) →    II.11′
    eval(body(lam(f )), mk-simp(fp(lam(f )), a, en(f )), c).    II.12′
The most complex part of our transformation occurs in the branch of eval that evaluates
application expressions in line II.5. Here we must perform four serious operations:
1. The evaluation of the operator opr(r).
2. The evaluation of the operand opnd(r).
3. The application of the value of the operator to the value of the operand.
4. The application of the continuation c to the result, i.e., the remainder of the program.
Moreover, we must specify explicitly that these operations are to be done in the above order.
This will insure that the defined language uses call by value, and also that the subexpressions
of an application expression are evaluated from left to right (operator before operand).
The solution is to call eval to perform operation (1), to give this call of eval a continuation
that will call eval to perform operation (2), to give the second call of eval a continuation that
will call apply to perform (3), and to give apply a continuation (the original continuation c)
that will perform (4). Thus we have:
appl?(r) → eval(opr(r), e, λf. eval(opnd(r), e, λa. apply(f, a, c))).    II.5′
A similar approach handles the branch that evaluates conditional expressions in lines II.7
and 8. Here there are three serious operations to be performed successively:
1. The evaluation of the premiss prem(r).
2. The evaluation of either the conclusion conc(r) or the alternative altr(r).
3. The application of the continuation c to the resulting value.
At this stage, since continuations are functional arguments, we have achieved order-of-
application independence at the price of re-introducing higher-order functions. Fortunately,
we can now “defunctionalize” the set CONT in the same way as FUNVAL and ENV. To
interpret the new members of CONT we introduce a function cont such that if cnew represents
the continuation cold and a is a member of VAL then cont(cnew , a) = cold (a). The
application of cont must be introduced at each point in eval and apply where a continuation
is applied to a value, i.e., in lines II.3′, 4′, 6′, 13′, 14′, and 15′.
There are four lambda expressions, indicated by solid underlining, that create continu-
ations. The following table gives their locations and global variables, and the equations
defining the new sets of records that will represent their values:
From their abstract syntax, it is evident that continuations in our third interpreter are linear
lists, with the unique record in FIN acting as the empty list, and the next fields in the other
records acting as link fields. In effect, a continuation is a list of instructions to be interpreted
by the function cont. Each instruction accepts a “current value” (the second argument of
cont) and produces a new value that will be given to the next instruction. The following list
gives approximate meanings for each type of instruction:
FIN: The current value is the final value of the program. Halt.
EVOPN: The current value is the value of an operator. Evaluate the operand of the appli-
cation expression in the ap field, using the environment in the en field. Then obtain a
new value by applying the current value to the value of the operand.
APFUN: The current value is the value of an operand. Obtain a new value by applying the
function stored in the fun field to the current value.
BRANCH: The current value is the value of a premiss. If it is true (false) obtain a new
value by evaluating the conclusion (alternative) of the conditional expression stored in
the cn field, using the environment in the en field.
Each of the three serious functions, eval, apply, and cont, does a branch on the form of
its first argument, performs trivial operations such as field selection, record creation, and
environment lookup, and then calls another serious function. Thus our third interpreter
is actually a state-transition machine, whose states each consist of the name of a serious
function plus a list of its arguments.
This interpreter is similar in style to Landin’s SECD machine [7], though there is consid-
erable difference in detailed mechanisms. (Very roughly, one can construct the continuation
by merging Landin’s stack and control and concatenating this merged stack with the dump.)
In transforming Interpreter I into Interpreter III, we have moved from a concise, abstract
definition to a more complex machine-like one. If clarity consists of the avoidance of
subtle characteristics of the defining language, then Interpreter III is certainly clearer than
Interpreter I. But if clarity consists of conciseness and the absence of unnecessary com-
plexity, then the reverse is true. The machine-like character of Interpreter III includes a
variety of “cogs and wheels” that are quite arbitrary, i.e., one can easily construct equivalent
interpreters (such as the SECD machine) with different cogs and wheels.
In fact, these “cogs and wheels” were introduced when we defunctionalized the sets
FUNVAL, ENV, and CONT, since we replaced the functions in these sets by representations
that were correct, but not unique. Had we chosen different representations, we would have
obtained an equivalent but quite different interpreter.
This suggests the desirability of retaining the use of higher-order functions, providing
these entities can be given a mathematically rigorous definition that is independent of any
particular representation.
In the meta-circular interpreter, once continuations have been introduced, the branch for
application expressions takes the form
appl?(r) → eval(opr(r), e, λf. eval(opnd(r), e, λa. c(f (a)))),
which is still order-of-application dependent, since the serious operand f (a) occurs in an
application of the continuation c. The dependence is eliminated by
replacing each function fold by an fnew such that fnew (a, c) = c(fold (a)). This allows
us to replace the order-dependent expression c(f (a)) by the order-independent expression
f (a, c). Of course, we must add continuations as an extra formal parameter to each lambda
expression that creates a member of FUNVAL.
(A similar modification of the functions in ENV is unnecessary, since it can be shown that
the functions in this set always terminate. Just as with get, this depends on the exclusion of
recursive let expressions with arbitrary declaring subexpressions.)
Once the necessity of altering FUNVAL has been realized, the transformation of Inter-
preter I follows the basic lines described in the previous section. We omit the details and
state the final result:
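In outline, Interpreter IV has the following shape; this OCaml sketch is ours, not the paper's notation (FUNVAL and ENV remain genuine functions, and every serious function carries a continuation argument):

    type var = string
    type exp =
      | Const of int
      | Var of var
      | Appl of { opr : exp; opnd : exp }
      | Lambda of { fp : var; body : exp }
      | Cond of { prem : exp; conc : exp; altr : exp }
      | Letrec of { dvar : var; dexp : exp; body : exp }

    type value =
      | Int of int
      | Bool of bool
      | Fun of (value -> cont -> value)   (* FUNVAL = VAL, CONT -> VAL *)
    and cont = value -> value             (* CONT = VAL -> VAL         *)

    type env = var -> value

    let ext x a (e : env) : env = fun y -> if y = x then a else e y

    let initenv : env = fun x ->
      if x = "succ" then
        Fun (fun a c -> match a with
                        | Int n -> c (Int (n + 1))
                        | _ -> failwith "succ")
      else if x = "equal" then
        Fun (fun a c ->
          c (Fun (fun b c' -> match a, b with
                              | Int m, Int n -> c' (Bool (m = n))
                              | _ -> failwith "equal")))
      else failwith ("unbound variable: " ^ x)

    let rec eval (r : exp) (e : env) (c : cont) : value =
      match r with
      | Const n -> c (Int n)
      | Var x -> c (e x)
      | Appl { opr; opnd } ->
          eval opr e (fun f ->
            eval opnd e (fun a ->
              match f with
              | Fun f -> f a c
              | _ -> failwith "not a function"))
      | Lambda { fp; body } ->
          c (Fun (fun a c' -> eval body (ext fp a e) c'))
      | Cond { prem; conc; altr } ->
          eval prem e (fun b ->
            match b with
            | Bool true  -> eval conc e c
            | Bool false -> eval altr e c
            | _ -> failwith "non-boolean premiss")
      | Letrec { dvar; dexp; body } ->
          (match dexp with
           | Lambda { fp; body = dbody } ->
               let rec e' y =
                 if y = dvar then Fun (fun a c' -> eval dbody (ext fp a e') c')
                 else e y
               in
               eval body e' c
           | _ -> failwith "declaring expression must be a lambda")

    let interpret r = eval r initenv (fun v -> v)

    let () =
      assert (interpret (Appl { opr = Var "succ"; opnd = Const 2 }) = Int 3)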
This is basically the form of interpreter devised by L. Morris [20] and Wadsworth. It is
almost as concise as the meta-circular interpreter, yet it offers the advantages of order-of-
application independence and, as we will see in the next section, extensibility to accommo-
date imperative control features.
(The zealous reader may wish to verify that defunctionalization and the introduction of
continuations are commutative, i.e., by replacing FUNVAL, ENV, and CONT by appropriate
nonfunctional representations, one can transform Interpreter IV into Interpreter III.)
9. Escape Expressions
We now turn to the problem of adding imperative features to the defined language (while
keeping the defining language purely applicative). These features may be divided into two
classes:
1. Imperative control mechanisms, such as labels and jumps.
2. Assignment.
For the first class, Landin introduced a mechanism that is a generalization of labels and
jumps, and that significantly enhances the power of a language without assignment. The
specific mechanism that he introduced was called a J-operator, but in this paper we will
develop a slightly simpler mechanism called an escape expression.
If (in the defined language) x is a variable and r is an expression, then
escape x in r
is an escape expression, whose escape variable is x and whose body is r. The evaluation
of an escape expression in an environment e proceeds as follows:
1. The body r is evaluated in the environment that is the extension of e that binds x to a
function called the escape function.
2. If the escape function is never applied during the evaluation of r, then the value of r
becomes the value of the escape expression.
3. If the escape function is applied to an argument a, then the evaluation of the body r is
aborted, and a immediately becomes the value of the escape expression.
Essentially, an escape function is a kind of label, and its application is a kind of jump. The
greater generality lies in the ability to pass arguments while jumping.
(Landin’s J-operator can be defined in terms of the escape expression by regarding let g =
J λx. r1 in r0 as an abbreviation for escape h in let g = λx. h(r1 ) in r0 , where h is
a new variable not occurring in r0 or r1 . Conversely, one can regard escape g in r as an
abbreviation for let g = J λx. x in r.)
In order to extend our interpreters to handle escape expressions, we begin by extending
the abstract syntax of expressions appropriately:
EXP = . . . ∪ ESCP
ESCP = [escv: VAR, body: EXP].
It is evident that in each interpreter we must add a branch to eval that evaluates the new
kind of expression.
First consider Interpreter IV. Since an escape expression is evaluated by evaluating its
body in an extended environment that binds the escape variable to the escape function, and
since the escape function must be represented by a member of the set FUNVAL = VAL,
CONT → VAL, we have
eval = λ(r, e, c). ( · · · ,
escp?(r) → eval(body(r), ext(escv(r), λ(a, c′ ). · · · , e), c)),
where the value of λ(a, c′ ). · · · must be the member of FUNVAL representing the escape
function.
Since eval is a serious function, its result, which is obtained by applying the continuation
c to the value of the escape expression, must be the final result of the entire program being
interpreted. This means that c itself must be a function that will accept the value of the
escape expression and carry out the interpretation of the remainder of the program. But the
member of FUNVAL representing the escape function is also serious, and must therefore
also produce the final result of the entire program. Thus to abort the evaluation of the body
and treat the argument a as the value of the escape expression, it is only necessary for the
escape function to ignore its own continuation c′ and to apply the higher-level continuation c
to a. Thus we have:
eval = λ(r, e, c). ( · · · ,
escp?(r) → eval(body(r), ext(escv(r), λ(a, c′ ). c(a), e), c)).
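The same trick can be rendered directly in OCaml, without the interpreter, by writing the body of an escape expression in continuation-passing style; with_escape and example are illustrative names of our own:

    (* escape x in r, in CPS: the escape function ignores its own
       continuation _c' and resumes the captured continuation c. *)
    type ('a, 'r) cont = 'a -> 'r

    let with_escape
        (body : (int -> (int, 'r) cont -> 'r) -> (int, 'r) cont -> 'r)
        (c : (int, 'r) cont) : 'r =
      let escape a _c' = c a in
      body escape c

    (* escape h in 1 + (if p then h(10) else 2), written in CPS; when p
       holds, the pending addition is simply abandoned. *)
    let example p =
      with_escape
        (fun h c -> if p then h 10 c else c (1 + 2))
        (fun v -> v)

    let () = assert (example true = 10 && example false = 3)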
The extension of Interpreter III is essentially similar. In this case, we must add to the set
FUNVAL a new kind of record that represents escape functions:
FUNVAL = . . . ∪ ESCF
ESCF = [cn: CONT].
From the viewpoint of this interpreter, it is clear that the escape expression is a signif-
icant extension of the defined language, since it introduces the possibility of embedding
continuations in values.
(The reader should be warned that either of the above interpreters is a more precise
definition of the escape expression than the informal English description given beforehand.
For example, it is possible that the evaluation of the body of an escape expression may
not cause the application of the escape function, but may produce the escape function (or
some function that can call the escape function) as its value. It is difficult to infer the
consequences of such a situation from our informal description, but it is precisely defined
by either of the interpreters. In fact, the possibility that an escape function may propagate
outside of the expression that created it is a powerful facility that can be used to construct
control-flow mechanisms such as coroutines and nondeterministic algorithms.)
When we consider Interpreters I and II, we find an entirely different situation. The ability
to “jump” by switching continuations is no longer possible. An escape function must still be
represented by a member of FUNVAL, but now this implies that, if the function terminates
without an error stop, then its result must become the value of the application expression
that applied the function. As far as is known, there is no way to define the escape expression
with this kind of interpreter.
Given escape expressions, however, statements, labels, and jumps in the style of ALGOL
can be treated as abbreviations in the defined language:
1. In the next section we will introduce assignment in such a way that assignments can
be executed during the evaluation of expressions. In this situation it is unnecessary to
make a semantic distinction between expressions and statements; any statement can be
regarded as an expression whose evaluation produces a dummy value.
2. A label-free sequence of statements s1 ; · · · ; sn can be regarded as an abbreviation for
the expression
(· · · ((λx1 . · · · λxn . xn )(s1 )) · · ·)(sn ).
The effect is to evaluate the statements sequentially from left to right, ignoring the value
of all but the last.
3. If s0 , . . . , sn are label-free statement sequences, and `1 , . . . , `n are labels, then a block
of the form
begin s0 ; `1 : s1 ; · · · ; `n : sn end
can be regarded as an abbreviation for the expression
escape g in letrec `1 = λx. `2 (s1 ), . . . , `n = λx. g(sn ) in `1 (s0 )
(where g and x are new variables not occurring in the original block). The effect is
that each label denotes a function that ignores its argument, evaluates the appropriate
sequence of statements, and then escapes out of the enclosing block.
4. An expression of the form goto r can be regarded as an abbreviation for r(0), i.e., a
jump to a label becomes an application of the function denoted by the label to a dummy
argument.
10. Assignment
Although the basic concept of assignment is well understood by any competent programmer,
a surprising degree of care is needed to combine this concept with the language features
we have discussed previously. Intuitively, the notion of assignment presupposes that the
operations that are performed during the evaluation of a program will occur in a definite
temporal order. Some of these operations will assign values to “variables”. Other operations
may be affected by these assignments; specifically, an operation may depend upon the value
most recently assigned to each “variable”, which we will call the value currently possessed
by the “variable”.
This suggests that for each instant during program execution, there should be an entity
which specifies the set of “variables” that are present and the values that they currently
possess. We will call such an entity a memory, and denote the set of possible memories by
MEM.
The main subtlety is to realize that the “variables” discussed here are distinct from the
variables used in previous sections. This is necessitated by the fact that most programming
languages permit situations (such as might arise from the use of “call by address”) in which
several variables denote the same “variable”, in the sense that assignment to one of them
will change the value possessed by all. This suggests that a “variable” is actually a new
kind of object to which a variable can be bound. Henceforth, we will call these new objects
references rather than “variables”. (Other terms used commonly in the literature are L-value
and name.) We will denote the set of references by REF.
Abstractly, the nature of references and memories can be characterized by specifying an
initial memory and four functions:
nextref (m): Produces the new reference that will be created by augmenting m.
augment(m, a): Produces a memory containing the new reference nextref (m) plus the
references already in m. The new reference possesses the value a, while the remaining
references possess the same values as in m.
update(m, rf , a): Produces a memory containing the same references as m. The refer-
ence rf (assuming it is present) possesses the value a, while the remaining references
possess the same value as in m.
lookup(m, rf ): Produces the value currently possessed by the reference rf in the memory
m (assuming it is present).
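One possible realization, as a sketch in OCaml (ours): references as integers, and a memory as a counter supplying fresh references paired with an association list from references to values.

    type ref_ = int
    type 'v mem = { next : ref_; cells : (ref_ * 'v) list }

    let initial_memory = { next = 0; cells = [] }

    (* The reference that augment will create next. *)
    let nextref (m : 'v mem) : ref_ = m.next

    (* Add a new reference possessing the value a. *)
    let augment (m : 'v mem) (a : 'v) : 'v mem =
      { next = m.next + 1; cells = (m.next, a) :: m.cells }

    (* Make rf possess a; all other references are unchanged. *)
    let update (m : 'v mem) (rf : ref_) (a : 'v) : 'v mem =
      { m with cells = (rf, a) :: List.remove_assoc rf m.cells }

    (* The value currently possessed by rf. *)
    let lookup (m : 'v mem) (rf : ref_) : 'v = List.assoc rf m.cells

    let () =
      let rf = nextref initial_memory in
      let m = augment initial_memory 41 in
      assert (lookup (update m rf 42) rf = 42)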
Our next task is to introduce memories into our interpreters. Although any of our inter-
preters could be so extended, we will only consider Interpreter IV.
It is evident that the operation of evaluating a defined-language expression will now
depend upon a memory m and will produce a (possibly) altered memory m0 . Thus the
function eval will accept m as an additional argument. However, because of the use of
continuations, m0 will not be part of the result of eval. Instead, m0 will be passed on as an
additional argument to the continuation that is applied by eval to perform the remainder of
program execution.
In a similar manner, the application of a defined-language function will depend upon and
produce memories. Thus each function in the set FUNVAL will accept a memory as an
additional argument, and will also pass on a memory to its continuation.
On the other hand, there are particular kinds of expressions, specifically constants, vari-
ables, and lambda expressions, whose evaluation cannot cause assignments. For this reason,
the functions evcon and evlambda, and the functions in the set ENV, will not accept or pro-
duce memories.
These considerations lead to the following interpreter, in which memories propagate
through the various operations in a manner that correctly reflects the temporal order of
execution:
At this stage, although we have “threaded” memories through the operations of our
interpreter, we have not yet introduced references, nor any operations that alter or depend
upon memories. To proceed further, however, we must distinguish between two approaches
to assignment, each of which characterizes certain programming languages.
In the “L-value” approach, in each context of the evaluation process where a value would
occur, a reference (i.e., L-value) possessing that value occurs instead. Thus, for example,
expressions evaluate to references, functional arguments and results are references, and
environments bind variables to references. (In richer languages, references would occur
instead of values in still other contexts, such as array elements.) This approach is used in the
languages PAL [3] and ISWIM [2], and in somewhat modified form (i.e., references always
occur in certain kinds of contexts, while values always occur in others) in such languages
as FORTRAN, ALGOL 60, and PL/I. Its formalization is due to Strachey [30], and is used
extensively in the Vienna definition of PL/I [18].
In the “reference” approach, references are introduced as a new kind of value, so that
either references or “normal” values can occur in any meaningful context. This approach
is used in ALGOL 68 [31], BASIL [32] and GEDANKEN [4].
The relative merits of these approaches are discussed briefly in Reference [4]. Although
either approach can be accommodated by the various styles of interpreter discussed in
this paper, we will limit ourselves to incorporating the reference approach into the above
extension of Interpreter IV. We first augment the set of values appropriately:
VAL = INTEGER ∪ BOOLEAN ∪ FUNVAL ∪ REF.
Next we introduce basic operations for creating, assigning, and evaluating references.
For simplicity, we will make these operations basic functions, denoted by the predefined
variables ref, set, and val. The following is an informal description:
ref (a): Accepts a value a and returns a new reference initialized to possess a.
(set(rf ))(a): Accepts a reference rf and a value a. The value a is assigned to rf and also
returned as the result. (Because of our restriction to functions of a single argument, this
function is Curried, i.e., set accepts rf and returns a function that accepts a.)
val(rf ): Accepts a reference rf and returns its currently possessed value.
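These three operations correspond closely to OCaml's built-in ref, :=, and !. A hand-rolled, Curried version matching the informal description above might look like this (a sketch; val is a reserved word in OCaml, hence the underscores):

let ref_ (a : 'a) : 'a ref = ref a                       (* ref(a) *)
let set (rf : 'a ref) : 'a -> 'a = fun a -> rf := a; a   (* (set(rf))(a) assigns and returns a *)
let val_ (rf : 'a ref) : 'a = !rf                        (* val(rf) *)

let () =
  let r = ref_ 3 in
  let x = set r 4 in            (* r now possesses 4, and x = 4 *)
  assert (x = 4 && val_ r = 4)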
To introduce these new functions into our interpreter, we extend the initial environment
as follows:
initenv = λx. ( · · · ,
  x = “ref” → λ(a, m, c). c(augment(m, a), nextref(m)),
  x = “set” → λ(rf, m, c). c(m, λ(a, m′, c′). c′(update(m′, rf, a), a)),
  x = “val” → λ(rf, m, c). c(m, lookup(m, rf)) )
The main shortcoming of the reference approach is the incessant necessity of using the
function val. This problem can be alleviated by introducing coercion conventions, as
discussed in Reference [4], that cause references to be replaced by their possessed values
in appropriate contexts. However, since these conventions can be treated as abbreviations,
they do not affect the basic structure of the definitional interpreters.
Within this paper we have tried to present a systematic, self-contained, and reasonably
complete description of the current state of the art of definitional interpreters. We conclude
with a brief (and hopeful) list of possible future developments:
1. It would still be very desirable to be able to define higher-order languages logically rather
than interpretively, particularly if such an approach can lead to practical correctness
proofs for programs. A major step in this direction, based on the work of Scott [12, 13,
14, 15], has been taken by R. Milner [16]. However, Milner’s work essentially treats a
language using call by name rather than call by value.
2. It should be possible to treat languages with multiprocessing features, or other features
that involve “controlled ambiguity”. An initial step is the work of the IBM Vienna
Laboratory [18], using a nondeterministic state-transition machine.
3. It should also be possible to define languages, such as ALGOL 68 [31], with a highly
refined syntactic type structure. Ideally, such a treatment should be meta-circular, in
the sense that the type structure used in the defined language should be adequate for the
defining language.
4. The conciseness of definitional interpreters makes them powerful tools for language
design, particularly when one wishes to add new capabilities to a language with a
minimum of increased complexity. Of particular interest (at least to the author) are the
problems of devising better type systems and of generalizing assignment (for example,
by permitting memories to be embedded in values).
References
1. McCarthy, John. Recursive functions of symbolic expressions and their computation by machine, part I.
Communications of the ACM, 3(4):184–195, April 1960.
2. Landin, Peter J. The next 700 programming languages. Communications of the ACM, 9(3):157–166, March
1966.
3. Evans, Jr., Arthur. PAL – A language designed for teaching programming linguistics. In Proceedings of
23rd National ACM Conference, pages 395–403. Brandon/Systems Press, Princeton, New Jersey, 1968.
4. Reynolds, John C. GEDANKEN – A simple typeless language based on the principle of completeness and
the reference concept. Communications of the ACM, 13(5):308–319, May 1970.
5. Church, Alonzo. The Calculi of Lambda-Conversion, volume 6 of Annals of Mathematics Studies. Princeton
University Press, Princeton, New Jersey, 1941.
6. Curry, Haskell Brookes and Feys, Robert. Combinatory Logic, Volume 1. Studies in Logic and the Founda-
tions of Mathematics. North-Holland, Amsterdam, 1958. Second printing 1968.
7. Landin, Peter J. A λ-calculus approach. In Leslie Fox, editor, Advances in Programming and Non-Numerical
Computation: Proceedings of A Summer School, pages 97–141. Oxford University Computing Laboratory
and Delegacy for Extra-Mural Studies, Pergamon Press, Oxford, England, 1966.
8. Floyd, Robert W. Assigning meanings to programs. In J. T. Schwartz, editor, Mathematical Aspects of
Computer Science, volume 19 of Proceedings of Symposia in Applied Mathematics, pages 19–32, New York
City, April 5–7, 1966. American Mathematical Society, Providence, Rhode Island, 1967.
9. Manna, Zohar. The correctness of programs. Journal of Computer and System Sciences, 3(2):119–127,
May 1969.
10. Hoare, C. A. R. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576–
580 and 583, October 1969. Reprinted in [11].
11. Gries, David, editor. Programming Methodology. Springer-Verlag, New York, 1978.
12. Scott, Dana S. Outline of a mathematical theory of computation. Technical Monograph PRG–2, Program-
ming Research Group, Oxford University Computing Laboratory, Oxford, England, November 1970. A
preliminary version appeared in Proceedings of the Fourth Annual Princeton Conference on Information
Sciences and Systems (1970), 169–176.
13. Scott, Dana S. Lattice theory, data types and semantics. In Randell Rustin, editor, Formal Semantics of
Programming Languages: Courant Computer Science Symposium 2, pages 65–106, New York University,
New York, September 14–16, 1970. Prentice-Hall, Englewood Cliffs, New Jersey, 1972.
14. Scott, Dana S. Models for various type-free calculi. In Patrick Suppes, Leon Henkin, Athanase Joja,
and Gr. C. Moisil, editors, Logic, Methodology and Philosophy of Science IV: Proceedings of the Fourth
International Congress, volume 74 of Studies in Logic and the Foundations of Mathematics, pages 157–187,
Bucharest, Romania, August 29–September 4, 1971. North-Holland, Amsterdam, 1973.
15. Scott, Dana S. Continuous lattices. In F. William Lawvere, editor, Toposes, Algebraic Geometry and Logic,
volume 274 of Lecture Notes in Mathematics, Dalhousie University, Halifax, Nova Scotia, January 16–19,
1971. Springer-Verlag, Berlin, 1972.
16. Milner, Robin. Implementation and applications of Scott’s logic for computable functions. In Proceedings of
an ACM Conference on Proving Assertions about Programs, pages 1–6, Las Cruces, New Mexico, January
6–7, 1972. ACM, New York. SIGPLAN Notices Volume 7, Number 1 and SIGACT News, Number 14.
17. Burstall, Rodney M. Formal description of program structure and semantics in first order logic. In Bernard
Meltzer and Donald Michie, editors, Machine Intelligence 5, pages 79–98. Edinburgh University Press,
Edinburgh, Scotland, 1969.
18. Lucas, Peter, Lauer, Peter E., and Stigleitner, H. Method and notation for the formal definition of program-
ming languages. Technical Report TR 25.087, IBM Laboratory Vienna, June 28, 1968. Revised July 1,
1970.
19. Reynolds, John C. GEDANKEN – a simple typeless language which permits functional data structures and
coroutines. Report ANL–7621, Applied Mathematics Division, Argonne National Laboratory, Argonne,
Illinois, September 1969.
20. Morris, F. Lockwood. The next 700 formal language descriptions. Lisp and Symbolic Computation, 6(3–
4):249–257, November 1993. Original manuscript dated November 1970.
21. de Bakker, Jaco W. Semantics of programming languages. In Julius T. Tou, editor, Advances in Information
Systems Science, volume 2, chapter 3, pages 173–227. Plenum Press, New York, 1969.
22. Park, David M. R. Fixpoint induction and proofs of program properties. In Bernard Meltzer and Donald
Michie, editors, Machine Intelligence 5, pages 59–78. Edinburgh University Press, Edinburgh, 1969.
23. Feldman, Jerome and Gries, David. Translator writing systems. Communications of the ACM, 11(2):77–113,
February 1968.
24. McCarthy, John. Towards a mathematical science of computation. In Cicely M. Popplewell, editor, Infor-
mation Processing 62: Proceedings of IFIP Congress 1962, pages 21–28, Munich, August 27–September
1, 1962. North-Holland, Amsterdam, 1963.
25. Wozencraft, John M. and Evans, Jr., Arthur. Notes on programming linguistics. Technical report, Department
of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, February
1971.
26. van Wijngaarden, Adriaan. Recursive definition of syntax and semantics. In T. B. Steel, Jr., editor,
Formal Language Description Languages for Computer Programming: Proceedings of the IFIP Working
Conference on Formal Language Description Languages, pages 13–24, Baden bei Wien, Austria, September
15–18, 1964. North-Holland, Amsterdam, 1966.
27. Morris, Jr., James H. A bonus from van Wijngaarden’s device. Communications of the ACM, 15(8):773,
August 1972.
28. Fischer, Michael J. Lambda calculus schemata. In Proceedings of an ACM Conference on Proving Assertions
about Programs, pages 104–109, Las Cruces, New Mexico, January 6–7, 1972. ACM, New York.
29. Landin, Peter J. A correspondence between ALGOL 60 and Church’s lambda-notation. Communications
of the ACM, 8(2–3):89–101, 158–165, February–March 1965.
30. Barron, D.W., Buxton, John N., Hartley, D.F., Nixon, E., and Strachey, Christopher. The main features of
CPL. The Computer Journal, 6:134–143, July 1963.
31. van Wijngaarden, Adriaan, Mailloux, B.J., Peck, J.E.L., and Koster, C.H.A. Report on the algorithmic
language ALGOL 68. Numerische Mathematik, 14(2):79–218, 1969.
32. Cheatham, Jr., T.E., Fischer, Alice, and Jorrand, P. On the basis for ELF – an extensible language facility.
In 1968 Fall Joint Computer Conference, volume 33, Part Two of AFIPS Conference Proceedings, pages
937–948, San Francisco, December 9–11, 1968. Thompson Book Company, Washington, D.C.
Higher-Order and Symbolic Computation, 13, 11–49, 2000
© 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Fundamental Concepts in Programming Languages

CHRISTOPHER STRACHEY
Abstract. This paper forms the substance of a course of lectures given at the International Summer School in
Computer Programming at Copenhagen in August, 1967. The lectures were originally given from notes and the
paper was written after the course was finished. In spite of this, and only partly because of the shortage of time, the
paper still retains many of the shortcomings of a lecture course. The chief of these are an uncertainty of aim—it is
never quite clear what sort of audience there will be for such lectures—and an associated switching from formal
to informal modes of presentation which may well be less acceptable in print than it is natural in the lecture room.
For these (and other) faults, I apologise to the reader.
There are numerous references throughout the course to CPL [1–3]. This is a programming language which has
been under development since 1962 at Cambridge and London and Oxford. It has served as a vehicle for research
into both programming languages and the design of compilers. Partial implementations exist at Cambridge and
London. The language is still evolving so that there is no definitive manual available yet. We hope to reach another
resting point in its evolution quite soon and to produce a compiler and reference manuals for this version. The
compiler will probably be written in such a way that it is relatively easy to transfer it to another machine, and in
the first instance we hope to establish it on three or four machines more or less at the same time.
The lack of a precise formulation for CPL should not cause much difficulty in this course, as we are primarily
concerned with the ideas and concepts involved rather than with their precise representation in a programming
language.
Keywords: programming languages, semantics, foundations of computing, CPL, L-values, R-values, parameter passing, variable binding, functions as data, parametric polymorphism, ad hoc polymorphism, binding mechanisms, type completeness
1. Preliminaries
1.1. Introduction
Any discussion on the foundations of computing runs into severe problems right at the
start. The difficulty is that although we all use words such as ‘name’, ‘value’, ‘program’,
‘expression’ or ‘command’ which we think we understand, it often turns out on closer
investigation that in point of fact we all mean different things by these words, so that com-
munication is at best precarious. These misunderstandings arise in at least two ways. The
first is straightforwardly incorrect or muddled thinking. An investigation of the meanings
of these basic terms is undoubtedly an exercise in mathematical logic and neither to the taste
nor within the field of competence of many people who work on programming languages.
As a result the practice and development of programming languages has outrun our ability
to fit them into a secure mathematical framework so that they have to be described in ad
hoc ways. Because these start from various points they often use conflicting and sometimes
also inconsistent interpretations of the same basic terms.
A second and more subtle reason for misunderstandings is the existence of profound
differences in philosophical outlook between mathematicians. This is not the place to
discuss this issue at length, nor am I the right person to do it. I have found, however, that
these differences affect both the motivation and the methodology of any investigation like
this to such an extent as to make it virtually incomprehensible without some preliminary
warning. In the rest of the section, therefore, I shall try to outline my position and describe
the way in which I think the mathematical problems of programming languages should be
tackled. Readers who are not interested can safely skip to Section 2.
The important philosophical difference is between those mathematicians who will not allow
the existence of an object until they have a construction rule for it, and those who admit the
existence of a wider range of objects including some for which there are no construction
rules. (The precise definition of these terms is of no importance here as the difference is
really one of psychological approach and survives any minor tinkering.) This may not seem
to be a very large difference, but it does lead to a completely different outlook and approach
to the methods of attacking the problems of programming languages.
The advantages of rigour lie, not surprisingly, almost wholly with those who require
construction rules. Owing to the care they take not to introduce undefined terms, the
better examples of the work of this school are models of exact mathematical reasoning.
Unfortunately, but also not surprisingly, their emphasis on construction rules leads them to
an intense concern for the way in which things are written—i.e., for their representation,
generally as strings of symbols on paper—and this in turn seems to lead to a preoccupation
with the problems of syntax. By now the connection with programming languages as we
know them has become tenuous, and it generally becomes more so as they get deeper into
syntactical questions. Faced with the situation as it exists today, where there is a generally
known method of describing a certain class of grammars (known as BNF or context-free),
the first instinct of these mathematicians seems to be to investigate the limits of BNF—what
can you express in BNF even at the cost of very cumbersome and artificial constructions?
This may be a question of some mathematical interest (whatever that means), but it has
very little relevance to programming languages where it is more important to discover
better methods of describing the syntax than BNF (which is already both inconvenient and
inadequate for ALGOL) than it is to examine the possible limits of what we already know to
be an unsatisfactory technique.
This is probably an unfair criticism, for, as will become clear later, I am not only tem-
peramentally a Platonist and prone to talking about abstracts if I think they throw light on a
discussion, but I also regard syntactical problems as essentially irrelevant to programming
languages at their present stage of development. In a rough and ready sort of way it seems
to me fair to think of the semantics as being what we want to say and the syntax as how
we have to say it. In these terms the urgent task in programming languages is to explore
the field of semantic possibilities. When we have discovered the main outlines and the
principal peaks we can set about devising a suitably neat and satisfactory notation for them,
and this is the moment for syntactic questions.
But first we must try to get a better understanding of the processes of computing and
their description in programming languages. In computing we have what I believe to be a
new field of mathematics which is at least as important as that opened up by the discovery
(or should it be invention?) of calculus. We are still intellectually at the stage that calculus
was at when it was called the ‘Method of Fluxions’ and everyone was arguing about how
big a differential was. We need to develop our insight into computing processes and to
recognise and isolate the central concepts—things analogous to the concepts of continuity
and convergence in analysis. To do this we must become familiar with them and give them
names even before we are really satisfied that we have described them precisely. If we
attempt to formalise our ideas before we have really sorted out the important concepts the
result, though possibly rigorous, is of very little value—indeed it may well do more harm
than good by making it harder to discover the really important concepts. Our motto should
be ‘No axiomatisation without insight’.
However, it is equally important to avoid the opposite danger of perpetual vagueness. My own
view is that the best way to do this in a rapidly developing field such as computing, is to be
extremely careful in our choice of terms for new concepts. If we use words such as ‘name’,
‘address’, ‘value’ or ‘set’ which already have meanings with complicated associations and
overtones either in ordinary usage or in mathematics, we run into the danger that these
associations or overtones may influence us unconsciously to misuse our new terms—either
in context or meaning. For this reason I think we should try to give a new concept a neutral
name at any rate to start with. The number of new concepts required may ultimately be
quite large, but most of these will be constructs which can be defined with considerable
precision in terms of a much smaller number of more basic ones. This intermediate form of
definition should always be made as precise as possible although the rigorous description
of the basic concepts in terms of more elementary ideas may not yet be available. Who
when defining the eigenvalues of a matrix is concerned with tracing the definition back to
Peano’s axioms?
Not very much of this will show up in the rest of this course. The reason for this is partly
that it is easier, with the aid of hindsight, to preach than to practice what you preach. In part,
however, the reason is that my aim is not to give an historical account of how we reached
the present position but to try to convey what the position is. For this reason I have often
preferred a somewhat informal approach even when mere formality would in fact have been
easy.
2. Basic concepts
One of the characteristic features of computers is that they have a store into which it is
possible to put information and from which it can subsequently be recovered. Furthermore
the act of inserting an item into the store erases whatever was in that particular area of the
store before—in other words the process is one of overwriting. This leads to the assignment
command which is a prominent feature of most programming languages.
The simplest forms of assignment command, such as

x := 3
x := y + 1
x := x + 1
lend themselves to very simple explications. ‘Set x equal to 3’, ‘Set x to be the value of
y plus 1’ or ‘Add one to x’. But this simplicity is deceptive; the examples are themselves
special cases of a more general form and the first explications which come to mind will not
generalise satisfactorily. This situation crops up over and over again in the exploration of a
new field; it is important to resist the temptation to start with a confusingly simple example.
Assignment commands whose left sides are expressions of increasing complexity show this danger. All such commands are legal in CPL (and most, apart from minor syntactic alterations, in ALGOL also). We are tempted to write them all in the general form
ε1 := ε2
where ε1 and ε2 stand for expressions, and to try as an explication something like ‘evaluate
the two expressions and then do the assignment’. But this clearly will not do, as the meaning
of an expression (and a name or identifier is only a simple case of an expression) on the left
of an assignment is clearly different from its meaning on the right. Roughly speaking an
expression on the left stands for an ‘address’ and one on the right for a ‘value’ which will be
stored there. We shall therefore accept this view and say that there are two values associated
with an expression or identifier. In order to avoid the overtones which go with the word
‘address’ we shall give these two values the neutral names: L-value for the address-like
object appropriate on the left of an assignment, and R-value for the contents-like object
appropriate for the right.
An L-value represents an area of the store of the computer. We call this a location rather than
an address in order to avoid confusion with the normal store-addressing mechanism of the
computer. There is no reason why a location should be exactly one machine-word in size—
the objects discussed in programming languages may be, like complex or multiple precision
numbers, more than one word long, or, like characters, less. Some locations are addressable
(in which case their numerical machine address may be a good representation) but some are
not. Before we can decide what sort of representation a general, non-addressable location
should have, we should consider what properties we require of it.
The two essential features of a location are that it has a content—i.e. an associated
R-value—and that it is in general possible to change this content by a suitable updating
operation. These two operations are sufficient to characterise general locations, which are consequently sometimes known as ‘Load-Update Pairs’ or LUPs. They will be discussed
again in Section 4.1.
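As a sketch of this characterisation (in OCaml, with illustrative names), a general location can be modelled as nothing more than its two operations:

type 'a location = { load : unit -> 'a; update : 'a -> unit }

(* an addressable cell, built on an ordinary reference *)
let cell (init : 'a) : 'a location =
  let r = ref init in
  { load = (fun () -> !r); update = (fun v -> r := v) }

(* a non-addressable location: one component of a pair held in a cell *)
let first_of (p : ('a * 'b) ref) : 'a location =
  { load = (fun () -> fst !p);
    update = (fun v -> p := (v, snd !p)) }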
2.3. Definitions
In CPL a programmer can introduce a new quantity and give it a value by an initialised
definition such as
let p = 3.5
(In ALGOL this would be done by real p; p := 3.5;). This introduces a new use of the
name p (ALGOL uses the term ‘identifier’ instead of name), and the best way of looking at
this is that the activation of the definition causes a new location not previously used to be
set up as the L-value of p and that the R-value 3.5 is then assigned to this location.
The relationship between a name and its L-value cannot be altered by assignment, and it
is this fact which makes the L-value important. However in both ALGOL and CPL one name
can have several different L-values in different parts of the program. It is the concept of
scope (sometimes called lexicographical scope) which is controlled by the block structure
which allows us to determine at any point which L-value is relevant.
In CPL, but not in ALGOL, it is also possible to have several names with the same L-value.
This is done by using a special form of definition:
let q ' p

which has the effect of giving the name q the same L-value as p (which must already exist).
This feature is generally used when the right side of the definition is a more complicated
expression than a simple name. Thus if M is a matrix, the definition

let x ' M[2,2]

gives x the same L-value as one of the elements of the matrix. It is then said to be sharing
with M[2,2], and an assignment to x will have the same effect as one to M[2,2].
It is worth noting that the expression on the right of this form of definition is evaluated in
the L-mode to get an L-value at the time the definition is obeyed. It is this L-value which
is associated with x. Thus if we have
let i = 2
let x ' M[i,i]
i := 3

x will share with M[2,2] and not with M[3,3]; the later assignment to i cannot alter the L-value already associated with x. Anonymous quantities, i.e. those without names, may have both an L-value and an R-value, like M[2,2], or only an R-value, like a+b. In both cases the expression has no name as such although it does have either one value or two.
2.4. Names
It is important to be clear about the use of the word ‘name’, as a good deal of confusion can be caused by differing
uses of the terms. ALGOL 60 uses ‘identifier’ where we have used ‘name’, and reserves the
word ‘name’ for a wholly different use concerned with the mode of calling parameters for
a procedure. (See Section 3.4.3.) ALGOL X, on the other hand, appears likely to use the
word ‘name’ to mean approximately what we should call an L-value, (and hence something
which is a location or generalised address). The term reference is also used by several
languages to mean (again approximately) an L-value.
It seems to me wiser not to make a distinction between the meaning of ‘name’ and that
of ‘identifier’ and I shall use them interchangeably. The important feature of a name is that
it has no internal structure at any rate in the context in which we are using it as a name.
Names are thus atomic objects and the only thing we know about them is that given two
names it is always possible to determine whether they are equal (i.e., the same name) or not.
2.5. Numerals
We use the word ‘number’ for the abstract object and ‘numeral’ for its written representation.
Thus 24 and XXIV are two different numerals representing the same number. There is
often some confusion about the status of numerals in programming languages. One view
commonly expressed is that numerals are the ‘names of numbers’ which presumably means
that every distinguishable numeral has an appropriate R-value associated with it. This seems
to me an artificial point of view and one which falls foul of Occam’s razor by unnecessarily
multiplying the number of entities (in this case names). This is because it overlooks the
important fact that numerals in general do have an internal structure and are therefore not
atomic in the sense that we said names were in the last section.
An interpretation more in keeping with our general approach is to regard numerals as R-value expressions written according to special rules. Thus for example the decimal numeral 253 is a syntactic variant for the expression

2 × 10² + 5 × 10 + 3

while the corresponding octal numeral would stand for

2 × 8² + 5 × 8 + 3
Local rules for special forms of expression can be regarded as a sort of ‘micro-syntax’ and
form an important feature of programming languages. The micro-syntax is frequently used
in a preliminary ‘pre-processing’ or ‘lexical’ pass of compilers to deal with the recognition
of names, numerals, strings, basic symbols (e.g. boldface words in ALGOL) and similar
objects which are represented in the input stream by strings of symbols in spite of being
atomic inside the language.
With this interpretation the only numerals which are also names are the single digits and
these are, of course, constants with the appropriate R-value.
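A sketch of this view of numerals (in OCaml, using String.fold_left from OCaml 4.13 or later; names illustrative): the value of a digit string at a given radix is obtained by folding its expansion, exactly as in 2 × 10² + 5 × 10 + 3:

let numeral_value ~radix (s : string) : int =
  String.fold_left
    (fun acc ch -> acc * radix + (Char.code ch - Char.code '0'))
    0 s

let () = assert (numeral_value ~radix:10 "253" = 253)
let () = assert (numeral_value ~radix:8 "253" = 2 * 8 * 8 + 5 * 8 + 3)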
It is sometimes helpful to have a picture showing the relationships between the various
objects in the programming language, their representations in the store of a computer
and the abstract objects to which they correspond. Figure 1 is an attempt to portray the
conceptual model which is being used in this course.
On the left are some of the components of the programming language. Many of these
correspond to either an L-value or an R-value and the correspondence is indicated by an
arrow terminating on the value concerned. Both L-values and R-values are in the idealised
store, a location being represented by a box and its contents by a dot inside it. R-values
without corresponding L-values are represented by dots without boxes, and R-values which
are themselves locations (as, for example, that of a vector) are given arrows which terminate
on another box in the idealised store.
R-values which correspond to numbers are given arrows which terminate in the right
hand part of the diagram which represents the abstract objects with which the program
deals.
The bottom section of the diagram, which is concerned with vectors and vector elements
will be more easily understood after reading the section on compound data structures.
(Section 3.7.)
3. Conceptual constructs
The first and simplest programming languages—by which I mean machine codes and
assembly languages—consist of strings of commands. When obeyed, each of these causes
the computer to perform some elementary operation such as subtraction, and the more
elaborate results are obtained by using long sequences of commands.
In the rest of mathematics, however, there are generally no commands as such. Expres-
sions using brackets, either written or implied, are used to build up complicated results.
When talking about these expressions we use descriptive phrases such as ‘the sum of x and
y’ or possibly ‘the result of adding x to y’ but never the imperative ‘add x to y’.
As programming languages developed and became more powerful they came under
pressure to allow ordinary mathematical expressions as well as the elementary commands.
It is, after all, much more convenient to write as in CPL, x := a(b+c)+d than the more
elementary
CLA b
ADD c
MPY a
ADD d
STO x
Much of the complication in programming languages is introduced by the assignment command. In order to avoid this as far as possible, the next section will be concerned with the properties of expressions in the absence of commands.
3.2.1. Values. The characteristic feature of an expression is that it has a value. We have
seen that in general in a programming language, an expression may have two values—an
L-value and an R-value. In this section, however, we are considering expressions in the
absence of assignments and in these circumstances L-values are not required. Like the rest
of mathematics, we shall be concerned only with R-values.
One of the most useful properties of expressions is that called by Quine [4] referential
transparency. In essence this means that if we wish to find the value of an expression which
contains a sub-expression, the only thing we need to know about the sub-expression is its
value. Any other features of the sub-expression, such as its internal structure, the number
and nature of its components, the order in which they are evaluated or the colour of the ink
in which they are written, are irrelevant to the value of the main expression.
We are quite familiar with this property of expressions in ordinary mathematics and often
make use of it unconsciously. Thus we expect the expressions

1 + 5 and 2 × 3

to have the same value. Note, however, that we cannot replace the symbol string 1+5 by the
symbol 6 in all circumstances as, for example 21 + 52 is not equal to 262. The equivalence
only applies to complete expressions or sub-expressions and assumes that these have been
identified by a suitable syntactic analysis.
3.2.2. Environments. In order to find the value of an expression it is necessary to know the
value of its components. Thus to find the value of a + 5 + b/a we need to know the values
of a and b. Thus we speak of evaluating an expression in an environment (or sometimes
relative to an environment) which provides the values of components.
One way in which such an environment can be provided is by a where-clause.
Thus
a + 3/a where a = 2 + 3/7
a + b − 3/a where a = b + 2/b
have a self evident meaning. An alternative syntactic form which has the same effect is the initialised definition

let a = 2 + 3/7
a + 3/a

and a third applies a λ-expression directly:

{λa. a + 3/a}[2 + 3/7]
All three methods are exactly equivalent and are, in fact, merely syntactic variants whose
choice is a matter of taste. In each the letter a is singled out and given a value and is known
as the bound variable. The letter b in the second expression is not bound and its value still
has to be found from the environment in which the expression is to be evaluated. Variables
of this sort are known as free variables.
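In OCaml (a sketch; OCaml has no where-clause, but the other two forms exist directly) the equivalence reads:

(* the initialised definition *)
let v1 = let a = 2. +. 3. /. 7. in a +. 3. /. a

(* the same environment provided by applying a λ-expression *)
let v2 = (fun a -> a +. 3. /. a) (2. +. 3. /. 7.)

let () = assert (v1 = v2)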
Expressions written out in full applicative form, with deeply nested brackets, are very difficult to read.
Their importance lies only in emphasising the uniformity of applicative structure from
which they are built up. In normal use the more conventional syntactic forms which are
familiar and easier to read are much to be preferred—providing that we keep the underlying
applicative structure at the back of our minds.
In the examples so far given all the operators have been either a λ-expression or a single
symbol, while the operands have been either single symbols or sub-expressions. There is, in
fact, no reason why the operator should not also be an expression. Thus for example if we use
D for the differentiating operator, D(sin) = cos so that {D(sin)}(×(3, a)) is an expression
with a compound operator whose value would be cos(3a). Note that this is not the same as the expression (d/dx) sin(3x) for x = a, which would be written {D(λx. sin(×(3, x)))}(a).
3.2.4. Evaluation. We thus have a distinction between evaluating an operator and applying
it to its operands. Evaluating the compound operator D(sin) produces the result (or value)
cos and can be performed quite independently of the process of applying this to the operands.
Furthermore it is evident that we need to evaluate both the operator and the operands before
we can apply the first to the second. This leads to the general rule for evaluating compound expressions in the operator-operand form, viz: evaluate the operator and the operand in any order, and then apply the value of the operator to the value of the operand.
The interesting thing about this rule is that it specifies a partial ordering of the operations
needed to evaluate an expression. Thus for example when evaluating
(a + b)(c + d/e)
both the additions must be performed before the multiplication, and the division before the
second addition but the sequence of the first addition and the division is not specified. This
partial ordering is a characteristic of algorithms which is not yet adequately reflected in most
programming languages. In ALGOL, for example, not only is the sequence of commands
fully specified, but the left to right rule specifies precisely the order of the operations.
Although this has the advantage of precision in that the effect of any program is exactly
defined, it makes it impossible for the programmer to specify indifference about sequencing
or to indicate a partial ordering. The result is that he has to make a large number of logically
unnecessary decisions, some of which may have unpredictable effects on the efficiency of
his program (though not on its outcome).
There is a device originated by Schönfinkel [5], for reducing operators with several operands to the successive application of single operand operators. Thus, for example, instead of +(2, p) where the operator + takes two arguments we introduce another adding operator say +′ which takes a single argument such that +′(2) is itself a function which adds 2 to its argument. Thus (+′(2))(p) = +(2, p) = 2 + p. In order to avoid a large number of brackets we make a further rule of association to the left and write +′ 2 p in place of ((+′ 2) p) or (+′(2))(p). This convention is used from time to time in the rest of this paper. Initially, it may cause some difficulty as the concept of functions which produce functions as results is a somewhat unfamiliar one and the strict rule of association to the left difficult to get used to. But the effort is well worth while in terms of the simpler and more transparent formulae which result.
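Schönfinkel's device is the normal state of affairs in OCaml, where every function takes a single argument and application associates to the left (a small illustration; plus' plays the role of +′):

let plus' = fun x -> fun y -> x + y    (* +′ takes one operand at a time *)
let add2 = plus' 2                     (* +′(2): itself a function *)
let () = assert (plus' 2 3 = 5)        (* +′ 2 p means ((+′ 2) p) *)
let () = assert (add2 3 = 5)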
It might be thought that the remarks about partial ordering would no longer apply to monadic operators, but in fact this makes no difference. There is still the choice of evaluating the operator or the operand first and this allows all the freedom which was possible with several operands. Thus, for example, if p and q are sub-expressions, the evaluation of p + q (or +(p, q)) implies nothing about the sequence of evaluation of p and q although both must be evaluated before the operator + can be applied. In Schönfinkel’s form this is (+′ p)q and we have the choice of evaluating (+′ p) and q in any sequence. The evaluation of +′ p involves the evaluation of +′ and p in either order so that once more there is no restriction on the order of evaluation of the components of the original expression.
3.2.5. Conditional expressions. There is one important form of expression which appears
to break the applicative expression evaluation rule. A conditional expression such as
x = 0 → 0, 1/x
(in ALGOL this would be written if x = 0 then 0 else 1/x) cannot be treated as an
ordinary function of three arguments. The difficulty is that it may not be possible to evaluate
both arms of the condition—in this case when x = 0 the second arm becomes undefined.
Various devices can be used to convert this to a true applicative form, and in essence
all have the effect of delaying the evaluation of the arms until after the condition has been
decided. Thus suppose that If is a function of a Boolean argument whose result is the
selector First or Second so that If (True) = First and If (False) = Second, the naive interpre-
tation of the conditional expression given above as

{If (x = 0)}(0, 1/x)

is wrong because it implies the evaluation of both members of the list (0, 1/x) before applying the operator {If (x = 0)}. However the expression

[{If (x = 0)}({λa. 0}, {λa. 1/x})](a)
will have the desired effect as the selector function If (x = 0) is now applied to the list
({λa. 0}, {λa. 1/x}) whose members are λ-expressions and these can be evaluated (but not
applied) without danger. After the selection has been made the result is applied to a and
provided a has been chosen not to conflict with other identifiers in the expression, this
produces the required effect.
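A sketch of this delaying device in OCaml (names illustrative): the selector chooses between two λ-abstracted arms, and only the chosen arm is ever applied:

let if_ (b : bool) ((first, second) : 'a * 'a) : 'a =
  if b then first else second

(* both arms are wrapped as functions of a dummy argument, so they can
   be constructed without danger; the division happens only if selected *)
let reciprocal_percent x =
  (if_ (x = 0) ((fun _ -> 0), (fun _ -> 100 / x))) 0

let () = assert (reciprocal_percent 0 = 0)   (* no Division_by_zero raised *)
let () = assert (reciprocal_percent 4 = 25)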
Recursive (self referential) functions do not require commands or loops for their defini-
tion, although to be effective they do need conditional expressions. For various reasons, of
which the principal one is lack of time, they will not be discussed in this course.
3.3.1. Variables. One important characteristic of mathematics is our habit of using names
for things. Curiously enough mathematicians tend to call these things ‘variables’ although
their most important property is precisely that they do not vary. We tend to assume auto-
matically that the symbol x in an expression such as 3x² + 2x + 17 stands for the same
thing (or has the same value) on each occasion it occurs. This is the most important conse-
quence of referential transparency and it is only in virtue of this property that we can use
the where-clauses or λ-expressions described in the last section.
The introduction of the assignment command alters all this, and if we confine ourselves to
the R-values of conventional mathematics we are faced with the problem of variables which
actually vary, so that their value may not be the same on two occasions and we can no longer
even be sure that the Boolean expression x = x has the value True. Referential transparency
has been destroyed, and without it we have lost most of our familiar mathematical tools—for
how much of mathematics can survive the loss of identity?
If we consider L-values as well as R-values, however, we can preserve referential trans-
parency as far as L-values are concerned. This is because L-values, being generalised
addresses, are not altered by assignment commands. Thus the command x := x+1 leaves
the address of the cell representing x (L-value of x) unchanged although it does alter the
contents of this cell (R-value of x). So if we agree that the values concerned are all L-values,
we can continue to use where-clauses and λ-expressions for describing parts of a program
which include assignments.
The cost of doing this is considerable. We are obliged to consider carefully the relationship
between L and R-values and to revise all our operations which previously took R-value
operands so that they take L-values. I think these problems are inevitable and although
much of the work remains to be done, I feel hopeful that when completed it will not seem
so formidable as it does at present, and that it will bring clarification to many areas of
programming language study which are very obscure today. In particular the problems of
side effects will, I hope, become more amenable.
In the rest of this section I shall outline informally a way in which this problem can be
attacked. It amounts to a proposal for a method in which to formalise the semantics of a
programming language. The relation of this proposal to others with the same aim will be
discussed later. (Section 4.3.)
3.3.2. The abstract store. Our conceptual model of the computing process includes an
abstract store which contains both L-values and R-values. The important feature of this
abstract store is that at any moment it specifies the relationship between L-values and the
corresponding R-values. We shall always use the symbol σ to stand for this mapping from
L-values onto R-values. Thus if α is an L-value and β the corresponding R-value we shall
write (remembering the conventions discussed in the last section)
β = σ α.
The effect of an assignment command is to change the contents of the store of the machine.
Thus it alters the relationship between L-values and R-values and so changes σ . We can
therefore regard assignment as an operator on σ which produces a fresh σ . If we update
the L-value α (whose original R-value in σ was β) by a fresh R-value β′ to produce a new store σ′, we want the R-value of α in σ′ to be β′, while the R-values of all other L-values remain unaltered. This can be expressed by the equation

σ′ = U(α, β′)σ

where σ′α = β′ and σ′α″ = σα″ for every other L-value α″. Thus U is a function which takes two arguments (an L-value and an R-value) and produces as a result an operator which transforms σ into σ′ as defined.
The arguments of U are L-values and R-values and we need some way of getting these
from the expressions written in the program. Both the L-value and the R-value of an
expression such as V[i+3] depend on the R-value of i and hence on the store. Thus both
must involve σ and if ε stands for a written expression in the programming language we
shall write L ε σ and R ε σ for its L-value and R-value respectively.
Both L and R are to be regarded as functions which operate on segments of text of the
programming language. The question of how those segments are isolated can be regarded
as a matter of syntactic analysis and forms no part of our present discussion.
These functions show an application to Schönfinkel’s device which is of more than merely
notational convenience. The function R, for example, shows that its result depends on both
ε and σ , so it might be thought natural to write it as R(ε, σ ). However by writing R ε σ
and remembering that by our convention of association to the left this means (R ε)σ it
becomes natural to consider the application of R to ε separately and before the application of R ε to σ.

Writing C α σ for the contents (i.e. the R-value) of the L-value α in the store σ, the effect of updating can then be stated: if

σ′ = U(α, β′)σ

then

C α σ′ = β′.

For the general assignment command

ε1 := ε2

we get

σ′ = U(α1, β2)σ

where

α1 = L ε1 σ

and

β2 = R ε2 σ

so that

σ′ = U(L ε1 σ, R ε2 σ)σ

The whole effect of the command on the store can therefore be summed up as

σ′ = θσ

where

θ = λσ. U(L ε1 σ, R ε2 σ)σ
Sequences of commands imply the successive application of sequences of θ’s. Thus, for example, if γ1, γ2, γ3 are commands and θ1, θ2, θ3 the equivalent functions on σ, the command sequence (or compound command)

γ1; γ2; γ3

transforms the store σ into

σ′ = θ3(θ2(θ1σ)) = (θ3 · θ2 · θ1)σ

where the dot denotes functional composition.
Similarly the conditional command

Test ε1 If so do γ1
If not do γ2

corresponds, using the selector function If of Section 3.2.5, to the store transformation

λσ. If (R ε1 σ)(θ1, θ2)σ

while for the conditional expression ε1 → ε2, ε3 we have

R(ε1 → ε2, ε3)σ = If (R ε1 σ)(R ε2, R ε3)σ

and

L(ε1 → ε2, ε3)σ = If (R ε1 σ)(L ε2, L ε3)σ
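The equations above transcribe almost directly into code; the following is a sketch in OCaml, with σ modelled as a function from L-values to R-values (representations illustrative):

type lv = int                     (* L-values *)
type rv = int                     (* R-values *)
type store = lv -> rv             (* σ *)
type theta = store -> store       (* θ, the effect of a command *)

let u (alpha : lv) (beta' : rv) : theta =                (* U(α, β′) *)
  fun sigma a -> if a = alpha then beta' else sigma a

let assign (l1 : store -> lv) (r2 : store -> rv) : theta =
  fun sigma -> u (l1 sigma) (r2 sigma) sigma             (* λσ. U(L ε1 σ, R ε2 σ)σ *)

let seq (th1 : theta) (th2 : theta) : theta =
  fun sigma -> th2 (th1 sigma)                           (* γ1; γ2 *)

let cond (r1 : store -> bool) (th1 : theta) (th2 : theta) : theta =
  fun sigma -> (if r1 sigma then th1 else th2) sigma     (* λσ. If(R ε1 σ)(θ1, θ2)σ *)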
A function definition such as

let f[x] = x + 3

can equally be written in the standard form

let f = λx. x + 3

This form makes it clear that it is f which is being defined and that x is a bound or dummy variable and could be replaced by any other non-clashing name without altering the value given to f.
3.4.2. Parameter calling modes. When the function is used (or called or applied) we write
f[ε] where ε can be an expression. If we are using a referentially transparent language
all we require to know about the expression ε in order to evaluate f[ε] is its value. There
are, however, two sorts of value, so we have to decide whether to supply the R-value or the
L-value of ε to the function f. Either is possible, so that it becomes a part of the definition
of the function to specify for each of its bound variables (also called its formal parameters)
whether it requires an R-value or an L-value. These alternatives will also be known as
calling a parameter by value (R-value) or reference (L-value).
Existing programming languages show a curious diversity in their modes of calling pa-
rameters. FORTRAN calls all its parameters by reference and has a special rule for providing
R-value expressions such as a + b with a temporary L-value. ALGOL 60, on the other hand,
has two modes of calling parameters (specified by the programmer): value and name. The
ALGOL call by value corresponds to call by R-value as above; the call by name,3 however,
is quite different (and more complex). Only if the actual parameter (i.e., the expression ε
above) is a simple variable is the effect the same as a call by reference. This incompatibility
in their methods of calling parameters makes it difficult to combine the two languages in a
single program.
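The two modes can be imitated directly in OCaml (a sketch): call by R-value passes the contents, call by reference passes the location itself:

let by_rvalue (x : int) = x + 1                (* receives an R-value *)
let by_reference (x : int ref) = x := !x + 1   (* receives an L-value *)

let () =
  let a = ref 3 in
  ignore (by_rvalue !a);   (* a is unaffected *)
  by_reference a;          (* a now possesses 4 *)
  assert (!a = 4)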
3.4.3. Modes of free variables. The obscurity which surrounds the modes of calling the
bound variables becomes much worse when we come to consider the free variables of a
function. Let us consider for a moment the very simple function
f[x] = x + a
where a is a free variable which is defined in the surrounding program. When f is defined
we want in some way to incorporate a into its definition, and the question is do we use its
R-value or its L-value? The difference is illustrated in the following pair of CPL programs.
(In CPL a function definition using = takes its free variables by R-value and one using ≡
takes them by L-value.)
let a = 3
let f[x] ≡ x + a
... (f[5] = 8),(a = 3) ...
§ let a = 100
... (f[5] = 8),(a = 100) ...
a := 10
... (f[5] = 8),(a = 10) ...
............§|
... (f[5] = 8),(a = 3) ...
Here there is an inner block enclosed in the statement brackets § ....... §| (which
corresponds to begin and end in ALGOL), and inside this an entirely fresh a has been
defined. This forms a hole in the scope of the original a in which it continues to exist but
becomes inaccessible to the programmer. However as its L-value was incorporated in the
definition of f, it is the original a which is used to find f[5]. Note that assignments to a in
the inner block affect only the second a and so do not alter f.
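An OCaml analogue of this program (a sketch: taking the free variable by L-value corresponds to capturing a ref, and the inner block shadows the outer a):

let () =
  let a = ref 3 in
  let f x = x + !a in          (* the L-value of a is incorporated in f *)
  assert (f 5 = 8);
  begin
    let a = ref 100 in         (* an entirely fresh a; the outer one is hidden *)
    assert (f 5 = 8);          (* f still uses the original location *)
    a := 10;                   (* affects only the inner a *)
    assert (f 5 = 8)
  end;
  assert (f 5 = 8)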
It is possible to imagine a third method of treating free variables (though there is nothing
corresponding for bound variables) in which the locally current meaning of the variables is
used, so that in the example above the second and third occurrences of f[5] would have
the values 105 and 15 respectively. I believe that things very close to this exist in LISP2
and are known as fluid variables. The objection to this scheme is that it appears to destroy
referential transparency irrevocably without any apparent compensating advantages.
In CPL the facilities for specifying the mode of the free variables are considerably
coarser than the corresponding facilities for bound variables. In the case of bound variables
the mode has to be specified explicitly or by default for each variable separately. For the
free variables, however, it is only possible to make a single specification which covers all
the free variables, so that they must all be treated alike. The first method is more flexible
and provides greater power for the programmer, but is also more onerous (although good
default conventions can help to reduce the burden); the second is much simpler to use but
sometimes does not allow a fine enough control. Decisions between methods of this sort
are bound to be compromises reflecting the individual taste of the language designer and
are always open to objection on grounds of convenience. It is no part of a discussion on
the fundamental concepts of programming languages to make this sort of choice—it should
rest content with pointing out the possibilities.
A crude but convenient method of specification, such as CPL uses for the mode of the
free variables of a function, becomes more acceptable if there exists an alternative method
by which the finer distinctions can be made, although at the cost of syntactic inelegance.
Such a method exists in CPL and involves using an analogue to the own variables in ALGOL
60 proposed by Landin [6].
3.4.4. Own variables. The idea behind own variables is to allow some private or secret
information which is in some way protected from outside interference. The details were
never very clearly expressed in ALGOL and at least two rival interpretations sprang up,
neither being particularly satisfactory. The reason for this was that owns were associated
with blocks whereas, as Landin pointed out, the natural association is with a procedure
body. (In this case of functions this corresponds to the expression on the right side of the
function definition.)
The purpose is to allow a variable to preserve its value from one application of a function
to the next—say to produce a pseudo-random number or to count the number of times the
function is applied. This is not possible with ordinary local variables defined within the body
of the function as all locals are redefined afresh on each application of the function. It would
be possible to preserve information in a non-local variable—i.e., one whose scope included
both the function definition and all its applications, but it would not then be protected and
would be accessible from the whole of this part of the program. What we need is a way of
limiting the scope of a variable to be the definition only. In CPL we indicate this by using
the word in to connect the definition of the own variable (which is usually an initialised
one) with the function definitions it qualifies.
In order to clarify this point, programs using each of the three possible scopes (non-local, own and local) are sketched below. The differences in the scope rules become of importance only when there is a
expressions. The differences in the scope rules become of importance only when there is a
clash of names, so in each of these examples one or both of the names a and x are used
twice. In order to make it easy to determine which is which, a prime has been added to one
of them. However, the scope rules imply that if all the primes were omitted the program
would be unaltered.
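Sketches of the three choices (in OCaml rather than the original CPL and λ-expression notations; names illustrative), for a function intended to count its own applications:

(* 1. non-local: the counter a is visible to the whole surrounding
   program, so it persists but is quite unprotected *)
let a = ref 0
let count1 () = a := !a + 1; !a

(* 2. own: the counter a' is private to the definition, yet persists
   from one application to the next (CPL's 'in'-qualified definition) *)
let count2 =
  let a' = ref 0 in
  fun () -> a' := !a' + 1; !a'

(* 3. local: a'' is redefined afresh on each application,
   so it cannot count at all *)
let count3 () =
  let a'' = ref 0 in
  a'' := !a'' + 1; !a''        (* always returns 1 *)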
We can now return to the question of controlling the mode of calling the free variables
of a function. Suppose we want to define f[x] to be ax + b + c and use the R-value of
a and b but the L-value of c. This can be achieved by fixing the R-values of a and b when f is defined, while c continues to be called by reference, as sketched below.
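A sketch of the effect in OCaml (illustrative names): the R-values of a and b are frozen when f is made, while c is consulted afresh on each application:

let make_f (a : int ref) (b : int ref) (c : int ref) =
  let a' = !a and b' = !b in       (* R-values of a and b fixed here *)
  fun x -> (a' * x) + b' + !c      (* c is still taken by reference *)

let () =
  let a = ref 2 and b = ref 3 and c = ref 4 in
  let f = make_f a b c in
  assert (f 5 = 17);
  a := 100; c := 40;               (* the change to a is invisible to f, *)
  assert (f 5 = 53)                (* the change to c is visible *)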
3.4.5. Functions and routines. We have so far discussed the process of functional abstrac-
tion as applied to expressions. The result is called a function and when applied to suitable
arguments it produces a value. Thus a function can be regarded as a complicated sort of
expression. The same process of abstraction can be applied to a command (or sequence of
commands), and the result is known in CPL as a routine. The application of a routine to a
suitable set of arguments is a complicated command, so that although it affects the store of
the computer, it produces no value as a result.
Functions and routines are as different in their nature as expressions and commands. It
is unfortunate, therefore, that most programming languages manage to confuse them very
successfully. The trouble comes from the fact that it is possible to write a function which
also alters the store, so that it has the effect of a function and a routine. Such functions are
sometimes said to have side effects and their uncontrolled use can lead to great obscurity in
the program. There is no generally agreed way of controlling or avoiding the side effects
of functions, and most programming languages make no attempt to deal with the problem
at all—indeed their confusion between routines and functions adds to the difficulties.
The problem arises because we naturally expect referential transparency of R-values in
expressions, particularly those on the right of assignment commands. This is, I think, a very
reasonable expectation as without this property, the value of the expression is much harder
to determine, so that the whole program is much more obscure. The formal conditions
on expressions which have to be satisfied in order to produce this R-value referential
transparency still need to be investigated. However in special cases the question is usually
easy to decide and I suggest that as a matter of good programming practice it should always
be done. Any departure from R-value referential transparency in an R-value context should either be eliminated by decomposing the expression into several commands and simpler expressions, or, if this turns out to be difficult, be made the subject of a comment.
3.4.6. Constants and variables. There is another approach to the problem of side effects
which is somewhat simpler to apply, though it does not get round all the difficulties. This
is, in effect, to turn the problem inside out and instead of trying to specify functions and
expressions which have no side effect to specify objects which are immune from any possible
side effect of others. There are two chief forms which this protection can take which can
roughly be described as hiding and freezing. Their inaccessibility (by reason of the scope
rules) makes them safe from alteration except from inside the body of the function or routine
they qualify. We shall be concerned in this section and the next with different forms of
protection by freezing.
The characteristic thing about variables is that their R-values can be altered by an assign-
ment command. If we are looking for an object which is frozen, or invariant, an obvious
possibility is to forbid assignments to it. This makes it what in CPL we call a constant. It
has an L-value and R-value in the ordinary way, but applying the update function to it either
has no effect or produces an error message. Constancy is thus an attribute of an L-value, and
is, moreover, an invariant attribute. Thus when we create a new L-value, and in particular
when we define a new quantity, we must decide whether it is a constant or a variable.
As with many other attributes, it is convenient in a practical programming language to
have a default convention—if the attribute is not given explicitly some conventional value is
assumed. The choice of these default conventions is largely a matter of taste and judgement,
but it is an important one as they can affect profoundly both the convenience of the language
and the number of slips made by programmers. In the case of constancy, it is reasonable
that the ordinary quantities, such as numbers and strings, should be variable. It is only
rather rarely that we want to protect a numerical constant such as Pi from interference.
Functions and routines, on the other hand, are generally considered to be constants. We
tend to give them familiar or mnemonic names such as CubeRt or LCM and we would rightly
feel confused by an assignment such as CubeRt := SqRt. Routines and functions are
therefore given the default attribute of being a constant.
3.4.7. Fixed and free. The constancy or otherwise of a function has no connection with
the mode in which it uses its free variables. If we write a definition in its standard form
such as
let f ≡ λx. x + a
we see that this has the effect of initialising f with a λ-expression. The constancy of f merely
means that we are not allowed to assign to it. The mode of its free variables (indicated by
≡) is a property of the λ-expression.
Functions which call their free variables by reference (L-value) are liable to alteration
by assignments to their free variables. This can occur either inside or outside the function
body, and indeed, even if the function itself is a constant. Furthermore they cease to have
a meaning if they are removed from an environment in which their free variables exist. (In
ALGOL this would be outside the block in which their free variables were declared.) Such
functions are called free functions.
The converse of a free function is a fixed function. This is defined as a function which
either has no free variables, or if it has, whose free variables are all both constant and fixed.
The crucial feature of a fixed function is that it is independent of its environment and is
always the same function. It can therefore be taken out of the computer (e.g., by being
compiled separately) and reinserted again without altering its effect.
Note that fixity is a property of the λ-expression—i.e., a property of the R-value, while
constancy is a property of the L-value. Numbers, for example, are always fixed as are all
‘atomic’ R-values (i.e., ones which cannot be decomposed into smaller parts). It is only in
composite objects that the distinction between fixed and free has any meaning. If such an
object is fixed, it remains possible to get at its component parts, but not to alter them. Thus,
for example, a fixed vector is a look-up table whose entries will not (cannot) be altered,
while a free vector is the ordinary sort of vector in which any element may be changed if
necessary.
3.4.8. Segmentation. A fixed routine or function is precisely the sort of object which can
be compiled separately. We can make use of this to allow the segmentation of programs
and their subsequent assembly even when they do communicate with each other through
free variables. The method is logically rather similar to the FORTRAN Common variables.
Suppose R[x] is a routine which uses a, b, and c by reference as free variables. We can
define a function R'[a,b,c] which has as formal parameters all the free variables of R and
whose result is the routine R[x]. Then R' will have no free variables and will thus be a
fixed function which can be compiled separately.
The following CPL program shows how this can be done:
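The CPL text of that program is missing from the present transcription. As a rough illustration only, the following Haskell sketch (with an invented quadratic body for R) passes the shared locations to R' explicitly, so that R' itself has no free variables and could be compiled separately:

    import Data.IORef

    -- R uses a, b and c by reference as free variables; r' receives those
    -- locations as parameters and returns R, so r' itself is fixed.
    r' :: IORef Double -> IORef Double -> IORef Double -> (Double -> IO Double)
    r' a b c = \x -> do
      av <- readIORef a
      bv <- readIORef b
      cv <- readIORef c
      return (av * x * x + bv * x + cv)   -- invented body, for illustration

    main :: IO ()
    main = do
      a <- newIORef 1; b <- newIORef 2; c <- newIORef 3   -- the shared variables
      let r = r' a b c    -- reassembly: r behaves like R with its free variables restored
      r 10 >>= print      -- prints 123.0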
3.5.1. First and second class objects. In ALGOL a real number may appear in an expression
or be assigned to a variable, and either may appear as an actual parameter in a procedure
call. A procedure, on the other hand, may only appear in another procedure call either
as the operator (the most common case) or as one of the actual parameters. There are no
other expressions involving procedures or whose results are procedures. Thus in a sense
procedures in ALGOL are second class citizens—they always have to appear in person
and can never be represented by a variable or expression (except in the case of a formal
parameter). While we can write, in ALGOL still, an expression whose value is a number, we
cannot write one whose value is a procedure; nor can we write a type procedure (ALGOL's
nearest approach to a function) with a result which is itself a procedure.
Historically this second class status of procedures in ALGOL is probably a consequence
of the view of functions taken by many mathematicians: that they are constants whose
name one can always recognise. This second class view of functions is demonstrated by the
remarkable fact that ordinary mathematics lacks a systematic notation for functions. The
following example is given by Curry [7, p. 81].
Suppose P is an operator (called by some a ‘functional’) which operates on functions.
The result of applying P to a function f (x) is often written P[ f (x)]. What then does
P[ f (x + 1)] mean? There are two possible meanings (a) we form g(x) = f (x + 1) and
the result is P[g(x)] or (b) we form h(x) = P[ f (x)] and the result is h(x + 1). In many
cases these are the same but not always. Let

    P[f(x)] = (f(x) − f(0)) / x   for x ≠ 0
            = f′(x)               for x = 0

Then if f(x) = x²

    P[g(x)] = P[x² + 2x + 1] = x + 2
while
h(x) = P[ f (x)] = x
so that h(x + 1) = x + 1.
This sort of confusion is, of course, avoided by using λ-expressions or by treating func-
tions as first class objects. Thus, for example, we should prefer to write (P[ f ])[x] in place of
P[ f (x)] above (or, using the association rule P[ f ][x] or even P f x). The two alternatives
which were confused would then become
Pgx where g x = f (x + 1)
and P f (x + 1).
The first of these could also be written P(λx. f (x + 1))x.
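To make the point concrete, here is a small sketch, not in the original, in which P is written as a first-class function on functions; the derivative at 0 is approximated numerically:

    p :: (Double -> Double) -> (Double -> Double)
    p f x
      | x /= 0    = (f x - f 0) / x
      | otherwise = (f h - f (-h)) / (2 * h)   -- approximates f'(0) numerically
      where
        h = 1e-6

    f :: Double -> Double
    f x = x * x

    reading1, reading2 :: Double -> Double
    reading1 = p (\x -> f (x + 1))   -- reading (a): P g, where g x = f (x+1); equals x + 2
    reading2 x = p f (x + 1)         -- reading (b): h (x+1), where h = P f; equals x + 1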
I have spent some time on this discussion in spite of its apparently trivial nature, because
I found, both from personal experience and from talking to others, that it is remarkably
difficult to stop looking on functions as second class objects. This is particularly unfortunate
as many of the more interesting developments of programming and programming languages
come from the unrestricted use of functions, and in particular of functions which have
functions as a result. As usual with new or unfamiliar ways of looking at things, it is harder
for the teachers to change their habits of thought than it is for their pupils to follow them. The
difficulty is considerably greater in the case of practical programmers for whom an abstract
concept such as a function has little reality until they can clothe it with a representation and
so understand what it is that they are dealing with.
This function has a single free variable, the function g, which is taken by R-value. Thus the
closure for f would take the form shown in the first diagram (omitted here). If we now identify
g with f, so that the function becomes the recursively defined factorial, all we need to do is
to ensure that the FVL contains the closure for f. The closure will then take the form shown
in the second diagram (also omitted), in which the FVL, now containing a copy of the closure
for f, in fact points to itself. It is a characteristic feature of recursively defined functions of
all sorts that they have some sort of a closed loop in their representation.
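The closed loop can be exhibited directly in a language with first-class functions. In the following sketch (not from the paper) the free variable g is abstracted out of the body and then tied back to the function itself:

    import Data.Function (fix)

    -- The body of factorial with its free variable g abstracted out.
    factBody :: (Integer -> Integer) -> (Integer -> Integer)
    factBody g n = if n == 0 then 1 else n * g (n - 1)

    -- Tying the knot: fix makes the environment entry for g refer back
    -- to the function itself -- the 'closed loop' in its representation.
    fact :: Integer -> Integer
    fact = fix factBody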
3.6.1. Types. Most programming languages deal with more than one sort of object—for
example with integers and floating point numbers and labels and procedures. We shall call
each of these a different type and spend a little time examining the concept of type and
trying to clarify it.
A possible starting point is the remark in the CPL Working Papers [3] that “The Type of
an object determines its representation and constrains the range of abstract object it may be
used to represent. Both the representation and the range may be implementation dependent”.
This is true, but not particularly helpful. In fact the two factors mentioned—representation
and range—have very different effects. The most important feature of a representation
is the space it occupies and it is perfectly possible to ignore types completely as far as
representation and storage is concerned if all types occupy the same size of storage. This
is in fact the position of most assembly languages and machine code—the only differences
of type encountered are those of storage size.
In more sophisticated programming languages, however, we use the type to tell us what
sort of object we are dealing with (i.e., to restrict its range to one sort of object). We
also expect the compiling system to check that we have not made silly mistakes (such as
multiplying two labels) and to interpret correctly ambiguous symbols (such as +) which
mean different things according to the types of their operands. We call ambiguous operators
of this sort polymorphic as they have several forms depending on their arguments.
The problem of dealing with polymorphic operators is complicated by the fact that the
range of types sometimes overlap. Thus for example 3 may be an integer or a real and it
may be necessary to change it from one type to the other. The functions which perform
this operation are known as transfer functions and may either be used explicitly by the
programmer, or, in some systems, inserted automatically by the compiling system.
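As a hedged illustration (not in the original), the following Haskell fragment uses an explicit transfer function of exactly this kind, of the sort a compiling system might insert automatically:

    -- fromIntegral is the transfer function from integers to reals here.
    average :: [Int] -> Double
    average xs = fromIntegral (sum xs) / fromIntegral (length xs)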
3.6.2. Manifest and latent. It is natural to ask whether type is an attribute of an L-value
or of an R-value—of a location or of its content. The answer to this question turns out to be
a matter of language design, and the choice affects the amount of work which can be done
when a program is compiled as opposed to that which must be postponed until it is run.
In CPL the type is a property of an expression and hence an attribute of both its L-value and
its R-value. Moreover L-values are invariant under assignment and this invariance includes
their type. This means that the type of any particular written expression is determined solely
by its position in the program. This in turn determines from their scopes which definitions
govern the variables of the expression, and hence give their types. An additional rule states
that the type of the result of a polymorphic operator must be determinable from a knowledge
of the types of its operands without knowing their values. Thus we must be able to find the
type of a + b without knowing the value of either a or b provided only that we know both
their types.4
The result of these rules is that the type of every expression can be determined at compile
time so that the appropriate code can be produced both for performing the operations and
for storing the results.
We call attributes which can be determined at compile time in this way manifest; attributes
that can only be determined by running the program are known as latent. The distinction
between manifest and latent properties is not very clear cut and depends to a certain extent
on questions of taste. Do we, for example, take the value of 2 + 3 to be manifest or latent?
There may well be a useful and precise definition—on the other hand there may not. In
either case at present we are less interested in the demarcation problem than in properties
which are clearly on one side or other of the boundary.
3.6.3. Dynamic type determination. The decision in CPL to make types a manifest prop-
erty of expressions was a deliberate one of language design. The opposite extreme is also
worth examining. We now decide that types are to be attributes of R-values only and that
any type of R-value may be assigned to any L-value. We can settle difficulties about stor-
age by requiring that all types occupy the same storage space, but how do we ensure that
the correct operations are performed for polymorphic operators? Assembly languages and
other ‘simple’ languages merely forbid polymorphism. An alternative, which has interest-
ing features, is to carry around with each R-value an indication of its type. Polymorphic
operators will then be able to test this dynamically (either by hardware or program) and
choose the appropriate version.
This scheme of dynamic type determination may seem to involve a great deal of extra
work at run time, and it is true that in most existing computers it would slow down pro-
grams considerably. However the design of central processing units is not immutable and
logical hardware of the sort required to do a limited form of type determination is relatively
cheap. We should not reject a system which is logically satisfactory merely because today’s
computers are unsuitable for it. If we can prove a sufficient advantage for it machines
with the necessary hardware will ultimately appear even if this is rather complicated; the
introduction of floating-point arithmetic units is one case when this has already happened.
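A minimal sketch of this scheme (not from the paper): each R-value carries a run-time tag, and a polymorphic operator tests the tags and inserts transfer functions as required.

    data Value = VInt Int | VReal Double   -- the tag is the constructor

    -- A polymorphic '+' chosen dynamically from the operand tags.
    plus :: Value -> Value -> Value
    plus (VInt a)  (VInt b)  = VInt  (a + b)
    plus (VReal a) (VReal b) = VReal (a + b)
    plus (VInt a)  (VReal b) = VReal (fromIntegral a + b)   -- transfer: int to real
    plus (VReal a) (VInt b)  = VReal (a + fromIntegral b)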
3.6.4. Polymorphism. The difficulties of dealing with polymorphic operators are not re-
moved by treating types dynamically (i.e., making them latent). The problems of choosing
the correct version of the operator and inserting transfer functions if required remain more
or less the same. The chief difference in treating types as manifest is that this information
has to be made available to the compiler. The desire to do this leads to an examination
of the various forms of polymorphism. There seem to be two main classes, which can be
called ad hoc polymorphism and parametric polymorphism.
In ad hoc polymorphism there is no single systematic way of determining the type of the
result from the type of the arguments. There may be several rules of limited extent which
reduce the number of cases, but these are themselves ad hoc both in scope and content. All
the ordinary arithmetic operators and functions come into this category. It seems, moreover,
that the automatic insertion of transfer functions by the compiling system is limited to this
class.
Parametric polymorphism is more regular and may be illustrated by an example. Suppose
f is a function whose argument is of type α and whose result is of type β (so that the type of
f might be written α ⇒ β), and that L is a list whose elements are all of type α (so that
the type of L is α list). We can imagine a function, say Map, which applies f in turn to
each member of L and makes a list of the results. Thus Map[f,L] will produce a β list.
We would like Map to work on all types of list provided f was a suitable function, so that
Map would have to be polymorphic. However its polymorphism is of a particularly simple
parametric type which could be written
(α ⇒ β, α list) ⇒ β list
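For illustration (not in the original), Strachey's Map can be written once and for all in a language with parametric polymorphism; its inferred type is exactly the one displayed above:

    myMap :: (a -> b) -> [a] -> [b]   -- i.e. (α ⇒ β, α list) ⇒ β list
    myMap _ []       = []
    myMap f (x : xs) = f x : myMap f xs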
3.6.5. Types of functions. The type of a function includes both the types and modes of
calling of its parameters and the types of its results. That is to say, in more mathematical
terminology, that it includes the domain and the range of the function. Although this seems
a reasonable and logical requirement, it makes it necessary to introduce the parametric
polymorphism discussed above as without it functions such as Map have to be redefined
almost every time they are used.
Some programming languages allow functions with a variable number of arguments;
these are particularly popular for input and output. They will be known as variadic functions,
and can be regarded as an extreme form of polymorphic function.5
A question of greater interest is whether a polymorphic function is a first class object in
the sense of Section 3.5.1. If it is, we need to know what type it is. This must clearly include
in some way the types of all its possible versions. Thus the type of a polymorphic function
includes or specifies in some way the nature of its polymorphism. If, as in CPL, the types
are manifest, all this information must be available to the compiler. Although this is not
impossible, it causes a considerable increase in the complexity of the compiler and exerts a
strong pressure either to forbid programmers to define new polymorphic functions or even
to reduce all polymorphic functions to second class status. A decision on these points has
not yet been taken for CPL.
3.7.1. List processing. While programming was confined to problems of numerical anal-
ysis the need for general forms of data structure was so small that it was often ignored.
For this reason ALGOL, which is primarily a language for numerical problems, contains no
structure other than arrays. COBOL, being concerned with commercial data processing, was
inevitably concerned with larger and more complicated structures. Unfortunately, however,
the combined effect of the business man’s fear of mathematics and the mathematician’s
contempt for business ensured that this fact had no influence on the development of general
programming languages.
It was not until mathematicians began using computers for non-numerical purposes—
initially in problems connected with artificial intelligence—that any general forms of com-
pound data structure for programming languages began to be discussed. Both IPL V and
LISP used data structures built up from lists and soon a number of other ‘List Processing’
languages were devised.
The characteristic feature of all these languages is that they are designed to manipulate
more or less elaborate structures, which are built up from large numbers of components
drawn from a very limited number of types. In LISP, for instance, there are only two sorts
of object, an atom and a cons-word which is a doublet. The crucial feature is that each
member of a doublet can itself be either an atom or another cons-word. Structures are built
up by joining together a number of cons-words and atoms.
This scheme of building up complex structures from numbers of similar and much simpler
elements has a great deal to recommend it. In some sense, moreover, the doublet of LISP
is the simplest possible component from which to construct a structure and it is certainly
possible to represent any other structure in terms of doublets. However from the practical
point of view, not only for economy of implementation but also for convenience in use, the
logically simplest representation is not always the best.
The later list processing languages attempted to remedy this by proposing other forms
of basic building block with more useful properties, while still, of course, retaining the
main plan of using many relatively simple components to form a complex structure. The
resulting languages were generally very much more convenient for some classes of problems
(particularly those they had been designed for) and much less suitable (possibly on grounds
of efficiency) for others. They all, however, had an ad hoc look about them and arguments
about their relative merits seemed somewhat unreal.
In about 1965 or 1966 interest began to turn to more general schemes for compound
data structures which allowed the programmer to specify his own building blocks in some
very general manner rather than having to make do with those provided by the language
designer. Several such schemes are now around and in spite of being to a large extent
developed independently they have a great deal in common—at least as far as the structures
described in the next section as nodes are concerned. In order to illustrate these ideas, I
shall outline the scheme which will probably be incorporated in CPL.
3.7.2. Nodes and elements. The building blocks from which structures are formed are
known as nodes. Nodes may be of many types and the definition of a new node is in fact
the definition of a new programmer-defined type in the sense of section 3.6. A node may
be defined to consist of one or more components; both the number and the type of each
component is fixed by the definition of the node. A component may be of any basic or
programmer-defined type (such as a node), or may be an element. This represents a data
object of one of a limited number of types; the actual type of object being represented is
determined dynamically. An element definition also forms a new programmer-defined type
in the sense of Section 3.6 and it also specifies which particular data types it may represent.
Both node and element definitions are definitions of new types, but at the same time
they are used to form certain basic functions which can be used to operate on and construct
individual objects of these types. Compound data structures may be built up from individuals
of these types by using these functions.
The following example shows the node and element definitions which allow the lists of
LISP to be formed.
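The CPL definitions themselves are not reproduced in this transcription. The following Haskell rendering is an assumed reconstruction of what they introduce (with NIL folded into Cons for brevity):

    data Cons     = Cons { car :: LispList, cdr :: Cons }
                  | Nil                          -- the special object NIL
    data Atom     = Atom { printName :: String, propertyList :: Cons }
    data LispList = AnAtom Atom | ACons Cons     -- the element: dynamically either

    -- The dynamic tests Type[p] and Is[Atom, p] become pattern matches:
    isAtom :: LispList -> Bool
    isAtom (AnAtom _) = True
    isAtom _          = False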
These definitions introduce three new types: Cons and Atom, which are nodes, and
LispList which is an element. They also define the basic selector and constructor functions
which operate on them. These functions have the following effect.
If x is an object of type Cons, it has two components associated with it; the first, which is
of manifest type LispList, is obtained by applying the appropriate selector function Car to
x, thus Car[x] is the first component of x and is of type LispList. The second component
of x is Cdr[x] and is an object of type Cons.
If p is an object of type LispList and q is an object of type Cons, we can form a fresh
node of type Cons whose first component is p and whose second component is q by using
the constructor function Cons[p,q] which always has the same name as the node type.
Thus we have the basic identities
Car[Cons[p,q]]= p
Cdr[Cons[p,q]]= q
In an exactly similar way the definition of the node Atom will also define the two selector
functions PrintName and PropertyList and the constructor function Atom.
The number of components of a node is not limited to two—any non-zero number is
allowed. There is also the possibility that any component may be the special object NIL.This
can be tested for by the system predicate Null. Thus, for example, if end of a list is indicated
by a NIL second component, we can test for this by the predicate Null[Cdr[x]].
There is also a constructor function associated with an element type. Thus, for example
if n is an atom, LispList[n] is an object of type LispList dynamically marked as being
an atom and being in fact the atom n. There are two general system functions which apply
to elements, both are concerned with finding their dynamically current type. The function
Type[p] where p is a LispList will have the result either Atom or Cons according to the
current type of p. In a similar way the system predicate Is[Atom,p] will have the value
true if p is dynamically of type Atom.
These definitions give the basic building blocks of LISP using the same names, with the
exception of Atom. In LISP, Atom[p] is the predicate which would be written here as
Is[Atom,p]; here we use the function Atom to construct a new atom from a PrintName and a
PropertyList.
Consider the effect on a structure A of the assignment

    Car[Car[A]] := Cdr[Cdr[A]]

This is carried out in three steps: (1) find the L-value of Car[Car[A]]; (2) find the R-value
of Cdr[Cdr[A]]; (3) place that R-value in the location so found. (The diagrams tracing these
steps on an example structure are omitted here.)
(1) and (2) may be carried out in either order as neither actually alters the structure.
Notice that this assignment has changed the pattern of sharing in the structure so that
now Car[Car[Car[A]]] and Car[Cdr[Cdr[A]]] actually share the same L-value (and
hence also the same R-value). This is because an assignment statement only takes a copy
of the R-value of its right-hand side, not a copy of all the information associated with it. In
this respect, structures are similar to functions whose FVL is not copied on assignment.
Thus, as with functions, the R-value of a compound data structure gives access to all the
information in the structure but does not contain it all, so that the distinction between fixed
and free applies as much to structures as it does to functions.
3.7.4. Implementation. The discussion of R- and L-values of nodes has so far been quite
general. I have indicated what information must be available, but in spite of giving diagrams
I have not specified in any way how it should be represented. I do not propose to go into
problems of implementation in any detail—in any case many of them are very machine
dependent—but an outline of a possible scheme may help to clarify the concepts.
Suppose we have a machine with a word length which is a few bits longer than a single
address. The R-value of a node will then be an address pointing to a small block of
consecutive words, one for each component, containing the R-values of the components.
An element requires for its R-value an address (e.g., the R-value of a node) and a marker to
say which of the various possibilities is its dynamically current type. (There should be an
escape mechanism in case there are too few bits available for the marker.) The allocation
and control of storage for these nodes presents certain difficulties. A great deal of work has
been done on this problem and workable systems have been devised. Unfortunately there
is no time to discuss these here.
If we use an implementation of this sort for our example in the last section, we shall find
that nodes of type Cons will fill two consecutive words. The 'puppet string' R-values can
be replaced by the address of the first of these words, so that the diagram can be redrawn in
this form (the diagrams are omitted here), and the assignment

    Car[Car[A]] := Cdr[Cdr[A]]

can then be traced directly on the two-word representation.
3.7.5. Programming example. The following example shows the use of a recursively de-
fined routine which has a structure as a parameter and calls it by reference (L-value). A
tree sort takes place in two phases. During the first the items to be sorted are supplied in
sequence as arguments to the routine AddtoTree. The effect is to build up a tree structure
with an item and two branches at each node. The following node definitions define the
necessary components.
Here the key on which the sort is to be performed is an integer and the rest of the
information is of type Body. The routine for the first phase, AddtoTree (its CPL text is not
reproduced here), builds up a tree in which all the items accessible from the Pre (predecessor)
branch of a Knot precede (i.e., have smaller keys than) the item at the Knot itself, and this
in turn precedes all those accessible from the Suc (successor) branch.
(A diagram of such a tree is omitted; in it the central branch at each Knot stands for the entire data-item.)
The second phase of a tree sort forms a singularly elegant example of the use of a
recursively defined routine. Its purpose is effectively to traverse the tree from left to right
printing out the data-items at each Knot. The way the tree has been built up ensures that
the items will be in ascending order of Keys.
We suppose that we have a routine PrintBody which will print information in a data-item
in the required format. The following routine will then print out the entire tree.
rec PrintTree[Knot:x] is
§ Unless Null[x] do
§ PrintTree[Pre[x]]
PrintBody[Rest[Item[x]]]
PrintTree[Suc[x]] §|
return §|
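Since the node definitions and AddtoTree are missing above, the following Haskell sketch of both phases is a reconstruction under stated assumptions: the key is an Int as in the text, the Body type is taken to be String, and the tree is rebuilt rather than updated by reference as in the CPL version.

    data Item = Item { key :: Int, rest :: String }   -- Rest of the data-item
    data Knot = Null | Knot { item :: Item, pre :: Knot, suc :: Knot }

    -- Phase 1: insert an item; smaller keys go down the Pre branch.
    addToTree :: Item -> Knot -> Knot
    addToTree i Null = Knot i Null Null
    addToTree i (Knot j p s)
      | key i < key j = Knot j (addToTree i p) s
      | otherwise     = Knot j p (addToTree i s)

    -- Phase 2: traverse left to right, so the keys come out ascending.
    printTree :: Knot -> IO ()
    printTree Null         = return ()
    printTree (Knot i p s) = printTree p >> putStrLn (rest i) >> printTree s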
3.7.6. Pointers. There is no reason why an R-value should not represent (or be) a location;
such objects are known as pointers. Suppose X is a real variable with an L-value α. Then
if P is an object whose R-value is α, we say the type of P is real pointer and that P
‘points to’ X. Notice that the type of a pointer includes the type of the thing it points to, so
that pointers form an example of parametric type. (Arrays form another.) We could, for
example, have another pointer Q which pointed to P; in this case Q would be of type real
pointer pointer.
There are two basic (polymorphic) functions associated with pointers:
Follow[P] (also written ↓ P in CPL) calls its argument by R-value and produces as a
result the L-value of the object pointed to. This is, apart from changes of representation,
the same as its argument. Thus we have
L-value of Follow[P] = P
R-value of Follow[P] = Contents of P
The function Pointer[X] calls its argument by L-value and produces as a result an
R-value which is a pointer to X.
Thus, for example, Follow[Pointer[X]] behaves, apart from representation, exactly like X
itself, while commands such as

    P := Follow[Y]
    ↓ P := ↓ P + 2

assign a new R-value to the pointer P and then update, through P, the location it points to.
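As an illustration in a present-day language (not part of the paper), mutable references play the role of pointers; the variable names below are invented:

    import Data.IORef

    example :: IO ()
    example = do
      x <- newIORef (0 :: Int)   -- a variable X; the IORef plays Pointer[X]
      let p = x                  -- P points to X
      v <- readIORef p           -- Follow[P] (i.e. ↓P) used as an R-value
      writeIORef p (v + 2)       -- ↓P := ↓P + 2
      readIORef x >>= print      -- X is now 2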
3.7.7. Other forms of structure. Vectors and arrays are reasonably well understood. They
are parametric types so that the type of an array includes its dimensionality (the number of
its dimensions but not their size) and also the type of its elements. Thus unlike in nodes,
all the elements of an array have to be of the same type, though their number may vary
dynamically. It is convenient, though perhaps not really necessary, to regard an n-array
(i.e., one with n dimensions) as a vector whose elements are (n − 1)-arrays.
We can then regard the R-value of a vector as something rather similar to that of a node
in that it gives access (or points to) the elements rather than containing them. Thus the
assignment of a vector does not involve copying its elements.
Clearly if this is the case we need a system function Copy (or possibly CopyVector)
which does produce a fresh copy.
There are many other possible parametric structure types which are less well understood.
The following list is certainly incomplete.
List An ordered sequence of objects all of the same type. The number is dynamically
variable.
Ntuple A fixed (manifest) number of objects all of the same type. This has many advan-
tages for the implementer.
Set In the mathematical sense. An unordered collection of objects all of which are of
the same type but different from each other. Operations on sets have been proposed for
some languages. The lack of ordering presents considerable difficulty.
Bag or Coll This is a new sort of collection for which there is, as yet, no generally
accepted name. It consists of an unordered collection of objects all of which are of the
same type and differs from a set in that repetitions are allowed. (The name bag is derived
from probability problems concerned with balls of various colours in a bag.) A bag is
frequently the collection over which an iteration is required—e.g., when averaging.
There are also structures such as rings which cannot be ‘syntactically’ defined in the
manner of nodes. They will probably have to be defined in terms of the primitive functions
which operate on them or produce them.
It is easy enough to include any selection of these in a programming language, but the
result would seem rather arbitrary. We still lack a convincing way of describing those and
any other extensions to the sort of structures that a programmer may want to use.
4. Miscellaneous topics
In this section we take up a few points whose detailed discussion would have been out of
place before.
4.1. Load-Update Pairs

A general L-value (location) has two important features: there is a function which gives the
corresponding R-value (contents) and another which will update this. If the location is not
simply addressable, it can therefore be represented by a structure with two components—a
Load part and an Update part; these two can generally share a common FVL. Such an
L-value is known as a Load-Update Pair (LUP). We can now represent any location of type
α by an element (in the sense of Section 3.7.2); the element definitions themselves are omitted
here. Note that these are parametrically polymorphic definitions. There is also a constraint
on the components of a LUP: if X is an α LUP and y is of type α, then

    y = value of § Update[X][y]
                   result is Load[X] §|
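A minimal sketch of a LUP in Haskell (not from the paper), with the store made explicit as a value of type s:

    -- A location of type a living in a store of type s: a load function
    -- paired with an update function.
    data LUP s a = LUP { load :: s -> a, update :: a -> s -> s }

    -- The constraint quoted above: after updating with y, loading gives y back.
    lawful :: Eq a => LUP s a -> s -> a -> Bool
    lawful l s y = load l (update l y s) == y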
LUPs are of considerable practical value even when using machine code. A uniform
system which tests a general location to see if it is addressable or not (in which case it is a
LUP)—say by testing a single bit—can then use the appropriate machine instruction (e.g.
CLA or STO) or apply the appropriate part of the LUP. This allows all parts of the machine to
be treated in a uniform manner as if they were all addressable. In particular index registers,
which may need loading by special instruction, can then be used much more freely.
Another interesting example of the use of a LUP is in dealing with the registers which
set up the peripheral equipment. In some machines these registers can be set but not read
by the hardware. Supervisory programs are therefore forced to keep a copy of their settings
in normal store, and it is quite easy to fail to keep these two in step. If the L-value of the
offending register is a LUP, and it is always referred to by this, the Update part can be made
to change both the register and its copy, while the Load part reads from the copy.
The importance of this use of LUPs is that it reduces the number of ad hoc features of the
machine and allows much greater uniformity of treatment. This in turn makes it easier for
programmers at the machine code level to avoid oversights and other errors and, possibly
more important, makes it easier to write the software programs dealing with these parts of
the machine in a high level language and to compile them.
The disadvantage in current machines is that, roughly speaking, every indirect reference
requires an extra test to see if the location is addressable. Although this may be unaccept-
able for reasons of space or time (a point of view which requires the support of much more
convincing reasons than have yet been given), it would be a relatively insignificant extra
complication to build a trap into the hardware for this test. It is the job of people investi-
gating the fundamental concepts of programming to isolate the features such as this whose
incorporation in the hardware of a machine would allow or encourage the simplification of
its software.
4.2. Macrogenerators
Throughout this course, I have adopted the point of view that programming languages are
dealing with abstract objects (such as numbers or functions) and that the details of the way
in which we represent these are of relatively secondary importance. It will not have escaped
many readers that in the computing world, and even more so in the world of mathematicians
today, this is an unfashionable if not heretical point of view. A much more conventional
view is that a program is a symbol string (with the strong implication that it is nothing more),
a programming language the set of rules for writing down legal strings, and mathematics
in general a set of rules for manipulating strings.
The outcome of this attitude is a macrogenerator whose function is to manipulate or
generate symbol strings in programming languages without any regard to their semantic
content. Typically such a macrogenerator produces ‘code’ in some language which is already
implemented on the machine and whose detailed representation must be familiar to anyone
writing further macro definitions. It will be used to extend the power of the base language,
although generally at the expense of syntactic convenience and often transparency, by adding
new macrocommands.
This process should be compared with that of functional abstraction and the definition
of functions and routines. Both aim to extend the power of the language by introducing
new operations. Both put a rather severe limit on the syntactic freedom with which the
extensions can be made.
The difference lies in the fact that macrogenerators deal with the symbols which represent
the variables, values and other objects of concern to a program so that all their manipulation
is performed before the final compiling. In other words all macrogeneration is manifest.
Function and routine definitions on the other hand are concerned with the values themselves,
not with the symbols which represent them and thus, in the first instance are dynamic (or
latent) rather than manifest.
The distinction is blurred by the fact that the boundary between manifest and latent is
not very clear cut, and also by the fact that it is possible by ingenuity and at the expense of
clarity to do by a macrogenerator almost everything that can be done by a function definition
and vice versa. However the fact that it is possible to push a pea up a mountain with your
nose does not mean that this is a sensible way of getting it there. Each of these techniques
of language extension should be used in its proper place.
Macrogeneration seems to be particularly valuable when a semantic extension of the
language is required. If this is one which was not contemplated by the language designer the
only alternative to trickery with macros is to rewrite the compiler—in effect to design a new
language. This has normally been the situation with machine code and assembly languages
and also to a large extent with operating systems. The best way to avoid spending all your
time fighting the system (or language) is to use a macrogenerator and build up your own.
However with a more sophisticated language the need for a macrogenerator diminishes,
and it is a fact that ALGOL systems on the whole use macrogenerators very rarely. It is,
I believe, a proper aim for programming language designers to try to make the use of
macrogenerators wholly unnecessary.
Section 3.3 gives an outline of a possible method for formalising the semantics of program-
ming languages. It is a development of an earlier proposal [8], but it is far from complete
and cannot yet be regarded as adequate.
There are at present (Oct. 1967) only three examples of the formal description of the
semantics of a real programming language, as opposed to those which deal with emasculated
versions of languages with all the difficulties removed. These are the following:
(i) Landin’s reduction of ALGOL to λ-expressions with the addition of assignments and
jumps. This requires a special form of evaluating mechanism (which is, of course, a
notional computer) to deal with the otherwise non-applicative parts of the language.
The method is described in [6] and given in full in [9].
(ii) de Bakker [10] has published a formalisation of most of ALGOL based on an extension
of Markov algorithms. This is an extreme example of treating the language as a symbol
string. It requires no special machine except, of course, the symbol string manipulator.
(iii) A team at the IBM Laboratories in Vienna have published [12, 13] a description of PL/I
which is based on an earlier evaluating mechanism for pure λ-expressions suggested
by Landin [11] and the concept of a state vector for a machine suggested by McCarthy
[14]. This method requires a special ‘PL/I machine’ whose properties and transition
function are described. The whole description is very long and complex and it is hard
to determine how much of this complexity is due to the method of semantic description
and how much to the amorphous nature of PL/I.
The method suggested in Section 3.3 has more in common with the approach of Landin
or the IBM team than it has with de Bakker's. It differs, however, in that the ultimate
machine required (and all methods of describing semantics come to a machine ultimately)
is in no way specialised. Its only requirement is that it should be able to evaluate pure
λ-expressions. It achieves this result by explicitly bringing in the store of the computer in
an abstract form, an operation which brings with it the unexpected bonus of being able to
distinguish explicitly between manifest and latent properties. However until the whole of a
real language has been described in these terms, it must remain as a proposal for a method,
rather than a method to be recommended.
Notes
1. This is the CPL notation for a conditional expression which is similar to that used by LISP. In ALGOL the
equivalent would be if a > b then j else k.
2. The ALGOL equivalent of this would have to be if a > b then j := i else k := i.
3. ALGOL 60 call by name. Let f be an ALGOL procedure which calls a formal parameter x by name. Then a call
of f with an actual parameter expression ε will have the same effect as forming a parameterless procedure λ().ε
and supplying this by value to a procedure f∗ which is derived from f by replacing every written occurrence
of x in the body of f by x(). The notation λ().ε denotes a parameterless procedure whose body is ε, while
x() denotes its application (to a null parameter list).
4. The only elementary operator to which this rule does not already apply is exponentiation. Thus, for example,
if a and b are both integers a^b will be an integer if b ≥ 0 and a real if b < 0. If a and b are reals, the type of a^b
depends on the sign of a as well as that of b. In CPL this leads to a definition of a ↑ b which differs slightly in
its domain from a^b.
5. By analogy with monadic, dyadic and polyadic for functions with one, two and many arguments. Functions
with no arguments will be known as anadic. Unfortunately there appears to be no suitable Greek prefix meaning
variable.
References
1. Barron, D.W., Buxton, J.N., Hartley, D.F., Nixon, E., and Strachey, C. The main features of CPL. Comp. J.
6 (1963) 134–143.
2. Buxton, J.N., Gray, J.C., and Park, D. CPL elementary programming manual, Edition II. Technical Report,
Cambridge, 1966.
3. Strachey, C. (Ed.). CPL working papers. Technical Report, London and Cambridge Universities, 1966.
4. Quine, W.V. Word and Object. New York Technology Press and Wiley, 1960.
5. Schönfinkel, M. Über die Bausteine der mathematischen Logik. Math. Ann. 92 (1924) 305–316.
6. Landin, P.J. A formal description of ALGOL 60. In Formal Language Description Languages for Computer
Programming, T.B. Steel (Ed.). North Holland Publishing Company, Amsterdam, 1966, pp. 266–294.
7. Curry, H.B. and Feys, R. Combinatory Logic, Vol. 1, North Holland Publishing Company, Amsterdam, 1958.
8. Strachey, C. Towards a formal semantics. In Formal Language Description Languages for Computer Pro-
gramming, T.B. Steel (Ed.). North Holland Publishing Company, Amsterdam, 1966, pp. 198–216.
9. Landin, P.J. A correspondence between ALGOL 60 and Church’s Lambda notation. Comm. ACM 8 (1965)
89–101, 158–165.
10. de Bakker, J.W. Mathematical Centre Tracts 16: Formal Definition of Programming Languages. Mathematisch
Centrum, Amsterdam, 1967.
11. Landin, P.J. The Mechanical Evaluation of Expressions. Comp. J. 6 (1964) 308–320.
12. PL/I—Definition Group of the Vienna Laboratory. Formal definition of PL/I. IBM Technical Report TR
25.071, 1966.
13. Alber, K. Syntactical description of PL/I text and its translation into abstract normal form. IBM Technical
Report TR 25.074, 1967.
14. McCarthy, J. Problems in the theory of computation. In Proc. IFIP Congress 1965, Vol. 1, W.A. Kalenich
(Ed.). Spartan Books, Washington, 1965, pp. 219–222.
Computational lambda-calculus and monads
Eugenio Moggi∗
Lab. for Found. of Comp. Sci.
University of Edinburgh
EH9 3JZ Edinburgh, UK
On leave from Univ. di Pisa
The methodology outlined above is inspired by [13]², and it is followed in [11, 8] to obtain the λp-calculus. The view that "category theory comes, logically, before the λ-calculus" led us to consider a categorical semantics of computations first, rather than to modify directly the rules of βη-conversion to get a correct calculus.

² "I am trying to find out where λ-calculus should come from, and the fact that the notion of a cartesian closed category is a late developing one (Eilenberg & Kelly (1966)), is not relevant to the argument: I shall try to explain in my own words in the next section why we should look to it first."

A type theoretic approach to partial functions and computations is attempted in [1] by introducing a type constructor Ā, whose intuitive meaning is the set of computations of type A. Our categorical semantics is based on a similar idea. Constable and Smith, however, do not adequately capture the general axioms for computations (as we do), since they lack a general notion of model and rely instead on operational, domain- and recursion-theoretic intuition.

1 A categorical semantics of computations

The basic idea behind the semantics of programs described below is that a program denotes a morphism from A (the object of values of type A) to TB (the object of computations of type B).

This view of programs corresponds to call-by-value parameter passing, but there is an alternative view of "programs as functions from computations to computations" corresponding to call-by-name (see [10]). In any case, the real issue is that the notions of value and computation should not be confused. By taking call-by-value we can stress better the importance of values. Moreover, call-by-name can be more easily represented in call-by-value than the other way around.

There are many possible choices for TB corresponding to different notions of computations; for instance, in the category of sets the set of partial computations (of type B) is the lifting B + {⊥} and the set of non-deterministic computations is the powerset P(B). Rather than focus on specific notions of computations, we will identify the general properties that the object TB of computations must have. The basic requirement is that programs should form a category, and the obvious choice for it is the Kleisli category for a monad.

Definition 1.1 A monad over a category C is a triple (T, η, µ), where T: C → C is a functor, η: Id_C → T and µ: T² → T are natural transformations and the following equations hold:

• µ_TA; µ_A = T(µ_A); µ_A
• η_TA; µ_A = id_TA = T(η_A); µ_A

A computational model is a monad (T, η, µ) satisfying the mono requirement: η_A is a mono for every A ∈ C.

There is an alternative description of a monad (see [7]), which is easier to justify computationally.

Definition 1.2 A Kleisli triple over C is a triple (T, η, _*), where T: Obj(C) → Obj(C), η_A: A → TA, f*: TA → TB for f: A → TB, and the following equations hold:

• (η_A)* = id_TA
• η_A; f* = f
• f*; g* = (f; g*)*

Every Kleisli triple (T, η, _*) corresponds to a monad (T, η, µ) where T(f: A → B) = (f; η_B)* and µ_A = (id_TA)*.

Intuitively η_A is the inclusion of values into computations and f* is the extension of a function f from values to computations to a function from computations to computations, which first evaluates a computation and then applies f to the resulting value. The equations for Kleisli triples say that programs form a category, the Kleisli category C_T, where the set C_T(A, B) of morphisms from A to B is C(A, TB), the identity over A is η_A and composition of f followed by g is f; g*. Although the mono requirement is very natural there are cases in which it seems appropriate to drop it; for instance, it may not be satisfied by the monad of continuations.

Before going into more details we consider some examples of monads over the category of sets.

Example 1.3 Non-deterministic computations:

• T(_) is the covariant powerset functor, i.e. T(A) = P(A) and T(f)(X) is the image of X along f
• η_A(a) is the singleton {a}
• µ_A(X) is the big union ∪X

Computations with side-effects:

• T(_) is the functor (_ × S)^S, where S is a nonempty set of stores. Intuitively a computation takes a store and returns a value together with the modified store.
• η_A(a) is (λs: S.⟨a, s⟩)
• µ_A(f) is (λs: S.eval(f s)), i.e. the computation that, given a store s, first computes the pair computation-store ⟨f′, s′⟩ = f s and then returns the pair value-store ⟨a, s″⟩ = f′ s′.

Continuations:

• T(_) is the functor R^(R^(_)), where R is a nonempty set of results. Intuitively a computation takes a continuation and returns a result.
• η_A(a) is (λk: R^A. k a)
• µ_A(f) is (λk: R^A. f(λh: R^(R^A). h k))

One can verify for himself that other notions of computation (e.g. partial, probabilistic or non-deterministic with side-effects) fit in the general definition of monad.
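Readers who know a functional language may recognise Definition 1.2: a Kleisli triple is exactly what Haskell calls a Monad, with return for η and (>>=) for the extension operation (m >>= f applies f* to m). The following sketch, not part of the paper, spells this out for the non-determinism example, with finite powersets taken to be lists:

    type ND a = [a]                 -- finite 'powersets' as lists

    etaND :: a -> ND a              -- η: form the singleton
    etaND a = [a]

    extND :: (a -> ND b) -> (ND a -> ND b)   -- the extension f |-> f*
    extND f = concat . map f

    -- The three Kleisli equations become, in this notation:
    --   extND etaND       = id
    --   extND f . etaND   = f
    --   extND g . extND f = extND (extND g . f)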
1.1 A simple language

We introduce a programming language (with existence and equivalence assertions), where programs denote morphisms in the Kleisli category C_T corresponding to a computational model (T, η, µ) over a category C. The language is oversimplified (for instance terms have exactly one free variable) in order to define its interpretation in any computational model. The additional structure required to interpret λ-terms will be introduced incrementally (see Section 2), after computations have been understood and axiomatized in isolation.

The programming language is parametric in a signature (i.e. a set of base types and unary command symbols), therefore its interpretation in a computational model is parametric in an interpretation of the symbols in the signature. To stress the fact that the interpretation is in C_T (rather than C), we use τ1 ⇀ τ2 (instead of τ1 → τ2) as arities and ≡ : τ (instead of = : Tτ) as equality of computations of type τ.

• Given an interpretation [[A]] for any base type A, i.e. an object of C_T, then the interpretation of a type τ ::= A | Tτ is an object [[τ]] of C_T defined in the obvious way, [[Tτ]] = T[[τ]].

• Given an interpretation [[p]] for any unary command p of arity τ1 ⇀ τ2, i.e. a morphism from [[τ1]] to [[τ2]] in C_T, then the interpretation of a well-formed program x: τ ⊢ e: τ′ is a morphism [[x: τ ⊢ e: τ′]] in C_T from [[τ]] to [[τ′]] defined by induction on the derivation of x: τ ⊢ e: τ′ (see Table 1).

• On top of the programming language we consider equivalence and existence assertions (see Table 2).

Remark 1.4 The let-constructor is very important semantically, since it corresponds to composition in the Kleisli category C_T, while substitution corresponds to composition in C. In the λ-calculus (let x=e in e′) is usually treated as syntactic sugar for (λx.e′)e, and this can be done also in the λc-calculus. However, we think that this is not the right way to proceed, because it amounts to understanding the let-constructor, which makes sense in any computational model, in terms of constructors that make sense only in λc-models. On the other hand, (let x=e in e′) cannot be reduced to the more basic substitution (i.e. e′[x := e]) without collapsing C_T to C.

The existence assertion e ↓ means that e denotes a value, and it generalizes the existence predicate used in the logic of partial terms/elements; for instance:

• a partial computation exists iff it terminates;
• a non-deterministic computation exists iff it gives exactly one result;
• a computation with side-effects exists iff it does not change the store.

2 Extending the language

In this section we describe the additional structure required to interpret λ-terms in a computational model. It is well-known that λ-terms can be interpreted in a cartesian closed category (ccc), so one expects that a monad over a ccc would suffice; however, there are two problems:

• the interpretation of (let x=e in e′), when e′ has other free variables beside x, and
• the interpretation of functional types.

Example 2.1 To show why the interpretation of the let-constructor is problematic, we try to interpret x1: τ1 ⊢ (let x2=e2 in e): τ, when both x1 and x2 are free in e. Suppose that g2: τ1 → Tτ2 and g: τ1 × τ2 → Tτ are the interpretations of x1: τ1 ⊢ e2: τ2 and x1: τ1, x2: τ2 ⊢ e: τ respectively. If T were Id_C, then [[x1: τ1 ⊢ (let x2=e2 in e): τ]] would be ⟨id_τ1, g2⟩; g. In the general case, Table 1 says that ; above is indeed composition in the Kleisli category, therefore ⟨id_τ1, g2⟩; g becomes ⟨id_τ1, g2⟩; g*. But in ⟨id_τ1, g2⟩; g* there is a type mismatch, since the codomain of ⟨id_τ1, g2⟩ is τ1 × Tτ2, while the domain of g* is T(τ1 × τ2).
The problem is that the monad and cartesian products alone do not give us the ability to transform a pair value-computation (or computation-computation) into a computation of a pair. What is needed is a morphism t_{A,B} from A × TB to T(A × B), so that x1: τ1 ⊢ (let x2=e2 in e): τ will be interpreted by ⟨id_τ1, g2⟩; t_{τ1,τ2}; g*.

Similarly, for interpreting x: τ ⊢ p(e1, e2): τ′ we need a morphism ψ_{A,B}: TA × TB → T(A × B), which given a pair of computations returns a computation computing a pair, so that, when g_i: τ → Tτ_i is the interpretation of x: τ ⊢ e_i: τ_i, then [[x: τ ⊢ p(e1, e2): τ′]] is ⟨g1, g2⟩; ψ_{τ1,τ2}; [[p]]*.

Definition 2.2 A strong monad over a category C with finite products is a monad (T, η, µ) together with a natural transformation t_{A,B} from A × TB to T(A × B) s.t.

    t_{1,A}; T(r_A) = r_TA
    t_{A×B,C}; T(α_{A,B,C}) = α_{A,B,TC}; (id_A × t_{B,C}); t_{A,B×C}
    (id_A × η_B); t_{A,B} = η_{A×B}
    (id_A × µ_B); t_{A,B} = t_{A,TB}; T(t_{A,B}); µ_{A×B}

where r and α are the natural isomorphisms

• r_A: 1 × A → A
• α_{A,B,C}: (A × B) × C → A × (B × C)

Remark 2.3 The natural transformation t with the above properties is not the result of some ad hoc considerations; instead it can be obtained via the following general principle:

    when interpreting a complex language the 2-category Cat of small categories, functors and natural transformations may not be adequate and one may have to use a different 2-category which captures better some fundamental structures underlying the language.

Since monads and adjunctions are 2-category concepts, the most natural way to model computations (and datatypes) for more complex languages is simply by monads (and adjunctions) in a suitable 2-category. Following this general principle we can give two explanations for t, one based on enriched categories (see [4]) and the other on indexed categories (see [3]).

The first explanation takes as fundamental a commutative monoidal structure on C, which models the tensor product of linear logic (see [6, 14]). If C is a monoidal closed category, in particular a ccc, then it can be enriched over itself by taking C(A, B) to be the object B^A. The equations for t are taken from [5], where a one-one correspondence is established between functorial and tensorial strengths³:

• the first two equations say that t is a tensorial strength of T, so that T is a C-enriched functor.
• the last two equations say that η and µ are natural transformations between C-enriched functors, namely η: Id_C → T and µ: T² → T.

³ A functorial strength for an endofunctor T is a natural transformation st_{A,B}: B^A → (TB)^(TA) which internalizes the action of T on morphisms.

So a strong monad is just a monad over C enriched over itself in the 2-category of C-enriched categories.

The second explanation was suggested to us by G. Plotkin, and takes as fundamental structure a class D of display maps over C, which models dependent types (see [2]), and induces a C-indexed category C/D. Then a strong monad over a category C with finite products amounts to a monad over C/D in the 2-category of C-indexed categories, where D is the class of first projections (corresponding to constant type dependency).

In general the natural transformation t has to be given as an extra parameter for models. However, t is uniquely determined (but it may not exist) by T and the cartesian structure on C, when C has enough points.

Proposition 2.4 If (T, η, µ) is a monad over a category C with finite products and enough points (i.e. for any f, g: A → B, if h; f = h; g for every point h: 1 → A then f = g), and t_{A,B} is a family of morphisms s.t. for all points a: 1 → A and b: 1 → TB

    ⟨a, b⟩; t_{A,B} = b; T(⟨!_B; a, id_B⟩)

where !_B is the unique morphism from B to the terminal object 1, then (T, η, µ, t) is a strong monad over C.

Remark 2.5 The tensorial strength t induces a natural transformation ψ_{A,B} from TA × TB to T(A × B), namely

    ψ_{A,B} = c_{TA,TB}; t_{TB,A}; (c_{TB,A}; t_{A,B})*

where c is the natural isomorphism

• c_{A,B}: A × B → B × A

The morphism ψ_{A,B} has the correct domain and codomain to interpret the pairing of a computation of type A with one of type B (obtained by first evaluating the first argument and then the second).
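In a functional language the strength t, and the pairing ψ just defined, can be written once for every monad; a sketch, not from the paper:

    tStr :: Monad t => (a, t b) -> t (a, b)   -- the strength t_{A,B}
    tStr (a, tb) = tb >>= \b -> return (a, b)

    psi :: Monad t => (t a, t b) -> t (a, b)  -- ψ: evaluate the first argument first
    psi (ta, tb) = ta >>= \a -> tb >>= \b -> return (a, b)

    -- For t = [] (non-determinism): psi ([1,2], "ab")
    -- yields [(1,'a'),(1,'b'),(2,'a'),(2,'b')].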
There is also a dual notion of pairing, ψ̃_{A,B} = c_{A,B}; ψ_{B,A}; T(c_{B,A}) (see [5]), which amounts to first evaluating the second argument and then the first.

The reason why a functional type A → B in a programming language (like ML) cannot be interpreted by the exponential B^A (as done in a ccc) is fairly obvious; in fact the application of a functional procedure to an argument requires some computation to be performed before producing a result. By analogy with partial cartesian closed categories (see [8, 11]), we will interpret functional types by exponentials of the form (TB)^A.

Definition 2.6 A λc-model over a category C with finite products is a strong monad (T, η, µ, t) together with a T-exponential for every pair ⟨A, B⟩ of objects in C, i.e. a pair

    ⟨(TB)^A, eval_{A,TB}: ((TB)^A × A) → TB⟩

3 The λc-calculus

We claim that the formal system is sound and complete w.r.t. interpretation in λc-models. Soundness amounts to showing that the inference rules are admissible in any λc-model, while completeness amounts to showing that any λc-theory has an initial model (given by a term-model construction). The inference rules of the λc-calculus are partitioned as follows:

• general rules for terms denoting computations, but with variables ranging over values (see Table 4)⁵
• the inference rules for the let-constructor and types of computations (see Table 5)
• the inference rules for product and functional types (see Table 6)

Remark 3.1 A comparison among the λc-, λv- and λp-calculus shows that:
Table 1 (interpretation of the simple language of Section 1.1) and Table 2 (interpretation of assertions):

let:  x: τ ⊢ e1: τ1 = g1   and   x1: τ1 ⊢ e2: τ2 = g2
      ⟹  x: τ ⊢ (let x1=e1 in e2): τ2 = g1; g2*

p:    p: τ1 ⇀ τ2   and   x: τ ⊢ e1: τ1 = g1
      ⟹  x: τ ⊢ p(e1): τ2 = g1; [[p]]*

[]:   x: τ ⊢ e: τ′ = g   ⟹   x: τ ⊢ [e]: Tτ′ = g; η_T[[τ′]]

µ:    x: τ ⊢ e: Tτ′ = g   ⟹   x: τ ⊢ µ(e): τ′ = g; µ_[[τ′]]

ex:   x: τ1 ⊢ e: τ2 = g   ⟹   (x: τ1 ⊢ e ↓ τ2  ⇐⇒  g factors through η_[[τ2]])
Table 3 (interpretation in a λc-model):

let:  Γ ⊢ e1: τ1 = g1   and   Γ, x1: τ1 ⊢ e2: τ2 = g2
      ⟹  Γ ⊢ (let x1=e1 in e2): τ2 = ⟨id_[[Γ]], g1⟩; t_{[[Γ]],[[τ1]]}; g2*

∗:    Γ ⊢ ∗: 1 = !_[[Γ]]; η_1

⟨⟩:   Γ ⊢ e1: τ1 = g1   and   Γ ⊢ e2: τ2 = g2
      ⟹  Γ ⊢ ⟨e1, e2⟩: τ1 × τ2 = ⟨g1, g2⟩; ψ_{[[τ1]],[[τ2]]}

π_i:  Γ ⊢ e: τ1 × τ2 = g   ⟹   Γ ⊢ π_i(e): τ_i = g; T(π_i)

λ:    Γ, x1: τ1 ⊢ e2: τ2 = g
      ⟹  Γ ⊢ (λx1: τ1.e2): τ1 ⇀ τ2 = Λ_{[[τ1]],T[[τ2]],[[Γ]]}(g); η_[[τ1⇀τ2]]

app:  Γ ⊢ e1: τ1 = g1   and   Γ ⊢ e: τ1 ⇀ τ2 = g
      ⟹  Γ ⊢ e(e1): τ2 = ⟨g, g1⟩; ψ_{(T[[τ2]])^[[τ1]],[[τ1]]}; (eval_{[[τ1]],T[[τ2]]})*
Tables 4–6 (inference rules of the λc-calculus; fragments):

E.x:    Γ ⊢ x ↓ τ

subst:  Γ ⊢ e ↓ τ   and   Γ, x: τ ⊢ A   ⟹   Γ ⊢ A[x := e]

≡ is a congruence relation

ass:    Γ ⊢ (let x2=(let x1=e1 in e2) in e) ≡ (let x1=e1 in (let x2=e2 in e)): τ    (x1 ∉ FV(e))

let.β:  Γ ⊢ (let x1=x2 in e) ≡ e[x1 := x2]: τ

let.p:  Γ ⊢ p(e) ≡ (let x=e in p(x)): τ

E.[]:   Γ ⊢ [e] ↓ Tτ

T.β:    Γ ⊢ µ([e]) ≡ e: τ

T.η:    Γ ⊢ [µ(x)] ≡ x: Tτ

E.∗:    Γ ⊢ ∗ ↓ 1

1.η:    Γ ⊢ ∗ ≡ x: 1

E.⟨⟩:   Γ ⊢ ⟨x1, x2⟩ ↓ τ1 × τ2