Preface
A precise description of a programming language is a prerequisite for its implementation
and for its use. The description can take many forms, each suited to a different purpose. A
common form is a reference manual, which is usually a careful narrative description of the
meaning of each construction in the language, often backed up with a formal presentation
of the grammar (for example, in Backus-Naur form). This gives the programmer enough
understanding for many of his purposes. But it is ill-suited for use by an implementer, or
by someone who wants to formulate laws for equivalence of programs, or by a programmer
who wants to design programs with mathematical rigour.
This document is a formal description of both the grammar and the meaning of a
language which is both designed for large projects and widely used. As such, it aims
to serve the whole community of people seriously concerned with the language. At a
time when it is increasingly understood that programs must withstand rigorous analysis,
particularly for systems where safety is critical, a rigorous language presentation is even
important for negotiators and contractors; for a robust program written in an insecure
language is like a house built upon sand.
Most people have not looked at a rigorous language presentation before. To help them
particularly, but also to put the present work in perspective for those more theoretically
prepared, it will be useful here to say something about three things: the nature of Standard
ML, the task of language definition in general, and the form of the present Definition.
Standard ML
Standard ML is a functional programming language, in the sense that the full power of
mathematical functions is present. But it grew in response to a particular programming
task, for which it was equipped also with full imperative power, and a sophisticated
exception mechanism. It has an advanced form of parametric modules, aimed at organised
development of large programs. Finally it is strongly typed, and it was the first language to
provide a particular form of polymorphic type which makes the strong typing remarkably
flexible. This combination of ingredients has not made it unduly large, but their novelty
has been a fascinating challenge to semantic method (of which we say more below).
ML has evolved over twenty years as a fusion of many ideas from many people. This
evolution is described in some detail in Appendix F of the book, where also we acknowledge
all those who have contributed to it, both in design and in implementation.
`ML' stands for meta language; this is the term logicians use for a language in which
other (formal or informal) languages are discussed and analysed. Originally ML was con-
ceived as a medium for finding and performing proofs in a logical language. Conducting
rigorous argument as dialogue between person and machine has been a growing research
topic throughout these twenty years. The difficulties are enormous, and make stern de-
mands upon the programming language which is used for this dialogue. Those who are
not familiar with computer-assisted reasoning may be surprised that a programming lan-
guage, which was designed for this rather esoteric activity, should ever lay claim to being
generally useful. On reflection, they should not be surprised. LISP is a prime example of
a language invented for esoteric purposes and becoming widely used. LISP was invented
for use in artificial intelligence (AI); the important thing about AI here is not that it is
esoteric, but that it is difficult and varied; so much so, that anything which works well
for it must work well for many other applications too.
The same can be said about the initial purpose of ML, but with a different emphasis.
Rigorous proofs are complex things, which need varied and sophisticated presentation,
particularly on the screen in interactive mode. Furthermore the proof methods, or
strategies, involved are some of the most complex algorithms which we know. This all
applies equally to AI, but one demand is made more strongly by proof than perhaps by
any other application: the demand for rigour.
This demand established the character of ML. In order to be sure that, when the user
and the computer claim to have together performed a rigorous argument, their claim is
justified, it was seen that the language must be strongly typed. On the other hand, to be
useful in a difficult application, the type system had to be rather flexible, and permit the
machine to guide the user rather than impose a burden upon him. A reasonable solution
was found, in which the machine helps the user significantly by inferring his types for him.
Thereby the machine also confers complete reliability on his programs, in this sense: If
a program claims that a certain result follows from the rules of reasoning which the user
has supplied, then the claim may be fully trusted.
The principle of inferring useful structural information about programs is also rep-
resented, at the level of program modules, by the inference of signatures. Signatures
describe the interfaces between modules, and are vital for robust large-scale programs.
When the user combines modules, the signature discipline prevents him from mismatch-
ing their interfaces. By programming with interfaces and parametric modules, it becomes
possible to focus on the structure of a large system, and to compile parts of it in isolation
from one another, even when the system is incomplete.
This emphasis on types and signatures has had a profound effect on the language
Definition. Over half this document is devoted to inferring types and signatures for
programs. But the method used is exactly the same as for inferring what values a program
delivers; indeed, a type or signature is the result of a kind of abstract evaluation of a
program phrase.
In designing ML, the interplay among three activities (language design, definition and
implementation) was extremely close. This was particularly true for the newest part, the
parametric modules. This part of the language grew from an initial proposal by David
MacQueen, itself highly developed; but both formal definition and implementation had
a strong influence on the detailed design. In general, those who took part in the three
activities cannot now imagine how they could have been properly done separately.
Language Definition
Every programming language presents its own conceptual view of computation. This view
is usually indicated by the names used for the phrase classes of the language, or by its
keywords: terms like package, module, structure, exception, channel, type, procedure,
reference, sharing, . . . . These terms also have their abstract counterparts, which may
be called semantic objects; these are what people really have in mind when they use the
language, or discuss it, or think in it. Also, it is these objects, not the syntax, which
represent the particular conceptual view of each language; they are the character of the
language. Therefore a definition of the language must be in terms of these objects.
As is commonly done in programming language semantics, we shall loosely talk of
these semantic objects as meanings. Of course, it is perfectly possible to understand
the semantic theory of a language, and yet be unable to understand the meaning of
a particular program, in the sense of its intention or purpose. The aim of a language
definition is not to formalise everything which could possibly be called the meaning of a
program, but to establish a theory of semantic objects upon which the understanding of
particular programs may rest.
The job of a language-definer is twofold. First, as we have already suggested, he
must create a world of meanings appropriate for the language, and must find a way of
saying what these meanings precisely are. Here, he meets a problem; notation of some
kind must be used to denote and describe these meanings, but not a programming
language notation, unless he is passing the buck and defining one programming language
in terms of another. Given a concern for rigour, mathematical notation is an obvious
choice. Moreover, it is not enough just to write down mathematical definitions. The
world of meanings only becomes meaningful if the objects possess nice properties, which
make them tractable. So the language-definer really has to develop a small theory of
his meanings, in the same way that a mathematician develops a theory. Typically, after
initially defining some objects, the mathematician goes on to verify properties which
indicate that they are objects worth studying. It is this part, a kind of scene-setting,
which the language-definer shares with the mathematician. Of course he can take many
objects and their theories directly from mathematics, such as functions, relations, trees,
sequences, . . . . But he must also give some special theory for the objects which make his
language particular, as we do for types, structures and signatures in this book; otherwise
his language definition may be formal but will give no insight.
The second part of the definer's job is to define evaluation precisely. This means that
he must define at least what meaning, M, results from evaluating any phrase P of his
language (though he need not explain exactly how the meaning results; that is, he need
not give the full detail of every computation). This part of his job must be formal to
some extent, if only because the phrases P of his language are indeed formal objects.
But there is another reason for formality. The task is complex and error-prone, and
therefore demands a high level of explicit organisation (which is, largely, the meaning
of `formality'); moreover, it will be used to specify an equally complex, error-prone and
formal construction: an implementation.
We shall now explain the keystone of our semantic method. First, we need a slight but
important refinement. A phrase P is never evaluated in vacuo to a meaning M, but always
against a background; this background, call it B, is itself a semantic object, being a
distillation of the meanings preserved from evaluation of earlier phrases (typically variable
1 Introduction
This document formally defines Standard ML.
To understand the method of denition, at least in broad terms, it helps to consider
how an implementation of ML is naturally organised. ML is an interactive language, and a
program consists of a sequence of top-level declarations; the execution of each declaration
modies the top-level environment, which we call a basis, and reports the modication to
the user.
In the execution of a declaration there are three phases: parsing, elaboration, and
evaluation. Parsing determines the grammatical form of a declaration. Elaboration, the
static phase, determines whether it is well-typed and well-formed in other ways, and
records relevant type or form information in the basis. Finally evaluation, the dynamic
phase, determines the value of the declaration and records relevant value information
in the basis. Corresponding to these phases, our formal definition divides into three
parts: grammatical rules, elaboration rules, and evaluation rules. Furthermore, the basis
is divided into the static basis and the dynamic basis; for example, a variable which has
been declared is associated with a type in the static basis and with a value in the dynamic
basis.
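For instance (an informal illustration, not part of the formal rules; the identifier name
is arbitrary), after the top-level declaration

    val x = 2 + 3;

elaboration records that x has type int in the static basis, while evaluation records that
x has the value 5 in the dynamic basis; an implementation typically reports both, for
example as val x = 5 : int.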
In an implementation, the basis need not be so divided. But for the purpose of
formal definition, it eases presentation and understanding to keep the static and dynamic
parts of the basis separate. This is further justied by programming experience. A large
proportion of errors in ML programs are discovered during elaboration, and identified as
errors of type or form, so it follows that it is useful to perform the elaboration phase
separately. In fact, elaboration without evaluation is part of what is normally called
compilation; once a declaration (or larger entity) is compiled one wishes to evaluate it,
repeatedly, without re-elaboration, from which it follows that it is useful to perform the
evaluation phase separately.
A further factoring of the formal definition is possible, because of the structure of the
language. ML consists of a lower level called the Core language (or Core for short), a
middle level concerned with programming-in-the-large called Modules, and a very small
upper level called Programs. With the three phases described above, there is therefore
a possibility of nine components in the complete language definition. We have allotted
one section to each of these components, except that we have combined the parsing,
elaboration and evaluation of Programs in one section. The scheme for the ensuing seven
sections is therefore as follows:
                          Core        Modules     Programs
    Syntax                Section 2   Section 3
    Static Semantics      Section 4   Section 5   Section 8
    Dynamic Semantics     Section 6   Section 7
The Core provides many phrase classes, for programming convenience. But about
half of these classes are derived forms, whose meaning can be given by translation into
the other half which we call the Bare language. Thus each of the three parts for the
Core treats only the bare language; the derived forms are treated in Appendix A. This
appendix also contains a few derived forms for Modules. A full grammar for the language
is presented in Appendix B.
In Appendices C and D the initial basis is detailed. This basis, divided into its static
and dynamic parts, contains the static and dynamic meanings of a small set of predefined
identifiers. A richer basis is defined in a separate document[18].
The semantics is presented in a form known as Natural Semantics. It consists of a set
of rules allowing sentences of the form
    A ⊢ phrase ⇒ A′
to be inferred, where A is often a basis (static or dynamic) and A′ a semantic object,
often a type in the static semantics and a value in the dynamic semantics. One should read
such a sentence as follows: "against the background provided by A, the phrase phrase
elaborates, or evaluates, to the object A′". Although the rules themselves are formal, the
semantic objects, particularly the static ones, are the subject of a mathematical theory
which is presented in a succinct form in the relevant sections.
The robustness of the semantics depends upon theorems. Usually these have been
proven, but the proof is not included.
    \uxxxx    The single character with number xxxx (4 hexadecimal digits
              denoting an integer in the ordinal range of the alphabet).
    \"        "
    \\        \
    \f···f\   This sequence is ignored, where f···f stands for a sequence of
              one or more formatting characters.
The formatting characters are a subset of the non-printable characters including at
least space, tab, newline, formfeed. The last form allows long strings to be written on
more than one line, by writing \ at the end of one line and at the start of the next.
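For instance (identifier name arbitrary), the following two-line constant denotes the
single string "Hello, world!"; the backslashes and the intervening formatting characters
are ignored:

    val greeting = "Hello, \
                   \world!";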
A character constant is a sequence of the form #s, where s is a string constant denoting
a string of size one character.
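For example (names arbitrary), the following declarations bind character constants:

    val letter  = #"a";
    val newline = #"\n";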
Libraries may provide multiple numeric types and multiple string types. To each
string type corresponds an alphabet with ordinal range [0, N-1] for some N ≥ 256;
each alphabet must agree with the ASCII character set on the characters numbered 0 to
127. When multiple alphabets are supported, all characters of a given string constant are
interpreted over the same alphabet. For each special constant, overloading resolution is
used for determining the type of the constant (see Appendix E).
We denote by SCon the class of special constants, i.e., the integer, real, word, character
and string constants; we shall use scon to range over SCon.
2.3 Comments
A comment is any character sequence within comment brackets (* *) in which comment
brackets are properly nested. No space is allowed between the two characters which make
up a comment bracket (* or *). An unmatched (* should be detected by the compiler.
The qualified identifiers constitute a link between the Core and the Modules. Through-
out this document, the term "identifier", occurring without an adjective, refers to non-
qualified identifiers only.
An identifier is either alphanumeric: any sequence of letters, digits, primes (') and
underbars (_) starting with a letter or prime, or symbolic: any non-empty sequence of the
following symbols
! % & $ # + - / : < = > ? @ \ ~ ` ^ | *
In either case, however, reserved words are excluded. This means that for example # and
| are not identifiers, but ## and |=| are identifiers. The only exception to this rule
is that the symbol =, which is a reserved word, is also allowed as an identifier to stand
for the equality predicate. The identifier = may not be re-bound; this precludes any
syntactic ambiguity.
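For illustration (the operator name and its fixity are arbitrary choices), a symbolic
identifier such as |=| may be declared and used like any other value identifier, whereas
= itself may not be re-bound:

    infix 4 |=|
    fun x |=| y = (x = y);       (* |=| : ''a * ''a -> bool *)
    val test = 3 |=| 3;          (* true *)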
A type variable tyvar may be any alphanumeric identifier starting with a prime; the
subclass EtyVar of TyVar, the equality type variables, consists of those which start with
two or more primes. The classes VId, TyCon and Lab are represented by identifiers
not starting with a prime. However, * is excluded from TyCon, to avoid confusion with
the derived form of tuple type (see Figure 23). The class Lab is extended to include the
numeric labels 1 2 3 ···, i.e. any numeral not starting with 0. The identifier class StrId
is represented by alphanumeric identifiers not starting with a prime.
TyVar is therefore disjoint from the other four classes. Otherwise, the syntax class
of an occurrence of identifier id in a Core phrase (ignoring derived forms, Section 2.7) is
determined thus:
1. Immediately before "." (i.e. in a long identifier) or in an open declaration, id is
a structure identifier. The following rules assume that all occurrences of structure
identifiers have been removed.
2. At the start of a component in a record type, record pattern or record expression,
id is a record label.
3. Elsewhere in types id is a type constructor.
4. Elsewhere, id is a value identifier.
By means of the above rules a compiler can determine the class to which each identifier
occurrence belongs; for the remainder of this document we shall therefore assume that
the classes are all disjoint.
where ⟨d⟩ is an optional decimal digit d indicating binding precedence. A higher value of
d indicates tighter binding; the default is 0. infix and infixr dictate left and right
associativity respectively. In an expression of the form exp₁ vid₁ exp₂ vid₂ exp₃, where vid₁
and vid₂ are infixed operators with the same precedence, either both must associate to
the left or both must associate to the right. For example, suppose that << and >> have
equal precedence, but associate to the left and right respectively; then
x << y << z parses as (x << y) << z
x >> y >> z parses as x >> (y >> z)
x << y >> z is illegal
x >> y << z is illegal
The precedence of infixed operators relative to other expression and pattern construc-
tions is given in Appendix B.
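The directives assumed in this example might be introduced as follows (the precedence
value 5 is an arbitrary choice; any equal pair of digits would do):

    infix  5 <<
    infixr 5 >>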
The scope of a fixity directive dir is the ensuing program text, except that if dir occurs
in a declaration dec in either of the phrases

    let dec in ··· end
    local dec in ··· end

then the scope of dir does not extend beyond the phrase. Further scope limitations are
imposed for Modules (see Section 3.3).
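To illustrate the scope limitation (a sketch; the operator name and identifiers are
arbitrary):

    val n = let
              infix 6 ++
              fun x ++ y = x + y
            in
              1 ++ 2
            end;      (* n = 3; after this phrase ++ no longer has infix status *)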
These directives and op are omitted from the semantic rules, since they affect only
parsing.
be written "(atpat vid atpat′)"; the parentheses may also be dropped if ":ty" or "="
follows immediately.
Figure 21: Grammar: Declarations and Bindings
E Overloading
Two forms of overloading are available:

  • Certain special constants are overloaded. For example, 0w5 may have type word or
    some other type, depending on the surrounding program text;

  • Certain operators are overloaded. For example, + may have type int * int -> int
    or real * real -> real, depending on the surrounding program text.

Programmers cannot define their own overloaded constants or operators.
Although a formal treatment of overloading is outside the scope of this document, we
do give a complete list of the overloaded operators and of types with overloaded special
constants. This list is consistent with the Basis Library[18].
Every overloaded constant and value identifier has among its types a default type,
which is assigned to it when the surrounding text does not resolve the overloading. For
this purpose, the surrounding text is no larger than the smallest enclosing structure-level
declaration; an implementation may require that a smaller context determines the type.
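For instance (function names arbitrary; according to the Basis Library[18], the default
type of + is int * int -> int):

    fun double x = x + x;            (* + defaults, so double : int -> int *)
    fun half (x: real) = x / 2.0;    (* the ascription resolves / at type real *)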
NONFIX  (var ↦ set of monotypes)

    abs ↦ realint -> realint
    ~   ↦ realint -> realint

INFIX  (var ↦ set of monotypes)

    Precedence 7, left associative:
        div ↦ wordint * wordint -> wordint
        mod ↦ wordint * wordint -> wordint
        *   ↦ num * num -> num
        /   ↦ Real * Real -> Real
    Precedence 6, left associative:
        +   ↦ num * num -> num
        -   ↦ num * num -> num
    Precedence 4, left associative:
        <   ↦ numtxt * numtxt -> numtxt
        >   ↦ numtxt * numtxt -> numtxt
        <=  ↦ numtxt * numtxt -> numtxt
        >=  ↦ numtxt * numtxt -> numtxt

Figure 27: Overloaded identifiers
Once overloading resolution has determined the type of a special constant, it is a
compile-time error if the constant does not make sense or does not denote a value within
the machine representation chosen for the type. For example, an escape sequence of the
form \uxxxx in a string constant of 8-bit characters only makes sense if xxxx denotes a
number in the range [0, 255].
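For instance, assuming the default string type of 8-bit characters (declaration names
arbitrary):

    val ok = "\u0041";        (* the single character A, number 65 *)
    (* val bad = "\u0100";       rejected: 256 lies outside [0, 255] *)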
The need for type abbreviations in signatures was clear when SML '90 was defined. How-
ever, type abbreviations were not included since, in the presence of both structure sharing
and type abbreviations, principal signatures do not exist[41], and the SML '90 Defini-
tion depended strongly upon the notion of principal signature. Subsequently, Harper's
and Lillibridge's work on translucent sums[22] and Leroy's work on modules[30] showed
that, in the absence of structure sharing and certain other features of the SML '90 sig-
natures, type abbreviations in signatures are possible. Type abbreviations in signatures
were implemented by David MacQueen in SML/NJ 0.93 and by Xavier Leroy in Caml
Special Light [31].
In SML '96, structure sharing has been removed (see Section G.3 below). Type ab-
breviations are not included directly, but they arise as a derived form, as follows. First,
a new form of signature expression is allowed:
sigexp where type tyvarseq longtycon = ty
Here longtycon has to be specified by sigexp. The type expression ty may refer to type
constructors which are present in the basis in which the whole signature expression is
elaborated, but not to type constructors specified in sigexp.
The effect of the where type is, roughly speaking, to instantiate longtycon to ty. For
example, the following sequence of declarations is legal:
signature SIG1 = sig type t; val x: t end;
signature SIG2 = SIG1 where type t = int*int;
structure S1: SIG1 = struct type t = real; val x = 1.0 end;
structure S2: SIG2 = struct type t = int*int; val x = (5, 7) end;
Next, a type abbreviation is a derived form. For example, type u = t*t is equivalent to
include sig type u end where type u = t*t . In SML '96 it is allowed to include
an arbitrary signature expression, not just a signature identifier.
is legal, but a subsequent declaration val s = S1.y + 1.5 will fail to elaborate. Simi-
larly, consider the functor declaration:
functor Dict(type t; val leq: t*t->int):>
  sig
    type u = t*t
    type 'a dict
  end =
  struct
    type u = t*t
    type 'a dict = (t * 'a) list
  end
When applied, Dict will propagate the identity of the type t from argument to result,
but it will produce a fresh dict type upon each application.
Types which are specified as "abstract" in an opaque functor result signature give rise
to generation of fresh type names upon each application of the functor, even if the functor
body is a constant structure. For example, after the elaboration of
structure A = struct type t = int end
functor f():> sig type t end = A
structure B = f()
and C = f();
the types B.t and C.t are represented by two new type names, distinct from each other
and from int.
G.3 Sharing
Structure sharing is a key idea in MacQueen's original Modules design[32]. The theoretical
aspects of structure sharing have been the subject of considerable research attention[24,53,
1,55,35]. However, judging from experience, structure sharing is not often used in its full
generality, namely to ensure identity of values. Furthermore, experience from teaching
suggests that the structure sharing concept is somewhat hard to grasp. Finally, the
semantic accounts of structure sharing that have been proposed are rather complicated.
The static semantics of SML '96 has no notion of structure sharing. However, SML '96
does provide a weaker form of structure sharing constraints, in which structure sharing is
regarded as a derived form, equivalent to a collection of type sharing constraints.
G.3.1 Type Sharing
In SML '90, a type sharing constraint sharing type longtycon₁ = ··· = longtyconₙ was
an admissible form of specification. In SML '96 such a constraint does not stand by itself
as a specification, but may be used to qualify a specification. Thus there is a new form
of specification, which we shall call a qualified specification:

    spec sharing type longtycon₁ = ··· = longtyconₙ
Here the long type constructors have to be specified by spec. The type constructors may
have been specified by type, eqtype or datatype specifications, or indirectly through
signature identifiers and include. In order for the specification to be legal, all the type
constructors must denote flexible type names. More precisely, let B be the basis in which
the qualified specification is elaborated. Let us say that a type name t is rigid (in B)
if t ∈ T of B, and that t is flexible (in B) otherwise. For example, int is rigid in the
initial basis and every datatype declaration introduces additional rigid type names into
the basis. For the qualified specification to elaborate in basis B, it is required that each
longtyconᵢ denotes a type name which is flexible in B. In particular, no longtyconᵢ may
denote a type function which is not also a type name (e.g., a longtycon must not denote
Λ().s ∗ s).
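Thus a constraint on a rigid type name is rejected; the following signature expression
(a sketch, with the name s arbitrary) is illegal:

    sig
      type s
      sharing type s = int    (* illegal: int denotes a rigid type name in the initial basis *)
    end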
For example, the two signature expressions

    sig                          sig
      type s                       type s
      type t                       datatype t = C
      sharing type s = t           sharing type s = t
    end                          end
is legal and both t and u are equality types after the sharing qualification. The mechanism
for inferring equality attributes for datatype specifications is the same as for inferring
equality attributes for datatype declarations. Thus the specification
datatype answer = YES | NO
datatype 'a option = Some of 'a | None
specifies two equality types. Every specification of the form datatype datdesc introduces
one type name for each type constructor described by datdesc . The equality attribute of
such a type name is determined at the point where the specification occurs. Thus, in
type s
datatype t = C of s
the type name associated with t will not admit equality, even if s later is instantiated to
an equality type. Type names associated with datatype specifications can be instantiated
to other type names by subsequent type sharing or where type qualifications. In this
case, no effort is made to ban type environments that do not respect equality. For example,
sig
eqtype s
datatype t = C of int -> int
sharing type s = t
end
is legal in SML '96, even though it cannot be matched by any real structure.
G.3.3 Structure Sharing
For convenience, structure sharing constraints are provided, but only as a shorthand for
type sharing constraints. There is a derived form of specification

    spec sharing longstrid₁ = ··· = longstridₖ        (k ≥ 2)
Here spec must specify longstrid₁, ..., longstridₖ. The equivalent form consists of spec
qualified by all the type sharing constraints

    sharing type longstridᵢ.longtycon = longstridⱼ.longtycon

(1 ≤ i < j ≤ k) such that both longstridᵢ.longtycon and longstridⱼ.longtycon are specified
by spec.
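For instance (structure and type names arbitrary), in the specification

    sig
      structure A : sig type t type u end
      structure B : sig type t end
      sharing A = B
    end

the constraint abbreviates the single constraint sharing type A.t = B.t; no constraint
is generated for u, since B does not specify a type constructor u.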
In SML '90, structure sharing constraints are transitive, but in SML '96 they are not.
For example,
structure A: sig type t end
structure B: sig end
structure C: sig type t end
sharing A=B=C
induces no type sharing. Thus a structure sharing constraint in some cases induces less
sharing in SML '96 than in SML '90.
Next, SML '96 does not allow structure sharing equations which refer to "external"
structures. For example, the program
structure A= struct end;
signature SIG = sig structure B : sig end
sharing A = B
end;
is not legal in SML '96, because the sharing constraint now only qualifies the specification
structure B: sig end, which does not specify A. Thus not all legal SML '90 signatures
are legal in SML '96.
The removal of structure sharing has a dramatic simplifying effect on the semantics.
Most importantly, the elaboration rules can be made monogenic (i.e., "deterministic"),
up to renaming of new type names. The need for the notion of principal signature (and
even equality-principal signature) disappears. The notions of structure name, structure
consistency and well-formed signature are no longer required. The notion of cover can be
deleted. Only one kind of realisation, namely type realisation, remains. The notion of
type-explication has been removed, since it can be proved that signatures automatically
are type-explicit in the revised language.
the variable x will only be given a non-trivial polymorphic type scheme (i.e., a type
scheme which is not also a type) if exp is non-expansive. This applies even if there is no
application of ref in the entire program.
Example: in the declaration val x = [] @ [], x can be assigned type 'a list, but
not the type scheme ∀'a.'a list (since [] @ [] is an expansive expression). Conse-
quently, (1::x, true::x) will not elaborate in the scope of the declaration. Also, if the
declaration appears at top level, the compiler may refuse elaboration due to a top-level
free type variable (see G.8). Thus the top-level phrase [] @ [] may fail, since it abbrevi-
ates val it = [] @ []. But of course it will not fail if a monotype is explicitly ascribed,
e.g. [] @ []:int list.
On the other hand, in fun f() = [] @ [] (or val f = fn () => [] @ []), f can
be assigned type scheme ∀'a.unit → 'a list so that, for example, (1::f(), true::f())
elaborates.
When elaborated, this binds type constructor tycon to the entire type structure (value
constructors included) to which longtycon is bound in the context. Datatype replication
does not generate a new datatype: the original and the replicated datatype share.
Here is an example of a use of the new construct:
signature MYBOOL =
sig
type bool
val xor: bool * bool -> bool
end;
structure MyBool: MYBOOL =
struct
datatype bool = datatype bool (* from the initial basis *)
fun xor(true, false) = true
| xor(false, true) = true
| xor _ = false
end;
val x = MyBool.xor(true, false);
Here MyBool.xor(true, false) evaluates to true. Note the use of transparent signature
matching; had opaque matching been used instead, the declaration of x would not have
elaborated.
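That is, with an opaque constraint (a sketch reusing the body above; the name MyBool'
is arbitrary)

    structure MyBool' :> MYBOOL =
      struct
        datatype bool = datatype bool
        fun xor(true, false) = true
          | xor(false, true) = true
          | xor _ = false
      end;

the type MyBool'.bool is abstract, so the application MyBool'.xor(true, false) is
rejected: the constants true and false have the top-level type bool, which is no longer
the same type as MyBool'.bool.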
A datatype replication implicitly introduces the value constructors of longtycon into
the current scope. This is significant for signature matching. For example, the following
program is legal:
datatype t0 = C;
structure A : sig type t val C: t end =
struct
datatype t = datatype t0
end;
Note that C is specified as a value in the signature; the datatype replication copies the
value environment of t0 into the structure and that is why the structure contains the
required C value.
To make it possible for datatype replication to copy value environments associated with
type constructors, the dynamic semantics has been modified so that environments now
contain a TE component (see Figure 13, page 38). Further, in the dynamic semantics
of modules, the ↓ operation, which is used for cutting down structures when they are
matched against signatures, has been extended to cover the TE component (see page 48).
In the above example, the value environment assigned to A.t will be empty, signifying
that the type has no value constructors. Had the signature instead been
sig datatype t val C: t end
then the signature matching would have assigned A.t a value environment with domain
{C}, indicating that A.t has value constructor C.
When the datatype replication is used as a specification, longtycon can refer to a
datatype which has been introduced either by declaration or by specification. Here is an
example of the former:
datatype t = C | D;
signature SIG =
sig
datatype t = datatype t (* replication is not recursive! *)
val f: t -> t
end
(where the presence of the structure declaration forces each val declaration to be parsed
as a strdec), nor
structure A: sig val f: int -> int end =
struct
val f = (fn x => x)(fn x => x)
end
would be legal in SML '96, if the side-condition were enforced. (A type-checker may
at first infer the type 'a → 'a from the declaration of f, but since (fn x => x)(fn
x => x) is expansive, the generalisation to ∀'a.'a → 'a is not allowed.) By dropping
the side-condition, it becomes possible to have the textual context of a structure-level
declaration constrain free type variables to monotypes. Thus both the above examples
can be elaborated.
Rather than lifting the notion of principal environments to the modules level, we have
chosen to drop the requirement of principality. Since the notion of principal environ-
ments is no longer used in the rules, even the definition of principal environments has
been removed. In practice, however, type checkers still have to infer types that are as
general as possible, since implementations should not reject programs for which successful
elaboration is possible.
In order to avoid reporting free type variables to users, rules 87 and 89 require that the
environment to which a topdec elaborates must not contain free type variables. It is possi-
ble to satisfy this side-condition by replacing such type variables by arbitrary monotypes;
however, implementers may instead choose to refuse elaboration in such situations.
be matched. With the removal of structure sharing, the primary purpose of consistency
has gone away. In our experience, the secondary purpose has turned out not to be very
significant in practice. Textual copying of datatype specifications in different signatures
is best avoided, since changes in the datatype will have to be done in several places. In
practice, it is better to specify a datatype in one signature and then access it elsewhere
using structure specifications or include. In SML '90 one could specify sharing between a
datatype specification and an external (i.e., declared) datatype, and a consistency check
was useful in this case. But in SML '96 this form of sharing is not allowed, so there
remains no strong reason for preserving consistency; therefore it has been dropped.
In SML '90, admissibility was imposed partly to ensure the existence of principal sig-
natures (which are no longer needed) and partly to ban certain unmatchable signatures.
In SML '90, admissibility was the conjunction of well-formedness, cycle-freedom and con-
sistency. Cycle-freedom is no longer relevant, since there is no structure sharing. We have
already discussed consistency. Well-formedness of signatures is no longer relevant, but the
notion of well-formed type structures is still relevant. It turns out that well-formedness
only needs to be checked in one place (in rule 64). Otherwise, well-formedness is pre-
served by the rules (in a sense which can be made precise). Thus one can avoid a global
well-formedness requirement and dispense with admissibility. This we have done.
G.11 Comments
A clarification concerning unmatched comment brackets was presented in the Commen-
tary; subsequently, Stefan Kahrs discovered a problem with demanding that an unmatched
*) be reported by the compiler. In SML '96, we therefore simply demand that an un-
matched (* must be reported by the compiler.
There is no requirement that all explicit type variables be bound by this binding
construct. For those that are not, the scope rules of the 1990 Definition apply. The
explicit binding construct has no impact on the dynamic semantics. In particular, there
are no explicit type abstractions or applications in the dynamic semantics.
G.20 Overloading
The Standard ML Basis Library[18] rests on an overloading scheme for special constants
and pre-defined identifiers. We have adopted this scheme (see Appendix E).
G.21 Reals
real is no longer an equality type and real constants are no longer allowed in patterns.
The Basis Library provides IEEE equality operations on reals.