Preface
A precise description of a programming language is a prerequisite for its implementation
and for its use. The description can take many forms, each suited to a different purpose. A
common form is a reference manual, which is usually a careful narrative description of the
meaning of each construction in the language, often backed up with a formal presentation
of the grammar (for example, in Backus-Naur form). This gives the programmer enough
understanding for many of his purposes. But it is ill-suited for use by an implementer, or
by someone who wants to formulate laws for equivalence of programs, or by a programmer
who wants to design programs with mathematical rigour.
This document is a formal description of both the grammar and the meaning of a
language which is both designed for large projects and widely used. As such, it aims
to serve the whole community of people seriously concerned with the language. At a
time when it is increasingly understood that programs must withstand rigorous analysis,
particularly for systems where safety is critical, a rigorous language presentation is even
important for negotiators and contractors; for a robust program written in an insecure
language is like a house built upon sand.
Most people have not looked at a rigorous language presentation before. To help them
particularly, but also to put the present work in perspective for those more theoretically
prepared, it will be useful here to say something about three things: the nature of Standard
ML, the task of language definition in general, and the form of the present Definition.
Standard ML
Standard ML is a functional programming language, in the sense that the full power of
mathematical functions is present. But it grew in response to a particular programming
task, for which it was equipped also with full imperative power, and a sophisticated
exception mechanism. It has an advanced form of parametric modules, aimed at organised
development of large programs. Finally it is strongly typed, and it was the first language to
provide a particular form of polymorphic type which makes the strong typing remarkably
flexible. This combination of ingredients has not made it unduly large, but their novelty
has been a fascinating challenge to semantic method (of which we say more below).
ML has evolved over twenty years as a fusion of many ideas from many people. This
evolution is described in some detail in Appendix F of the book, where also we acknowledge
all those who have contributed to it, both in design and in implementation.
`ML' stands for meta language; this is the term logicians use for a language in which
other (formal or informal) languages are discussed and analysed. Originally ML was con-
ceived as a medium for finding and performing proofs in a logical language. Conducting
rigorous argument as dialogue between person and machine has been a growing research
topic throughout these twenty years. The difficulties are enormous, and make stern de-
mands upon the programming language which is used for this dialogue. Those who are
not familiar with computer-assisted reasoning may be surprised that a programming lan-
guage, which was designed for this rather esoteric activity, should ever lay claim to being
generally useful. On reflection, they should not be surprised. LISP is a prime example of
a language invented for esoteric purposes and becoming widely used. LISP was invented
for use in artificial intelligence (AI); the important thing about AI here is not that it is
esoteric, but that it is difficult and varied; so much so, that anything which works well
for it must work well for many other applications too.
The same can be said about the initial purpose of ML, but with a different emphasis.
Rigorous proofs are complex things, which need varied and sophisticated presentation,
particularly on the screen in interactive mode. Furthermore the proof methods, or
strategies, involved are some of the most complex algorithms which we know. This all
applies equally to AI, but one demand is made more strongly by proof than perhaps by
any other application: the demand for rigour.
This demand established the character of ML. In order to be sure that, when the user
and the computer claim to have together performed a rigorous argument, their claim is
justified, it was seen that the language must be strongly typed. On the other hand, to be
useful in a difficult application, the type system had to be rather flexible, and permit the
machine to guide the user rather than impose a burden upon him. A reasonable solution
was found, in which the machine helps the user significantly by inferring his types for him.
Thereby the machine also confers complete reliability on his programs, in this sense: If
a program claims that a certain result follows from the rules of reasoning which the user
has supplied, then the claim may be fully trusted.
The principle of inferring useful structural information about programs is also rep-
resented, at the level of program modules, by the inference of signatures. Signatures
describe the interfaces between modules, and are vital for robust large-scale programs.
When the user combines modules, the signature discipline prevents him from mismatch-
ing their interfaces. By programming with interfaces and parametric modules, it becomes
possible to focus on the structure of a large system, and to compile parts of it in isolation
from one another, even when the system is incomplete.
This emphasis on types and signatures has had a profound effect on the language
Definition. Over half this document is devoted to inferring types and signatures for
programs. But the method used is exactly the same as for inferring what values a program
delivers; indeed, a type or signature is the result of a kind of abstract evaluation of a
program phrase.
In designing ML, the interplay among three activities (language design, definition and
implementation) was extremely close. This was particularly true for the newest part, the
parametric modules. This part of the language grew from an initial proposal by David
MacQueen, itself highly developed; but both formal definition and implementation had
a strong influence on the detailed design. In general, those who took part in the three
activities cannot now imagine how they could have been properly done separately.
Language Definition
Every programming language presents its own conceptual view of computation. This view
is usually indicated by the names used for the phrase classes of the language, or by its
keywords: terms like package, module, structure, exception, channel, type, procedure,
reference, sharing, . . . . These terms also have their abstract counterparts, which may
be called semantic objects; these are what people really have in mind when they use the
language, or discuss it, or think in it. Also, it is these objects, not the syntax, which
represent the particular conceptual view of each language; they are the character of the
language. Therefore a definition of the language must be in terms of these objects.
As is commonly done in programming language semantics, we shall loosely talk of
these semantic objects as meanings. Of course, it is perfectly possible to understand
the semantic theory of a language, and yet be unable to understand the meaning of
a particular program, in the sense of its intention or purpose. The aim of a language
definition is not to formalise everything which could possibly be called the meaning of a
program, but to establish a theory of semantic objects upon which the understanding of
particular programs may rest.
The job of a language-definer is twofold. First, as we have already suggested, he
must create a world of meanings appropriate for the language, and must find a way of
saying what these meanings precisely are. Here, he meets a problem; notation of some
kind must be used to denote and describe these meanings, but not a programming
language notation, unless he is passing the buck and defining one programming language
in terms of another. Given a concern for rigour, mathematical notation is an obvious
choice. Moreover, it is not enough just to write down mathematical definitions. The
world of meanings only becomes meaningful if the objects possess nice properties, which
make them tractable. So the language-definer really has to develop a small theory of
his meanings, in the same way that a mathematician develops a theory. Typically, after
initially defining some objects, the mathematician goes on to verify properties which
indicate that they are objects worth studying. It is this part, a kind of scene-setting,
which the language-definer shares with the mathematician. Of course he can take many
objects and their theories directly from mathematics, such as functions, relations, trees,
sequences, . . . . But he must also give some special theory for the objects which make his
language particular, as we do for types, structures and signatures in this book; otherwise
his language definition may be formal but will give no insight.
The second part of the definer's job is to define evaluation precisely. This means that
he must define at least what meaning, M, results from evaluating any phrase P of his
language (though he need not explain exactly how the meaning results; that is, he need
not give the full detail of every computation). This part of his job must be formal to
some extent, if only because the phrases P of his language are indeed formal objects.
But there is another reason for formality. The task is complex and error-prone, and
therefore demands a high level of explicit organisation (which is, largely, the meaning
of `formality'); moreover, it will be used to specify an equally complex, error-prone and
formal construction: an implementation.
We shall now explain the keystone of our semantic method. First, we need a slight but
important refinement. A phrase P is never evaluated in vacuo to a meaning M, but always
against a background; this background, call it B, is itself a semantic object, being a
distillation of the meanings preserved from evaluation of earlier phrases (typically variable
1 Introduction
This document formally defines Standard ML.
To understand the method of denition, at least in broad terms, it helps to consider
how an implementation of ML is naturally organised. ML is an interactive language, and a
program consists of a sequence of top-level declarations; the execution of each declaration
modies the top-level environment, which we call a basis, and reports the modication to
the user.
In the execution of a declaration there are three phases: parsing, elaboration, and
evaluation. Parsing determines the grammatical form of a declaration. Elaboration, the
static phase, determines whether it is well-typed and well-formed in other ways, and
records relevant type or form information in the basis. Finally evaluation, the dynamic
phase, determines the value of the declaration and records relevant value information
in the basis. Corresponding to these phases, our formal definition divides into three
parts: grammatical rules, elaboration rules, and evaluation rules. Furthermore, the basis
is divided into the static basis and the dynamic basis; for example, a variable which has
been declared is associated with a type in the static basis and with a value in the dynamic
basis.
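For instance (an informal illustration, not part of the formal rules; the identifier name
is arbitrary), after the top-level declaration

    val x = 2 + 3;

elaboration records that x has type int in the static basis, while evaluation records that
x has the value 5 in the dynamic basis; an implementation typically reports both, for
example as val x = 5 : int.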
In an implementation, the basis need not be so divided. But for the purpose of
formal definition, it eases presentation and understanding to keep the static and dynamic
parts of the basis separate. This is further justied by programming experience. A large
proportion of errors in ML programs are discovered during elaboration, and identified as
errors of type or form, so it follows that it is useful to perform the elaboration phase
separately. In fact, elaboration without evaluation is part of what is normally called
compilation; once a declaration (or larger entity) is compiled one wishes to evaluate it,
repeatedly, without re-elaboration, from which it follows that it is useful to perform the
evaluation phase separately.
A further factoring of the formal definition is possible, because of the structure of the
language. ML consists of a lower level called the Core language (or Core for short), a
middle level concerned with programming-in-the-large called Modules, and a very small
upper level called Programs. With the three phases described above, there is therefore
a possibility of nine components in the complete language definition. We have allotted
one section to each of these components, except that we have combined the parsing,
elaboration and evaluation of Programs in one section. The scheme for the ensuing seven
sections is therefore as follows:
                          Core        Modules     Programs
    Syntax                Section 2   Section 3
    Static Semantics      Section 4   Section 5   Section 8
    Dynamic Semantics     Section 6   Section 7
The Core provides many phrase classes, for programming convenience. But about
half of these classes are derived forms, whose meaning can be given by translation into
the other half which we call the Bare language. Thus each of the three parts for the
Core treats only the bare language; the derived forms are treated in Appendix A. This
appendix also contains a few derived forms for Modules. A full grammar for the language
is presented in Appendix B.
In Appendices C and D the initial basis is detailed. This basis, divided into its static
and dynamic parts, contains the static and dynamic meanings of a small set of predefined
identifiers. A richer basis is defined in a separate document[18].
The semantics is presented in a form known as Natural Semantics. It consists of a set
of rules allowing sentences of the form
    A ⊢ phrase ⇒ A′
to be inferred, where A is often a basis (static or dynamic) and A′ a semantic object,
often a type in the static semantics and a value in the dynamic semantics. One should read
such a sentence as follows: "against the background provided by A, the phrase phrase
elaborates, or evaluates, to the object A′". Although the rules themselves are formal, the
semantic objects, particularly the static ones, are the subject of a mathematical theory
which is presented in a succinct form in the relevant sections.
The robustness of the semantics depends upon theorems. Usually these have been
proven, but the proof is not included.
    \uxxxx    The single character with number xxxx (4 hexadecimal digits
              denoting an integer in the ordinal range of the alphabet).
    \"        "
    \\        \
    \f···f\   This sequence is ignored, where f···f stands for a sequence of
              one or more formatting characters.
The formatting characters are a subset of the non-printable characters including at
least space, tab, newline, formfeed. The last form allows long strings to be written on
more than one line, by writing \ at the end of one line and at the start of the next.
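For instance (identifier name arbitrary), the following two-line constant denotes the
single string "Hello, world!"; the backslashes and the intervening formatting characters
are ignored:

    val greeting = "Hello, \
                   \world!";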
A character constant is a sequence of the form #s, where s is a string constant denoting
a string of size one character.
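For example (names arbitrary), the following declarations bind character constants:

    val letter  = #"a";
    val newline = #"\n";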
Libraries may provide multiple numeric types and multiple string types. To each
string type corresponds an alphabet with ordinal range [0, N-1] for some N ≥ 256;
each alphabet must agree with the ASCII character set on the characters numbered 0 to
127. When multiple alphabets are supported, all characters of a given string constant are
interpreted over the same alphabet. For each special constant, overloading resolution is
used for determining the type of the constant (see Appendix E).
We denote by SCon the class of special constants, i.e., the integer, real, word, character
and string constants; we shall use scon to range over SCon.
2.3 Comments
A comment is any character sequence within comment brackets (* *) in which comment
brackets are properly nested. No space is allowed between the two characters which make
up a comment bracket (* or *). An unmatched (* should be detected by the compiler.
The qualified identifiers constitute a link between the Core and the Modules. Through-
out this document, the term "identifier", occurring without an adjective, refers to non-
qualified identifiers only.
An identifier is either alphanumeric: any sequence of letters, digits, primes (') and
underbars (_) starting with a letter or prime, or symbolic: any non-empty sequence of the
following symbols
! % & $ # + - / : < = > ? @ \ ~ ` ^ | *
In either case, however, reserved words are excluded. This means that for example # and
| are not identifiers, but ## and |=| are identifiers. The only exception to this rule
is that the symbol =, which is a reserved word, is also allowed as an identifier to stand
for the equality predicate. The identifier = may not be re-bound; this precludes any
syntactic ambiguity.
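For illustration (the operator name and its fixity are arbitrary choices), a symbolic
identifier such as |=| may be declared and used like any other value identifier, whereas
= itself may not be re-bound:

    infix 4 |=|
    fun x |=| y = (x = y);       (* |=| : ''a * ''a -> bool *)
    val test = 3 |=| 3;          (* true *)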
A type variable tyvar may be any alphanumeric identifier starting with a prime; the
subclass EtyVar of TyVar, the equality type variables, consists of those which start with
two or more primes. The classes VId, TyCon and Lab are represented by identifiers
not starting with a prime. However, * is excluded from TyCon, to avoid confusion with
the derived form of tuple type (see Figure 23). The class Lab is extended to include the
numeric labels 1 2 3 ···, i.e. any numeral not starting with 0. The identifier class StrId
is represented by alphanumeric identifiers not starting with a prime.
TyVar is therefore disjoint from the other four classes. Otherwise, the syntax class
of an occurrence of identifier id in a Core phrase (ignoring derived forms, Section 2.7) is
determined thus:
1. Immediately before "." (i.e. in a long identifier) or in an open declaration, id is
a structure identifier. The following rules assume that all occurrences of structure
identifiers have been removed.
2. At the start of a component in a record type, record pattern or record expression,
id is a record label.
3. Elsewhere in types id is a type constructor.
4. Elsewhere, id is a value identifier.
By means of the above rules a compiler can determine the class to which each identifier
occurrence belongs; for the remainder of this document we shall therefore assume that
the classes are all disjoint.
where ⟨d⟩ is an optional decimal digit d indicating binding precedence. A higher value of
d indicates tighter binding; the default is 0. infix and infixr dictate left and right
associativity respectively. In an expression of the form exp₁ vid₁ exp₂ vid₂ exp₃, where vid₁
and vid₂ are infixed operators with the same precedence, either both must associate to
the left or both must associate to the right. For example, suppose that << and >> have
equal precedence, but associate to the left and right respectively; then
x << y << z parses as (x << y) << z
x >> y >> z parses as x >> (y >> z)
x << y >> z is illegal
x >> y << z is illegal
The precedence of infixed operators relative to other expression and pattern construc-
tions is given in Appendix B.
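The directives assumed in this example might be introduced as follows (the precedence
value 5 is an arbitrary choice; any equal pair of digits would do):

    infix  5 <<
    infixr 5 >>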
The scope of a fixity directive dir is the ensuing program text, except that if dir occurs
in a declaration dec in either of the phrases

    let dec in ··· end
    local dec in ··· end

then the scope of dir does not extend beyond the phrase. Further scope limitations are
imposed for Modules (see Section 3.3).
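To illustrate the scope limitation (a sketch; the operator name and identifiers are
arbitrary):

    val n = let
              infix 6 ++
              fun x ++ y = x + y
            in
              1 ++ 2
            end;      (* n = 3; after this phrase ++ no longer has infix status *)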
These directives and op are omitted from the semantic rules, since they affect only
parsing.
be written "(atpat vid atpat′)"; the parentheses may also be dropped if ":ty" or "="
follows immediately.
Figure 21: Grammar: Declarations and Bindings
E Overloading
Two forms of overloading are available:

  • Certain special constants are overloaded. For example, 0w5 may have type word or
    some other type, depending on the surrounding program text;

  • Certain operators are overloaded. For example, + may have type int * int -> int
    or real * real -> real, depending on the surrounding program text.

Programmers cannot define their own overloaded constants or operators.
Although a formal treatment of overloading is outside the scope of this document, we
do give a complete list of the overloaded operators and of types with overloaded special
constants. This list is consistent with the Basis Library[18].
Every overloaded constant and value identifier has among its types a default type,
which is assigned to it when the surrounding text does not resolve the overloading. For
this purpose, the surrounding text is no larger than the smallest enclosing structure-level
declaration; an implementation may require that a smaller context determines the type.
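For instance (function names arbitrary; according to the Basis Library[18], the default
type of + is int * int -> int):

    fun double x = x + x;            (* + defaults, so double : int -> int *)
    fun half (x: real) = x / 2.0;    (* the ascription resolves / at type real *)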
NONFIX  (var ↦ set of monotypes)

    abs ↦ realint -> realint
    ~   ↦ realint -> realint

INFIX  (var ↦ set of monotypes)

    Precedence 7, left associative:
        div ↦ wordint * wordint -> wordint
        mod ↦ wordint * wordint -> wordint
        *   ↦ num * num -> num
        /   ↦ Real * Real -> Real
    Precedence 6, left associative:
        +   ↦ num * num -> num
        -   ↦ num * num -> num
    Precedence 4, left associative:
        <   ↦ numtxt * numtxt -> numtxt
        >   ↦ numtxt * numtxt -> numtxt
        <=  ↦ numtxt * numtxt -> numtxt
        >=  ↦ numtxt * numtxt -> numtxt

Figure 27: Overloaded identifiers
Once overloading resolution has determined the type of a special constant, it is a
compile-time error if the constant does not make sense or does not denote a value within
the machine representation chosen for the type. For example, an escape sequence of the
form \uxxxx in a string constant of 8-bit characters only makes sense if xxxx denotes a
number in the range [0, 255].
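For instance, assuming the default string type of 8-bit characters (declaration names
arbitrary):

    val ok = "\u0041";        (* the single character A, number 65 *)
    (* val bad = "\u0100";       rejected: 256 lies outside [0, 255] *)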
The need for type abbreviations in signatures was clear when SML '90 was defined. How-
ever, type abbreviations were not included since, in the presence of both structure sharing
and type abbreviations, principal signatures do not exist[41], and the SML '90 Defini-
tion depended strongly upon the notion of principal signature. Subsequently, Harper's
and Lillibridge's work on translucent sums[22] and Leroy's work on modules[30] showed
that, in the absence of structure sharing and certain other features of the SML '90 sig-
natures, type abbreviations in signatures are possible. Type abbreviations in signatures
were implemented by David MacQueen in SML/NJ 0.93 and by Xavier Leroy in Caml
Special Light [31].
In SML '96, structure sharing has been removed (see Section G.3 below). Type ab-
breviations are not included directly, but they arise as a derived form, as follows. First,
a new form of signature expression is allowed:
sigexp where type tyvarseq longtycon = ty
Here longtycon has to be specified by sigexp. The type expression ty may refer to type
constructors which are present in the basis in which the whole signature expression is
elaborated, but not to type constructors specified in sigexp.
The effect of the where type is, roughly speaking, to instantiate longtycon to ty. For
example, the following sequence of declarations is legal:
signature SIG1 = sig type t; val x: t end;
signature SIG2 = SIG1 where type t = int*int;
structure S1: SIG1 = struct type t = real; val x = 1.0 end;
structure S2: SIG2 = struct type t = int*int; val x = (5, 7) end;
Next, a type abbreviation is a derived form. For example, type u = t*t is equivalent to
include sig type u end where type u = t*t . In SML '96 it is allowed to include
an arbitrary signature expression, not just a signature identifier.
is legal, but a subsequent declaration val s = S1.y + 1.5 will fail to elaborate. Simi-
larly, consider the functor declaration:
functor Dict(type t; val leq: t*t->int):>
  sig
    type u = t*t
    type 'a dict
  end =
  struct
    type u = t*t
    type 'a dict = (t * 'a) list
  end
When applied, Dict will propagate the identity of the type t from argument to result,
but it will produce a fresh dict type upon each application.
Types which are specified as "abstract" in an opaque functor result signature give rise
to generation of fresh type names upon each application of the functor, even if the functor
body is a constant structure. For example, after the elaboration of
structure A = struct type t = int end
functor f():> sig type t end = A
structure B = f()
and C = f();
the types B.t and C.t are represented by two new type names, distinct from each other
and from int.
G.3 Sharing
Structure sharing is a key idea in MacQueen's original Modules design[32]. The theoretical
aspects of structure sharing have been the subject of considerable research attention[24,53,
1,55,35]. However, judging from experience, structure sharing is not often used in its full
generality, namely to ensure identity of values. Furthermore, experience from teaching
suggests that the structure sharing concept is somewhat hard to grasp. Finally, the
semantic accounts of structure sharing that have been proposed are rather complicated.
The static semantics of SML '96 has no notion of structure sharing. However, SML '96
does provide a weaker form of structure sharing constraints, in which structure sharing is
regarded as a derived form, equivalent to a collection of type sharing constraints.
G.3.1 Type Sharing
In SML '90, a type sharing constraint sharing type longtycon₁ = ··· = longtyconₙ was
an admissible form of specification. In SML '96 such a constraint does not stand by itself
as a specification, but may be used to qualify a specification. Thus there is a new form
of specification, which we shall call a qualified specification:

    spec sharing type longtycon₁ = ··· = longtyconₙ
Here the long type constructors have to be specified by spec. The type constructors may
have been specified by type, eqtype or datatype specifications, or indirectly through
signature identifiers and include. In order for the specification to be legal, all the type
constructors must denote flexible type names. More precisely, let B be the basis in which
the qualified specification is elaborated. Let us say that a type name t is rigid (in B)
if t ∈ T of B, and that t is flexible (in B) otherwise. For example, int is rigid in the
initial basis and every datatype declaration introduces additional rigid type names into
the basis. For the qualified specification to elaborate in basis B, it is required that each
longtyconᵢ denotes a type name which is flexible in B. In particular, no longtyconᵢ may
denote a type function which is not also a type name (e.g., a longtycon must not denote
Λ().s ∗ s).
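Thus a constraint on a rigid type name is rejected; the following signature expression
(a sketch, with the name s arbitrary) is illegal:

    sig
      type s
      sharing type s = int    (* illegal: int denotes a rigid type name in the initial basis *)
    end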
For example, the two signature expressions

    sig                          sig
      type s                       type s
      type t                       datatype t = C
      sharing type s = t           sharing type s = t
    end                          end
is legal and both t and u are equality types after the sharing qualification. The mechanism
for inferring equality attributes for datatype specifications is the same as for inferring
equality attributes for datatype declarations. Thus the specification
datatype answer = YES | NO
datatype 'a option = Some of 'a | None
specifies two equality types. Every specification of the form datatype datdesc introduces
one type name for each type constructor described by datdesc . The equality attribute of
such a type name is determined at the point where the specification occurs. Thus, in
type s
datatype t = C of s
the type name associated with t will not admit equality, even if s later is instantiated to
an equality type. Type names associated with datatype specifications can be instantiated
to other type names by subsequent type sharing or where type qualifications. In this
case, no effort is made to ban type environments that do not respect equality. For example,
sig
eqtype s
datatype t = C of int -> int
sharing type s = t
end
is legal in SML '96, even though it cannot be matched by any real structure.
G.3.3 Structure Sharing
For convenience, structure sharing constraints are provided, but only as a shorthand for
type sharing constraints. There is a derived form of specification

    spec sharing longstrid₁ = ··· = longstridₖ        (k ≥ 2)
Here spec must specify longstrid₁, ..., longstridₖ. The equivalent form consists of spec
qualified by all the type sharing constraints

    sharing type longstridᵢ.longtycon = longstridⱼ.longtycon

(1 ≤ i < j ≤ k) such that both longstridᵢ.longtycon and longstridⱼ.longtycon are specified
by spec.
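For instance (structure and type names arbitrary), in the specification

    sig
      structure A : sig type t type u end
      structure B : sig type t end
      sharing A = B
    end

the constraint abbreviates the single constraint sharing type A.t = B.t; no constraint
is generated for u, since B does not specify a type constructor u.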
In SML '90, structure sharing constraints are transitive, but in SML '96 they are not.
For example,
structure A: sig type t end
structure B: sig end
structure C: sig type t end
sharing A=B=C
induces no type sharing. Thus a structure sharing constraint in some cases induces less
sharing in SML '96 than in SML '90.
Next, SML '96 does not allow structure sharing equations which refer to "external"
structures. For example, the program
structure A= struct end;
signature SIG = sig structure B : sig end
sharing A = B
end;
is not legal in SML '96, because the sharing constraint now only qualifies the specification
structure B: sig end, which does not specify A. Thus not all legal SML '90 signatures
are legal in SML '96.
The removal of structure sharing has a dramatic simplifying effect on the semantics.
Most importantly, the elaboration rules can be made monogenic (i.e., "deterministic"),
up to renaming of new type names. The need for the notion of principal signature (and
even equality-principal signature) disappears. The notions of structure name, structure
consistency and well-formed signature are no longer required. The notion of cover can be
deleted. Only one kind of realisation, namely type realisation, remains. The notion of
type-explication has been removed, since it can be proved that signatures automatically
are type-explicit in the revised language.
the variable x will only be given a non-trivial polymorphic type scheme (i.e., a type
scheme which is not also a type) if exp is non-expansive. This applies even if there is no
application of ref in the entire program.
Example: in the declaration val x = [] @ [], x can be assigned type 'a list, but
not the type scheme ∀'a.'a list (since [] @ [] is an expansive expression). Conse-
quently, (1::x, true::x) will not elaborate in the scope of the declaration. Also, if the
declaration appears at top level, the compiler may refuse elaboration due to a top-level
free type variable (see G.8). Thus the top-level phrase [] @ [] may fail, since it abbrevi-
ates val it = [] @ []. But of course it will not fail if a monotype is explicitly ascribed,
e.g. [] @ []:int list.
On the other hand, in fun f() = [] @ [] (or val f = fn () => [] @ []), f can
be assigned type scheme ∀'a.unit → 'a list so that, for example, (1::f(), true::f())
elaborates.
When elaborated, this binds type constructor tycon to the entire type structure (value
constructors included) to which longtycon is bound in the context. Datatype replication
does not generate a new datatype: the original and the replicated datatype share.
Here is an example of a use of the new construct:
signature MYBOOL =
sig
type bool
val xor: bool * bool -> bool
end;
structure MyBool: MYBOOL =
struct
datatype bool = datatype bool (* from the initial basis *)
fun xor(true, false) = true
| xor(false, true) = true
| xor _ = false
end;
val x = MyBool.xor(true, false);
Here MyBool.xor(true, false) evaluates to true. Note the use of transparent signature
matching; had opaque matching been used instead, the declaration of x would not have
elaborated.
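That is, with an opaque constraint (a sketch reusing the body above; the name MyBool'
is arbitrary)

    structure MyBool' :> MYBOOL =
      struct
        datatype bool = datatype bool
        fun xor(true, false) = true
          | xor(false, true) = true
          | xor _ = false
      end;

the type MyBool'.bool is abstract, so the application MyBool'.xor(true, false) is
rejected: the constants true and false have the top-level type bool, which is no longer
the same type as MyBool'.bool.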
A datatype replication implicitly introduces the value constructors of longtycon into
the current scope. This is significant for signature matching. For example, the following
program is legal:
datatype t0 = C;
structure A : sig type t val C: t end =
struct
datatype t = datatype t0
end;
Note that C is specified as a value in the signature; the datatype replication copies the
value environment of t0 into the structure and that is why the structure contains the
required C value.
To make it possible for datatype replication to copy value environments associated with
type constructors, the dynamic semantics has been modified so that environments now
contain a TE component (see Figure 13, page 38). Further, in the dynamic semantics
of modules, the ↓ operation, which is used for cutting down structures when they are
matched against signatures, has been extended to cover the TE component (see page 48).
In the above example, the value environment assigned to A.t will be empty, signifying
that the type has no value constructors. Had the signature instead been
sig datatype t val C: t end
then the signature matching would have assigned A.t a value environment with domain
{C}, indicating that A.t has value constructor C.
When the datatype replication is used as a specification, longtycon can refer to a
datatype which has been introduced either by declaration or by specification. Here is an
example of the former:
datatype t = C | D;
signature SIG =
sig
datatype t = datatype t (* replication is not recursive! *)
val f: t -> t
end
(where the presence of the structure declaration forces each val declaration to be parsed
as a strdec), nor
structure A: sig val f: int -> int end =
struct
val f = (fn x => x)(fn x => x)
end
would be legal in SML '96, if the side-condition were enforced. (A type-checker may
at first infer the type 'a → 'a from the declaration of f, but since (fn x => x)(fn
x => x) is expansive, the generalisation to ∀'a.'a → 'a is not allowed.) By dropping
the side-condition, it becomes possible to have the textual context of a structure-level
declaration constrain free type variables to monotypes. Thus both the above examples
can be elaborated.
Rather than lifting the notion of principal environments to the modules level, we have
chosen to drop the requirement of principality. Since the notion of principal environ-
ments is no longer used in the rules, even the definition of principal environments has
been removed. In practice, however, type checkers still have to infer types that are as
general as possible, since implementations should not reject programs for which successful
elaboration is possible.
In order to avoid reporting free type variables to users, rules 87 and 89 require that the
environment to which a topdec elaborates must not contain free type variables. It is possi-
ble to satisfy this side-condition by replacing such type variables by arbitrary monotypes;
however, implementers may instead choose to refuse elaboration in such situations.
be matched. With the removal of structure sharing, the primary purpose of consistency
has gone away. In our experience, the secondary purpose has turned out not to be very
significant in practice. Textual copying of datatype specifications in different signatures
is best avoided, since changes in the datatype will have to be done in several places. In
practice, it is better to specify a datatype in one signature and then access it elsewhere
using structure specifications or include. In SML '90 one could specify sharing between a
datatype specification and an external (i.e., declared) datatype, and a consistency check
was useful in this case. But in SML '96 this form of sharing is not allowed, so there
remains no strong reason for preserving consistency; therefore it has been dropped.
In SML '90, admissibility was imposed partly to ensure the existence of principal sig-
natures (which are no longer needed) and partly to ban certain unmatchable signatures.
In SML '90, admissibility was the conjunction of well-formedness, cycle-freedom and con-
sistency. Cycle-freedom is no longer relevant, since there is no structure sharing. We have
already discussed consistency. Well-formedness of signatures is no longer relevant, but the
notion of well-formed type structures is still relevant. It turns out that well-formedness
only needs to be checked in one place (in rule 64). Otherwise, well-formedness is pre-
served by the rules (in a sense which can be made precise). Thus one can avoid a global
well-formedness requirement and dispense with admissibility. This we have done.
G.11 Comments
A clarification concerning unmatched comment brackets was presented in the Commen-
tary; subsequently, Stefan Kahrs discovered a problem with demanding that an unmatched
*) be reported by the compiler. In SML '96, we therefore simply demand that an un-
matched (* must be reported by the compiler.
There is no requirement that all explicit type variables be bound by this binding
construct. For those that are not, the scope rules of the 1990 Definition apply. The
explicit binding construct has no impact on the dynamic semantics. In particular, there
are no explicit type abstractions or applications in the dynamic semantics.
G.20 Overloading
The Standard ML Basis Library[18] rests on an overloading scheme for special constants
and pre-defined identifiers. We have adopted this scheme (see Appendix E).
G.21 Reals
real is no longer an equality type and real constants are no longer allowed in patterns.
The Basis Library provides IEEE equality operations on reals.