www.elsevier.com/locate/tcs
Fundamental Study
The di"erential lambda-calculus
Thomas Ehrhard∗, Laurent Regnier
Institut de Mathématiques de Luminy, CNRS-UPR 9016, 163 Avenue de Luminy,
F-13288 Marseille, France
Received 15 June 2001; received in revised form 19 March 2003; accepted 14 July 2003
Communicated by P.L. Curien
Abstract
Prerequisites. This paper assumes from the reader some basic knowledge of the lambda-calculus and an elementary (but not technical) knowledge of differential calculus.
Notations. Following [15], we denote by (s)t the lambda-calculus application of s to t. The expression (s)t_1…t_n denotes the term (⋯(s)t_1⋯)t_n when n ≥ 1, and s when n = 0. Accordingly, if A_1,…,A_n and A are types, both expressions A_1,…,A_n → A and A_1 → ⋯ → A_n → A denote the type A_1 → (⋯(A_n → A)⋯). If a_1,…,a_n are elements of some given set S, we denote by [a_1,…,a_n] the corresponding multiset over S. If x and y are variables, δ_{x,y} is equal to 1 if x = y and to 0 otherwise. We denote by N⁺ the set of positive integers {1, 2, …}.
1. Introduction
1.1. Presentation
each step, only the head occurrence may be substituted. As a side effect, only subterms of the initial term are copied during a sequence of reductions.
It turns out that syntactic linearity can also be defined more abstractly through non-deterministic choice. Consider for instance three ordinary lambda-terms s, s′ and t, and let u = s + s′ denote a non-deterministic choice between s and s′; we consider in the present discussion that this expression reduces to s or s′, non-deterministically. Then it is reasonable to consider that (u)t reduces to (s)t + (s′)t, as the term u is used exactly once in the head reduction of (u)t. But we can certainly not say that (t)u reduces to (t)s + (t)s′: in the evaluation of (t)u, the argument u can be used several times, leading to an arbitrary interleaving of uses of s and s′, whereas the term (t)s + (t)s′ provides only the two trivial interleavings. We retrieve the fact, well known in linear logic, that application is linear in the function but not in the argument. So, as a first approximation at least, syntactic linearity can be defined as commutation with non-deterministic choice. This definition, quite satisfactorily, is completely similar to the usual algebraic definition of linearity.
Sums vs. non-determinism. Notice that in our axiomatization of sums in the differential lambda-calculus, the non-deterministic reduction rule s + s′ → s will not be valid, but sums will nevertheless intuitively correspond to a version of non-deterministic choice where all the actual choice operations are postponed: the result of a term reduction will in general be a large formal sum of terms, and we can reduce the really “non-deterministic” part of the computation to a unique ultimate step consisting in choosing one term of that sum. This step is however a fiction and will not appear as a reduction rule of our calculus, as otherwise essential properties such as confluence would obviously be lost. Another good reason for not considering this rule as an ordinary reduction rule is that, as suggested by the discussion above, it can be performed only when the “redex” stands in linear position.
Note that, from a logical viewpoint, introducing sums is not at all innocuous. In particular, as is well known in category theory, it leads to the identification between finite categorical products and coproducts, the & and ⊕ connectives of linear logic.
Formal differentiation. Keeping in mind this syntactic viewpoint on linearity, we can give an account of differentiation within the lambda-calculus. When f(x) is a (sufficiently regular) function on a vector space E, its differential f′(x) at each point x ∈ E is a linear map on E; thus if u ∈ E we can write f′(x) · u and read this as a linear application of f′(x) to u. We therefore extend the lambda-calculus with a new syntactic construct: when t and u are terms, then D_k t · u is a term that may be read as the differential of t with respect to its kth argument, linearly applied to u. Similarly, the partial derivative (∂t/∂x) · u, to be defined by induction on t, may be understood as a linear substitution of u for x in t, that is, a substitution of exactly one linear occurrence of x in t by u.
It is worth noting that linear substitution is a non-deterministic operation as soon as the substituted variable has several occurrences: one has to choose a linear occurrence of the variable to be substituted, and there are several possible such choices. This fundamental non-determinism of the differential lambda-calculus might be evidence of a link with process calculi; this idea is reinforced by the existing relation with the resource calculi described below.
Caution. Partial derivation is a linear substitution but does not behave exactly as one could expect from a substitution operation. Typically, we have in general
$$\frac{\partial t}{\partial x}\cdot x \neq t.$$
If t is an ordinary lambda-term, the only case where the equation holds is when the variable x is linear in t, that is, when its sole occurrence is in head position in t. In particular, if x does not occur free in t then the left-hand side is 0.
1.2. Outline
The goal of the paper is to present the basics of the differential lambda-calculus. Before going into details, let us mention that this work could as well have been carried out in the framework of linear logic, where differential proof-nets can be naturally defined. The choice of the lambda-calculus may however seem more natural, as differentiation is traditionally understood as an operation on functions, and the lambda-calculus claims to be a general theory of functions.
The differential lambda-calculus is an extension of the usual lambda-calculus in two directions:
• Terms can be summed and, more generally, linearly combined (with coefficients taken in a commutative semi-ring²), and a new term 0 is introduced. As already mentioned, this is necessary because differentiating with respect to a variable that occurs more than once leads to a sum. Keep in mind the usual equation of calculus (uv)′ = u′v + uv′, where the sum is there because the derivative is taken with respect to a parameter on which both u and v can depend.
• A differential construction D_i t · u is added, which represents the derivative of a term t with respect to its ith argument. This new term admits an additional parameter u, and is “linear” with respect to this parameter.
The most important addition is the reduction rule concerning differentiation:
$$D_1(\lambda x\,t)\cdot u = \lambda x\Bigl(\frac{\partial t}{\partial x}\cdot u\Bigr).$$
This is a differential version of the β-rule (oriented from left to right). The term (∂t/∂x) · u is defined by induction on t, and the various cases in this inductive definition correspond to well-known elementary results of differential calculus (chain rule, derivative of a multi-linear function…). This rewriting rule and the ordinary β-rule will be the two reduction rules of our system. As in ordinary lambda-calculus, these rules are extended to all terms by context closure (a notion that needs some care to be well defined, as we shall see).
2 A semi-ring is defined exactly like a ring, except that addition is only required to be a commutative monoid law. A semi-ring will always be assumed to have a multiplicative unit. A typical example of a semi-ring is N with + and × as operations.
As an example of partial derivative:
$$\frac{\partial\,(x)(x)y}{\partial x}\cdot u = (u)(x)y + (D_1 x\cdot (u)y)(x)y.$$
The term D_1 x · h represents the function x where we have isolated one linear instance h of its parameter, that is, the function x to which we provide exactly one copy h of its parameter. This means that, if we replace x by an “actual function” λz t in that term, the resulting term should be equal to λz t′, t′ being t where z is replaced by h “exactly once”, that is, linearly. In other terms, we should have D_1(λz t) · h = λz((∂t/∂z) · h).
Of course, D_1 x · h can be applied to another parameter which can be used an arbitrary number of times, in addition to the linearly provided parameter h which has to be used exactly once. This accounts for the presence of the “λz” which still appears in the expression λz((∂t/∂z) · h).
Taylor formula. As expected, iterated differentiation yields a natural notion of multilinear approximation of the application of a term to another one. This notion relates to ordinary application through the Taylor formula:
$$(s)u = \sum_{n=0}^{\infty}\frac{1}{n!}\,(D_1^n s\cdot u^n)\,0,$$
where D_1^2 s · u^2 = D_1(D_1 s · u) · u and so on. This equation is satisfied in the models we alluded to at the beginning of this introduction.
The question is: what does it mean for a term to contain n linear occurrences of a variable? We prove a simple theorem relating the Taylor formula to linear head reduction that answers this question in a (not so) particular case: if s = λx s_0 and u are two ordinary lambda-terms such that (s)u is β-equivalent to a given variable ⋆, then the number of occurrences of x in (s)u is the number of times u is substituted in s during the linear head reduction of (s)u to ⋆.
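As a quick sanity check of the formula (a direct computation with the rules above, not taken from the paper), let s = λx x and let u be a term in which x is not free. Then
$$(D_1^0 s\cdot u^0)\,0 = (\lambda x\,x)0 \to 0,\qquad (D_1 s\cdot u)\,0 \to (\lambda x\,u)0 \to u,\qquad D_1^2 s\cdot u^2 \to D_1(\lambda x\,u)\cdot u \to \lambda x\Bigl(\frac{\partial u}{\partial x}\cdot u\Bigr) = 0,$$
and all higher terms vanish likewise, so the series sums to u, as the Taylor formula predicts.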
More generally, if one fully develops each application occurring in a term into its corresponding Taylor expansion, one expresses the term as an infinite sum of purely differential terms, all of which contain only (multi)linear applications and applications to 0. Understanding the relation between a term and its full Taylor expansion might be the starting point of a renewal of the theory of approximation (usually based on Böhm trees).
Analysts have already extended smoothness and analyticity to “higher types”, defining various cartesian closed categories of smooth and analytic functions (see e.g. [14], where objects are particular locally convex topological vector spaces called “convenient”). The differential lambda-calculus is probably the internal language of such categories.
The idea that differentiation is a kind of linear substitution already occurred in different contexts. For example, it is central in the work of Conor McBride where linear substitution, or more precisely the notion of “one-hole contexts”, is defined in terms of derivatives for a class of “regular” types which can be seen as generalized formal power series [1,17].
Various authors introduced notions of linear substitution and reduction in the lambda-calculus. Let us quote one of them that carries intuitions similar to ours, the lambda-calculus with multiplicities (or with resources) [5,6]. In this system, application is written (s)T where T is not a term, but a bag of terms written T = (t_1^{p_1} | ⋯ | t_n^{p_n}), where the order on the elements of the bag is irrelevant. The p_i’s are integers or ∞ and represent the number of times each t_i may be used during the reduction. The “∞” corresponds to the status of an argument in ordinary lambda-calculus and satisfies t^∞ = t | t^∞. The reduction of a redex (λx u)T consists in removing non-deterministically a term t from the bag T (more precisely, decreasing its multiplicity, and removing the term when its multiplicity reaches 0) and substituting it “linearly” for some occurrence of x in u through an explicit substitution mechanism.
Intuitions behind the differential lambda-calculus and the lambda-calculus with resources are very similar: the term (D_1^n s · (u_1,…,u_n))t may be seen as the application of s to the bag (u_1 | ⋯ | u_n | t^∞). However, the lambda-calculus with resources equates the terms (s)(t^∞) and (s)(t | t^∞), whereas the corresponding terms (s)t and (D_1 s · t)t are distinct in the differential lambda-calculus. Also, the central role played by the sum in the differential lambda-calculus seems to have no equivalent in the resource lambda-calculus.
2. Syntax
Let R be a commutative semi-ring with unit; R can be for instance the set of natural numbers. 4 Given a set S, we denote by R⟨S⟩ the free R-module generated by S, which can be described as the set of all R-valued functions defined on S which vanish for almost all values of their argument, with pointwise defined addition and scalar multiplication. As usual, an element t of R⟨S⟩ will be written t = Σ_{s∈S} a_s s, where s ↦ a_s is the corresponding R-valued almost everywhere vanishing function (so that this sum is actually finite). We denote by Supp(t) the set {s ∈ S | a_s ≠ 0} (the support of t). Since R has a multiplicative unit 1, S can naturally be seen as a subset of R⟨S⟩.
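As an illustration of this construction (a sketch assuming R = Int for concreteness; the names Module, add, scale and support are ours, not the paper's), the free module R⟨S⟩ can be represented by finitely supported maps:

```haskell
import qualified Data.Map.Strict as M

-- Sketch of the free R-module R<S>: finitely supported R-valued
-- functions on S, with pointwise addition and scalar multiplication.
type R = Int                    -- a commutative semi-ring, here Int
type Module s = M.Map s R       -- absent keys carry coefficient 0

add :: Ord s => Module s -> Module s -> Module s
add = M.unionWith (+)           -- pointwise addition

scale :: R -> Module s -> Module s
scale a = M.map (a *)           -- pointwise scalar multiplication

support :: Module s -> [s]
support t = [s | (s, a) <- M.toList t, a /= 0]  -- Supp(t)
```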
Let a denumerable set of variables be given. We define by induction on k an increasing family of sets (Δ_k). We set Δ_0 = ∅, and Δ_{k+1} is defined as follows.
Monotonicity: if t belongs to Δ_k then t belongs to Δ_{k+1}.
Variable: if n ∈ N, x is a variable, i_1,…,i_n ∈ N⁺ = N∖{0} and u_1,…,u_n ∈ Δ_k, then
D_{i_1,…,i_n} x · (u_1,…,u_n)
belongs to Δ_{k+1}. This term is identified with all the terms of the shape D_{i_{σ(1)},…,i_{σ(n)}} x · (u_{σ(1)},…,u_{σ(n)}) ∈ Δ_{k+1} where σ is a permutation of {1,…,n}.
Abstraction: if n ∈ N, x is a variable, u_1,…,u_n ∈ Δ_k and t ∈ Δ_k, then
D_1^n λx t · (u_1,…,u_n)
belongs to Δ_{k+1}. This term is identified with all the terms of the shape D_1^n λx t · (u_{σ(1)},…,u_{σ(n)}) ∈ Δ_{k+1} where σ is a permutation of {1,…,n}.
Application: if s ∈ Δ_k and t ∈ R⟨Δ_k⟩, then
(s)t
belongs to Δ_{k+1}.
Setting n = 0 in the first two clauses, and restricting application by the constraint that t ∈ Δ_k ⊆ R⟨Δ_k⟩, one retrieves the usual definition of lambda-terms, which shows that differential terms are a superset of ordinary lambda-terms.
The permutative identification mentioned above will be called equality up to differential permutation. We also work up to α-conversion.
Terms and simple terms. We denote by Δ the union of all the sets Δ_k. We call simple terms the elements of Δ, and differential terms, or simply terms, the elements of R⟨Δ⟩. Observe that R⟨Δ⟩ = ∪_k R⟨Δ_k⟩. We write Δ(R) instead of Δ when we want to make explicit the underlying semi-ring.
Induction on terms. Proving a property by induction on terms means proving this property for each term t by induction on the least k such that t ∈ R⟨Δ_k⟩ (a number which can be called the height of t).
4 For expressing the Taylor formula one has to assume that R is a field. One could also take R = {0, 1} with 1 + 1 = 1, in which case the sum of terms is idempotent and the coefficients of the Taylor formula are all equal to 1.
Given the definition at heights ≤ k, these clauses define s[t/x] for each simple term s of height k + 1. We conclude the definition for arbitrary terms of height k + 1 by setting
$$\Bigl(\sum_{v\in\Delta_{k+1}} a_v v\Bigr)[t/x] = \sum_{v\in\Delta_{k+1}} a_v\,v[t/x].$$
In the abstraction case one takes the usual precautions, assuming without loss of generality, thanks to α-conversion, that y ≠ x and y does not occur free in t. Observe that substitution is linear in s but not in t.
Derivation. We now define another operation which bears some similarities with substitution, but behaves in a linear way with respect to the substituted term: the partial derivative of s with respect to x along u, denoted (∂s/∂x) · u. Like substitution, it is defined by induction on terms.
$$\frac{\partial\,D_{i_1,\ldots,i_n} y\cdot(u_1,\ldots,u_n)}{\partial x}\cdot u = \delta_{x,y}\,D_{i_1,\ldots,i_n} u\cdot(u_1,\ldots,u_n) + \sum_{j=1}^{n} D_{i_1,\ldots,i_n} y\cdot\Bigl(u_1,\ldots,\frac{\partial u_j}{\partial x}\cdot u,\ldots,u_n\Bigr)$$
$$\frac{\partial\,D_1^n\lambda y\,v\cdot(u_1,\ldots,u_n)}{\partial x}\cdot u = D_1^n\lambda y\Bigl(\frac{\partial v}{\partial x}\cdot u\Bigr)\cdot(u_1,\ldots,u_n) + \sum_{i=1}^{n} D_1^n\lambda y\,v\cdot\Bigl(u_1,\ldots,\frac{\partial u_i}{\partial x}\cdot u,\ldots,u_n\Bigr)$$
$$\frac{\partial\,(v)w}{\partial x}\cdot u = \Bigl(\frac{\partial v}{\partial x}\cdot u\Bigr)w + \Bigl(D_1 v\cdot\Bigl(\frac{\partial w}{\partial x}\cdot u\Bigr)\Bigr)w$$
$$\frac{\partial}{\partial x}\Bigl(\sum_{v\in\Delta} a_v v\Bigr)\cdot u = \sum_{v\in\Delta} a_v\,\frac{\partial v}{\partial x}\cdot u,$$
where, again, in the abstraction case, the usual precautions have to be respected: y should be different from x and should not occur free in the term u.
This definition says essentially that the partial derivative distributes over syntactic constructs that are linear, such as abstraction. The application case is the most involved one: the partial derivative is safely applied to the function v, because application is linear in the function, producing the term ((∂v/∂x) · u)w. But in order to apply it to the argument w, which is not in linear position, we intuitively follow two steps: first we replace (v)w by (D_1 v · w)w, getting a linear copy of w; second we apply the partial derivative to this copy. This can also be seen as a formal way of expressing the chain rule of differential calculus.
Note that, despite the intuition that D_1 v · w is a linear application, it is false that (v)w = (D_1 v · w)w, even up to differential reduction (to be defined soon). Thus, again, the partial derivative may be considered as a substitution operator only to a limited extent.
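To make the induction concrete, here is a small executable sketch of the derivative (a simplified syntax with unary D_1 only, sums as a binary constructor, and α-renaming ignored by assuming bound variables distinct from x and not free in u; all names are illustrative, not the paper's):

```haskell
data Term
  = Var String
  | Lam String Term        -- \y. t
  | App Term Term          -- (s)t
  | D Term Term            -- D1 s . u
  | Zero                   -- the term 0
  | Plus Term Term         -- formal sum
  deriving Show

-- deriv x s u computes (ds/dx) . u by induction on s
deriv :: String -> Term -> Term -> Term
deriv x (Var y) u
  | y == x            = u                     -- delta_{x,y} u
  | otherwise         = Zero
deriv x (Lam y t) u   = Lam y (deriv x t u)   -- abstraction is linear
deriv x (App v w) u   = Plus                  -- the chain rule:
    (App (deriv x v u) w)                     -- derive the function part,
    (App (D v (deriv x w u)) w)               -- or derive one linear copy of w
deriv x (D v w) u     = Plus                  -- D1 is linear in both slots
    (D (deriv x v u) w)
    (D v (deriv x w u))
deriv _ Zero _        = Zero
deriv x (Plus s t) u  = Plus (deriv x s u) (deriv x t u)
```

For instance, deriv "x" (App (Var "x") (Var "x")) (Var "x") returns the two-summand term (x)x + (D_1 x · x)x displayed in the introduction.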
and
$$\frac{\partial\,D_i t\cdot u}{\partial x}\cdot v = D_i\Bigl(\frac{\partial t}{\partial x}\cdot v\Bigr)\cdot u + D_i t\cdot\Bigl(\frac{\partial u}{\partial x}\cdot v\Bigr).$$
The proof is a simple induction on t. This lemma allows us to write the parallel substitution of the terms u_i for the variables x_i in t, denoted t[u_1,…,u_n/x_1,…,x_n], when none of the variables x_i is free in any of the terms u_i.
Lemma 3. If x is not free in t, then (∂t/∂x) · u = 0. For any term t and variable x,
$$\frac{\partial t}{\partial x}\cdot\Bigl(\sum_j a_j u_j\Bigr) = \sum_j a_j\,\frac{\partial t}{\partial x}\cdot u_j.$$
The proof is an easy induction on t. In particular, the application case holds thanks to our linearization on the fly.
Proof. This is proven by an easy induction on t. We deal only with the cases “variable”
and “application”.
If x_1,…,x_k are variables not occurring free in the terms u_1,…,u_k, one has therefore, for any permutation σ of {1,…,k},
$$\frac{\partial}{\partial x_1}\Bigl(\cdots\Bigl(\frac{\partial t}{\partial x_k}\cdot u_k\Bigr)\cdots\Bigr)\cdot u_1 = \frac{\partial}{\partial x_{\sigma(1)}}\Bigl(\cdots\Bigl(\frac{\partial t}{\partial x_{\sigma(k)}}\cdot u_{\sigma(k)}\Bigr)\cdots\Bigr)\cdot u_{\sigma(1)},$$
and we use the standard notation
$$\frac{\partial^k t}{\partial x_1\cdots\partial x_k}\cdot(u_1,\ldots,u_k)$$
for the common value of these expressions (we avoid this notation when the condition above on the variables is not fulfilled).
Derivatives and substitutions. We shall now state two lemmas expressing the commutation between the derivative and the substitution operators.
Lemma 5. If x and y are two distinct variables and if y does not occur free in the terms u and v, one has
$$\frac{\partial\,t[v/y]}{\partial x}\cdot u = \Bigl(\frac{\partial t}{\partial x}\cdot u\Bigr)[v/y] + \Bigl(\frac{\partial t}{\partial y}\cdot\Bigl(\frac{\partial v}{\partial x}\cdot u\Bigr)\Bigr)[v/y]. \tag{1}$$
In particular, if moreover x is not free in v, the following commutation holds:
$$\frac{\partial\,t[v/y]}{\partial x}\cdot u = \Bigl(\frac{\partial t}{\partial x}\cdot u\Bigr)[v/y]. \tag{2}$$
The proof is similar to the previous one. This lemma can also be seen as a version of the chain rule of differential calculus.
Lemma 6. If the variable x is not free in the term v and if y is a variable distinct from x, we have
$$\Bigl(\frac{\partial t}{\partial x}\cdot u\Bigr)[v/y] = \frac{\partial\,t[v/y]}{\partial x}\cdot(u[v/y]).$$
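As a quick sanity check of (1) (a direct computation, not from the paper), take t = (x)y with y not free in u and v. Then t[v/y] = (x)v and
$$\frac{\partial\,(x)v}{\partial x}\cdot u = (u)v + \Bigl(D_1 x\cdot\Bigl(\frac{\partial v}{\partial x}\cdot u\Bigr)\Bigr)v,$$
which is exactly $(\frac{\partial t}{\partial x}\cdot u)[v/y] + (\frac{\partial t}{\partial y}\cdot(\frac{\partial v}{\partial x}\cdot u))[v/y]$, since $\frac{\partial t}{\partial x}\cdot u = (u)y$ and $\frac{\partial t}{\partial y}\cdot w = (D_1 x\cdot w)y$.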
$$t' = \frac{\partial^{r_1} t}{\partial y^{(1)}_1\cdots\partial y^{(1)}_{r_1}}\cdot(u^{(1)}_1,\ldots,u^{(1)}_{r_1})$$
with r_0 + r_1 = k, [y^{(0)}_1,…,y^{(0)}_{r_0}, y^{(1)}_1,…,y^{(1)}_{r_1}] = [x_1,…,x_k] and [u^{(0)}_1,…,u^{(0)}_{r_0}, u^{(1)}_1,…,u^{(1)}_{r_1}] = [u_1,…,u_k].
$$t_j = \frac{\partial^{r_j} t}{\partial y^{(j)}_1\cdots\partial y^{(j)}_{r_j}}\cdot(u^{(j)}_1,\ldots,u^{(j)}_{r_j})$$
with Σ_{j=0}^q r_j = k, Σ_{j=0}^q [y^{(j)}_1,…,y^{(j)}_{r_j}] = [x_1,…,x_k] and Σ_{j=0}^q [u^{(j)}_1,…,u^{(j)}_{r_j}] = [u_1,…,u_k].
3. Differential reduction
Let us begin by introducing some terminology and easy lemmas on relations between terms. We shall consider two kinds of relations: relations from terms to terms, which are subsets of R⟨Δ⟩ × R⟨Δ⟩, and relations from simple terms to terms, which are subsets of Δ × R⟨Δ⟩.
Linear relations. A relation ρ from terms to terms is linear if 0 ρ 0 and a t + b u ρ a t′ + b u′ as soon as t ρ t′ and u ρ u′. In the particular case where ρ is a functional relation, this means that it is a linear map. As with linear maps, the image of a linear subspace of R⟨Δ⟩ by ρ is a linear subspace. However, the image of the subspace {0} is not in general reduced to {0} (this is easily seen to be a necessary and sufficient condition for ρ to be functional). Observe that, as a subset of the linear space R⟨Δ⟩ × R⟨Δ⟩ = R⟨Δ⟩ ⊕ R⟨Δ⟩, a relation is linear iff it is a linear subspace (just as for linear functions).
λx s ρ λx s′,
D_i s · t ρ D_i s′ · t′.
Lemma 9. Let ρ be a contextual relation from terms to terms. If t, u and u′ are terms such that u ρ u′, then (∂t/∂x) · u ρ (∂t/∂x) · u′ and t[u/x] ρ t[u′/x].
Extending a relation. Given a relation ρ from simple terms to terms, we shall define two relations ρ̄ and ρ̃ from terms to terms by:
• t ρ̄ t′ if t = Σ_{i=1}^n a_i s_i and t′ = Σ_{i=1}^n a_i s′_i, where the terms s_i are simple, the terms s′_i are such that s_i ρ s′_i for each i, and the a_i are scalars;
• t ρ̃ t′ if t = a s + u and t′ = a s′ + u, where s is a simple term, s′ is a term such that s ρ s′, u is a term and a is a nonzero scalar.
These two operations are monotone and ω-continuous in ρ, which means that, given an increasing sequence of relations ρ_n whose union is ρ, the sequences ρ̄_n and ρ̃_n are increasing and their respective unions are ρ̄ and ρ̃.
Observe that ρ̃ is not linear. On the other hand, ρ̄ is the least linear relation from terms to terms which contains ρ. In that sense it can be thought of as the extension of ρ by linearity. Note that, given t = Σ a_i s_i where the s_i are pairwise distinct simple terms, and t′ such that t ρ̄ t′, it is not true in general that t′ may be written t′ = Σ a_i s′_i with s_i ρ s′_i. Typically, if t = 0 and s is any simple term such that s ρ s′ and s ρ s″, then we have t ρ̄ s′ − s″ (assuming −1 belongs to R).
One-step reduction. We are now ready to give the reduction rules, completing the definition of the differential lambda-calculus. Informally, the one-step reduction →_1 is the smallest relation that is closed under syntactic constructions (e.g., if s →_1 s′ then (s)v →_1 (s′)v) and that contains:
Beta-reduction.
$$t = D_1^{n-1}\lambda x\Bigl(\frac{\partial s}{\partial x}\cdot u_j\Bigr)\cdot(u_1,\ldots,u_{j-1},u_{j+1},\ldots,u_n)$$
for some j ∈ {1,…,n}.
We define →_1 = ∪_k →_1^k. Thanks to the ω-continuity of relation extensions, we have that →̃_1 = ∪_k →̃_1^k.
The one-step reduction is weakly contextual in the following sense:
Reduction. Let →* be the reflexive and transitive closure of the relation →̃_1, and let us write t →⁺ t′ when t reduces to t′ in at least one step (that is, t →̃_1 u →* t′ for some u ∈ R⟨Δ⟩).
Proof. Linearity is clear and the other conditions result from linearity and from the
preceding proposition.
We prove confluence using the Tait–Martin-Löf technique, following the presentation of [15]. We first define the parallel reduction relation ⇒ from simple terms to terms as the union of an increasing sequence (⇒_k) of relations: ⇒_0 is the identity relation and ⇒_{k+1} is given by:
1. (s)u ⇒_{k+1} t in one of the following situations:
(a) t = (s′)u′ where s ⇒_k s′ and u ⇒̄_k u′;
(b) t = ((∂^n v′/∂x^n) · (w′_1,…,w′_n))[u′/x] where s = D_1^n λx v · (w_1,…,w_n), v ⇒_k v′, u ⇒̄_k u′, and w_j ⇒_k w′_j for each j;
2. D_{i_1,…,i_n} x · (u_1,…,u_n) ⇒_{k+1} t if t = D_{i_1,…,i_n} x · (u′_1,…,u′_n) where u_j ⇒_k u′_j for each j;
3. D_1^n λx s · (u_1,…,u_n) ⇒_{k+1} t if t = D_1^{n−p} λx((∂^p s′/∂x^p) · u′_I) · u′_J where I is a subset of {1,…,n} of p elements, J is its complementary set, u_K denotes the sequence (u_k)_{k∈K} for any K ⊆ {1,…,n} (with the obvious order relation), s ⇒_k s′ and u_j ⇒_k u′_j for each j.
Proof. For the first inclusion, one proves by a straightforward induction on k that →_1^k ⊆ ⇒_k (using the obvious fact that ⇒_k is reflexive).
The second is obtained again by induction, using the fact that →* is contextual.
As ⇒ ⊆ →*, which is linear, we also have ⇒̄ ⊆ →*, from which, together with the first inclusion, we derive the reflexive and transitive closure property.
Proof. We prove by induction on k that if t ⇒̄_k t′ and u ⇒̄ u′, then t[u/x] ⇒̄ t′[u′/x]. For k = 0 we have t = t′ and we conclude by contextuality of ⇒̄ (applying Lemma 9).
Assume now that the property holds for k. By linearity of ⇒̄ and of substitution (the operation (t, u) ↦ t[u/x] is linear in t), we can reduce to the case where t is simple, t ⇒_{k+1} t′ and u ⇒̄ u′.
Assume first that t = (s)w. Then t[u/x] = (s[u/x])w[u/x]. If t′ = (s′)w′ with s ⇒_k s′ and w ⇒̄_k w′, we conclude directly by inductive hypothesis and contextuality of ⇒̄.
If s = D_1^n λy v · (u_1,…,u_n), v ⇒_k v′, w ⇒̄_k w′, u_j ⇒_k u′_j for each j, and
$$t' = \Bigl(\frac{\partial^n v'}{\partial y^n}\cdot(u'_1,\ldots,u'_n)\Bigr)[w'/y],$$
then we have s[u/x] = D_1^n λy(v[u/x]) · (u_1[u/x],…,u_n[u/x]). Note that this term is not necessarily simple, because v[u/x] is not simple in general. However, since v[u/x] ⇒̄ v′[u′/x] by inductive hypothesis, v[u/x] is a linear combination of simple terms v_l and v′[u′/x] is a linear combination, with the same coefficients, of terms v′_l such that v_l ⇒ v′_l for each l. Thus by linearity of ⇒̄, of derivatives and of substitution, and by definition of ⇒, we have
$$t[u/x] \;\bar\Rightarrow\; \Bigl(\frac{\partial^n\,v'[u'/x]}{\partial y^n}\cdot(u'_1[u'/x],\ldots,u'_n[u'/x])\Bigr)[w'[u'/x]/y]$$
$$= \Bigl(\Bigl(\frac{\partial^n v'}{\partial y^n}\cdot(u'_1,\ldots,u'_n)\Bigr)[u'/x]\Bigr)[w'[u'/x]/y]\quad\text{(by Lemma 6)}$$
$$= \Bigl(\Bigl(\frac{\partial^n v'}{\partial y^n}\cdot(u'_1,\ldots,u'_n)\Bigr)[w'/y]\Bigr)[u'/x]\quad\text{(by Lemma 2)}$$
$$= t'[u'/x].$$
$$\frac{\partial t}{\partial x}\cdot u \;\bar\Rightarrow\; \frac{\partial t'}{\partial x}\cdot u'.$$
Proof. We prove by induction on k that if t ⇒̄_k t′ and if u ⇒̄ u′, then (∂t/∂x) · u ⇒̄ (∂t′/∂x) · u′. For k = 0, since ⇒_0 is the identity, we have to show that (∂t/∂x) · u ⇒̄ (∂t/∂x) · u′, which is a consequence of Lemma 9 because ⇒̄ is contextual.
Assume now that the property holds for k. By linearity of ⇒̄ and of the partial derivative, we can reduce to the case where t and u are simple, t ⇒_{k+1} t′ and u ⇒ u′.
Assume first that t = (s)w. Then
$$\frac{\partial t}{\partial x}\cdot u = \Bigl(\frac{\partial s}{\partial x}\cdot u\Bigr)w + \Bigl(D_1 s\cdot\Bigl(\frac{\partial w}{\partial x}\cdot u\Bigr)\Bigr)w.$$
If t′ = (s′)w′ with s ⇒_k s′ and w ⇒̄_k w′, then by inductive hypothesis we have (∂s/∂x) · u ⇒̄ (∂s′/∂x) · u′ and (∂w/∂x) · u ⇒̄ (∂w′/∂x) · u′, and we conclude by contextuality of ⇒̄.
If s = D_1^n λy v · (u_1,…,u_n), v ⇒_k v′, w ⇒̄_k w′, u_j ⇒_k u′_j for each j, and t′ = ((∂^n v′/∂y^n) · (u′_1,…,u′_n))[w′/y], then we have
$$\frac{\partial s}{\partial x}\cdot u = D_1^n\lambda y\Bigl(\frac{\partial v}{\partial x}\cdot u\Bigr)\cdot(u_1,\ldots,u_n) + \sum_{j=1}^{n} D_1^n\lambda y\,v\cdot\Bigl(u_1,\ldots,\frac{\partial u_j}{\partial x}\cdot u,\ldots,u_n\Bigr)$$
and by inductive hypothesis we have (∂v/∂x) · u ⇒̄ (∂v′/∂x) · u′ and (∂u_j/∂x) · u ⇒̄ (∂u′_j/∂x) · u′ for each j. The property (∂v/∂x) · u ⇒̄ (∂v′/∂x) · u′ means that we may write (∂v/∂x) · u and (∂v′/∂x) · u′ as linear combinations Σ_l a_l v_l and Σ_l a_l v′_l, where the terms v_l are simple, in such a way that v_l ⇒ v′_l for each l. This, together with the linearity of the substitution operators and the definition of ⇒, entails that
$$\Bigl(D_1^n\lambda y\Bigl(\frac{\partial v}{\partial x}\cdot u\Bigr)\cdot(u_1,\ldots,u_n)\Bigr)w \;\bar\Rightarrow\; \Bigl(\frac{\partial^n}{\partial y^n}\Bigl(\frac{\partial v'}{\partial x}\cdot u'\Bigr)\cdot(u'_1,\ldots,u'_n)\Bigr)[w'/y].$$
We proceed similarly for the other terms of the sum and apply the inductive hypothesis to get eventually
$$\Bigl(\frac{\partial s}{\partial x}\cdot u\Bigr)w \;\bar\Rightarrow\; \Bigl(\frac{\partial^n}{\partial y^n}\Bigl(\frac{\partial v'}{\partial x}\cdot u'\Bigr)\cdot(u'_1,\ldots,u'_n)\Bigr)[w'/y] + \sum_{j=1}^{n}\Bigl(\frac{\partial^n v'}{\partial y^n}\cdot\Bigl(u'_1,\ldots,\frac{\partial u'_j}{\partial x}\cdot u',\ldots,u'_n\Bigr)\Bigr)[w'/y].$$
Similarly, we get
$$\Bigl(D_1 s\cdot\Bigl(\frac{\partial w}{\partial x}\cdot u\Bigr)\Bigr)w = \Bigl(D_1^{n+1}\lambda y\,v\cdot\Bigl(\frac{\partial w}{\partial x}\cdot u,\,u_1,\ldots,u_n\Bigr)\Bigr)w \;\Rightarrow\; \Bigl(\frac{\partial^{n+1} v'}{\partial y^{n+1}}\cdot\Bigl(\frac{\partial w'}{\partial x}\cdot u',\,u'_1,\ldots,u'_n\Bigr)\Bigr)[w'/y].$$
On the other hand, by iterating Lemmas 4 and 5 (we can assume that y is not free in u′), we get
$$\frac{\partial t'}{\partial x}\cdot u' = \frac{\partial}{\partial x}\Bigl(\Bigl(\frac{\partial^n v'}{\partial y^n}\cdot(u'_1,\ldots,u'_n)\Bigr)[w'/y]\Bigr)\cdot u'$$
$$= \Bigl(\frac{\partial}{\partial x}\Bigl(\frac{\partial^n v'}{\partial y^n}\cdot(u'_1,\ldots,u'_n)\Bigr)\cdot u' + \frac{\partial}{\partial y}\Bigl(\frac{\partial^n v'}{\partial y^n}\cdot(u'_1,\ldots,u'_n)\Bigr)\cdot\Bigl(\frac{\partial w'}{\partial x}\cdot u'\Bigr)\Bigr)[w'/y]$$
$$= \Bigl(\frac{\partial^n}{\partial y^n}\Bigl(\frac{\partial v'}{\partial x}\cdot u'\Bigr)\cdot(u'_1,\ldots,u'_n)\Bigr)[w'/y] + \sum_{j=1}^{n}\Bigl(\frac{\partial^n v'}{\partial y^n}\cdot\Bigl(u'_1,\ldots,\frac{\partial u'_j}{\partial x}\cdot u',\ldots,u'_n\Bigr)\Bigr)[w'/y] + \Bigl(\frac{\partial^{n+1} v'}{\partial y^{n+1}}\cdot\Bigl(\frac{\partial w'}{\partial x}\cdot u',\,u'_1,\ldots,u'_n\Bigr)\Bigr)[w'/y]$$
and we are done, in this particular case.
If t = D_{i_1,…,i_n} y · (u_1,…,u_n), then
$$\frac{\partial t}{\partial x}\cdot u = \delta_{x,y}\,D_{i_1,\ldots,i_n} u\cdot(u_1,\ldots,u_n) + \sum_{j=1}^{n} D_{i_1,\ldots,i_n} y\cdot\Bigl(u_1,\ldots,\frac{\partial u_j}{\partial x}\cdot u,\ldots,u_n\Bigr).$$
Moreover, we know that t′ = D_{i_1,…,i_n} y · (u′_1,…,u′_n) with u_j ⇒_k u′_j for each j. For each j, we have (∂u_j/∂x) · u ⇒̄ (∂u′_j/∂x) · u′ by inductive hypothesis, and we conclude by contextuality of ⇒̄.
Assume last that t = D_1^n λy s · (u_1,…,u_n) and that t′ = D_1^{n−p} λy((∂^p s′/∂y^p) · u′_I) · u′_J where I is a subset of {1,…,n} of p elements, J is its complementary set, s ⇒_k s′ and u_j ⇒_k u′_j for each j. Then, denoting by [1,n] the set {1,…,n},
$$\frac{\partial t}{\partial x}\cdot u = D_1^n\lambda y\Bigl(\frac{\partial s}{\partial x}\cdot u\Bigr)\cdot(u_1,\ldots,u_n) + \sum_{l=1}^{n} D_1^n\lambda y\,s\cdot\Bigl(\frac{\partial u_l}{\partial x}\cdot u,\,u_{[1,n]\setminus\{l\}}\Bigr) \tag{4}$$
and
$$\frac{\partial t'}{\partial x}\cdot u' = D_1^{n-p}\lambda y\Bigl(\frac{\partial}{\partial x}\Bigl(\frac{\partial^p s'}{\partial y^p}\cdot u'_I\Bigr)\cdot u'\Bigr)\cdot u'_J + \sum_{j\in J} D_1^{n-p}\lambda y\Bigl(\frac{\partial^p s'}{\partial y^p}\cdot u'_I\Bigr)\cdot\Bigl(\frac{\partial u'_j}{\partial x}\cdot u',\,u'_{J\setminus\{j\}}\Bigr)$$
$$= D_1^{n-p}\lambda y\Bigl(\frac{\partial^p}{\partial y^p}\Bigl(\frac{\partial s'}{\partial x}\cdot u'\Bigr)\cdot u'_I\Bigr)\cdot u'_J + \sum_{i\in I} D_1^{n-p}\lambda y\Bigl(\frac{\partial^p s'}{\partial y^p}\cdot\Bigl(\frac{\partial u'_i}{\partial x}\cdot u',\,u'_{I\setminus\{i\}}\Bigr)\Bigr)\cdot u'_J + \sum_{j\in J} D_1^{n-p}\lambda y\Bigl(\frac{\partial^p s'}{\partial y^p}\cdot u'_I\Bigr)\cdot\Bigl(\frac{\partial u'_j}{\partial x}\cdot u',\,u'_{J\setminus\{j\}}\Bigr),$$
using Lemma 4, which is possible since y can be assumed not to occur free in u′.
By inductive hypothesis we have (∂s/∂x) · u ⇒̄ (∂s′/∂x) · u′ and (∂u_l/∂x) · u ⇒̄ (∂u′_l/∂x) · u′ for each l. Therefore (coming back to expression (4) of (∂t/∂x) · u), we have D_1^n λy((∂s/∂x) · u) · (u_1,…,u_n) ⇒̄ D_1^{n−p} λy((∂^p/∂y^p)((∂s′/∂x) · u′) · u′_I) · u′_J, and for each l ∈ {1,…,n}:
• if l ∈ I, we have D_1^n λy s · ((∂u_l/∂x) · u, u_{[1,n]∖{l}}) ⇒̄ D_1^{n−p} λy((∂^p s′/∂y^p) · ((∂u′_l/∂x) · u′, u′_{I∖{l}})) · u′_J,
• and if l ∈ J, we have D_1^n λy s · ((∂u_l/∂x) · u, u_{[1,n]∖{l}}) ⇒̄ D_1^{n−p} λy((∂^p s′/∂y^p) · u′_I) · ((∂u′_l/∂x) · u′, u′_{J∖{l}}).
This concludes the proof.
Multi-confluent pairs of relations. Let us say that a pair of binary relations (ρ, φ) from terms to terms is multi-confluent if for any term t, any m ∈ N⁺ and any terms t^1,…,t^m, if t ρ t^i for each i, then there exists a term t′ such that t^i φ t′ for each i.
Lemma 16. Let ρ be a relation from simple terms to terms and let φ be a linear relation from terms to terms. If the pair (ρ, φ) is multi-confluent, then the pair (ρ̄, φ) is also multi-confluent.
Proof. Let t, t^1,…,t^m be terms such that t ρ̄ t^i for each i. Let us write as usual t = Σ_{s∈Δ} a_s s. From t ρ̄ t^i we deduce that for each simple term s and for each i = 1,…,m, there is a finite set I_i(s), some scalars (a^i_{s,j})_{j∈I_i(s)} such that a_s = Σ_{j∈I_i(s)} a^i_{s,j}, and some terms (U^i_j(s))_{j∈I_i(s)} such that s ρ U^i_j(s) and t^i = Σ_{s∈Δ} Σ_{j∈I_i(s)} a^i_{s,j} U^i_j(s). We have s ρ U^i_j(s) for each i and each j ∈ I_i(s). But for each simple term s, the set {U^i_j(s) | i = 1,…,m and j ∈ I_i(s)} is finite.
If this set is empty, then each set I_i(s) is empty and therefore t = 0 and t^i = 0 for each i. Therefore t^i φ 0 for each i, since φ is linear.
If this set is nonempty, by multi-confluence of (ρ, φ) there is a term V(s), depending only on s, such that U^i_j(s) φ V(s) for each i and each j ∈ I_i(s). By linearity of φ we conclude that t^i φ Σ_{s∈Δ} Σ_{j∈I_i(s)} a^i_{s,j} V(s) = Σ_{s∈Δ} a_s V(s).
Proof. We prove by induction on k that the pair (⇒̄_k, ⇒̄) is multi-confluent, and this will clearly entail the confluence of ⇒̄. The base case k = 0 is trivial since ⇒̄_0 is just the identity relation. So let us assume that (⇒̄_k, ⇒̄) is multi-confluent and let us prove that (⇒̄_{k+1}, ⇒̄) is multi-confluent. For this purpose, by Lemma 16, it suffices to show that the pair (⇒_{k+1}, ⇒̄) is multi-confluent, which we do now. Let t be a simple term, and let t^1,…,t^m (with m ≥ 1) be terms such that t ⇒_{k+1} t^i for each i.
Assume first that t = (s)w.
If for each i we have t^i = (s^i)w^i with s ⇒_k s^i and w ⇒̄_k w^i, then the inductive hypothesis applies (thanks to Lemma 16 for the argument side of the application).
Otherwise, we have s = D_1^n λy v · (u_1,…,u_n), v ⇒_k v^i, w ⇒̄_k w^i, u_j ⇒_k u_j^i for each i and each j, and for some q ∈ {1,…,m+1}:
• if 1 ≤ i < q, t^i = ((∂^n v^i/∂y^n) · (u_1^i,…,u_n^i))[w^i/y],
• and if q ≤ i ≤ m, for some set I_i ⊆ {1,…,n} whose cardinality is p_i and whose complementary set is J_i, we have t^i = (s^i)w^i where s^i = D_1^{n−p_i} λy((∂^{p_i} v^i/∂y^{p_i}) · u^i_{I_i}) · u^i_{J_i}.
Observe that the first case must occur at least once, otherwise we are in the first situation for the application. By inductive hypothesis (invoking Lemma 16 for w), we can find terms v′, w′ and u′_j for each j such that for each i, v^i ⇒̄ v′, w^i ⇒̄ w′ and u_j^i ⇒̄ u′_j for j = 1,…,n. For i such that 1 ≤ i < q, we apply Lemmas 14 and 15, and we get t^i ⇒̄ ((∂^n v′/∂y^n) · (u′_1,…,u′_n))[w′/y]. For i such that q ≤ i ≤ m, we apply Lemma 15 as well as the definition of ⇒ and obtain t^i ⇒̄ ((∂^n v′/∂y^n) · (u′_1,…,u′_n))[w′/y].
The case t = D_{i_1,…,i_n} x · (u_1,…,u_n) is straightforward.
The last case is when t = D_1^n λx v · (u_1,…,u_n) and, for some set I_i ⊆ {1,…,n} whose cardinality is p_i and whose complementary set is J_i, we have t^i = D_1^{n−p_i} λx((∂^{p_i} v^i/∂x^{p_i}) · u^i_{I_i}) · u^i_{J_i} (with v ⇒_k v^i and u_j ⇒_k u_j^i), for each i = 1,…,m. Again, the inductive hypothesis provides us with terms v′ and u′_j such that for each i, v^i ⇒̄ v′ and u_j^i ⇒̄ u′_j for each j. Then applying for each i the definition of ⇒ (last case, with I = J_i and J = ∅), we get t^i ⇒̄ λx((∂^n v′/∂x^n) · (u′_1,…,u′_n)) and we are done.
Since the reflexive and transitive closure of ⇒̄ is →*, we finally get the main result of this section.
Theorem 18. The relation →* over terms of the pure differential lambda-calculus enjoys the Church–Rosser property.
Remark. We can easily derive from Lemmas 15 and 14, and from the inclusions →_1 ⊆ ⇒ ⊆ →* (a direct proof would be possible as well), the two following lemmas, which will be useful in the sequel.
We are given some atomic types α, β,…, and if A and B are types, then so is A → B. The notion of typing context is the usual one, and the typing rules are as follows:
$$\frac{}{\Gamma, x : A \vdash x : A}\ \text{(Variable)}$$
$$\frac{\Gamma \vdash s : A \to B\qquad \Gamma \vdash t : A}{\Gamma \vdash (s)t : B}\ \text{(Application)}$$
$$\frac{\Gamma, x : A \vdash s : B}{\Gamma \vdash \lambda x\,s : A \to B}\ \text{(Abstraction)}$$
$$\frac{\Gamma \vdash s : A_1,\ldots,A_i \to B\qquad \Gamma \vdash u : A_i}{\Gamma \vdash D_i s\cdot u : A_1,\ldots,A_i \to B}\ \text{(Differential application)}$$
$$\frac{}{\Gamma \vdash 0 : A}\ \text{(Zero)}\qquad \frac{\Gamma \vdash s : A\qquad \Gamma \vdash t : A}{\Gamma \vdash as + bt : A}\ \text{(Linear combination)}$$
where a and b are scalars.
The last two rules express that a type may be considered as an R-module.
Consider the differential application rule in the case i = 1: we are given a term t with Γ ⊢ t : A → B that we may view as a function from A to B. The derivative of t should be a function t′ from A to a space L of linear maps from A to B. So given s : A and u : A, t′(s) is a linear function from A to B that we may apply to u, getting a value t′(s) · u in B; it is precisely this value that the term (D_1 t · u)s denotes. So D_1 t · u denotes the function which maps s : A to t′(s) · u : B. When i > 1, the intuition is exactly the same, but in that case we do not differentiate the function with respect to its first parameter, but with respect to its ith parameter.
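For the case i = 1, the rule amounts to the following check (a hypothetical sketch with names of our own choosing; the paper defines no implementation):

```haskell
data Type = Atom String | Arrow Type Type deriving (Eq, Show)

type Ctx = [(String, Type)]

-- Typing of D1 s . u: if s : A -> B and u : A in context ctx,
-- then D1 s . u keeps the type A -> B.
checkD1 :: Ctx -> (Ctx -> Maybe Type) -> (Ctx -> Maybe Type) -> Maybe Type
checkD1 ctx checkS checkU = do
  Arrow a b <- checkS ctx          -- s must have an arrow type
  a' <- checkU ctx                 -- u must have the argument type
  if a == a' then Just (Arrow a b) else Nothing
```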
Let us say that a semi-ring R is positive if a + b = 0 ⇒ a = b = 0 for all a, b ∈ R.
Lemma 22. Under the assumption that R is positive, subject reduction holds, that is: if t and t′ are canonical terms, if Γ ⊢ t : A and t →* t′, then Γ ⊢ t′ : A.
5. Strong normalization
$$(I)I = \tfrac12(I)I + \tfrac12(I)I \;\tilde\to\; \tfrac12 I + \tfrac12(I)I = \tfrac12 I + \tfrac14(I)I + \tfrac14(I)I \;\tilde\to\; \tfrac34 I + \tfrac14(I)I = \tfrac34 I + \tfrac18(I)I + \tfrac18(I)I \;\tilde\to\; \tfrac78 I + \tfrac18(I)I \;\tilde\to\;\cdots \tag{5}$$
This problem has of course nothing to do with the differential part of the calculus and would already appear in ordinary lambda-calculus extended with linear combinations.
The module of strongly normalizing terms. Observe first that if t ∈ N and if we write t = as + u with s simple and a ≠ 0, then necessarily s ∈ Supp(t). This is due to the fact that the scalars are natural numbers (more precisely: N is a positive semi-ring). 5
Lemma 24. Let t ∈ N. There are only finitely many terms t′ such that t →̃_1 t′.
5 This property would not hold if R were, for instance, the semi-ring of nonnegative rational numbers.
Lemma 25. Let s ∈ N and let s′ be such that s →_1 s′. Then ‖s′‖ < ‖s‖.
Proof. Since the scalars are natural numbers, the term s can be written as a sum of simple terms, s = Σ_{i=1}^m u_i, and since s is strongly normalizing, so are the terms u_i. We have by definition ‖s‖ = Σ_{i=1}^m |u_i|. Now for each i we can find a reduction of length |u_i| of u_i to its normal form, and concatenating these reductions, we get a reduction from s of length ‖s‖, whence the result since s reduces to s′ in one step.
We conclude by observing the following easy fact, which again results from our particular choice of scalars.
Proof. Since a ≠ 0 and since all coefficients are positive, we have Supp(s) ⊆ Supp(as + t) ⊆ X.
$$\ldots = \frac{\partial^{k+1} s}{\partial x^{k+1}}\cdot u_{I\cup\{i\}},$$
where one should observe that k + 1 is the cardinality of I ∪ {i} since i ∉ I. By condition (1), satisfied by s, (u_1,…,u_n) and v, we obtain Σ_{r=1}^m a_r (∂^k s_r/∂x^k) · u_I ∈ N⟨S⟩. Applying Lemma 27, we finally get (∂^k s_q/∂x^k) · u_I ∈ N⟨S⟩ since a_q ≠ 0.
then
There are several cases to consider as to the reduction C[t] →_1 s′. The first case is when the redex fired in this reduction is t itself, and then s′ = C[t′] where t′ ∈ Red(t); we conclude by applying directly our hypothesis (6).
In the other cases, the redex fired in the reduction C[t] →_1 s′ is a subterm of C or of t. These cases can be subdivided into two categories:
• the cases where the reduction takes place in a subterm in linear position, that is, in one of the terms s, u_i or w_i;
• and the cases where the reduction takes place in a subterm in non-linear position, that is, in v or in one of the h_i.
We check only one case of each of these categories, the others being similar.
A non-linear case: Assume that v →̃_1 v′ and s′ = C[t′] with t′ = (D_1^n λx s · (u_1,…,u_n))v′. Since t′ is simple, it suffices to show that C[t′] ∈ N. But v′ is strongly normalizing since v is, and hence t′ is an N-redex. Moreover |t′|_0 < |t|_0 and hence the inductive hypothesis applies to the pair (C, t′). Let t″ ∈ Red(t′); it will be sufficient to show that C[t″] ∈ N⟨N⟩.
If n ≥ 1, we have t″ = (D_1^{n−1} λx((∂s/∂x) · u_i) · (u_1,…,u_{i−1},u_{i+1},…,u_n))v′ for some i. The term t̂ = (D_1^{n−1} λx((∂s/∂x) · u_i) · (u_1,…,u_{i−1},u_{i+1},…,u_n))v belongs to Red(t), and hence C[t̂] ∈ N⟨N⟩ by our assumption (6) on (C, t). But C[t̂] →* C[t″] by contextuality of →*, and hence C[t″] is strongly normalizing, that is, by Lemma 26, C[t″] ∈ N⟨N⟩, as required.
If now n = 0, we have t″ = s[v′/x]. We have t̂ = s[v/x] ∈ Red(t) and hence C[t̂] ∈ N⟨N⟩ by our assumption (6) on (C, t). By Lemma 21, we get t̂ →* t″ and hence C[t̂] →* C[t″] by contextuality of →*. We conclude as before that C[t″] ∈ N⟨N⟩.
A linear case: Assume that n ≥ 1, that u_1 →_1 u′_1 and s′ = C[t′] with t′ = (D_1^n λx s · (u′_1, u_2,…,u_n))v. We have u_1 ∈ N, and hence u′_1 is strongly normalizing and thus belongs to N⟨N⟩, that is, u′_1 is a linear combination of strongly normalizing simple terms
$$u'_1 = \sum_{q=1}^{m} a_q u'_{1,q}$$
with a_q ≠ 0 for all q. Then s′ is the linear combination s′ = Σ_{q=1}^m a_q C[t′_q] of simple terms, where
$$t'_q = (D_1^n\lambda x\,s\cdot(u'_{1,q},\,u_2,\ldots,u_n))v$$
for q = 1,…,m. For each q, we show that C[t′_q] ∈ N, and this will show that s′ ∈ N⟨N⟩, as required.
We have |t′_q|_0 < |t|_0 and hence the inductive hypothesis applies to the pair (C, t′_q) (observe indeed that t′_q is an N-redex since u′_{1,q} is strongly normalizing). Therefore it will be sufficient to show that, for any t″ ∈ Red(t′_q), one has C[t″] ∈ N⟨N⟩. There are two cases to consider as to the reduction of t′_q to t″.
Assume first that t″ = (D_1^{n−1} λx((∂s/∂x) · u′_{1,q}) · (u_2,…,u_n))v. Let t̂ = (D_1^{n−1} λx((∂s/∂x) · u_1) · (u_2,…,u_n))v; we have t̂ ∈ Red(t) and therefore C[t̂] ∈ N⟨N⟩ by our assumption (6) on (C, t). But u_1 →* u′_1 = Σ_{r=1}^m a_r u′_{1,r} and hence, by Lemma 20 and contextuality of →*, we get
$$C[\hat t] \to^* \sum_{r=1}^{m} a_r C[t''_r],$$
Reducibility. Remember that if t, u_1,…,u_n are simple terms, then the term D_{i_1,…,i_n} t · (u_1,…,u_n) is always simple.
If X and Y are sets of simple terms, one defines X → Y ⊆ Δ as
X → Y = {t ∈ Δ | ∀p ∈ N, ∀s ∈ N⟨X⟩, ∀u_1,…,u_p ∈ X, (D_1^p t · (u_1,…,u_p))s ∈ Y}.
This definition, which involves differential applications and not only ordinary applications, is motivated by the next lemma, which will be essential in the proof of the Interpretation Lemma 34.
Lemma 32. Let S ⊆ Δ be saturated and let X ⊆ N be closed under variable renamings. Then X → S is saturated.
Proof. We first prove property (1) for the saturation of X → S. So, with the notations of the definition of an N-redex t and of an N-context C, assume that C[t′] ∈ N⟨X → S⟩ for all t′ ∈ Red(t); we have to show that C[t] ∈ X → S. Let w_{k+1},…,w_{k+q} ∈ X and let h_{p+1} ∈ N⟨X⟩; we must show that s = (D_1^q(C[t]) · (w_{k+1},…,w_{k+q}))h_{p+1} ∈ S. But s = C′[t] where C′ is the N-context
Proof. The first inclusion immediately results from the definition of N_0. The second inclusion results from Lemma 31. For the last inclusion, take t ∈ N_0 → N, and take a variable x. Then x ∈ N_0 and thus (t)x ∈ N. This clearly implies that t ∈ N.
(with the variables y_j^{(r)} taken among x_1,…,x_k and the terms v_j^{(r)} taken among u_1,…,u_k). By inductive hypothesis, we know that s′ ∈ N⟨B∗ → A∗⟩ and that w_1,…,w_q ∈ N⟨B∗⟩, and also that w[s_1,…,s_n/x_1,…,x_n] ∈ N⟨B∗⟩, and therefore
(with the variables z_j taken among x_1,…,x_k and the terms v_j taken among u_1,…,u_k). By inductive hypothesis, we know that s′ ∈ N⟨A∗⟩ and that w′ ∈ N⟨B_i∗⟩. We conclude by Lemma 30 that D_i s′ · w′ ∈ N⟨A∗⟩, as required.
Abstraction. So t = λx s, the typing derivation of t ends with
$$\frac{x_1 : A_1,\ldots,x_n : A_n,\,x : B \vdash s : C}{x_1 : A_1,\ldots,x_n : A_n \vdash \lambda x\,s : B \to C}$$
and we have A = B → C. We must show that λx s′ ∈ N⟨B∗ → C∗⟩, where
$$s' = \Bigl(\frac{\partial^k s}{\partial x_{i_1}\cdots\partial x_{i_k}}\cdot(u_1,\ldots,u_k)\Bigr)[s_1,\ldots,s_n/x_1,\ldots,x_n]$$
(we assume of course that x is different from all the variables x_i and does not occur free in any of the terms u_j or s_i). So let v_1,…,v_p ∈ B∗ and let w ∈ N⟨B∗⟩; we must show that
$$(D_1^p\lambda x\,s'\cdot(v_1,\ldots,v_p))w \in N\langle C^*\rangle$$
and for this purpose, since C∗ is a saturated subset of N, we are in position to apply Lemma 28 to s′, v_1,…,v_p and w. Indeed, all these terms are strongly normalizing: let z be a variable different from x, x_1,…,x_n and not occurring in any of the terms u_j or s_i; by inductive hypothesis, since z ∈ N_0 ⊆ N⟨B∗⟩, we have s′[z/x] ∈ N⟨C∗⟩ ⊆ N⟨N⟩ and hence s′ ∈ N⟨N⟩ since N is closed under variable renamings, and we also know that v_1,…,v_p ∈ N and w ∈ N⟨N⟩ since B∗ ⊆ N. So it suffices to show that
• for all I ⊆ {1,…,p} one has
$$\frac{\partial^m s'}{\partial x^m}\cdot v_I \in N\langle C^*\rangle, \tag{7}$$
where m is the cardinality of I,
• and
$$\Bigl(\frac{\partial^p s'}{\partial x^p}\cdot(v_1,\ldots,v_p)\Bigr)[w/x] \in N\langle C^*\rangle. \tag{8}$$
Let us prove (8), the proof of (7) being completely similar. Let z_1,…,z_n be pairwise distinct variables, which are distinct from x, x_1,…,x_n and which do not occur free in any of the terms s, u_1,…,u_k, s_1,…,s_n, v_1,…,v_p and w. If r is a term, we denote by r̃ the term r[z_1,…,z_n/x_1,…,x_n]. Since B∗ is closed under variable renamings, we have ṽ_1,…,ṽ_p ∈ B∗ and w̃ ∈ N⟨B∗⟩. Moreover, the variables x, x_1,…,x_n do not occur free in any of these terms. Therefore, by inductive hypothesis, we have
$$\Bigl(\frac{\partial^{k+p} s}{\partial x_{i_1}\cdots\partial x_{i_k}\,\partial x^p}\cdot(u_1,\ldots,u_k,\tilde v_1,\ldots,\tilde v_p)\Bigr)[s_1,\ldots,s_n,\tilde w/x_1,\ldots,x_n,x] \in N\langle C^*\rangle,$$
but by our hypotheses on variables and by Lemma 5, this term is equal to ((∂^p s′/∂x^p) · (ṽ_1,…,ṽ_p))[w̃/x]. Since N⟨C∗⟩ is closed under variable renamings, we get (((∂^p s′/∂x^p) · (ṽ_1,…,ṽ_p))[w̃/x])[x_1,…,x_n/z_1,…,z_n] ∈ N⟨C∗⟩, and this latter term is equal to ((∂^p s′/∂x^p) · (v_1,…,v_p))[w/x] since the variables z_i are fresh.
Proof. Take first a closed term t which is typeable of type A. Then by the Interpretation Lemma we have t ∈ N⟨A∗⟩. But A∗ ⊆ N, so t is strongly normalizing. For a non-closed term which is typeable in some typing context, any of its λ-closures is strongly normalizing, and so the term itself is strongly normalizing.
Observe that saturated sets are closed under arbitrary intersections. Therefore, it is straightforward to adapt the proof above and show strong normalization of a second-order version of the differential lambda-calculus.
We first prove a version of the Leibniz formula, which will be useful in the sequel.
Lemma 36. Let t and u be terms and let x and y be distinct variables such that y does not occur free in u. Then
$$\frac{\partial\,t[x/y]}{\partial x}\cdot u = \Bigl(\frac{\partial t}{\partial x}\cdot u\Bigr)[x/y] + \Bigl(\frac{\partial t}{\partial y}\cdot u\Bigr)[x/y].$$
We now generalize this formula to iterated derivatives, and this leads to the announced Leibniz formula.
Lemma 37. Let t and u be terms and let x and y be distinct variables such that y does not occur free in u. Then
$$\frac{\partial^n\,t[x/y]}{\partial x^n}\cdot u^n = \sum_{p=0}^{n}\binom{n}{p}\Bigl(\frac{\partial^n t}{\partial x^p\,\partial y^{n-p}}\cdot u^n\Bigr)[x/y].$$
Proof. Induction on n, using the lemma above and the well-known identity $\binom{n+1}{p+1} = \binom{n}{p+1} + \binom{n}{p}$.
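For instance, at n = 2 the formula reads
$$\frac{\partial^2\,t[x/y]}{\partial x^2}\cdot u^2 = \Bigl(\frac{\partial^2 t}{\partial x^2}\cdot u^2\Bigr)[x/y] + 2\Bigl(\frac{\partial^2 t}{\partial x\,\partial y}\cdot u^2\Bigr)[x/y] + \Bigl(\frac{\partial^2 t}{\partial y^2}\cdot u^2\Bigr)[x/y].$$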
Consequently,
$$\Bigl(\frac{\partial^n\,(x)\bar t}{\partial x^n}\cdot u^n\Bigr)[0/x] = n\,\Bigl(\frac{\partial^{n-1}\,(u)\bar t}{\partial x^{n-1}}\cdot u^{n-1}\Bigr)[0/x].$$
Since y does not occur free in t̄, one has, for p ≥ 1,
$$\frac{\partial^p\,(y)\bar t}{\partial y^p}\cdot u^p = \Bigl(\frac{\partial^p y}{\partial y^p}\cdot u^p\Bigr)\bar t = \begin{cases}(u)\bar t & \text{if } p = 1\\ 0 & \text{if } p > 1\end{cases}$$
and this proves the first statement. The second statement is a consequence of the first one and of an iterated use of Lemma 8 (if the length of the sequence of terms t̄ is 1, then the lemma applies directly; otherwise some iteration on the length of this sequence is needed). A more interesting way of proving this statement is as follows.
Let us say that a term s is linear in a variable y if we have
$$\frac{\partial s}{\partial y}\cdot z = s[z/y]$$
where z is some (or equivalently, any) variable not occurring free in s. Applying Lemmas 4 and 5, one shows that if s is linear in y and y does not occur free in the term u, then (∂s/∂x) · u is linear in y, as soon as x is a variable different from y.
We conclude using the fact that if s is linear in y, then s[0/y] = (∂s/∂y) · 0 = 0, and the fact that (y)t̄ is linear in y (indeed, y is linear in y, and if s is linear in y and y does not occur free in t, then (s)t is linear in y).
Theorem 39. Let s and u be terms of the ordinary lambda-calculus, and assume that (s)u is β-equivalent to ⋆. Then there is exactly one integer n such that (D_1^n s · u^n)0 does not reduce to 0, and for this value of n, one has
$$(D_1^n s\cdot u^n)\,0 \to^* n!\,\star.$$
holds in a rather trivial way in that particular case. This formula always holds, semantically, at least in the simply typed case (see [10]), but it is not so easy to interpret in general.
Proof. If the term t is solvable (i.e. has a head normal form, i.e. has a finite head reduction), we call head normal form of t the result of the head reduction of t. We recall the well-known lambda-calculus property that if t and v are any terms such that (t)v (resp. t[v/x]) is solvable, then so is t. We write t →_h^k t′ when t head reduces in k steps to t′. Another standard lambda-calculus property that we shall also use without further mention is that if t →_h^k t′ then t[v/x] →_h^k t′[v/x].
Assume s and u are as in the theorem. Thus s is solvable. For any term v, we denote by v′ the term v[u/x]. We define a number L(s, u) by induction on the length of the head reduction of (s_0)u to ⋆, where s_0 is the head normal form of s. Without loss of generality we may assume that s is in head normal form. There are two cases:
• s = λx ⋆ where x ≠ ⋆;
• s = λx(x)t̄ for some sequence of terms t̄ such that (u)t̄′ reduces to ⋆.
In the former case we set L(s, u) = 0. In the latter case we define s⁺ by
s⁺ = λx(u)t̄.
Note that s⁺, u satisfy the assumptions of the theorem, because (s⁺)u and (s)u both head reduce to (u)t̄′. Let s_0⁺ be the head normal form of s⁺; then s_0⁺ = λx v for some v. Let k be the length of the head reduction of s⁺. With these notations we have (u)t̄ →_h^k v. Therefore (s)u →_h^1 (u)t̄′ →_h^k v′. On the other hand (s_0⁺)u →_h^1 v′, so that the length of the head reduction of (s)u to ⋆ is strictly greater than the length of the head reduction of (s_0⁺)u to ⋆ as soon as k > 0.
The number L(s, u) counts the substitutions of the successive head variables of s in the linear head reduction of (s)u ([7]). The head variable of s is the only occurrence of a variable in (s)u which may be considered as linear. So L(s, u) may be viewed as counting the number of linear substitutions by u that are performed along the reduction. The theorem reinforces the intuition that the derivation operator implements linear substitution in the lambda-calculus.
Going now in the opposite direction, we want to conclude this section by computing the Taylor expansion of the outermost application in the well-known nonsolvable term (Δ)Δ, where Δ = λx(x)x. This expansion is
$$\sum_{n=0}^{\infty}\frac{1}{n!}\,(D_1^n\Delta\cdot\Delta^n)\,0.$$
But clearly t_0 = 0, and hence t_n = 0 for each n. Hence the Taylor expansion of (Δ)Δ is 0. This reflects of course the fact that this term is unsolvable, and this observation reinforces the idea that the Taylor expansion provides an approach to term approximations similar in spirit to the Böhm tree approach, the role of Ω being played by 0.
Acknowledgements
We would like to thank especially one of the referees of this paper, who made many insightful comments on an earlier version of this work and led us to clarify several delicate syntactical aspects of the differential lambda-calculus.
In [10], the first author introduced a semantics of linear logic based on Köthe spaces, which are locally convex topological vector spaces of a quite particular kind, in some sense similar to coherence spaces. We briefly present here a simplified version of this semantics, based on the notion of finiteness spaces, a “discrete” analogue of Köthe spaces. This model will be presented more thoroughly in [11].
Given a set I and a subset F of P(I), let us denote by F⊥ the set of all subsets of I which have a finite intersection with all the elements of F. A finiteness space is a pair X = (|X|, F(X)), where |X| is a set (the web of X) and F(X) is a subset of P(|X|) which satisfies F(X) = F(X)⊥⊥ and whose elements are called the finitary subsets of |X|. Given a finiteness space X, we define R⟨X⟩ as the subset of all x ∈ R^{|X|} such that the set |x| = {a ∈ |X| | x_a ≠ 0} belongs to F(X). The set R⟨X⟩ clearly has a module structure (all operations being defined pointwise), since the union of two finitary sets is still finitary. 6
The main purpose of these definitions is that, given x ∈ R⟨X⟩ and x′ ∈ R⟨X⊥⟩, it is possible to define ⟨x, x′⟩ ∈ R as Σ_{a∈|X|} x_a x′_a, since this sum is finite 7 by definition (of course, X⊥ is defined as (|X|, F(X)⊥)). In this way, the pair (R⟨X⟩, R⟨X⊥⟩) carries a well-behaved duality (each module can be seen as the topological dual of the other for a suitable linear topology in the sense of [16], but this need not be explained here).
A morphism from X to Y (finiteness spaces) is a linear function from R⟨X⟩ to R⟨Y⟩ which is continuous for the topologies mentioned above. But these morphisms admit a more concrete matricial characterization, as we shall see.
We can use finiteness spaces for interpreting all the formulae of propositional linear logic, and we briefly survey now the corresponding space constructions and their main properties. Let X and Y be finiteness spaces.
6 The point of the condition F(X) = F(X)⊥⊥ is that, for checking that u ⊆ |X| is finitary in X, one has only to check that u has a finite intersection with all the elements of F(X)⊥; therefore, if u and v are finitary in X, so is u ∪ v.
7 More precisely, it has only finitely many nonzero terms.
• The direct sum X ⊕ Y and the direct product X & Y of X and Y are the same space, whose web is |X| + |Y| (disjoint union) and where a subset of this disjoint union is finitary if each of its restrictions to |X| and |Y| is finitary (in X and Y, respectively). Infinite direct sums and products can be defined as well but do not coincide anymore.
• The tensor product of X and Y is the space X ⊗ Y whose web is |X| × |Y| and where a subset of that web is finitary if its two projections are finitary (in X and Y, respectively). It can be checked that the subset F(X ⊗ Y) of P(|X| × |Y|) defined in this way satisfies indeed F(X ⊗ Y)⊥⊥ = F(X ⊗ Y).
• The linear function space X ⊸ Y is defined as (X ⊗ Y⊥)⊥. An element of R⟨X ⊸ Y⟩ should be seen as a matrix indexed over |X| × |Y|, with coefficients in R. Given A ∈ R⟨X ⊸ Y⟩ and B ∈ R⟨Y ⊸ Z⟩, the product of matrices BA ∈ R⟨X ⊸ Z⟩ is given by (BA)_{a,c} = Σ_{b∈|Y|} A_{a,b} B_{b,c}. Due to the fact that |A| ∈ F(X ⊸ Y) and |B| ∈ F(Y ⊸ Z), this sum is indeed always finite, and it is not hard to check that BA ∈ R⟨X ⊸ Z⟩. The identity matrix I ∈ R⟨X ⊸ X⟩ is defined as usual by I_{a,b} = δ_{a,b} and is the neutral element for matrix composition. In this way we have defined a category of finiteness spaces, where a morphism from X to Y is an element of R⟨X ⊸ Y⟩, called the linear category in the sequel.
If A ∈ R⟨X ⊸ Y⟩ and x ∈ R⟨X⟩, we can define A · x ∈ R⟨Y⟩ by (A · x)_b = Σ_{a∈|X|} A_{a,b} x_a, thus allowing one to see any element of R⟨X ⊸ Y⟩ as a (linear and continuous 8) function from R⟨X⟩ to R⟨Y⟩. This map from matrices to linear and continuous functions turns out to be a bijection.
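Concretely (a sketch assuming R = Int and webs small enough to enumerate; the names are ours, not the paper's):

```haskell
import qualified Data.Map.Strict as M

type R = Int
type Vec w = M.Map w R            -- an element of R<X>, web of type w
type Mat wx wy = M.Map (wx, wy) R -- a matrix in R<X -o Y>

-- (A . x)_b = sum over a in |X| of A_{a,b} * x_a
apply :: (Ord wx, Ord wy) => Mat wx wy -> Vec wx -> Vec wy
apply a x = M.fromListWith (+)
  [ (b, c * xa) | ((a', b), c) <- M.toList a
                , Just xa <- [M.lookup a' x] ]
```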
This linear category is a model of multiplicative-additive linear logic, that is, a ∗-autonomous category with finite sums and products (see [3]). In particular, all the operations we have defined are functorial. Of course, we need more for getting a model of full linear logic, or of the simply typed lambda-calculus. We have to define an exponential.
Given a finiteness space X, we define !X as follows: |!X| is the set of all finite multisets of elements of |X|, and if U is a collection of such multisets, we decide that it is finitary (that is, belongs to F(!X)) if the union of the supports 9 of the elements of U is finitary in X. It turns out that this collection F(!X) of subsets of |!X| satisfies our basic requirement, namely F(!X)⊥⊥ = F(!X).
This operation on objects can be turned into an endofunctor on the linear category described above by defining its action on matrices: if A ∈ R⟨X ⊸ Y⟩, it is possible to define !A ∈ R⟨!X ⊸ !Y⟩. We do not describe this operation here; we just give its fundamental property (which completely characterizes it). Given x ∈ R⟨X⟩, we can
where (e_b)_{b∈|Y|} is the “canonical basis” of R⟨Y⟩ defined by (e_b)_{b′} = δ_{b,b′}, so that the elements of R⟨!X ⊸ Y⟩ can be considered as power series from R⟨X⟩ to R⟨Y⟩. In view of all the structure presented so far, it is a standard fact in the semantics of linear logic that the category whose objects are the finiteness spaces and where a morphism from X to Y is an element of R⟨!X ⊸ Y⟩, equipped with a notion of composition defined in terms of the comonad structure of the “!” functor, 10 is a cartesian closed category, that is, a model of the simply typed lambda-calculus: the Kleisli category of the comonad “!”. See for instance [4].
We finish this short presentation with a word about differentiation. Due to the fact that finite sums and products coincide, we can build a canonical linear morphism M from !X ⊗ !X to !X (we apply the ! functor to the co-diagonal of the sum X ⊕ X, which is an element of R⟨X ⊕ X ⊸ X⟩ = R⟨X & X ⊸ X⟩; seen as a linear map from R⟨X⟩ × R⟨X⟩ to R⟨X⟩, this morphism is just addition). By pre-composing a power series φ ∈ R⟨!X ⊸ Y⟩ with M, we obtain an element ψ of R⟨!X ⊗ !X ⊸ Y⟩, that is, a two-parameter power series, which is characterized by ψ(x, y) = φ(x + y). We have on the other hand a linear morphism ∂0 from X to !X which actually embeds 11 X into !X; this morphism is the matrix given by (∂0)_{a,m} = δ_{[a],m}. If φ is as above, by pre-composing ∂0 with φ in the linear category, we get a linear morphism from X to Y which is easily seen to be the derivative of φ (considered as a power series) at 0. Now it should be clear how to use M and ∂0 to compute the derivative of φ at any point x of R⟨X⟩, and not only at 0: translate φ using M and then use ∂0 for differentiating the obtained power series y ↦ φ(x + y) at 0.
Cartesian closedness of the Kleisli category and the operations M and ∂0 are the basic ingredients for interpreting the differential lambda-calculus in this category. A type will be interpreted as a finiteness space (implication on types being interpreted by the operation (X, Y) ↦ !X ⊸ Y on spaces), and a closed term of type A will be interpreted by an element of R⟨X⟩, where X is the space interpreting A.
References
[1] T. Altenkirch, N. Ghani, C. McBride, Derivatives of containers, in: M. Hofmann (Ed.), Proc. Sixth
Typed Lambda Calculi and Applications Conf, Lecture Notes in Computer Science, Vol. 2701, Springer,
Berlin, 2003, pp. 16–30.
[2] H. Barendregt, The lambda calculus, in: Studies in Logic and the Foundations of Mathematics, Vol.
103, North-Holland, Amsterdam, 1984.
[3] M. Barr, ∗-autonomous categories, in: Lecture Notes in Mathematics, Vol. 752, Springer, Berlin, 1979.
10 This notion of composition coincides of course with the standard composition of power series.
11 This morphism corresponds to the dereliction rule of linear logic and is part of the comonad structure of the ! functor.
[4] G. Bierman, What is a categorical model of intuitionistic linear logic? in: M. Dezani-Ciancaglini, G.D.
Plotkin (Eds.), Proc. Second Typed Lambda-Calculi and Applications Conf, Lecture Notes in Computer
Science, Vol. 902, Springer, Berlin, 1995, pp. 73–93.
[5] G. Boudol, The lambda calculus with multiplicities, Tech. Report 2025, INRIA Sophia-Antipolis, 1993.
[6] G. Boudol, P.-L. Curien, C. Lavatelli, A semantics for lambda calculi with resource, Mathematical
Structures in Computer Science 9 (4) (1999) 437–482.
[7] V. Danos, H. Herbelin, L. Regnier, Games semantics and abstract machines, in: Proc. Eleventh Symp.
on Logic in Computer Science, IEEE Computer Society Press, Silver Spring, MD, 1996.
[8] V. Danos, L. Regnier, Head linear reduction, 2003, submitted for publication.
[9] N.G. De Bruijn, Generalizing automath by means of a lambda-typed lambda calculus, in: D.W. Kueker,
E.G.K. Lopez-Escobar, C.H. Smith (Eds.), Mathematical Logic and Theoretical Computer Science,
Lecture Notes in Pure and Applied Mathematics, Marcel Dekker, New York, 1987, pp. 71–92. (Reprinted
in: Selected papers on Automath, Studies in Logic, vol. 133, pp. 313–337, North-Holland, Amsterdam,
1994).
[10] T. Ehrhard, On Köthe sequence spaces and linear logic, Mathematical Structures in Computer Science 12 (2002) 579–623.
[11] T. Ehrhard, Finiteness spaces, preliminary version accessible from http://iml.univ-mrs.fr/∼ehrhard/, 2003.
[12] J.-Y. Girard, Linear logic, Theoretical Computer Science 50 (1987) 1–102.
[13] J.-Y. Girard, Coherent Banach spaces: a continuous denotational semantics, Theoretical Computer Science 227 (1999) 275–297.
[14] A. Kriegl, P.W. Michor, The Convenient Setting of Global Analysis, Mathematical Surveys and Monographs, Vol. 53, American Mathematical Society, Providence, RI, 1997.
[15] J.-L. Krivine, Lambda-Calculus, Types and Models, Ellis Horwood Series in Computers and Their Applications, Ellis Horwood, Chichester, UK, 1993 (translation by René Cori from the French 1990 edition, Masson).
[16] S. Lefschetz, Algebraic Topology, American Mathematical Society Colloquium Publications, Vol. 27, American Mathematical Society, Providence, RI, 1942.
[17] C. McBride, The Derivative of a Regular Type is its Type of One-Hole Contexts, Unpublished,
electronically available, 2000.