COL226 9 Variables in the Calculator
COL226 9 Variables in the Calculator
Variables
So far, we have looked at type-checking, evaluating (in a value domain) and calculating (in
abstract syntax) expressions. However, “programming” makes sense when one performs
the same calculation on different inputs. In that sense, our toy language compiler or a
definitional interpreter are programs that can work on different expressions as input —
however, the toy language is not yet a “programming language”. The next step we will take
is to make some parts of expressions vary. Mathematicians do this all the time: for
example, they calculate a formula b 2 − 4ac for different values of a, b, c. The expressions
a, b, c are called input variables since they can (each) be given various values.
So our first step is to extend the toy language with a new case of expressions — namely,
variables. Assume that we have a (denumerable) set of identifiers (assume also that
these identifiers are different from other symbols in the language), with
x, y, z, x1, y1, … ∈ representing typical variables. (We call x, y, z, x1, y1, … “meta-
variables” — the actual variables in the language will be particular strings).
The functions ht and size are amended as follows, mapping all variables to have height 0,
and size 1.
and
𝒳
𝒳
let rec size e = match e with
Num n -> 1
| Bl b -> 1
| V x -> 1
| Plus (e1, e2) -> 1 + (size e1) + (size e2)
| Times (e1, e2) -> 1 + (size e1) + (size e2)
| And (e1, e2) -> 1 + (size e1) + (size e2)
| Or (e1, e2) -> 1 + (size e1) + (size e2)
| Not e1 -> 1 + (size e1)
| Eq (e1, e2) -> 1 + (size e1) + (size e2)
| Gt(e1, e2) -> 1 + (size e1) + (size e2)
;;
The “has type” relation is now modified to carry a type assumption, written to the left of
the “turnstile”, to handle the presence of variables within expressions, and by adding a rule
to deal with the base case of variables. Note that each statement is modified to read as
Γ⊢_:_
Γ ⊢ e1 : Int Γ ⊢ e2 : Int
(TimesT)
Γ ⊢ e1 * e2 : Int
All multiplication expressions e1 * e2 have type Int, provided the subexpressions e1
and e2 both have type Int under the same type assumptions Γ.
Γ ⊢ e1 : Bool
(NotT)
Γ ⊢ ¬e1 : Bool
All negation expressions ¬e1 have type Bool, provided the subexpressions e1 have
type Bool under the same type assumptions Γ.
Γ ⊢ e1 : Bool Γ ⊢ e2 : Bool
(AndT)
Γ ⊢ e1 ∧ e2 : Bool
All conjunction expressions e1 ∧ e2 have type Bool, provided the subexpressions e1
and e2 both have type BoolT under the same type assumptions Γ.
Γ ⊢ e1 : Bool Γ ⊢ e2 : Bool
(OrT)
Γ ⊢ e1 ∨ e2 : Bool
All disjunction expressions E1 ∨ E2 have type Bool, provided the subexpressions e1
and e2 both have type Bool under the same type assumptions Γ.
Γ ⊢ e1 : τ Γ ⊢ e2 : τ
(EqT)
Γ ⊢ e1 = e2 : Bool
All equality expressions e1 = e2 have type Bool, provided the subexpressions e1 and
e2 both have type τ under the same type assumptions Γ.
Γ ⊢ e1 : Int Γ ⊢ e2 : Int
(GtT)
Γ ⊢ e1 > e2 : Bool
All greater-than expressions e1 > e2 have type Bool, provided the subexpressions e1
and e2 both have type Int under the same type assumptions Γ.
eval[[ x]] ρ = ρ( x)
The big-step (Kahn-style) operational semantics also needs to be modified. The calculates
relation now needs to return an answer when the input expression is a variable. What
answer? Whatever answer the variable is bound to. So we need a data structure that
associates variables to canonical answers. Let us call this a “table”, which is nothing but a
finite-domain function γ ∈ →fin Ans. Why “finite-domain”? Because a calculator has
to operate with finite data structures and not mathematical abstractions such as valuations
(which can be infinite).
Accordingly, we modify the calculates relation by introducing a table in each of the rules.
And we add a rule to deal with the case of variables. To highlight that the table does not
change during the calculation process, we place the table to the left of a turnstile.
γ ⊢ e1 ⟹ n1 γ ⊢ e2 ⟹ n2
(CalcTimes) provided tIMES(n1, n 2, n)
γ ⊢ e1 * e2 ⟹ n
γ ⊢ e1 ⟹ b1
(CalcNot) provided myNot (b1, b)
γ ⊢ ¬e1 ⟹ b
γ ⊢ e1 ⟹ b1 γ ⊢ e2 ⟹ b2
(CalcAnd) provided myAnd(b1, b2, b)
γ ⊢ e1 ∧ e2 ⟹ b
γ ⊢ e1 ⟹ b1 γ ⊢ e2 ⟹ b2
CalcOr) provided myAnd(b1, b2, b)
γ ⊢ e1 ∨ e2 ⟹ b
𝒳
γ ⊢ e1 ⟹ a1 γ ⊢ e2 ⟹ a2
(CalcEq) provided eQ(a1, a 2, b)
γ ⊢ e1 = e2 ⟹ b
γ ⊢ e1 ⟹ n1 γ ⊢ e2 ⟹ n2
(CalcGt) provided gT (n1, n 2, b)
γ ⊢ e1 > e2 ⟹ b
Note that in (CalcVar), no answer is returned if x ∉ dom(γ). The calculator gets stuck!
What about Soundness and Completeness of the calculator with respect to the
Definitional Interpreter? How do those statements change?
We need a notion of a table and a valuation agreeing with each other. That is,
for every x ∈ dom(γ): eval[[γ ( x)]] ρ = ρ( x)
With this assumption on γ and ρ, it is fairly easy to state and prove Soundness.
Exercise: Define the function vars( e). Note the similarity in structure to ht, size, and
eval.
The type preservation theorem requires some additional conditions (apart from the
assumption that elementary operations are type-sound).
We say that a table γ is type-consistent with a type assumption Γ if for every x ∈ dom(γ):
x ∈ dom(Γ) and Γ ⊢ γ ( x) : Γ( x). That is, the answer associated with any variable in a
table is indeed of the same type associated with it by the type assumption.
The Induction Hypothesis is a suitably modified version of the earlier IH, and the cases
in the Induction Step (ht ( e) = 1 + k) are more or less the same as before, with the
appropriate changes.
The opcodes need only a small extension however — an opcode to look up a variable in the
table. (In practice, we get rid of the variables and use some address/indexing mechanism).
The compile function therefore has minimal changes == the inclusion on a line for
compiling variables
The stack machine, now endowed with a table in its configurations, needs to specify
how the LOOKUP(x) opcode is executed. Otherwise, it is substantially the same
(other than now containing a table component). Out of indolence, we have
represented a table as a function from strings to exp (and not very precisely to
answers).
The exception Stuck now takes 3 arguments, namely the table, stack and opcode
list. The only new line is
| s, (LOOKUP x)::c' -> stkmc g ((g x)::s) c'
where the value obtained from the table, namely (g x), is pushed onto the stack.