Masked Types For Sound Object Initialization: Xin Qi Andrew C. Myers

Download as pdf or txt
Download as pdf or txt
You are on page 1of 13

Masked Types for Sound Object Initialization

Xin Qi Andrew C. Myers


Computer Science Department
Cornell University
{qixin,andru}@cs.cornell.edu

Abstract tion 2 describes, a masked type keeps track of the parts of an object
This paper presents a type-based solution to the long-standing prob- that have not been initialized. For example, the type T \ f describes
lem of object initialization. Constructors, the conventional mech- an object of type T whose field f may not be initialized yet, and
anism for object initialization, have semantics that are surprising the type T \∗ represents an object none of whose fields are neces-
to programmers and that lead to bugs. They also contribute to the sarily initialized. As an object is constructed, the type of the object
problem of null-pointer exceptions, which make software less reli- changes to reflect the fields that are initialized. Thus, the type sys-
able. Masked types are a new type-state mechanism that explicitly tem for masked types is flow-sensitive; it has typestate [31]. The
tracks the initialization state of objects and prevents reading from type of an object conservatively tracks its initialization state, so a
uninitialized fields. In the resulting language, constructors are or- partially initialized object cannot be used where a fully initialized
dinary methods that operate on uninitialized objects, and no spe- object is expected.
cial default value (null) is needed in the language. Initialization The problem of object initialization is intertwined with the prob-
of cyclic data structures is achieved with the use of conditionally lem of null pointer exceptions, which significantly hurt software
masked types. Masked types are modular and compatible with data reliability [2]. Because object initialization is unsound, most lan-
abstraction. The type system is presented in a simplified object cal- guages aiming for type safety (e.g., Java, C#, Modula-3) first ini-
culus and is proved to soundly prevent reading from uninitialized tialize fields with null. This semantics implies that null must be a
fields. Masked types have been implemented as an extension to legal value for all object types, leading to ubiquitous, implicit null
Java, in which compilation simply erases extra type information. checks that can generate null pointer exceptions. Recently there has
Experience using the extended language suggests that masked types been interest in controlling null pointer exceptions through non-
work well on real code. null annotations and other means [7, 2, 19, 6]. Non-null annotations
by themselves do not solve the problem of object initialization; in
Categories and Subject Descriptors D.2.4 [Software/Program fact, they make it more important because non-null fields must be
Verification]: Class invariants, programming by contract; D.3.2 initialized before use. But with masked types, there is no need for
[Language Classifications]: Object-oriented languages a default initialization value. It is then straightforward to eliminate
null values entirely from the language. There are legitimate uses
General Terms Languages, Reliability of null other than as an initialization placeholder, but for these
Keywords invariants, null pointer exceptions, conditional masks, uses, an “option” or “maybe” type is a better approach, because it
cyclic data structures, data abstraction makes null checks explicit and rare.
A language with masked types can be simpler in another way.
There is no need to give constructors a special status in the lan-
1. Introduction guage, because types track initialization state. Rather than a lan-
Object initialization remains an unsatisfactory aspect of object- guage feature, constructors become a design pattern: they are ordi-
oriented programming. In the usual approach, objects of a given nary methods that change the initialization state of the receiver.
class are created and initialized only by class constructors. There- Cyclic data structures pose a challenge for object initialization.
fore, when implementing class methods, the programmer can as- However, conditionally masked types make it possible to create
sume that object fields satisfy an invariant established by the con- cyclic data structures, such as doubly-linked lists and trees with
structors. However, in the presence of inheritance, the methods of parent pointers, without resorting to placeholder null values. Con-
partly initialized objects may be invoked before the invariant has ditional masks record dependencies between initialization of differ-
been established. As a result, reasoning about object initialization ent fields, so that initializing one field can “tie the knot”, changing
can be challenging and non-modular. No fully satisfactory solution the initialization state of many fields at once.
to object initialization currently exists. Perhaps the most closely related prior work is that of Fähndrich
This paper presents a new solution to the object initialization and Xia [9], who introduce delayed types for static reasoning about
problem, based on a new type mechanism, masked types. As Sec- partially initialized objects. Masked types support cyclic data struc-
tures that delayed types do not. Masked types also support richer
initialization abstractions: for example, helper methods for par-
tial initialization and reinitialization of recycled objects. Abstract
Permission to make digital or hard copies of all or part of this work for personal or masks, described in Section 3, support initialization abstractions
classroom use is granted without fee provided that copies are not made or distributed that are compatible with data abstraction and inheritance.
for profit or commercial advantage and that copies bear this notice and the full citation Masked types have been formalized for a simplified object lan-
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
guage, described in Section 4. The key soundness theorem is for-
POPL’09, January 18–24, 2009, Savannah, Georgia, USA.
malized and has been proved for this language: well-typed pro-
Copyright c 2009 ACM 978-1-60558-379-2/09/01. . . $5.00. grams never read uninitialized fields.
1 class Point { class whose field c cannot be read, perhaps because it has not been
2 int x, int y; initialized. We say that the field c is masked in this type.
3 Point(int x, int y) {
A type with no mask means that the object is fully initialized.
4 this.x = x;
5 this.y = y; In typical programming practice, this would be the ordinary state
6 display(); of the object, in which its invariant is already established.
7 } On entry to a constructor such as Point(), the newly created
8 void display() { object has all its fields masked. The actual class of the new object
9 System.out.println(x + " " + y); might be a subclass (for example, CPoint), so on exit, subclass
10 } fields remain to be initialized. A subclass mask, written C.sub, is
11 } used to mask all fields introduced in subclasses of C, not including
12 those of C itself. Therefore, just before line 4 in Figure 1, the
13 class CPoint extends Point {
object being constructed has type Point\x\y\Point.sub. (While
14 Color c;
15 CPoint(int x, int y, Color c) { this type looks complicated, it can be inferred automatically.)
16 super(x, y); When a field is initialized by assigning to it, the correspond-
17 this.c = c; ing mask is removed from the type of the object. For example,
18 } line 4 in Figure 1 assigns to field x, so the type of this becomes
19 void display() { Point\y\Point.sub. After the assignment to y on the next line,
20 System.out.println(x + " " + y + " " + c.name()); the type of this becomes Point\Point.sub. Thus, the initializa-
21 } tion of various fields is recorded in the changing type of this.
22 } Because variables may have different types at different program
points, J\mask has a flow-sensitive type system.
Figure 1. Code with an initialization bug Subclass masks such as Point.sub can be removed when the
exact run-time class of an object is known, because there are no
subclass fields left to initialize. The type of a new expression is
Section 5 reports on the implementation of masked types as a known exactly, as is the type of a value of any class known not to
mostly backward-compatible extension to the Java language called have a subclass (in Java, a “final” class).
J\mask. Section 6 discusses experience using J\mask in the con- J\mask has a special mask ∗ as a convenient shorthand for
text of the Java Collections Framework, where masked types are masking all fields, including those masked by the subclass mask.
shown to do a good job of capturing desirable initialization idioms. On entry to the CPoint constructor, the object can be given type
Related work is discussed in Section 7. Section 8 concludes. CPoint\∗, which is equivalent to CPoint\x\y\c\CPoint.sub.

2.2 Mask effects


2. Masked types In J\mask, methods and constructors can have effects [23] that
Figure 1 illustrates a bug that can easily happen in an object- propagate mask information across calls. For example, the J\mask
oriented language like Java. In the class Point, representing a 2D signatures for the Point constructor and the display method can
point, the constructor calls a virtual method display that prints the be annotated explicitly with effect clauses:
coordinates of the point. The two fields x and y are properly initial-
ized before display is called. However, in the subclass CPoint Point(int x, int y) effect * -> Point.sub
representing a colored point, the display method has been over- void display() effect {} -> {}
ridden in a way that causes the added c field to be read before it is The effect of this Point constructor says that at entry to the
initialized, resulting in a null pointer exception. constructor, all fields are uninitialized (precondition mask *) and
This example is simple, but in general, initialization bugs are therefore unreadable; at the end of the constructor, only fields in-
difficult to prevent in an automatic way. It would be too restrictive troduced by subclasses of Point remain uninitialized (postcondi-
to rule out virtual method calls on partially constructed objects. tion mask Point.sub). Because the initial and final masks of the
Further, the bug involves the interaction of code from two different display method are both {}, denoting the absence of any mask,
classes (Point and CPoint). An implementer of CPoint might not the method can be called only with a fully initialized object, and it
have access to the code of Point and would not realize the danger leaves the object fully initialized.
of overriding the display method in this seemingly reasonable With these effects, the bug in Figure 1 would be caught stati-
way. cally. The method display cannot be invoked on line 6, because
Our goal is to prevent code like that of class Point from type- there the type of this is Point\Point.sub, which does not sat-
checking, but to allow complex, legitimate initialization patterns. isfy the precondition of display. The J\mask compiler detects this
The key observation is that before the call to display on line 6, unsafe call without inspecting any subclass of Point.
the fields in Point are initialized, but fields of subclasses of Point This example suggests how mask effects make the J\mask type
are not. However, the type of the method display does not prevent system modular. Mask effects explicitly represent the contract on
the partially initialized receiver from being passed to an overridden initialization states that a method is guaranteed to follow. This
version of the method that reads uninitialized fields, as in CPoint. explicit contract allows the compiler to type-check programs one
class at a time, and also enables programmers to reason about
2.1 Types for initialization state initialization locally.
A masked type T \M, where M is a mask that denotes some object Indeed, masked types and mask effects capture changes to ini-
fields, is the type T but without read access to the denoted fields. tialization state with enough precision that constructors in J\mask
Masked types are a completely static mechanism, so a J\mask are essentially ordinary methods that remove masks from the re-
program is compiled by erasing masks. No run-time penalty is paid ceiver. However, for convenience and backward compatibility, the
for safe object initialization. J\mask language still has constructors.
The simplest form of a mask is just the name of a field. For To reduce the annotation burden, the J\mask language provides
example, an object of type CPoint\c is an instance of the CPoint default effects for methods and constructors. Programmers do not
normally have to annotate code with effects or masks. For ordinary C\f! x = ...;
methods, the default is {} -> {}; for constructors, the default D z = ...;
effect is close to that shown above (see Section 2.3). z.g = x;
The effects shown capture changes to the initialization state The non-aliasing requirement on must-masks might seem re-
of the parameter this, the receiver object. J\mask also supports strictive, but it is usually not a problem: must-masks typically ap-
effects on other parameters, as shown in Section 2.5. pear near allocation sites, where no alias has been created.
For simplicity, exceptions, which are rarely thrown during ini-
tialization anyway, have been ignored in this paper. However, ex- 2.4 Reinitialization
ceptions can be supported by providing a postcondition for each
Beyond initialization, masked types can help reasoning about reini-
exceptional exit path in the effect clause.
tialization. A mask can represent not only an uninitialized field, but
also a field that must be reassigned before further read accesses. To
2.3 Must-masks enforce reinitialization, a mask can be introduced on the field, via
All the masks shown in Section 2.1 are simple masks. A simple the subtyping rule T ≤ T \ f .
mask S, e.g., f , ∗, or C.sub, means that the fields it describe may For example, Figure 2 illustrates a custom memory management
be uninitialized. Thus, there is a subtyping relationship T ≤ T \S, system that manages a pool of recycled objects of the class Node.
because it is safe to treat an initialized field as one that may be Actively used objects are not in the pool and store data in their d
uninitialized. fields. Objects in the pool are threaded into a freelist using their
However, when an object is created, it is known that all the next fields. When a Node object is no longer used, it is put into
fields must be uninitialized. J\mask uses must-masks, written S!, to a pool by calling the recycle method; when a new instance of
describe fields that must definitely be uninitialized. A must-masked Node is needed, the getNode method returns an object from the
type T \S! is also a subtype of T \S, but T is not a subtype of T \S!. pool, if there is any. Masked types can help ensure that the field
One use of must-masks is for initialization of “final” fields, d is reinitialized whenever a Node object is retrieved from the
which is only allowed when the field is must-masked, ensuring that pool and gets a second life. Of course, like most custom memory
the field is initialized exactly once. Must-masks and the absence of management systems, the code in this example does not guarantee
masks roughly correspond to the notions of definite unassignment that no alias exists after an object is recycled. Masked types are not
and definite assignment in the Java Language Specification [12]. intended to enforce this kind of general correctness.
However, J\mask ensures that a final field cannot be read before it
is initialized, while Java does not. J\mask also lifts the limitation in 1 class Node {
Java that final fields can only be initialized in a constructor or an 2 Data d;
3 Node\d next;
initializer. 4 }
Must-masks are also used to express the default effect of a con- 5
structor of class C, which is *! -> C.sub!. Objects start with all 6 class Pool {
fields definitely uninitialized, which is represented with the initial 7 Node\d head;
mask *!. Constructors usually do not initialize fields declared in 8 ...
subclasses, so the default postcondition mask is C.sub!. 9 Node\d\next getNode() {
Must-masks impose restrictions on how an object can be 10 if (head != sentinel) {
aliased: if there is a reference with a must-masked type, it must 11 Node\d\next result = head;
be the only reference through which the object may be accessed; 12 head = head.next;
13 return result;
otherwise, the must-masked field might be initialized through an- 14 } else
other reference to the object, invalidating the must-mask. This does 15 return new Node();
not preclude aliasing, but implies rather that other references have 16 }
to be through fields that are themselves masked. 17 void recycle(Node\next n) {
J\mask uses typestate to keep track of initialization state. A 18 n.next = head;
problem with most previous typestate mechanisms is that they re- 19 head = n;
quire reasoning about potential aliasing, to prevent aliases to the 20 }
same object that disagree about the current state. Aliasing makes 21 }
it notoriously difficult to check whether clients and implementa-
tions are compliant with protocols specified with typestate [1], and Figure 2. Object recycling
much previous work on typestates requires complicated aliasing an-
notations or linear types. J\mask is designed to work with no extra The type Node is a subtype of Node\d, and therefore the sec-
aliasing control mechanism, which provides the added benefit of ond assignment (line 19) in method recycle type-checks, causing
soundness in a multi-threading setting, since operations on an ob- Node objects in the pool to “forget” about the data stored in field d.
ject through aliases from other threads do not invalidate typestates Masked types provide an additional benefit here. Objects in
in the current thread. active use have type Node\next, preventing traversal of the freelist
The key to avoiding reasoning about aliasing is that if an assign- from outside the Pool class.
ment creates an unmasked alias, then must-masks on both sides are
conservatively converted to corresponding simple (“may”) masks. 2.5 Initializing cyclic data structures
For example, after the following code, the type of both x and y is Many data structures that arise in practice contain circular refer-
the simply masked type C\f: ences: for example, doubly linked lists and trees whose nodes have
parent pointers. Safe initialization of these cyclic data structures
C\f! x = ...;
C\f! y = x; poses a challenge. In object-oriented languages, storing a reference
to a partially initialized object is normally required, with no guar-
Similarly the following code also removes the must annotation antee that the object is fully initialized before use.
from the type binding of variable x, because z.g becomes an alias J\mask explicitly tracks fields that point to partially ini-
and the field g is not masked in the type D of variable z: tialized objects with conditionally masked types, written
T \ f [x1 .g1 , . . . , xn .gn ]. The conditional mask f [x1 .g1 , . . . , xn .gn ] The dependencies between masks after line 20 in Figure 3 are
describes a field f referencing a partially initialized object, which summarized in Figure 4, where the mask at the tail of an arrow
will become fully initialized when all fields xi .gi are initialized. In is removed when the mask at its head is removed. The masks on
other words, the removal of the mask on f is conditioned on the this.left and this.right after line 20 transitively depend on
removal of all masks on xi .gi . the mask on this.parent.
Conditional masks are normally introduced by an assignment to
a must-masked field f , when the right-hand side of the assignment this.parent
has more masks than the declared field type. Consider, for example,
a field assignment x.f = y, where x has type T \ f !, y has type
T 0 \g, and the field f of class T has type T 0 . Note that T 0 \g is not l.Node.sub l.parent r.parent r.Node.sub
a subtype of T 0 . J\mask makes this assignment safe by changing
the type of x to T \ f [y.g] after the assignment, showing that the this.left this.right
field x.f is still masked, but its mask should be removed upon the
removal of the mask on y.g. Figure 4. Mask dependencies
1 class Node {
2 Node parent;
The postcondition in the effect of the Binary construc-
3 Node() effect *! -> *! { } tor summarizes the dependencies in the figure: parameters
4 } l and r both have mask ∗[this.parent], which means that
5 all their fields are conditionally masked, and this has type
6 final class Leaf extends Node { Binary\parent!\left[this.parent]\right[this.parent],
7 Leaf() effect *! -> parent! { } which is compatible with the parameter type of the Binary
8 } constructor. Therefore, the construction can proceed to build
9 higher trees. Finally, the tree is fully initialized when the parent
10 final class Binary extends Node { field of the root is initialized, because removing its mask enables
11 Node left, right;
12 Binary(
removing all the masks in Figure 4.
13 Node\parent!\Node.sub[l.parent] -> *[this.parent] l, In general, a field f should be unreadable unless every object
14 Node\parent!\Node.sub[r.parent] -> *[this.parent] r) transitively reachable through f has been appropriately initialized.
15 effect *! -> parent!, left[this.parent], That is, its masks have been removed at least to the level according
16 right[this.parent] { to the type of the field through which the object is referenced.
17 this.left = l; Therefore, there are three ways to remove a conditional mask
18 this.right = r; on field f :
19 l.parent = this;
20 r.parent = this; • Like other kinds of masks, the conditional mask can be removed
21 } by directly initializing the field f .
22 }
• As shown in Figure 3, on line 28, conditional masks on
23
24 Leaf\parent! l = new Leaf(); root.left and root.right are removed by removing the
25 Leaf\parent! r = new Leaf(); mask root.parent they (transitively) depend on.
26 Binary\parent!\left[root.parent]\right[root.parent] • The last way to remove a conditional mask is by creating cyclic
27 root = new Binary(l, r);
dependencies. For example, the following code creates cyclic
28 root.parent = root; // Now root has type Binary.
dependencies between x.f and y.g, which cancel each other.
// x starts with type C\f!, and y starts with D\g!
Figure 3. Initialization of a tree with parent pointers x.f = y; // now x has type C\f[y.g]
y.g = x; // now y has type D\g[x.f]
Figure 3 shows how to safely initialize a binary tree with parent // x can be typed C, and y can be typed D
pointers. For convenience, we assume all local variables, including
In general, if some dependencies form a strongly connected
formal parameters, are final. (Section 5 discusses how to relax
component in which no mask depends on a mask outside the
this.)
component, they can all be removed together.
Figure 3 also demonstrates effects on parameters other than the
receiver this: the parameters l and r of the Binary constructor Subtyping generalizes to conditionally masked types: T ≤
both have the type Node\∗[this.parent] upon the exit of the T \ f [x1 .g1 , . . . , xn .gn ] ≤ T \ f . In fact, a type T with unmasked field
constructor. f can be viewed as a type that has empty conditions for the mask
In this example, initialization is bottom-up, as it would be, on f , that is, T \ f [ ], and a simply masked type T \ f can be seen
for example, in a shift-reduce parser. Child nodes are created, as having an unsatisfiable condition on f , because a simple mask
initialized, and then used to construct their parent node. However, cannot be removed by removing other masks.
child nodes cannot be fully initialized before their parent fields are Conditional masks and simple masks do not impose any restric-
set, and moreover, they cannot even be considered fully initialized tion on aliasing, because mask subtyping ensures that they cannot
before the fields of all the objects that are transitively reachable are be invalidated by any future change to the object. This property has
set. (Top-down initialization of this data structure creates similar been called heap monotonicity [8].
issues.) Conditional masks also provide a way to create temporarily
The parent field of a node will eventually point to an object unreadable aliases for must-masked objects. Because the aliases are
that is created later and that contains child pointers pointing back unreadable, the must annotations need not be removed. In Figure 3,
to the current node, creating parent–child cycles. Of course, the for example, the assignment on line 17 creates an alias this.left
parent field of the root of the tree must point to something special. for the left child object stored in variable l, but l remains of
For example, it can point to the root itself, as shown on line 28, or type Node\parent!, since the field this.left is masked with
to a sentinel node. the conditional mask left[l.parent] after line 17. Not losing the
must information means the initialization state of l is tracked more In general, C.M.sub represents the subclass mask of abstract
precisely. mask M with respect to class C, and the interpretation of M in the
For simplicity, fields currently must be declared with unmasked context of C is a set consisting of all the concrete masks added into
or simply masked types; no conditional masks or must-masks are M in C and its superclasses, together with subclass mask C.M.sub.
allowed. It should be straightforward to add support for condition- Before type checking, the J\mask compiler internally expands all
ally masked field types, but this is left for future work. abstract masks into their interpretations.
Subclass masks are important for modular type checking, be-
3. Abstract masks cause they make it possible to distinguish the current definition of
an abstract mask and overriding definitions in subclasses, which are
With the exception of ∗ and C.sub, the masks we have seen so
generally unavailable in a modular setting.
far are concrete, explicitly naming instance fields. Concrete masks
create difficulties for data abstraction, because the fields might not
be visible where the masks are needed. For example, in Figure 3,
if the two fields left and right of class Binary were private, it 1 class C {
would be impossible to declare the local variable root as shown on 2 T f;
3 mask M += f;
line 26, because its type mentions the names of the fields outside
4 void initM() effect M -> {} {
the class definition. 5 this.f = ...;
1 class Node { 6 }
2 mask Children; 7 }
3 ... 8
4 } 9 class D extends C {
5 10 T g;
6 final class Binary extends Node { 11 mask M += g;
7 private Node left, right; 12 void initM() effect M -> {} {
8 mask Children += left, right; 13 this.g = ...;
9 Binary(...) 14 super.initM();
10 effect *! -> parent!, 15 }
11 Children[this.parent] { ... } 16 }
12 ...
13 } Figure 6. Code that needs mask constraints
14 ...
15 Binary\parent!\Children[root.parent]
16 root = new Binary(l, r); Mask constraints. Subclass masks help prevent unsafe calls, but
17 root.parent = root; since they describe fields that are generally not known in the cur-
rent class, safely removing them by initialization requires some
Figure 5. The tree example with abstract masks additional mechanism. Figure 6 illustrates an initialization helper
method initM, which is intended to remove the abstract mask M
Therefore J\mask introduces abstract masks that abstract over from its receiver. It is properly overridden in the subclass D to han-
sets of concrete fields, providing a way to write types that mask dle the overridden abstract mask M. However, the initM method
fields that are not visible. Figure 5 shows an updated version of the would not type-check as written in Figure 6, because right after
code from Figure 3, where the two fields left and right are now line 5, the type of this is actually C\C.M.sub, rather than the un-
private, and an abstract mask Children is introduced to mask them masked type C.
outside the class Binary. The Children mask is first declared in J\mask uses mask constraints to solve this problem. Ev-
class Node (line 2), with an empty set of fields, and is overridden ery J\mask method can declare a mask constraint of the form
in Binary (line 8) to include the two children of a binary node. captures M1 , . . . , Mn , where M1 , . . . , Mn are abstract masks. This
J\mask currently allows abstract masks to be overridden only to constraint means that the body of the method is type-checked as-
include more fields; more complex overriding is left to future work. suming that the masks Mi are the same as their concrete definition
The ∗ mask, introduced in Section 2.1, is not much different in the class where the method is defined, with no subclass masks.
from any other abstract mask, except that it is built-in, and is For example, the signature of initM on lines 4 and 12 can be
automatically overridden in every class to include all the fields updated with a mask constraint:
declared in that class.
void initM() effect M -> {} captures M
3.1 Modular checking of abstract masks
Subclass masks. The Point/CPoint example in Section 2.1 The example then type-checks, because at the entries to initM
showed that unsafe calls to overridden methods could be in classes C and D, the type of this becomes C\f and D\f\g
caught in a modular way with the help of the subclass mask respectively, rather than C\f\C.M.sub and D\f\g\D.M.sub.
Point.sub. The mask Point.sub can be connected to the However, when type-checking callers against the public signa-
abstract mask ∗ through the equivalence of the two types ture of the method, the abstract mask should still be interpreted to
Point\∗ and Point\x\y\Point.sub. Any type with an ab- include the subclass mask.
stract mask can be similarly expanded. For example, given A method defined in class C with a mask constraint on an
the code in Figure 5, the masked type Binary\Children abstract mask M depends on the set of fields that M denotes in
is equivalent to Binary\left\right\Binary.Children.sub, C. It would be unsound to allow that method to be inherited by
where Binary.Children.sub represents all the concrete masks a subclass that overrides the abstract mask. Therefore, the type
that are added into overriding declarations of Children system requires such methods to be overridden when the masks
in subclasses of Binary, excluding Binary itself. The set they depend on are overridden. Consequently, constructors, final
{left, right, Binary.Children.sub} is the interpretation of methods, and static methods cannot have mask constraints, because
Children in the context of Binary. they cannot be overridden in subclasses.
programs Pr ::= hL, ei J\mask only supports single inheritance. The root of the class
class declarations L ::= class C extends C0 {F Mt} hierarchy is denoted by ◦. We write C ≺ C0 to mean that class C
field declarations F ::= T f is a direct subclass of C0 , and the relation ≺∗ is the reflexive and
method declarations Mt ::= T m(T x) effect M1 M2 {e} transitive closure of ≺.
simple masks S ::= f | subC Notably, there is no null value in the language, because none
masks M ::= S | S! | S[p.S p ] is needed for object initialization.
paths p ::= ` | x There are three kinds of masks: simple masks S, must-masks
unmasked types U ::= ◦ | C | C! S!, and conditional masks S[p.S p ]. The auxiliary function simple
types T ::= U | T \M elides the must annotation and conditions of a mask.
expressions e ::= (T p) | new C | e1 ; e2 | e. f
| (T1 p1 ). f = (T2 p2 ) | (T0 p0 ).m((T p))
| let T x = e1 in e2 simple(S) = S
typing environments Γ ::= 0/ | Γ, x : T | Γ, ` : T simple(S!) = S
heaps H ::= 0/ | H, ` 7→ o
objects o ::= C!\M{ f = `} simple(S[p.S p ]) = S
evaluation contexts E ::= [·] | E. f | E; e | let T x = E in e
There are two kinds of simple masks: concrete field masks f ,
Figure 7. Grammar and subclass masks subC , that is, C.sub in the J\mask language.
The calculus does not explicitly model the abstract mask *, because
it can be expanded into a collection of field masks and a subclass
3.2 Mask algebra mask. For the simplicity of the semantics, other abstract masks and
mask constraints are omitted.
J\mask supports two algebraic operations on masks that make ab- We require that in a well-formed type, no two masks mention
stract masks more useful: (M1 + M2 ) and (M1 − M2 ). the same field, and every variable appearing in a condition is in the
An abstract mask can be interpreted as a set of concrete masks domain of the typing environment. The order of masks in a type
on fields and possibly a subclass mask. The two operators on masks does not matter, so T \ f1 \ f2 = T \ f2 \ f1 .
correspond to the set union (+) and set difference (−) of the An unmasked type U is either a normal class type C or an exact
interpretations of the abstract masks. Concrete masks can appear class type C!. An object of C! must be an instance of class C, and
in algebraic masks, where they are interpreted as singleton sets. not of any proper subclass of C. (This overloads the “!” symbol,
Algebraic masks enable the programmer to express initializa- which is also used for must-masks.) The source of exactly typed
tion state abstractly, without knowing all the fields masked by an values is object creation, because the expression new C has type
abstract mask. For example, suppose there is a local variable x, C!. Exact types are useful because they make removal of subclass
starting with the type T \M where M is an abstract mask, and field masks possible, as discussed in Section 2.1.
x. f is initialized: An object is created with expression new C, which adds a fresh
T\M x = ...; memory location to the heap, with all fields uninitialized. Uninitial-
x.f = ...; // The type of x is now T\(M - f) ized fields are not represented in the heap, so there is no need for
Here, one needs not know which concrete masks are included in M, null. Initialization is done by calling appropriate methods.
nor even whether M includes f . To simplify presentation of the semantics and the proof of
Mask algebra also helps programmers compose masks to keep soundness, we allow only paths p (local variables x at compile time,
the typestates in J\mask compact. For example, if a class has n or heap locations ` at run time) to appear in field assignments and
fields, each of which might independently be initialized or unini- method calls. This does not restrict expressiveness, because of let
tialized, it would require 2n different typestates to represent all pos- expressions.
sible initialization states, were there no mask algebra. With mask Every read through a path p is represented as an expression
algebra, one can simply use the “sum” of the masks corresponding (T p), where the annotation T is a statically known type. The
to all the uninitialized fields. annotation is primarily to make the proof of soundness easier; in
J\mask currently only supports these two algebraic operations the actual J\mask implementation, T is inferred by the compiler.
on masks, but they seem to suffice. Richer operators on masks are Typing environments Γ contain type bindings for both variables
left to future work. x and heap locations `. Bindings for locations are extracted from
the heap and are used to type-check expressions during evaluation.
The J\mask calculus models the heap as a function from mem-
4. The J\mask calculus ory locations l to objects o. The formalization attaches a type to ev-
We now formalize masked types as part of a simple object calculus. ery object on the heap, in addition to value bindings for the fields.
Unfortunately, previous object calculi are not suitable for modeling The object type is always based on some exact class type, which
masked types. is known at run time. The type might also have masks, and since
the base class is always exact, no subclass mask may appear on the
4.1 Grammar heap. Masks in the operational semantics are included only for the
soundness proof and can be erased in the implementation.
Figure 7 shows the grammar of the core J\mask calculus. We use
the notation a for both the list a1 , . . . , an and the set {a1 , . . . , an }, for
n ≥ 0. We abbreviate terms with list subterms in the obvious way,
e.g., T x stands for T1 x1 , . . . , Tn xn , T \M stands for T \M1 \. . .\Mn , 4.2 Class member lookup
and p.S stands for p.S1 , . . . , p.Sn . Figure 8 shows auxiliary functions for looking up class mem-
A program Pr is a pair hL, ei of a set of class declarations L and bers. For a class C, ownFields(C) and ownMethods(C) are the
an expression e (the main method). Each class C is declared with set of fields and methods declared in C itself, and fields(C) and
a superclass C0 , a set of field declarations F and a set of method methods(C) also collect those declared in all the superclasses of
declarations Mt. To simplify presentation, all the class declarations C. fnames(F) is the set of all the field names in field declarations
are assumed to be global information. F. For simplicity, we assume no two fields have the same name.
Γ ` T ≤T0
Γ ` T1 ≤ T2 Γ ` T2 ≤ T3 ` C ≺C0
Γ ` T ≤ T (S- REFL) (S- TRANS) (S- SUP) Γ ` C! ≤C (S- EXACT)
Γ ` T1 ≤ T3 Γ ` C ≤C0

Γ ` T1 ≤ T2
(S- MASK) Γ ` T \S[] ≈ T (S- EMPTY- COND) Γ ` T \S[p.S p ] ≤ T \S[p.S p , p0 .S0 ] (S- COND - SUB)
Γ ` T1 \M ≤ T2 \M

S = simple(M) subC = simple(M) p0 :C!\M ∈ Γ


(S- SIMPLE) (S- EXACT- MASK) (S- EXACT- COND)
Γ ` T \M ≤ T \S Γ ` C!\M ≈ C! Γ ` T \S[p.S p , p0 .subC ] ≈ T \S[p.S p ]

` C ≺C0 fnames(ownFields(C)) = f subC0 = simple(M) ` C ≺C0 fnames(ownFields(C)) = f


(S- SUBMASK) (S- SUBMASK - COND)
Γ ` T \M ≈ T \expand(M, { f , subC }) Γ ` T \M[p.subC0 , p0 .S] ≈ T \M[p. f , p.subC , p0 .S]

Γ ` p:T
p:T ∈ Γ Γ ` ` : T1 Γ ` T1 ≤ T2 Γ ` p : T \ f [p. f , p0 .S]
(TP- PATH) (TP- SUB) (TP- COND - CYCLE)
Γ ` p:T Γ ` ` : T2 Γ ` p : T \ f [p0 .S]

Γ ` p : T \S[p0 . f , p00 .S0 ] Γ ` p0 : T 0 f 6∈ masked(T 0 ) Γ ` p : T \S[p0 .S0 , p00 .S00 ] Γ ` p0 : T 0 \S0 [p000 .S000 ]
(TP- COND - ELIM) (TP- COND - TRANS)
Γ` p : T \S[p00 .S00 ] Γ ` p : T \S[p00 .S00 , p000 .S000 ]

Γ `R e : T, Γ0
Γ ` x : T x : Tx ∈ Γ Γ ` `:T Γ ` T ≤T0
Γ0 = Γ{{x : noMust(Tx )}} ` : T` ∈ Γ Γ ` e1 : T1 , Γ1 Γ ` e : T, Γ0
Γ0 ` noMust(T ) ≤ T 0 Γ0 = Γ{{` : noMust(T` )}} Γ1 `R e2 : T2 , Γ2 e 6= (T x) ∧ e 6= e1 ; e2
(TR- VAR) (TR- LOC) (TR- SEQ) (TR- OTHER)
Γ `R (T x) : T 0 , Γ0 Γ `R (T `) : T 0 , Γ0 Γ `R e1 ; e2 : T2 , Γ2 Γ `R e : T, Γ0

Γ ` e : T, Γ0
Γ ` e : T1 , Γ0 Γ ` e1 : T1 , Γ1
Γ ` T1 ≤ T2 Γ ` p:T Γ1 ` e2 : T2 , Γ2 f = fnames(fields(C))
(T- SUB) (T- PATH) (T- SEQ) (T- NEW)
Γ ` e : T2 , Γ0 Γ ` (T p) : T, Γ Γ ` e1 ; e2 : T2 , Γ2 Γ ` new C :C!\ f !, Γ

Γ `R e1 : T, Γ1 x 6∈ dom(Γ1 ) Γ ` (T1 p1 ) : T1 , Γ T1 6= T10 \ f !


Γ1 , x : T ` e2 : T2 , Γ2 Γ ` e : T, Γ0 Γ `R (T2 p2 ) : ftype(grant(T1 , f ), f ), Γ0
Γ2 = Γ02 , x : T 0 Γ002 = remove(Γ02 , x) T f = ftype(T, f ) p1 : T ∈ Γ0 Γ00 = Γ0 {{p1 : grant(T, f )}}
(T- LET) (T- GET) (T- SET)
Γ ` let T x = e1 in e2 : T2 , Γ002 Γ ` e. f : T f , Γ0 Γ ` (T1 p1 ). f = (T2 p2 ) : ◦\sub◦ , Γ00

Γ ` (T0 p0 ) : T0 , Γ T0 = U\M p0 :U0 \M 0 ∈ Γ


mbody(T0 , m) = Tn+1 0 m(T 0 x) effect M1 M2 {e}
Γ ` (T1 \ f ! p1 ) : T1 \ f !, Γ Γ ` T0 ≤U\M1 {p0 /this}{p/x}
Γ ` (T2 p2 ) : T2 , Γ T2 = U2 \M ∀i ∈ 1..n + 1. Ti00 = Ti0 {p0 /this}{p/x}
ftype(T1 , f ) = U f \S f Γ ` U2 ≤U f ∀i ∈ 1..n. Γ ` (Ti pi ) : Ti00 , Γ
S = {S|S ∈ simple(M) ∧ (S! ∈ M ∨ S 6∈ S f )} ∀i ∈ 0..n. Ti = T 000 \S! ⇒ (Ti00 = T 0000 \S! ∧ ∀ j 6= i. pi 6= p j )
p1 : T \ f ! ∈ Γ Γ0 = Γ{{p1 : T \ f [p2 .S]}} Γ0 = Γ{{p0 : update(p0 , M 0 ,U0 \M2 {p0 /this}{p/x})}}
(T- SET- COND) (T- CALL)
Γ ` (T1 \ f ! p1 ). f = (T2 p2 ) : ◦\sub◦ , Γ0 Γ ` (T0 p0 ).m((T p)) : Tn+1 0 , Γ0

Figure 9. Static semantics

class C extends C0 {F Mt} Γ ` T1 ≈ T2 is sugar for the pair of judgments Γ ` T1 ≤ T2 and


ownFields(C) = F Γ ` T2 ≤ T1 .
ownMethods(C) = Mt Most subtyping rules are intuitive. S- COND - SUB states that
adding conditions makes a conditional mask more conservative.
ownFields(C0 )
[
fields(C) = S- SIMPLE states that a type with a must-mask or a conditional mask
C0 : C≺∗ C0
is a subtype of the corresponding simply masked type.
ownMethods(C0 )
[
methods(C) = The subtyping rule S- SUBMASK uses an auxiliary function
C0 : C≺∗ C0 expand, which expands a mask S into a set of masks S0 , while pre-
F =U f serving any annotation on S:
fnames(F) = f
expand(S, S0 ) = S0
Figure 8. Class member lookup
expand(S!, S0 ) = S0 !
expand(S[p.S p ], S0 ) = S0 [p.S p ]
As shown in Figure 9, there are often a number of dif-
4.3 Subtyping ferent ways of writing equivalent types. The five type equiva-
Subtyping rules are defined in Figure 9. The judgment Γ ` T1 ≤ T2 lence rules (S- EMPTY- COND, S- EXACT- MASK, S- EXACT- COND,
states that type T1 is a subtype of T2 in context Γ. The judgment S- SUBMASK, and S- SUBMASK - COND) can be read as normaliza-
masked(U) = 0/ class(C) = C
C = class(T ) C = class(T ) C ≺C0
masked(T \S!) = masked(T \S) class(C!) = C f 6∈ masked(T ) Mt = . . . m(. . .) . . .
 
masked(T \S[p.S p ]) = masked(T \S) class(T \M) = class(T ) fields(C) = F Mt ∈ ownMethods(C)∨
Fi = T f f 0
Mt 6∈ ownMethods(C) ∧ mbody(C , m) = Mt
masked(T \ f ) = { f } ∪ masked(T )
ftype(T, f ) = T f mbody(T, m) = Mt
masked(T \subC ) = masked(T )

0 if T = T 0 \ f
noMust(U) = U T

( T 0

if T = T 0 \ f [p.S]
noMust(T )\S if M = S! grant(T, f ) =
noMust(T \M) = T0 if T = T 0 \ f !
noMust(T )\M otherwise 

T otherwise

remove(0,
/ x) = 0/ update(x, M, T ) = T
remove((Γ, p : T ), x) = remove(Γ, x), p : remove(T, x) update(`, M,U) = U

remove(U, x) = U update(`, M, T )\M
 0 if Mi = simple(M 0 )!
0
remove(T \S[x.Sx , . . .], x) = remove(T, x)\S update(`, M, T \M ) = update(`, M, T )\Mi if simple(Mi ) = simple(M 0 )

update(`, M, T ) otherwise

Figure 10. Auxiliary definitions

tion rules, where the types on the left-hand side of ≈ are reduced • A let expression cannot end with a variable bound outside the
to those on the right-hand side. Note that in each of the five rules, scope of the let. For example, one cannot write let T x =
the type on the right-hand side is either syntactically simpler than e1 in (e2 ; y) where y is free in the let expression, but rather the
that on the left-hand side, or converts an occurrence of a class on equivalent expression (let T x = e1 in e2 ); y. This helps sim-
the left-hand side to its subclass. This ensures type normalization plify type-checking of right-hand sides of assignments (Γ `R e:
terminates. Normalized types have the following characteristics: T, Γ0 ), so that a separate TR- LET is not necessary.
• If the variable x is bound to a location already in the scope of the
• A type C\M has at most one subclass mask, which must be
let expression, the declared type of x cannot have any must-
subC . A type C!\M has no subclass mask. mask. This prevents x from being an alias with must-masks.
• The condition p.subC does not show up if the path p has an
The expression well-formedness rules help simplify the proof of
exact type. the substitution lemma (Lemma 4.5), without limiting the expres-
• Conditional masks have non-empty conditions. siveness of the calculus.
` e1 wf ` e2 wf
For convenience of presentation, from now on, types are as- ∀x0 ∈ FV(let T x = e1 in e2 ). e2 6= x0 ∧ e2 6= e0 ; x0
sumed to be in normal form, unless otherwise noted. ((e1 = (T` `) ∨ e1 = e00 ; (T` `)) ∧ ` ∈ locs(e2 )) ⇒ T 6= T 0 \S!
(L ET- WF)
` let T x = e1 in e2 wf
4.4 Expression typing
` e1 wf ` e2 wf ` e wf
In the J\mask language, the evaluation of an expression might up- (S EQ - WF) (G ET- WF)
` e1 ; e2 wf ` e. f wf
date some type bindings. For example, initializing a field removes
the mask on that field, if there is one. Therefore, typing judgments, e 6= let T x = e1 in e2 e 6= e1 ; e2 e 6= e0 . f
shown in Figure 9, are of the form Γ ` e : T, Γ0 , where Γ0 is the ` e wf
(OTHER - WF)
typing environment after evaluating e. We write Γ{{p : T }} for envi-
ronment Γ with the type binding of p updated to T . Figure 11. Well-formed expressions
There are two other kinds of judgments in Figure 9. The judg-
ment Γ ` p : T types a path p without updating the typing environ-
ment. The subsumption rule TP- SUB is limited to locations l, not 4.5 Program typing
any variables x, to ensure that the expression (T x) has the most pre- Figure 12 shows the rules for checking the well-formedness of field
cise type annotation T (see T- PATH and TR- VAR). The judgment and method declarations in a class C.
Γ `R e : T, Γ0 is used in T- LET and T- SET for typing the right-hand For a field declaration, the declared type may not use must-
side of assignment, and in M- OK for typing the return expression masks or conditional masks.
(see Section 4.5). It avoids creating aliases for variables with type For a method declaration, the special variable this is assumed
bindings that have must-masks. However, aliases are allowed if to have the precondition masks M1 at the entry point of the method,
they are created with conditional masks, as shown in T- SET- COND, and it must be typable with the postcondition masks M2 when the
where no TR- rule is used. method exits. Method parameters other than the receiver should
Figure 10 defines auxiliary functions used in the typing rules. remain typable with the same types at the entry. J\mask permits
Most of them are self-explanatory. The function update, used in effects on other parameters, but for simplicity, the calculus does
T- CALL, updates the type binding of the receiver according to the not support this feature. M- OK also specifies some constraints on
effect, and ensures monotonicity if the receiver is a location. the method effect: it cannot introduce must-masks, which is only
J\mask has several expression well-formedness rules, written allowed with the new expression; a mask in the precondition that is
` e wf, shown in Figure 11. The important rule is L ET- WF, which not a must-mask can only be replaced with a corresponding mask
imposes two requirements on let expressions: that is more conservative.
T = U\S e, H −→ e0 , H 0
(F- OK)
C ` T f ok e, H −→ e0 , H 0
(R- CONG)
` e wfΓ = this :C\M1 , x : T Γ `R e : Tr , Γr E[e], H −→ E[e0 ], H 0
Γr ` this :C\M2 Γr ` x : T
let T x = (T` `) in e, H −→ e{`/x}, H (R- LET)
S! ∈ M2 ⇒ S! ∈M1
M ∈ M1 ∧ M 0 ∈ M2 ∧ M 6= S!

⇒ ` C\M ≤C\M 0 H(`) = T { f = `} Ti = ftype(T, fi )
∧simple(M) = simple(M 0 ) (R- GET)
(M- OK) (T` `). fi , H −→ (Ti `i ), H
C ` Tr m(T x) effect M1 M2 {e} ok
Figure 12. Program typing H(`) = T { f = `} T` 6= T 0 \ f !
H 0 = H{{` := grant(T, f ) {. . . , f = `0 }}}
(R- SET)
(T` `). f = (T`0 `0 ), H −→ (◦\sub◦ `0 ), H 0
4.6 Decidability of type checking H(`) = T \ f ! { f = `} ftype(T, f ) = U f \S f
The type system of J\mask is decidable: S = {S|S ∈ simple(M) ∧ (S! ∈ M ∨ S 6∈ S f )}
H 0 = H{{` := T \ f [`0 .S] {. . . , f = `0 }}}
• For T- SUB and TP- SUB, we disallow the use of reflexivity of (R- SET- COND)
(T` \ f ! `). f = (U\M `0 ), H −→ (◦\sub◦ `0 ), H 0
subtyping, and require all the rules about type equivalence (≈)
to be used in the direction of normalization (see Section 4.3). mbody(T0 , m) = Tr m(Tx x) . . . {e}
(R- CALL)
• The three rules TP- COND - CYCLE, TP- COND - ELIM, and (T0 `0 ).m((T `)), H −→ e{`0 /this}{`/x}, H
TP- COND - TRANS actually characterize a graph-theoretic
reachability problem on the dependency graph (such as in Fig- ` 6∈ dom(H) fnames(fields(C)) = f
ure 4), which can be solved with depth-first search. H 0 = H, ` 7→ C!\ f !{}
(R- ALLOC)
new C, H −→ (C!\ f ! `), H 0
All other rules are syntax-directed. Therefore, type checking is
decidable for J\mask. (T `); e, H −→ e, H (R- SEQ)

4.7 Operational semantics Figure 13. Small-step operational semantics


Figure 13 shows the judgments for the small-step operational se-
mantics of J\mask, where e, H −→ e0 , H 0 means that expression e
` :C!\M ∈bHc f = fnames(fields(C)) bHc ` `: T
and heap H step to expression e0 and heap H 0 . f 6∈ masked(T ) ⇒
Most of the rules in Figure 13 are standard, and the notable ones ∀f ∈ f.
H(`, f ) = `0 ∧ bHc ` `0 : ftype(T, f )
are those for field assignments (R- SET and R- SET- COND), which (H- LOC)
H ``
are similar to the corresponding expression typing rules (T- SET and
T- SET- COND). ∀` ∈ dom(H). H ` `
In the operational semantics and in the soundness proof, typing (H EAP - WF)
`H
environments are extracted from the heap, represented as bHc:
Figure 14. Well-formed heaps
b0c
/ = 0/
bH, ` 7→ T { f = `}c = bHc, ` : T
L EMMA 4.3. (Progress) If ` H, and bHc ` e : T then either e =
The notation H{{` := o}} means that the value binding of ` in the (T` `) or there is an expression e0 and a heap H 0 such that e, H −→
heap H is updated to another object o. e0 , H 0 .
Figure 14 shows the heap typing rules. A heap H is well-formed,
written ` H, if every field that is not masked in its container’s Progress is proved by structural induction on e. To prove subject
type is bound to a location, and that location can be given a type reduction, we need some preliminary lemmas.
compatible with the declared type of the field. Lemma 4.4 characterizes extensions of typing environments. A
In H- LOC, H(`, f ) refers to the value binding of the field f of typing environment Γ0 is an extension of Γ if:
the object stored in H(`).
• For every type binding x : T ∈ Γ, there is x : T ∈ Γ0 ;
4.8 Type safety • For every type binding ` : T ∈ Γ, there is ` : T 0 ∈ Γ0 and Γ0 `
The soundness theorem of the J\mask calculus states that if an T0 ≤T.
expression e is well-typed, and it can reduce to a value (T` `),
then (T` `) has the same type as e. A corollary of this theorem L EMMA 4.4. If Γ2 is an extension of Γ1 , and Γ1 ` e : T, Γ01 , then
is that object initialization is sound in the sense used elsewhere Γ2 ` e : T, Γ02 , and Γ02 is an extension of Γ01 .
in the paper: if a program tried to read an uninitialized field, the
evaluation would get stuck according to R- GET. P ROOF: By induction on the derivation of Γ1 ` e : T, Γ01 . 

T HEOREM 4.1. (Soundness) If ` e wf, and ` e : T , and e, 0/ →∗ Lemma 4.5 shows that substituting a location for a variable
(T` `), H, then bHc ` (T` `) : T . preserves typing. It is used in the proof of Lemma 4.2 for method
calls and let expressions. Before stating the substitution lemma,
The proof uses the standard technique of proving subject reduc- we first define substitution for typing environments:
tion and progress [35]. An environment Γ0 is the result of substituting a location `
of type T for a variable x in Γ, written Γ0 = Γ{{`/x; ` : T }}, if
L EMMA 4.2. (Subject reduction) If ` e wf, and ` H, and bHc ` Γ = Γ00 , ` : T` , x : Tx , and Γ0 = Γ00 {`/x}, ` : T , and Γ0 ` ` : T` {`/x},
e : T, Γ, and e, H −→ e0 , H 0 , then ` e0 wf, and ` H 0 , and bH 0 c ` e0 : and Γ0 ` ` : Tx {`/x}.
T, Γ0 , and Γ0 is an extension of Γ.
L EMMA 4.5. If Γ = Γ0 , `:T` , x:Tx , and Γ ` e:T, Γr , and T` 6= T 0 \S!, 5.3 Type checking
and Tx 6= T 0 \S! when ` ∈ locs(e), then Γ{{`/x; ` : T`0 }} ` e{`/x} : Flow sensitivity in the J\mask type system shows up only on masks,
T {`/x}, Γr {{`/x; ` : T`00 }} for some T`00 . and not on any of the classes appearing in masked types. Therefore,
each method is type-checked in two phases. The first phase is just
P ROOF: By induction on the derivation of Γ ` e : T, Γr .  normal Java type checking of the erased method code; the second
phase, built upon the dataflow analysis framework provided in
With these lemmas, we prove subject reduction by an induction Polyglot, is flow-sensitive, and uses the result of the first phase as
on the derivation of bHc ` e : T, Γ. Then soundness (Theorem 4.1) its starting point.
follows directly. The proofs appear in the companion technical Once type checking is complete, masks are erased to generate
report [27]. Java code. This works because resolution of method overloading
does not depend on parameter masks.
5. Implementation
5.4 Inner classes
We have implemented a prototype compiler of J\mask as an exten-
sion in the Polyglot framework [26]. The extension code has about A (nonstatic) inner class is a class that is nested in the body of an-
3,700 lines of code, excluding blank lines and comments. other class and contains an implicit reference to an instance (the
J\mask is implemented as a translation to Java. The translation outer instance) of the enclosing class. Every constructor of an in-
is mostly by erasure, that is, by erasing all the masks, effects, and ner class has an implicit formal parameter for the outer instance.
mask constraints from the code. J\mask assumes that the type of the outer instance has no masks,
The compiler also applies several transformations to the J\mask that is, the outer instance has been fully initialized before an in-
source code, before erasing masks. Default effects are inserted stance of the inner class is created. If an inner class with a partially
for constructors and methods that do not have them already. To initialized outer instance is really needed, a transformation as de-
simplify type checking, initialization code, including initializers, scribed in [15] can be applied to make the outer instance explicit.
constructors, and new expressions, is also transformed. J\mask currently does not directly support local classes and anony-
J\mask requires that in a conditionally masked type T \ f [x.g], mous classes, which are inner classes nested in method bodies, al-
every xi , including this, is a final local variable. However, the though these could be converted to normal inner classes.
compiler uses a simple analysis to automatically insert the final
modifier for local variables that are assigned only once, and for
formal parameters that are never reassigned. 6. Experience
The language was evaluated by porting several classes in the Java
5.1 Inserting default effects Collection Framework (Java SDK version 1.4.2) to J\mask. The
For a constructor of class C, the default effect is *! -> C.sub!, ported classes are ArrayList, HashMap, LinkedList, TreeMap,
which describes the behavior of most constructors. The construc- and Vector, together with all the classes and interfaces that they
tor starts with all the fields uninitialized, and it initializes all depend on. There are in total 29 source files, comprising 18,000
the fields inherited from superclasses of C—by calling the super lines of J\mask code (exclusive of empty lines and comments).
constructor—and the fields declared by C, leaving the fields in sub- Porting these classes to J\mask was not difficult. It was com-
classes of C uninitialized. pleted by one of the authors within a couple of days, including time
The default effect for a virtual method is {} -> {} because to debug the compiler. Only 11 constructors and methods required
virtual methods normally work on fully initialized objects. annotation with effects or mask constraints, thanks to the default
In our experience with using J\mask (see Section 6), these effects provided by the compiler (Section 5.1). Besides effects and
default effects work well. Programmers only have to annotate code mask constraints, only 11 other masked types were needed, a very
that uses interesting initialization patterns. small number compared to the size of the code.
The port of this code eliminated all nulls used as placeholders
for initialization. However, some nulls were not removed:
5.2 Transforming initialization code
Java field declarations can include initialization expressions that • Java allows storing the null value into collections and maps.
are implicitly called from constructors in the same order that they • Some method parameters and local variables can be intention-
appear in the class body. The J\mask compiler collects all these ally set to null, indicating that they are not available.
initializers and inserts them directly in constructors, right after
super constructor calls. This initializer code is type-checked in the Among the classes we ported, the following three exhibited
same way as any other constructor code. nontrivial initialization patterns:
A constructor in J\mask is just an initialization method that is
called after an object is allocated on the heap. The J\mask compiler 6.1 LinkedList
converts every constructor in the source code to a final method with
the same name as the class. The transformed constructor can then The LinkedList class implements a doubly-linked cyclic list.
be type-checked just as any other method. The compiler also inserts When an instance of LinkedList is constructed, a sentinel node,
an empty default constructor in the generated Java code. which is an instance of the nested class Entry, needs to be created
Every new expression new C(...) is split into a call to the with its previous and next fields both pointing to itself.
empty default constructor to allocate the memory on the heap, The Java code first constructs an instance of Entry with its
and then a call to the initialization method generated from the previous and next fields set to null, and then initializes the two
corresponding constructor, as shown in the following piece of code: fields with the header node itself. The following code is extracted
from the constructor of LinkedList, where header is the field
final C!\(* - C.sub)! temp = new C(); pointing to the sentinel node:
temp.C(...);
header = new Entry(null, null, null);
Then the fresh local variable temp replaces the original expression. header.previous = header.next = header;
With masked types, the two fields cannot be read before they Fähndrich and Leino [7] make use of raw types to represent ob-
are initialized. In the constructor of the ported LinkedList class, jects that are in the middle of being constructed, that is, objects with
the field header is initialized as follows: some non-null fields containing nulls. Methods can be declared to
header = createHeader(); expect raw objects, and therefore can be called from within the con-
structors. Delayed types [9], extended from [7], provide a solution
The method createHeader is shown below: to the problem of safely initializing cyclic data structures, by intro-
private static Entry createHeader() { ducing labels on object types, which represent the time by which an
Entry\(* - Entry.sub)! h = new Entry(); object is fully initialized. Delay times are associated with scopes,
h.element = dummyElement; and form a stack at run time. Objects created with a delay time
h.next = h; remain raw until execution exits the corresponding scope. Initial-
h.previous = h; ization of cyclic structures is supported by giving objects the same
return h; delay, and they become initialized together at once.
} Compared to raw types, masked types provide a finer-grained
The static field dummyElement points to an object of representation of objects under construction. Conditional masks
java.lang.Object because the header node does not store and delayed types are both means to track dependencies between
any real data element. Therefore, there is no need to use null. objects under construction. However, delay times are an indirect
way to represent dependencies, whereas conditional masks capture
6.2 HashMap dependencies directly and explicitly. Moreover, the fact that de-
The HashMap class has an empty method init, which, according lay times must form a stack restricts the expressiveness of delayed
to comments in the source code, is an “initialization hook for types in initializing cyclic structures. For example, trees where
subclasses”. When a subclass of HashMap is created, it should nodes have parent pointers cannot be built from the bottom up with
override the init method to initialize any new subclass fields, but delayed types, because one cannot coordinate the delay times of
Java has no way to enforce this. With effects and mask constraints, child nodes. Masked types, on the other hand, easily support this
the J\mask version of HashMap can explicitly express the contract pattern, as shown in Figure 3. Masked types also have richer sub-
in the signature of the method init: typing relationships, which can be used to enforce reinitialization.
void init() effect HashMap.sub -> {} captures * Typestates. In most object-oriented programming languages, an
object has the same type for its entire lifetime. However, objects
6.3 TreeMap often evolve over time, that is, having different states at different
TreeMap implements a map as a red-black tree where elements are times. Typestates [31] abstractly describe object states, and when
sorted according to their keys. Each node in the tree contains fields an object is updated, its typestate may also change.
for the left and right children, and a field pointing to its parent. Typestates have been used to express and verify various proto-
A method buildFromSorted is used to build the tree from the cols [31, 4, 5, 1, 10]. Typestates have been interpreted as abstract
bottom up, similarly to the example shown in Figure 3. Masked states in finite state machines and as predicates over objects.
types support sound initialization of TreeMap nodes without using Masked types are not intended for checking general protocols,
null. but rather just focus on safe object initialization. However, masks
cannot be easily encoded in terms of previous typestate mecha-
6.4 Summary nisms. Algebraic masks, for instance, provide compact representa-
Our experience is that J\mask is expressive, since it was easy tions of partial initialization states without requiring abstract states
to port classes with the various initialization patterns found in potentially exponential in the number of fields. Conditional masks
the Java Collection Framework. The explicit annotations in the represent dependencies generated at use sites, rather than being
ported code are infrequent and seem easy to understand, suggesting fixed at declaration sites of predicates. Mask subtyping enriches
masked types are a natural way for programmers to enforce proper the state space, and previous work on typestates does not appear to
initialization of objects. have anything like it.
J\mask uses subclass masks and mask constraints to ensure
modular type checking. These techniques are related to rest types-
7. Related work tates and sliding methods in Fugue [5]. However, Fugue requires
Non-null types. The importance of distinguishing non-null ref- that sliding methods are overridden in every subclass, whereas
erences from possibly-null references at the type level has long mask constraints in J\mask force methods to be overridden only
been recognized. Many languages, including CLU [21], Theta [22], when their watched abstract masks are overridden.
Moby [11], Eiffel [16], ML [24], and Haskell [17], support some Aliasing has always been a hard problem for any typestate
form of non-null and possibly-null types in their type system. In mechanism: first, it is not easy to maintain correct typestate in-
the context of Java, several proposals [2, 19, 6] have been made to formation in the presence of aliasing; second, although there are
support non-null types. typing mechanisms like linear types that help keep track of aliases,
With non-null types, sound object initialization is usually ac- they are inconvenient for ordinary programmers. Previous work on
complished by severely restricting expressiveness. Most existing typestates has proposed various treatments to the aliasing problem:
languages with non-null types restrict how objects can be initial- Nil [31] completely rules out aliasing; Vault [4] and Fugue disal-
ized; for example, some require all (non-null) fields to be initial- low further state changes once an object becomes aliased unless
ized at once [11, 22]. This means fields and methods of an object the changes are temporary; Bierhoff and Aldrich [1] refine the two
under construction cannot be used. Further, cyclic data structures aliasing annotations “not aliased” and “maybe aliased” in Fugue to
are impossible to initialize without using a placeholder value such a richer set of permissions; Fähndrich and Leino [8] also identify a
as null. kind of typestates that are heap-monotonic and work without alias-
Masked types are different from non-null types: when a field is ing information; Fink et al. [10] conduct whole-program verifica-
masked, it is potentially uninitialized and unreadable, and therefore tion and rely on a global alias analysis. The treatment of the alias-
reading that field is statically disallowed; with non-null types, a ing problem in J\mask is inspired by [8]: simple masks and con-
field is always accessible regardless of how it is declared. ditional masks are heap-monotonic, and must-masks, though not
heap-monotonic, are associated with newly created objects whose sive enough to support many useful initialization idioms, includ-
aliasing information is easy to track. We believe J\mask achieves a ing objects with cyclic references. Methods and constructors in the
good trade-off between expressiveness and simplicity for the alias- J\mask languages explicitly express their initialization contracts
ing problem in the context of object initialization. through effects, which enable modular type checking, rather than
Masked types are reminiscent of type-based access control requiring an expensive whole-program analysis. Because default
mechanisms that statically restrict access to individual fields or annotations are very effective, and J\mask requires little reason-
methods, e.g., [18, 28]. However, masked types are very different; ing about aliasing, J\mask has a low annotation burden. This could
they are designed for reasoning about initialization, and access is make the language more accessible to average programmers. Fi-
“granted” by the act of assignment to the resource, which makes nally, by placing object initialization on a sound footing, we believe
little sense as an access control feature. masked types can also enable other language mechanisms.
Static analysis. J\mask, similar to other typestate mechanisms,
has a flow-sensitive type system, which can be viewed as a dataflow Acknowledgments
analysis. An alternative to masked types is an interprocedural def- We would like to thank Sigmund Cherem, Steve Chong, Michael
use analysis, but this would lose many of the advantages of masked Clarkson, Jed Liu, and Ruijie Wang for helpful feedback on early
types. Java already has an intraprocedural analysis [32] to ensure drafts of this paper, and Doug Lea, Wojciech Moczydlowski, and
that every local variable is definitely assigned before it is used. Nate Nystrom for discussions. Thanks also to Jonathan Aldrich and
However, Java cannot safely prevent reading from uninitialized the POPL reviewers for useful comments and suggestions.
fields. There has been work on interprocedural def-use analysis This work was supported by National Science Foundation
in the context of object-oriented languages [30, 29], with varying grants 0430161, 0627649, and CCF-0424422 (TRUST), and by
cost and precision. This prior work detects initialization bugs on the Air Force Research Laboratory, under contract #FA8750-08-2-
fields, but requires non-modular whole-program def-use analyses 0079. The views and conclusions contained herein are those of the
and is subject to the typically limited accuracy of whole-program authors and should not be interpreted as necessarily representing
alias/points-to analyses. By contrast, type checking in J\mask is the official policies or endorsements, either express or implied, of
modular and therefore scalable. Masked types bring another benefit these organizations or the U.S. Government. The U.S. Government
because they specify the initialization contracts of methods, help- is authorized to reproduce and distribute reprints for Governmental
ing programmers reason about the code. Explicitly capturing this purposes notwithstanding any copyright notation thereon.
aspect of programmer intent seems valuable.
FindBugs [13] contains an analysis [14] that is designed specif-
ically to detect null-pointer bugs. The analysis is neither sound nor References
complete, but focuses on improving accuracy. The basic analysis is [1] Kevin Bierhoff and Jonathan Aldrich. Modular typestate checking of
interprocedural, but extensions are proposed in which non-null an- aliased objects. In Proc. 22nd ACM Conference on Object-Oriented
notations are inserted into method signatures to represent contracts. Programming Systems, Languages and Applications (OOPSLA),
Shape analyses are aimed at extracting heap invariants that de- pages 301–320, October 2007.
[2] Patrice Chalin and Perry James. Non-null references by default in
scribe the “shape” of recursive data structures [34]. Conditional
Java: Alleviating the nullity annotation burden. In Proceedings of the
masks capture some part of the shape information of the data struc- 21st European Conference on Object-Oriented Programming, 2007.
ture under construction. However, conditional masks are not con- [3] Sigmund Cherem and Radu Rugina. Maintaining doubly-linked list
cerned with initialized fields, and also are more about dependencies invariants in shape analysis with local reasoning. In Verification,
than the shape of references, and therefore have transitivity and Model Checking, and Abstract Interpretation, 8th International
cycle cancellation. Shape analyses are normally built upon alias Conference (VMCAI 2007), Nice, France, January 2007.
analyses, and contain explicit representation of heap locations, nei- [4] Robert DeLine and Manuel Fähndrich. Enforcing high-level protocols
ther of which is present in the J\mask language. J\mask only tracks in low-level software. In Proc. SIGPLAN 2001 Conference on
mask changes on local variables, which gives it a flavor of local Programming Language Design and Implementation, pages 59–69,
reasoning somewhat similar to the analysis in [3]. June 2001.
Because they summarize a set of concrete fields, abstract masks [5] Robert DeLine and Manuel Fähndrich. Typestates for objects.
have some similarity to data groups [20], a mechanism used for In Proceedings of 18th European Conference on Object-Oriented
modular program verification. Data groups do not have the equiva- Programming (ECOOP’04), 2004.
[6] Torbjörn Ekman and Görel Hedin. Pluggable checking and
lent of mask algebra. Moreover, masked types are about more than
inferencing of non-null types for java. Journal of Object Technology,
just abstracting fields; must-masks and conditional masks are new 6(9):455–475, October 2007.
mechanisms that enable sound initialization of complicated data [7] Manuel Fähndrich and K. Rustan M. Leino. Declaring and checking
structures. non-null types in an object-oriented language. In Proc. 2003 ACM
Other kinds of languages. The initialization problem is not Conference on Object-Oriented Programming Systems, Languages,
unique to object-oriented languages. In a purely functional pro- and Applications (OOSPLA), pages 302–312, October 2003.
gramming style, values are constructed all at once, avoiding the cre- [8] Manuel Fähndrich and K. Rustan M. Leino. Heap monotonic
typestate. In Proceedings of the first International Workshop on
ation of partially initialized values. However, functional languages Alias Confinement and Ownership (IWACO), July 2003.
typically do not easily support the construction of cyclic data struc- [9] Manuel Fähndrich and Songtao Xia. Establishing object invariants
tures well, though it can be achieved in some cases with value re- with delayed types. In Proc. 22nd ACM Conference on Object-
cursion [33]. The typed assembly language in [25] supports initial- Oriented Programming Systems, Languages and Applications
ization flags that are similar to the simple masks in J\mask. (OOPSLA), October 2007.
[10] Stephen Fink, Eran Yahav, Nurit Dor, G. Ramalingam, and Emmanuel
8. Conclusions and future work Geay. Effective typestate verification in the presence of aliasing. In
ISSTA ’06: Proceedings of the 2006 international symposium on
This paper introduces masked types, implemented in the language Software testing and analysis, pages 133–144, 2006.
J\mask, as a solution to the problem of object initialization. Masked [11] Kathleen Fischer and John Reppy. The design of a class mechanism
types provide a strong safety guarantee for initialization: unini- for Moby. In Proc. SIGPLAN 1999 Conference on Programming
tialized fields are never read. Further, masked types are expres- Language Design and Implementation, pages 37–49, 1999.
[12] James Gosling, Bill Joy, Guy Steele, and Gilad Bracha. The Java (POPL), pages 47–57, 1988.
Language Specification. Addison Wesley, 3rd edition, 2005. ISBN [24] Robin Milner, Mads Tofte, and Robert Harper. The Definition of
0321246780. Standard ML. MIT Press, Cambridge, MA, 1990.
[13] David Hovemeyer and William Pugh. Finding bugs is easy. In [25] Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From
OOPSLA ’04: Companion to the 19th annual ACM SIGPLAN System F to typed assembly language. ACM Transactions on
conference on Object-oriented programming systems, languages, Programming Languages and Systems, 21(3):528–569, May 1999.
and applications, pages 132–136, 2004. [26] Nathaniel Nystrom, Michael R. Clarkson, and Andrew C. Myers.
[14] David Hovemeyer, Jaime Spacco, and William Pugh. Evaluating Polyglot: An extensible compiler framework for Java. In Proc. 12th
and tuning a static analysis to find null pointer bugs. In PASTE International Compiler Construction Conference (CC’03), pages
’05: Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on 138–152, April 2003. LNCS 2622.
Program analysis for software tools and engineering, pages 13–19, [27] Xin Qi and Andrew C. Myers. Masked types. Technical report,
2005. Computer and Information Science, Cornell University, October
[15] Atsushi Igarashi and Benjamin C. Pierce. On inner classes. In 2008. http://hdl.handle.net/1813/11563.
Informal Proceedings of the Seventh International Workshop on [28] Joel Richardson, Peter Schwarz, and Luis-Felipe Cabrera. CACL:
Foundations of Object-Oriented Languages (FOOL 7), Boston, MA, Efficient fine-grained protection for objects. In Proc. 1992 ACM
January 2000. Conference on Object-Oriented Programming Systems, Languages,
[16] ECMA International. Eiffel analysis, design and programming and Applications, pages 154–165, Vancouver, BC, Canada, October
language. ECMA Standard 367, June 2005. 1992.
[17] Haskell 98: A non-strict, purely functional language, February 1999. [29] Amie L. Souter and Lori L. Pollock. The construction of contextual
Available at http://www.haskell.org/onlinereport/. def-use associations for object-oriented systems. IEEE Trans. Softw.
[18] Anita K. Jones and Barbara Liskov. A language extension for Eng., 29(11):1005–1018, 2003.
expressing constraints on data access. Comm. of the ACM, 21(5):358– [30] Amie L. Souter, Lori L. Pollock, and Dixie Hisley. Inter-class def-use
367, May 1978. analysis with partial class representations. In PASTE ’99: Proceedings
[19] JSR 308: Annotations on Java Types. Available at of the 1999 ACM SIGPLAN-SIGSOFT workshop on Program analysis
http://groups.csail.mit.edu/pag/jsr308/. for software tools and engineering, pages 47–56, 1999.
[20] K. Rustan M. Leino. Data groups: specifying the modification of [31] Robert E. Strom and Shaula Yemini. Typestate: A programming lan-
extended state. In Proc. 13th ACM Conference on Object-Oriented guage concept for enhancing software reliability. IEEE Transactions
Programming Systems, Languages and Applications (OOPSLA), on Software Engineering (TSE), 12(1):157–171, January 1986.
pages 144–153, 1998. [32] Sun Microsystems. Java Language Specification, version 1.0 beta edi-
[21] B. Liskov and J. Guttag. Data abstraction. In Abstraction and tion, October 1995. Available at ftp://ftp.javasoft.com/docs/
Specification in Program Development, chapter 4, pages 56–98. MIT javaspec.ps.zip.
Press and McGraw Hill, 1986. [33] Don Syme. Initializing mutually referential abstract objects: The
[22] Barbara Liskov, Dorothy Curtis, Mark Day, Sanjay Ghemawat, Robert value recursion challenge. Electronic Notes in Theoretical Computer
Gruber, Paul Johnson, and Andrew C. Myers. Theta Reference Man- Science, 148(2):3–25, 2006.
ual. Programming Methodology Group Memo 88, MIT Laboratory [34] Reinhard Wilhelm, Shmuel Sagiv, and Thomas W. Reps. Shape anal-
for Computer Science, Cambridge, MA, February 1994. Available at ysis. In Proc. 9th International Compiler Construction Conference
http://www.pmg.lcs.mit.edu/papers/thetaref/. (CC’00), pages 1–17, 2000.
[23] J. M. Lucassen and D. K. Gifford. Polymorphic effect systems. In [35] Andrew K. Wright and Matthias Felleisen. A syntactic approach to
Proc. 15th ACM Symp. on Principles of Programming Languages type soundness. Information and Computation, 115(1):38–94, 1994.

You might also like