Masked Types For Sound Object Initialization: Xin Qi Andrew C. Myers
Masked Types For Sound Object Initialization: Xin Qi Andrew C. Myers
Masked Types For Sound Object Initialization: Xin Qi Andrew C. Myers
Abstract tion 2 describes, a masked type keeps track of the parts of an object
This paper presents a type-based solution to the long-standing prob- that have not been initialized. For example, the type T \ f describes
lem of object initialization. Constructors, the conventional mech- an object of type T whose field f may not be initialized yet, and
anism for object initialization, have semantics that are surprising the type T \∗ represents an object none of whose fields are neces-
to programmers and that lead to bugs. They also contribute to the sarily initialized. As an object is constructed, the type of the object
problem of null-pointer exceptions, which make software less reli- changes to reflect the fields that are initialized. Thus, the type sys-
able. Masked types are a new type-state mechanism that explicitly tem for masked types is flow-sensitive; it has typestate [31]. The
tracks the initialization state of objects and prevents reading from type of an object conservatively tracks its initialization state, so a
uninitialized fields. In the resulting language, constructors are or- partially initialized object cannot be used where a fully initialized
dinary methods that operate on uninitialized objects, and no spe- object is expected.
cial default value (null) is needed in the language. Initialization The problem of object initialization is intertwined with the prob-
of cyclic data structures is achieved with the use of conditionally lem of null pointer exceptions, which significantly hurt software
masked types. Masked types are modular and compatible with data reliability [2]. Because object initialization is unsound, most lan-
abstraction. The type system is presented in a simplified object cal- guages aiming for type safety (e.g., Java, C#, Modula-3) first ini-
culus and is proved to soundly prevent reading from uninitialized tialize fields with null. This semantics implies that null must be a
fields. Masked types have been implemented as an extension to legal value for all object types, leading to ubiquitous, implicit null
Java, in which compilation simply erases extra type information. checks that can generate null pointer exceptions. Recently there has
Experience using the extended language suggests that masked types been interest in controlling null pointer exceptions through non-
work well on real code. null annotations and other means [7, 2, 19, 6]. Non-null annotations
by themselves do not solve the problem of object initialization; in
Categories and Subject Descriptors D.2.4 [Software/Program fact, they make it more important because non-null fields must be
Verification]: Class invariants, programming by contract; D.3.2 initialized before use. But with masked types, there is no need for
[Language Classifications]: Object-oriented languages a default initialization value. It is then straightforward to eliminate
null values entirely from the language. There are legitimate uses
General Terms Languages, Reliability of null other than as an initialization placeholder, but for these
Keywords invariants, null pointer exceptions, conditional masks, uses, an “option” or “maybe” type is a better approach, because it
cyclic data structures, data abstraction makes null checks explicit and rare.
A language with masked types can be simpler in another way.
There is no need to give constructors a special status in the lan-
1. Introduction guage, because types track initialization state. Rather than a lan-
Object initialization remains an unsatisfactory aspect of object- guage feature, constructors become a design pattern: they are ordi-
oriented programming. In the usual approach, objects of a given nary methods that change the initialization state of the receiver.
class are created and initialized only by class constructors. There- Cyclic data structures pose a challenge for object initialization.
fore, when implementing class methods, the programmer can as- However, conditionally masked types make it possible to create
sume that object fields satisfy an invariant established by the con- cyclic data structures, such as doubly-linked lists and trees with
structors. However, in the presence of inheritance, the methods of parent pointers, without resorting to placeholder null values. Con-
partly initialized objects may be invoked before the invariant has ditional masks record dependencies between initialization of differ-
been established. As a result, reasoning about object initialization ent fields, so that initializing one field can “tie the knot”, changing
can be challenging and non-modular. No fully satisfactory solution the initialization state of many fields at once.
to object initialization currently exists. Perhaps the most closely related prior work is that of Fähndrich
This paper presents a new solution to the object initialization and Xia [9], who introduce delayed types for static reasoning about
problem, based on a new type mechanism, masked types. As Sec- partially initialized objects. Masked types support cyclic data struc-
tures that delayed types do not. Masked types also support richer
initialization abstractions: for example, helper methods for par-
tial initialization and reinitialization of recycled objects. Abstract
Permission to make digital or hard copies of all or part of this work for personal or masks, described in Section 3, support initialization abstractions
classroom use is granted without fee provided that copies are not made or distributed that are compatible with data abstraction and inheritance.
for profit or commercial advantage and that copies bear this notice and the full citation Masked types have been formalized for a simplified object lan-
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
guage, described in Section 4. The key soundness theorem is for-
POPL’09, January 18–24, 2009, Savannah, Georgia, USA.
malized and has been proved for this language: well-typed pro-
Copyright
c 2009 ACM 978-1-60558-379-2/09/01. . . $5.00. grams never read uninitialized fields.
1 class Point { class whose field c cannot be read, perhaps because it has not been
2 int x, int y; initialized. We say that the field c is masked in this type.
3 Point(int x, int y) {
A type with no mask means that the object is fully initialized.
4 this.x = x;
5 this.y = y; In typical programming practice, this would be the ordinary state
6 display(); of the object, in which its invariant is already established.
7 } On entry to a constructor such as Point(), the newly created
8 void display() { object has all its fields masked. The actual class of the new object
9 System.out.println(x + " " + y); might be a subclass (for example, CPoint), so on exit, subclass
10 } fields remain to be initialized. A subclass mask, written C.sub, is
11 } used to mask all fields introduced in subclasses of C, not including
12 those of C itself. Therefore, just before line 4 in Figure 1, the
13 class CPoint extends Point {
object being constructed has type Point\x\y\Point.sub. (While
14 Color c;
15 CPoint(int x, int y, Color c) { this type looks complicated, it can be inferred automatically.)
16 super(x, y); When a field is initialized by assigning to it, the correspond-
17 this.c = c; ing mask is removed from the type of the object. For example,
18 } line 4 in Figure 1 assigns to field x, so the type of this becomes
19 void display() { Point\y\Point.sub. After the assignment to y on the next line,
20 System.out.println(x + " " + y + " " + c.name()); the type of this becomes Point\Point.sub. Thus, the initializa-
21 } tion of various fields is recorded in the changing type of this.
22 } Because variables may have different types at different program
points, J\mask has a flow-sensitive type system.
Figure 1. Code with an initialization bug Subclass masks such as Point.sub can be removed when the
exact run-time class of an object is known, because there are no
subclass fields left to initialize. The type of a new expression is
Section 5 reports on the implementation of masked types as a known exactly, as is the type of a value of any class known not to
mostly backward-compatible extension to the Java language called have a subclass (in Java, a “final” class).
J\mask. Section 6 discusses experience using J\mask in the con- J\mask has a special mask ∗ as a convenient shorthand for
text of the Java Collections Framework, where masked types are masking all fields, including those masked by the subclass mask.
shown to do a good job of capturing desirable initialization idioms. On entry to the CPoint constructor, the object can be given type
Related work is discussed in Section 7. Section 8 concludes. CPoint\∗, which is equivalent to CPoint\x\y\c\CPoint.sub.
Γ ` T1 ≤ T2
(S- MASK) Γ ` T \S[] ≈ T (S- EMPTY- COND) Γ ` T \S[p.S p ] ≤ T \S[p.S p , p0 .S0 ] (S- COND - SUB)
Γ ` T1 \M ≤ T2 \M
Γ ` p:T
p:T ∈ Γ Γ ` ` : T1 Γ ` T1 ≤ T2 Γ ` p : T \ f [p. f , p0 .S]
(TP- PATH) (TP- SUB) (TP- COND - CYCLE)
Γ ` p:T Γ ` ` : T2 Γ ` p : T \ f [p0 .S]
Γ ` p : T \S[p0 . f , p00 .S0 ] Γ ` p0 : T 0 f 6∈ masked(T 0 ) Γ ` p : T \S[p0 .S0 , p00 .S00 ] Γ ` p0 : T 0 \S0 [p000 .S000 ]
(TP- COND - ELIM) (TP- COND - TRANS)
Γ` p : T \S[p00 .S00 ] Γ ` p : T \S[p00 .S00 , p000 .S000 ]
Γ `R e : T, Γ0
Γ ` x : T x : Tx ∈ Γ Γ ` `:T Γ ` T ≤T0
Γ0 = Γ{{x : noMust(Tx )}} ` : T` ∈ Γ Γ ` e1 : T1 , Γ1 Γ ` e : T, Γ0
Γ0 ` noMust(T ) ≤ T 0 Γ0 = Γ{{` : noMust(T` )}} Γ1 `R e2 : T2 , Γ2 e 6= (T x) ∧ e 6= e1 ; e2
(TR- VAR) (TR- LOC) (TR- SEQ) (TR- OTHER)
Γ `R (T x) : T 0 , Γ0 Γ `R (T `) : T 0 , Γ0 Γ `R e1 ; e2 : T2 , Γ2 Γ `R e : T, Γ0
Γ ` e : T, Γ0
Γ ` e : T1 , Γ0 Γ ` e1 : T1 , Γ1
Γ ` T1 ≤ T2 Γ ` p:T Γ1 ` e2 : T2 , Γ2 f = fnames(fields(C))
(T- SUB) (T- PATH) (T- SEQ) (T- NEW)
Γ ` e : T2 , Γ0 Γ ` (T p) : T, Γ Γ ` e1 ; e2 : T2 , Γ2 Γ ` new C :C!\ f !, Γ
remove(0,
/ x) = 0/ update(x, M, T ) = T
remove((Γ, p : T ), x) = remove(Γ, x), p : remove(T, x) update(`, M,U) = U
remove(U, x) = U update(`, M, T )\M
0 if Mi = simple(M 0 )!
0
remove(T \S[x.Sx , . . .], x) = remove(T, x)\S update(`, M, T \M ) = update(`, M, T )\Mi if simple(Mi ) = simple(M 0 )
update(`, M, T ) otherwise
tion rules, where the types on the left-hand side of ≈ are reduced • A let expression cannot end with a variable bound outside the
to those on the right-hand side. Note that in each of the five rules, scope of the let. For example, one cannot write let T x =
the type on the right-hand side is either syntactically simpler than e1 in (e2 ; y) where y is free in the let expression, but rather the
that on the left-hand side, or converts an occurrence of a class on equivalent expression (let T x = e1 in e2 ); y. This helps sim-
the left-hand side to its subclass. This ensures type normalization plify type-checking of right-hand sides of assignments (Γ `R e:
terminates. Normalized types have the following characteristics: T, Γ0 ), so that a separate TR- LET is not necessary.
• If the variable x is bound to a location already in the scope of the
• A type C\M has at most one subclass mask, which must be
let expression, the declared type of x cannot have any must-
subC . A type C!\M has no subclass mask. mask. This prevents x from being an alias with must-masks.
• The condition p.subC does not show up if the path p has an
The expression well-formedness rules help simplify the proof of
exact type. the substitution lemma (Lemma 4.5), without limiting the expres-
• Conditional masks have non-empty conditions. siveness of the calculus.
` e1 wf ` e2 wf
For convenience of presentation, from now on, types are as- ∀x0 ∈ FV(let T x = e1 in e2 ). e2 6= x0 ∧ e2 6= e0 ; x0
sumed to be in normal form, unless otherwise noted. ((e1 = (T` `) ∨ e1 = e00 ; (T` `)) ∧ ` ∈ locs(e2 )) ⇒ T 6= T 0 \S!
(L ET- WF)
` let T x = e1 in e2 wf
4.4 Expression typing
` e1 wf ` e2 wf ` e wf
In the J\mask language, the evaluation of an expression might up- (S EQ - WF) (G ET- WF)
` e1 ; e2 wf ` e. f wf
date some type bindings. For example, initializing a field removes
the mask on that field, if there is one. Therefore, typing judgments, e 6= let T x = e1 in e2 e 6= e1 ; e2 e 6= e0 . f
shown in Figure 9, are of the form Γ ` e : T, Γ0 , where Γ0 is the ` e wf
(OTHER - WF)
typing environment after evaluating e. We write Γ{{p : T }} for envi-
ronment Γ with the type binding of p updated to T . Figure 11. Well-formed expressions
There are two other kinds of judgments in Figure 9. The judg-
ment Γ ` p : T types a path p without updating the typing environ-
ment. The subsumption rule TP- SUB is limited to locations l, not 4.5 Program typing
any variables x, to ensure that the expression (T x) has the most pre- Figure 12 shows the rules for checking the well-formedness of field
cise type annotation T (see T- PATH and TR- VAR). The judgment and method declarations in a class C.
Γ `R e : T, Γ0 is used in T- LET and T- SET for typing the right-hand For a field declaration, the declared type may not use must-
side of assignment, and in M- OK for typing the return expression masks or conditional masks.
(see Section 4.5). It avoids creating aliases for variables with type For a method declaration, the special variable this is assumed
bindings that have must-masks. However, aliases are allowed if to have the precondition masks M1 at the entry point of the method,
they are created with conditional masks, as shown in T- SET- COND, and it must be typable with the postcondition masks M2 when the
where no TR- rule is used. method exits. Method parameters other than the receiver should
Figure 10 defines auxiliary functions used in the typing rules. remain typable with the same types at the entry. J\mask permits
Most of them are self-explanatory. The function update, used in effects on other parameters, but for simplicity, the calculus does
T- CALL, updates the type binding of the receiver according to the not support this feature. M- OK also specifies some constraints on
effect, and ensures monotonicity if the receiver is a location. the method effect: it cannot introduce must-masks, which is only
J\mask has several expression well-formedness rules, written allowed with the new expression; a mask in the precondition that is
` e wf, shown in Figure 11. The important rule is L ET- WF, which not a must-mask can only be replaced with a corresponding mask
imposes two requirements on let expressions: that is more conservative.
T = U\S e, H −→ e0 , H 0
(F- OK)
C ` T f ok e, H −→ e0 , H 0
(R- CONG)
` e wfΓ = this :C\M1 , x : T Γ `R e : Tr , Γr E[e], H −→ E[e0 ], H 0
Γr ` this :C\M2 Γr ` x : T
let T x = (T` `) in e, H −→ e{`/x}, H (R- LET)
S! ∈ M2 ⇒ S! ∈M1
M ∈ M1 ∧ M 0 ∈ M2 ∧ M 6= S!
⇒ ` C\M ≤C\M 0 H(`) = T { f = `} Ti = ftype(T, fi )
∧simple(M) = simple(M 0 ) (R- GET)
(M- OK) (T` `). fi , H −→ (Ti `i ), H
C ` Tr m(T x) effect M1 M2 {e} ok
Figure 12. Program typing H(`) = T { f = `} T` 6= T 0 \ f !
H 0 = H{{` := grant(T, f ) {. . . , f = `0 }}}
(R- SET)
(T` `). f = (T`0 `0 ), H −→ (◦\sub◦ `0 ), H 0
4.6 Decidability of type checking H(`) = T \ f ! { f = `} ftype(T, f ) = U f \S f
The type system of J\mask is decidable: S = {S|S ∈ simple(M) ∧ (S! ∈ M ∨ S 6∈ S f )}
H 0 = H{{` := T \ f [`0 .S] {. . . , f = `0 }}}
• For T- SUB and TP- SUB, we disallow the use of reflexivity of (R- SET- COND)
(T` \ f ! `). f = (U\M `0 ), H −→ (◦\sub◦ `0 ), H 0
subtyping, and require all the rules about type equivalence (≈)
to be used in the direction of normalization (see Section 4.3). mbody(T0 , m) = Tr m(Tx x) . . . {e}
(R- CALL)
• The three rules TP- COND - CYCLE, TP- COND - ELIM, and (T0 `0 ).m((T `)), H −→ e{`0 /this}{`/x}, H
TP- COND - TRANS actually characterize a graph-theoretic
reachability problem on the dependency graph (such as in Fig- ` 6∈ dom(H) fnames(fields(C)) = f
ure 4), which can be solved with depth-first search. H 0 = H, ` 7→ C!\ f !{}
(R- ALLOC)
new C, H −→ (C!\ f ! `), H 0
All other rules are syntax-directed. Therefore, type checking is
decidable for J\mask. (T `); e, H −→ e, H (R- SEQ)
T HEOREM 4.1. (Soundness) If ` e wf, and ` e : T , and e, 0/ →∗ Lemma 4.5 shows that substituting a location for a variable
(T` `), H, then bHc ` (T` `) : T . preserves typing. It is used in the proof of Lemma 4.2 for method
calls and let expressions. Before stating the substitution lemma,
The proof uses the standard technique of proving subject reduc- we first define substitution for typing environments:
tion and progress [35]. An environment Γ0 is the result of substituting a location `
of type T for a variable x in Γ, written Γ0 = Γ{{`/x; ` : T }}, if
L EMMA 4.2. (Subject reduction) If ` e wf, and ` H, and bHc ` Γ = Γ00 , ` : T` , x : Tx , and Γ0 = Γ00 {`/x}, ` : T , and Γ0 ` ` : T` {`/x},
e : T, Γ, and e, H −→ e0 , H 0 , then ` e0 wf, and ` H 0 , and bH 0 c ` e0 : and Γ0 ` ` : Tx {`/x}.
T, Γ0 , and Γ0 is an extension of Γ.
L EMMA 4.5. If Γ = Γ0 , `:T` , x:Tx , and Γ ` e:T, Γr , and T` 6= T 0 \S!, 5.3 Type checking
and Tx 6= T 0 \S! when ` ∈ locs(e), then Γ{{`/x; ` : T`0 }} ` e{`/x} : Flow sensitivity in the J\mask type system shows up only on masks,
T {`/x}, Γr {{`/x; ` : T`00 }} for some T`00 . and not on any of the classes appearing in masked types. Therefore,
each method is type-checked in two phases. The first phase is just
P ROOF: By induction on the derivation of Γ ` e : T, Γr . normal Java type checking of the erased method code; the second
phase, built upon the dataflow analysis framework provided in
With these lemmas, we prove subject reduction by an induction Polyglot, is flow-sensitive, and uses the result of the first phase as
on the derivation of bHc ` e : T, Γ. Then soundness (Theorem 4.1) its starting point.
follows directly. The proofs appear in the companion technical Once type checking is complete, masks are erased to generate
report [27]. Java code. This works because resolution of method overloading
does not depend on parameter masks.
5. Implementation
5.4 Inner classes
We have implemented a prototype compiler of J\mask as an exten-
sion in the Polyglot framework [26]. The extension code has about A (nonstatic) inner class is a class that is nested in the body of an-
3,700 lines of code, excluding blank lines and comments. other class and contains an implicit reference to an instance (the
J\mask is implemented as a translation to Java. The translation outer instance) of the enclosing class. Every constructor of an in-
is mostly by erasure, that is, by erasing all the masks, effects, and ner class has an implicit formal parameter for the outer instance.
mask constraints from the code. J\mask assumes that the type of the outer instance has no masks,
The compiler also applies several transformations to the J\mask that is, the outer instance has been fully initialized before an in-
source code, before erasing masks. Default effects are inserted stance of the inner class is created. If an inner class with a partially
for constructors and methods that do not have them already. To initialized outer instance is really needed, a transformation as de-
simplify type checking, initialization code, including initializers, scribed in [15] can be applied to make the outer instance explicit.
constructors, and new expressions, is also transformed. J\mask currently does not directly support local classes and anony-
J\mask requires that in a conditionally masked type T \ f [x.g], mous classes, which are inner classes nested in method bodies, al-
every xi , including this, is a final local variable. However, the though these could be converted to normal inner classes.
compiler uses a simple analysis to automatically insert the final
modifier for local variables that are assigned only once, and for
formal parameters that are never reassigned. 6. Experience
The language was evaluated by porting several classes in the Java
5.1 Inserting default effects Collection Framework (Java SDK version 1.4.2) to J\mask. The
For a constructor of class C, the default effect is *! -> C.sub!, ported classes are ArrayList, HashMap, LinkedList, TreeMap,
which describes the behavior of most constructors. The construc- and Vector, together with all the classes and interfaces that they
tor starts with all the fields uninitialized, and it initializes all depend on. There are in total 29 source files, comprising 18,000
the fields inherited from superclasses of C—by calling the super lines of J\mask code (exclusive of empty lines and comments).
constructor—and the fields declared by C, leaving the fields in sub- Porting these classes to J\mask was not difficult. It was com-
classes of C uninitialized. pleted by one of the authors within a couple of days, including time
The default effect for a virtual method is {} -> {} because to debug the compiler. Only 11 constructors and methods required
virtual methods normally work on fully initialized objects. annotation with effects or mask constraints, thanks to the default
In our experience with using J\mask (see Section 6), these effects provided by the compiler (Section 5.1). Besides effects and
default effects work well. Programmers only have to annotate code mask constraints, only 11 other masked types were needed, a very
that uses interesting initialization patterns. small number compared to the size of the code.
The port of this code eliminated all nulls used as placeholders
for initialization. However, some nulls were not removed:
5.2 Transforming initialization code
Java field declarations can include initialization expressions that • Java allows storing the null value into collections and maps.
are implicitly called from constructors in the same order that they • Some method parameters and local variables can be intention-
appear in the class body. The J\mask compiler collects all these ally set to null, indicating that they are not available.
initializers and inserts them directly in constructors, right after
super constructor calls. This initializer code is type-checked in the Among the classes we ported, the following three exhibited
same way as any other constructor code. nontrivial initialization patterns:
A constructor in J\mask is just an initialization method that is
called after an object is allocated on the heap. The J\mask compiler 6.1 LinkedList
converts every constructor in the source code to a final method with
the same name as the class. The transformed constructor can then The LinkedList class implements a doubly-linked cyclic list.
be type-checked just as any other method. The compiler also inserts When an instance of LinkedList is constructed, a sentinel node,
an empty default constructor in the generated Java code. which is an instance of the nested class Entry, needs to be created
Every new expression new C(...) is split into a call to the with its previous and next fields both pointing to itself.
empty default constructor to allocate the memory on the heap, The Java code first constructs an instance of Entry with its
and then a call to the initialization method generated from the previous and next fields set to null, and then initializes the two
corresponding constructor, as shown in the following piece of code: fields with the header node itself. The following code is extracted
from the constructor of LinkedList, where header is the field
final C!\(* - C.sub)! temp = new C(); pointing to the sentinel node:
temp.C(...);
header = new Entry(null, null, null);
Then the fresh local variable temp replaces the original expression. header.previous = header.next = header;
With masked types, the two fields cannot be read before they Fähndrich and Leino [7] make use of raw types to represent ob-
are initialized. In the constructor of the ported LinkedList class, jects that are in the middle of being constructed, that is, objects with
the field header is initialized as follows: some non-null fields containing nulls. Methods can be declared to
header = createHeader(); expect raw objects, and therefore can be called from within the con-
structors. Delayed types [9], extended from [7], provide a solution
The method createHeader is shown below: to the problem of safely initializing cyclic data structures, by intro-
private static Entry createHeader() { ducing labels on object types, which represent the time by which an
Entry\(* - Entry.sub)! h = new Entry(); object is fully initialized. Delay times are associated with scopes,
h.element = dummyElement; and form a stack at run time. Objects created with a delay time
h.next = h; remain raw until execution exits the corresponding scope. Initial-
h.previous = h; ization of cyclic structures is supported by giving objects the same
return h; delay, and they become initialized together at once.
} Compared to raw types, masked types provide a finer-grained
The static field dummyElement points to an object of representation of objects under construction. Conditional masks
java.lang.Object because the header node does not store and delayed types are both means to track dependencies between
any real data element. Therefore, there is no need to use null. objects under construction. However, delay times are an indirect
way to represent dependencies, whereas conditional masks capture
6.2 HashMap dependencies directly and explicitly. Moreover, the fact that de-
The HashMap class has an empty method init, which, according lay times must form a stack restricts the expressiveness of delayed
to comments in the source code, is an “initialization hook for types in initializing cyclic structures. For example, trees where
subclasses”. When a subclass of HashMap is created, it should nodes have parent pointers cannot be built from the bottom up with
override the init method to initialize any new subclass fields, but delayed types, because one cannot coordinate the delay times of
Java has no way to enforce this. With effects and mask constraints, child nodes. Masked types, on the other hand, easily support this
the J\mask version of HashMap can explicitly express the contract pattern, as shown in Figure 3. Masked types also have richer sub-
in the signature of the method init: typing relationships, which can be used to enforce reinitialization.
void init() effect HashMap.sub -> {} captures * Typestates. In most object-oriented programming languages, an
object has the same type for its entire lifetime. However, objects
6.3 TreeMap often evolve over time, that is, having different states at different
TreeMap implements a map as a red-black tree where elements are times. Typestates [31] abstractly describe object states, and when
sorted according to their keys. Each node in the tree contains fields an object is updated, its typestate may also change.
for the left and right children, and a field pointing to its parent. Typestates have been used to express and verify various proto-
A method buildFromSorted is used to build the tree from the cols [31, 4, 5, 1, 10]. Typestates have been interpreted as abstract
bottom up, similarly to the example shown in Figure 3. Masked states in finite state machines and as predicates over objects.
types support sound initialization of TreeMap nodes without using Masked types are not intended for checking general protocols,
null. but rather just focus on safe object initialization. However, masks
cannot be easily encoded in terms of previous typestate mecha-
6.4 Summary nisms. Algebraic masks, for instance, provide compact representa-
Our experience is that J\mask is expressive, since it was easy tions of partial initialization states without requiring abstract states
to port classes with the various initialization patterns found in potentially exponential in the number of fields. Conditional masks
the Java Collection Framework. The explicit annotations in the represent dependencies generated at use sites, rather than being
ported code are infrequent and seem easy to understand, suggesting fixed at declaration sites of predicates. Mask subtyping enriches
masked types are a natural way for programmers to enforce proper the state space, and previous work on typestates does not appear to
initialization of objects. have anything like it.
J\mask uses subclass masks and mask constraints to ensure
modular type checking. These techniques are related to rest types-
7. Related work tates and sliding methods in Fugue [5]. However, Fugue requires
Non-null types. The importance of distinguishing non-null ref- that sliding methods are overridden in every subclass, whereas
erences from possibly-null references at the type level has long mask constraints in J\mask force methods to be overridden only
been recognized. Many languages, including CLU [21], Theta [22], when their watched abstract masks are overridden.
Moby [11], Eiffel [16], ML [24], and Haskell [17], support some Aliasing has always been a hard problem for any typestate
form of non-null and possibly-null types in their type system. In mechanism: first, it is not easy to maintain correct typestate in-
the context of Java, several proposals [2, 19, 6] have been made to formation in the presence of aliasing; second, although there are
support non-null types. typing mechanisms like linear types that help keep track of aliases,
With non-null types, sound object initialization is usually ac- they are inconvenient for ordinary programmers. Previous work on
complished by severely restricting expressiveness. Most existing typestates has proposed various treatments to the aliasing problem:
languages with non-null types restrict how objects can be initial- Nil [31] completely rules out aliasing; Vault [4] and Fugue disal-
ized; for example, some require all (non-null) fields to be initial- low further state changes once an object becomes aliased unless
ized at once [11, 22]. This means fields and methods of an object the changes are temporary; Bierhoff and Aldrich [1] refine the two
under construction cannot be used. Further, cyclic data structures aliasing annotations “not aliased” and “maybe aliased” in Fugue to
are impossible to initialize without using a placeholder value such a richer set of permissions; Fähndrich and Leino [8] also identify a
as null. kind of typestates that are heap-monotonic and work without alias-
Masked types are different from non-null types: when a field is ing information; Fink et al. [10] conduct whole-program verifica-
masked, it is potentially uninitialized and unreadable, and therefore tion and rely on a global alias analysis. The treatment of the alias-
reading that field is statically disallowed; with non-null types, a ing problem in J\mask is inspired by [8]: simple masks and con-
field is always accessible regardless of how it is declared. ditional masks are heap-monotonic, and must-masks, though not
heap-monotonic, are associated with newly created objects whose sive enough to support many useful initialization idioms, includ-
aliasing information is easy to track. We believe J\mask achieves a ing objects with cyclic references. Methods and constructors in the
good trade-off between expressiveness and simplicity for the alias- J\mask languages explicitly express their initialization contracts
ing problem in the context of object initialization. through effects, which enable modular type checking, rather than
Masked types are reminiscent of type-based access control requiring an expensive whole-program analysis. Because default
mechanisms that statically restrict access to individual fields or annotations are very effective, and J\mask requires little reason-
methods, e.g., [18, 28]. However, masked types are very different; ing about aliasing, J\mask has a low annotation burden. This could
they are designed for reasoning about initialization, and access is make the language more accessible to average programmers. Fi-
“granted” by the act of assignment to the resource, which makes nally, by placing object initialization on a sound footing, we believe
little sense as an access control feature. masked types can also enable other language mechanisms.
Static analysis. J\mask, similar to other typestate mechanisms,
has a flow-sensitive type system, which can be viewed as a dataflow Acknowledgments
analysis. An alternative to masked types is an interprocedural def- We would like to thank Sigmund Cherem, Steve Chong, Michael
use analysis, but this would lose many of the advantages of masked Clarkson, Jed Liu, and Ruijie Wang for helpful feedback on early
types. Java already has an intraprocedural analysis [32] to ensure drafts of this paper, and Doug Lea, Wojciech Moczydlowski, and
that every local variable is definitely assigned before it is used. Nate Nystrom for discussions. Thanks also to Jonathan Aldrich and
However, Java cannot safely prevent reading from uninitialized the POPL reviewers for useful comments and suggestions.
fields. There has been work on interprocedural def-use analysis This work was supported by National Science Foundation
in the context of object-oriented languages [30, 29], with varying grants 0430161, 0627649, and CCF-0424422 (TRUST), and by
cost and precision. This prior work detects initialization bugs on the Air Force Research Laboratory, under contract #FA8750-08-2-
fields, but requires non-modular whole-program def-use analyses 0079. The views and conclusions contained herein are those of the
and is subject to the typically limited accuracy of whole-program authors and should not be interpreted as necessarily representing
alias/points-to analyses. By contrast, type checking in J\mask is the official policies or endorsements, either express or implied, of
modular and therefore scalable. Masked types bring another benefit these organizations or the U.S. Government. The U.S. Government
because they specify the initialization contracts of methods, help- is authorized to reproduce and distribute reprints for Governmental
ing programmers reason about the code. Explicitly capturing this purposes notwithstanding any copyright notation thereon.
aspect of programmer intent seems valuable.
FindBugs [13] contains an analysis [14] that is designed specif-
ically to detect null-pointer bugs. The analysis is neither sound nor References
complete, but focuses on improving accuracy. The basic analysis is [1] Kevin Bierhoff and Jonathan Aldrich. Modular typestate checking of
interprocedural, but extensions are proposed in which non-null an- aliased objects. In Proc. 22nd ACM Conference on Object-Oriented
notations are inserted into method signatures to represent contracts. Programming Systems, Languages and Applications (OOPSLA),
Shape analyses are aimed at extracting heap invariants that de- pages 301–320, October 2007.
[2] Patrice Chalin and Perry James. Non-null references by default in
scribe the “shape” of recursive data structures [34]. Conditional
Java: Alleviating the nullity annotation burden. In Proceedings of the
masks capture some part of the shape information of the data struc- 21st European Conference on Object-Oriented Programming, 2007.
ture under construction. However, conditional masks are not con- [3] Sigmund Cherem and Radu Rugina. Maintaining doubly-linked list
cerned with initialized fields, and also are more about dependencies invariants in shape analysis with local reasoning. In Verification,
than the shape of references, and therefore have transitivity and Model Checking, and Abstract Interpretation, 8th International
cycle cancellation. Shape analyses are normally built upon alias Conference (VMCAI 2007), Nice, France, January 2007.
analyses, and contain explicit representation of heap locations, nei- [4] Robert DeLine and Manuel Fähndrich. Enforcing high-level protocols
ther of which is present in the J\mask language. J\mask only tracks in low-level software. In Proc. SIGPLAN 2001 Conference on
mask changes on local variables, which gives it a flavor of local Programming Language Design and Implementation, pages 59–69,
reasoning somewhat similar to the analysis in [3]. June 2001.
Because they summarize a set of concrete fields, abstract masks [5] Robert DeLine and Manuel Fähndrich. Typestates for objects.
have some similarity to data groups [20], a mechanism used for In Proceedings of 18th European Conference on Object-Oriented
modular program verification. Data groups do not have the equiva- Programming (ECOOP’04), 2004.
[6] Torbjörn Ekman and Görel Hedin. Pluggable checking and
lent of mask algebra. Moreover, masked types are about more than
inferencing of non-null types for java. Journal of Object Technology,
just abstracting fields; must-masks and conditional masks are new 6(9):455–475, October 2007.
mechanisms that enable sound initialization of complicated data [7] Manuel Fähndrich and K. Rustan M. Leino. Declaring and checking
structures. non-null types in an object-oriented language. In Proc. 2003 ACM
Other kinds of languages. The initialization problem is not Conference on Object-Oriented Programming Systems, Languages,
unique to object-oriented languages. In a purely functional pro- and Applications (OOSPLA), pages 302–312, October 2003.
gramming style, values are constructed all at once, avoiding the cre- [8] Manuel Fähndrich and K. Rustan M. Leino. Heap monotonic
typestate. In Proceedings of the first International Workshop on
ation of partially initialized values. However, functional languages Alias Confinement and Ownership (IWACO), July 2003.
typically do not easily support the construction of cyclic data struc- [9] Manuel Fähndrich and Songtao Xia. Establishing object invariants
tures well, though it can be achieved in some cases with value re- with delayed types. In Proc. 22nd ACM Conference on Object-
cursion [33]. The typed assembly language in [25] supports initial- Oriented Programming Systems, Languages and Applications
ization flags that are similar to the simple masks in J\mask. (OOPSLA), October 2007.
[10] Stephen Fink, Eran Yahav, Nurit Dor, G. Ramalingam, and Emmanuel
8. Conclusions and future work Geay. Effective typestate verification in the presence of aliasing. In
ISSTA ’06: Proceedings of the 2006 international symposium on
This paper introduces masked types, implemented in the language Software testing and analysis, pages 133–144, 2006.
J\mask, as a solution to the problem of object initialization. Masked [11] Kathleen Fischer and John Reppy. The design of a class mechanism
types provide a strong safety guarantee for initialization: unini- for Moby. In Proc. SIGPLAN 1999 Conference on Programming
tialized fields are never read. Further, masked types are expres- Language Design and Implementation, pages 37–49, 1999.
[12] James Gosling, Bill Joy, Guy Steele, and Gilad Bracha. The Java (POPL), pages 47–57, 1988.
Language Specification. Addison Wesley, 3rd edition, 2005. ISBN [24] Robin Milner, Mads Tofte, and Robert Harper. The Definition of
0321246780. Standard ML. MIT Press, Cambridge, MA, 1990.
[13] David Hovemeyer and William Pugh. Finding bugs is easy. In [25] Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From
OOPSLA ’04: Companion to the 19th annual ACM SIGPLAN System F to typed assembly language. ACM Transactions on
conference on Object-oriented programming systems, languages, Programming Languages and Systems, 21(3):528–569, May 1999.
and applications, pages 132–136, 2004. [26] Nathaniel Nystrom, Michael R. Clarkson, and Andrew C. Myers.
[14] David Hovemeyer, Jaime Spacco, and William Pugh. Evaluating Polyglot: An extensible compiler framework for Java. In Proc. 12th
and tuning a static analysis to find null pointer bugs. In PASTE International Compiler Construction Conference (CC’03), pages
’05: Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on 138–152, April 2003. LNCS 2622.
Program analysis for software tools and engineering, pages 13–19, [27] Xin Qi and Andrew C. Myers. Masked types. Technical report,
2005. Computer and Information Science, Cornell University, October
[15] Atsushi Igarashi and Benjamin C. Pierce. On inner classes. In 2008. http://hdl.handle.net/1813/11563.
Informal Proceedings of the Seventh International Workshop on [28] Joel Richardson, Peter Schwarz, and Luis-Felipe Cabrera. CACL:
Foundations of Object-Oriented Languages (FOOL 7), Boston, MA, Efficient fine-grained protection for objects. In Proc. 1992 ACM
January 2000. Conference on Object-Oriented Programming Systems, Languages,
[16] ECMA International. Eiffel analysis, design and programming and Applications, pages 154–165, Vancouver, BC, Canada, October
language. ECMA Standard 367, June 2005. 1992.
[17] Haskell 98: A non-strict, purely functional language, February 1999. [29] Amie L. Souter and Lori L. Pollock. The construction of contextual
Available at http://www.haskell.org/onlinereport/. def-use associations for object-oriented systems. IEEE Trans. Softw.
[18] Anita K. Jones and Barbara Liskov. A language extension for Eng., 29(11):1005–1018, 2003.
expressing constraints on data access. Comm. of the ACM, 21(5):358– [30] Amie L. Souter, Lori L. Pollock, and Dixie Hisley. Inter-class def-use
367, May 1978. analysis with partial class representations. In PASTE ’99: Proceedings
[19] JSR 308: Annotations on Java Types. Available at of the 1999 ACM SIGPLAN-SIGSOFT workshop on Program analysis
http://groups.csail.mit.edu/pag/jsr308/. for software tools and engineering, pages 47–56, 1999.
[20] K. Rustan M. Leino. Data groups: specifying the modification of [31] Robert E. Strom and Shaula Yemini. Typestate: A programming lan-
extended state. In Proc. 13th ACM Conference on Object-Oriented guage concept for enhancing software reliability. IEEE Transactions
Programming Systems, Languages and Applications (OOPSLA), on Software Engineering (TSE), 12(1):157–171, January 1986.
pages 144–153, 1998. [32] Sun Microsystems. Java Language Specification, version 1.0 beta edi-
[21] B. Liskov and J. Guttag. Data abstraction. In Abstraction and tion, October 1995. Available at ftp://ftp.javasoft.com/docs/
Specification in Program Development, chapter 4, pages 56–98. MIT javaspec.ps.zip.
Press and McGraw Hill, 1986. [33] Don Syme. Initializing mutually referential abstract objects: The
[22] Barbara Liskov, Dorothy Curtis, Mark Day, Sanjay Ghemawat, Robert value recursion challenge. Electronic Notes in Theoretical Computer
Gruber, Paul Johnson, and Andrew C. Myers. Theta Reference Man- Science, 148(2):3–25, 2006.
ual. Programming Methodology Group Memo 88, MIT Laboratory [34] Reinhard Wilhelm, Shmuel Sagiv, and Thomas W. Reps. Shape anal-
for Computer Science, Cambridge, MA, February 1994. Available at ysis. In Proc. 9th International Compiler Construction Conference
http://www.pmg.lcs.mit.edu/papers/thetaref/. (CC’00), pages 1–17, 2000.
[23] J. M. Lucassen and D. K. Gifford. Polymorphic effect systems. In [35] Andrew K. Wright and Matthias Felleisen. A syntactic approach to
Proc. 15th ACM Symp. on Principles of Programming Languages type soundness. Information and Computation, 115(1):38–94, 1994.