Master Thesis: Extending the ExtendJ Java Compiler
ISSN 1650-2884
LU-CS-EX: 2023-12
June 5, 2023
We would like to thank our supervisor, Idriss Riouak, for his continuous help throughout the
entire thesis work with weekly meetings and for his additional help when we needed it.
Contents
1 Introduction
2 Background
2.1 Compiler Architecture and Functionality
2.1.1 Scanning and Parsing
2.1.2 Abstract Syntax Trees
2.1.3 Semantic Analysis
2.1.4 Code Generation
2.2 ExtendJ
2.3 Attribute Grammars
2.3.1 Reference Attribute Grammars
2.3.2 The JastAdd Metacompiler
2.4 The Java Language
2.4.1 Java 9
2.4.2 Java 10
2.4.3 Java 11
2.5 Evaluating Java Programs
2.5.1 Performance Evaluation
2.5.2 Precision of a Program
3 Implementation
3.1 Java 9 Implementation
3.1.1 Try-with-resources
3.1.2 SafeVarargs
3.1.3 Private Interface Methods
3.1.4 Remove Underscore as an Identifier
3.1.5 Diamond Operator
3.2 Java 10 Implementation
3.2.1 Var Type Identifier
3.3 Java 11 Implementation
4 Evaluation
4.1 Extendibility
4.1.1 Lines of code
4.2 Compiling Real-World Projects
4.2.1 Modifying Build Scripts
4.2.2 Manual Compilation
4.2.3 Replacing the javac Executable
4.3 Precision of Implementation
4.3.1 Regression Tests
4.3.2 Precision on Real-World Projects
4.4 Performance
4.4.1 Compilation Time
4.4.2 Memory Usage
5 Discussion
5.1 Implementation
5.2 Difficulties Evaluating
5.3 Performance
5.4 Future work
6 Conclusions
References
Contribution Statement
The table below indicates the responsibilities each author had in writing this thesis:
The dark portion of the circle represents the amount of work and responsibilities assigned
to each author for each individual step:
Chapter 1
Introduction
Compilers are an essential tool in the software development process. A compiler translates code from a source programming language into a target language. Compilers make it possible to write code in a high-level language that is then compiled into a lower-level language or an executable. The high-level language can be designed to be easy and fast to write, understand, and debug, making development efficient. The target language can instead be designed for execution speed, memory usage, and processor architecture-specific performance. Designing a language to be compiled can therefore combine high-level abstractions with good performance.
The Java programming language is a high-level language compiled into a lower-level lan-
guage called Java bytecode. Java code can be compiled using the official reference Java com-
piler included in OpenJDK, or using other open-source or commercial alternatives, e.g., the
Eclipse compiler. ExtendJ [1], formerly JastAddJ, is yet another Java compiler and was built to support compiler research. ExtendJ is based on Reference Attribute Grammars
(RAGs) that are extensible by their nature. RAGs specify the behavior of a programming lan-
guage by declaring attributes for nodes in a tree, e.g., the Abstract Syntax Tree (AST) created
by a parser. The implementation of RAGs that ExtendJ uses is the JastAdd [2] metacom-
piler. The JastAdd metacompiler is a system for compiler construction and code analysis
tools using RAGs. A main feature of both the system and the ExtendJ compiler is enabling
modular extensions [1, 2].
Several extensions have been created for the ExtendJ compiler, e.g., the two static pro-
gram analysis tools, IntraCFG [3] and a non-null checker [4]. Another example is the JFea-
ture [5] tool that uses ExtendJ to identify Java features for different projects.
ExtendJ currently supports Java versions 4-8, where Java 4 is the base version, and the
newer versions have been added as modules to the compiler [6, 7]. To further test the ex-
tendibility of ExtendJ and JastAdd and to allow for future extensions to be built for Ex-
tendJ, this thesis aimed to add support for more recent Java versions. Extending ExtendJ
would improve existing tools, e.g., updating JFeature to be able to analyze more recent
projects using the new Java language features. The first long-term support (LTS) version of
Java was Java 8, later followed by Java 11. This thesis aimed to add support for Java 9, 10, and
11 features so that ExtendJ can eventually be fully compliant with the Java 11 LTS release.
In addition to improving ExtendJ, this thesis aimed to evaluate it in two main ways. The
first one was to assess to what extent it is possible to add extensions in a modular fashion
since that is an essential feature of the compiler and the JastAdd system. The second was
to measure the extended versions’ performance and precision limitations compared to the
OpenJDK-based Java compilers. These goals are presented below as two research questions
we sought to answer in this thesis.
The structure of this report is as follows. Chapter 2 presents the theoretical foundations
for compiler construction, the JastAdd system and the Java language. Chapter 3 introduces
the changes required to add support for the Java 9, Java 10, and Java 11 features. Then, the
evaluation of the compiler is presented in Chapter 4. Finally, we discuss the results in Chapter
5 and answer the research questions in Chapter 6.
Chapter 2
Background
A compiler can significantly affect both the performance of the compiled code and, through its build times, the efficiency of development. This chapter begins by outlining, in Section 2.1, the fundamental steps that many compilers employ to compile a program. Then, the ExtendJ compiler and the JastAdd system are described in Sections 2.2 and 2.3. In Section 2.4, we introduce the Java programming language and the new language features introduced in Java 9, 10, and 11. Finally, Section 2.5 describes how the performance and precision of a Java program can be analyzed.
on the stream of tokens it parses according to the production rules. The specification defines production rules for language constructs such as statements and expressions. A conflict arises if these rules are ambiguous, i.e., if two or more rules can match the same sequence of tokens. Reduce-reduce conflicts occur when two rules match fully and either can be applied. Shift-reduce conflicts occur when the parser can either reduce using one rule or shift the next token to continue matching another rule. These conflicts must be avoided to have a well-functioning parser. If no parser rule matches a given sequence of tokens, or if the source files contain strings that do not match any scanner rule, it is a syntax error.
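The dangling else is a textbook instance of the shift-reduce conflict described above: after reading if (a) if (b) s1 with else as the next token, a parser could either reduce the inner if statement or shift the else. The Java Language Specification's grammar is written so that the else always attaches to the nearest if. The small sketch below shows the observable effect; the class and method names are illustrative only.

```java
public class DanglingElse {
    // The else below belongs to the innermost "if (b)", regardless of how the
    // code is indented, because Java's grammar resolves the ambiguity that way.
    static String classify(boolean a, boolean b) {
        String result = "none";
        if (a)
            if (b)
                result = "a and b";
            else
                result = "a but not b"; // attached to "if (b)", not "if (a)"
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, false)); // none
        System.out.println(classify(true, false));  // a but not b
    }
}
```

If the else were attached to the outer if, classify(false, false) would return "a but not b" instead.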
2.2 ExtendJ
ExtendJ is a Java compiler mainly built using the JastAdd metacompiler system [1] de-
scribed in Section 2.3.2. The purpose of ExtendJ is to function as a regular Java compiler
while having a modular design to make it possible to add extensions to the language. Func-
tioning as a Java compiler means it creates class files from Java source code that can then be
run by a regular Java run-time environment [1]. The main components of ExtendJ are the Java version modules, each consisting of a front-end and a back-end. These are made up of aspects [12] that enable the modular definition of attributes for the AST nodes. Using aspects, new attributes or methods can be declared, and existing functionality can be extended or overridden.
The parser in ExtendJ is built using the Beaver [9] parser generator, and the scanner
using the JFlex [8] scanner generator. These do not enable a modular design, but a pre-
processor in JastAdd allows definitions to be split into different files. This enables the
compiler to be separated into modules corresponding to a Java version. Using an appropriate
build script, the modules can be combined, compiled, and packaged into a jar file to create a
Java compiler for any supported Java version.
The latest significant extension of ExtendJ was described in the 2014 paper Extending JastAddJ to Java 8 [7]. ExtendJ currently supports almost all Java 8 features but still contains some bugs and minor issues, mostly related to type inference. Therefore, it can compile most, but not all, Java 8 projects.¹
¹ The compliance issues for ExtendJ are described further on the ExtendJ web page: https://extendj.org/compliance.html
2.3 Attribute Grammars
Refining attributes
To further support modular extensions, JastAdd provides the keyword refine, which makes it possible to change existing attributes and methods. This allows new modules to change the behavior of existing code without modifying it directly. An example of this can be seen in Listing 2.4, which accounts for the static and default interface methods introduced in Java 8.

aspect Modifiers {
  syn boolean MethodDecl.isAbstract() =
      getModifiers().isAbstract() ||
      hostType().isInterfaceDecl();
}

aspect Java8Modifiers {
  refine Modifiers
  eq MethodDecl.isAbstract() =
      getModifiers().isAbstract() ||
      (hostType().isInterfaceDecl() &&
       !isStatic() && !isDefault());
}
Listing 2.4: Refining the isAbstract() attribute to account for Java 8 interface methods.

In the example, the original attribute isAbstract() for method declaration nodes, MethodDecl, in the aspect Modifiers is refined in the Java 8 aspect Java8Modifiers. The isAbstract() attribute now accounts for the new kinds of interface methods without requiring any changes where the attribute is used. Thus, new modules can redefine the implementation of older attributes, meaning the older modules can be extended without being modified directly.
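The interface part of the refined rule in Listing 2.4 can be cross-checked against what javac writes into class files. The sketch below is plain Java using reflection, not ExtendJ code; the interface Shape and the helper names are hypothetical. It only models the interface part of the equation, since the explicit abstract-modifier case is unchanged from the original attribute.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class InterfaceAbstractness {
    // Hypothetical interface mixing the three kinds of Java 8 interface methods.
    interface Shape {
        double area();                              // no body: abstract
        default String name() { return "shape"; }   // default: not abstract
        static Shape unit() { return () -> 1.0; }   // static: not abstract
    }

    private static Method lookup(String name) {
        try {
            return Shape.class.getDeclaredMethod(name);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no method " + name, e);
        }
    }

    // Interface part of the refined equation: a method declared in an
    // interface is abstract unless it is static or default.
    static boolean abstractByRule(String name) {
        Method m = lookup(name);
        return m.getDeclaringClass().isInterface()
                && !Modifier.isStatic(m.getModifiers())
                && !m.isDefault();
    }

    // The ABSTRACT flag that the compiler actually wrote to the class file.
    static boolean abstractInClassFile(String name) {
        return Modifier.isAbstract(lookup(name).getModifiers());
    }

    public static void main(String[] args) {
        for (String name : new String[] {"area", "name", "unit"}) {
            System.out.println(name + ": rule=" + abstractByRule(name)
                    + ", class file=" + abstractInClassFile(name));
        }
    }
}
```

Without the refinement, the original equation would report name() and unit() as abstract, since both are declared in an interface.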
2.4 The Java Language
2.4.1 Java 9
Java 9 introduces six changes to the Java language: four minor ones, one more substantial one, and one that required extensive changes to the OpenJDK JVM. The six changes are as follows:
• final and effectively final variables can be used as resources in the try-with-resources
statement,
• the diamond operator can be used with anonymous classes if the inferred type arguments are denotable,
Of the six changes, the module system is the largest and was part of the larger Jigsaw
Project [23], which also introduced extensive changes to the internal structure of the JVM.
At its core, a module is a set of related packages that have been grouped together. Packages within a module may be declared as ‘exported’, meaning that their types may be accessed from outside the module. If a package is not exported, its types can only be accessed from within the module [24, p. 175]. Adding support for this in ExtendJ would mean the module
information files would need to be parsed and the contents incorporated into the name anal-
ysis. We decided it was not realistic to include support for modules in the scope of this thesis.
Of the five remaining features, the largest one is the update to the try-with-resources
statement [24, p. 470-475]. The update allows variables declared outside of the try statement to be used as resources, provided they are final or effectively final. This, in turn, can lead to more
concise code. A variable is effectively final if its value is not changed after initialization. For
objects, this means that the reference to the object is not changed, while the state of the object
itself may be altered.
An example using try-with-resources is shown in Listing 2.5, and how it can be re-written
using the changes introduced in Java 9 is shown in Listing 2.6.
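As a minimal sketch of the same idea (not the thesis's own listings), the two methods below contrast the Java 7/8 form, where the resource must be declared in the resource list, with the Java 9 form, where an effectively final variable is simply named. TrackedResource is a hypothetical resource that only records whether it was closed.

```java
public class TryWithResourcesStyles {
    // Hypothetical resource that records whether close() has been called.
    static final class TrackedResource implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // Java 7/8: the resource must be declared in the resource list.
    static TrackedResource java7Style() {
        TrackedResource used = null;
        try (TrackedResource r = new TrackedResource()) {
            used = r;
        }
        return used;
    }

    // Java 9: an effectively final variable declared before the statement
    // can be listed directly as a resource.
    static TrackedResource java9Style() {
        TrackedResource r = new TrackedResource();
        try (r) {
            // work with r here
        }
        return r;
    }
}
```

In both cases the resource is closed when the try block completes; the Java 9 form only removes the need for a second variable.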
ArrayList<String> list = new ArrayList<String>();
Stream<String> stream = list.stream();
Listing 2.7: In Java 9, all local variables must have an explicit type.
Listing 2.8: In Java 10, local variable types can be inferred using var.
2.4.2 Java 10
Java 10 introduces type inference for local variables with the var identifier. This reduces the amount of boilerplate code needed when declaring variables [25]. Used in the right situations, var can make code more readable and faster to write. The Java 10 specification states the following regarding var: "var is not a keyword, but rather an identifier with special meaning as the type of a local variable declaration." [26, p. 24]. This means the identifier is context-sensitive, and the rules for its use depend on the context in which it appears. It can still be used as, e.g., a variable or method name, but not as the name of a class or interface. The identifier also comes with several restrictions on its use in declarations. It is a compile-time error if:
• var is used to declare more than one variable. E.g., var x = 1, y = 2;,
• var is used to declare a variable with bracket pairs. E.g., var x[] = y;,
• a variable declaration using var has an array initializer. E.g., var x = {1, 2};, or,
• a variable declaration using var has an initializer containing a reference to itself. E.g.,
var x = (x = 1);.
An example of how var can be used is illustrated in Listings 2.7 and 2.8.
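A small sketch in the spirit of Listings 2.7 and 2.8, assuming a Java 10 or later compiler. It shows var inferring ArrayList<String> and Stream<String> from the initializers, the fact that var is still usable as a variable name, and, as comments, the declarations from the list above that must be rejected. The class and method names are illustrative.

```java
import java.util.ArrayList;

public class VarBasics {
    static long countNonEmpty() {
        // Java 9 equivalent: ArrayList<String> list = new ArrayList<String>();
        var list = new ArrayList<String>();
        list.add("a");
        list.add("");
        list.add("b");
        // Inferred as Stream<String> from the initializer.
        var stream = list.stream();
        return stream.filter(s -> !s.isEmpty()).count();
    }

    // var is not a keyword, so it can still be used as a variable name.
    static int varAsVariableName() {
        int var = 3;
        return var * 2;
    }

    // Each of these declarations is a compile-time error:
    //   var x = 1, y = 2;   // more than one variable
    //   var x[] = y;        // bracket pairs
    //   var x = {1, 2};     // array initializer
    //   var x = (x = 1);    // initializer refers to the variable itself
}
```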
An interesting property of the var identifier is that it allows certain non-denotable types to be captured as the type of a variable. Non-denotable types are types that cannot be written with the language syntax. In Java 10, these include intersection types, capture types, anonymous class types, and the null-type. Of these, intersection types and anonymous class types can be inferred as is, while the null-type is rejected. Capture types are treated specially, as described in the following subsection. The effect of this is that there are programs that
can be expressed with the help of the var identifier that cannot be expressed without it. A
survey on the OpenJDK code base found that 1% of all variable declarations that have an
initializer would contain a capture type if they were changed to a declaration using var [25].
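Two of the non-denotable types mentioned above can be demonstrated directly. The sketch below, with hypothetical names, infers an anonymous class type (so members declared only in the anonymous class remain accessible) and an intersection type from a cast. Neither variable could be declared with an explicit type without losing information.

```java
import java.io.Serializable;

public class NonDenotableTypes {
    // Anonymous class type: the variable keeps the anonymous type, so members
    // that exist only in the anonymous class stay accessible. With
    // "Object counter = ..." the calls below would not compile.
    static int anonymousClassType() {
        var counter = new Object() {
            int count = 0;
            void increment() { count++; }
        };
        counter.increment();
        counter.increment();
        return counter.count;
    }

    // Intersection type: the cast expression has type Runnable & Serializable,
    // and var preserves both supertypes. The initializer is a cast, not a bare
    // lambda, so it is allowed in a var declaration.
    static boolean intersectionType() {
        var task = (Runnable & Serializable) () -> { };
        Runnable asRunnable = task;
        Serializable asSerializable = task;
        asRunnable.run();
        return asSerializable instanceof Runnable;
    }
}
```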
Type Projections
Capture types may contain synthetic type variables, which are type variables introduced by
the compiler during capture conversion or inference variable resolution [26]. The type of
a variable declared with var may not contain these type variables according to the Java 10
specification [26, p. 76-78], and they are instead replaced by applying an upward type projection
on the type. This projection is also described in the Java 10 specification. The type projection
is always applied to the type of the initializer of a declaration using var. On types that neither are nor contain synthetic type variables, upward type projection acts as the identity function, while for others it finds a suitable replacement type. For parameterized types such as Map<K, V>, it replaces each type argument with its respective upward projection, and for arrays, it performs an upward projection on the component type.
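A simple case where projection matters, sketched under the assumption of a Java 10 or later compiler: calling get(0) on a List<? extends Number> yields a capture type, a fresh type variable bounded by Number. The variable declared with var receives the upward projection, Number, which is why a Double can later be assigned to it. The overloaded staticType helpers are hypothetical and only reveal the static type chosen by the compiler.

```java
import java.util.List;

public class UpwardProjection {
    // Overloads that reveal the static type of their argument.
    static String staticType(Number n) { return "Number"; }
    static String staticType(Object o) { return "Object"; }

    static String boundedWildcard() {
        List<? extends Number> nums = List.of(1, 2.5);
        // nums.get(0) has a capture type bounded by Number;
        // var stores its upward projection, Number.
        var first = nums.get(0);
        first = 3.5; // compiles because the variable's type is Number, not the capture
        return staticType(first);
    }

    static String unboundedWildcard() {
        List<?> items = List.of("a", "b");
        var first = items.get(0); // capture of ? projected to Object
        return staticType(first);
    }
}
```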
Listing 2.9: In Java 10, lambda parameters are either explicitly typed or declared without a type.
Listing 2.10: In Java 11, lambda parameters can be declared with the identifier var.
2.4.3 Java 11
In Java 11, the var identifier was also allowed for the parameters of implicitly typed lambda expressions. Type inference for lambda parameters is not new, but this change makes the language more uniform and enables the use of type annotations in a concise
way [27]. Listing 2.9 demonstrates how the type of lambda parameters could be inferred in
Java 10, and Listing 2.10 shows the additional way the types can be inferred using var in Java
11.
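The three parameter forms can be compared side by side. This is a sketch with illustrative names, assuming a Java 11 compiler; it uses the final modifier rather than a type annotation to show that var parameters can carry modifiers, and lists the mixed forms that javac rejects as comments.

```java
import java.util.function.BiFunction;

public class LambdaVarParameters {
    // Explicitly typed parameters (valid since Java 8).
    static final BiFunction<Integer, Integer, Integer> EXPLICIT = (Integer a, Integer b) -> a + b;

    // Implicitly typed parameters (valid since Java 8).
    static final BiFunction<Integer, Integer, Integer> IMPLICIT = (a, b) -> a + b;

    // Java 11: implicitly typed parameters declared with var. The types are
    // still inferred from the target type, but modifiers can now be added.
    static final BiFunction<Integer, Integer, Integer> WITH_VAR = (final var a, final var b) -> a + b;

    // Rejected: var cannot be mixed with other parameter forms.
    //   (var a, b) -> a + b
    //   (var a, Integer b) -> a + b
}
```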
2.5 Evaluating Java Programs
that is evaluated [28]. The general way to measure steady-state performance is to measure
the performance of multiple iterations of a benchmark over multiple VM invocations. To
determine when steady-state performance is reached, the coefficient of variation (CoV) of the
last k iterations is used. Once the CoV falls below a predetermined value, the mean for that
VM invocation using the measurements of the last k iterations is computed. These means
are then used to calculate an overall mean, which is, in turn, used to compute the confidence
interval for the desired confidence level, just as with start-up performance [28].
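The steady-state rule above can be sketched for a single VM invocation as follows. The window size k and the CoV threshold are parameters chosen by the experimenter; the values used in the usage note are arbitrary assumptions, not values from [28] or from this thesis. The class name SteadyState is hypothetical.

```java
import java.util.Arrays;

public class SteadyState {
    // Coefficient of variation (sample standard deviation divided by the mean)
    // of the k measurements xs[from], ..., xs[from + k - 1].
    static double cov(double[] xs, int from, int k) {
        double mean = Arrays.stream(xs, from, from + k).average().orElse(Double.NaN);
        double sumSq = 0;
        for (int i = from; i < from + k; i++) {
            sumSq += (xs[i] - mean) * (xs[i] - mean);
        }
        return Math.sqrt(sumSq / (k - 1)) / mean;
    }

    // Slides a window over the last k iterations of one VM invocation and
    // returns the window mean once its CoV drops below the threshold, or NaN
    // if the invocation never reaches steady state.
    static double steadyStateMean(double[] iterationTimes, int k, double threshold) {
        if (k < 2) throw new IllegalArgumentException("k must be at least 2");
        for (int end = k; end <= iterationTimes.length; end++) {
            int from = end - k;
            if (cov(iterationTimes, from, k) < threshold) {
                return Arrays.stream(iterationTimes, from, end).average().orElse(Double.NaN);
            }
        }
        return Double.NaN;
    }
}
```

For example, with iteration times {100, 80, 60, 50.0, 50.5, 49.5, 50.0}, k = 3, and a threshold of 0.02, the first three windows are still warming up, and the window {50.0, 50.5, 49.5} is the first to qualify, giving a mean of 50.0. One such mean per VM invocation is then fed into the overall mean and confidence interval.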
A confidence interval is an interval for which there is a given likelihood that the actual
mean of the population is within the interval. As such, intervals will be larger for higher
degrees of confidence, i.e., a 95% confidence interval will be larger than a 90% confidence
interval. The way these confidence intervals are calculated depends on the number of samples.
If the number of measurements is small (n < 30), the Student’s t-distribution can be used. If the number of measurements is large (n ≥ 30), the Gaussian (normal) distribution can be used instead [28].
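The interval itself is the sample mean plus or minus a critical value times the standard error. In the sketch below, the class name is hypothetical, and the critical value is deliberately left to the caller, because the Java standard library has no Student's t quantile function; it would come from a t table for small n or from the Gaussian distribution for large n.

```java
public class ConfidenceInterval {
    // Returns {lower, upper} for the population mean estimated from the samples.
    // criticalValue is supplied by the caller: a Student's t quantile with
    // n - 1 degrees of freedom when n < 30, or a Gaussian z value when n >= 30.
    static double[] interval(double[] samples, double criticalValue) {
        int n = samples.length;
        if (n < 2) throw new IllegalArgumentException("need at least two samples");
        double mean = 0;
        for (double x : samples) mean += x;
        mean /= n;
        double sumSq = 0;
        for (double x : samples) sumSq += (x - mean) * (x - mean);
        double stdDev = Math.sqrt(sumSq / (n - 1));          // sample standard deviation
        double halfWidth = criticalValue * stdDev / Math.sqrt(n);
        return new double[] {mean - halfWidth, mean + halfWidth};
    }
}
```

For 10 VM invocations and 95% confidence, the critical value is the t quantile with 9 degrees of freedom, approximately 2.262; with 30 or more invocations, the Gaussian value 1.96 can be used. A larger critical value, and thus a higher confidence level, gives a wider interval.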
Analysis of Variance (ANOVA) is a way to compare multiple alternatives, where one vari-
able is altered between each alternative. The idea behind ANOVA is to compare the varia-
tion within an alternative to the variation between alternatives. ANOVA assumes that the
variation within each alternative is due to random effects (errors) in the measurements. If
the variation between alternatives is greater than the variation within them, then one can
conclude that there is a statistically significant difference between them. In practice, this
means that three values are computed: the sum-of-squares due to the difference between
alternatives (SSA), the sum-of-squares due to errors, i.e., the variation within each alternative (SSE), and the
sum-of-squares total (SST), which is the sum of the SSA and SSE. A simple way to quantify
whether the variation within or between alternatives is greater is to compare the fractions
SSE/SST versus SSA/SST [28].
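The three sums of squares for a one-way ANOVA can be computed as in the following sketch, where each row of the input holds the measurements of one alternative. The class name is hypothetical, and the sketch stops at the SSA/SST and SSE/SST fractions described above rather than performing a formal significance test.

```java
public class OneWayAnova {
    // Returns {SSA, SSE, SST}; each row of measurements is one alternative.
    static double[] sumsOfSquares(double[][] alternatives) {
        double total = 0;
        int count = 0;
        for (double[] alt : alternatives) {
            for (double x : alt) { total += x; count++; }
        }
        double grandMean = total / count;

        double ssa = 0, sse = 0;
        for (double[] alt : alternatives) {
            double mean = 0;
            for (double x : alt) mean += x;
            mean /= alt.length;
            // Between alternatives: distance of this alternative's mean from
            // the grand mean, counted once per measurement.
            ssa += alt.length * (mean - grandMean) * (mean - grandMean);
            // Within the alternative: spread attributed to random errors.
            for (double x : alt) sse += (x - mean) * (x - mean);
        }
        return new double[] {ssa, sse, ssa + sse};
    }

    public static void main(String[] args) {
        double[] s = sumsOfSquares(new double[][] {{1, 2, 3}, {4, 5, 6}});
        System.out.printf("SSA/SST = %.3f, SSE/SST = %.3f%n", s[0] / s[2], s[1] / s[2]);
    }
}
```

For the two alternatives {1, 2, 3} and {4, 5, 6}, SSA = 13.5, SSE = 4, and SST = 17.5, so most of the variation lies between the alternatives.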
• manual inspection,
• program analysis,
Each method has its own benefits and drawbacks that need to be considered. Manual in-
spection is simple and allows a skilled developer to notice bugs early, but it is also a time-
consuming process, and as such does not scale well with larger projects. Program analysis, on
the other hand, allows the developer to make firmer statements regarding the completeness
and correctness of their software. However, finding and adapting appropriate
program analysis methods, while also ensuring that the theory behind the analysis is sound,
is no simple matter. Automated testing and execution on real-world input are, in contrast to program analysis, far more practical methods. One of the most common types of automated testing is unit testing, where each test exercises a specific feature. Unit tests are simple to write and can allow developers to cover a large part of the codebase. However, a limitation of unit testing is that it can only show that the program has the expected behavior for specific inputs. Executing the program on real-world input can therefore be a complementary step, as it may help in discovering hard-to-foresee edge cases.
It is important to note that no program testing method can guarantee that all bugs, or
indeed any bugs, will be found. As Dijkstra stated in 1970: "Program testing can be used to
show the presence of bugs, but never to show their absence" [29].
Chapter 3
Implementation
This chapter describes the implementation of the Java 9, 10, and 11 features. The issues faced
during implementation are also described to help answer the research questions.
3.1 Java 9 Implementation
3.1.1 Try-with-resources
The extension to the try-with-resources (TWR) statement could not be implemented by adding a module alone. A new parser production was needed to accommodate resource uses in addition to resource declarations, but adding it in the Java 9 module would create conflicts with the existing parser. This is due to the limited context available to the parser: a new rule would have to allow an arbitrary number of resource declarations and resource uses, but it cannot track whether at least one resource use was parsed. This means that the new rule would also match a TWR statement containing only resource declarations. The old rule would match this as well, creating a reduce-reduce conflict. Extending the current production was not possible either, since the existing rules were not compatible with the extension.
There were two main ways of solving this limitation. The first solution was to rewrite the
Java 7 implementation to make it more extendable and add a Java 9 module that extends it.
The downside is that changing the Java 7 implementation might cause bugs or issues that were
previously not there. The second solution was to exclude the Java 7 implementation from the
build scripts and add a new implementation for TWR in Java 9. Doing this would cause
significant code duplication since the new solution would still be based on the original one.
We chose the first option since this makes the entire implementation easier to understand
and more extendable in the future.
We tried two ways to change the Java 7 implementation to be more extendable. The first
one was to wrap ResourceDeclaration in a statement node, ResourceDeclStmt. This
node could later be extended to contain other forms of resources that are not declarations.
This was fast and relatively easy to implement, but would have had two negative effects on the compiler. The first is increased memory usage, since there would be one more node for each resource whose only purpose is to wrap a resource node. The second is that the implementation added one attribute to all statement nodes, meaning further memory usage and unnecessary attributes for all other statement nodes.
The second method introduced an abstract node type, Resource, which inherits from the type Stmt. ResourceDeclaration was then changed to be a subtype of Resource. Instead of inheriting from the type VariableDeclarator, ResourceDeclaration now contains one, refactoring the previous inheritance into a composition. This refactoring can be seen in Figure 3.1. With this new structure, it was easy to introduce a new subtype of Resource, ResourceUse. However, this required changing many attributes in the Java 7 module, since attributes that were previously part of ResourceDeclaration directly are now part of the VariableDeclarator contained within it.
Figure 3.1: AST node types before and after the refactoring (Declarator, VariableDeclarator, Stmt, Resource).
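The shape of the refactored hierarchy can be illustrated with ordinary Java classes. This is a hypothetical, heavily simplified sketch: in ExtendJ the node classes are generated by JastAdd from the abstract grammar and their behavior is defined by aspect equations, so none of these class bodies are the thesis's code. It shows the composition (ResourceDeclaration contains a VariableDeclarator) and how the abstract Resource type lets code treat declarations and uses uniformly, here when computing the order in which resources are closed (the reverse of their declaration order).

```java
import java.util.ArrayList;
import java.util.List;

public class ResourceHierarchySketch {
    // Hypothetical, simplified stand-ins for ExtendJ's AST node classes.
    static abstract class Stmt {}

    static final class VariableDeclarator {
        final String name;
        VariableDeclarator(String name) { this.name = name; }
    }

    // Abstract node type: every element of the resource list is a Resource.
    static abstract class Resource extends Stmt {
        abstract String variableName();
    }

    // Java 7 resource: now contains a VariableDeclarator instead of extending it.
    static final class ResourceDeclaration extends Resource {
        final VariableDeclarator declarator;
        ResourceDeclaration(VariableDeclarator declarator) { this.declarator = declarator; }
        @Override String variableName() { return declarator.name; }
    }

    // Java 9 resource: a use of an existing final or effectively final variable.
    static final class ResourceUse extends Resource {
        final String name;
        ResourceUse(String name) { this.name = name; }
        @Override String variableName() { return name; }
    }

    // Resources are closed in the reverse order of their appearance.
    static List<String> closeOrder(List<Resource> resources) {
        List<String> order = new ArrayList<>();
        for (int i = resources.size() - 1; i >= 0; i--) {
            order.add(resources.get(i).variableName());
        }
        return order;
    }
}
```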
A benefit of this approach was that it was more natural to construct parser rules. All
that was needed was changing the type of the list of resources to be of the type Resource
rather than the type ResourceDeclaration in the Java 7 module. Additional rules could
then be added in the Java 9 module for parsing the new type of resource. Another benefit is a potential reduction in the extra memory needed. In the first solution, we created an additional node for each resource, but with this solution, there is one node created for each
ResourceDeclaration and no additional nodes created for each ResourceUse. These
two benefits caused us to select this way of changing the Java 7 implementation.
After implementing the changes to the Java 7 TWR module and adding the front-end for
the Java 9 solution, the next step was deciding how to implement the bytecode generation.
The Java 9 addition to TWR meant that the resource list could now contain uses of variables in addition to declarations of variables. The variables that can be used as resources are limited to final or effectively final local variables and fields. Two main ways of implementing this addition in the bytecode generation were explored. The first followed the Java 9 specification, which describes replacing the use of a variable in a resource list with a declaration of a temporary variable of the same type [24, p. 472], see Figure 3.2. This temporary variable requires a unique identifier and a transformation of the AST to insert its declaration in place of the original variable. All uses of the variable in the TWR statement would then need to be replaced by the new temporary variable. This would require several additions to ExtendJ, since there is currently no support for creating unique identifiers for temporary variables or for replacing uses of a variable with a new temporary variable. The second was to deviate from the translation given in the specification: keep the original variable, use it in the TWR block, and then close it in a generated finally clause. This required only minimal additions to the bytecode generation while still preserving the behavior defined by the Java 9 specification, and was therefore chosen.
Figure 3.2: Excerpt from the Java 9 specification [24, p. 472] describing how uses of variables in a try-with-resources statement are transformed into declarations of temporary resources.
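The chosen approach can be pictured at source level. ExtendJ emits bytecode directly, and the thesis does not show the generated code, so the finallyForm method below is only a hypothetical source-level analogue: it keeps the original variable and closes it in a finally clause, attaching a failure in close() as a suppressed exception when the body has already thrown, as try-with-resources requires. TrackedResource and all method names are illustrative.

```java
public class TwrDesugaringSketch {
    // Hypothetical resource whose close() can be made to fail.
    static final class TrackedResource implements AutoCloseable {
        final boolean failOnClose;
        boolean closed;
        TrackedResource(boolean failOnClose) { this.failOnClose = failOnClose; }
        @Override public void close() {
            closed = true;
            if (failOnClose) throw new IllegalStateException("close failed");
        }
    }

    // Java 9 source form: the existing variable is used directly as a resource.
    static void java9Form(TrackedResource r, Runnable body) {
        try (r) {
            body.run();
        }
    }

    // Source-level sketch of the chosen translation: no temporary variable,
    // the original variable is closed in a finally clause.
    static void finallyForm(TrackedResource r, Runnable body) {
        Throwable primary = null;
        try {
            body.run();
        } catch (Throwable t) {
            primary = t;
            throw t; // precise rethrow: only unchecked exceptions can reach here
        } finally {
            if (r != null) {
                if (primary != null) {
                    try { r.close(); } catch (Throwable s) { primary.addSuppressed(s); }
                } else {
                    r.close();
                }
            }
        }
    }
}
```

Both methods close the resource and report the body's exception as the primary one, which is the behavior the specification requires regardless of how the translation is done.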
3.1.2 SafeVarargs
Allowing the SafeVarargs annotation on private instance methods could be done in a modular way. The only modification needed was a straightforward extension to the error reporting system: the safeVarargsProblems() attribute was refined with a check that allows the annotation on private method declarations.
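A minimal example of what the refined check must now accept, assuming a Java 9 or later compiler; the class and method names are illustrative. The collect method is a private, non-static, non-final instance method, so annotating it with @SafeVarargs was an error before Java 9.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SafeVarargsDemo {
    // Before Java 9, @SafeVarargs was only permitted on static methods, final
    // instance methods, and constructors. Java 9 also permits it on private
    // instance methods like this one, which cannot be overridden.
    @SafeVarargs
    private <T> List<T> collect(T... items) {
        return new ArrayList<>(Arrays.asList(items));
    }

    // Calling a generic varargs method creates an array of a generic type
    // (List<String>[]); the annotation suppresses the unchecked warning.
    List<List<String>> example() {
        return collect(List.of("a"), List.of("b", "c"));
    }
}
```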
Listing 3.1: The addition that was made to error reporting for all
declarations, so that underscore may not be used as an identifier.
isVar() to the VarDeclStmt node and related nodes that returns true when the name of
the declared type is "var". The attribute VarDeclStmt.type() was then refined to return
the inferred type of the variable if it was declared using var, and the declared type if it was
not. To find the inferred type, another attribute was added to the VarDeclStmt node. This attribute was also added to EnhancedForStmt, i.e., for-each loops, since the loop variable can now also be declared using var. For a for-each loop, the value of the attribute is the component type of the array being iterated over, or the iterableElementType() of the collection being iterated over. This also required a small addition to the code generation for EnhancedForStmt so that it uses the inferred type when needed.
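The two cases of the inferred loop-variable type can be observed with a standard Java 10 or later compiler. In the sketch below, the hypothetical staticType overloads reveal whether var was inferred as the array's component type (int) or as the element type of an Iterable (Integer).

```java
import java.util.List;

public class VarForEach {
    // Overloads used only to reveal the static type chosen for the loop variable.
    static String staticType(int x) { return "int"; }
    static String staticType(Integer x) { return "Integer"; }

    static String overArray() {
        int[] numbers = {1, 2, 3};
        String result = "";
        for (var n : numbers) { // component type of int[]: int
            result = staticType(n);
        }
        return result;
    }

    static String overIterable() {
        List<Integer> numbers = List.of(1, 2, 3);
        String result = "";
        for (var n : numbers) { // element type of Iterable<Integer>: Integer
            result = staticType(n);
        }
        return result;
    }
}
```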
The var identifier has several restrictions that regular type identifiers do not have. For
example, initializers that require an explicit target type are not allowed in a declaration using var. Among these initializers are array initializers and lambda expressions. If they are used in this context, an error should be produced in accordance
with the Java specification [26, p. 433 – 434]. Another restriction mentioned in Section 2.4.2
is that a variable declared with the var identifier cannot be referenced in its own initializer.
To detect these occurrences, an attribute varOccurs() was added to all expression types
that are valid initializers for the var identifier. This attribute returns true if the variable
occurs in the initializer, and false otherwise. This check was then implemented for all
Expr nodes where a variable use may occur, to identify when the variable being declared is
used. Since many Expr nodes may contain other expressions, this check is done recursively
on those Expr nodes.
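The recursive structure of the varOccurs() check can be sketched with a miniature expression tree. The node classes below are hypothetical and far smaller than ExtendJ's Expr hierarchy, and in ExtendJ each case would be an equation in an aspect rather than a Java method. Leaf nodes answer directly, and composite nodes ask their children. Comparing names is a simplification; a compiler would compare the resolved declaration.

```java
public class VarOccursSketch {
    // Hypothetical, simplified expression nodes.
    static abstract class Expr {
        // True if the variable being declared is referenced in this expression.
        abstract boolean varOccurs(String declaredName);
    }

    static final class IntLiteral extends Expr {
        final int value;
        IntLiteral(int value) { this.value = value; }
        @Override boolean varOccurs(String declaredName) { return false; }
    }

    // A variable use: the base case of the check.
    static final class VarAccess extends Expr {
        final String name;
        VarAccess(String name) { this.name = name; }
        @Override boolean varOccurs(String declaredName) { return name.equals(declaredName); }
    }

    // Composite nodes recurse into their child expressions.
    static final class Add extends Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override boolean varOccurs(String declaredName) {
            return left.varOccurs(declaredName) || right.varOccurs(declaredName);
        }
    }

    static final class Assign extends Expr {
        final VarAccess target;
        final Expr source;
        Assign(VarAccess target, Expr source) { this.target = target; this.source = source; }
        @Override boolean varOccurs(String declaredName) {
            return target.varOccurs(declaredName) || source.varOccurs(declaredName);
        }
    }

    public static void main(String[] args) {
        // var x = (x = 1);  -> must be rejected
        Expr selfReference = new Assign(new VarAccess("x"), new IntLiteral(1));
        // var y = 1 + z;    -> accepted
        Expr other = new Add(new IntLiteral(1), new VarAccess("z"));
        System.out.println(selfReference.varOccurs("x")); // true
        System.out.println(other.varOccurs("y"));         // false
    }
}
```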
As stated in Section 2.4.2, capture types should be projected to a supertype using upward
type projection. An attempt was made to implement type projection by closely following
the description found in the Java 10 language specification [26, p. 76-78]. However, this
implementation attempt had to be abandoned. The most significant issue encountered was
in verifying the correctness of the implementation. Writing tests where the inferred type
of an expression contained relevant capture types proved difficult. Testing type projection requires capture types containing synthetic type variables, and without such test cases it was impossible to verify the implementation. An extensive search for existing tests was carried out, but none were found. As such, the Java 10 implementation is not feature-complete.
refine TypeAnalysis
eq ParameterDeclaration.type() {
  if (getTypeAccess() != null &&
      getTypeAccess().isVar()) {
    return inferredType();
  }
  return getTypeAccess().type();
}
of the attribute and how it is used is shown in Listing 3.2. The value of inferredType() was computed in the same way as the type of lambda parameters without a declared type in Java 8.
Chapter 4
Evaluation
To address the research questions, the extended ExtendJ compiler needed to be evaluated.
To answer RQ1, we examine in Section 4.1 whether the additions could be made modularly.
In Section 4.2, we describe the problems encountered when compiling real-world projects.
Finally, in Sections 4.3 and 4.4, we evaluate the precision and performance of ExtendJ to
answer RQ2.
4.1 Extendibility
Most, but not all, of the additions for Java 9, 10, and 11 could be made modularly. The change to try-with-resources could not be implemented without modifying the Java 7 implementation. This meant modifying seven different files and aspects relating to the parser, semantic analysis, and code generation. Two additional, smaller changes had to be made to the Java 8 implementation, one in the reading of bytecode and one to an aspect name, to make it possible to extend it to Java 9.
Module     SLOC per module    SLOC of compiled ExtendJ    Increase in generated code
Java 8     5,419              142,003                     19,507
Java 9     341                142,126                     123
Java 10    400                142,757                     631
Java 11    168                142,871                     114
Table 4.1: Source Lines Of Code (SLOC) counts for ExtendJ 8-11
modules, SLOC counts for the Java code generated when compiling
ExtendJ for the same module level, and the increase in the SLOC
compared to the previous version of ExtendJ.
4.2 Compiling Real-World Projects
Listing 4.1: The Gradle task used to compile projects using ExtendJ.
were able to create the task shown in Listing 4.1. The task would run the ExtendJ compiler
with the same classpath and input files used when compiling the project with javac. This
was used to compile the ExtendJ and Disruptor projects with ExtendJ, but not all Gradle
scripts have the same structure and could not be compiled in this way. We attempted to write
similar tasks for Ant and Maven build scripts, but were unsuccessful. The main issue was
determining the classpath and the set of source files.
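The essential inputs of such a task are the two things javac already receives: the project's classpath and its set of source files. As a sketch outside Gradle, and not the task in Listing 4.1, the Java code below collects the sources and assembles a command line. The jar name `extendj.jar`, the output directory, and the assumption that ExtendJ accepts javac-style `-classpath` and `-d` options are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CompilerInvocation {
    // Collects all .java files below the given source roots, sorted per root.
    static List<String> sourceFiles(List<Path> sourceRoots) {
        List<String> files = new ArrayList<>();
        for (Path root : sourceRoots) {
            try (Stream<Path> walk = Files.walk(root)) {
                files.addAll(walk
                        .filter(Files::isRegularFile)
                        .filter(p -> p.toString().endsWith(".java"))
                        .map(Path::toString)
                        .sorted()
                        .collect(Collectors.toList()));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return files;
    }

    // Builds a javac-style command line for the compiler jar. The jar name and
    // the -classpath/-d options are assumptions, not ExtendJ's documented CLI.
    static List<String> command(String compilerJar, List<String> classpath,
                                String outputDir, List<String> sources) {
        List<String> cmd = new ArrayList<>(List.of("java", "-jar", compilerJar));
        if (!classpath.isEmpty()) {
            cmd.add("-classpath");
            // Same separator the JVM uses: ':' on Unix-like systems, ';' on Windows.
            cmd.add(String.join(File.pathSeparator, classpath));
        }
        cmd.add("-d");
        cmd.add(outputDir);
        cmd.addAll(sources);
        return cmd;
    }

    public static void main(String[] args) {
        Path root = Path.of(args.length > 0 ? args[0] : "src/main/java");
        List<String> sources = Files.isDirectory(root) ? sourceFiles(List.of(root)) : List.of();
        List<String> cmd = command("extendj.jar", List.of(), "build/extendj", sources);
        System.out.println(String.join(" ", cmd));
        // The list could be passed to new ProcessBuilder(cmd).inheritIO().start().
    }
}
```

The hard part the text describes remains outside this sketch: for Ant and Maven projects, obtaining the classpath list in the first place.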
whole project.
A modified version of this approach involves constructing the classpath for the whole
project and compiling all relevant Java files together with the needed dependencies. Constructing
the classpath for an entire project is complex, and we could not find any sources describing
this process. However, there is a small existing corpus of Java projects that require at most
Java 8 to compile and come with defined classpaths and dependencies. This allowed us to compile
Fop, Antlr, Gson, and Mockito, but did not help us find any Java 9, 10, or 11 projects.
Despite this, these projects were useful for comparing the performance of different ExtendJ
versions with the corresponding javac versions.
4.3 Precision of Implementation
4.4 Performance
To compare the extended version of ExtendJ with the javac compiler, we measured the
compilation time and memory usage of the projects we could compile. Compilation time is
important during development, since a slow compiler is frustrating and makes development
take longer. Memory usage is also an important metric, since ExtendJ needs to be able to
compile large projects without requiring special hardware.
Compilation time and memory usage are the two performance metrics we were able to
compare for ExtendJ and OpenJDK-based Java compilers. All the following measurements
were performed on a benchmark computer running Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-70-generic
x86_64) on an Intel Core i7-11700K with 8 cores and a fixed 3.5 GHz clock frequency.
The computer has 128 GiB of DDR4-3200 RAM and a 1 TiB M.2 drive. When measuring
performance, the procedures described in Section 2.5.1 were used.
ConstructorReferenceAccess
IntersectionCastExpr
MethodReference
DiamondAccess
DefaultMethod
ResourceUse
Lambda var
Lambda
Var
Project
Disruptor  17  0  0  118  18  0   5  1  0  0  0  0  0  0
Mockito     3  0  0   37  13  0  10  0  0  0  0  0  0  0
Gson        0  0  0    0   0  0   0  0  0  0  0  0  0  0
Antlr       0  0  0    0   0  0   0  0  0  0  0  0  0  0
Jython      0  0  0    2   0  0   0  0  0  0  0  0  0  0
Fop         0  0  0    0   0  0   0  0  0  0  0  0  0  0
ExtendJ     0  0  0    0   0  0   0  0  0  0  0  0  0  0
We could not run any version of javac 10 on the benchmark computer, so no performance
data could be collected for javac 10. Also note that there is no data for compiling Disruptor
with javac 8 or ExtendJ 8, since the project requires javac 9 or ExtendJ 9 to compile.
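The steady-state ("ss") results in Figure 4.2 depend on deciding when the JVM has warmed up. Assuming the procedures of Section 2.5.1 follow the methodology of Georges et al. [28], each steady-state measurement repeats the compilation inside one JVM until the coefficient of variation (CoV) of the last k runs falls below a threshold, and reports the mean of those k runs; start-up figures instead time a run in a freshly started JVM. The sketch below shows only that stopping rule, with an assumed k = 10, threshold 0.02, and a placeholder workload. It is not the benchmark harness used in this thesis.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

public class SteadyState {
    // Coefficient of variation: sample standard deviation divided by the mean.
    static double cov(Deque<Long> xs) {
        double mean = xs.stream().mapToLong(Long::longValue).average().orElse(Double.NaN);
        double sumSq = 0;
        for (long x : xs) {
            sumSq += (x - mean) * (x - mean);
        }
        return Math.sqrt(sumSq / (xs.size() - 1)) / mean;
    }

    // Repeats timedRun in this JVM until the CoV of the last k timings drops
    // below threshold (or maxIterations is reached), then returns their mean.
    static double steadyStateMean(LongSupplier timedRun, int k, double threshold, int maxIterations) {
        Deque<Long> window = new ArrayDeque<>();
        for (int i = 0; i < maxIterations; i++) {
            window.addLast(timedRun.getAsLong());
            if (window.size() > k) {
                window.removeFirst();
            }
            if (window.size() == k && cov(window) < threshold) {
                break;
            }
        }
        return window.stream().mapToLong(Long::longValue).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for one in-process compiler run.
        LongSupplier run = () -> {
            long start = System.nanoTime();
            long acc = 0;
            for (int i = 0; i < 5_000_000; i++) {
                acc += i % 7;
            }
            if (acc == 42) {
                System.out.println(acc); // use the result so the loop is less likely to be removed
            }
            return System.nanoTime() - start;
        };
        System.out.printf("steady-state mean: %.0f ns%n", steadyStateMean(run, 10, 0.02, 200));
    }
}
```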
Figure 4.2: Compilation times for the three largest projects when
using different compilers and compiler versions. The prefix ss stands
for steady-state.
Figure 4.3: Memory in use after compiling the projects, ordered left
to right by increasing SLOC count.
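Figure 4.3 reports memory in use after compilation. The exact procedure is the one given in Section 2.5.1; purely as an assumed illustration, a JVM can report its own heap memory in use through `java.lang.Runtime`, as the currently reserved heap minus its free part, optionally after requesting a garbage collection. The class name and the placeholder allocation are hypothetical.

```java
public class MemoryInUse {
    // Heap memory the JVM currently considers in use, in bytes:
    // the currently reserved heap minus the free part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Placeholder for a compilation: allocate and keep some data reachable.
        int[][] data = new int[1000][1000];
        System.gc(); // only a request; the JVM is free to ignore it
        double mib = usedHeapBytes() / (1024.0 * 1024.0);
        System.out.printf("memory in use: %.1f MiB (%d arrays kept)%n", mib, data.length);
    }
}
```

This counts only the Java heap, not the whole process, so it is one of several reasonable definitions of "memory in use".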
Chapter 5
Discussion
In this thesis, we have identified and implemented most of the Java 9, 10, and 11 features
in ExtendJ and attempted to evaluate the implementation. In doing so, we realized how complex
it is to evaluate a Java compiler on real-world projects, and we explored possible ways to
accomplish this. In this chapter, we first discuss the implementation aspects in Section 5.1.
Then, in Section 5.2, we discuss the difficulties and insights gained from the attempts to
compile real-world projects. In Section 5.3, we discuss the performance of ExtendJ and whether
it affects the usability of the compiler. Finally, we discuss future work on ExtendJ in Section 5.4.
5.1 Implementation
During the implementation, we aimed to evaluate how modularly extensible ExtendJ is. As
described in Section 4.1, most new features could be added modularly, in only 909 lines of
source code divided between the Java 9, 10, and 11 modules. This strongly indicates that the
design of ExtendJ is modular and allows new modules to be added efficiently. The RAGs are
what make it possible to write the extensions this modularly and in so few SLOC: the attributes
make it possible to identify what data is available from a node and to use that data to define
new attributes efficiently. The main feature that could not be added modularly was the extension
of the try-with-resources statement. Java features have generally been implemented at a high
level of abstraction in ExtendJ to allow for extensions, but this was not the case for
try-with-resources. With better foresight, that implementation could have been made extensible
from the beginning.
Another important indication is that two graduate-level developers, with the help of a
supervisor, were able to understand, modify, and extend the compiler within the limited scope
of a master's thesis. We had previously used the JastAdd system, but had never worked with
the ExtendJ compiler before starting the thesis. However, the implementation of the new
features is not fully compliant with the specification and has limitations, such as the lack
of type projections. There are also likely more issues that have not been identified, due to
the lack of real-world projects to test it on. The limitations of the precision evaluation are
clear from Table 4.3, where Java 9-11 features are almost completely absent.
5.3 Performance
The performance of the extended compiler on projects that do not use the new features likely
does not represent its performance on projects that do. However, it is still possible to discuss
the overhead of the added features. None of the added features adds a significant number of
attributes or, by our estimate, any computationally or memory-intensive operations. Since the
added features are not used in most of the projects in the analysis, compilation time and
memory usage should be expected to increase only slightly with support for each new Java
version. This increase would be caused by the new attributes and the additions to existing
attributes.
The ANOVA results from the performance analysis in Section 4.4 show no statistically
significant difference in compilation time when compiling the projects with ExtendJ 8, 9, 10,
or 11. This means that there is no significant compilation-time overhead between the ExtendJ
versions. However, this can only be stated for projects that do not use the features
introduced in Java 9-11.
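As background for these ANOVA results: a one-way ANOVA compares k groups of measurements (for example, one group of compilation times per ExtendJ version) through the statistic F, the between-group mean square divided by the within-group mean square. If F stays below the critical value of the F distribution with (k - 1, N - k) degrees of freedom, where N is the total number of measurements, the group means are not significantly different. The sketch below computes F for made-up data; that the analysis in Section 4.4 used exactly this one-way variant is an assumption.

```java
import java.util.Arrays;

public class OneWayAnova {
    // F statistic of a one-way ANOVA over the given groups of observations.
    static double fStatistic(double[][] groups) {
        int k = groups.length;
        int n = 0;
        double grandSum = 0;
        for (double[] g : groups) {
            n += g.length;
            grandSum += Arrays.stream(g).sum();
        }
        double grandMean = grandSum / n;

        double ssBetween = 0;
        double ssWithin = 0;
        for (double[] g : groups) {
            double mean = Arrays.stream(g).average().orElse(Double.NaN);
            ssBetween += g.length * (mean - grandMean) * (mean - grandMean);
            for (double x : g) {
                ssWithin += (x - mean) * (x - mean);
            }
        }
        double msBetween = ssBetween / (k - 1); // k - 1 degrees of freedom
        double msWithin = ssWithin / (n - k);   // N - k degrees of freedom
        return msBetween / msWithin;
    }

    public static void main(String[] args) {
        // Made-up compile times (s) for three compiler versions, not thesis data.
        double[][] times = {{1, 2, 3}, {2, 3, 4}, {3, 4, 5}};
        System.out.println(fStatistic(times)); // prints 3.0
    }
}
```

A p-value additionally requires the cumulative F distribution, which a statistics library would provide.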
For all projects except ExtendJ, the compilation-time slow-down was below a factor of 3, for
both steady-state and start-up results. For ExtendJ, however, the steady-state slow-down was
a factor of 4.0 when comparing ExtendJ 8 with javac 8, and at most a factor of 3.1 for the
other compiler versions. Since ExtendJ is the largest project, this higher slow-down factor is
concerning for the performance on very large projects. However, ExtendJ is an outlier, and
further testing on projects with SLOC counts at or above that of ExtendJ is necessary to speak
more conclusively on the matter. A slow-down factor of around 3 should not limit the usability
of ExtendJ on modern computers, but would make it less viable for performance-critical
applications.
There are two main points of discussion concerning the memory consumption evaluation. The
first is that the ANOVA results presented in Table 4.4 show no statistically significant
difference between the ExtendJ versions when compiling any project. This means that the added
features likely cause no significant increase in memory usage. The second is that the largest
difference in memory usage between javac and ExtendJ was a factor of 5.56, between javac 9
and ExtendJ 10, when compiling Gson. A memory increase by a factor of almost 6 is concerning
and could limit the viability of ExtendJ for some use cases. When ExtendJ is used as a normal
Java compiler, this should not be a limiting factor, since even the large projects compiled in
this thesis have memory usage in a reasonable range. However, when ExtendJ is used in a tool
such as IntraJ, where compilation runs in the background to analyze the code continuously,
the memory difference between javac and ExtendJ could be more concerning.
Another notable observation in Figure 4.3 is the gap in memory usage between javac 8 and
javac 9. We speculate that this is due to several memory optimizations made between the
releases of Java 8 and Java 9. The performance difference between javac 8 and 9 can also be
seen in Figures 4.1 and 4.2, where the start-up compilation time of javac 8 appears shorter
than that of the later javac versions.
Chapter 6
Conclusions
In this chapter, we answer the research questions based on the discussion in Chapter 5. The
research questions are restated to aid readability.
RQ1
The main feature of ExtendJ is its extendibility and modularity. To evaluate this, we have
extended ExtendJ to support the majority of Java 9, 10, and 11 features. The new features
could mostly be implemented modularly, with one notable exception, try-with-resources, caused
by the restrictive way that feature was originally implemented.
In addition to being modular, the extensions were concise, requiring only 909 lines of code in
total. Considering this, and the fact that ExtendJ could be extended with multiple features to
support a large part of Java 9-11 within the scope of a master's thesis, we conclude that
ExtendJ is extensible to a high degree.
Since the extensions could be made modularly and ExtendJ proved extensible to a high degree,
we conclude that ExtendJ is modularly extensible to a high degree.
RQ2
We have measured the performance of the extended ExtendJ compiler to see whether it performs
reasonably well compared to the reference Java implementation. Regarding performance, we can
only draw conclusions about the overhead of the added features, since our attempts to find
and compile projects using the features were mostly unsuccessful. For compilation time, we
conclude that ExtendJ is generally within a factor of 3 of javac, for both steady-state and
start-up performance, but this factor might be larger for larger projects and for projects
using the Java 9-11 features.
We have also measured memory usage and conclude that ExtendJ uses up to 6 times as much
memory as the reference implementation. The higher memory usage and compilation time are
significant, but should not be a limiting factor on modern systems, as even large projects can
be compiled. They may, however, reduce ExtendJ's usefulness as a static analysis tool, although
no definitive statements can be made about this.
References
[1] Torbjörn Ekman and Görel Hedin. The JastAdd Extensible Java Compiler. In OOPSLA
2007, Montreal, Canada. ACM SIGPLAN Notices, 42:1–17, 2007.
[2] Torbjörn Ekman and Görel Hedin. The JastAdd system: modular extensible compiler
construction. Science of Computer Programming, 69(1–3):14–26, 2007.
[3] Idriss Riouak, Christoph Reichenbach, Görel Hedin, and Niklas Fors. A Precise Framework
for Source-Level Control-Flow Analysis. In 2021 IEEE 21st International Working Conference
on Source Code Analysis and Manipulation (SCAM), pages 1–11, 2021.
[4] Torbjörn Ekman and Görel Hedin. Pluggable checking and inferencing of nonnull types
for Java. Journal of Object Technology, 6(9):455–475, 2007.
[5] Idriss Riouak, Görel Hedin, Christoph Reichenbach, and Niklas Fors. JFeature: Know
Your Corpus. In 2022 IEEE 22nd International Working Conference on Source Code Analysis
and Manipulation (SCAM), pages 236–241, 2022.
[6] Jesper Öqvist. Implementation of Java 7 features in an extensible compiler. LU-CS-EX:
2012:13. Department of Computer Science, Faculty of Engineering, LTH, Lund University, 2012.
[7] Erik Hogeman. Extending JastAddJ to Java 8. LU-CS-EX: 2014:14. Department of Computer
Science, Faculty of Engineering, LTH, Lund University, 2014.
[8] JFlex. https://jflex.de/ [Online; accessed 19 April 2023].
[9] Beaver. https://beaver.sourceforge.net/ [Online; accessed 19 April 2023].
[10] Jesper Öqvist and Görel Hedin. Extending the JastAdd Extensible Java Compiler to Java
7. In Proceedings of the 2013 International Conference on Principles and Practices of Programming
on the Java Platform: Virtual Machines, Languages, and Tools, PPPJ '13, pages 147–152,
New York, NY, USA, 2013. Association for Computing Machinery.
[11] Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. The Java® Virtual Machine
Specification Java SE 9 Edition, 2017. https://docs.oracle.com/javase/specs/jvms/se9/jvms9.pdf
[Online; accessed 2 March 2023].
[12] Gregor Kiczales, Erik Hilsdale, Jim Hugunin, Mik Kersten, Jeffrey Palm, and William G
Griswold. An overview of AspectJ. In ECOOP 2001: Object-Oriented Programming, 15th
European Conference, Budapest, Hungary, June 18–22, 2001, Proceedings, pages 327–354.
Springer, 2001.
[14] Didier Parigot, Gilles Roussel, Etienne Duris, and Martin Jourdan. Attribute grammars:
a declarative functional language. PhD thesis, INRIA, 1995.
[15] Görel Hedin. An introductory tutorial on JastAdd attribute grammars. Generative and
Transformational Techniques in Software Engineering III: International Summer School, GTTSE
2009, Braga, Portugal, July 6-11, 2009. Revised Papers, pages 166–200, 2011.
[16] Görel Hedin. Reference attributed grammars. Informatica, 24(3):301–317, 2000.
[17] Anthony M Sloane, Lennart CL Kats, and Eelco Visser. A pure embedding of attribute
grammars. Science of Computer Programming, 78(10):1752–1769, 2013.
[18] Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: An extensible
attribute grammar system. Science of Computer Programming, 75(1):39–54, 2010. Special
Issue on ETAPS 2006 and 2007 Workshops on Language Descriptions, Tools, and Applications
(LDTA '06 and '07).
[19] Harald H Vogt, S Doaitse Swierstra, and Matthijs F Kuiper. Higher order attribute
grammars. ACM SIGPLAN Notices, 24(7):131–145, 1989.
[20] Eva Magnusson and Görel Hedin. Circular reference attributed grammars: their evaluation
and applications. Science of Computer Programming, 68(1):21–37, 2007.
[24] James Gosling, Bill Joy, Guy Steele, Gilad Bracha, Alex Buckley, and Daniel Smith. The
Java® Language Specification Java SE 9 Edition, 2017.
https://docs.oracle.com/javase/specs/jls/se9/jls9.pdf [Online; accessed 2 March 2023].
[25] Brian Goetz. JEP 286: Local-Variable Type Inference, 2016. https://openjdk.org/jeps/286
[Online; accessed 16 February 2023].
[26] James Gosling, Bill Joy, Guy Steele, Gilad Bracha, Alex Buckley, and Daniel Smith. The
Java® Language Specification Java SE 10 Edition, 2018.
https://docs.oracle.com/javase/specs/jls/se10/jls10.pdf [Online; accessed 2 March 2023].
[27] Brian Goetz. JEP 323: Local-Variable Syntax for Lambda Parameters, 2017.
https://openjdk.org/jeps/323 [Online; accessed 16 February 2023].
[28] Andy Georges, Dries Buytaert, and Lieven Eeckhout. Statistically rigorous Java performance
evaluation. In Proceedings of the Conference on Object-Oriented Programming Systems,
Languages, and Applications, OOPSLA, pages 57–76, 2007.
DEPARTMENT OF COMPUTER SCIENCE | LUNDS TEKNISKA HÖGSKOLA | PRESENTED 2023-06-01
A compiler is a program that translates code that is easy to write and understand into code
that is efficient to run. ExtendJ is a Java compiler specifically designed to be easy to extend,
and this thesis tests this by extending it and evaluating its performance.
Compilers are important in software development. They allow programmers to write readable
code that is then compiled into a format that a processor can run, and run quickly. Java is a
modern programming language, and Java compilers such as javac are complex and difficult to
modify.
To explore new ways of developing compilers, the ExtendJ Java compiler was created, written
in an extensible way. ExtendJ currently supports Java 4–8, and in this work our goal was to
extend the compiler to Java 9–11, in order to evaluate whether it is extensible and how the
compiler's performance is affected by extensions. Several research projects use ExtendJ
today, and extending the compiler would make them more relevant.
We extended ExtendJ and analyzed how correctly it works and how its performance compares
with javac. Since compilers are complex programs, they must be tested on existing projects.
If ExtendJ can compile projects that use the changes in Java 9–11, it is largely correct. Its
performance must also be measured on existing projects in order to draw relevant conclusions.
A large part of the work went into finding and compiling projects that use the additions in
Java 9–11. No such projects could be found, so the additions had to be tested on Java 4–8
projects. Further work on this is needed to analyze ExtendJ properly.
We drew three conclusions from the study. The first was that ExtendJ is extensible to a high
degree. The second was that the compilation time and memory usage of the extended compiler
are unchanged for Java 4–8 projects. Figure 1 shows the compilation time for three of the
projects we could compile. Finally, we concluded that the compilation time is at most three
times longer, and the memory usage six times higher, than for javac.
Figure 1: How the compilation time for three projects depends on which compiler and version
is used.