Intermediate Code Generation

Download as pdf or txt
Download as pdf or txt
You are on page 1of 11

Three-Address Code

In three-address code, there is at most one operator on the right side of an


instruction; that is, no built-up arithmetic expressions are permitted. Thus a
source-language expression like x+y*z might be translated into the sequence of
three-address instructions
t1 = y * z
t2 = x + t1
where t1 and t2 are compiler-generated temporary names. This unraveling of
multi-operator arithmetic expressions and of nested flow-of-control statements
makes three-address code desirable for target-code generation and optimization.
The use of names for the intermediate values computed by a program allows
three-address code to be rearranged easily.

Addresses and Instructions


Three-address code is built from two concepts: addresses and instructions.
An address can be one of the following:
A name. For convenience, we allow source-program names to appear as
addresses in three-address code. In an implementation, a source name is replaced
by a pointer to its symbol-table entry, where all information about the name is
kept.
Aconstant.In practice, a compiler must deal with many different types of
constants and variables.
A compiler-generated temporary. It is useful, especially in optimizing
compilers, to create a distinct name each time a temporary is needed. These
temporaries can be combined, if possible, when registers are allocated to
variables.
The common three-address instruction forms:
1. Assignment instructions of the form x = y op z, where op is a binary arithmetic
or logical operation, and x, y, and z are addresses.
2. Assignments of the form x = op y, where op is a unary operation. Essential
unary operations include unary minus, logical negation, and conversion operators
that, for example, convert an integer to a floating-point number.
3. Copy instructions of the form x = y, where x is assigned the value of y.
4. An unconditional jump goto L. The three-address instruction with label L is the
next to be executed.
5. Conditional jumps of the form if x goto L and if False x goto L. These
instructions execute the instruction with label L next if x is true and false,
respectively. Otherwise, the following three-address instruction in sequence is
executed next, as usual.
6. Conditional jumps such as if x relop y goto L, which apply a relational operator
(<, ==, >=, etc.) to x and y, and execute the instruction with label L next if x stands
in relation relop to y. If not, the three-address instruction following if x relop y
goto L is executed next, in sequence.
7. Procedure calls and returns are implemented using the following instructions:
param x for parameters; call p, n and y = call p, n for procedure and function calls,
respectively; and return y, where y, representing a returned value, is optional.
Their typical use is as the sequence of three address instructions
param x1
param x2
………
param xn
call p, n
generated as part of a call of the procedure p(x1; x2; : : : ; xn). The integer n,
indicating the number of actual parameters in “call p, n,".
8. Indexed copy instructions of the form x = y[i] and x[i]=y. The instruction x =
y[i] sets x to the value in the location i memory units beyond location y. The
instruction x[i]=y sets the contents of the location I units beyond x to the value of
y.
9. Address and pointer assignments of the form x = &y, x = *y, and * x = y. The
instruction x = &y sets the r-value of x to be the location (l-value) of y.
Presumably y is a name, perhaps a temporary, that denotes an expression with an
l-value such as A[i][j], and x is a pointer name or temporary. In the instruction x
= * y, presumably y is a pointer or a temporary whose r-value is a location. The
r-value of x is made equal to the contents of that location. Finally, * x = y sets the
r-value of the object pointed to by x to the r-value of y.
do i = i+1; while (a[i] < v);
Two possible translations of this statement are shown below 1 st uses a symbolic
label L, 2nd shows the position numbers for the instructions.
Quadruples
The description of three-address instructions specifies the components of each
type of instruction, but it does not specify the representation of these instructions
in a data structure. In a compiler, these instructions can be implemented as objects
or as records with fields for the operator and the operands. Three such
representations are called “quadruples," triples," and “indirect triples."
A quadruple (or just “quad") has four fields, which we call op, arg1, arg2, and
result. The op field contains an internal code for the operator. For instance, the
three-address instruction x = y + z is represented by placing + in op, y in arg1, z
in arg2, and x in result.
The following are some exceptions to this rule:
1. Instructions with unary operators like x = minus y or x = y do not use arg2.
Note that for a copy statement like x = y, op is =, while for most other operations,
the assignment operator is implied.
2. Operators like param use neither arg2 nor result.
3. Conditional and unconditional jumps put the target label in result.
Example : Three-address code for the assignment a = b* -c+b*-c;
The quadruples in Figure (b) implement the three-address code in (a).

Triples
A triple has only three fields, which we call op, arg1, and arg2. Note that the
result field in above fig.(b) is used primarily for temporary names. Using triples,
we refer to the result of an operation x op y by its position, rather than by an
explicit temporary name. Thus, instead of the temporary t1 in Fig.(b), a triple
representation would refer to position (0). Parenthesized numbers represent
pointers into the triple structure itself. Representations of a = b*-c +b*-c is shown
below.

A benefit of quadruples over triples can be seen in an optimizing compiler, where


instructions are often moved around. With quadruples, if we move an instruction
that computes a temporary t, then the instructions that use t require no change.
With triples, the result of an operation is referred to by its position, so moving an
instruction may require us to change all references to that result. This problem
does not occur with indirect triples.
Indirect triples consist of a listing of pointers to triples, rather than a listing of
triples themselves. For example, let us use an array instruction to list pointers to
triples in the desired order. The following figure shows the indirect triples

Static single-assignment form (SSA) is an intermediate representation that


facilitates certain code optimizations. Two distinctive aspects distinguish SSA
from three-address code. The first is that all assignments in SSA are to variables
with distinct names; hence the term static single-assigment.
For example, the source program
if ( flag ) x = -1; else x = 1;
y = x * a;
SSA uses a notational convention called the Փfunction to combine the two
definitions of x:
if ( flag ) x1 = -1; else x2 = 1;
x3 = Փ (x1; x2);
Here, -(x1; x2) has the value x1 if the control flow passes through the true part of
the conditional and the value x2 if the control flow passes through the false part.

Types and Declarations


The applications of types can be grouped under checking and translation:
Type checking uses logical rules to reason about the behavior of a program at
run time. Specifically, it ensures that the types of the operands match the type
expected by an operator. For example, the && operator in Java expects its two
operands to be booleans; the result is also of type boolean.
Translation Applications. From the type of a name, a compiler can determine
the storage that will be needed for that name at run time. Type information is also
needed to calculate the address denoted by an array reference, to insert explicit
type conversions, and to choose the right version of an arithmetic operator, among
other things.
Type Expressions
Types have structure, which we shall represent using type expressions: a type
expression is either a basic type or is formed by applying an operator called a type
constructor to a type expression. The sets of basic types and constructors depend
on the language to be checked.
We shall use the following definition of type expressions:
• A basic type is a type expression. Typical basic types for a language include
boolean, char, integer,
• float, and void;
• A type name is a type expression.
• A type expression can be formed by applying the array type constructor to
a number and a type expression.
• A record is a data structure with named fields. A type expression can be
formed by applying the record type constructor to the field names and their
types. Record types will be implemented in by applying the constructor
record to a symbol table containing entries for the fields.
• A type expression can be formed by using the type constructor ->for
function types. We write s->t for “function from type s to type t." Function
types will be useful when type checking.
• If s and t are type expressions, then their Cartesian product s x t is a type
expression.
• Type expressions may contain variables whose values are type
expressions.(compiler generated variables).
A convenient way to represent a type expression is to use a graph.
Type Equivalence
When are two type expressions equivalent? Many type-checking rules have
the form, “if two type expressions are equal then return a certain type else
error." Potential ambiguities arise when names are given to type expressions
and the names are then used in subsequent type expressions. The key issue is
whether a name in a type expression stands for itself or whether it is an
abbreviation for another type expression.
When type expressions are represented by graphs, two types are structurally
equivalent if and only if one of the following conditions is true:
They are the same basic type.
They are formed by applying the same constructor to structurally equivalent
types.
One is a type name that denotes the other.
If type names are treated as standing for themselves, then the first two
conditions in the above definition lead to name equivalence of type
expressions.
Declarations
By using a simplified grammar that declares just one name at a time types and
declarations can be studied; declarations with lists of names can be handled
as. The grammar is
Nonterminal D generates a sequence of declarations. Nonterminal T generates
basic, array, or record types. Nonterminal B generates one of the basic types
int and float. Nonterminal C, for “component," generates strings of zero or
more integers, each integer surrounded by brackets. An array type consists of
a basic type specified by B, followed by array components specified by
nonterminal C. A record type (the second production for T) is a sequence of
declarations for the fields of the record, all surrounded by curly braces.
Storage Layout for Local Names
From the type of a name, we can determine the amount of storage that will be
needed for the name at run time. At compile time, we can use these amounts
to assign each name a relative address. The type and relative address are saved
in the symbol-table entry for the name. Data of varying length, such as strings,
or data whose size cannot be determined until run time, such as dynamic
arrays, is handled by reserving a known fixed amount of storage for a pointer
to the data.
Suppose that storage comes in blocks of contiguous bytes, objects are stored
in consecutive bytes and given the address of the first byte.
The width of a type is the number of storage units needed for objects of that
type. The translation scheme (SDT) in Figure below computes types and their
widths for basic and array types;

Sequences of Declarations
Languages such as C and Java allow all the declarations in a single procedure
to be processed as a group. The declarations may be distributed within a Java
procedure, but they can still be processed when the procedure is analyzed.
Therefore, we can use a variable, say off
set, to keep track of the next available relative address.
The translation scheme of below figure deals with a sequence of declarations
of the form T id, where T generates a type. Before the first declaration is
considered, offset is set to 0. As each new name x is seen, x is entered into the
symbol table with its relative address set to the current value of off set, which
is then incremented by the width of the type of x.

Fields in Records and Classes


The translation of declarations in above figure carries over to fields in records
and classes. Record types can be added to the grammar by adding the
following production
T ->record ‘{‘ D ‘}’
The fields in this record type are specified by the sequence of declarations
generated by D. The approach of above figure can be used to determine the
types and relative addresses of fields, provided we are careful about two
things:
The field names within a record must be distinct; that is, a name may appear
at most once in the declarations generated by D.
The offset or relative address for a field name is relative to the data area for
that record.
Example : The use of a name x for a field within a record does not conflict
with other uses of the name outside the record. Thus, the three uses of x in the
following declarations are distinct and do not conflict with each other:
float x;
record { float x; float y; } p;
record { int tag; float x; float y; } q;
A subsequent assignment x = p.x + q.x; sets variable x to the sum of the fields
named x in the records p and q. Note that the relative address of x in p differs
from the relative address of x in q.
For convenience, record types will encode both the types and relative
addresses of their elds, using a symbol table for the record type. A record type
has the form record(t), where record is the type constructor, and t is a symbol
table object that holds information about the fields of this record type.

Translation of Expressions
The syntax-directed definition in below figure builds up the three-address
code for an assignment statement S using attribute code for S and attributes
addr and code for an expression E. Attributes S:code and E:code denote the
three-address code for S and E, respectively. Attribute E:addr denotes the
address that will hold the value of E. (recall an adress can be name, constant
or compiler generate temporary variable)

Figure. Three address code generation for expressions

Incremental Translation
Code attributes can be long strings, so they are usually generated
incrementally. Thus, instead of building up E:code as in above figure, it can
be arrange to generate only the new three-address instructions, as in the
translation scheme of shown in below figure. In the incremental approach,
gen not only constructs a three-address instruction, it appends the instruction
to the sequence of instructions generated so far. The sequence may either be
retained in memory for further processing, or it may be output incrementally.
The translation scheme in below figure generates the same code as the
syntax directed definition in above figure. With the incremental approach, the
code attribute is not used, since there is a single sequence of instructions that
is created by successive calls to gen.
For example, the semantic rule for E ->E1 +E2 in below figure simply calls
gen to generate an add instruction; the instructions to compute E1 into E1:addr
and E2 into E2:addr have already been generated.
The new semantic action for E ->E1 +E2creates a node by using a constructor,
as in E ->E1 +E2 { E:addr = new Node(0+0; E1:addr; E2:addr); }
Addressing Array Elements
Array elements can be accessed quickly if they are stored in a block of consecu
tive locations. In C and Java, array elements are numbered 0; 1,… n -1, for an
array with n elements. If the width(w) of each array element is w, then the ith
element of array A begins in location
base + i x w
where base is the relative address of the storage allocated for the array. That
is, base is the relative address of A[0].
The relative address of A[i1][i2] can then be calculated by the formula
Translation of Array References

Nonterminal L has three synthesized attributes:


L:addr denotes a temporary that is used while computing the o
set for the array reference by summing the terms ij x wj.
L:array is a pointer to the symbol-table entry for the array name.
L:type is the type of the subarray generated by L. For any type t,assume that
its width is given by t:width.

You might also like