Poc Unit 3
Poc Unit 3
11
5.2 INTERMEDIATE CODE GENERATION (ICG)
In compiler, the front-end translates a source program into an intermediate representa on from which
the back end generates target code.
2. IC eliminates the need of a new full compiler for every machine by keeping the analysis por on
for all the compilers.
2 important things:
It shouldn’t be difficult to produce the target program from the intermediate code.
A source program can be translated directly into the target language, but some benefits of using
intermediate form are:
Retarge ng is facilitated: a compiler for a different machine can be created by a aching a Back-
end (which generate Target Code) for the new machine to an exis ng Front-end (which
generate Intermediate Code).
A machine Independent Code-Op mizer can be applied to the Intermediate Representa on.
12
Logical Structure of a Compiler Front End
Syntax Tree
DAG (Direct Acyclic Graph)
Pos ix Nota on
3 Address Code
Syntax Tree
DAG (Direct Acyclic Graph)
Syntax tree (AST) is a condensed form of parse tree useful for represen ng language
constructs.
EXAMPLE
13
A parse tree is a graphical representa on of a A syntax tree (AST) is a condensed form of parse
replacement process in a deriva on tree
Each interior node represents a grammar rule Each interior node represents an operator
Each leaf node represents a terminal Each leaf node represents an operand
Parse tree represent every detail from the real Syntax tree does not represent every detail
syntax from the real syntax Eg : No parenthesis
14
Construc ng Syntax Tree For Expression
Each node in a syntax tree can be implemented in arecord with several fields.
In the node of an operator, one field contains operator and remaining field contains pointer to
the nodes for the operands.
When used for transla on, the nodes in a syntax tree may contain addi on of fields to hold
the values of a ributes a ached to the node.
1. mknode(op,le ,right): creates an operator node with label op and two fields
containing pointers to le and right.
2. mkleaf(id,entry): creates an iden fier node with label id and a field containing
entry, a pointer to the symbol table entry for iden fier
3. mkleaf(num,val): creates a number node with label num and a field containing
val, the value of the number.
EXAMPLE
a–4+c
The tree is constructed
bo om up
P1 = mkleaf(id,entry a)
P2 = mkleaf(num, 4)
P3 = mknode(-, P1, P2)
P4 = mkleaf(id,entry c)
P5 = mknode(+, P3, P4)
Syntax Tree
15
Syntax directed defini on
Syntax trees for assignment statements are produced by the syntax-directed defini on.
The two binary operators + and * are examples of the full operator set in a typical language.
Operator associates and precedences are the usual ones, even though they have not been put
into the grammar. This defini on constructs the tree from the input a:=b* -c + b* -c
The token id has an a ribute place that points to the symbol-table entry for the iden fier.
A symbol-table entry can be found from an a ribute id.name, represen ng the lexeme
associated with that occurrence of id.
If the lexical analyser holds all lexemes in a single array of characters, then a ribute name
might be the index of the first character of the lexeme.
16
In (a), each node is represented as a record with a field for its operator and addi onal fields
for pointers to its children.
In Fig (b), nodes are allocated from an array of records and the index or posi on of the node
serves as the pointer to the node.
All the nodes in the syntax tree can be visited by following winters, star ng from the root at
posi on 10.
Dag also gives the hierarchical structure of source program but in a more compact way
because common sub expressions are iden fied.
EXAMPLE a=b*-c +
b*-c
17
Pos ix Nota on
Linearized representa on of syntax tree
In pos ix nota on, each operator appears immediately a er its last operand.
Operators can be evaluated in the order in which they appear in the string
EXAMPLE
Source String : a := b * -c + b * -c
Pos ix Rules
2. If E is an expression of the form E1 op E2 then pos ix nota on for E is E1’ E2’ op, here E1’ and
E2’ are the pos ix nota ons for E1and E2, respec vely
3. If E is an expression of the form (E), then the pos ix nota on for E is the same as the pos ix
nota on for E.
4. For unary opera on –E the pos ix is E- Ex: pos ix nota on for 9- (5+2) is 952+-
The reason for the term “three address code” is that each statement contains 3 addresses at
most. Two for the operands and one for the result.
18
a = b op c
where,
Three-address code is a linearized representa on of a syntax tree or a DAG in which explicit names
correspond to the interior nodes of the graph.
Three Address Code corresponding to the syntax tree and DAG given above (page no: )
1. Assignment statements
x : = op y, where op is a unary opera on . Essen al unary opera ons include unary minus,
logical nega on, shi operators, and conversion operators that for example, convert a fixed-
point number to a floa ng-point number.
19
3. Copy statements
if x relop y goto L This instruc on applies a rela onal operator ( <, =, =, etc,) to x and y, and
executes the statement with label L next if x stands in rela on relop to y. If not, the three-
address statement following if x relop y goto L is executed next, as in the usual sequence.
param x and call p, n for procedure calls and return y, where y represen ng a returned
value is op onal. Their typical use is as the sequence of three-address statements
param x1 param
x2
……….
param xn call
p,n
generated as part of the call procedure p( xl , x2, . . . , xn ) . The integer n indica ng the number
of actual-parameters in ''call p , n" is not redundant because calls can be nested.
7. Indexed Assignments
20
Seman c rule genera ng code for a while statement
The func on newtemp returns a sequence of dis nct names t1, t2,……… in respose of
successive calls. Nota on gen(x ‘:= ‘y ‘+’ z is used to represent the three address statement x
:= y + z.
Expressions appearing instead of variables like x, y and z are evaluated when passed to gen,
and quoted operators or operand, like ‘+’ are taken literally.
Flow of control statements can be added to the language of assignments. The code for S
while E do S1 is generated using new a ributes S.begin and S.a er to mark the first
statement in the code for E and the statement following the code for S, respec vely.
The func on newlabel returns a new label every me is called. We assume that a nonzero
expression represents true; that is when the value of E becomes zero, control laves the while
statement
21
Implementa on Of Three-Address Statements
A three address statement is an abstract form of intermediate code. In a compiler, these statements
can be implemented as records with fields for the operator and the operands. Three such,
representa ons are
Quadruples
Triples
Indirect triples
5.2.1.3 QUADRUPLES
A quadruple is a record structure with four fields, which are op, ag1, arg2 and result
The op field contains an internal code for the operator. The three address statement x:= y op z
is represented by placing y in arg1, z in arg2 and x in result.
The contents of arg1, arg2, and result are normally pointers to the symbol table entries for
the names represented by these fields. If so temporary names must be entered into the
symbol table as they are created.
EXAMPLE 1
|e^f+b*a
For the first construct the three address code for the expression
t1 = e ^ f t2
= b * c t3 =
t2 / t1 t4 =
b * a t5 = a
+ t3 t6 = t5
+ t4
22
A statement like param t1 is represented by placing param in the operator field and t1 in the
arg1 field. Neither arg2 not result field is used
Uncondi onal & Condi onal jump statements are represented by placing the target in the
result field.
5.2.1.4 TRIPLES
In triples representa on, the use of temporary variables is avoided & instead reference to
instruc ons are made
So three address statements can be represented by records with only there fields OP, arg1 &
arg2.
Since, there fields are used this intermediated code formal is known as triples
Advantages
No need to use temporary variable which saves memory as well as me
Disadvantages
Triple representa on is difficult to use for op mizing compilers Because for
op miza on statements need to be suffled.
for e.g. statement 1 can be come down or statement 2 can go up ect.
So the reference we used in their representa on will change.
EXAMPLE 1
a+b*c|e^f+b*a
t1 = e ^ f t2
= b * c t3 =
t2 / t1 t4 =
b * a t5 = a
+ t3 t6 = t5
+ t4
23
(5) + (4) (3)
EXAMPLE 2
A ternary opera on like x[i] : = y requires two entries in the triple structure while x : = y[i] is naturally
represented as two opera ons.
x[i] := y x := y[i]
INDIRECT TRIPLES
This representa on is an enhancement over triple representa on.
It uses an addi onal instruc on array to led the pointer to the triples in the desired order.
Since, it uses pointers instead of posi on to stage reposi on the expression to produce an
op mized code.
EXAMPLE 1
Comparison
When we ul mately produce the target code each temporary and programmer defined name
will assign run me memory loca on
This loca on will be entered into symbol table entry of that data.
24
Using the quadruple nota on, a three address statement containing a temporary can
immediately access the loca on for that temporary via symbol table.
With quadruple nota on, statements can o en move around which makes op miza on
easier.
This is achieved because using quadruple nota on the symbol table interposes high degree of
indirec on between computa on of a value and its use.
With quadruple nota on, if we move a statement compu ng x, the statement using x requires
no change.
But with triples, moving a statement that defines a temporary value requires us to change all
references to that statement in arg1 and arg2 arrays. This makes triples difficult to use in
op mizing compiler
Space U liza on
Quadruples and indirect triples requires same amount of space for storage (normal case).
But if same temporary value is used more than once indirect triples can save some space. This
is bcz, 2 or more entries in statement array can point to the same line of op-arg1-arg2
structure.
Quadruples
direct access of the loca on for temporaries
easier for op miza on Triples
space efficiency
Indirect Triples
easier for op miza on
space efficiency
PROBLEM 1
a=b*-c+b*-c
Sol : - Three address code for given expression is
TAC t1 =
uniminus c t2 =
25
b* t1 t3 =
uniminus c t4 =
b* t3 t5 = t2 + t4
Q = t5
QUADRUPLES
TRIPLES
INDIRECT TRIPLES
26
Produc on Seman c ac on
else error }
27
Syntax-directed defini on to produce three-address code for assignments.
28
They are used to compute logical values.
But more o en they are used as condi onal expressions in statements that alter the
flow of control, such as if-then-else, or while-do statements.
Boolean expressions are composed of the Boolean operators (and, or, and not) applied to
elements that are Boolean variables or rela onal expressions.
Rela onal expressions are of the form E1 relop E2, where E1 and E2 are arithme c expressions
and relop can be <, <=, =!, =, > or >=
Numerical Representa on - To encode true and false numerically and to evaluate a Boolean
expression analogously to an arithme c expression. O en, 1 is used to denote true and 0 to
denote false.
EXAMPLE
The transla on for a or b and not c will result following three-address sequence
t1 : = not c t2 :
= b and t1 t3 :
= a or t2
29
where the func on emit( ) output the three address statement into the output file and nextstat(
) gives the index of the next three address statement in the output sequence and emit
increments nextstat a er producing each three address statement.
A rela onal expression such as a < b is equivalent to the condi onal statement if a < b then 1
else 0 which can be translated into the three-address code sequence (let statement numbers
start at 100)
101 t : = 0
103 t : = 1
104
This is normally used for flow-of-control statements, such as the if-then, if-then-else and while-
do statements those generated by the following grammar:
30
S → if E then S1
| if E then S1 else S2
| while E do S1
S → if E then S1
| if E then S1 else S2
| while E do S1
In each of these produc ons, E is the Boolean expression to be translated. In the transla on,
we assume that a three-address statement can be symbolically labeled, and that the func on
newlabel returns a new symbolic label each me it is called.
With each E we associate two labels E.true and E.false. E.true is the label to which control flows
if E is true, and E.false is the label to which control flows if E is false.
The inherited a ribute S.next is a label that is a ached to the first three-address instruc on to
be executed a er the code for S and another inherited a ribute S.begin is the first instruc on
of S
31
Syntax Directed Defini on for flow –of –control statements
E.false := S.next;
S1.next := S.next;
E.false := newlabel;
:= S.next;
S.next;
S1.next := S.begin;
**********
32