0% found this document useful (0 votes)
35 views22 pages

Poc Unit 3

The document discusses intermediate code generation in compilers. It describes different intermediate representations like graphical representations using syntax trees and DAGs, postfix notation, and three-address code. It provides examples and advantages of each approach.

Uploaded by

Coding Point
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
35 views22 pages

Poc Unit 3

The document discusses intermediate code generation in compilers. It describes different intermediate representations like graphical representations using syntax trees and DAGs, postfix notation, and three-address code. It provides examples and advantages of each approach.

Uploaded by

Coding Point
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

Intermediate Code Genera on (ICG): Intermediate languages

– Graphical representa ons, Three-Address code,


Quadruples, Triples. Assignment statements, Boolean
expressions.

11
5.2 INTERMEDIATE CODE GENERATION (ICG)
In compiler, the front-end translates a source program into an intermediate representa on from which
the back end generates target code.

Need For ICG


1. If a compiler translates the source language to its target machine language without genera ng
IC, then for each new machine, a full na ve compiler is required.

2. IC eliminates the need of a new full compiler for every machine by keeping the analysis por on
for all the compilers.

3. Synthesis part of back end depends on the target machine.

2 important things:

 IC Genera on process should not be very complex

 It shouldn’t be difficult to produce the target program from the intermediate code.

A source program can be translated directly into the target language, but some benefits of using
intermediate form are:

 Retarge ng is facilitated: a compiler for a different machine can be created by a aching a Back-
end (which generate Target Code) for the new machine to an exis ng Front-end (which
generate Intermediate Code).

 A machine Independent Code-Op mizer can be applied to the Intermediate Representa on.

12
Logical Structure of a Compiler Front End

5.2.1 INTERMEDIATE LANGUAGES


The most commonly used intermediate representa ons were:-

 Syntax Tree
 DAG (Direct Acyclic Graph)
 Pos ix Nota on
 3 Address Code

5.2.1.1 GRAPHICAL REPRESENTATION


Includes both

 Syntax Tree
 DAG (Direct Acyclic Graph)

Syntax Tree Or Abstract Syntax Tree (AST)


Graphical Intermediate Representa on

Syntax Tree depicts the hierarchical structure of a source program.

Syntax tree (AST) is a condensed form of parse tree useful for represen ng language
constructs.

EXAMPLE

Parse tree and syntax tree for 3 * 5 + 4 as follows.

Grammar Parse Tree Syntax Tree

13

Parse Tree VS Syntax Tree

Parse Tree Syntax Tree

A parse tree is a graphical representa on of a A syntax tree (AST) is a condensed form of parse
replacement process in a deriva on tree

Each interior node represents a grammar rule Each interior node represents an operator

Each leaf node represents a terminal Each leaf node represents an operand

Parse tree represent every detail from the real Syntax tree does not represent every detail
syntax from the real syntax Eg : No parenthesis

Syntax tree for a * (b + c) /d

14
Construc ng Syntax Tree For Expression
Each node in a syntax tree can be implemented in arecord with several fields.

In the node of an operator, one field contains operator and remaining field contains pointer to
the nodes for the operands.

When used for transla on, the nodes in a syntax tree may contain addi on of fields to hold
the values of a ributes a ached to the node.

Following func ons are used to create syntax tree

1. mknode(op,le ,right): creates an operator node with label op and two fields
containing pointers to le and right.
2. mkleaf(id,entry): creates an iden fier node with label id and a field containing
entry, a pointer to the symbol table entry for iden fier
3. mkleaf(num,val): creates a number node with label num and a field containing
val, the value of the number.

Such func ons return a pointer to a newly created node.

EXAMPLE

a–4+c
The tree is constructed
bo om up

P1 = mkleaf(id,entry a)
P2 = mkleaf(num, 4)
P3 = mknode(-, P1, P2)
P4 = mkleaf(id,entry c)
P5 = mknode(+, P3, P4)
Syntax Tree

15
Syntax directed defini on
Syntax trees for assignment statements are produced by the syntax-directed defini on.

Non terminal S generates an assignment statement.

The two binary operators + and * are examples of the full operator set in a typical language.
Operator associates and precedences are the usual ones, even though they have not been put
into the grammar. This defini on constructs the tree from the input a:=b* -c + b* -c

The token id has an a ribute place that points to the symbol-table entry for the iden fier.

A symbol-table entry can be found from an a ribute id.name, represen ng the lexeme
associated with that occurrence of id.

If the lexical analyser holds all lexemes in a single array of characters, then a ribute name
might be the index of the first character of the lexeme.

Two representa ons of the syntax tree are as follows.

16
In (a), each node is represented as a record with a field for its operator and addi onal fields
for pointers to its children.

In Fig (b), nodes are allocated from an array of records and the index or posi on of the node
serves as the pointer to the node.

All the nodes in the syntax tree can be visited by following winters, star ng from the root at
posi on 10.

Direct Acyclic Graph (DAG)


Graphical Intermediate Representa on

Dag also gives the hierarchical structure of source program but in a more compact way
because common sub expressions are iden fied.

EXAMPLE a=b*-c +

b*-c

17
Pos ix Nota on
Linearized representa on of syntax tree

In pos ix nota on, each operator appears immediately a er its last operand.

Operators can be evaluated in the order in which they appear in the string

EXAMPLE

Source String : a := b * -c + b * -c

Pos ix String: a b c uminus * b c uminus * + assign

Pos ix Rules

1. If E is a variable or constant, then the pos ix nota on for E is E itself.

2. If E is an expression of the form E1 op E2 then pos ix nota on for E is E1’ E2’ op, here E1’ and
E2’ are the pos ix nota ons for E1and E2, respec vely

3. If E is an expression of the form (E), then the pos ix nota on for E is the same as the pos ix
nota on for E.

4. For unary opera on –E the pos ix is E- Ex: pos ix nota on for 9- (5+2) is 952+-

Pos ix nota on of an infix expression can be obtained using stack

5.2.1.2 THREE-ADDRESS CODE


In Three address statement, at most 3 addresses are used to represent any statement.

The reason for the term “three address code” is that each statement contains 3 addresses at
most. Two for the operands and one for the result.

General Form Of 3 Address Code

18
a = b op c
where,

a, b, c are the operands that can be names, constants or


compiler generated temporaries.

op represents operator, such as fixed or floa ng point arithme c


operator or a logical operator on Boolean valued data. Thus a
source language expression like x + y * z might be translated into
a sequence t1 := y*z

t2 := x+t1 where, t1 and t2 are compiler generated temporary


names.

Advantages Of Three Address Code


 The unraveling of complicated arithme c expressions and of nested flow-of-control
statements makes three-address code desirable for target code genera on and op miza on.
 The use of names for the intermediate values computed by a program allows three- address
code to be easily rearranged - unlike pos ix nota on.

Three-address code is a linearized representa on of a syntax tree or a DAG in which explicit names
correspond to the interior nodes of the graph.

Three Address Code corresponding to the syntax tree and DAG given above (page no: )

Types of Three-Address Statements

1. Assignment statements

x := y op z, where op is a binary arithme c or logical opera on.

2. Assignment instruc ons

x : = op y, where op is a unary opera on . Essen al unary opera ons include unary minus,
logical nega on, shi operators, and conversion operators that for example, convert a fixed-
point number to a floa ng-point number.

19
3. Copy statements

x : = y where the value of y is assigned to x.

4. Uncondi onal jump

goto L The three-address statement with label L is the next to be executed

5. Condi onal jump

if x relop y goto L This instruc on applies a rela onal operator ( <, =, =, etc,) to x and y, and
executes the statement with label L next if x stands in rela on relop to y. If not, the three-
address statement following if x relop y goto L is executed next, as in the usual sequence.

6. Procedural call and return

param x and call p, n for procedure calls and return y, where y represen ng a returned
value is op onal. Their typical use is as the sequence of three-address statements

param x1 param
x2
……….
param xn call
p,n
generated as part of the call procedure p( xl , x2, . . . , xn ) . The integer n indica ng the number
of actual-parameters in ''call p , n" is not redundant because calls can be nested.

7. Indexed Assignments

Indexed assignments of the form x = y[i] or x[i] = y

8. Address and pointer assignments

Address and pointer operator of the form x := &y, x := *y and *x := y

Syntax-Directed Transla on Into Three-Address Code


When three-address code is generated, temporary names are made up for the interior nodes
of a syntax tree. for example id : = E consists of code to evaluate E into some temporary t,
followed by the assignment id.place : = t.
Given input a:= b * - c + b + - c, it produces the three address code in given above (page no: )
The synthesized a ribute S.code represents the three address code for the assignment S. The
nonterminal E has two a ributes:

1. E.place the name that will hold the value of E, and


2. E.code. the sequence of three-address statements evalua ng E.

Syntax-directed defini on to produce three-address code for assignments.

20
Seman c rule genera ng code for a while statement

The func on newtemp returns a sequence of dis nct names t1, t2,……… in respose of
successive calls. Nota on gen(x ‘:= ‘y ‘+’ z is used to represent the three address statement x
:= y + z.

Expressions appearing instead of variables like x, y and z are evaluated when passed to gen,
and quoted operators or operand, like ‘+’ are taken literally.

Flow of control statements can be added to the language of assignments. The code for S
while E do S1 is generated using new a ributes S.begin and S.a er to mark the first
statement in the code for E and the statement following the code for S, respec vely.

The func on newlabel returns a new label every me is called. We assume that a nonzero
expression represents true; that is when the value of E becomes zero, control laves the while
statement

21
Implementa on Of Three-Address Statements
A three address statement is an abstract form of intermediate code. In a compiler, these statements
can be implemented as records with fields for the operator and the operands. Three such,
representa ons are

 Quadruples
 Triples
 Indirect triples

5.2.1.3 QUADRUPLES
A quadruple is a record structure with four fields, which are op, ag1, arg2 and result

The op field contains an internal code for the operator. The three address statement x:= y op z
is represented by placing y in arg1, z in arg2 and x in result.

The contents of arg1, arg2, and result are normally pointers to the symbol table entries for
the names represented by these fields. If so temporary names must be entered into the
symbol table as they are created.

EXAMPLE 1

Translate the following expression to quadruple triple and indirect triple a + b * c

|e^f+b*a

For the first construct the three address code for the expression

t1 = e ^ f t2
= b * c t3 =
t2 / t1 t4 =
b * a t5 = a
+ t3 t6 = t5
+ t4

Loca on OP arg1 arg2 Result


(0) ^ e f t1
(1) * b c t2
(2) / t2 t1 t3
(3) * b a t4
(4) + a t3 t5
(5) + t3 t4 t6
Excep ons

 The statement x := op y, where op is a unary operator is represented by placing op in the


operator field, y in the argument field & n in the result field. The arg2 is not used

22
 A statement like param t1 is represented by placing param in the operator field and t1 in the
arg1 field. Neither arg2 not result field is used

 Uncondi onal & Condi onal jump statements are represented by placing the target in the
result field.

5.2.1.4 TRIPLES
In triples representa on, the use of temporary variables is avoided & instead reference to
instruc ons are made

So three address statements can be represented by records with only there fields OP, arg1 &
arg2.

Since, there fields are used this intermediated code formal is known as triples

Advantages
 No need to use temporary variable which saves memory as well as me

Disadvantages
 Triple representa on is difficult to use for op mizing compilers  Because for
op miza on statements need to be suffled.
 for e.g. statement 1 can be come down or statement 2 can go up ect.
 So the reference we used in their representa on will change.

EXAMPLE 1

a+b*c|e^f+b*a

t1 = e ^ f t2
= b * c t3 =
t2 / t1 t4 =
b * a t5 = a
+ t3 t6 = t5
+ t4

Loca on OP arg1 arg2


(0) ^ e f
(1) * b c
(2) / (1) (0)
(3) * b a
(4) + a (2)

23
(5) + (4) (3)

EXAMPLE 2

A ternary opera on like x[i] : = y requires two entries in the triple structure while x : = y[i] is naturally
represented as two opera ons.

x[i] := y x := y[i]

INDIRECT TRIPLES
This representa on is an enhancement over triple representa on.

It uses an addi onal instruc on array to led the pointer to the triples in the desired order.

Since, it uses pointers instead of posi on to stage reposi on the expression to produce an
op mized code.

EXAMPLE 1

Statement Loca on op arg1 arg2


35 (0) (0) ^ E f
36 (1) (1) * B c
37 (2) (2) / (1) (0)
38 (3) (3) * B a
39 (4) (4) + A (2)
40 (5) (5) + (4) (3)

Comparison
When we ul mately produce the target code each temporary and programmer defined name
will assign run me memory loca on

This loca on will be entered into symbol table entry of that data.

24
Using the quadruple nota on, a three address statement containing a temporary can
immediately access the loca on for that temporary via symbol table.

But this is not possible with triples nota on.

With quadruple nota on, statements can o en move around which makes op miza on
easier.

This is achieved because using quadruple nota on the symbol table interposes high degree of
indirec on between computa on of a value and its use.

With quadruple nota on, if we move a statement compu ng x, the statement using x requires
no change.

But with triples, moving a statement that defines a temporary value requires us to change all
references to that statement in arg1 and arg2 arrays. This makes triples difficult to use in
op mizing compiler

With indirect triples also, there is no such problem.

A statement can be moved by reordering the statement list.

Space U liza on
Quadruples and indirect triples requires same amount of space for storage (normal case).

But if same temporary value is used more than once indirect triples can save some space. This
is bcz, 2 or more entries in statement array can point to the same line of op-arg1-arg2
structure.

Triples requires less space for storage compared to above 2.

Quadruples
 direct access of the loca on for temporaries
 easier for op miza on Triples
 space efficiency

Indirect Triples
 easier for op miza on
 space efficiency

PROBLEM 1

Translate the following expression to quadruple tuples & indirect tuples

a=b*-c+b*-c
Sol : - Three address code for given expression is

TAC t1 =
uniminus c t2 =

25
b* t1 t3 =
uniminus c t4 =
b* t3 t5 = t2 + t4
Q = t5
QUADRUPLES

Loca on OP arg1 arg2 result


(0) uniminus c t1
(1) * b t1 t2
(3) uniminus c t3
(4) * b t3 t4
(5) + t2 t4 t5
(6) = t5 a

TRIPLES

Loca on OP arg1 arg2


(1) uniminus c
(2) * b (1)
(3) uniminus c
(4) * b (3)
(5) + (2) (4)
(6) = a (5)

INDIRECT TRIPLES

Statements Loca on OP arg1 arg2


35 (1) (1) uniminus C
36 (2) (2) * B (1)
37 (3) (3) uniminus C
38 (4) (4) * B (3)
39 (5) (5) + (2) (4)
40 (6) (6) = A (5)

5.2.1.5 ASSIGNMENT STATEMENTS


Transla on Scheme (SDT) To Produce Three-Address Code
For Assignments

26
Produc on Seman c ac on

S->id : = E { p : = lookup ( id.name);

if p ≠ nil then emit( p ‘

: =’ E.place) else error

E->E1 + E2 { E.place : = newtemp;

emit( E.place ‘: =’ E1.place ‘ + ‘ E2.place ) }

E->E1 * E2 { E.place : = newtemp;

emit( E.place ‘: =’ E1.place ‘ * ‘ E2.place ) }

E->-E1 { E.place : = newtemp;

emit ( E.place ‘: =’ ‘uminus’ E1.place ) }

E-> ( E1) { E.place : = E1.place }

E->id { p : = lookup ( id.name);

if p ≠ nil then E.place : =

else error }

emite  generate the three address code to the output file.

newtemp  return a new temporary variable.

lookup iden fier  check if the id is in symbol table


EXAMPLE : Annotated Parse Tree For Genera on Of TAC For Assignment Statements

27
Syntax-directed defini on to produce three-address code for assignments.

5.2.1.6 BOOLEAN EXPRESSIONS


Boolean expressions have two primary purposes.

28
 They are used to compute logical values.

 But more o en they are used as condi onal expressions in statements that alter the
flow of control, such as if-then-else, or while-do statements.

Boolean expressions are composed of the Boolean operators (and, or, and not) applied to
elements that are Boolean variables or rela onal expressions.

Rela onal expressions are of the form E1 relop E2, where E1 and E2 are arithme c expressions
and relop can be <, <=, =!, =, > or >=

Here we consider Boolean expressions generated by the following grammar :

E->E or E | E and E | note | ( E ) |id relop id | true | false

Methods Of Transla ng Boolean Expressions


There are two principal methods of represen ng the value of a boolean expression. They are :

Numerical Representa on - To encode true and false numerically and to evaluate a Boolean
expression analogously to an arithme c expression. O en, 1 is used to denote true and 0 to
denote false.

Jumping Method (Short-circuit Method) - To implement Boolean expressions by flow of


control, that is, represen ng the value of a Boolean expression by a posi on reached in a
program. This method is par cularly convenient in implemen ng the Boolean expressions in
flow-of-control statements, such as the if-then and while-do statements.

Method 1: Numerical Representa on


Here, 1 denotes true and 0 denotes false. Expressions will be evaluated completely from le to
right, in a manner similar to arithme c expressions.

EXAMPLE

The transla on for a or b and not c will result following three-address sequence

t1 : = not c t2 :

= b and t1 t3 :

= a or t2

Transla on Scheme Using A Numerical Representa on For Boolean Expression

29
where the func on emit( ) output the three address statement into the output file and nextstat(
) gives the index of the next three address statement in the output sequence and emit
increments nextstat a er producing each three address statement.

A rela onal expression such as a < b is equivalent to the condi onal statement if a < b then 1

else 0 which can be translated into the three-address code sequence (let statement numbers

start at 100)

100 if a < b goto 103

101 t : = 0

102 goto 104

103 t : = 1

104

Method 2: Jumping or Short-Circuit Code


We can also translate a boolean expression into three-address code without genera ng code
for any of the boolean operators and without having the code necessarily evaluate the en re
expression. This style of evalua on is some mes called “short- circuit” or “jumping” code.

This is normally used for flow-of-control statements, such as the if-then, if-then-else and while-
do statements those generated by the following grammar:

30
S → if E then S1

| if E then S1 else S2

| while E do S1

Code for if-then, if-then-else and while-do is given below:

Consider the grammar

S → if E then S1

| if E then S1 else S2

| while E do S1

In each of these produc ons, E is the Boolean expression to be translated. In the transla on,
we assume that a three-address statement can be symbolically labeled, and that the func on
newlabel returns a new symbolic label each me it is called.

With each E we associate two labels E.true and E.false. E.true is the label to which control flows
if E is true, and E.false is the label to which control flows if E is false.

The inherited a ribute S.next is a label that is a ached to the first three-address instruc on to
be executed a er the code for S and another inherited a ribute S.begin is the first instruc on
of S

31
Syntax Directed Defini on for flow –of –control statements

S→if E then S1 { E.true := newlabel;

E.false := S.next;

S1.next := S.next;

S.code := E.code || gen (E.true ‘:’) || S1.code }

S→if E then S1 else S2 { E.true := newlabel;

E.false := newlabel;

S1.next := S.next; S2.next

:= S.next;

S.code := E.code || gen (E.true ‘:’) || S1.code


||gen(‘goto’ S.next) || gen(E.false ‘:’)|| S2.code }

S→while E do S1 { S.begin := newlable;

E.true := newlabel; E.false :=

S.next;

S1.next := S.begin;

S.code := gen (S.begin’:’) || E.code || gen (E.true ‘:’ )


||S1.code || gen (‘goto’ S.begin) }

**********

32

You might also like