Chapter 6 - Intermediate Languages

Download as pdf or txt
Download as pdf or txt
You are on page 1of 18

Compiler Design

Chapter 6: Intermediate Languages

Instructor: Fikru T. (MSc.)


Email: fikrutafesse08@gmail.com

1
Introduction
• The task of compiler is to convert the source program into machine program.
• This activity is done directly ,but it is not always possible to generate such a
machine code directly in one pass.
• Then, typically compilers generate as easy to represent form of source language
called intermediate language.
• The generation of an intermediate language leads to efficient code generation.
• Intermediate code is used to translate the source code into the machine code.
• Intermediate code lies between the high-level language and the machine language.
Lexical Intermediate Code
Parser Static Intermediate
Analyzer Rest of compiler
Checker Code Generator
Figure : Position of intermediate code generator 2
Cont'd ...
• If the compiler directly translates source code into the machine code without
generating intermediate code then a full native compiler is required for each new
machine.
• The intermediate code keeps the analysis portion same for all the compilers that's why
it doesn't need a full compiler for every unique machine.
• Intermediate code generator receives input from its predecessor phase and semantic
analyzer phase.
• It takes input in the form of an
• Using the intermediate code, the second phase of the compiler synthesis phase is
changed according to the target machine.

3
Cont'd ...
Benefits of Intermediate Code Generation
• There are certain benefits of generating machine independent intermediate code.
1. A compiler for different machines can be created by attaching different back
end to existing front end of each machine.
2. A compiler for different source language (on the same machine) can be created
by proving different front ends for corresponding source languages to existing
back end.
3. A machine independent code optimizer can be applied to intermediate code in
order to optimize code generation.

4
Intermediate Languages
• There are three types of intermediate code representations.
1. Syntax Tree
2. Posix Notation
3. Three address code
• These three representations are used to represent the intermediate languages.
1. Syntax Tree
• The natural hierarchical structure is represented by syntax trees.
• Directed Acyclic Graph(DAG) is very much similar to syntax trees but they are in
more compact form.
• The code being generated as intermediate should be such that the remaining processing
of the subsequent phases should be easy. 5
Cont'd ...

• Example: consider the input string x := -a * b + -a * b for the syntax tree and DAG
representation. := :=
x + x +
• Solution:
* *
b *
Uminus b
Uminus Uminus b
a a
a
Parsee Tree DAG

2. Posix Notation
• The Posix notation is using postfix representation .
• Consider that the input expression is x := - a * b + - a * b then the required posix form
is: x a – b * a – b * + := 6
Cont'd ...
• Basically, the linearization of syntax tree is posix notation.
• In this representation the operator can be easily associated with the corresponding
operands.
• This is the most natural way of representation in expressions evaluation.
3. Three address code (TAC)
• It is used by an optimizing compilers.
• In three address code, the given expression is broken down into several separate
instructions.
• These instructions can be easily translate into Assembly Language.
• Each three address code instruction has at most three operands.
• It is a combination of assignment and a binary operator.
7
Cont'd ...
• In TAC, there is at most one operator on the right side of an instruction.
• Example1: x+y*z
t1=y*z
• t1 and t2 are compiler generated temporary names.
t2=x+t1
• Example2: x= a+a*(b-c)+d+(b-c)
t1 = b-c
t2 = a* t1
t3 = a+ t2
t4 = d+ t1
t5 = t3 + t4
x=t5
8
Cont'd ...
• Three address code can be classified into two: Quadruples and Triples.
1. Quadruples: has four fields such as operator, source1, source2, result(destination).
• Example: a := -b * c + d
TAC: t1 = -b
t2 = c + d
t3 = t1 * t2
a := t3
2. Triples : has three fields such as operator, source1 and source2.
Example: a := -b * c + d
TAC: t1 = -b
t2 = c + d
t3 = t1 * t2
a := t3 9
Procedure Calls
– Procedure is an important and frequently used programming construct for a compiler.
– It is used to generate good code for procedure calls and returns.
– Calling Sequence
– The translation for a call includes a sequence of actions taken on entry and exit from
each procedures.
– Following actions take place in a calling sequence
1. When a procedure call occurs then space is allocated for activation record.
2. Evaluate the argument of called procedure.
3. Establish the environment pointers to enable the called procedure to access data in
enclosing blocks.
4. Save the state of the calling procedure so that it can resume execution after the call.10
Procedure Calls Cont.….
5. Also save the return address
– It is the address of location to which the called routine must transfer after it is finished.

6. Finally generate a jump to the beginning of the code for a called procedure.
– Semantic action of Procedures P(x1, x2,….,Xn)

S→call id(Elist) For each item P on QUEUE do Three address code:

GEN (param p) param x1


GEN (call id. PLACE) param x2

Elist→Elist,E append E.PLACE to the end of QUEUE ...


param xn
Elist→E initialize QUEUE to contain on E.PLACE
call P, n
– Queue is used to store the list of parameters in the procedure call. 11
Declarations
• When we encounter declarations, we need to lay out storage for the declared variables.
• For every local name in a procedure, we create a ST(Symbol Table) entry containing:
1. The type of the name D → integer, id
2. How much storage the name requires D → real, id
• A suitable transition scheme for declarations would be: D → D1, id

• ENTER is used to make the entry into


symbol table.
• ATTR is used to trace the data type.

12
Flow Control Statements
Switch Case statements
– Switch statements in C tests the value of a variable and compares it with multiple cases.
– Once the case match is found, a block of statements associated with that particular case
is executed.
– Each case in a block of a switch has a different name/ number which is referred to as an
identifier.
– The value provided by the user is compared with all the cases inside the switch block
until the match is found.
– If a case match not found, then the default statement is executed, and the control goes
out of the switch block.
13
Flow Control Statements Cont.…
switch case Three Address Code:
Code to evaluate E into T:
switch E goto Test: TEST: if T = V1 goto L1
begin L1: Code for S1 if T = V2 goto L2
case V1: S1 goto NEXT if T = Vn-1 goto Ln-1
Case V2: S2 L2: Code for S2 goto Ln
……. goto NEXT NEXT
Case Vn-1:Sn-1 …….
default: Sn Ln-1: Code for Sn-1
end goto NEXT
Ln: Code for Sn default
goto NEXT
14
Flow Control Statements Cont.…
• Example: Three Address Code:
1. T1 = x + y
13. goto NEXT
switch (x + y){
2. If T1 = 1 goto 8
14. T5 = C/2
case 1 : a=a+2;
3. If T1 = 2 goto 11
15. C = T5
break;
4. If T1 = 6 goto 14
16. goto NEXT
case 2 : b = b*5;
5. T2 = d/2
17. NEXT
break;
6. d = T2
case 6 : c=c/2;
7. goto NEXT
default: d = d/2;
8. T3 = a + 2
}
9. a =T3
10. goto NEXT
11. T4 = b*5
12. b = T4 15
Backpatching
– While generating three address codes for the given expression, it can specify the address
of the Label in goto statements.
– It is very difficult to assign locations of these label statements in one pass so, two passes
are used.
– In the first pass, it can leave these addresses unspecified & in the next pass, and it can fill
these addresses.
– Therefore filling of incomplete information is called Backpatching.
One-pass code generation using backpatching
– Backpatching can be used to generate a program for Boolean expressions and the flow
of control statements in one pass.
16
Backpatching Cont.….
– In this, synthesized attributes truelist and falselist of non-terminal B are used to handle
labels in jumping code for Boolean expressions.
– In specific, B.truelist will be a list of the jump or conditional jump instructions into which
it should add the label to which control goes if B is true.
– B.falselist similarly is the list of instructions that finally get the label to which control goes
when B is false.
– As the program is produced for B, jumps to the true and false exists are left incomplete,
with the label field unfilled.
– These preliminary jumps are located on lists pointed to by B.truelist and B.falselist as
applicable.
17
Backpatching Cont.….
– Likewise, a statement S has a synthesized attribute S.nextlist, indicating a list of jumps to the
instruction directly following the code for S.
– It can produce instructions into an instruction array, and labels will be indices into this array.
– To manipulate the list of jumps, we use three functions:
1. Makelist (i)− Create a new list including only i, an index into the array of instructions;
• makelist returns a pointer to the newly generated list.
2. Merge(p1,p2) - Concatenates the lists pointed to by p1, and p2 and returns a pointer
to the concatenated list.
3. Backpatch (p, i)− Inserts i as the target label for each of the instructions on the
record pointed to by p.

18

You might also like