CD Unit 5 RV
CD Unit 5 RV
CD Unit 5 RV
UNIT-5 (Lecture-1)
Example:
Consider the three address statement x:= y + z. It can have the following sequence of codes:
MOV y, R0
ADD z, R0
o A register descriptor contains the track of what is currently in each register. The register
descriptors show that all the registers are initially empty.
o An address descriptor is used to store the location where current value of the name can be found
at run time.
A code-generation algorithm:
The algorithm takes a sequence of three-address statements as input. For each three address statement of
the form x:= y op z perform the various actions. These are as follows:
1. Invoke a function getreg to find out the location L where the result of computation y op z should
be stored.
2. Consult the address description for y to determine y'. If the value of y currently in memory and
register both then prefer the register y' . If the value of y is not already in L then generate the
instruction MOV y' , L to place a copy of y in L.
3. Generate the instruction OP z' , L where z' is used to show the current location of z. if z is in
both then prefer a register to a memory location. Update the address descriptor of x to indicate
that x is in location L. If x is in L then update its descriptor and remove x from all other
descriptor.
4. If the current value of y or z have no next uses or not live on exit from the block or in register
then alter the register descriptor to indicate that after execution of x : = y op z those register will
no longer contain y or z.
The assignment statement d:= (a-b) + (a-c) + (a-c) can be translated into the following sequence of three
address code:
t:= a-b
u:= a-c
v:= t +u
d:= v+u
UNIT-5 (Lecture-2)
2. Target program:
The target program is the output of the code generator. The output can be:
a) Assembly language: It allows subprogram to be separately compiled.
b) Relocatable machine language: It makes the process of code generation easier.
c) Absolute machine language: It can be placed in a fixed location in memory and can be executed
immediately.
3. Memory management
o During code generation process the symbol table entries have to be mapped to actual data object
addresses and levels have to be mapped to instruction address.
o Mapping name in the source program to address of data is co-operating done by the front end
and code generator.
o Local variables are stack allocation in the activation record while global variables are in static
area.
RAVIKANT NIRALA Page 1
COMPILER DESIGN (KCS-502)
4. Instruction selection:
o Nature of instruction set of the target machine should be complete and uniform.
o When you consider the efficiency of target machine then the instruction speed and machine
idioms are important factors.
o The quality of the generated code can be determined by its speed and size.
Example:
5. Register allocation
Register can be accessed faster than memory. The instructions involving operands in register are shorter
and faster than those involving in memory operand.
The following sub problems arise when we use registers:
Register allocation: In register allocation, we select the set of variables that will reside in register.
Register assignment: In Register assignment, we pick the register that contains variable.
Certain machine requires even-odd pairs of registers for some operands and result.
For example:
Consider the following division instruction of the form:
D x, y
Where,
x is the dividend even register in even/odd register pair
y is the divisor
Even register is used to hold the reminder.
Old register is used to hold the quotient.
6. Evaluation order
The efficiency of the target code can be affected by the order in which the computations are performed.
Some computation orders need fewer registers to hold results of intermediate than others.
RAVIKANT NIRALA Page 2
COMPILER DESIGN (KCS-502)
UNIT-5 (Lecture-3)
Where, op is used as an op-code and source and destination are used as a data field.
o It has the following op-codes:
ADD (add source to destination)
SUB (subtract source from destination)
MOV (move source to destination)
o The source and destination of an instruction can be specified by the combination of registers and
memory location with address modes.
The information which required during an execution of a procedure is kept in a block of storage called
an activation record. The activation record includes storage for names local to the procedure.
We can describe address in the target code using the following ways:
1. Static allocation
2. Stack allocation
In static allocation, the position of an activation record is fixed in memory at compile time.
In the stack allocation, for each execution of a procedure a new activation record is pushed onto the
stack. When the activation ends then the record is popped.
For the run-time allocation and deallocation of activation records the following three-address statements
are associated:
1. Call
2. Return
3. Halt
4. Action, a placeholder for other statements
Static allocation:
It is used to transfer the control to the address that is saved at the beginning of the activation record.
The HALT statement is the final instruction that is used to return the control to the operating system.
Stack allocation
Using the relative address, static allocation can become stack allocation for storage in activation records.
In stack allocation, register is used to store the position of activation record so words in activation
records can be accessed as offsets from the value in this register.
1. Initialization of stack:
Where,
UNIT-5 (Lecture-4)
B2
(3) t1 := 4* i
(4) t2 := a[t1]
(5) t3 := 4* i
(6) t4 := b[t3]
(7) t5 := t2*t4
(8) t6 := prod+t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i<=10 goto (3)
Basic block B1 contains the statement (1) to (2)
Basic block B2 contains the statement (3) to (12)
Flow Graph
Flow graph is a directed graph. It contains the flow of control information for the set of basic block.
A control flow graph is used to depict that how the program control is being parsed among the
blocks. It is useful in the loop optimization.
Flow graph for the vector dot product is given as follows:
o Block B1 is the initial node. Block B2 immediately follows B1, so from B2 to B1 there is an
edge.
o The target of jump from last statement of B1 is the first statement B2, so from B1 to B2 there is
an edge.
o B2 is a successor of B1 and B1 is the predecessor of B2.
UNIT-5 (Lecture-5)
There are two type of basic block optimization. These are as follows:
1. Structure-Preserving Transformations
2. Algebraic Transformations
In the common sub-expression, you don't need to be computed it over and over again. Instead of this you
can compute it once and kept in store from where it's referenced when encountered again.
a:=b+c
b:=a-d
c:=b+c
d:=a-d
In the above expression, the second and forth expression computed the same expression. So the block
can be transformed as follows:
a:=b+c
b:=a-d
c:=b+c
d:=b
A statement t:= b + c can be changed to u:= b + c where t is a temporary variable and u is a new
temporary variable. All the instance of t can be replaced with the u without changing the basic block
value.
t1 : = b + c
t2 : = x + y
These two statements can be interchanged without affecting the value of block when value of t1 does not
affect the value of t2.
2. Algebraic transformations:
o In the algebraic transformation, we can change the set of expression into an algebraically
equivalent set. Thus the expression x:= x + 0 or x:= x *1 can be eliminated from a basic block
without changing the set of expression.
o Constant folding is a class of related optimization. Here at compile time, we evaluate constant
expressions and replace the constant expression by their values. Thus the expression 5*2.7 would
be replaced by13.5.
o Sometimes the unexpected common sub expression is generated by the relational operators like
<=, >=, <, >, +, = etc.
o Sometimes associative expression is applied to expose common sub expression without changing
the basic block value. if the source code has the assignments
a:= b + c
e:= c +d +b
UNIT-5 (Lecture-6)
Machine-independent Optimization:
o Machine independent optimization attempts to improve the intermediate code to get a better
target code. The part of the code which is transformed here does not involve any absolute
memory location or any CPU registers.
o The process of intermediate code generation introduces much inefficiency like: using variable
instead of constants, extra copies of variable, repeated evaluation of expression. Through the
code optimization, you can remove such efficiencies and improves code.
o It can change the structure of program sometimes of beyond recognition like: unrolls loops,
inline functions, eliminates some variables that are programmer defined.
(a) z = 5*(45.0/5.0)*r
Perform 5*(45.0/5.0)*r at compile time.
(b) x = 5.7
y = x/3.6
Evaluate x/3.6 as 5.7/3.6 at compile time.
c=a*b
x=a
till
d=x*b+4
c=a*b
x=a
till
d=a*b+4
Here, after variable propagation a*b and x*b identified as common sub expression.
c=a*b
x=b
till
d=a*b+4
c=a*b
till
d=a*b+4
Here, x= b is a dead state because it will never subsequently used in the program. So, we can eliminate
this state.
do
{
item = 10;
RAVIKANT NIRALA Page 2
COMPILER DESIGN (KCS-502)
item = 10;
do
{
valuevalue = value + item;
} while(value<100);
o Strength reduction is used to replace the high strength operator by the low strength.
o An induction variable is used in loop for the following kind of assignment like i = i + constant.
i = 1;
while(i<10)
{
y = i * 4;
i++
}
i=1
t=4
{
while( t<40)
y = t;
t = t + 4;
}
UNIT-5 (Lecture-7)
If we decrease the number of instructions in an inner loop then the running time of a program may be
improved even if we increase the amount of code outside that loop.
1. Code Motion:
Code motion is used to decrease the amount of code in loop. This transformation takes a statement or
expression which can be moved outside the loop body without affecting the semantics of the program.
For example
2. Induction-Variable Elimination
It can reduce the number of additions in a loop. It improves both code space and run time performance.
Example:
The code fragment below has three induction variables (i1, i2, and i3) that can be replaced with one
induction variable, thus eliminating two induction variables.
int a[SIZE];
int b[SIZE];
void f (void)
{
int i1, i2, i3;
The code fragment below shows the loop after induction variable elimination.
int a[SIZE];
int b[SIZE];
void f (void)
{
int i1;
3. Reduction in Strength
o Strength reduction is used to replace the expensive operation by the cheaper once on the target
machine.
o Addition of a constant is cheaper than a multiplication. So we can replace multiplication with an
addition within the loop.
o Multiplication is cheaper than exponentiation. So we can replace exponentiation with
multiplication within the loop.
Example:
while (i<10)
{
j= 3 * i+1;
a[j]=a[j]-2;
i=i+2;
}
s= 3*i+1;
while (i<10)
{
j=s;
a[j]= a[j]-2;
i=i+2;
s=s+6;
}
Dead-code Elimination
Dead code is one or more than one code statements, which are:
• Either never executed or unreachable,
• Or if executed, their output is never used.
Thus, dead code plays no role in any program operation and therefore it can simply be eliminated.
There are some code statements whose computed values are used only under certain circumstances, i.e.,
sometimes the values are used and sometimes they are not. Such codes are known as partially dead-
code.
The above control flow graph depicts a chunk of program where variable ‘a’ is used to assign the
output of expression ‘x * y’. Let us assume that the value assigned to ‘a’ is never used inside the loop.
Immediately after the control leaves the loop, ‘a’ is assigned the value of variable ‘z’, which would be
used later in the program. We conclude here that the assignment code of ‘a’ is never used anywhere,
therefore it is eligible to be eliminated.
Likewise, the picture above depicts that the conditional statement is always false, implying that the
code, written in true case, will never be executed, hence it can be removed.
Partial Redundancy
Redundant expressions are computed more than once in parallel path, without any change in operands.
whereas partial-redundant expressions are computed more than once in a path, without any change in
operands. For example,
UNIT-5 (Lecture-8)
1. The leaves of graph are labeled by unique identifier and that identifier can be variable names or
constants.
2. Interior nodes of the graph are labeled by an operator symbol.
3. Nodes are also given a sequence of identifiers for labels to store the computed value.
o DAGs are a type of data structure. It is used to implement transformations on basic blocks.
o DAG provides a good way to determine the common sub-expression.
o It gives a picture representation of how the value computed by the statement is used in
subsequent statements.
Method:
Step 1:
Step 2:
For case(i), create node(OP) whose right child is node(z) and left child is node(y).
For case(ii), check whether there is node(OP) with one child node(y).
Output:
For node(x) delete x from the list of identifiers. Append x to attached identifiers list for the node n found
in step 2. Finally set node(x) to n.
Example:
S1:= 4 * i
S2:= a[S1]
S3:= 4 * i
S4:= b[S3]
S5:= s2 * S4
S6:= prod + S5
Prod:= s6
S7:= i+1
i := S7
if i<= 20 goto (1)
Stages in DAG Construction:
o To efficiently optimize the code compiler collects all the information about the program and
distribute this information to each block of the flow graph. This process is known as data-flow
graph analysis.
o Certain optimization can only be achieved by examining the entire program. It can't be achieve
by examining just a portion of the program.
o For this kind of optimization user defined chaining is one particular problem.
o Here using the value of the variable, we try to find out that which definition of a variable is
applicable in a statement.
Based on the local information a compiler can perform some optimizations. For example, consider the
following code:
x = a + b;
x=6*3
o In this code, the first assignment of x is useless. The value computer for x is never used in the
program.
o At compile time the expression 6*3 will be computed, simplifying the second assignment
statement to x = 18;
Some optimization needs more global information. For example, consider the following code:
a = 1;
b = 2;
c = 3;
if (....) x = a + 5;
else x = b + 4;
c = x + 1;
In this code, at line 3 the initial assignment is useless and x +1 expression can be simplified as 7.
But it is less obvious that how a compiler can discover these facts by looking only at one or two
consecutive statements. A more global analysis is required so that the compiler knows the following
things at each point in the program:
o Which variables are guaranteed to have constant values
o Which variables will be used before being redefined
Data flow analysis is used to discover this kind of property. The data flow analysis can be performed on
the program's control flow graph (CFG).
The control flow graph of a program is used to determine those parts of a program to which a particular
value assigned to a variable might propagate.