0% found this document useful (0 votes)
57 views

intermediate code generation

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
57 views

intermediate code generation

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 25

Intermediate code can translate the source program into the machine program.

Intermediate code is generated because the compiler can’t generate machine code
directly in one pass. Therefore, first, it converts the source program into intermediate
code, which performs efficient generation of machine code further. The intermediate
code can be represented in the form of postfix notation, syntax tree, directed acyclic
graph, three address codes, Quadruples, and triples.

If it can divide the compiler stages into two parts, i.e., Front end & Back end, then this
phase comes in between.

In the analysis-synthesis model of a compiler, the front end of a compiler translates a source
program into an independent intermediate code, then the back end of the compiler uses this
intermediate code to generate the target code (which can be understood by the machine). The
benefits of using machine-independent intermediate code are:
Because of the machine-independent intermediate code, portability will be enhanced. For
ex, suppose, if a compiler translates the source language to its target machine language
without having the option for generating intermediate code, then for each new machine, a
full native compiler is required. Because, obviously, there were some modifications in the
compiler itself according to the machine specifications.
Retargeting is facilitated.
It is easier to apply source code modification to improve the performance of source code by
optimizing the intermediate code.
If we generate machine code directly from source code then for n target machine we will have
optimizer and n code generator but if we will have a machine-independent intermediate code,
we will have only one optimizer. Intermediate code can be either language-specific (e.g., Byte
code for Java) or language. independent (three-address code). The following are commonly
used intermediate code representations:

Syntax Directed Translation in Compiler


Design
Parser uses a CFG(Context-free-Grammar) to validate the input string and
produce output for the next phase of the compiler. Output could be either a parse
tree or an abstract syntax tree. Now to interleave semantic analysis with the
syntax analysis phase of the compiler, we use Syntax Directed Translation.

Conceptually, with both syntax-directed definition and translation schemes, we parse the
input token stream, build the parse tree, and then traverse the tree as needed to
evaluate the semantic rules at the parse tree nodes. Evaluation of the semantic rules
may generate code, save information in a symbol table, issue error messages, or
perform any other activities. The translation of the token stream is the result obtained by
evaluating the semantic rules.
Definition
Syntax Directed Translation has augmented rules to the grammar that facilitate semantic
analysis. SDT involves passing information bottom-up and/or top-down to the parse tree
in form of attributes attached to the nodes. Syntax-directed translation rules use 1)
lexical values of nodes, 2) constants & 3) attributes associated with the non-terminals in
their definitions.
The general approach to Syntax-Directed Translation is to construct a parse tree or
syntax tree and compute the values of attributes at the nodes of the tree by visiting them
in some order. In many cases, translation can be done during parsing without building an
explicit tree.
Example

E -> E+T | T
T -> T*F | F
F -> INTLIT
This is a grammar to syntactically validate an expression having additions and
multiplications in it. Now, to carry out semantic analysis we will augment SDT rules to
this grammar, in order to pass some information up the parse tree and check for
semantic errors, if any. In this example, we will focus on the evaluation of the given
expression, as we don’t have any semantic assertions to check in this very basic
example.

E -> E+T { E.val = E.val + T.val } PR#1


E -> T { E.val = T.val } PR#2
T -> T*F { T.val = T.val * F.val } PR#3
T -> F { T.val = F.val } PR#4
F -> INTLIT { F.val = INTLIT.lexval } PR#5
For understanding translation rules further, we take the first SDT augmented to [ E ->
E+T ] production rule. The translation rule in consideration has val as an attribute for
both the non-terminals – E & T. Right-hand side of the translation rule corresponds to
attribute values of the right-side nodes of the production rule and vice-versa.
Generalizing, SDT are augmented rules to a CFG that associate 1) set of attributes to
every node of the grammar and 2) a set of translation rules to every production rule
using attributes, constants, and lexical values.
Let’s take a string to see how semantic analysis happens – S = 2+3*4. Parse tree
corresponding to S would be

To evaluate translation rules, we can employ one depth-first search traversal on the
parse tree. This is possible only because SDT rules don’t impose any specific order on
evaluation until children’s attributes are computed before parents for a grammar having
all synthesized attributes. Otherwise, we would have to figure out the best-suited plan to
traverse through the parse tree and evaluate all the attributes in one or more traversals.
For better understanding, we will move bottom-up in the left to right fashion for
computing the translation rules of our example.
The above diagram shows how semantic analysis could happen. The flow of information
happens bottom-up and all the children’s attributes are computed before parents, as
discussed above. Right-hand side nodes are sometimes annotated with subscript 1 to
distinguish between children and parents.
Additional Information
Synthesized Attributes are such attributes that depend only on the attribute values of
children nodes.
Thus [ E -> E+T { E.val = E.val + T.val } ] has a synthesized attribute val corresponding
to node E. If all the semantic attributes in an augmented grammar are synthesized, one
depth-first search traversal in any order is sufficient for the semantic analysis phase.
Inherited Attributes are such attributes that depend on parent and/or sibling’s
attributes.
Thus [ Ep -> E+T { Ep.val = E.val + T.val, T.val = Ep.val } ], where E & Ep are same
production symbols annotated to differentiate between parent and child, has an inherited
attribute val corresponding to node T.
Advantages of Syntax Directed Translation:

Ease of implementation: SDT is a simple and easy-to-implement method for


translating a programming language. It provides a clear and structured way to specify
translation rules using grammar rules.
Separation of concerns: SDT separates the translation process from the parsing
process, making it easier to modify and maintain the compiler. It also separates the
translation concerns from the parsing concerns, allowing for more modular and
extensible compiler designs.
Efficient code generation: SDT enables the generation of efficient code by optimizing
the translation process. It allows for the use of techniques such as intermediate code
generation and code optimization.

Disadvantages of Syntax Directed Translation:

Limited expressiveness: SDT has limited expressiveness in comparison to other


translation methods, such as attribute grammars. This limits the types of translations that
can be performed using SDT.
Inflexibility: SDT can be inflexible in situations where the translation rules are complex
and cannot be easily expressed using grammar rules.
Limited error recovery: SDT is limited in its ability to recover from errors during the
translation process. This can result in poor error messages and may make it difficult to
locate and fix errors in the input program.

1. Postfix Notation: Also known as reverse Polish notation or suffix notation. The ordinary
(infix) way of writing the sum of a and b is with an operator in the middle: a + b The postfix
notation for the same expression places the operator at the right end as ab +. In general, if
e1 and e2 are any postfix expressions, and + is any binary operator, the result of applying +
to the values denoted by e1 and e2 is postfix notation by e1e2 +. No parentheses are needed
in postfix notation because the position and arity (number of arguments) of the operators
permit only one way to decode a postfix expression. In postfix notation, the operator
follows the operand.
Example 1: The postfix representation of the expression (a + b) * c is : ab + c *
Example 2: The postfix representation of the expression (a – b) * (c + d) + (a – b) is :
ab – cd + *ab -+

Convert Infix expression to Postfix


expression
Write a program to convert an Infix expression to Postfix form.
Infix expression: The expression of the form “a operator b” (a + b) i.e.,
when an operator is in-between every pair of operands.
Postfix expression: The expression of the form “a b operator” (ab+) i.e.,
When every pair of operands is followed by an operator.
Examples:
Input: A + B * C + D
Output: ABC*+D+
Input: ((A + B) – C * (D / E)) + F
Output: AB+CDE/*-F+

Why postfix representation of the expression?


The compiler scans the expression either from left to right or from right to left.
Consider the expression: a + b * c + d
The compiler first scans the expression to evaluate the expression b * c, then again
scans the expression to add a to it.
The result is then added to d after another scan.
The repeated scanning makes it very inefficient. Infix expressions are easily readable
and solvable by humans whereas the computer cannot differentiate the operators and
parenthesis easily so, it is better to convert the expression to postfix(or prefix) form
before evaluation.
The corresponding expression in postfix form is abc*+d+. The postfix expressions can
be evaluated easily using a stack.

How to convert an Infix expression to a Postfix


expression?
To convert infix expression to postfix expression, use the stack data
structure. Scan the infix expression from left to right. Whenever we get an
operand, add it to the postfix expression and if we get an operator or
parenthesis add it to the stack by maintaining their precedence.
Below are the steps to implement the above idea:
1. Scan the infix expression from left to right.
2. If the scanned character is an operand, put it in the postfix expression.
3. Otherwise, do the following
If the precedence and associativity of the scanned operator are greater than the
precedence and associativity of the operator in the stack [or the stack is empty or
the stack contains a ‘(‘ ], then push it in the stack. [‘^‘ operator is right associative
and other operators like ‘+‘,’–‘,’*‘ and ‘/‘ are left-associative].
Check especially for a condition when the operator at the top of the stack and the
scanned operator both are ‘^‘. In this condition, the precedence of the scanned
operator is higher due to its right associativity. So it will be pushed into the operator
stack.
In all the other cases when the top of the operator stack is the same as the scanned
operator, then pop the operator from the stack because of left associativity due to
which the scanned operator has less precedence.
Else, Pop all the operators from the stack which are greater than or equal to in
precedence than that of the scanned operator.
After doing that Push the scanned operator to the stack. (If you encounter
parenthesis while popping then stop there and push the scanned operator in the
stack.)
4. If the scanned character is a ‘(‘, push it to the stack.
5. If the scanned character is a ‘)’, pop the stack and output it until a ‘(‘ is encountered,
and discard both the parenthesis.
6. Repeat steps 2-5 until the infix expression is scanned.
7. Once the scanning is over, Pop the stack and add the operators in the postfix
expression until it is not empty.
8. Finally, print the postfix expression.

Illustration:

Follow the below illustration for a better understanding


Consider the infix expression exp = “a+b*c+d”
and the infix expression is scanned using the iterator i, which is initialized
as i = 0.
1st Step: Here i = 0 and exp[i] = ‘a’ i.e., an operand. So add this in the
postfix expression. Therefore, postfix = “a”.
Add ‘a’ in the postfix

2nd Step: Here i = 1 and exp[i] = ‘+’ i.e., an operator. Push this into the
stack. postfix = “a” and stack = {+}.

Push ‘+’ in the stack

3rd Step: Now i = 2 and exp[i] = ‘b’ i.e., an operand. So add this in the
postfix expression. postfix = “ab” and stack = {+}.
Add ‘b’ in the postfix

4th Step: Now i = 3 and exp[i] = ‘*’ i.e., an operator. Push this into the
stack. postfix = “ab” and stack = {+, *}.

Push ‘*’ in the stack

5th Step: Now i = 4 and exp[i] = ‘c’ i.e., an operand. Add this in the postfix
expression. postfix = “abc” and stack = {+, *}.
Add ‘c’ in the postfix

6th Step: Now i = 5 and exp[i] = ‘+’ i.e., an operator. The topmost element
of the stack has higher precedence. So pop until the stack becomes
empty or the top element has less precedence. ‘*’ is popped and added in
postfix. So postfix = “abc*” and stack = {+}.

Pop ‘*’ and add in postfix

Now top element is ‘+‘ that also doesn’t have less precedence. Pop
it. postfix = “abc*+”.
Pop ‘+’ and add it in postfix

Now stack is empty. So push ‘+’ in the stack. stack = {+}.

Push ‘+’ in the stack

7th Step: Now i = 6 and exp[i] = ‘d’ i.e., an operand. Add this in the postfix
expression. postfix = “abc*+d”.
Add ‘d’ in the postfix

Final Step: Now no element is left. So empty the stack and add it in the
postfix expression. postfix = “abc*+d+”.

Pop ‘+’ and add it in postfix

Below is the implementation of the above algorithm:

C
C++14
Java
Python3
C#
Javascript
// C code to convert infix to postfix expression

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_EXPR_SIZE 100

// Function to return precedence of operators


int precedence(char operator)
{
switch (operator) {
case '+':
case '-':
return 1;
case '*':
case '/':
return 2;
case '^':
return 3;
default:
return -1;
}
}

// Function to check if the scanned character


// is an operator
int isOperator(char ch)
{
return (ch == '+' || ch == '-' || ch == '*' || ch == '/'
|| ch == '^');
}

// Main functio to convert infix expression


// to postfix expression
char* infixToPostfix(char* infix)
{
int i, j;
int len = strlen(infix);
char* postfix = (char*)malloc(sizeof(char) * (len + 2));
char stack[MAX_EXPR_SIZE];
int top = -1;

for (i = 0, j = 0; i < len; i++) {


if (infix[i] == ' ' || infix[i] == '\t')
continue;

// If the scanned character is operand


// add it to the postfix expression
if (isalnum(infix[i])) {
postfix[j++] = infix[i];
}

// if the scanned character is '('


// push it in the stack
else if (infix[i] == '(') {
stack[++top] = infix[i];
}

// if the scanned character is ')'


// pop the stack and add it to the
// output string until empty or '(' found
else if (infix[i] == ')') {
while (top > -1 && stack[top] != '(')
postfix[j++] = stack[top--];
if (top > -1 && stack[top] != '(')
return "Invalid Expression";
else
top--;
}

// If the scanned character is an operator


// push it in the stack
else if (isOperator(infix[i])) {
while (top > -1
&& precedence(stack[top])
>= precedence(infix[i]))
postfix[j++] = stack[top--];
stack[++top] = infix[i];
}
}

// Pop all remaining elements from the stack


while (top > -1) {
if (stack[top] == '(') {
return "Invalid Expression";
}
postfix[j++] = stack[top--];
}
postfix[j] = '\0';
return postfix;
}

// Driver code
int main()
{
char infix[MAX_EXPR_SIZE] = "a+b*(c^d-e)^(f+g*h)-i";

// Function call
char* postfix = infixToPostfix(infix);
printf("%s\n", postfix);
free(postfix);
return 0;
}

Output
abcd^e-fgh*+^*+i-
Time Complexity: O(N), where N is the size of the infix expression
Auxiliary Space: O(N), where N is the size of the infix expression

2. Three-Address Code: A statement involving no more than three references(two for


operands and one for result) is known as a three address statement. A sequence of three
address statements is known as a three address code. Three address statement is of form x =
y op z, where x, y, and z will have address (memory location). Sometimes a statement might
contain less than three references but it is still called a three address statement.
Example: The three address code for the expression a + b * c + d : T 1 = b * c T 2 = a + T 1
T 3 = T 2 + d T 1 , T 2 , T 3 are temporary variables.
There are 3 ways to represent a Three-Address Code in compiler design:
i) Quadruples
ii) Triples
iii) Indirect Triples
Three address code is a type of intermediate code which is easy to generate and can
be easily converted to machine code. It makes use of at most three addresses and one
operator to represent an expression and the value computed at each instruction is
stored in temporary variable generated by compiler. The compiler decides the order of
operation given by three address code.

Three address code is used in compiler applications:

Optimization: Three address code is often used as an intermediate representation of


code during optimization phases of the compilation process. The three address code
allows the compiler to analyze the code and perform optimizations that can improve the
performance of the generated code.
Code generation: Three address code can also be used as an intermediate
representation of code during the code generation phase of the compilation process.
The three address code allows the compiler to generate code that is specific to the
target platform, while also ensuring that the generated code is correct and efficient.
Debugging: Three address code can be helpful in debugging the code generated by the
compiler. Since three address code is a low-level language, it is often easier to read and
understand than the final generated code. Developers can use the three address code
to trace the execution of the program and identify errors or issues that may be present.
Language translation: Three address code can also be used to translate code from
one programming language to another. By translating code to a common intermediate
representation, it becomes easier to translate the code to multiple target languages.
General representation –
a = b op c
Where a, b or c represents operands like names, constants or compiler generated
temporaries and op represents the operator
Example-1: Convert the expression a * – (b + c) into three address code.

Example-2: Write three address code for following code


for(i = 1; i<=10; i++)
{
a[i] = x * 5;
}
Implementation of Three Address Code –
There are 3 representations of three address code namely
1. Quadruple
2. Triples
3. Indirect Triples
1. Quadruple – It is a structure which consists of 4 fields namely op, arg1, arg2 and
result. op denotes the operator and arg1 and arg2 denotes the two operands and result
is used to store the result of the expression.
Advantage –
Easy to rearrange code for global optimization.
One can quickly access value of temporary variables using symbol table.
Disadvantage –
Contain lot of temporaries.
Temporary variable creation increases time and space complexity.
Example – Consider expression a = b * – c + b * – c. The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5

2. Triples – This representation doesn’t make use of extra temporary variable to


represent a single operation instead when a reference to another triple’s value is
needed, a pointer to that triple is used. So, it consist of only three fields namely op, arg1
and arg2.
Disadvantage –
Temporaries are implicit and difficult to rearrange code.
It is difficult to optimize because optimization involves moving intermediate code.
When a triple is moved, any other triple referring to it must be updated also. With
help of pointer one can directly access symbol table entry.
Example – Consider expression a = b * – c + b * – c
3. Indirect Triples – This representation makes use of pointer to the listing of all
references to computations which is made separately and stored. Its similar in utility as
compared to quadruple representation but requires less space than it. Temporaries are
implicit and easier to rearrange code.
Example – Consider expression a = b * – c + b * – c
Question – Write quadruple, triples and indirect triples for following expression : (x + y) *
(y + z) + (x + y + z)
Explanation – The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
3. Syntax Tree: A syntax tree is nothing more than a condensed form of a parse tree. The
operator and keyword nodes of the parse tree are moved to their parents and a chain of
single productions is replaced by the single link in the syntax tree the internal nodes are
operators and child nodes are operands. To form a syntax tree put parentheses in the
expression, this way it’s easy to recognize which operand should come first.
Example: x = (a + b * c) / (a – b * c)
Advantages of Intermediate Code Generation:

Easier to implement: Intermediate code generation can simplify the code generation process
by reducing the complexity of the input code, making it easier to implement.
Facilitates code optimization: Intermediate code generation can enable the use of various code
optimization techniques, leading to improved performance and efficiency of the generated code.
Platform independence: Intermediate code is platform-independent, meaning that it can be
translated into machine code or bytecode for any platform.
Code reuse: Intermediate code can be reused in the future to generate code for other platforms
or languages.
Easier debugging: Intermediate code can be easier to debug than machine code or bytecode, as
it is closer to the original source code.

Disadvantages of Intermediate Code Generation:

Increased compilation time: Intermediate code generation can significantly increase the
compilation time, making it less suitable for real-time or time-critical applications.
Additional memory usage: Intermediate code generation requires additional memory to store
the intermediate representation, which can be a concern for memory-limited systems.
Increased complexity: Intermediate code generation can increase the complexity of the
compiler design, making it harder to implement and maintain.
Reduced performance: The process of generating intermediate code can result in code that
executes slower than code generated directly from the source code.

You might also like