Intermediate Code Generation
Intermediate code is generated because the compiler cannot translate the source program into machine code directly in one pass. It therefore first converts the source program into an intermediate form, from which efficient machine code can then be generated. Intermediate code can be represented as postfix notation, a syntax tree, a directed acyclic graph (DAG), three-address code, quadruples, or triples.
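For instance (an illustrative sketch, not tied to any particular compiler), an assignment such as a = b * c + d could be lowered into three-address code using compiler-generated temporaries t1 and t2:

t1 = b * c
t2 = t1 + d
a = t2

Each three-address instruction has at most one operator on its right-hand side, which makes later optimization and target-code generation straightforward.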
If we divide the compiler into two parts, a front end and a back end, then this phase sits between them.
In the analysis-synthesis model of a compiler, the front end translates a source program into a machine-independent intermediate code, and the back end then uses this intermediate code to generate the target code (which the machine can execute). The benefits of using machine-independent intermediate code are:
Portability is enhanced. For example, if a compiler translated the source language directly into a particular target machine language, with no option of generating intermediate code, then a full native compiler would be required for each new machine, because the compiler itself would have to be modified for each machine's specifications.
Retargeting is facilitated: supporting a new target machine requires only a new back end.
It is easier to improve the performance of the generated code by applying machine-independent optimizations to the intermediate code.
If we generated machine code directly from source code, then for n target machines we would need n optimizers and n code generators; with a machine-independent intermediate code, a single optimizer suffices, paired with one code generator per target. Intermediate code can be either language-specific (e.g., bytecode for Java) or language-independent (e.g., three-address code). The following are commonly used intermediate code representations:
Conceptually, with both syntax-directed definitions and translation schemes, we parse the input token stream, build the parse tree, and then traverse the tree as needed to evaluate the semantic rules at the parse-tree nodes. Evaluating the semantic rules may generate code, save information in a symbol table, issue error messages, or perform other activities. The translation of the token stream is the result obtained by evaluating the semantic rules.
Definition
Syntax-Directed Translation (SDT) augments the grammar with rules that facilitate semantic analysis. SDT passes information bottom-up and/or top-down through the parse tree in the form of attributes attached to its nodes. Syntax-directed translation rules use (1) lexical values of nodes, (2) constants, and (3) attributes associated with the non-terminals in their definitions.
The general approach to Syntax-Directed Translation is to construct a parse tree or
syntax tree and compute the values of attributes at the nodes of the tree by visiting them
in some order. In many cases, translation can be done during parsing without building an
explicit tree.
Example
E -> E+T | T
T -> T*F | F
F -> INTLIT
This grammar syntactically validates an expression containing additions and multiplications. To carry out semantic analysis, we augment this grammar with SDT rules in order to pass information up the parse tree and check for semantic errors, if any. In this example we focus on evaluating the given expression, since this very basic example has no other semantic assertions to check.
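Concretely, the augmented rules could look as follows, using a synthesized attribute val (the subscript distinguishes a child occurrence of a symbol from its parent; INTLIT.lexval is the integer value supplied by the lexer):

E -> E1 + T  { E.val = E1.val + T.val }
E -> T       { E.val = T.val }
T -> T1 * F  { T.val = T1.val * F.val }
T -> F       { T.val = F.val }
F -> INTLIT  { F.val = INTLIT.lexval }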
To evaluate the translation rules, we can employ a single depth-first traversal of the parse tree. This is possible only because, for a grammar whose attributes are all synthesized, the SDT rules impose no particular evaluation order beyond requiring that children's attributes be computed before their parents'. Otherwise, we would have to work out a suitable plan for traversing the parse tree and evaluating all the attributes in one or more passes. For better understanding, we compute the translation rules of our example bottom-up, moving left to right.
In the resulting annotated parse tree, information flows bottom-up and all children's attributes are computed before their parents', as discussed above. Right-hand-side occurrences of a symbol are sometimes annotated with a subscript 1 to distinguish the child from the parent.
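Because every attribute here is synthesized, the same evaluation can also be folded directly into parsing, with no explicit tree at all. The following is a minimal sketch (not from the original text) of a recursive-descent evaluator for this grammar in C; the function names parseE, parseT, and parseF are illustrative, and left recursion is replaced by loops:

// Evaluate E -> E+T | T, T -> T*F | F, F -> INTLIT while parsing
#include <ctype.h>
#include <stdio.h>

static const char *p;   // cursor into the input expression

static int parseF(void) {   // F -> INTLIT : F.val = INTLIT.lexval
    int v = 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    return v;
}

static int parseT(void) {   // T -> T * F : T.val = T1.val * F.val
    int v = parseF();
    while (*p == '*') { p++; v *= parseF(); }
    return v;
}

static int parseE(void) {   // E -> E + T : E.val = E1.val + T.val
    int v = parseT();
    while (*p == '+') { p++; v += parseT(); }
    return v;
}

int main(void) {
    p = "2+3*4";
    printf("%d\n", parseE());   // prints 14: val attributes computed bottom-up
    return 0;
}

Each function returns the val attribute of its non-terminal, so the attribute flow mirrors the bottom-up evaluation described above.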
Additional Information
Synthesized Attributes are attributes that depend only on the attribute values of a node's children.
Thus [ E -> E+T { E.val = E.val + T.val } ] has a synthesized attribute val corresponding to node E. If all the semantic attributes in an augmented grammar are synthesized, a single depth-first (post-order) traversal is sufficient for the semantic analysis phase.
Inherited Attributes are attributes that depend on the attributes of a node's parent and/or siblings.
Thus [ Ep -> E+T { Ep.val = E.val + T.val, T.val = Ep.val } ], where E and Ep are the same grammar symbol annotated to distinguish parent from child, has an inherited attribute val corresponding to node T.
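A classic illustration of inherited attributes (not part of the original example) is propagating a declared type down to a list of identifiers; here addtype records an identifier's type in the symbol table:

D -> T L       { L.inh = T.type }
T -> int       { T.type = integer }
T -> float     { T.type = float }
L -> L1 , id   { L1.inh = L.inh ; addtype(id.entry, L.inh) }
L -> id        { addtype(id.entry, L.inh) }

The type computed at T flows down to L and across to each id through the inherited attribute inh.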
We now return to the commonly used intermediate code representations listed earlier:
1. Postfix Notation: Also known as reverse Polish notation or suffix notation. The ordinary (infix) way of writing the sum of a and b places the operator in the middle: a + b. Postfix notation places the operator at the right end, as ab+. In general, if e1 and e2 are any postfix expressions and + is any binary operator, the result of applying + to the values denoted by e1 and e2 is written in postfix notation as e1 e2 +. No parentheses are needed in postfix notation, because the position and arity (number of arguments) of the operators permit only one way to decode a postfix expression. In postfix notation, the operator follows its operands.
Example 1: The postfix representation of the expression (a + b) * c is: ab+c*
Example 2: The postfix representation of the expression (a - b) * (c + d) + (a - b) is: ab-cd+*ab-+
Illustration: converting the infix expression “a+b*c+d” to postfix.
1st Step: i = 0 and exp[i] = ‘a’, i.e., an operand. Add it to the postfix expression. postfix = “a” and stack = {}.
2nd Step: Now i = 1 and exp[i] = ‘+’, i.e., an operator. Push it onto the stack. postfix = “a” and stack = {+}.
3rd Step: Now i = 2 and exp[i] = ‘b’, i.e., an operand. Add it to the postfix expression. postfix = “ab” and stack = {+}.
4th Step: Now i = 3 and exp[i] = ‘*’ i.e., an operator. Push this into the
stack. postfix = “ab” and stack = {+, *}.
5th Step: Now i = 4 and exp[i] = ‘c’, i.e., an operand. Add it to the postfix expression. postfix = “abc” and stack = {+, *}.
6th Step: Now i = 5 and exp[i] = ‘+’, i.e., an operator. The topmost element of the stack, ‘*’, has higher precedence, so pop until the stack becomes empty or the top element has lower precedence. ‘*’ is popped and added to the postfix expression. postfix = “abc*” and stack = {+}.
The top element is now ‘+’, which also does not have lower precedence, so pop it as well and then push the incoming ‘+’. postfix = “abc*+” and stack = {+}.
7th Step: Now i = 6 and exp[i] = ‘d’, i.e., an operand. Add it to the postfix expression. postfix = “abc*+d” and stack = {+}.
Final Step: No characters are left, so pop the remaining operators from the stack and append them to the postfix expression. postfix = “abc*+d+”.
// C code to convert infix to postfix expression
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_EXPR_SIZE 100

// Precedence: ^ highest, then * and /, then + and -
static int prec(char c) { return c == '^' ? 3 : c == '*' || c == '/' ? 2 : c == '+' || c == '-' ? 1 : 0; }

// Convert infix to postfix with an operator stack
char* infixToPostfix(const char* in)
{
    int n = strlen(in), top = -1, k = 0;
    char *st = malloc(n + 1), *out = malloc(n + 1);
    for (int i = 0; i < n; i++) {
        if (isalnum((unsigned char)in[i])) out[k++] = in[i];  // operand goes straight to output
        else if (in[i] == '(') st[++top] = '(';
        else if (in[i] == ')') { while (st[top] != '(') out[k++] = st[top--]; top--; }
        else {  // operator: pop operators of higher or equal precedence, then push
            while (top >= 0 && prec(st[top]) >= prec(in[i])) out[k++] = st[top--];
            st[++top] = in[i];
        }
    }
    while (top >= 0) out[k++] = st[top--];  // flush remaining operators
    out[k] = '\0';
    free(st);
    return out;
}

// Driver code
int main()
{
    char infix[MAX_EXPR_SIZE] = "a+b*(c^d-e)^(f+g*h)-i";
    // Function call
    char* postfix = infixToPostfix(infix);
    printf("%s\n", postfix);
    free(postfix);
    return 0;
}
Output
abcd^e-fgh*+^*+i-
Time Complexity: O(N), where N is the size of the infix expression
Auxiliary Space: O(N), where N is the size of the infix expression
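As a complement (not part of the original listing), the postfix string produced this way can be evaluated in a single left-to-right pass with one operand stack, which is why no parentheses are ever needed. The sketch below assumes single-digit numeric operands:

// Minimal sketch: evaluate a postfix expression of single-digit operands
#include <ctype.h>
#include <stdio.h>

int evalPostfix(const char* s)
{
    int stack[100], top = -1;             // operand stack
    for (int i = 0; s[i]; i++) {
        if (isdigit((unsigned char)s[i])) {
            stack[++top] = s[i] - '0';    // push operand value
        } else {                          // operator: pop two operands, push result
            int b = stack[top--], a = stack[top--];
            switch (s[i]) {
                case '+': stack[++top] = a + b; break;
                case '-': stack[++top] = a - b; break;
                case '*': stack[++top] = a * b; break;
                case '/': stack[++top] = a / b; break;
            }
        }
    }
    return stack[top];                    // final value left on the stack
}

int main(void)
{
    printf("%d\n", evalPostfix("23*4+")); // 2*3 + 4 = 10
    return 0;
}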
Advantages of Intermediate Code Generation:
Easier to implement: Intermediate code generation can simplify the code generation process by reducing the complexity of the input code, making the compiler easier to implement.
Facilitates code optimization: Intermediate code generation can enable the use of various code
optimization techniques, leading to improved performance and efficiency of the generated code.
Platform independence: Intermediate code is platform-independent, meaning that it can be
translated into machine code or bytecode for any platform.
Code reuse: Intermediate code can be reused in the future to generate code for other platforms
or languages.
Easier debugging: Intermediate code can be easier to debug than machine code or bytecode, as
it is closer to the original source code.
Disadvantages of Intermediate Code Generation:
Increased compilation time: Intermediate code generation can significantly increase compilation time, making it less suitable for real-time or time-critical applications.
Additional memory usage: Intermediate code generation requires additional memory to store
the intermediate representation, which can be a concern for memory-limited systems.
Increased complexity: Intermediate code generation can increase the complexity of the
compiler design, making it harder to implement and maintain.
Reduced performance: The process of generating intermediate code can result in code that
executes slower than code generated directly from the source code.