COURSE OBJECTIVES:
To understand and list the different stages in the process of compilation.
To understand and design lexical analyzers and parsers.
To develop algorithms to generate code for a target machine.
To learn and develop techniques for optimization of code.
COURSE OUTCOMES:
Upon completing the course, students will be able to:
Understand the complete process of compilation from source code to target code
Develop lexical analyzers and parsers
Develop algorithms to generate code for a target machine
Optimize the generated code
UNIT 1: Introduction (9)
Introduction - What is a Compiler? - Cousins of a Compiler - Assembler - Interpreter - Phases of compilation and overview, Lexical Analysis, Syntax Analysis, Semantic Analysis, Intermediate Code Generation, Code Optimization, Code Generation - Specification of Tokens.
TEXT BOOKS:
1. Alfred Aho, Ravi Sethi, Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, Pearson Education Asia, 2nd Edition, 2017.
REFERENCES:
1. Keith Cooper and Linda Torczon, Engineering a Compiler, 2nd Edition, 2016.
2. Bennet, J.P., Introduction to Compiler Techniques, Tata McGraw-Hill, 2015.
3. John R. Levine, Tony Mason, Doug Brown, Lex & Yacc, 2nd Edition (October 2012), O'Reilly & Associates.
4. Kenneth C. Louden, Compiler Construction: Principles and Practice, Thomson Learning, 2018.
WEBSITES:
1. http://www.tenouk.com/ModuleW.html/
2. http://www.mactech.com/articles/mactech/Vol.06/06.04/LexicalAnalysis/index.html
LECTURE PLAN

S.NO | DESCRIPTION OF PORTION TO BE COVERED | HOURS | Reference Book & Page Nos. used for teaching | TEACHING AIDS
1  | Discussion on the fundamentals of compilers | 1 | R[1] Page no 17-33 | PPT
2  | Introduction to types of compilers - uses of compilers | 1 | R[1] Page no 12-14 | PPT
3  | Compilers - analysis of the source program | 1 | R[2] Page no 1.1-1.3; R[1] Page no 1-3 | BB
5  | Tutorial: Compilers - phases of compiler | 1 | R[2] Page no 1.1-1.3; R[1] Page no 1-3 | PPT
10 | Input buffering - tokens specification | 1 | R[2] Page no 1.28-1.29 | PPT
13 | Tutorial hour - context-free grammars, top-down parsing | 1 | R[2] Page no 197-206 | PPT
14 | Recursive-descent parser, predictive parser | 1 | R[1] Page no 217-231; R[2] Page no 2.16-2.19 | BB
16 | Bottom-up parsing: shift-reduce parsing, operator-precedence parsing | 1 | R[2] Page no 2.26-2.37; R[1] Page no 233-240 | PPT
17 | Tutorial: Bottom-up parsing, shift-reduce parsing | 1 | R[2] Page no 2.26-2.37; R[1] Page no 233-240 | BB
22 | Declarations, assignment statements, procedure calls | 1 | R[1] Page no 370-379; R[2] Page no 3.13-3.14 | BB
28 |  | 1 | R[2] Page no 3.41-3.45 | PPT
36 | Tutorial: The DAG representation of basic blocks | 1 | R[2] Page no 4.10-4.14 | PPT
37 | Peephole optimization | 1 | R[1] Page no 549-553; R[2] Page no 4.22-4.24 | BB
41 | Tutorial: Code optimization | 1 | R[2] Page no 5.3-5.11 | BB
42 | Run-time environment - storage organization | 1 | R[2] Page no 5.20 | PPT
44 | Storage allocation strategies | 1 | R[2] Page no 5.27-5.31 | BB
45 | Tutorial: Storage organization | 1 | R[2] Page no 5.27-5.31 | BB
REFERENCE BOOKS:
1. Alfred Aho, Ravi Sethi, Jeffrey D. Ullman, 2006, Compilers: Principles, Techniques and Tools, 2nd Edition, Pearson Education Asia.
2. P. Kalaiselvi, A.A.R. Senthikumaar, 2008, Principles of Compiler Design, 3rd Edition, Charulatha Publications, India.
3. Allen I. Holub, 2003, Compiler Design in C, 4th Edition, Prentice Hall of India.
4. Fischer, C.N. and R.J. LeBlanc, 2003, Crafting a Compiler with C, 3rd Edition, Benjamin Cummings.
LECTURE NOTES
CHAPTER I- LEXICAL ANALYSIS
1.1 INTRODUCTION TO COMPILING
Translator:
It is a program that translates one language to another.
Figure 1.1: Translator
Types of Translator:
1. Interpreter
2. Compiler
3. Assembler
1. Interpreter:
It translates a high-level language program into a low-level form and executes it one statement at a time, reporting an error as soon as it is encountered.
2. Assembler:
It translates assembly language to machine code.
assembly language → Assembler → machine code
Figure 1.3: Assembler
Example: assemblers for the 8085 and 8086 microprocessors.
3. Compiler:
It is a program that translates one language (source code) to another language (target code).
source code → Compiler → target code
Figure 1.4: Compiler
It translates the whole program at once and then reports all the errors found.
Example: C, C++, COBOL, higher version of Pascal.
The analysis part breaks down the source program into constituent pieces and creates an intermediate representation of the source program.
The synthesis part constructs the desired target program from the intermediate representation.
source code → Analysis → intermediate code → Synthesis → object code
Figure 1.5: Parts of Compilation
Linear/Lexical Analysis:
It is also called scanning. It is the process of reading the characters from left to right and grouping them into tokens having a collective meaning. For example, in the assignment statement a=b+c*2, the characters would be grouped into the following tokens:
i) The identifier 'a'
ii) The assignment symbol (=)
iii) The identifier 'b'
iv) The plus sign (+)
v) The identifier 'c'
vi) The multiplication sign (*)
vii) The constant '2'
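As a rough illustration of this grouping (a hedged sketch, not code from the text), a hand-written scanner for such statements could be written in C as follows; the token names IDENT, NUMBER and OP are invented for the example:

    #include <ctype.h>
    #include <stdio.h>

    /* Minimal scanner sketch: splits "a=b+c*2" into the seven tokens
       listed above. Token names are illustrative only. */
    int main(void) {
        const char *p = "a=b+c*2";
        char lexeme[32];
        while (*p) {
            int n = 0;
            if (isalpha((unsigned char)*p)) {          /* identifier */
                while (isalnum((unsigned char)*p)) lexeme[n++] = *p++;
                lexeme[n] = '\0';
                printf("IDENT(%s)\n", lexeme);
            } else if (isdigit((unsigned char)*p)) {   /* constant */
                while (isdigit((unsigned char)*p)) lexeme[n++] = *p++;
                lexeme[n] = '\0';
                printf("NUMBER(%s)\n", lexeme);
            } else {                                   /* operator or symbol */
                printf("OP(%c)\n", *p++);
            }
        }
        return 0;
    }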
Syntax Analysis :
It is called parsing or hierarchical analysis. It involves grouping the tokens of the source
program into grammatical phrases that are used by the compiler to synthesize output. They are
represented using a syntax tree as shown below:
        =
       / \
      a   +
         / \
        b   *
           / \
          c   2
Figure 1.6: Syntax tree for a = b + c * 2
A syntax tree is the tree generated as a result of syntax analysis, in which the interior nodes are the operators and the exterior nodes are the operands. This phase reports an error when the syntax is incorrect.
Semantic Analysis :
It checks the source programs for semantic errors and gathers type information for the
subsequent code generation phase. It uses the syntax tree to identify the operators and
operands of statements. An important component of semantic analysis is type checking.
Here the compiler checks that each operator has operands that are permitted by the source
language specification.
A Compiler operates in phases, each of which transforms the source program from
one representation into another. The following are the phases of the compiler:
Main phases:
1) Lexical analysis
2) Syntax analysis
3) Semantic analysis
4) Intermediate code generation
5) Code optimization
6) Code generation
Sub-phases:
1) Symbol table management
2) Error handling
LEXICAL ANALYSIS:
It is the first phase of the compiler. It gets input from the source program and produces
tokens as output. It reads the characters one by one, starting from left to right and forms the
tokens.
Token: It represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols, etc.
Example: a + b = 20
Here a, b, +, = and 20 are all separate tokens.
The group of characters forming a token is called a lexeme.
The lexical analyser not only generates a token but also enters the lexeme into the
symbol table if it is not already there.
SYNTAX ANALYSIS:
It is the second phase of the compiler. It is also known as parser. It gets the token stream as
input from the lexical analyser of the compiler and generates syntax tree as the output.
Syntax tree: It is a tree in which interior nodes are operators and exterior nodes are operands.
Example: For a=b+c*2, syntax tree is
        =
       / \
      a   +
         / \
        b   *
           / \
          c   2
Figure 1.8: Syntax Tree
SEMANTIC ANALYSIS:
It is the third phase of the compiler. It gets the parse tree from syntax analysis and checks whether the program is semantically correct. Where the language rules require it, this phase inserts type conversions, e.g., converting an integer constant to real (inttoreal) when it is used in a real-valued expression.
INTERMEDIATE CODE GENERATION:
It is the fourth phase of the compiler. It takes the checked parse tree as input and produces an intermediate representation, such as three-address code, that is easy to generate and easy to translate into the target program.
CODE OPTIMIZATION:
It is the fifth phase of the compiler. It gets the intermediate code as input and produces
optimized intermediate code as output. This phase reduces the redundant code and attempts to
improve the intermediate code so that faster-running machine code will result. During the code
optimization, the result of the program is not affected. To improve the code generation, the
optimization involves,
- detection and removal of dead code (unreachable code).
- calculation of constants in expressions and terms (constant folding).
- collapsing of repeated (common) subexpressions into a single temporary.
- loop unrolling.
- moving invariant code outside the loop.
- removal of unwanted temporary variables.
CODE GENERATION:
It is the final phase of the compiler. It gets input from the code optimization phase and produces the target code or object code as the result. Intermediate instructions are translated into a sequence of machine instructions that perform the same task. The code generation involves
- allocation of register and memory
- generation of correct references
- generation of correct data types
- generation of missing code
SYMBOL TABLE MANAGEMENT:
A symbol table is used to store all the information about identifiers used in the program. It is a data structure containing a record for each identifier, with fields for the attributes of the identifier. It allows the compiler to find the record for each identifier quickly and to store or retrieve data from that record. Whenever an identifier is detected in any of the phases, it is stored in the symbol table.
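A possible layout of such a record in C (a sketch; the field names and the fixed-size array are assumptions, not taken from the text):

    #include <string.h>

    /* Illustrative symbol-table record and lookup-or-insert routine. */
    struct symbol {
        char name[32];   /* the lexeme */
        int  type;       /* an integer code for the identifier's type */
        int  offset;     /* relative address within its data area */
    };

    #define MAXSYM 256
    static struct symbol table[MAXSYM];
    static int nsym = 0;

    /* Return the index of name, inserting a new record if necessary. */
    int lookup_or_insert(const char *name) {
        for (int i = 0; i < nsym; i++)
            if (strcmp(table[i].name, name) == 0)
                return i;
        strncpy(table[nsym].name, name, sizeof table[nsym].name - 1);
        return nsym++;
    }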
ERROR HANDLING:
Each phase can encounter errors. After detecting an error, a phase must handle the error so that compilation can proceed. In lexical analysis, errors occur in the separation of tokens. In syntax analysis, errors occur during the construction of the syntax tree. In semantic analysis, errors occur when the compiler detects constructs with the right syntactic structure but no meaning, and during type conversion. In code optimization, errors occur when the result is affected by the optimization. In code generation, an error is shown when code is missing, etc.
To illustrate the translation of source code through each phase, consider the statement
a=b+c*2. The figure shows the representation of this statement after each phase:
a = b + c * 2
        | Lexical analyser
        v
id1 = id2 + id3 * 2
        | Syntax analyser
        v
        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   2
        | Semantic analyser
        v
        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   inttoreal
                  |
                  2
        | Intermediate code generator
        v
temp1 := inttoreal(2)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
        | Code optimizer
        v
temp1 := id3 * 2.0
id1 := id2 + temp1
        | Code generator
        v
MOVF id3, R2
MULF #2.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Symbol Table:
1  a  (id1)
2  b  (id2)
3  c  (id3)
PREPROCESSOR
A preprocessor is a program that processes its input data to produce output that is used as
input to another program. The output is said to be a preprocessed form of the input data, which
is often used by some subsequent programs like compilers.
They may perform the following functions :
1. Macro processing
2. File Inclusion
3. Rational Preprocessors
4. Language extension
1. Macro processing:
A macro is a rule or pattern that specifies how a certain input sequence should be
mapped to an output sequence according to a defined procedure. The mapping process that
instantiates a macro into a specific output sequence is known as macro expansion.
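For instance, the C preprocessor performs macro expansion on definitions such as the following (a small illustrative example, not from the text):

    /* A macro definition and one expansion performed by the preprocessor. */
    #define SQUARE(x) ((x) * (x))

    int area = SQUARE(5);   /* expands to: int area = ((5) * (5)); */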
2. File Inclusion:
Preprocessor includes header files into the program text. When the preprocessor finds
an #include directive it replaces it by the entire content of the specified file.
3. Rational Preprocessors:
These processors augment older languages with more modern flow-of-control and data-structuring facilities.
4. Language extension :
These processors attempt to add capabilities to the language by what amounts to built-in
macros. For example, the language Equel is a database query language embedded in C.
ASSEMBLER
Assembler creates object code by translating assembly instruction mnemonics
into machine code. There are two types of assemblers:
One-pass assemblers go through the source code once and assume that all symbols
will be defined before any instruction that references them.
Two-pass assemblers create a table with all symbols and their values in the first pass,
and then use the table in a second pass to generate code.
Compiler passes
A collection of phases may be performed only once (single pass) or multiple times (multi-pass):
Single pass: usually requires everything to be defined before being used in
source program.
Multi pass: compiler may have to keep entire program representation in memory.
Several phases can be grouped into one single pass and the activities of these phases are
interleaved during the pass. For example, lexical analysis, syntax analysis, semantic analysis
and intermediate code generation might be grouped into one pass.
COMPILER CONSTRUCTION TOOLS
These are specialized tools that have been developed to help implement the various phases of a compiler. The following are the compiler construction tools:
1) Parser Generators:
-These produce syntax analyzers, normally from input that is based on a context-free
grammar.
-Syntax analysis consumes a large fraction of the running time of a compiler.
-Example: YACC (Yet Another Compiler-Compiler).
2) Scanner Generators:
-These generate lexical analyzers, normally from a specification based on regular expressions.
-The basic organization of the resulting lexical analyzers is based on finite automata.
3) Syntax-Directed Translation:
-These produce routines that walk the parse tree and as a result generate intermediate
code. -Each translation is defined in terms of translations at its neighbor nodes in the tree.
4) Automatic Code Generators:
-These take a collection of rules that define the translation of each operation of the intermediate language into the machine language for the target machine.
5) Data-Flow Engines:
-It does code optimization using data-flow analysis, that is, the gathering of information
about how values are transmitted from one part of a program to each other part.
source program → lexical analyser → (token) → parser
(the parser requests each token with a "get next token" command; both the lexical analyser and the parser consult the symbol table)
Figure 1.10: Role of the Lexical Analyzer
Upon receiving a “get next token” command from the parser, the lexical analyzer
reads input characters until it can identify the next token.
1.7.3 TOKENS
A token is a string of characters, categorized according to the rules as a symbol (e.g.,
IDENTIFIER, NUMBER, COMMA). The process of forming tokens from an input stream of
characters is called tokenization.
A token can look like anything that is useful for processing an input text stream or text file. Consider this expression in the C programming language: sum = 3 + 2;
Table 1.1: Tokens for sum = 3 + 2;

Lexeme | Token
sum    | identifier
=      | assignment operator
3      | number
+      | addition operator
2      | number
;      | end of statement

LEXEME:
A lexeme is the sequence of characters in the source program that is matched by the pattern for a token.

PATTERN:
A pattern is a description of the form that the lexemes of a token may take.In the case of
a keyword as a token, the pattern is just the sequence of characters that form the keyword. For
identifiers and some other tokens, the pattern is a more complex structure that is matched by
many strings.
Lexical errors can be handled by the following actions:
1) Deleting an extraneous character.
2) Inserting a missing character.
3) Replacing an incorrect character by a correct character.
4) Transposing two adjacent characters.
5) Panic mode recovery: deletion of successive characters from the token until the error is resolved.
INPUT BUFFERING
We introduce a two-buffer scheme that handles large lookaheads safely. We then consider an improvement involving "sentinels" that saves time checking for the ends of buffers.
Figure: Input buffer pairs; the lexeme_beginning and forward pointers scan input such as A = B + C
Each buffer is of the same size N, and N is usually the number of characters on one
disk block. E.g., 1024 or 4096 bytes. Using one system read command we can read N
characters into a buffer. If fewer than N characters remain in the input file, then a special
character, represented by eof, marks the end of the source file. Two pointers to the input are
maintained:
1. Pointer lexeme_beginning, marks the beginning of the current lexeme,
whose extent we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found.
Once the next lexeme is determined, forward is set to the character at its
right end.
The string of characters between the two pointers is the current lexeme. After
the lexeme is recorded as an attribute value of a token returned to the parser,
lexeme_beginning is set to the character immediately after the lexeme just found.
SENTINELS
For each character read, we make two tests: one for the end of the buffer, and one to
determine what character is read. We can combine the buffer-end test with the test for the
current character if we extend each buffer to hold a sentinel character at the end. The sentinel is
a special character that cannot be part of the source program, and a natural choice is the
character eof.
21
Figure 1.13: Sentinels; the lexeme_beginning and forward pointers into the buffer pair
Note that eof retains its use as a marker for the end of the entire input. Any eof that
appears other than at the end of a buffer means that the input is at an end.
forward := forward + 1;
if forward↑ = eof then begin
    if forward at end of first half then begin
        reload second half;
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half
    end
    else /* eof within a buffer signifying end of input */
        terminate lexical analysis
end
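A C rendering of this sentinel scheme might look as follows; the buffer size, the helper names and the use of '\0' as the sentinel are assumptions made for the sketch, and the lexeme_beginning pointer is omitted:

    #include <stdio.h>

    #define N 4096            /* characters per half, e.g., one disk block */
    #define SENTINEL '\0'     /* stands in for the eof character of the text */

    static char buf[2 * N + 2];   /* two halves, each followed by a sentinel */
    static char *forward = buf;
    static FILE *src;

    /* Refill one half and terminate it with the sentinel. If fewer than N
       characters remain, the sentinel lands inside the half, marking the
       real end of input. */
    static void reload(char *half) {
        size_t n = fread(half, 1, N, src);
        half[n] = SENTINEL;
    }

    void init_buffer(FILE *f) { src = f; reload(buf); forward = buf; }

    /* Advance forward by one character, crossing halves as needed. */
    int next_char(void) {
        char c = *forward++;
        if (c != SENTINEL) return c;
        if (forward == buf + N + 1) {        /* sentinel ending first half */
            reload(buf + N + 1);             /* forward now at second half */
            return next_char();
        }
        if (forward == buf + 2 * N + 2) {    /* sentinel ending second half */
            reload(buf);
            forward = buf;
            return next_char();
        }
        return EOF;                          /* eof within a half: input done */
    }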
STRINGS AND LANGUAGES
A string over an alphabet is a finite sequence of symbols drawn from that alphabet.
In language theory, the terms "sentence" and "word" are often used as synonyms for
"string." The length of a string s, usually written |s|, is the number of occurrences of symbols in s.
For example, banana is a string of length six. The empty string, denoted ε, is the string of length
zero.
Operations on strings
The following string-related terms are commonly used:
1. A prefix of string s is any string obtained by removing zero or more symbols from the end of s. For example, ban is a prefix of banana.
2. A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. For example, nana is a suffix of banana.
3. A substring of s is obtained by deleting any prefix and any suffix from s. For example, nan is a substring of banana.
4. The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively, of s that are not ε and not equal to s itself.
5. A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s. For example, baan is a subsequence of banana.
Operations on languages:
The following are the operations that can be applied to languages:
1.Union
2.Concatenation
3.Kleene closure
4.Positive closure
The following example shows the operations on languages: Let L = {0, 1} and S = {a, b, c}.
1. Union: L ∪ S = {0, 1, a, b, c}
2. Concatenation: L.S = {0a, 1a, 0b, 1b, 0c, 1c}
3. Kleene closure: L* = {ε, 0, 1, 00, 01, ...}
4. Positive closure: L+ = {0, 1, 00, 01, ...}
Regular Expressions
Each regular expression r denotes a language L(r).Here are the rules that define the
regular expressions over some alphabet Σ and the languages that those expressions denote:
1. ε is a regular expression, and L(ε) is { ε }, that is, the language whose sole member is the empty string.
2. If 'a' is a symbol in Σ, then 'a' is a regular expression, and L(a) = {a}, that is, the language with one string, of length one, with 'a' in its one position.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then,
   (a) (r)|(s) is a regular expression denoting L(r) ∪ L(s),
   (b) (r)(s) is a regular expression denoting L(r)L(s),
   (c) (r)* is a regular expression denoting (L(r))*, and
   (d) (r) is a regular expression denoting L(r).
Regular set:
A language denoted by a regular expression is said to be a regular set.
There are a number of algebraic laws for regular expressions that can be used to manipulate them into equivalent forms.
For instance, r|s = s|r (| is commutative); r|(s|t) = (r|s)|t (| is associative).
Regular Definitions
Giving names to regular expressions is referred to as a regular definition: a sequence of definitions of the form
d1 → r1
d2 → r2
………
dn → rn
where each di is a distinct name and each ri is a regular expression over the alphabet extended with the earlier names d1, … , di-1.
Example: Identifiers is the set of strings of letters and digits beginning with a letter. Regular
definition for this set:
letter → A | B | …. | Z | a | b | …. | z
digit → 0 | 1 | …. | 9
id → letter ( letter | digit ) *
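In Lex notation (introduced later in this chapter), this regular definition might be written as follows; the action is an assumption for illustration:

    letter  [A-Za-z]
    digit   [0-9]
    %%
    {letter}({letter}|{digit})*   { printf("id\n"); }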
Shorthands
Certain constructs occur so frequently in regular expressions that it is convenient to introduce notational shorthands for them:
1. One or more instances: If r is a regular expression, then (r)+ is a regular expression that denotes the language (L(r))+.
2. Zero or one instance: If r is a regular expression, then (r)? is a regular expression that denotes the language L(r) ∪ { ε }.
3. Character Classes:
- The notation [abc] where a, b and c are alphabet symbols denotes the regular expression
a | b | c.
Non-regular Sets
Some languages cannot be described by any regular expression. For example, the set of all strings of balanced parentheses cannot be described by a regular expression; it can, however, be specified by a context-free grammar.
RECOGNITION OF TOKENS
Consider the following grammar fragment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
where the terminals if , then, else, relop, id and num generate sets of strings given by the
following regular definitions:
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
id → letter ( letter | digit )*
num → digit+ (. digit+)? (E (+|-)? digit+)?
For this language fragment the lexical analyzer will recognize the keywords if, then, else,
as well as the lexemes denoted by relop, id, and num. To simplify matters, we assume keywords
are reserved; that is, they cannot be used as identifiers.
Transition diagrams
It is a diagrammatic representation to depict the action that will take place when a lexical
analyzer is called by the parser to get the next token. It is used to keep track of information about
the characters that are seen as the forward pointer scans the input.
LEX
Lex is a computer program that generates lexical analyzers. Lex is commonly used with
the yacc parser generator.
lex.l → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
{ definitions }
%%
{ rules }
%%
{ user subroutines }
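A complete program in this format might look as follows; this is a hedged sketch that merely labels identifiers, numbers and other characters, not an example from the text:

    %{
    /* Minimal Lex example: classifies identifiers, integers and operators. */
    #include <stdio.h>
    %}
    letter  [A-Za-z]
    digit   [0-9]
    %%
    {letter}({letter}|{digit})*   { printf("id: %s\n", yytext); }
    {digit}+                      { printf("num: %s\n", yytext); }
    [ \t\n]                       ;  /* skip whitespace */
    .                             { printf("op: %s\n", yytext); }
    %%
    int main(void) { yylex(); return 0; }
    int yywrap(void) { return 1; }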
FINITE AUTOMATA
A finite automaton is a mathematical model that consists of a set of states and labeled edges between them. It is a transition diagram that recognizes a regular expression or grammar.
q0 – starting state
fn – final state
The following steps are involved in the construction of a DFA from a regular expression:
i) Convert the RE to an NFA using Thompson's rules
ii) Convert the NFA to a DFA (subset construction)
iii) Construct the minimized DFA
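To illustrate the end product of these steps, the DFA for the regular expression (a|b)*abb can be simulated with a transition table; this C sketch is an illustration, not part of the text:

    #include <stdio.h>

    /* DFA for (a|b)*abb: states 0..3, state 3 accepting.
       move[s][0] is the transition from state s on 'a', move[s][1] on 'b'. */
    static const int move[4][2] = {
        {1, 0},   /* state 0 */
        {1, 2},   /* state 1 */
        {1, 3},   /* state 2 */
        {1, 0}    /* state 3 */
    };

    int matches(const char *s) {
        int state = 0;
        for (; *s; s++) {
            if (*s != 'a' && *s != 'b') return 0;
            state = move[state][*s - 'a'];
        }
        return state == 3;
    }

    int main(void) {
        printf("%d\n", matches("abaabb"));  /* prints 1 */
        printf("%d\n", matches("abab"));    /* prints 0 */
        return 0;
    }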
UNIT-II
CHAPTER II - SYNTAX ANALYSIS
Syntax analysis is the second phase of the compiler. It gets the input from the tokens and
generates a syntax tree or parse tree.
The parser or syntactic analyzer obtains a string of tokens from the lexical analyzer and
verifies that the string can be generated by the grammar for the source language. It reports any
syntax errors in the program. It also recovers from commonly occurring errors so that it can
continue processing its input.
Figure: Position of the parser in the compiler model (the lexical analyser and the parser both consult the symbol table)

The parser cannot, however, detect issues such as:
1. Variable re-declaration
2. Variable initialization before use
3. Data type mismatch for an operation
The above issues are handled by the Semantic Analysis phase.
The different strategies that a parse uses to recover from a syntactic error are:
1. Panic mode
2. Phrase level
3. Error productions
4. Global correction
Panic mode recovery:
On discovering an error, the parser discards input symbols one at a time until a
synchronizing token is found. The synchronizing tokens are usually delimiters, such as
semicolon or end. It has the advantage of simplicity and does not go into an infinite loop. When
multiple errors in the same statement are rare, this method is quite useful.
Phrase level recovery:
On discovering an error, the parser performs local correction on the remaining input that
allows it to continue. Example: Insert a missing semicolon or delete an extraneous semicolon etc.
Error productions:
The parser is constructed using augmented grammar with error productions. If an error
production is used by the parser, appropriate error diagnostics can be generated to indicate the
erroneous constructs recognized by the input.
Global correction:
Given an incorrect input string x and grammar G, certain algorithms can be used to find a
parse tree for a string y, such that the number of insertions, deletions and changes of tokens is as
small as possible. However, these methods are in general too costly in terms of time and space.
2.2 CONTEXT-FREE GRAMMARS
A context-free grammar consists of terminals, non-terminals, a start symbol, and productions.
Terminals: These are the basic symbols from which strings are formed.
Non-Terminals: These are the syntactic variables that denote a set of strings. These help to
define the language generated by the grammar.
Start Symbol : One non-terminal in the grammar is denoted as the “Start-symbol” and the set of
strings it denotes is the language defined by the grammar.
Productions : It specifies the manner in which terminals and non-terminals can be combined to
form strings. Each production consists of a non-terminal, followed by an arrow, followed by a
string of non-terminals and terminals.
Example of context-free grammar: The following grammar defines simple arithmetic
expressions:
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
In this grammar,
id + - * / ↑ ( ) are terminals.
expr , op are non-terminals.
expr is the start symbol.
Each line is a production.
2.2.1 Derivations:
Derivation is a process that generates a valid string with the help of grammar by replacing the
non-terminals on the left with the string on the right side of the production.
To generate a valid string -(id+id) from the grammar, the steps are
E ⇒ -E
  ⇒ -(E)
  ⇒ -(E+E)
  ⇒ -(id+E)
  ⇒ -(id+id)
In the above derivation,
E is the start symbol.
-(id+id) is the required sentence (only terminals).
Strings such as E, -E, -(E), . . . are called sentential forms.
Types of derivations:
In leftmost derivations, the leftmost non-terminal in each sentential form is always chosen first for replacement.
In rightmost derivations, the rightmost non-terminal in each sentential form is always chosen first for replacement.
Example:
Leftmost:           Rightmost:
E ⇒ -E              E ⇒ -E
  ⇒ -(E)              ⇒ -(E)
  ⇒ -(E+E)            ⇒ -(E+E)
  ⇒ -(id+E)           ⇒ -(E+id)
  ⇒ -(id+id)          ⇒ -(id+id)
Strings that appear in a leftmost derivation are called left sentential forms.
Strings that appear in a rightmost derivation are called right sentential forms.
Parse trees:
Each interior node of a parse tree is a non-terminal. The children of a node can be terminals or non-terminals, read from left to right. The sentential form consisting of the leaves of the parse tree is called the yield or frontier of the tree.
Ambiguity:
A grammar that produces more than one parse tree for some sentence is said to be an ambiguous grammar.
The sentence id+id*id has the following two distinct leftmost derivations:
E ⇒ E + E             E ⇒ E * E
  ⇒ id + E              ⇒ E + E * E
  ⇒ id + E * E          ⇒ id + E * E
  ⇒ id + id * E         ⇒ id + id * E
  ⇒ id + id * id        ⇒ id + id * id

The two corresponding parse trees:

       E                     E
     / | \                 / | \
    E  +  E               E  *  E
    |   / | \           / | \   |
   id  E  *  E         E  +  E  id
       |     |         |     |
      id    id        id    id
Transition diagrams vs. context-free grammars: a transition diagram has a set of states and edges, while a context-free grammar has a set of productions. Transition diagrams (regular expressions) are useful for describing the structure of lexical constructs such as identifiers, constants, keywords, and so forth; context-free grammars are useful in describing nested structures such as balanced parentheses, matching begin-end's, and so on.
The lexical rules of a language are simple, and regular expressions suffice to describe them.
Regular expressions provide a more concise and easier-to-understand notation for tokens than grammars.
More efficient lexical analyzers can be constructed automatically from regular expressions than from arbitrary grammars.
Separating the syntactic structure of a language into lexical and nonlexical parts provides
a convenient way of modularizing the front end into two manageable-sized components.
Eliminating ambiguity:
An ambiguous grammar (one that produces more than one parse tree under leftmost or rightmost derivation) can often be disambiguated by rewriting the grammar.
Consider this example, G: stmt → if expr then stmt | if expr then stmt else stmt | other
This grammar is ambiguous since the string if E1 then if E2 then S1 else S2 has the following
two parse trees for leftmost derivation :
Parse tree 1 associates the else with the inner if:
if E1 then (if E2 then S1 else S2)
Parse tree 2 associates the else with the outer if:
if E1 then (if E2 then S1) else S2
Eliminating left recursion:
A left-recursive pair of productions A → Aα | β can be replaced by the non-left-recursive productions
A → βA'
A' → αA' | ε
Consider the following grammar for arithmetic expressions:
E → E+T | T
T → T*F | F
F → (E) | id
First eliminate the left recursion for E → E+T | T as
E → TE'
E' → +TE' | ε
Then eliminate the left recursion for T → T*F | F as
T → FT'
T' → *FT' | ε
Thus the obtained grammar after eliminating left recursion is
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Left factoring:
If two alternatives for A share a common prefix, A → αβ1 | αβ2, they can be left-factored as
A → αA'
A' → β1 | β2
Example: Consider the dangling-else grammar S → iEtS | iEtSeS | a, E → b (where i, t, e abbreviate if, then, else). Left-factored, it becomes
S → iEtSS' | a
S' → eS | ε
E → b
PARSING
Parsing is the process of determining whether a string of tokens can be generated by a grammar.
Parse tree: It pictorially shows how the start symbol of the grammar derives the string.
Types of parsing:
Top–down parsing : A parser can start with the start symbol and try to transform it to the
input string.
Example : LL Parsers.
Bottom–up parsing : A parser can start with input and attempt to rewrite it into the start
symbol.
Example : LR Parsers.
Recursive descent parsing is one of the top-down parsing techniques that uses a set of
recursive procedures to scan its input.
This parsing method may involve backtracking, that is, making repeated scans of the
input.
Example: Consider the grammar S → cAd, A → ab | a and the input string w = cad.
Step 1:
Initially create a tree with a single node labeled S. An input pointer points to 'c', the first symbol of w. Expand the tree with the production of S.
    S
  / | \
 c  A  d
Step 2:
The leftmost leaf 'c' matches the first symbol of w, so advance the input pointer to the second symbol of w, 'a', and consider the next leaf 'A'. Expand A using the first alternative.
    S
  / | \
 c  A  d
   / \
  a   b
Step 3:
The second symbol 'a' of w also matches the second leaf of the tree. So advance the input pointer to the third symbol of w, 'd'. But the third leaf of the tree is b, which does not match the input symbol d.
Hence discard the chosen production and reset the pointer to the second position. This is called backtracking.
Step 4:
Now try the second alternative for A.
    S
  / | \
 c  A  d
    |
    a
The leaf d now matches the remaining input symbol, so the parse succeeds.
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop. Hence,
elimination of left-recursion must be done before parsing.
E → E+T | T
T → T*F | F
F → (E) | id
becomes
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Recursive procedures:
Procedure E( )
begin
    T( );
    EPRIME( );
end

Procedure EPRIME( )
begin
    If input_symbol = '+' then begin
        ADVANCE( );
        T( );
        EPRIME( );
    end
end

Procedure T( )
begin
    F( );
    TPRIME( );
end

Procedure TPRIME( )
begin
    If input_symbol = '*' then begin
        ADVANCE( );
        F( );
        TPRIME( );
    end
end

Procedure F( )
begin
    If input_symbol = 'id' then
        ADVANCE( )
    else if input_symbol = '(' then begin
        ADVANCE( );
        E( );
        if input_symbol = ')' then
            ADVANCE( )
        else ERROR( );
    end
    else ERROR( );
end
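A runnable C version of these procedures is sketched below; treating any single letter as id and reading the input from a string are simplifying assumptions:

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Recursive-descent parser for E -> T E', E' -> + T E' | e,
       T -> F T', T' -> * F T' | e, F -> ( E ) | id,
       where any single letter counts as id. */
    static const char *ip;   /* input pointer */

    static void E(void);
    static void error(void) { printf("syntax error\n"); exit(1); }
    static void advance(void) { ip++; }

    static void F(void) {
        if (isalpha((unsigned char)*ip)) advance();
        else if (*ip == '(') {
            advance();
            E();
            if (*ip == ')') advance(); else error();
        } else error();
    }

    static void Tprime(void) {             /* T' -> * F T' | e */
        if (*ip == '*') { advance(); F(); Tprime(); }
    }

    static void T(void) { F(); Tprime(); }

    static void Eprime(void) {             /* E' -> + T E' | e */
        if (*ip == '+') { advance(); T(); Eprime(); }
    }

    static void E(void) { T(); Eprime(); }

    int main(void) {
        ip = "a+b*c";
        E();
        if (*ip == '\0') printf("accepted\n"); else error();
        return 0;
    }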
Stack implementation:

PROCEDURE CALL    REMAINING INPUT
E( )              id+id*id
T( )              id+id*id
F( )              id+id*id
ADVANCE( )        +id*id
TPRIME( )         +id*id
EPRIME( )         +id*id
ADVANCE( )        id*id
T( )              id*id
F( )              id*id
ADVANCE( )        *id
TPRIME( )         *id
ADVANCE( )        id
F( )              id
ADVANCE( )        (input exhausted)
TPRIME( )         (input exhausted)
Predictive parsing is a special case of recursive descent parsing where no backtracking is
required.
The key problem of predictive parsing is to determine the production to be applied for a
non-terminal in case of alternatives.
Figure: Model of a table-driven predictive parser
INPUT:  a + b $
STACK (top first): X Y Z $
The predictive parsing program consults the parsing table M and writes the OUTPUT.
The table-driven predictive parser has an input buffer, stack, a parsing table and an output
stream.
Input buffer:
It consists of strings to be parsed, followed by $ to indicate the end of the input string.
Stack:
It contains a sequence of grammar symbols preceded by $ to indicate the bottom of the stack.
Initially, the stack contains the start symbol on top of $.
Parsing table:
It is a two-dimensional array M[A, a], where 'A' is a non-terminal and 'a' is a terminal.
The parser is controlled by a program that considers X, the symbol on top of stack, and a, the
current input symbol. These two symbols determine the parser action. There are three
possibilities:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next
input symbol.
3. If X is a non-terminal , the program consults entry M[X, a] of the parsing table M. This
entry will either be an X-production of the grammar or an error entry.
If M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). If M[X, a] = error, the parser calls an error recovery routine.
Algorithm for nonrecursive predictive parsing:
Method : Initially, the parser has $S on the stack with S, the start symbol of G on top, and w$ in
the input buffer. The program that utilizes the predictive parsing table M to produce a parse for
the input is as follows:
set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a non-terminal */
        if M[X, a] = X → Y1Y2 … Yk then begin
            pop X from the stack;
            push Yk, Yk-1, … , Y1 onto the stack, with Y1 on top;
            output the production X → Y1Y2 … Yk
        end
        else error()
until X = $ /* stack is empty */
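A compact C sketch of this driver for the expression grammar follows; encoding each grammar symbol as one character (with Q for E', R for T' and i for id) is an assumption made to keep the table small:

    #include <stdio.h>
    #include <string.h>

    static char stack[100];
    static int top = -1;
    static void push(char c) { stack[++top] = c; }

    /* Push the right side of a production, rightmost symbol first. */
    static void expand(const char *rhs) {
        for (int i = (int)strlen(rhs) - 1; i >= 0; i--) push(rhs[i]);
    }

    /* Parsing table M[X, a]; "" encodes an epsilon production, NULL an error. */
    static const char *M(char X, char a) {
        switch (X) {
        case 'E': return (a == 'i' || a == '(') ? "TQ" : NULL;
        case 'Q': return a == '+' ? "+TQ"
                       : (a == ')' || a == '$') ? "" : NULL;
        case 'T': return (a == 'i' || a == '(') ? "FR" : NULL;
        case 'R': return a == '*' ? "*FR"
                       : (a == '+' || a == ')' || a == '$') ? "" : NULL;
        case 'F': return a == 'i' ? "i" : a == '(' ? "(E)" : NULL;
        }
        return NULL;
    }

    int main(void) {
        const char *ip = "i+i*i$";            /* id + id * id $ */
        push('$'); push('E');
        while (stack[top] != '$' || *ip != '$') {
            char X = stack[top], a = *ip;
            if (X == a) { top--; ip++; }      /* match a terminal */
            else if (strchr("EQTRF", X)) {    /* consult the table */
                const char *rhs = M(X, a);
                if (!rhs) { printf("error\n"); return 1; }
                top--;
                expand(rhs);
            } else { printf("error\n"); return 1; }
        }
        printf("accepted\n");
        return 0;
    }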
The construction of a predictive parser is aided by two functions associated with a grammar G :
1. FIRST
2. FOLLOW
Rules for FIRST( ):
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), … , FIRST(Yi-1). If ε is in FIRST(Yj) for all j = 1, … , k, then add ε to FIRST(X).

Rules for FOLLOW( ):
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Example:
E → E+T | T
T → T*F | F
F → (E) | id
After eliminating left recursion the grammar becomes:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST( ):
FIRST(E) = { (, id }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(T') = { *, ε }
FIRST(F) = { (, id }

FOLLOW( ):
FOLLOW(E) = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T') = { +, $, ) }
FOLLOW(F) = { +, *, $, ) }
Table 2.2: Predictive parsing table

NON-TERMINAL | id       | +          | *          | (        | )       | $
E            | E → TE'  |            |            | E → TE'  |         |
E'           |          | E' → +TE'  |            |          | E' → ε  | E' → ε
T            | T → FT'  |            |            | T → FT'  |         |
T'           |          | T' → ε     | T' → *FT'  |          | T' → ε  | T' → ε
F            | F → id   |            |            | F → (E)  |         |
LL(1) grammar:
If each entry of the parsing table is uniquely defined (no location has more than one entry), the grammar is called an LL(1) grammar.
Consider the following grammar:
S → iEtS | iEtSeS | a
E → b
After left factoring the grammar becomes
S → iEtSS' | a
S' → eS | ε
E → b
To construct a parsing table, we need FIRST( ) and FOLLOW( ) for all the non-terminals:
FIRST(S) = { i, a }
FIRST(S') = { e, ε }
FIRST(E) = { b }
FOLLOW(S) = { $, e }
FOLLOW(S') = { $, e }
FOLLOW(E) = { t }

NON-TERMINAL | a      | b      | e                   | i           | t | $
S            | S → a  |        |                     | S → iEtSS'  |   |
S'           |        |        | S' → eS and S' → ε  |             |   | S' → ε
E            |        | E → b  |                     |             |   |

Since the entry M[S', e] contains more than one production, the grammar is not LL(1).
Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing.
A general type of bottom-up parser is a shift-reduce parser, which performs four possible actions:
1. Shift
2. Reduce
3. Accept
4. Error
Example:
Consider the grammar:
S → aABe
A → Abc | b
B→ d
The sentence to be recognized is abbcde.
abbcde    reduce by A → b
aAbcde    reduce by A → Abc
aAde      reduce by B → d
aABe      reduce by S → aABe
S

These reductions trace out the following rightmost derivation in reverse:
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
Example:
E → E+E
E → E*E
E → (E)
E → id
and the rightmost derivation
E ⇒ E+E
  ⇒ E+E*E
  ⇒ E+E*id3
  ⇒ E+id2*id3
  ⇒ id1+id2*id3
Handle pruning:
A rightmost derivation in reverse can be obtained by handle pruning: repeatedly locate the handle of the current right sentential form and reduce it. For the derivation above, the final steps of the shift-reduce parse are:

STACK      INPUT          ACTION
$ E        +id2*id3 $     shift
$ E+       id2*id3 $      shift
…
$ E+E*E    $              reduce by E → E*E
$ E+E      $              reduce by E → E+E
$ E        $              accept
Conflicts during shift-reduce parsing:
1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.
Example: With an ambiguous grammar such as E → E+E | E*E | id and the stack holding $E while input remains, the parser cannot tell whether to reduce or to shift the next operator.
2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.
Example: Consider the grammar
M → R+R | R+c | R
R → c
and the input c+c:

STACK    INPUT    ACTION
$ c      +c $     reduce by R → c
$ R      +c $     shift
$ R+     c $      shift
$ R+c    $        the parser cannot decide whether to reduce by M → R+c, or to reduce c to R and then reduce by M → R+R.
Viable prefixes:
α is a viable prefix of the grammar if there is a w such that αw is a right sentential form.
The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.
The set of viable prefixes is a regular language.
Operator precedence parser can be constructed from a grammar called Operator-grammar. These
grammars have the property that no production on right side is ɛ or has two adjacent non-
terminals.
Example:
E → EAE | (E) | -E | id
A → + | - | * | / | ↑
Since the right side EAE has three consecutive non-terminals, this is not an operator grammar; substituting the alternatives for A, it can be rewritten as the operator grammar
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id
Operator precedence relations:
There are three disjoint precedence relations, namely
<·  less than
=   equal to
·>  greater than
The relations give the following meaning:
a <· b   a yields precedence to b
a = b    a has the same precedence as b
a ·> b   a takes precedence over b
Also make
( = ) ,  ( <· ( ,  ) ·> ) ,  ( <· id ,  id ·> ) ,  $ <· id ,  id ·> $ ,  $ <· ( ,  ) ·> $
Example:
The operator precedence relations for the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id can be tabulated for every pair of terminals, assuming
1. ↑ is of highest precedence and right-associative,
2. * and / are of next higher precedence and left-associative, and
3. + and - are of lowest precedence and left-associative.
Operator precedence parsing algorithm:
(1) set ip to point to the first symbol of w$;
(2) repeat forever
(3)   if $ is on top of the stack and ip points to $ then
(4)     return
(5)   else begin
(6)     let a be the topmost terminal symbol on the stack
        and let b be the symbol pointed to by ip;
(7)     if a <· b or a = b then begin /* shift */
(8)       push b onto the stack; advance ip to the next input symbol
        end
(9)     else if a ·> b then /* reduce */
(10)      repeat
(11)        pop the stack
(12)      until the top stack terminal is related by <·
            to the terminal most recently popped
(13)    else error( )
      end
Stack implementation of operator precedence parsing:
Operator precedence parsing uses a stack and precedence relation table for its
implementation of above algorithm. It is a shift-reduce parsing containing all four actions shift,
reduce, accept and error.
The initial configuration of an operator precedence parser is
STACK: $        INPUT: w$
where w is the input string to be parsed.
Example:
Consider the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | id. The input string is id+id*id. The implementation is as follows:
2.6 LR PARSERS
An efficient bottom-up syntax analysis technique that can be used to parse a large class of
CFG is called LR(k) parsing. The 'L' is for left-to-right scanning of the input, the 'R' for constructing a rightmost derivation in reverse, and the 'k' for the number of input symbols of lookahead. When 'k' is omitted, it is assumed to be 1.
Advantages of LR parsing:
It recognizes virtually all programming language constructs for which CFG can be
written.
It is an efficient non-backtracking shift-reduce parsing method.
A grammar that can be parsed using LR method is a proper superset of a grammar that
can be parsed with predictive parser.
It detects a syntactic error as soon as possible.
Drawbacks of LR method:
It is too much work to construct an LR parser by hand for a programming language grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.
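For flavour, a minimal Yacc specification for the expression grammar of this chapter might look as follows; the inlined single-digit lexer is a simplification assumed for the sketch:

    %{
    /* Minimal Yacc sketch for expr -> expr + term | term,
       term -> term * factor | factor, factor -> ( expr ) | digit. */
    #include <stdio.h>
    #include <ctype.h>
    int yylex(void);
    void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
    %}
    %token DIGIT
    %%
    line   : expr '\n'         { printf("= %d\n", $1); }
           ;
    expr   : expr '+' term     { $$ = $1 + $3; }
           | term
           ;
    term   : term '*' factor   { $$ = $1 * $3; }
           | factor
           ;
    factor : '(' expr ')'      { $$ = $2; }
           | DIGIT
           ;
    %%
    int yylex(void) {
        int c = getchar();
        if (c == EOF) return 0;                /* end of input */
        if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
        return c;
    }
    int main(void) { return yyparse(); }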
Figure: Model of an LR parser
INPUT: a1 … ai … an $
STACK: s0 X1 s1 … Xm sm (sm on top)
The LR parsing program consults a parsing table with two parts (action and goto) and produces the OUTPUT.

It consists of: an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto).
Action: The parsing program determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai] in the action table, which can have one of four values:
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.
Goto: The function goto takes a state and a grammar symbol as arguments and produces a state.
LR Parsing algorithm:
Input: An input string w and an LR parsing table with functions action and goto for grammar G.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input
buffer. The parser then executes the following program :
set ip to point to the first input symbol of w$;
repeat forever begin
    let s be the state on top of the stack
    and a the symbol pointed to by ip;
    if action[s, a] = shift s' then begin
        push a then s' on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then begin
        pop 2*|β| symbols off the stack;
        let s' be the state now on top of the stack;
        push A then goto[s', A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then
        return
    else error( )
end
LR(0) items:
An LR(0) item of a grammar G is a production of G with a dot at some position of the
right side. For example, production A → XYZ yields the four items :
A → . XYZ
A → X . YZ
A → XY . Z
A → XYZ .
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ to closure(I), if it is not already there. Apply this rule until no more new items can be added.
Goto operation:
Goto(I, X) is defined to be the closure of the set of all items [A→ αX . β] such
that [A→ α . Xβ] is in I.
Steps to construct the SLR parsing table for grammar G are:
1. Construct C = { I0, I1, … , In }, the collection of sets of LR(0) items for the augmented grammar G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
   (a) If [A → α . aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a terminal.
   (b) If [A → α .] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A).
   (c) If [S' → S .] is in Ii, then set action[i, $] to "accept".
If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A using the rule: If goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set of items containing [S' → . S].
Example: Consider the augmented grammar
E' → E
E → E+T | T
T → T*F | F
F → (E) | id

I0 : E' → . E
     E → . E+T
     E → . T
     T → . T*F
     T → . F
     F → . (E)
     F → . id
GOTO(I0, E):
I1 : E' → E .
     E → E . + T

GOTO(I0, T):
I2 : E → T .
     T → T . * F

GOTO(I0, F):
I3 : T → F .

GOTO(I0, ( ):
I4 : F → ( . E )
     E → . E + T
     E → . T
     T → . T * F
     T → . F
     F → . ( E )
     F → . id

GOTO(I0, id):
I5 : F → id .

GOTO(I1, +):
I6 : E → E + . T
     T → . T * F
     T → . F
     F → . ( E )
     F → . id

GOTO(I2, *):
I7 : T → T * . F
     F → . ( E )
     F → . id

GOTO(I4, E):
I8 : F → ( E . )
     E → E . + T

GOTO(I4, T) = I2
GOTO(I4, F) = I3
GOTO(I4, ( ) = I4
GOTO(I4, id) = I5

GOTO(I6, T):
I9 : E → E + T .
     T → T . * F

GOTO(I6, F) = I3
GOTO(I6, ( ) = I4
GOTO(I6, id) = I5

GOTO(I7, F):
I10 : T → T * F .

GOTO(I7, ( ) = I4
GOTO(I7, id) = I5

GOTO(I8, ) ):
I11 : F → ( E ) .

GOTO(I8, +) = I6
GOTO(I9, *) = I7
FOLLOW(E) = { $, ), + }
FOLLOW(T) = { $, +, ), * }
FOLLOW(F) = { *, +, ), $ }
Table: SLR parsing table (ACTION and GOTO)

STATE | id | +   | *   | (  | )   | $   | E | T | F
I0    | s5 |     |     | s4 |     |     | 1 | 2 | 3
I1    |    | s6  |     |    |     | acc |   |   |
I2    |    | r2  | s7  |    | r2  | r2  |   |   |
I3    |    | r4  | r4  |    | r4  | r4  |   |   |
I4    | s5 |     |     | s4 |     |     | 8 | 2 | 3
I5    |    | r6  | r6  |    | r6  | r6  |   |   |
I6    | s5 |     |     | s4 |     |     |   | 9 | 3
I7    | s5 |     |     | s4 |     |     |   |   | 10
I8    |    | s6  |     |    | s11 |     |   |   |
I9    |    | r1  | s7  |    | r1  | r1  |   |   |
I10   |    | r3  | r3  |    | r3  | r3  |   |   |
I11   |    | r5  | r5  |    | r5  | r5  |   |   |
Stack implementation:

STACK                    INPUT         ACTION
0                        id+id*id $    GOTO(I0, id) = s5; shift
0 id 5                   +id*id $      GOTO(I5, +) = r6; reduce by F → id
0 F 3                    +id*id $      GOTO(I0, F) = 3; GOTO(I3, +) = r4; reduce by T → F
0 T 2                    +id*id $      GOTO(I0, T) = 2; GOTO(I2, +) = r2; reduce by E → T
0 E 1                    +id*id $      GOTO(I0, E) = 1; GOTO(I1, +) = s6; shift
0 E 1 + 6                id*id $       GOTO(I6, id) = s5; shift
0 E 1 + 6 id 5           *id $         GOTO(I5, *) = r6; reduce by F → id
0 E 1 + 6 F 3            *id $         GOTO(I6, F) = 3; GOTO(I3, *) = r4; reduce by T → F
0 E 1 + 6 T 9            *id $         GOTO(I6, T) = 9; GOTO(I9, *) = s7; shift
0 E 1 + 6 T 9 * 7 id 5   $             GOTO(I5, $) = r6; reduce by F → id
0 E 1 + 6 T 9 * 7 F 10   $             GOTO(I7, F) = 10; GOTO(I10, $) = r3; reduce by T → T*F
0 E 1 + 6 T 9            $             GOTO(I6, T) = 9; GOTO(I9, $) = r1; reduce by E → E+T
0 E 1                    $             GOTO(I0, E) = 1; GOTO(I1, $) = accept
STATIC CHECKING
1. Type checks - An error is reported if an operator is applied to incompatible operands.
2. Flow-of-control checks - Statements that cause flow of control to leave a construct must have some place to which to transfer the flow of control. Example: an error occurs when a break statement has no enclosing statement, such as a switch or a loop, out of which to transfer control.
A type checker verifies that the type of a construct matches that expected by its
context. For example : arithmetic operator mod in Pascal requires integer operands, so a
type checker verifies that the operands of mod have type integer.
Type information gathered by a type checker may be needed when code is generated.
The design of a type checker for a language is based on information about the syntactic
constructs in the language, the notion of types, and the rules for assigning types to
language constructs.
For example : “ if both operands of the arithmetic operators of +,- and * are of type integer,
then the result is of type integer ”
Type Expressions
1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error , will signal an error during type checking; void denoting
“the absence of a value” allows statements to be checked.
2. Since type expressions may be named, a type name is a type expression.
3. A type constructor applied to type expressions is a type expression. Constructors include arrays, products, records, pointers, and functions.
Records: The difference between a record and a product is that the fields of a record have names. The record type constructor will be applied to a tuple formed from field names and field types.
For example:
type row = record
address: integer;
lexeme: array[1..15] of char
end;
var table: array[1...101] of row;
declares the type name row representing the type expression record((address X integer) X
(lexeme X array(1..15,char))) and the variable table to be an array of records of this type.
4. Type expressions may contain variables whose values are type expressions.
Type systems
A type system is a collection of rules for assigning type expressions to the various parts
of a program.
A type checker implements a type system. It is specified in a syntax-directed manner.
Different type systems may be used by different compilers or processors of the
same language.
Checking done by a compiler is said to be static, while checking done when the target program runs is termed dynamic.
Any check can be done dynamically, if the target code carries the type of an element
along with the value of that element.
Error Recovery
Since type checking has the potential for catching errors in program, it is desirable
for type checker to recover from errors, so it can check the rest of the input.
Error handling has to be designed into the type system right from the start; the
type checking rules must be prepared to cope with errors.
Here, we specify a type checker for a simple language in which the type of each
identifier must be declared before the identifier is used. The type checker is a translation scheme
that synthesizes the type of each expression from the types of its subexpressions. The type
checker can handle arrays, pointers, statements and functions.
A Simple Language
Translation scheme:
P → D ; E
D → D ; D | id : T { addtype(id.entry, T.type) }
T → char { T.type := char }
T → integer { T.type := integer }
T → ↑T1 { T.type := pointer(T1.type) }
T → array [ num ] of T1 { T.type := array(1..num.val, T1.type) }
In the following rules, the attribute type for E gives the type expression assigned to the expression generated by E.
The postfix operator ↑ yields the object pointed to by its operand. The type of E ↑ is the type
t of the object pointed to by the pointer E.
Statements do not have values; hence the basic type void can be assigned to them. If an error
is detected within a statement, then type_error is assigned.
1. Assignment statement:
S → id : = E { S.type : = if id.type = E.type then void else
type_error }
2. Conditional statement:
S → if E then S1 { S.type : = if E.type = boolean then S1.type
else type_error }
3. While statement:
S → while E do S1 { S.type : = if E.type = boolean then S1.type
else type_error }
4. Sequence of statements:
S → S1 ; S2 { S.type : = if S1.type = void and S2.type = void
then void
else type_error }
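These four rules can be rendered directly as functions; the following C sketch, with types encoded as an enum, is illustrative and the names are assumptions:

    #include <stdio.h>

    typedef enum { T_INT, T_CHAR, T_BOOLEAN, T_VOID, T_ERROR } Type;

    /* S -> id := E */
    Type check_assign(Type id_type, Type e_type) {
        return id_type == e_type ? T_VOID : T_ERROR;
    }

    /* S -> if E then S1  and  S -> while E do S1 */
    Type check_cond(Type e_type, Type s1_type) {
        return e_type == T_BOOLEAN ? s1_type : T_ERROR;
    }

    /* S -> S1 ; S2 */
    Type check_seq(Type s1_type, Type s2_type) {
        return (s1_type == T_VOID && s2_type == T_VOID) ? T_VOID : T_ERROR;
    }

    int main(void) {
        Type s = check_seq(check_assign(T_INT, T_INT),
                           check_cond(T_BOOLEAN, T_VOID));
        printf(s == T_VOID ? "ok\n" : "type_error\n");
        return 0;
    }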
Procedures:
A procedure definition is a declaration that associates an identifier with a statement.
The identifier is the procedure name, and the statement is the procedure body.
For example, the following is the definition of procedure named readarray :
procedure readarray;
var i : integer;
begin
for i : = 1 to 9 do
read(a[i]) end;
When a procedure name appears within an executable statement, the procedure is said to
be called at that point.
Activation trees:
An activation tree is used to depict the way control enters and leaves activations. In an
activation tree,
1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows from activation a
to b.
4. The node for a is to the left of the node for b if and only if the lifetime of a occurs before the
lifetime of b.
Control stack:
A control stack is used to keep track of live procedure activations. The idea is to push
the node for an activation onto the control stack as the activation begins and to pop the
node when the activation ends.
The contents of the control stack are related to paths to the root of the activation tree.
When node n is at the top of control stack, the stack contains the nodes along the
path from n to the root.
Binding of names:
Even if each name is declared once in a program, the same name may denote different
data objects at run time. “Data object” corresponds to a storage location that holds values.
The term environment refers to a function that maps a name to a storage location. The
term state refers to a function that maps a storage location to the value held there.
name → (environment) → storage location → (state) → value
The executing target program runs in its own logical address space in which
each program value has a location.
The management and organization of this logical address space is shared between the compiler, the operating system and the target machine. The operating system maps the logical addresses into physical addresses, which are usually spread throughout memory.
Typical subdivision of run-time memory:
| Code | Static Data | Stack | → free memory ← | Heap |
Run-time storage comes in blocks, where a byte is the smallest unit of addressable
memory. Four bytes form a machine word. Multibyte objects are stored in
consecutive bytes and given the address of first byte.
The storage layout for data objects is strongly influenced by the addressing constraints
of the target machine.
A character array of length 10 needs only enough bytes to hold 10 characters, a
compiler may allocate 12 bytes to get alignment, leaving 2 bytes unused.
This unused space due to alignment considerations is referred to as padding.
The size of some program objects may be known at run time and may be placed in
an area called static.
The dynamic areas used to maximize the utilization of space at run time are stack
and heap.
Activation records:
Procedure calls and returns are usually managed by a run time stack called the
control stack.
Each live activation has an activation record on the control stack, with the root of the activation tree at the bottom; the latest activation has its record at the top of the stack.
The contents of the activation record vary with the language being implemented.
The diagram below shows the contents of activation record.
Space for the return value of the called functions, if any. Again, not all called
procedures return a value, and if one does, we may prefer to place that value in a
register for efficiency.
The actual parameters used by the calling procedure. These are not placed in
activation record but rather in registers, when possible, for greater efficiency.
2.13 STORAGE ALLOCATION STRATEGIES
The different storage allocation strategies are :
1. Static allocation – lays out storage for all data objects at compile time
2. Stack allocation – manages the run-time storage as a stack.
3. Heap allocation – allocates and deallocates storage as needed at run time from a data
area known as heap.
All compilers for languages that use procedures, functions or methods as units of user-
defined actions manage at least part of their run-time memory as a stack.
Each time a procedure is called , space for its local variables is pushed onto a stack,
and when the procedure terminates, that space is popped off the stack.
Calling sequences:
Procedure calls are implemented by what is known as a calling sequence, which consists of code that allocates an activation record on the stack and enters information into its fields.
A return sequence is similar code that restores the state of the machine so the calling procedure can continue its execution after the call.
The code in calling sequence is often divided between the calling procedure (caller)
and the procedure it calls (callee).
When designing calling sequences and the layout of activation records, the
following principles are helpful:
Values communicated between caller and callee are generally placed at the
beginning of the callee‟s activation record, so they are as close as possible to
the caller‟s activation record.
Fixed length items are generally placed in the middle. Such items typically
include the control link, the access link, and the machine status fields.
Items whose size may not be known early enough are placed at the end of the
activation record. The most common example is dynamically sized array, where
the value of one of the callee‟s parameters determines the length of the array.
We must locate the top-of-stack pointer judiciously. A common approach is to
have it point to the end of fixed-length fields in the activation record. Fixed-length
data can then be accessed by fixed offsets, known to the intermediate-code
generator, relative to the top-of-stack pointer.
Division of the activation record between caller and callee (top_sp marks the end of the fixed-length fields):

caller's activation record:
    …
    parameters and returned values
    control link
    links and saved status
    temporaries and local data
callee's activation record:
    parameters and returned values    (caller's responsibility)
    control link
    links and saved status            ← top_sp
    temporaries and local data        (callee's responsibility)
The calling sequence and its division between caller and callee are as follows.
The caller evaluates the actual parameters.
The caller stores a return address and the old value of top_sp into the callee‟s
activation record. The caller then increments the top_sp to the respective
positions.
The callee saves the register values and other status information.
The callee initializes its local data and begins execution.
A suitable, corresponding return sequence is:
The callee places the return value next to the parameters.
Using the information in the machine-status field, the callee restores top_sp
and other registers, and then branches to the return address that the caller
placed in the status field.
Although top_sp has been decremented, the caller knows where the return value is,
relative to the current value of top_sp; the caller therefore may use that value.
Access to dynamically sized arrays on the stack:

activation record for p:
    control link
    pointer to A
    pointer to B
    pointer to C
arrays of p:
    array A
    array B
    array C
activation record for procedure q called by p
arrays of q                            ← top
Procedure p has three local arrays, whose sizes cannot be determined at compile
time. The storage for these arrays is not part of the activation record for p.
Access to the data is through two pointers, top and top-sp. Here the top marks the
actual top of stack; it points the position at which the next activation record will begin.
The second top-sp is used to find local, fixed-length fields of the top activation record.
The code to reposition top and top-sp can be generated at compile time, in terms of
sizes that will become known at run time.
Heap allocation parcels out pieces of contiguous storage, as needed for activation
records or other objects.
Pieces may be deallocated in any order, so over the time the heap will consist of
alternate areas that are free and in use.
Records for live activations need not be adjacent in a heap:

activation record for s          (control link)
retained activation record for r (control link)
activation record for q(1, 9)    (control link)
The record for an activation of procedure r is retained when the activation ends.
Therefore, the record for the new activation q(1 , 9) cannot follow that for s physically.
If the retained activation record for r is deallocated, there will be free space in the
heap between the activation records for s and q.
CHAPTER III – INTERMEDIATE CODE GENERATION
3.1 INTRODUCTION
The front end translates a source program into an intermediate representation from
which the back end generates target code.
The benefits of using a machine-independent intermediate form are:
1. Retargeting is facilitated. That is, a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.
INTERMEDIATE LANGUAGES
The three common kinds of intermediate representations are:
1. Syntax tree
2. Postfix notation
3. Three address code
The semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix notation.
Syntax tree:
A syntax tree depicts the natural hierarchical structure of a source program. A dag
(Directed Acyclic Graph) gives the same information but in a more compact way because
common subexpressions are identified. A syntax tree and dag for the assignment statement a : =
b * - c + b * - c are as follows:
Figure: Syntax tree (left) and DAG (right) for a := b * - c + b * - c. In the DAG, the common subexpression b * - c is represented by a single node that is referenced twice.
Postfix notation:
Postfix notation is a linearized representation of a syntax tree: a list of the nodes of the tree in which a node appears immediately after its children. The postfix notation for the syntax tree above is
a b c uminus * b c uminus * + assign
Syntax-directed definition:
Syntax trees for assignment statements are produced by the syntax-directed definition.
Non-terminal S generates an assignment statement. The two binary operators + and * are
examples of the full operator set in a typical language. Operator associativities and
precedences are the usual ones, even though they have not been put into the grammar. This
definition constructs the tree from the input a : = b * - c + b* - c.
Table 3.1:Syntax directed definition
The token id has an attribute place that points to the symbol-table entry for the identifier.
A symbol-table entry can be found from an attribute id.name, representing the lexeme associated
with that occurrence of id. If the lexical analyzer holds all lexemes in a single array of characters,
then attribute name might be the index of the first character of the lexeme.
Two representations of the syntax tree are as follows. In (a) each node is represented as a
record with a field for its operator and additional fields for pointers to its children. In (b), nodes
are allocated from an array of records and the index or position of the node serves as the pointer to
the node. All the nodes in the syntax tree can be visited by following pointers, starting from the
root at position 10.
Two representations of the syntax tree:

(a) record-and-pointer form: the tree assign(id a, +( *(id b, uminus(id c)), *(id b, uminus(id c)) ))

(b) array form:
 0 | id     | b
 1 | id     | c
 2 | uminus | 1
 3 | *      | 0 | 2
 4 | id     | b
 5 | id     | c
 6 | uminus | 5
 7 | *      | 4 | 6
 8 | +      | 3 | 7
 9 | id     | a
10 | assign | 9 | 8
Three-address code:
Three-address code is a sequence of statements of the general form
x := y op z
where x, y and z are names, constants, or compiler-generated temporaries; op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on boolean-valued data. Thus a source language expression like x + y*z might be translated into a sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names.
Three-address code corresponding to the syntax tree and DAG given above:

(a) Code for the syntax tree    (b) Code for the DAG
t1 := -c                        t1 := -c
t2 := b * t1                    t2 := b * t1
t3 := -c                        t5 := t2 + t2
t4 := b * t3                    a := t5
t5 := t2 + t4
a := t5
The reason for the term “three-address code” is that each statement usually contains three
addresses, two for the operands and one for the result.
Among the common types of three-address statements are:
- The unconditional jump goto L. The three-address statement with label L is the next to be executed.
- Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual sequence.
- param x and call p, n for procedure calls, and return y, where y representing a returned value is optional. For example,
param x1
param x2
...
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, … , xn).
When three-address code is generated, temporary names are made up for the interior
nodes of a syntax tree. For example, id : = E consists of code to evaluate E into some temporary
t, followed by the assignment id.place : = t.
E → E1 + E2   { E.place := newtemp;
                E.code := E1.code || E2.code || gen(E.place ':=' E1.place '+' E2.place) }
E → E1 * E2   { E.place := newtemp;
                E.code := E1.code || E2.code || gen(E.place ':=' E1.place '*' E2.place) }
E → - E1      { E.place := newtemp;
                E.code := E1.code || gen(E.place ':=' 'uminus' E1.place) }
E → id        { E.place := id.place;
                E.code := '' }
Code layout for S → while E do S1:
S.begin:
    E.code       (with a jump to S.after if E evaluates to false)
    S1.code
    goto S.begin
S.after: ...
Quadruples:
A quadruple is a record structure with four fields, which are, op, arg1, arg2 and result.
The op field contains an internal code for the operator. The three-address statement x : =
y op z is represented by placing y in arg1, z in arg2 and x in result.
The contents of fields arg1, arg2 and result are normally pointers to the symbol-table
entries for the names represented by these fields. If so, temporary names must be entered
into the symbol table as they are created.
Triples:
To avoid entering temporary names into the symbol table, we might refer to a temporary
value by the position of the statement that computes it.
If we do so, three-address statements can be represented by records with only three
fields: op, arg1 and arg2.
The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table
or pointers into the triple structure ( for temporary values ).
Since three fields are used, this intermediate code format is known as triples.
(a) Quadruples:

        op      arg1   arg2   result
(0)     uminus  c              t1
(1)     *       b      t1      t2
(2)     uminus  c              t3
(3)     *       b      t3      t4
(4)     +       t2     t4      t5
(5)     :=      t5             a

(b) Triples:

        op      arg1   arg2
(0)     uminus  c
(1)     *       b      (0)
(2)     uminus  c
(3)     *       b      (2)
(4)     +       (1)    (3)
(5)     assign  a      (4)
A ternary operation like x[i] : = y requires two entries in the triple structure, while x : = y[i] is naturally represented as two operations.
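To make the contrast concrete, the following is a minimal Python sketch of both encodings for a := b * - c + b * - c; the tuple layouts are illustrative assumptions, not a fixed format:

# Quadruples: explicit result field, temporaries named t1, t2, ...
quads = [
    ("uminus", "c",  None, "t1"),
    ("*",      "b",  "t1", "t2"),
    ("uminus", "c",  None, "t3"),
    ("*",      "b",  "t3", "t4"),
    ("+",      "t2", "t4", "t5"),
    (":=",     "t5", None, "a"),
]

# Triples: no result field; a temporary value is referred to by the
# index of the statement that computes it, so no symbol-table entries
# are needed for temporaries.
triples = [
    ("uminus", "c",  None),  # (0)
    ("*",      "b",  0),     # (1)  b * (0)
    ("uminus", "c",  None),  # (2)
    ("*",      "b",  2),     # (3)  b * (2)
    ("+",      1,    3),     # (4)  (1) + (3)
    ("assign", "a",  4),     # (5)  a := (4)
]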
Indirect Triples:
Another implementation of three-address code is indirect triples: a listing of pointers to triples, rather than a listing of the triples themselves. With indirect triples, an optimizing compiler can move a statement by reordering the statement list, without affecting the triples themselves.
3.3 DECLARATIONS
P → D      { offset := 0 }
D → D ; D | id : T | proc id ; D ; S
One possible implementation of a symbol table is a linked list of entries for names.
A new symbol table is created when a procedure declaration D → proc id ; D1 ; S is seen,
and entries for the declarations in D1 are created in the new table. The new table points back to
the symbol table of the enclosing procedure; the name represented by id itself is local to the
enclosing procedure. The only change from the treatment of variable declarations is that the
procedure enter is told which symbol table to make an entry in.
For example, consider the symbol tables for procedures readarray, exchange, and
quicksort pointing back to that for the containing procedure sort, consisting of the entire
program. Since partition is declared within quicksort, its table points to that of quicksort.
(Figure: nested symbol tables. The table for sort has a nil header link and entries a, x, readarray, exchange and quicksort; the entries for readarray, exchange and quicksort point to their own tables, whose headers point back to sort. Since partition is declared within quicksort, the entry for partition in quicksort's table points to partition's table; the inner tables contain entries such as i and j.)
1. mktable(previous) creates a new symbol table and returns a pointer to the new table. The
argument previous points to a previously created symbol table, presumably that for the
enclosing procedure.
2. enter(table, name, type, offset) creates a new entry for name name in the symbol table pointed
to by table. Again, enter places type type and relative address offset in fields within the entry.
3. addwidth(table, width) records the cumulative width of all the entries in table in the header
associated with this symbol table.
4. enterproc(table, name, newtable) creates a new entry for procedure name in the symbol table
pointed to by table. The argument newtable points to the symbol table for this procedure
name.
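As a rough illustration, the four operations might be realized as follows in Python; the record layout (an entry dictionary, a link to the enclosing table and a width field) is an assumption made for this sketch:

class SymTable:
    def __init__(self, previous=None):
        self.entries = {}          # name -> entry
        self.previous = previous   # symbol table of the enclosing procedure
        self.width = 0             # cumulative width, filled in by addwidth

def mktable(previous):
    # create a new symbol table chained to the enclosing one
    return SymTable(previous)

def enter(table, name, type_, offset):
    # record a variable with its type and relative address
    table.entries[name] = ("var", type_, offset)

def addwidth(table, width):
    # record the cumulative width of all entries in the header
    table.width = width

def enterproc(table, name, newtable):
    # record a procedure together with a pointer to its own table
    table.entries[name] = ("proc", newtable)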
D → D1 ; D2
The stack tblptr is used to contain pointers to the tables for sort, quicksort, and
partition when the declarations in partition are considered.
The top element of stack offset is the next available relative address for a local of
the current procedure.
All semantic actions in the subtrees for B and C in
A → B C { actionA }
are done before actionA at the end of the production occurs. Hence, the action associated
with the marker M is the first to be done.
The action for nonterminal M initializes stack tblptr with a symbol table for the
outermost scope, created by operation mktable(nil). The action also pushes relative
address 0 onto stack offset.
Similarly, the nonterminal N uses the operation mktable(top(tblptr)) to create a new
symbol table. The argument top(tblptr) gives the enclosing scope for the new table.
For each variable declaration id: T, an entry is created for id in the current symbol table.
The top of stack offset is incremented by T.width.
When the action on the right side of D → proc id ; N D1 ; S occurs, the width of all
declarations generated by D1 is on the top of stack offset; it is recorded using addwidth.
Stacks tblptr and offset are then popped.
At this point, the name of the enclosed procedure is entered into the symbol table of
its enclosing procedure.
Suppose that the context in which an assignment appears is given by the following grammar.
P → M D
M → ɛ
D → D ; D | id : T | proc id ; N D ; S
N → ɛ
Nonterminal P becomes the new start symbol when these productions are added to those in the
translation scheme shown below.
S → id := E     { p := lookup( id.name );
                  if p ≠ nil then
                      emit( p ':=' E.place )
                  else error }
E → E1 + E2     { E.place := newtemp;
                  emit( E.place ':=' E1.place '+' E2.place ) }
E → E1 * E2     { E.place := newtemp;
                  emit( E.place ':=' E1.place '*' E2.place ) }
E → - E1        { E.place := newtemp;
                  emit( E.place ':=' 'uminus' E1.place ) }
E → ( E1 )      { E.place := E1.place }
E → id          { p := lookup( id.name );
                  if p ≠ nil then
                      E.place := p
                  else error }
Temporary names can be reused by keeping a count c of active temporaries: c is incremented when a temporary name is created and decremented when a temporary is used for the last time. For the evaluation of x := a*b + c*d - e*f with reuse of the names $0, $1, …:

statement           value of c
                    0
$0 := a * b         1
$1 := c * d         2
$0 := $0 + $1       1
$1 := e * f         2
$0 := $0 - $1       1
x := $0             0
Elements of an array can be accessed quickly if the elements are stored in a block of
consecutive locations. If the width of each array element is w, then the ith element of array A
begins in location
base + ( i – low ) × w
where low is the lower bound on the subscript and base is the relative address of the storage
allocated for the array. That is, base is the relative address of A[low].
The expression can be partially evaluated at compile time if it is rewritten as
i × w + ( base – low × w )
The subexpression c = base – low × w can be evaluated when the declaration of the array is seen. We assume that c is saved in the symbol table entry for A, so the relative address of A[i] is obtained by simply adding i × w to c.
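A quick numeric check of this precomputation, sketched in Python with illustrative values for base, low and w:

def element_address(base, low, w, i):
    # c = base - low*w can be evaluated once, when the declaration
    # of the array is seen; only i*w + c is computed per access
    c = base - low * w
    return i * w + c

# e.g. base = 100, low = 1, w = 4: A[1] starts at 100, A[3] at 108
assert element_address(100, 1, 4, 1) == 100
assert element_address(100, 1, 4, 3) == 108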
A two-dimensional array is normally stored in one of two forms:
Row-major (row-by-row): A[1,1], A[1,2], A[1,3], A[2,1], A[2,2], A[2,3]
Column-major (column-by-column): A[1,1], A[2,1], A[1,2], A[2,2], A[1,3], A[2,3]
(shown here for a 2 × 3 array A)
In the case of row-major form, the relative address of A[i1, i2] can be calculated by the formula
base + ( ( i1 – low1 ) × n2 + ( i2 – low2 ) ) × w
where low1 and low2 are the lower bounds on the values of i1 and i2, and n2 is the number of values that i2 can take. That is, if high2 is the upper bound on the value of i2, then n2 = high2 – low2 + 1.
Assuming that i1 and i2 are the only values that are not known at compile time, we can rewrite the above expression as
( ( i1 × n2 ) + i2 ) × w + ( base – ( ( low1 × n2 ) + low2 ) × w )
Generalized formula:
The expression generalizes to the following expression for the relative address of A[i1, i2, …, ik]:
( ( … ( ( i1 n2 + i2 ) n3 + i3 ) … ) nk + ik ) × w + base – ( ( … ( ( low1 n2 + low2 ) n3 + low3 ) … ) nk + lowk ) × w
(1) S → L := E
(2) E → E + E
(3) E → ( E )
(4) E → L
(5) L → Elist ]
(6) L → id
(7) Elist → Elist , E
(8) Elist → id [ E
We generate a normal assignment if L is a simple name, and an indexed assignment into the
location denoted by L otherwise :
Among the semantic rules, for example:
Elist → id [ E   { Elist.place := E.place;
                   Elist.ndim := 1 }
Consider the grammar for assignment statements as above, but suppose there are two
types – real and integer , with integers converted to reals when necessary. We have another
attribute E.type, whose value is either real or integer. The semantic rule for E.type associated
with the production E E + E is :
E → E1 + E2   { E.type :=
                  if E1.type = integer and
                     E2.type = integer then integer
                  else real }
The entire semantic rule for E E + E and most of the other productions must be
modified to generate, when necessary, three-address statements of the form x : = inttoreal y,
whose effect is to convert integer y to a real of equal value, called x.
Semantic action for E → E1 + E2:
E.place := newtemp;
if E1.type = integer and E2.type = integer then begin
    emit( E.place ':=' E1.place 'int +' E2.place );
    E.type := integer
end
else if E1.type = real and E2.type = real then begin
    emit( E.place ':=' E1.place 'real +' E2.place );
    E.type := real
end
with similar branches that emit an x := inttoreal y conversion when one operand is integer and the other is real.
For example, for x := y + i * j, where x and y are reals and i and j are integers, the output code is:
t1 := i int* j
t3 := inttoreal t1
t2 := y real+ t3
x := t2
3.5 BOOLEAN EXPRESSIONS
Boolean expressions have two primary purposes. They are used to compute logical
values, but more often they are used as conditional expressions in statements that alter the flow
of control, such as if-then-else, or while-do statements.
Boolean expressions are composed of the boolean operators ( and, or, and not ) applied
to elements that are boolean variables or relational expressions. Relational expressions are of the
form E1 relop E2, where E1 and E2 are arithmetic expressions.
Here we consider boolean expressions generated by the following grammar:
E → E or E | E and E | not E | ( E ) | id relop id | true | false
There are two principal methods of representing the value of a boolean expression. They are :
To encode true and false numerically and to evaluate a boolean expression analogously
to an arithmetic expression. Often, 1 is used to denote true and 0 to denote false.
To implement boolean expressions by flow of control, that is, representing the value of a
boolean expression by a position reached in a program. This method is particularly
convenient in implementing the boolean expressions in flow-of-control statements, such
as the if-then and while-do statements.
Numerical Representation:
Here, 1 denotes true and 0 denotes false. Expressions will be evaluated completely from
left to right, in a manner similar to arithmetic expressions.
For example :
E → E1 or E2      { E.place := newtemp;
                    emit( E.place ':=' E1.place 'or' E2.place ) }
E → E1 and E2     { E.place := newtemp;
                    emit( E.place ':=' E1.place 'and' E2.place ) }
E → not E1        { E.place := newtemp;
                    emit( E.place ':=' 'not' E1.place ) }
E → ( E1 )        { E.place := E1.place }
E → id1 relop id2 { E.place := newtemp;
                    emit( 'if' id1.place relop.op id2.place 'goto' nextstat + 3 );
                    emit( E.place ':=' '0' );
                    emit( 'goto' nextstat + 2 );
                    emit( E.place ':=' '1' ) }
E → true          { E.place := newtemp;
                    emit( E.place ':=' '1' ) }
E → false         { E.place := newtemp;
                    emit( E.place ':=' '0' ) }
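A sketch of how the relop rule plays out, assuming statements are numbered from 0 and emit simply appends to a code array; here nextstat is captured once on entry, so the jump targets become nextstat + 3 and nextstat + 4:

code = []

def emit(stmt):
    code.append(stmt)

def gen_relop(place, x, op, y):
    # E -> id1 relop id2: four statements starting at nextstat
    nextstat = len(code)
    emit(f"if {x} {op} {y} goto {nextstat + 3}")
    emit(f"{place} := 0")
    emit(f"goto {nextstat + 4}")     # skip over the 'true' assignment
    emit(f"{place} := 1")

gen_relop("t1", "a", "<", "b")
for n, s in enumerate(code):
    print(n, s)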
Short-Circuit Code:
We can also translate a boolean expression into three-address code without generating
code for any of the boolean operators and without having the code necessarily evaluate the entire
expression. This style of evaluation is sometimes called “short-circuit” or “jumping” code. It is
possible to evaluate boolean expressions without generating code for the boolean operators and,
or, and not if we represent the value of an expression by a position in the code sequence.
Flow-of-Control Statements
We now consider the translation of boolean expressions into three-address code in the
context of if-then, if-then-else, and while-do statements such as those generated by the following
grammar:
S → if E then S1
  | if E then S1 else S2
  | while E do S1
In each of these productions, E is the Boolean expression to be translated. In the translation, we
assume that a three-address statement can be symbolically labeled, and that the function
newlabel returns a new symbolic label each time it is called.
E.true is the label to which control flows if E is true, and E.false is the label to which
control flows if E is false.
The semantic rules for translating a flow-of-control statement S allow control to flow
from the translation S.code to the three-address instruction immediately following
S.code.
S.next is a label that is attached to the first three-address instruction to be executed after
the code for S.
Code for if-then , if-then-else, and while-do statements
to E.true
E.code
to E.false
E.code to E.true E.true: S1.code
E.true : E.false
S1.code
goto S.next
E.false:
S2.code
E.false : ...
S.next:
...
to E.false
E.true: S1.code
goto S.begin
E.false: ...
(c) while-do
Figure3.9: code for if-then,if-then-else,while-do statements
E → E1 or E2   E1.true := E.true;
               E1.false := newlabel;
               E2.true := E.true;
               E2.false := E.false;
               E.code := E1.code || gen(E1.false ':') || E2.code
E → ( E1 )     E1.true := E.true;
               E1.false := E.false;
               E.code := E1.code
3.6 CASE STATEMENTS
There is a selector expression, which is to be evaluated, followed by n constant values that the expression might take, including a default "value" which always matches the expression if no other value does. Consider a switch statement of the form:

switch E begin
    case V1 : S1
    case V2 : S2
    ...
    case Vn-1 : Sn-1
    default : Sn
end
This case statement is translated into intermediate code that has the following form :
Translation of a case statement:

        code to evaluate E into t
        goto test
L1:     code for S1
        goto next
L2:     code for S2
        goto next
        ...
Ln-1:   code for Sn-1
        goto next
Ln:     code for Sn
        goto next
test:   if t = V1 goto L1
        if t = V2 goto L2
        ...
        if t = Vn-1 goto Ln-1
        goto Ln
next:
3.7 BACKPATCHING
The easiest way to implement the syntax-directed definitions for boolean expressions is to use
two passes. First, construct a syntax tree for the input, and then walk the tree in depth-first order,
computing the translations. The main problem with generating code for boolean expressions and flow-
of-control statements in a single pass is that during one single pass we may not know the labels that
control must go to at the time the jump statements are generated. Hence, a series of branching
statements with the targets of the jumps left unspecified is generated. Each statement will be put on a
list of goto statements whose labels will be filled in when the proper label can be determined. We call
this subsequent filling in of labels backpatching.
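A minimal sketch of the list-handling functions backpatching relies on; makelist, merge and backpatch follow the standard names, while the quadruple layout here is an assumption:

quads = []   # each quadruple is a dict; jump targets start unfilled

def emit_jump(cond=None):
    # emit 'goto _' or 'if cond goto _' with an unfilled target and
    # return its index in the quadruple array
    quads.append({"cond": cond, "target": None})
    return len(quads) - 1

def makelist(i):
    # a new list containing only quadruple index i
    return [i]

def merge(p1, p2):
    # the concatenation of two lists of unfilled jumps
    return p1 + p2

def backpatch(plist, target):
    # fill in target as the destination of every jump on plist
    for i in plist:
        quads[i]["target"] = target

# e.g. E -> id1 relop id2 makes one true jump and one false jump:
truelist  = makelist(emit_jump("a < b"))
falselist = makelist(emit_jump())
backpatch(truelist, 100)    # filled in once the true target is known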
Boolean Expressions:
We now construct a translation scheme suitable for producing quadruples for boolean
expressions during bottom-up parsing. The grammar we use is the following:
(1) E → E1 or M E2
(2)   | E1 and M E2
(3)   | not E1
(4)   | ( E1 )
(5)   | id1 relop id2
(6)   | true
(7)   | false
(8) M → ɛ
Synthesized attributes truelist and falselist of nonterminal E are used to generate jumping code for
boolean expressions. Incomplete jumps with unfilled labels are placed on lists pointed to by
E.truelist and E.falselist.
Attribute M.quad records the number of the first statement of E2.code. With the production M → ɛ we associate the semantic action
{ M.quad := nextquad }
The variable nextquad holds the index of the next quadruple to follow. This value will be backpatched onto E1.truelist when we have seen the remainder of the production E → E1 and M E2. The translation scheme includes, for example:
E → E1 and M E2   { backpatch( E1.truelist, M.quad );
                    E.truelist := E2.truelist;
                    E.falselist := merge( E1.falselist, E2.falselist ) }
Flow-of-Control Statements:
(1) S → if E then S
(2)   | if E then S else S
(3)   | while E do S
(4)   | begin L end
(5)   | A
(6) L → L ; S
(7)   | S
Here S denotes a statement, L a statement list, A an assignment statement, and E a boolean expression.
We make the tacit assumption that the code that follows a given statement in execution also follows it
physically in the quadruple array. Else, an explicit jump must be provided.
The nonterminal E has two attributes E.truelist and E.falselist. L and S also need a list of
unfilled quadruples that must eventually be completed by backpatching. These lists are pointed to by
the attributes L.nextlist and S.nextlist. S.nextlist is a pointer to a list of all conditional and
unconditional jumps to the quadruple following the statement S in execution order, and L.nextlist is
defined similarly.
The semantic rules for the revised grammar are as follows:
S → if E then M1 S1 N else M2 S2
{ backpatch (E.truelist, M1.quad);
backpatch (E.falselist, M2.quad);
S.nextlist : = merge (S1.nextlist, merge (N.nextlist, S2.nextlist)) }
We backpatch the jumps when E is true to the quadruple M1.quad, which is the beginning of the code
for S1. Similarly, we backpatch jumps when E is false to go to the beginning of the code for S2. The
list S.nextlist includes all jumps out of S1 and S2, as well as the jump generated by N.
For the production L → L1 ; M S, the statement following L1 in order of execution is the beginning of S. Thus the L1.nextlist list is backpatched to the beginning of the code for S, which is given by M.quad:
L → L1 ; M S   { backpatch( L1.nextlist, M.quad );
                 L.nextlist := S.nextlist }
L → S          { L.nextlist := S.nextlist }
3.8 PROCEDURE CALLS
The procedure is such an important and frequently used programming construct that it is
imperative for a compiler to generate good code for procedure calls and returns. The run-time routines
that handle procedure argument passing, calls and returns are part of the run-time support package.
S → call id ( Elist )
Elist → Elist , E
Elist → E
Calling Sequences:
The translation for a call includes a calling sequence, a sequence of actions taken on entry to and exit from each procedure. The following are the actions that take place in a calling sequence:
When a procedure call occurs, space must be allocated for the activation record of the
called procedure.
The arguments of the called procedure must be evaluated and made available to the
called procedure in a known place.
Environment pointers must be established to enable the called procedure to access data
in enclosing blocks.
The state of the calling procedure must be saved so it can resume execution after the
call.
Also saved in a known place is the return address, the location to which the called
routine must transfer after it is finished.
Finally, a jump to the beginning of the code for the called procedure must be generated.
Elist → E
{ initialize queue to contain only E.place }
Here, the code for S is the code for Elist, which evaluates the arguments, followed by a param
p statement for each argument, followed by a call statement.
queue is emptied and then gets a single pointer to the symbol table location for the name that
denotes the value of E.
The final phase in compiler model is the code generator. It takes as input an intermediate
representation of the source program and produces as output an equivalent target program. The
code generation techniques presented below can be used whether or not an optimizing phase
occurs before code generation.
(Figure: position of the code generator — intermediate code from the front end, optionally passed through a code optimizer, is input to the code generator, which produces the target program; all phases consult the symbol table.)
Prior to code generation, the source program must have been scanned, parsed and translated into an intermediate representation, with the necessary type checking performed. Therefore, input to code generation is assumed to be error-free.
2. Target program:
The output of the code generator is the target program. The output may be:
Absolute machine language - It can be placed in a fixed memory location and can be executed immediately.
Relocatable machine language - It allows subprograms to be compiled separately.
Assembly language - Code generation is made easier.
3. Memory management:
Names in the source program are mapped to addresses of data objects in run-time memory by
the front end and code generator. It makes use of symbol table, that is, a name in a three-address
statement refers to a symbol-table entry for the name. Labels in three-address statements have to
be converted to addresses of instructions. For example,
j : goto i generates jump instruction as follows :
if i < j, a backward jump instruction with target address equal to location of
code for quadruple i is generated.
if i > j, the jump is forward. We must store on a list for quadruple i the
location of the first machine instruction generated for quadruple j. When i is
processed, the machine locations for all instructions that are forward jumps to i are filled.
4. Instruction selection:
The instructions of the target machine should be complete and uniform. Instruction speeds and machine idioms are important factors when the efficiency of the target program is considered. The quality of the generated code is determined by its speed and size. For example, every three-address statement of the form x := y + z, where x, y and z are statically allocated, can be translated into the sequence
MOV y, R0
ADD z, R0
MOV R0, x
5. Register allocation
Instructions involving register operands are shorter and faster than those involving operands
in memory. The use of registers is subdivided into two subproblems :
Register allocation – the set of variables that will reside in registers at a point in
the program is selected
Register assignment – the specific register that a variable will reside in is
picked.
Certain machines require even-odd register pairs for some operands and results. For example, consider the division instruction of the form:
D x, y
where x, the dividend, is the even register of an even/odd register pair and y is the divisor.
6. Evaluation order
The order in which the computations are performed can affect the efficiency of the target
code. Some computation orders require fewer registers to hold intermediate results than others.
The source and destination of an instruction are specified by combining registers and
memory locations with address modes.
Table 4.1: Address modes with their assembly-language forms

MODE                FORM     ADDRESS                      ADDED COST
Absolute            M        M                            1
Register            R        R                            0
Indexed             c(R)     c + contents(R)              1
Indirect register   *R       contents(R)                  0
Indirect indexed    *c(R)    contents(c + contents(R))    1
Literal             #c       c                            1
For example : MOV R0, M stores contents of Register R0 into memory location M ;
MOV 4(R0), M stores the value contents(4+contents(R0)) into M.
Instruction costs :
Instruction cost = 1+cost for source and destination address modes. This cost corresponds to
the length of the instruction. Address modes involving registers have cost zero. Address modes
involving memory location or literal have cost one. Instruction length should be minimized if
space is important. Doing so also minimizes the time taken to fetch and perform the instruction.
For example : MOV R0, R1 copies the contents of register R0 into R1. It has cost one, since it
occupies only one word of memory.
The three-address statement a : = b + c can be implemented by many different
instruction sequences :
i) MOV b, R0
ADD c, R0 cost = 6
MOV R0, a
ii) MOV b, a
ADD c, a cost = 6
iii) Assuming R0, R1 and R2 contain the addresses of a, b, and c :
MOV *R1, *R0
ADD *R2, *R0 cost = 2
In order to generate good code for target machine, we must utilize its addressing
capabilities efficiently.
In static allocation, a call statement in the intermediate code is implemented by a sequence of two target-machine instructions:
MOV #here + 20, callee.static_area   /* save the return address */
GOTO callee.code_area                /* transfer control to the target code for the called procedure */
where
callee.static_area – address of the activation record of the callee
callee.code_area – address of the first instruction for the called procedure
#here + 20 – literal return address, the address of the instruction following the GOTO
A return from the procedure callee is implemented by
GOTO *callee.static_area
which transfers control to the address saved at the beginning of the activation record.
The statement HALT is the final instruction that returns control to the operating system.
Static allocation can become stack allocation by using relative addresses for storage in
activation records. In stack allocation, the position of activation record is stored in register so
words in activation records can be accessed as offsets from the value in this register.
A procedure call in stack allocation is implemented by the sequence
ADD #caller.recordsize, SP
MOV #here + 16, *SP
GOTO callee.code_area
where
caller.recordsize – size of the caller's activation record (the ADD makes SP point to the callee's activation record)
#here + 16 – address of the instruction following the GOTO, saved as the return address
A basic block is a sequence of consecutive statements in which flow of control enters at the
beginning and leaves at the end without any halt or possibility of branching except at the end.
The following sequence of three-address statements forms a basic block:
t1 : = a * a
t2 : = a * b
t3 : = 2 * t2
t4 : = t1 + t3
t5 : = b * b
t6 : = t4 + t5
Algorithm: Partition into basic blocks.
Input: A sequence of three-address statements.
Output: A list of basic blocks with each three-address statement in exactly one block.
Method:
(1) We first determine the set of leaders, the first statements of basic blocks. The rules we use are the following:
The first statement is a leader.
Any statement that is the target of a conditional or unconditional goto is a
leader.
Any statement that immediately follows a goto or conditional goto statement
is a leader.
(2) For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program.
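The partition is easy to express directly; the following Python sketch assumes each statement is given together with the index it jumps to, or None — an illustrative encoding, not a fixed one:

def basic_blocks(stmts, jump_targets):
    # stmts: list of three-address statements
    # jump_targets[i]: index statement i jumps to, or None
    leaders = {0}                       # rule 1: the first statement
    for i, tgt in enumerate(jump_targets):
        if tgt is not None:
            leaders.add(tgt)            # rule 2: target of a goto
            if i + 1 < len(stmts):
                leaders.add(i + 1)      # rule 3: statement after a goto
    order = sorted(leaders)
    # each block runs from its leader up to, but not including, the next
    return [stmts[a:b] for a, b in zip(order, order[1:] + [len(stmts)])]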
Consider the following source code for the dot product of two vectors a and b of length 20:
begin
    prod := 0;
    i := 1;
    do begin
        prod := prod + a[i] * b[i];
        i := i + 1;
    end
    while i <= 20
end
A number of transformations can be applied to a basic block without changing the set of
expressions computed by the block. Two important classes of transformation are :
1. Structure-preserving transformations
2. Algebraic transformations
Structure preserving transformations:
a) Common subexpression elimination:

a := b + c                a := b + c
b := a - d                b := a - d
c := b + c       →        c := b + c
d := a - d                d := b

Since the second and fourth statements compute the same expression, the basic block can be transformed as above.
b) Dead-code elimination:
Suppose x is dead, that is, never subsequently used, at the point where the statement x :
= y + z appears in a basic block. Then this statement may be safely removed without
changing the value of the basic block.
c) Renaming of temporary variables:
A statement t := b + c, where t is a temporary, can be changed to u := b + c, where u is a new temporary, without changing the value of the basic block.
d) Interchange of statements:
t1 : = b + c
t2 : = x + y
We can interchange the two statements without affecting the value of the block if
and only if neither x nor y is t1 and neither b nor c is t2.
2. Algebraic transformations:
Algebraic transformations can be used to change the set of expressions computed by a
basic block into an algebraically equivalent set.
Examples:
i) x : = x + 0 or x : = x * 1 can be eliminated from a basic block without changing the set of
expressions it computes.
ii) The exponential statement x : = y * * 2 can be replaced by x : = y * y.
Flow graph is a directed graph containing the flow-of-control information for the set of basic
blocks making up a program. The nodes of the flow graph are basic blocks. It has a distinguished
initial node. E.g.: Flow graph for the vector dot product is given as follows:
B1:  prod := 0
     i := 1

B2:  t1 := 4 * i
     t2 := a [ t1 ]
     t3 := 4 * i
     t4 := b [ t3 ]
     t5 := t2 * t4
     t6 := prod + t5
     prod := t6
     t7 := i + 1
     i := t7
     if i <= 20 goto B2
B1 is the initial node. B2 immediately follows B1, so there is an edge from B1 to B2. The target of the jump from the last statement of B2 is the first statement of B2, so there is an edge from B2 (last statement) to B2 (first statement). B1 is the predecessor of B2, and B2 is a successor of B1 (and of itself).
4.4.4 Loops
A loop is a collection of nodes in a flow graph such that all nodes in the collection are strongly connected (there is a path between any two of them) and the collection has a unique entry node. In the flow graph above, B2 by itself forms a loop.
If the name in a register is no longer needed, then we remove the name from the register
and the register can be used to store some other names.
A code generator generates target code for a sequence of three-address statements and effectively uses registers to store operands of the statements. For example, consider the three-address statement a := b + c. It can have the following code sequences:
MOV b, R0
ADD c, R0
(or, if the value of b is already in register Ri)
ADD c, Ri
(or, if the values of b and c are in registers Ri and Rj, leaving the result in Ri)
ADD Rj, Ri
Register and Address Descriptors:
A register descriptor is used to keep track of what is currently in each register. The register descriptors show that initially all the registers are empty. An address descriptor stores the location where the current value of a name can be found at run time.
A code-generation algorithm:
The algorithm takes as input a sequence of three -address statements constituting a basic block.
For each three-address statement of the form x : = y op z, perform the following actions:
1. Invoke a function getreg to determine the location L where the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. Prefer the register for y' if the value of y is currently both in memory and a register. If the value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction op z', L, where z' is a current location of z (again preferring a register), and update the address descriptor of x to indicate that x is in location L.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptor to indicate that, after execution of x := y op z, those registers will no longer contain y or z.
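A much-simplified sketch of the descriptor bookkeeping in Python; the two-register pool, the absence of spilling, and taking z from memory are simplifications assumed by this sketch, not by the algorithm:

registers = ["R0", "R1"]
reg_desc  = {r: None for r in registers}  # register -> name it holds
addr_desc = {}                            # name -> set of current locations

def getreg():
    # simplest possible policy: return an empty register
    for r in registers:
        if reg_desc[r] is None:
            return r
    raise RuntimeError("out of registers; spilling not handled here")

def gen_assign(x, y, op, z):
    # generate code for x := y op z, maintaining both descriptors
    L = getreg()
    if L not in addr_desc.get(y, set()):
        print(f"MOV {y}, {L}")            # place a copy of y in L
    print(f"{op} {z}, {L}")               # L := L op z
    reg_desc[L] = x
    addr_desc[x] = {L}                    # x now lives only in register L

gen_assign("t", "a", "SUB", "b")   # t := a - b  ->  MOV a,R0 ; SUB b,R0
gen_assign("u", "a", "SUB", "c")   # u := a - c  ->  MOV a,R1 ; SUB c,R1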
The assignment d : = (a-b) + (a-c) + (a-c) might be translated into the following three-
address code sequence:
t:=a–b
u:=a–c
v:=t+u
d:=v+u
with d live at the end.
(Table: the code generated for this sequence, together with the register and address descriptors after each statement; initially the register descriptor shows all registers empty.)
The table shows the code sequences generated for the indexed assignment statements a := b [ i ] and a [ i ] := b, assuming i is in register Ri:
Table 4.4: Indexed assignment
a := b[i]    MOV b(Ri), R     2
a[i] := b    MOV b, a(Ri)     3
The table shows the code sequences generated for the pointer assignments
a : = *p and *p : = a
Table 4.5:Pointer assignment
a : = *p MOV *Rp, a 2
*p : = a MOV a, *Rp 2
Statement                 Code
x := y + z                MOV y, R0
if x < 0 goto z           ADD z, R0
                          MOV R0, x
                          CJ< z
A DAG for a basic block is a directed acyclic graph with the following labels on nodes:
Leaves are labeled by unique identifiers, either variable names or constants.
Interior nodes are labeled by an operator symbol.
Nodes are also optionally given a sequence of identifiers for labels to store
the computed values.
DAGs are useful data structures for implementing transformations on basic blocks.
It gives a picture of how the value computed by a statement is used in subsequent
statements.
It provides a good way of determining common sub - expressions.
Input: A basic block.
Output: A DAG for the basic block containing the following information:
1. A label for each node. For leaves, the label is an identifier. For interior nodes,
an operator symbol.
2. For each node a list of attached identifiers to hold the computed values.
Case (i)   x := y OP z
Case (ii)  x := OP y
Case (iii) x := y
Method:
Step 1: If node(y) is undefined, create a leaf labeled y, and let node(y) be this node. In case (i), if node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
Step 2: For case (i), determine whether there is a node(OP) whose left child is node(y) and whose right child is node(z). If not, create such a node. For case (ii), determine whether there is a node(OP) with one child node(y). If not, create such a node. For case (iii), let the node be node(y) itself.
Step 3: Delete x from the list of attached identifiers for node(x). Append x to the list of attached identifiers for the node found in step 2, and set node(x) to that node.
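A sketch of this construction for case (i) in Python; reusing an existing node(OP, y, z) is what detects common subexpressions. The list/dictionary representation is an assumption, and the deletion of x from its previous node's identifier list is omitted here:

node_of = {}  # identifier -> node index, i.e. node(x) in the text
nodes = []    # node index -> [label, left, right, attached identifiers]
found = {}    # (op, left, right) -> node index, for reuse

def leaf(ident):
    # step 1: create a leaf for ident if node(ident) is undefined
    if ident not in node_of:
        nodes.append([ident, None, None, []])
        node_of[ident] = len(nodes) - 1
    return node_of[ident]

def assign_op(x, y, op, z):
    # case (i): x := y op z  (steps 2 and 3)
    l, r = leaf(y), leaf(z)
    key = (op, l, r)
    if key not in found:               # create node(OP) only if absent
        nodes.append([op, l, r, []])
        found[key] = len(nodes) - 1
    n = found[key]
    nodes[n][3].append(x)              # append x to the attached identifiers
    node_of[x] = n                     # set node(x)

# t2 and t4 attach to the same node: a common subexpression
assign_op("t2", "b", "*", "c")
assign_op("t4", "b", "*", "c")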
1. t1 := 4 * i
2. t2 := a [ t1 ]
3. t3 := 4 * i
4. t4 := b [ t3 ]
5. t5 := t2 * t4
6. t6 := prod + t5
7. prod := t6
8. t7 := i + 1
9. i := t7
10. if i <= 20 goto (1)
Application of DAGs:
The advantage of generating code for a basic block from its dag representation is that from a dag we can more easily see how to rearrange the order of the final computation sequence than we can when starting from a linear sequence of three-address statements or quadruples.
MOV a , R0
ADD b , R0
MOV c , R1
ADD d , R1
MOV R0 , t1
MOV e , R0
SUB R1 , R0
MOV t1 , R1
SUB R0 , R1
MOV R1 , t4
If instead we rearrange the three-address statements as
t2 := c + d
t3 := e - t2
t1 := a + b
t4 := t1 - t3
the machine code becomes:
MOV c, R0
ADD d, R0
MOV e, R1
SUB R0, R1
MOV a, R0
ADD b, R0
SUB R1, R0
MOV R0, t4
In this order, two instructions MOV R0 , t1 and MOV t1 , R1 have been saved.
The heuristic ordering algorithm attempts to make the evaluation of a node immediately follow
the evaluation of its leftmost argument.
The algorithm shown below produces the ordering in reverse.
Algorithm:
(1) while unlisted interior nodes remain do begin
(2)     select an unlisted node n, all of whose parents have been listed;
(3)     list n;
(4)     while the leftmost child m of n has no unlisted parents and is not a leaf do begin
(5)         list m;
(6)         n := m
        end
    end

(Figure: the example dag has interior nodes 1 (*), 2 (+), 3 (-), 4 (*), 5 (-), 6 (+) and 8 (+), and leaves 7 (c), 9 (a), 10 (b), 11 (d) and 12 (e).)
Initially, the only node with no unlisted parents is 1, so set n = 1 at line (2) and list 1 at line (3). Now the left argument of 1, which is 2, has its parents listed, so we list 2 and set n = 2 at line (6). Now, at line (4) we find that the leftmost child of 2, which is 6, has an unlisted parent 5. Thus we select a new n at line (2), and node 3 is the only candidate. We list 3 and proceed down its left chain, listing 4, 5 and 6. This leaves only 8 among the interior nodes, so we list that.
Code sequence:
t8 := d + e
t6 := a + b
t5 := t6 - c
t4 := t5 * t8
t3 := t4 - e
t2 := t6 + t4
t1 := t2 * t3
This will yield optimal code for the dag on the machine, whatever the number of registers.
5.1 INTRODUCTION
The code produced by straightforward compiling algorithms can often be made to run
faster or take less space, or both. This improvement is achieved by program transformations that
are traditionally called optimizations. Compilers that apply code-improving transformations are
called optimizing compilers.
Optimizations are classified into two categories. They are
Machine independent optimizations
Machine dependent optimizations
Machine independent optimizations are program transformations that improve the target code without taking into consideration any properties of the target machine. Machine dependent optimizations are based on register allocation and utilization of special machine-instruction sequences.
Simply stated, the best program transformations are those that yield the most benefit for the
least effort.
The transformation must preserve the meaning of programs. That is, the optimization must
not change the output produced by a program for a given input, or cause an error such as
division by zero, that was not present in the original source program. At all times we take the
“safe” approach of missing an opportunity to apply a transformation rather than risk
changing what the program does.
A transformation must, on the average, speed up programs by a measurable amount. We are
also interested in reducing the size of the compiled code although the size of the code has
less importance than it once had. Not every transformation succeeds in improving every program; occasionally an "optimization" may slow down a program slightly.
The transformation must be worth the effort. It does not make sense for a compiler writer to
expend the intellectual effort to implement a code improving transformation and to have the
compiler expend the additional time compiling source programs if this effort is not repaid
when the target programs are executed. “Peephole” transformations of this kind are simple
enough and beneficial enough to be included in any compiler.
Flow analysis is a fundamental prerequisite for many important types of code improvement. Generally, control flow analysis precedes data flow analysis. Control flow analysis (CFA) represents the flow of control, usually in the form of graphs. CFA constructs include:
1. the control flow graph
2. the call graph
Data flow analysis (DFA) is the process of ascertaining and collecting, prior to program execution, information about the possible modification, preservation, and use of certain entities (such as values or attributes of variables) in a computer program.
There are a number of ways in which a compiler can improve a program without
changing the function it computes. The transformations
Common sub expression elimination,
Copy propagation,
Dead-code elimination, and
Constant folding
are common examples of such function-preserving transformations. The other transformations
come up primarily when global optimizations are performed.
Frequently, a program will include several calculations of the same value, such as an offset
in an array. Some of the duplicate calculations cannot be avoided by the programmer because
they lie below the level of detail accessible within the source language.
1.Common Sub expressions elimination:
An occurrence of an expression E is called a common sub-expression if E was previously
computed, and the values of variables in E have not changed since the previous computation.
We can avoid recomputing the expression if we can use the previously computed value.
For example
t1: =4*i
t2: =a [t1]
t3: =4*j
t4:=4*i
t5: =n
t6: =b [t4] +t5
The above code can be optimized using the common sub-expression elimination as
t1: =4*i
t2: =a [t1]
t3: =4*j
t5: =n
t6: =b [t1] +t5
The common subexpression t4 := 4*i is eliminated, as its value is already computed in t1 and the value of i has not changed between that definition and this use.
2.Copy Propagation:
Assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f, whenever possible after the copy statement f := g. Copy propagation means the use of one variable instead of another. This may not appear to be an improvement, but as we shall see it gives us an opportunity to eliminate x.
For example
x=Pi;
..…
A=x*r*r;
The optimization using copy propagation can be done as follows:
A=Pi*r*r;
Here the variable x is eliminated
3.Dead-Code Eliminations:
A variable is live at a point in a program if its value can be used subsequently; otherwise, it
is dead at that point. A related idea is dead or useless code, statements that compute the
values that never get used. While the programmer is unlikely to introduce any dead code
intentionally, it may appear as the result of previous transformations. An optimization can be
done byeliminating dead code.
Example:
i=0;
if(i=1)
{
a=b+5;
}
Here, the 'if' statement is dead code because this condition will never get satisfied.
4.Constant folding:
We can eliminate both the test and printing from the object code. More generally, deducing
at compile time that the value of an expression is a constant and using the constant instead is
known as constant folding. One advantage of copy propagation is that it often turns the copy
statement into dead code. For example,
a := 3.14157 / 2 can be replaced by
a := 1.570785, thereby eliminating a division operation.
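At the level of a single three-address statement, folding is a one-step rewrite; a small Python sketch follows, where the statement format and the restriction to numeric literals are assumptions of the sketch:

import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def fold(dest, y, op, z):
    # if both operands are numeric literals, compute at compile time
    try:
        lhs, rhs = float(y), float(z)
    except ValueError:
        return (dest, y, op, z)        # not foldable; leave unchanged
    return (dest, repr(OPS[op](lhs, rhs)), None, None)

print(fold("a", "3.14157", "/", "2"))  # ('a', '1.570785', None, None)
print(fold("a", "b", "/", "2"))        # unchanged: b is not a constant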
1. Code Motion:
An important modification that decreases the amount of code in a loop is code motion.
This transformation takes an expression that yields the same result independent of the number of
times a loop is executed ( a loop-invariant computation) and places the expression before the
loop. Note that the notion “before the loop” assumes the existence of an entry for the loop. For
example, evaluation of limit-2 is a loop-invariant computation in the following while-statement:
while ( i <= limit - 2 )     /* statement does not change limit */
can be rewritten as
t = limit - 2;
while ( i <= t )             /* statement does not change limit or t */
2.Induction Variables :
Loops are usually processed inside out. For example consider the loop around B3. Note that
the values of j and t4 remain in lock-step; every time the value of j decreases by 1, that of t4
decreases by 4 because 4*j is assigned to t4. Such identifiers are called induction variables. When
there are two or more induction variables in a loop, it may be possible to get rid of all but one, by
the process of induction-variable elimination. For the inner loop around B3 in Fig. we cannot get
rid of either j or t4 completely; t4 is used in B3 and j in B4.
However, we can illustrate reduction in strength and illustrate a part of the process of
induction-variable elimination. Eventually j will be eliminated when the outer loop of B2 - B5 is
considered. Example: As the relationship t4 := 4*j surely holds after such an assignment to t4 in Fig. and t4 is not changed elsewhere in the inner loop around B3, it follows that just after the statement j := j - 1 the relationship t4 = 4*j - 4 must hold. We may therefore replace the assignment t4 := 4*j by t4 := t4 - 4. The only problem is that t4 does not have a value when we enter block B3 for the first time. Since we must maintain the relationship t4 = 4*j on entry to block B3, we place an initialization of t4 at the end of the block where j itself is initialized.
(Figure: the inner loop before and after strength reduction.)
Structure-Preserving Transformations
Algebraic Transformations
Structure-Preserving Transformations:
Common sub-expressionelimination
Dead code elimination
Renaming of temporary variables
Interchange of two independent adjacent statements.
1.Common sub-expression elimination:
Common sub expressions need not be computed over and over again. Instead they can be
computed once and kept in store from where it‟s referenced when encountered again – of course
providing the variable values in the expression still remain constant.
Example:
a := b + c
b := a - d
c := b + c
d := a - d
The 2nd and 4th statements compute the same expressions: b + c and a - d.
2. Dead code elimination:
It's possible that a large amount of dead (useless) code may exist in the program. This might especially be caused when introducing variables and procedures as part of construction or error-correction of a program – once declared and defined, one forgets to remove them in case they serve no purpose. Eliminating these will definitely optimize the code.
3. Renaming of temporary variables:
A statement t := b + c, where t is a temporary, can be changed to u := b + c, where u is a new temporary, without changing the value of the basic block.
4. Interchange of two independent adjacent statements:
t1 := b + c
t2 := x + y
can be interchanged or reordered in the basic block when the value of t1 does not affect the value of t2.
Algebraic Transformations:
Algebraic identities represent another important class of optimizations on basic blocks. This
includes simplifying expressions or replacing expensive operation by cheaper ones i.e. reduction
in strength. Another class of related optimizations is constant folding. Here we evaluate constant
expressions at compile time and replace the constant expressions by their values. Thus the
expression 2*3.14 would be replaced by 6.28.
The relational operators <=, >=, <, >, + and = sometimes generate unexpected common sub
expressions. Associative laws may also be applied to expose common sub expressions. For
example, if the source code has the assignments
a := b + c
e := c + d + b
then, using the associative law, e can be evaluated as a + d, exposing the common subexpression b + c.
Example:
x:=x+0 can be removed
The compiler writer should examine the language carefully to determine what rearrangements of
computations are permitted, since computer arithmetic does not always obey the algebraic
identities of mathematics. Thus, a compiler may evaluate x*y-x*z as x*(y-z) but it may not
evaluate a+(b-c) as (a+b)-c.
Dominators:
In a flow graph, a node d dominates node n if every path from the initial node of the flow graph to n goes through d. This is denoted by d dom n. The initial node dominates all the remaining nodes in the flow graph, and the entry of a loop dominates all nodes in the loop. Similarly, every node dominates itself.
Example: In the flow graph below,
* The initial node, node 1, dominates every node.
* Node 2 dominates only itself.
* Node 3 dominates all but 1 and 2.
* Node 4 dominates all but 1, 2 and 3.
* Nodes 5 and 6 dominate only themselves, since flow of control can skip around either by going through the other.
* Node 7 dominates 7, 8, 9 and 10.
* Node 8 dominates 8, 9 and 10.
* Nodes 9 and 10 dominate only themselves.
The way of presenting dominator information is in a tree, called the dominator tree, in which the initial node is the root. The parent of each other node is its immediate dominator. Each node d dominates only its descendants in the tree. The existence of dominator trees follows from a property of dominators: each node n has a unique immediate dominator m that is the last dominator of n on any path from the initial node to n. In terms of the dom relation, the immediate dominator m has the property that if d ≠ n and d dom n, then d dom m.
D(1)={1}
D(2)={1,2}
D(3)={1,3}
D(4)={1,3,4}
D(5)={1,3,4,5}
D(6)={1,3,4,6}
D(7)={1,3,4,7}
D(8)={1,3,4,7,8}
D(9)={1,3,4,7,8,9}
D(10)={1,3,4,7,8,10}
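These sets can be computed with the standard iterative scheme D(n) = {n} ∪ ⋂ D(p) over all predecessors p of n. A Python sketch, assuming the graph is given as a successor map and every node is reachable from the entry:

def dominators(succ, entry):
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    dom = {n: set(nodes) for n in nodes}   # start from "everything"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# tiny example: edges 1->2, 2->3 and 1->3; node 2 does not dominate 3
print(dominators({1: [2, 3], 2: [3], 3: []}, 1))
# {1: {1}, 2: {1, 2}, 3: {1, 3}}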
Natural Loop:
Edges whose heads dominate their tails are called back edges (for an edge a → b, b is the head and a is the tail). Here:
7 → 4    4 DOM 7
10 → 7   7 DOM 10
4 → 3    3 DOM 4
8 → 3    3 DOM 8
9 → 1    1 DOM 9
Each of the above back edges determines a loop in the flow graph. Given a back edge n → d, we define the natural loop of the edge to be d plus the set of nodes that can reach n without going through d. Node d is the header of the loop.
Algorithm: Constructing the natural loop of a back edge.
Input: A flow graph G and a back edge n → d.
Output: The set loop consisting of all nodes in the natural loop of n → d.
Method: Beginning with node n, we consider each node m ≠ d that we know is in loop, to make sure that m's predecessors are also placed in loop. Each node in loop, except for d, is placed once on stack, so its predecessors will be examined. Note that because d is put in the loop initially, we never examine its predecessors, and thus find only those nodes that reach n without going through d.
procedure insert(m);
    if m is not in loop then begin
        loop := loop ∪ { m };
        push m onto stack
    end;

/* main program */
stack := empty;
loop := { d };
insert(n);
while stack is not empty do begin
    pop m, the first element of stack, off stack;
    for each predecessor p of m do insert(p)
end
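The same procedure transcribed into Python; the predecessor-map encoding of the flow graph is an assumption of this sketch:

def natural_loop(pred, n, d):
    # natural loop of the back edge n -> d: d plus all nodes that can
    # reach n without passing through d
    loop, stack = {d}, []
    def insert(m):
        if m not in loop:
            loop.add(m)
            stack.append(m)
    insert(n)
    while stack:
        m = stack.pop()
        for p in pred.get(m, ()):
            insert(p)
    return loop

# back edge 10 -> 7 of the flow graph above, with predecessors assumed
pred = {7: [4, 10], 8: [7], 9: [8], 10: [8]}
print(sorted(natural_loop(pred, 10, 7)))   # [7, 8, 10]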
Inner loops:
If we use the natural loops as "the loops", then we have the useful property that unless two loops have the same header, they are either disjoint or one is entirely contained in the other.
Thus, neglecting loops with the same header for the moment, we have a natural notion of inner
loop: one that contains no other loop.
When two natural loops have the same header, but neither is nested within the other, they are
combined and treated as a single loop.
Pre-Headers:
Several transformations require us to move statements "before the header". Therefore we begin the treatment of a loop L by creating a new block, called the pre-header. The pre-header has only the header as successor, and all edges which formerly entered the header of L from outside L instead enter the pre-header. Edges from inside loop L to the header are not changed. Initially the pre-header is empty, but transformations on L may place statements in it.
(Figure: before – edges from outside enter the header of loop L directly; after – outside edges enter the new pre-header, which has the header as its only successor.)
Reducible flow graphs are special flow graphs, for which several code optimization
transformations are especially easy to perform, loops are unambiguously defined, dominators
can be easily calculated, data flow analysis problems can also be solved efficiently. Exclusive
use of structured flow-of-control statements such as if-then-else, while-do, continue, and break
statements produces programs whose flow graphs are always reducible.The most important
properties of reducible flow graphs are that there are no jumps into the middle of loops from
outside; the only entry to a loop is through its header.
Definition:
A flow graph G is reducible if and only if we can partition the edges into two disjoint groups,
forward edges and back edges, with the following properties.
The forward edges form an acyclic graph in which every node can be reached from the initial node of G.
The back edges consist only of edges whose heads dominate their tails.
Example: The above flow graph is reducible.
If we know the relation DOM for a flow graph, we can find and remove all the back edges. The remaining edges are forward edges. If the forward edges form an acyclic graph, then we can say the flow graph is reducible. In the above example, if we remove the five back edges 4→3, 7→4, 8→3, 9→1 and 10→7, whose heads dominate their tails, the remaining graph is acyclic. The key property of reducible flow graphs for loop analysis is that in such flow graphs every set of nodes that we would informally regard as a loop must contain a back edge.
Redundant Loads and Stores:
If we see the instruction sequence
(1) MOV R0, a
(2) MOV a, R0
we can delete instruction (2), because whenever (2) is executed, (1) will ensure that the value of a is already in register R0. If (2) had a label, we could not be sure that (1) was always executed immediately before (2), and so we could not remove (2).
Unreachable Code:
An unlabeled instruction immediately following an unconditional jump may be removed. Consider the source fragment
#define debug 0
...
if ( debug ) { print debugging information }
This may be translated as
    if debug = 1 goto L1
    goto L2
L1: print debugging information
L2: ...                                   (a)
One obvious peephole optimization is to eliminate jumps over jumps. Thus, no matter what the value of debug, (a) can be replaced by:
    if debug ≠ 1 goto L2
    print debugging information
L2: ...                                   (b)
As debug is set to 0 at the beginning of the program, constant propagation replaces (b) by:
    if 0 ≠ 1 goto L2
    print debugging information
L2: ...                                   (c)
As the argument of the first statement of (c) evaluates to a constant true, it can be replaced by goto L2. Then all the statements that print debugging information are manifestly unreachable and can be eliminated one at a time.
Flow-of-Control Optimizations:
The unnecessary jumps can be eliminated in either the intermediate code or the target code by the following types of peephole optimizations. We can replace the jump sequence
goto L1
….
L1: gotoL2
by the sequence
goto L2
….
L1: goto L2
If there are now no jumps to L1, then it may be possible to eliminate the statement L1:goto L2
provided it is preceded by an unconditional jump .Similarly, the sequence
if a < b goto L1
....
L1: goto L2
can be replaced by
if a < b goto L2
....
L1: goto L2
Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional goto. Then the sequence
goto L1
........
L1: if a < b goto L2
L3:                                   (1)
may be replaced by
if a < b goto L2
goto L3
.......
L3:                                   (2)
While the number of instructions in (1) and (2) is the same, we sometimes skip the
unconditional jump in (2), but never in (1).Thus (2) is superior to (1) in execution time
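The first of these replacements is easy to mechanize. A Python sketch of a pass that redirects jumps whose target is itself an unconditional jump; the textual statement encoding is an assumption, and goto chains are assumed acyclic:

def redirect_jumps(stmts):
    # map each label that merely jumps elsewhere to its final target
    final = {}
    for s in stmts:
        if s.startswith("L") and ": goto " in s:
            label, target = s.split(": goto ")
            final[label] = target
    out = []
    for s in stmts:
        if s.startswith("goto "):
            t = s[len("goto "):]
            while t in final:          # follow chains L1 -> L2 -> ...
                t = final[t]
            s = "goto " + t
        out.append(s)
    return out

print(redirect_jumps(["goto L1", "x := 1", "L1: goto L2", "L2: halt"]))
# ['goto L2', 'x := 1', 'L1: goto L2', 'L2: halt']
# L1: goto L2 itself becomes removable if no jumps to L1 remain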
Algebraic Simplification:
There is no end to the amount of algebraic simplification that can be attempted through
peephole optimization. Only a few algebraic identities occur frequently enough that it is worth
considering implementing them. For example, statements such as
x := x + 0
or
x := x * 1
Are often produced by straightforward intermediate code-generation algorithms, and they can be
eliminated easily through peephole optimization.
Reduction in Strength:
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target
machine. Certain machine instructions are considerably cheaper than others and can often be
used as special cases of more expensive operators.
For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation
routine. Fixed-point multiplication or division by a power of two is cheaper to implement as a
shift. Floating-point division by a constant can be implemented as multiplication by a constant,
which may be cheaper.
For example:
x² → x * x
i := i - 1 → i--
In order to do code optimization and a good job of code generation, a compiler needs to collect information about the program as a whole and to distribute this information to each block in the flow graph. For example, knowing where a variable like debug was last defined before reaching a given block lets the compiler take advantage of "reaching definitions" to perform transformations such as replacing debug by a constant. Such facts are examples of data-flow information that an optimizing compiler collects by a process known as data-flow analysis.
Data-flow information can be collected by setting up and solving systems of equations
of the form :
out [S] = gen [S] U ( in [S] – kill [S] )
This equation can be read as “ the information at the end of a statement is either generated
within the statement , or enters at the beginning and is not killed as control flows through the
statement.”
The details of how data-flow equations are set and solved depend on three factors.
The notions of generating and killing depend on the desired information, i.e., on the data
flow analysis problem to be solved. Moreover, for some problems, instead of
proceeding along with flow of control and defining out[s] in terms of in[s], we need to
proceed backwards and define in[s] in terms of out[s].
Since data flows along control paths, data-flow analysis is affected by the constructs in a program. In fact, when we write out[S] we implicitly assume that there is a unique end point where control leaves the statement; in general, equations are set up at the level of basic blocks rather than statements, because blocks have unique end points.
There are subtleties that go along with such statements as procedure calls,
assignments through pointer variables, and even assignments to array variables.
Within a basic block, we talk of the point between two adjacent statements, as well as
the point before the first statement and after the last. Thus, block B1 has four points: one before
any of the assignments and one after each of the three assignments.
(Flow graph: B1 contains d1: i := m - 1, d2: j := n, d3: a := u1; B2 contains d4: i := i + 1; B3 contains d5: j := j - 1; B5 contains d6: a := u2; B4 and B6 are the remaining blocks.)
Now let us take a global view and consider all the points in all the blocks. A path from p1 to pn is a sequence of points p1, p2, …, pn such that for each i between 1 and n-1, either
pi is the point immediately preceding a statement and pi+1 is the point immediately following that statement in the same block, or
pi is the end of some block and pi+1 is the beginning of a successor block.
Reaching definitions:
A definition of variable x is a statement that assigns, or may assign, a value to x. The most
common forms of definition are assignments to x and statements that read a value from an i/o
device and store it in x. These statements certainly define a value for x, and they are referred
to as unambiguous definitions of x. There are certain kinds of statements that may define a
value for x; they are called ambiguous definitions. The most usual forms of ambiguous
definitions of x are:
A call of a procedure with x as a parameter or a procedure that can access x because x
is in the scope of the procedure.
An assignment through a pointer that could refer to x. For example, the assignment *q := y is a definition of x if it is possible that q points to x. In the absence of other information, we must assume that an assignment through a pointer is a definition of every variable.
We say a definition d reaches a point p if there is a path from the point immediately following
d to p, such that d is not “killed” along that path. Thus a point can be reached by an
unambiguous definition and an ambiguous definition of the same variable appearing later
along one path.
Flow graphs for control flow constructs such as do-while statements have a useful
property: there is a single beginning point at which control enters and a single end point that
control leaves from when execution of the statement is over. We exploit this property when we
talk of the definitions reaching the beginning and the end of statements with the following
syntax.
S → id := E | S ; S | if E then S else S | do S while E
E → id + id | id
Expressions in this language are similar to those in the intermediate code, but the flow
graphs for statements have restricted forms.
(Figure: the restricted flow graphs for the statement constructs — S1 ; S2 in sequence, if E then S1 else S2 with a conditional branch on E, and do S1 while E with a back edge conditional on E.)
We say that the beginning points of the dummy blocks at the entry and exit of a statement's region
region are the beginning and end points, respectively, of the statement. The equations are
inductive, or syntax-directed, definition of the sets in[S], out[S], gen[S], and kill[S] for all
statements S.
gen[S] is the set of definitions “generated” by S while kill[S] is the set of definitions
that never reach the end of S.
Consider the following data-flow equations for reaching definitions :
i) S → d : a := b + c
gen[S] = { d }
kill[S] = Da – { d }
out[S] = gen[S] ∪ ( in[S] – kill[S] )
Observe the rules for a single assignment to variable a. Surely that assignment is a definition of a, say d. Thus
gen[S] = { d }
On the other hand, d "kills" all other definitions of a (Da denotes the set of all definitions of a in the program), so we write
kill[S] = Da – { d }
ii) S → S1 ; S2
gen[S] = gen[S2] ∪ ( gen[S1] – kill[S2] )
kill[S] = kill[S2] ∪ ( kill[S1] – gen[S2] )
in[S1] = in[S]
in[S2] = out[S1]
out[S] = out[S2]
Under what circumstances is definition d generated by S=S1; S2? First of all, if it is generated by
S2, then it is surely generated by S. if d is generated by S1, it will reach the end of S provided it
is not killed by S2. Thus, we write
gen[S]=gen[S2] U (gen[S1]-kill[S2])
There is a subtle miscalculation in the rules for gen and kill. We have made the
assumption that the conditional expression E in the if and do statements are “uninterpreted”;
that is, there exists inputs to the program that make their branches go either way. We
assume that any graph-theoretic path in the flow graph is also an execution path, i.e., a path
that is executed when the program is run with least one possible input.
When we compare the computed gen with the "true" gen, we discover that the true gen is always a subset of the computed gen. On the other hand, the true kill is always a superset of the computed kill. These containments hold even after we consider the other rules. It is natural to wonder whether these differences between the true and computed gen and kill sets present a serious obstacle to data-flow analysis. The answer lies in the use intended for these data.
Overestimating the set of definitions reaching a point does not seem serious; it merely stops
us from doing an optimization that we could legitimately do. On the other hand,
underestimating the set of definitions is a fatal error; it could lead us into making a change in
the program that changes what the program computes. For the case of reaching definitions,
then, we call a set of definitions safe or conservative if the estimate is a superset of the true set
of reaching definitions. We call the estimate unsafe, if it is not necessarily a superset of the
truth. Returning now to the implications of safety on the estimation of gen and kill for reaching
definitions, note that our discrepancies, supersets for gen and subsets for kill are both in the
safe direction. Intuitively, increasing gen adds to the set of definitions that can reach a point,
and cannot prevent a definition from reaching a place that it truly reached. Decreasing kill can
only increase the set of definitions reaching any given point.
Many data-flow problems can be solved by synthesized translations similar to those used to compute gen and kill. These can be used, for example, to determine loop-invariant computations. However, there are other kinds of data-flow information, such as the reaching-definitions problem, where this is not so. It turns out that in is an inherited attribute, and out is a synthesized attribute depending on in. We intend that in[S] be the set of definitions reaching the beginning of S, taking into account the flow of control throughout the entire program, including statements outside of S or within which S is nested.
The set out[S] is defined similarly for the end of S. It is important to note the distinction between out[S] and gen[S]. The latter is the set of definitions that reach the end of S without following paths outside S. Assuming we know in[S], we compute out[S] by the equation
out[S] = gen[S] ∪ ( in[S] – kill[S] )
Consider the cascade of two statements S1 ; S2, as in the second case. We start by observing in[S1] = in[S]. Then, we recursively compute out[S1], which gives us in[S2], since a definition reaches the beginning of S2 if and only if it reaches the end of S1. Now we can compute out[S2], and this set is equal to out[S].
Considering an if-statement, we have conservatively assumed that control can follow either branch, so a definition reaches the beginning of S1 or S2 exactly when it reaches the beginning of S:
in[S1] = in[S2] = in[S]
A definition reaches the end of S if and only if it reaches the end of one or both substatements; i.e.,
out[S] = out[S1] ∪ out[S2]
Representation of sets:
Sets of definitions, such as gen[S] and kill[S], can be represented compactly using bit vectors. We assign a number to each definition of interest in the flow graph. Then the bit vector representing a set of definitions will have 1 in position i if and only if the definition numbered i is in the set. The number of a definition statement can be taken as the index of the statement in an array holding pointers to statements. However, not all definitions may be of interest during global data-flow analysis. Therefore the numbers of definitions of interest will typically be recorded in a separate table.
A bit-vector representation for sets also allows set operations to be implemented efficiently. The union and intersection of two sets can be implemented by logical or and logical and, respectively, basic operations in most systems-oriented programming languages. The difference A - B of sets A and B can be implemented by taking the complement of B and then using logical and to compute A ∧ ¬B.
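A sketch of the block-level iteration with Python integers as the bit vectors (bit i set means definition i is in the set); the block encoding is an assumption of the sketch:

def reaching_definitions(blocks, gen, kill, pred):
    IN  = {b: 0 for b in blocks}
    OUT = {b: gen[b] for b in blocks}      # in[B] starts empty
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # union over predecessors: logical or
            IN[b] = 0
            for p in pred[b]:
                IN[b] |= OUT[p]
            # out[B] = gen[B] U (in[B] - kill[B]); difference is and-not
            new = gen[b] | (IN[b] & ~kill[b])
            if new != OUT[b]:
                OUT[b], changed = new, True
    return IN, OUT

# two blocks; d1 (bit 0) defined in B1 is killed by d2 (bit 1) in B2
blocks = ["B1", "B2"]
gen  = {"B1": 0b01, "B2": 0b10}
kill = {"B1": 0b10, "B2": 0b01}
pred = {"B1": [], "B2": ["B1"]}
print(reaching_definitions(blocks, gen, kill, pred))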
Local reaching definitions:
Space for data-flow information can be traded for time by saving information only at certain points and, as needed, recomputing information at intervening points. Basic blocks are usually treated as a unit during global flow analysis, with attention restricted to only those points that are the beginnings of blocks. Since there are usually many more points than blocks, restricting our effort to blocks is a significant saving. When needed, the reaching definitions for all points in a block can be calculated from the reaching definitions for the beginning of the block.
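A sketch of this recomputation (the statement representation and names are illustrative assumptions): walking the block's statements forward, each definition kills the other definitions of its variable and generates itself:

    def reaching_within_block(stmts, block_in, defs_of):
        # stmts: ordered list of (definition_id, defined_variable) pairs.
        # block_in: definitions reaching the beginning of the block.
        # defs_of: maps each variable to the set of all its definition ids.
        points = [set(block_in)]
        cur = set(block_in)
        for def_id, var in stmts:
            cur = (cur - defs_of[var]) | {def_id}   # kill, then gen
            points.append(set(cur))
        return points                               # one set per point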
Use-definition chains:
It is often convenient to store reaching-definitions information as use-definition chains, or ud-chains: for each use of a variable, a ud-chain lists all the definitions of that variable that reach the use. Ud-chains are computed directly from the reaching-definitions information and are used by the optimizations below.
Evaluation order:
The techniques for conserving space during attribute evaluation also apply to the computation of data-flow information using specifications. Specifically, the only constraint on the evaluation order for the gen, kill, in and out sets for statements is that imposed by the dependencies between these sets. Having chosen an evaluation order, we are free to release the space for a set after all uses of it have occurred. Earlier, circular dependencies between attributes were not allowed, but we have seen that data-flow equations may have circular dependencies.
General control flow:
Data-flow analysis must take all control paths into account. If the control paths are evident from the syntax, then data-flow equations can be set up and solved in a syntax-directed manner. When programs can contain goto statements, or even the more disciplined break and continue statements, the approach we have taken must be modified to take the actual control paths into account. Several approaches may be taken. The iterative method works for arbitrary flow graphs. Since the flow graphs obtained in the presence of break and continue statements are reducible, such constructs can be handled systematically using interval-based methods. However, the syntax-directed approach need not be abandoned when break and continue statements are allowed.
The available expressions data-flow problem discussed in the last section allows us to
determine if an expression at point p in a flow graph is a common sub-expression. The following
algorithm formalizes the intuitive ideas presented for eliminating common sub-expressions.
ALGORITHM: Global common subexpression elimination.
Consider the two sequences

    a := x + y        c := x + y
    b := a * z        d := c * z

Because this simple approach to common subexpressions considers only the literal expressions themselves, rather than the values computed by expressions, it recognizes x + y as common to both sequences, but does not recognize that a * z and c * z compute the same value.
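The limitation is easy to see in a sketch of purely textual matching over three-address tuples (illustrative code, not the algorithm's actual data structures); two expressions are equated only when operator and operand names coincide:

    def textual_cse(stmts):
        seen = {}                          # (op, arg1, arg2) -> variable holding it
        for dst, op, a1, a2 in stmts:
            if (op, a1, a2) in seen:
                print(dst, ":=", seen[(op, a1, a2)])   # reuse earlier value
            else:
                seen[(op, a1, a2)] = dst

    textual_cse([("a", "+", "x", "y"),
                 ("b", "*", "a", "z"),
                 ("c", "+", "x", "y"),     # matches a := x + y textually
                 ("d", "*", "c", "z")])    # missed: a*z and c*z differ by name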
Copy propagation:
Various optimization algorithms introduce copy statements such as x := y. Copies may also be generated directly by the intermediate code generator, although most of these involve temporaries local to one block and can be removed by the dag construction. Given a copy s: x := y, we may substitute y for x in all the places where x is used, provided the following conditions are met at every such use u of x:
1. Statement s must be the only definition of x reaching u.
2. On every path from s to u, including paths that go through u several times, there are no assignments to y.
Condition (1) can be checked using ud-chaining information. For condition (2), we set up a new data-flow analysis problem in which in[B] is the set of copies s: x := y such that every path from the initial node to the beginning of B contains the statement s, and subsequent to the last occurrence of s there are no assignments to y. With this information, the algorithm proceeds as follows (a sketch in code appears after this list):
1. Determine those uses of x that are reached by this definition of x, namely s: x := y.
2. Determine whether for every use of x found in (1), s is in c_in[B], where B is the block of this particular use, and moreover, no definitions of x or y occur prior to this use of x within B. Recall that if s is in c_in[B] then s is the only definition of x that reaches B.
3. If s meets the conditions of (2), then remove s and replace all uses of x found in (1) by y.
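A sketch of the tests in steps (1) and (2) (every name here, ud, c_in, block_of, defined_before_use, is a hypothetical helper standing in for information the analyses above provide):

    def can_propagate(s, x_uses, ud, c_in, block_of, defined_before_use):
        # s is the copy x := y; x_uses are the uses of x reached by s.
        for u in x_uses:
            if ud[u] != {s}:             # (1) s must be u's only reaching def
                return False
            B = block_of[u]
            if s not in c_in[B]:         # (2) s reaches B along every path
                return False
            if defined_before_use(u, "x") or defined_before_use(u, "y"):
                return False             # x or y redefined earlier in B
        return True                      # step (3) may now replace uses of x by y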
Ud-chains can be used to detect those computations in a loop that are loop-invariant, that is, whose value does not change as long as control stays within the loop. A loop is a region consisting of a set of blocks with a header that dominates all the other blocks, so the only way to enter the loop is through the header.
If an assignment x := y + z is at a position in the loop where all possible definitions of y and z are outside the loop, then y + z is loop-invariant because its value will be the same each time x := y + z is encountered. Having recognized that the value of x will not change, consider v := x + w, where w could only have been defined outside the loop; then x + w is also loop-invariant.
INPUT: A loop L consisting of a set of basic blocks, each block containing a sequence of three-address statements. We assume ud-chains are available for the individual statements.
OUTPUT: The set of three-address statements that compute the same value each time executed, from the time control enters the loop L until control next leaves L.
METHOD: We shall give a rather informal specification of the algorithm (sketched in code after this list), trusting that the principles will be clear.
1. Mark "invariant" those statements whose operands are all either constant or have all their reaching definitions outside L.
2. Repeat step (3) until at some repetition no new statements are marked "invariant".
3. Mark "invariant" all those statements not previously so marked all of whose operands either are constant, have all their reaching definitions outside L, or have exactly one reaching definition, and that definition is a statement in L marked invariant.
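A sketch of steps (1)-(3) in Python (reach, is_constant and operands are assumed helpers built from the ud-chains; the rest is illustrative):

    def mark_invariant(loop_stmts, reach, is_constant, operands):
        invariant, changed = set(), True
        while changed:                   # step (2): iterate step (3) to a fixpoint
            changed = False
            for s in loop_stmts:
                if s in invariant:
                    continue
                def ok(v):
                    defs = reach(s, v)   # definitions of v that reach s
                    return (is_constant(v)
                            or all(d not in loop_stmts for d in defs)
                            or (len(defs) == 1
                                and next(iter(defs)) in invariant))
                if all(ok(v) for v in operands(s)):   # steps (1) and (3)
                    invariant.add(s)
                    changed = True
        return invariant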
Performing code motion:
Having found the invariant statements within a loop, we can apply to some of them an optimization known as code motion, in which the statements are moved to the pre-header of the loop. The following three conditions ensure that code motion does not change what the program computes. Consider s: x := y + z.
1. The block containing s dominates all exit nodes of the loop, where an exit of a loop is a node with a successor not in the loop.
2. There is no other statement in the loop that assigns to x. Again, if x is a temporary assigned only once, this condition is surely satisfied and need not be checked.
3. No use of x in the loop is reached by any definition of x other than s. This condition too will normally be satisfied if x is a temporary.
METHOD:
1. Use the loop-invariant computation algorithm to find the loop-invariant statements.
2. For each statement s: x := y + z found in step (1), check:
   i) that the block containing s dominates all exits of L;
   ii) that x is not defined elsewhere in L; and
   iii) that all uses in L of x can only be reached by the definition of x in statement s.
3. Move each statement s found in (1) and meeting conditions (2i), (2ii) and (2iii) to the pre-header, provided any operands of s that are defined in L have previously had their definition statements moved to the pre-header.
To understand why no change to what the program computes can occur, note that conditions (2i) and (2ii) of this algorithm assure that the value of x computed at s must be the value of x after any exit block of L. When we move s to the pre-header, s will still be the definition of x that reaches the end of any exit block of L. Condition (2iii) assures that any uses of x within L did, and will continue to, use the value of x computed by s.
Alternative code motion strategies:
Condition (1) can be relaxed if we are willing to take the risk that we may actually increase the running time of the program a bit; of course, we never change what the program computes. The relaxed version of code motion condition (1) is that we may move a statement s assigning x only if:
1'. The block containing s either dominates all exits of the loop, or x is not used outside the loop. For example, if x is a temporary variable, we can be sure that the value will be used only in its own block.
If the code motion algorithm is modified to use condition (1'), occasionally the running time will increase, but we can expect to do reasonably well on the average. The modified algorithm may move to the pre-header certain computations that would not be executed in the loop. Not only does this risk slowing down the program significantly, it may also cause an error in certain circumstances.
Even if none of the conditions (2i), (2ii), (2iii) of the code motion algorithm is met by an assignment x := y + z, we can still take the computation y + z outside the loop: create a new temporary t, and set t := y + z in the pre-header. Then replace x := y + z by x := t in the loop. In many cases we can then propagate out the copy statement x := t.
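For example, in the notes' own three-address notation, the transformation looks like this:

    /* before */
    L:  x := y + z
        ...
        if cond goto L

    /* after: y + z moved to the pre-header */
        t := y + z
    L:  x := t
        ...
        if cond goto L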
METHOD:
Consider each basic induction variable i whose only uses are to compute other induction variables in its family and in conditional branches. Take some j in i's family, preferably one such that c and d in its triple are as simple as possible, and modify each test that i appears in to use j instead. We assume in the following that c is positive. A test of the form 'if i relop x goto B', where x is not an induction variable, is replaced by

r := c*x /* r := x if c is 1. */
r := r+d /* omit if d is 0 */
if j relop r goto B

where r is a new temporary. The case 'if x relop i goto B' is handled analogously. If there are two induction variables i1 and i2 in the test 'if i1 relop i2 goto B', then we check whether both i1 and i2 can be replaced. The easy case is when we have j1 with triple (i1, c1, d1) and j2 with triple (i2, c2, d2), and c1 = c2 and d1 = d2. Then i1 relop i2 is equivalent to j1 relop j2.
Now, consider each induction variable j for which a statement j := s was introduced. First check that there can be no assignment to s between the introduced statement j := s and any use of j. In the usual situation, j is used in the block in which it is defined, simplifying this check; otherwise, reaching-definitions information, plus some graph analysis, is needed to implement the check. Then replace all uses of j by uses of s and delete the statement j := s.
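As an illustration (assumed values, not an example from the text), suppose j is in i's family with triple (i, 4, 0), i.e. j = 4*i. The test 'if i < 10 goto B' is replaced by:

    r := 4*10            /* r := c*x; constant-folds to r := 40 */
    if j < r goto B      /* r := r+d omitted, since d = 0 */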
ONLINE QUESTIONS

UNIT-I

1. _______ translates assembly level language into an equivalent machine level language. (a) Compiler (b) Assembler (c) Loader (d) Preprocessor. Ans: Assembler.
2. _______ translates high level language into an equivalent low level language. (a) Compiler (b) Assembler (c) Loader (d) Preprocessor. Ans: Compiler.
3. File inclusion is performed by _______. (a) Compiler (b) Assembler (c) Loader (d) Preprocessor. Ans: Preprocessor.
4. _______ performs type checking. (a) Lexical analysis (b) Semantic analysis (c) Linear analysis (d) Syntax analysis. Ans: Semantic analysis.
5. A grouping of characters is called a _______. (a) String (b) Stream (c) Token (d) Record. Ans: Token.
6. ______ groups tokens into grammatical phrases. (a) Parser (b) Scanner (c) Analyzer (d) Processor. Ans: Parser.
7. An example for a token is (a) Syntax (b) Character (c) Symbol (d) Keyword. Ans: Keyword.
8. The idempotent law in regular expressions is (a) r*** = r* (b) r** = r*** (c) r** = r* (d) r*** = r**. Ans: r** = r*.
9. The _______ breaks up the source program into pieces and creates an intermediate code representation. (a) Linear phase (b) Analysis phase (c) Syntax phase (d) Synthesis phase. Ans: Analysis phase.
10. The _______ constructs the target program from the intermediate code representation. (a) Linear phase (b) Analysis phase (c) Syntax phase (d) Synthesis phase. Ans: Synthesis phase.
11. Grouping of tokens into syntactic structure is performed by the ______. (a) Linear analysis (b) Parser (c) Scanner (d) Code optimization. Ans: Parser.
12. ______ transforms the parse tree into an intermediate language representation. (a) Three address code (b) Code generation (c) Intermediate code generation (d) Postfix notation. Ans: Intermediate code generation.
13. ______ converts intermediate code into a low level language. (a) Intermediate code generation (b) Code generation (c) Assembler (d) Loader. Ans: Code generation.
14. _______ is the input of structure editors. (a) Sequence of commands (b) Sequence of characters (c) Sequence of tokens (d) String. Ans: Sequence of commands.
15. Pretty printers perform (a) Printing only (b) Analyzing and printing (c) Debugging and printing (d) Debugging only. Ans: Analyzing and printing.
16. An _______ translates high level language into an equivalent low level language. (a) Interpreter (b) Assembler (c) Loader (d) Preprocessor. Ans: Interpreter.
17. _______ is the input of text formatters. (a) Sequence of commands (b) Stream of characters (c) Stream of tokens (d) Lexeme. Ans: Stream of characters.
18. Query interpreters translate a predicate containing _____. (a) Relational and Boolean operators (b) Relational operators only (c) Boolean operators only (d) Arithmetic, Relational and Boolean operators. Ans: Relational and Boolean operators.
19. A loader performs _____. (a) Loading the program into cache memory (b) Loading the program into main memory (c) Loading the program into secondary memory (d) Linking. Ans: Loading the program into main memory.
20. Parser generators produce ____. (a) Scanners (b) Lexical analyzers (c) Syntax analyzers (d) Little languages. Ans: Syntax analyzers.
21. ____ use scanner generators. (a) Semantic analyzers (b) Lexical analyzers (c) Syntax analyzers (d) Little languages. Ans: Lexical analyzers.
22. The intermediate code generation phase uses the ________ tool. (a) Syntax direction engine (b) Syntax directed translation engine (c) Syntax translation scheme (d) Syntax directed scheme. Ans: Syntax directed translation engine.
23. The code optimization phase uses the ________ tool. (a) Data flow engine (b) Automatic code generator (c) Code generator (d) Code optimizer. Ans: Data flow engine.
24. The code generation phase uses the ________ tool. (a) Data flow engine (b) Automatic code generator (c) Code generator (d) Code optimizer. Ans: Automatic code generator.
25. A lexeme is a ________. (a) Sequence of characters (b) Sequence of commands (c) Set of strings (d) Pattern. Ans: Sequence of characters.
26. A relational operator is a ________. (a) Lexeme (b) Pattern (c) Token (d) Character. Ans: Token.
27. Deleting an extraneous character is an action of the ______ phase. (a) Lexical (b) Syntax (c) Semantic (d) Synthesis. Ans: Lexical.
28. Transposing two adjacent characters is an action of the ______ phase. (a) Semantic (b) Syntax (c) Lexical (d) Synthesis. Ans: Lexical.
29. _________ is an error recovery action in lexical analysis. (a) Inserting a missing character (b) Function call return (c) Semicolon missing (d) Misspelled keyword. Ans: Inserting a missing character.
30. A sentinel is an _________. (a) Foe (b) Foef (c) Feof (d) Eof. Ans: Eof.
31. ____ is one way of implementing a lexical analyzer. (a) Using Lex (b) Using Lexeme (c) Using Yacc (d) Using the operating system. Ans: Using Lex.
32. The pointer used in the buffer pair scheme is (a) Backward (b) Forward (c) Lexeme-End (d) Lexeme-Start. Ans: Forward.
33. _______ is an example of computer alphabets. (a) ASC (b) EBDCIC (c) ASCI (d) EBCDIC. Ans: EBCDIC.
34. A finite sequence of symbols is called a _______. (a) String (b) Character (c) Sequence (d) Group. Ans: String.
35. Any set of strings over some fixed alphabet is a _____. (a) Abstract (b) Alphabets (c) Language (d) Sequence. Ans: Language.
36. The set of letters and digits is represented by (a) LD (b) L U D (c) (L U D)* (d) L*. Ans: L U D.
37. The set of strings including the empty string is represented as the language (a) L+ (b) L* (c) D+ (d) D*. Ans: L*.
38. _____ is a representation of one or more digits. (a) L+ (b) L* (c) D+ (d) D*. Ans: D+.
39. If X = class and Y = room, then XY is (a) Class (b) Room (c) Class Room (d) Classroom. Ans: Classroom.
40. A prefix of banana is (a) ban (b) ana (c) na (d) banana. Ans: ban.
41. _____ is a subsequence of banana. (a) can (b) baaa (c) nand (d) nanan. Ans: baaa.
42. A suffix of banana is (a) ban (b) baaa (c) nana (d) banana. Ans: nana.
43. A substring of banana is (a) nan (b) baaa (c) aaa (d) bnn. Ans: nan.
44. The definition of L U M is (a) {s | s is in L or s is in M} (b) {s | s is in L and s is in M} (c) {s | s is in L nor s is in M} (d) {s | s is in L nand s is in M}. Ans: {s | s is in L or s is in M}.
45. _____ is a notation for a regular expression. (a) letter(letterdigit)+ (b) digit(letterdigit)+ (c) digit(letter | digit)* (d) letter(letter | digit)*. Ans: letter(letter | digit)*.
46. The definition of LM is (a) {st | s is in L or s is in M} (b) {st | s is in L and t is in M} (c) {st | s is in L or t is in M} (d) {st | s is in L and s is in M}. Ans: {st | s is in L and t is in M}.
47. L* is a representation of the _____. (a) Negative closure (b) Positive closure (c) Kleene closure (d) Line closure. Ans: Kleene closure.
48. The positive closure of L is written as (a) L+ (b) L* (c) D+ (d) D*. Ans: L+.
49. (r) | (s) is a regular expression denoting _____. (a) L(r) | L(s) (b) L(r)L(s) (c) L(r)* L(s) (d) L(r) U L(s). Ans: L(r) U L(s).
50. (r)(s) is a regular expression denoting _____. (a) L(r) | L(s) (b) L(r)L(s) (c) L(r)* L(s) (d) L(r) U L(s). Ans: L(r)L(s).
51. (r)* is a regular expression denoting _____. (a) (L(r))* (b) Lr* (c) L(r) (d) L*(r). Ans: (L(r))*.
52. The regular expression a* denotes (a) {a} (b) {ε, a} (c) {a, aa, aaa, ...} (d) {ε, a, aa, aaa, ...}. Ans: {ε, a, aa, aaa, ...}.
53. An identifier is represented in character classes as (a) [A-Z][A-Z0-9]* (b) [A-Za-z][A-Za-z0-9]* (c) [A-Za-z][A-Za-z]* (d) [A-Z][A-Za-z0-9]*. Ans: [A-Za-z][A-Za-z0-9]*.
54. _______ cannot be described by a regular expression. (a) {wcw | w is a string of a's and b's} (b) {w | w is a string of a's and b's} (c) {w* | w is a string of a's and b's} (d) {w+ | w is a string of a's and b's}. Ans: {wcw | w is a string of a's and b's}.
55. _______ is the associative property of regular expressions. (a) r(s|t) = (r|s)t (b) r|(s|t) = (r|s)t (c) r|(s|t) = (r|s)|t (d) r|(s|t) = (s|t)r = t(r|s). Ans: r|(s|t) = (r|s)|t.
56. A replacement according to a production is called a _______. (a) Reduction (b) Production (c) Derivation (d) Parse tree. Ans: Derivation.
UNIT-II

1. The process of replacing a string by a non-terminal according to a production is a _______. (a) Reduction (b) Production (c) Derivation (d) Parse tree. Ans: Reduction.
2. Which of the following is not a true statement for a derivation tree? (a) all the leaf nodes are terminals (b) the root node is the start symbol (c) an interior node is a non-terminal (d) an interior node is a terminal. Ans: an interior node is a terminal.
3. The method of converting a regular expression to a recognizer is _______. (a) Lexical analyzer (b) Finite automata (c) Lex (d) Yacc. Ans: Finite automata.
4. The demerit of using a transition table is that it _______. (a) is tough to implement (b) is slower (c) takes up less space (d) takes up a lot of space. Ans: takes up a lot of space.
5. In which finite automaton does no state have an ε-transition? (a) NFA (b) DFA (c) NDFA (d) DNFA. Ans: DFA.
6. A _______ has at most one edge labeled "a" leaving state S. (a) NFA (b) DFA (c) NDFA (d) DNFA. Ans: DFA.
7. The tool used to design the lexical analyzer is _______. (a) Yacc (b) Lexeme (c) Lex (d) Yaccer. Ans: Lex.
8. The _______ language is used to create a Lex program. (a) C (b) Lex (c) Yacc (d) Linux. Ans: Lex.
9. _______ is the C program produced by the Lex compiler. (a) lex.yy.c (b) lex.c (c) a.out (d) tokens. Ans: lex.yy.c.
10. _______ is the output produced by the C compiler in Lex. (a) lex.yy.c (b) lex.c (c) a.out (d) tokens. Ans: a.out.
11. _______ is the input taken by the C compiler in Lex. (a) lex.yy.c (b) lex.c (c) a.out (d) tokens. Ans: lex.yy.c.
12. ______ is one of the fields in the Lex specification. (a) Transition rules (b) Translation rules (c) Definition (d) main function. Ans: Translation rules.
13. The elimination of left recursion for A → Aα | β is _______. (a) A → A′α, A′ → βα | ε (b) A → βA′, A′ → αA′ | ε (c) A → Aα, A′ → A′α | ε (d) A → Aα | β, A′ → βα | ε. Ans: A → βA′, A′ → αA′ | ε.
14. The elimination of left recursion for E → E + T | T is _____. (a) E → TE′, E′ → +TE′ | ε (b) E → T′E, E′ → +E | ε (c) E → T′E′, E′ → +T′E | ε (d) E → TE, E′ → +T′E′ | ε. Ans: E → TE′, E′ → +TE′ | ε.
15. The elimination of left recursion for T → T * F | F is _____. (a) T → FT, T′ → *F′T′ | ε (b) T → F′T, T′ → *E | ε (c) T → F′T′, T′ → *F′T | ε (d) T → FT′, T′ → *FT′ | ε. Ans: T → FT′, T′ → *FT′ | ε.
16. The process of factoring out common prefixes is ____. (a) Left factoring (b) Ambiguous (c) Left recursion (d) Left refactoring. Ans: Left factoring.
17. The left factoring of A → αβ1 | αβ2 is _______. (a) A → αβA′, A′ → β1 | β2 (b) A → α′A, A′ → αβ2 | β1 (c) A → αA′, A′ → β1 | β2 (d) A → α′A′, A′ → αβ1 | αβ2. Ans: A → αA′, A′ → β1 | β2.
18. A left recursive grammar can cause a top-down parser to go into ________. (a) error elimination (b) a condition (c) an infinite loop (d) a finite loop. Ans: an infinite loop.
19. The difficulty of top-down parsing is ________. (a) Forward loop (b) For tracking (c) Backward loop (d) Backtracking. Ans: Backtracking.
20. ________ is a top-down parsing technique. (a) Recursive descent (b) Recursion descent (c) Predicate logic (d) Periodic parsing. Ans: Recursive descent.
21. The demerit of recursive descent parsing is that it is ________. (a) Backtracking (b) Left factoring (c) Recursive (d) Recursion. Ans: Recursive.
22. Predictive parsing consists of (a) input, parsing program, parsing table, output (b) input, stack, parsing table, output (c) input, parsing program, stack, output (d) input, parsing program, stack, parsing table. Ans: input, stack, parsing table, output.
23. In the predictive parsing program, let X be the top stack symbol and a the next input symbol; if X is a terminal and X = a, then _________. (a) push a onto the stack (b) push X onto the stack (c) pop a from the stack (d) pop X from the stack. Ans: pop X from the stack.
24. In the predictive parsing program, let X be the top stack symbol and a the next input symbol; if X is a non-terminal and M[X, a] = X → Y1 Y2 ... Yk, then _________. (a) push X onto the stack, pop Yk, Yk-1, ..., Y1 from the stack (b) push X onto the stack, pop Y1, Y2, ..., Yk from the stack (c) pop X from the stack, push Yk, Yk-1, ..., Y1 onto the stack (d) pop X from the stack, push Y1, Y2, ..., Yk onto the stack. Ans: pop X from the stack, push Yk, Yk-1, ..., Y1 onto the stack.
25. In predictive parsing, if X is a terminal, then FIRST(X) is (a) {ε} (b) {X} (c) terminal (d) nonterminal. Ans: {X}.
26. The two functions used in predictive parsing are _______. (a) FIRST, LAST (b) FIRST, FOLLOW (c) FOLLOW, LAST (d) FIRST, PREDICT. Ans: FIRST, FOLLOW.
27. In predictive parsing, if X → ε, then FIRST(X) is (a) {ε} (b) {X} (c) terminal (d) nonterminal. Ans: {ε}.
28. In predictive parsing, for A → αBβ, everything in ____ is in ____. (a) FIRST(α), FOLLOW(B) (b) FIRST(β), FOLLOW(β) (c) FIRST(β), FOLLOW(B) (d) FIRST(α), FOLLOW(β). Ans: FIRST(β), FOLLOW(B).
29. In predictive parsing, for A → αBβ where FIRST(β) contains ε, everything in ______ is in ____. (a) FOLLOW(A), FOLLOW(B) (b) FOLLOW(α), FOLLOW(B) (c) FOLLOW(α), FOLLOW(A) (d) FOLLOW(A), FOLLOW(α). Ans: FOLLOW(A), FOLLOW(B).
30. ____ is included in the FOLLOW function set of predictive parsing. (a) % (b) # (c) $ (d) @. Ans: $.
31. If a grammar is S → (L) | a, then FIRST(S) is ____. (a) {(, a} (b) {ε} (c) {L, a} (d) {$}. Ans: {(, a}.
32. The elimination of left recursion for L → L, S | S is _____. (a) L → S,L′, L′ → LL′ | ε (b) L → S′,L, L′ → S,L | ε (c) L → SL′, L′ → ,SL′ | ε (d) L → S′,L′, L′ → LL | ε. Ans: L → SL′, L′ → ,SL′ | ε.
33. An LL(1) grammar has the property of _____. (a) Ambiguity (b) No ambiguity (c) No recursion (d) No backtracking. Ans: No ambiguity.
34. An LL(1) grammar has the _____ property. (a) Ambiguous (b) Left recursive (c) No recursion (d) Not left recursive. Ans: Not left recursive.
35. In LL(1), the first L stands for _____. (a) left to right scanning (b) left most derivation (c) left to right derivation (d) left subtree. Ans: left to right scanning.
36. In LL(1), the second L stands for _____. (a) left to right scanning (b) left most derivation (c) left to right derivation (d) left subtree. Ans: left most derivation.
37. In LL(1), the 1 stands for _____. (a) one output symbol (b) one time scanning (c) one sub tree (d) one input symbol. Ans: one input symbol.
38. In predictive parsing, if X = a = $, then (a) report an error (b) successful completion (c) advance the pointer (d) pop X off the stack. Ans: successful completion.
39. In predictive parsing, if X = a ≠ $, then (a) report an error (b) successful completion (c) advance the pointer and pop X off the stack (d) pop X off the stack and the parser halts. Ans: advance the pointer and pop X off the stack.
40. The output of the Lex compiler is a _____. (a) Transition table (b) Transition diagram (c) Lex specification (d) action. Ans: Transition table.
41. A recognizer is a _____. (a) tool (b) program (c) string (d) grammar. Ans: program.
42. The representation of an NFA as a directed graph is called a _______. (a) NFA (b) NFA direction graph (c) Transition graph (d) Transition edge. Ans: Transition graph.
43. The _______ is the easiest implementation of an NFA in a computer. (a) Transition diagram (b) Transition table (c) Transition graph (d) Finite automata. Ans: Transition table.
44. The _______ is the input for syntax analysis. (a) token (b) source program (c) parse tree (d) syntax. Ans: token.
45. The _______ is the output of the parser. (a) expression (b) token (c) parse tree (d) intermediate representation. Ans: parse tree.
UNIT-III

1. In which error recovery strategy does the parser discard one symbol at a time? (a) Panic mode (b) Phrase level (c) Error productions (d) Global corrections. Ans: Panic mode.
2. In which error recovery strategy does the parser make local corrections? (a) Panic mode (b) Phrase level (c) Error productions (d) Global corrections. Ans: Phrase level.
3. In which error recovery strategy does the parser generate error diagnostics? (a) Panic mode (b) Phrase level (c) Error productions (d) Global corrections. Ans: Error productions.
4. In which error recovery strategy does the parser use an algorithmic approach? (a) Panic mode (b) Phrase level (c) Error productions (d) Global corrections. Ans: Global corrections.
5. The syntactic structure of a programming language is described by a (a) Context free grammar (b) CNF (c) CCNF (d) Regular language. Ans: Context free grammar.
6. A context free grammar consists of ______. (a) T, NT, $, P (b) T, NT, S, Production (c) Terminal, token, Non-terminal (d) Terminal, Token, production. Ans: T, NT, S, Production.
7. Punctuation symbols are _____. (a) Production (b) Non-terminal (c) Terminal (d) String. Ans: Terminal.
8. A sequence of replacements is called a ________. (a) Reduction (b) Derivation (c) Parse tree (d) Sentence. Ans: Derivation.
9. The graphical representation of a derivation is a ________. (a) parse tree (b) syntactic tree (c) syntax graph (d) pattern. Ans: parse tree.
10. A substring that matches the right side of a production is called a ________. (a) Handle pruning (b) Pattern (c) Handle (d) Parsing. Ans: Handle.
11. A rightmost derivation in reverse is called ______. (a) Handle pruning (b) Pattern (c) Handle (d) Parsing. Ans: Handle pruning.
12. In the ______ action the parser discovers a syntax error. (a) accept (b) reduce (c) shift (d) error. Ans: error.
13. In the ______ action the parser replaces the handle with the non-terminal. (a) accept (b) reduce (c) shift (d) error. Ans: reduce.
14. The precedence relation a < b means _____. (a) a has different precedence to b (b) a has the same precedence as b (c) a takes precedence over b (d) a yields precedence to b. Ans: a yields precedence to b.
15. The precedence relation a > b means _____. (a) a has different precedence to b (b) a has the same precedence as b (c) a takes precedence over b (d) a yields precedence to b. Ans: a takes precedence over b.
16. The precedence relation a = b means _____. (a) a has different precedence to b (b) a has the same precedence as b (c) a takes precedence over b (d) a yields precedence to b. Ans: a has the same precedence as b.
17. In the precedence relation, * and + have the precedence _____. (a) * > + (b) + > * (c) + = + (d) * = *. Ans: * > +.
18. In the precedence relation, $ and id have the precedence _____. (a) $ > id (b) $ = id (c) $ < id (d) $ ≠ id. Ans: $ < id.
19. In operator precedence parsing, if a < b then (a) pop a from the stack (b) push a onto the stack (c) pop b from the stack (d) push b onto the stack. Ans: push b onto the stack.
20. In operator precedence parsing, if a = b then (a) pop a from the stack (b) push a onto the stack (c) pop b from the stack (d) push b onto the stack. Ans: push b onto the stack.
21. In operator precedence parsing, if a > b then (a) pop the stack (b) push the stack (c) pop b from the stack (d) push a onto the stack. Ans: pop the stack.
22. In LR(k) parsing, the L stands for _____. (a) left to right scanning (b) left most derivation (c) left to right derivation (d) left subtree. Ans: left to right scanning.
23. The _____ LR technique is the easiest to implement. (a) SLR (b) canonical LR (c) LALR (d) Look ahead LR. Ans: SLR.
24. The _____ LR technique is the most powerful. (a) SLR (b) canonical LR (c) LALR (d) Look ahead LR. Ans: canonical LR.
25. The _____ LR technique is the most expensive. (a) SLR (b) canonical LR (c) LALR (d) Look ahead LR. Ans: canonical LR.
26. The _____ LR technique is the least powerful. (a) SLR (b) canonical LR (c) LALR (d) Look ahead LR. Ans: SLR.
27. S′ → .S is included in the _____ items. (a) closure (b) non-kernel (c) kernel (d) non-closure. Ans: kernel.
28. The functions performed in LR parsing are _____. (a) action and shift (b) action and goto (c) action and error (d) goto and shift. Ans: action and goto.
29. The action function is involved with _____. (a) Terminals (b) Non-terminals (c) the start symbol (d) a production. Ans: Terminals.
30. LALR is abbreviated from _____. (a) Left and Right LR (b) Look ahead Simple LR (c) Look ahead LR (d) Left to Right simple LR. Ans: Look ahead LR.
31. A _______ is a combination of terminals and non-terminals. (a) production (b) token (c) regular expression (d) regular definition. Ans: production.
32. The _______ translates the intermediate representation into an equivalent low level language. (a) Analyzer (b) Front end (c) Back end (d) Synthesizer. Ans: Back end.
33. ______ is a linearized representation of a syntax tree. (a) Postfix notation (b) Parse tree (c) Two address code (d) Prefix notation. Ans: Postfix notation.
UNIT-IV

1. The semantic rule to produce a syntax tree for the production E → id is ______. (a) E.nptr = mkleaf(id, id.place) (b) E.nptr = mknode(id, id.place) (c) E.nptr = mkleaf(id) (d) E.nptr = mknode(id). Ans: E.nptr = mkleaf(id, id.place).
2. The semantic rule to produce a syntax tree for the production E → -E1 is ______. (a) E.nptr = mknode('uminus', E1.nptr) (b) E.nptr = mkunode('uminus', E1.nptr) (c) E.nptr = mkleaf('uminus', E1.nptr) (d) E.nptr = mkuleaf('uminus', E1.nptr). Ans: E.nptr = mkunode('uminus', E1.nptr).
3. The semantic rule to produce a syntax tree for the production E → E1 + E2 is ______. (a) E.nptr = mknode('+', E1.nptr) (b) E.nptr = mkpnode('+', E1.nptr) (c) E.nptr = mkpnode('+', E1.nptr, E2.nptr) (d) E.nptr = mknode('+', E1.nptr, E2.nptr). Ans: E.nptr = mknode('+', E1.nptr, E2.nptr).
4. _______ is the general form of a three-address code statement. (a) x = y op z (b) x = z op (c) op x = op y (d) op x = z op z. Ans: x = y op z.
5. _______ is one type of three-address code statement. (a) if x y goto L (b) if x relop y goto L (c) if x goto L (d) if x y relop y. Ans: if x relop y goto L.
6. The sequence of three-address statements evaluating E is called ______. (a) E.place (b) E.code (c) place.E (d) code.E. Ans: E.code.
7. A record structure with four fields is called ______. (a) Three address code (b) Quadruples (c) Triples (d) Indirect triples. Ans: Quadruples.
8. A record structure with three fields is called ______. (a) Three address code (b) Quadruples (c) Triples (d) Indirect triples. Ans: Triples.
9. emit is used to ______. (a) emit three-address statements to an output file (b) emit assignments to an output file (c) emit terminals to an output file (d) emit non-terminals to an output file. Ans: emit three-address statements to an output file.
10. The translation of E → (E1) is _______. (a) E1.place = E.place (b) E.place = E1.place (c) E.place = (E1.place) (d) (E1.place) = E.place. Ans: E.place = E1.place.
11. The translation of E → id is _______. (a) E1.place = E.place (b) E.place = E1.place (c) E.place = id.place (d) E1.place = id.place. Ans: E.place = id.place.
12. _______ are the three fields of triples. (a) arg1, arg2, arg3 (b) arg1, arg2, result (c) arg1, op, result (d) arg1, arg2, op. Ans: arg1, arg2, op.
13. _______ are the four fields of quadruples. (a) arg1, arg2, arg3, arg4 (b) arg1, arg2, arg3, result (c) arg1, arg2, op, result (d) arg1, arg2, arg3, op. Ans: arg1, arg2, op, result.
14. Listing pointers to triples is called _____. (a) Indirect triples (b) Triples (c) Indirect quadruples (d) Quadruples. Ans: Indirect triples.
15. offset represents a _____. (a) relative address (b) three address (c) location (d) address. Ans: relative address.
16. If E1.type = integer and E2.type = integer, then E.type is ____. (a) integer (b) real (c) inttoreal (d) float. Ans: integer.
17. If E1.type = integer and E2.type = real, then E.type is ____. (a) integer (b) real (c) inttoreal (d) float. Ans: real.
18. If E1.type = real and E2.type = real, then E.type is ____. (a) integer (b) real (c) inttoreal (d) float. Ans: real.
19. If E1.type = real and E2.type = integer, then E.type is ____. (a) integer (b) real (c) inttoreal (d) float. Ans: real.
20. E → true is represented by ________. (a) E.code = 0 (b) E.place = 0 (c) E.place = 1 (d) E.code = 1. Ans: E.place = 1.
21. E → false is represented by ________. (a) E.code = 0 (b) E.place = 0 (c) E.place = 1 (d) E.code = 1. Ans: E.place = 0.
22. E → not E1 is represented by ____. (a) E1.place = not E2.place (b) E.code = not E1.code (c) E.place = not E1.place (d) E1.place = not E1.place. Ans: E.place = not E1.place.
23. _________ is one semantic rule of S → if E then S1. (a) E.false = newlabel (b) E.true = newlabel (c) S.next = S1.next (d) S.code = E.place. Ans: E.true = newlabel.
24. _________ is one of the semantic rules of S → if E then S1. (a) E.false = newlabel (b) S1.next = S.next (c) S.next = S1.next (d) S.code = E.place. Ans: S1.next = S.next.
25. _________ is one of the semantic rules of S → while E do S1. (a) E.false = newlabel (b) S1.next = S.next (c) S.next = S1.next (d) E.false = S.next. Ans: E.false = S.next.
26. _________ is one of the semantic rules of S → while E do S1. (a) E.false = newlabel (b) S1.next = S.next (c) E.true = newlabel (d) E.false = S1.next. Ans: E.true = newlabel.
27. _________ is one of the semantic rules of S → if E then S1 else S2. (a) E.false = S1.next (b) S1.next = S.next (c) S.next = S1.next (d) S.code = E.code. Ans: S.code = E.code.
28. _________ is one of the semantic rules of S → if E then S1 else S2. (a) E.false = E.true (b) S1.next = S.next (c) S.next = S1.next (d) S.code = E.place. Ans: S1.next = S.next.
29. _________ is one of the semantic rules of S → while E do S1. (a) E.false = S.next (b) S1.next = S2.next (c) E.true = E.false (d) E.false = S1.next. Ans: E.false = S.next.
30. _________ is one of the semantic rules of E → E1 or E2. (a) E1.true = E.true (b) E1.false = E.false (c) E2.true = E.false (d) E2.false = E1.true. Ans: E1.true = E.true.
31. _________ is one of the semantic rules of E → E1 or E2. (a) E1.true = E.false (b) E1.false = E.false (c) E2.true = E.true (d) E2.false = E1.true. Ans: E2.true = E.true.
32. _________ is one of the semantic rules of E → E1 or E2. (a) E1.true = E.false (b) E1.false = E.false (c) E2.true = E1.true (d) E2.false = E.false. Ans: E2.false = E.false.
33. _________ is one of the semantic rules of E → E1 and E2. (a) E1.true = E.false (b) E1.false = E.false (c) E2.true = E1.true (d) E2.false = E1.false. Ans: E1.false = E.false.
34. _________ is one of the semantic rules of E → E1 and E2. (a) E1.true = E.false (b) E1.false = E.true (c) E2.true = E1.true (d) E2.false = E.false. Ans: E2.false = E.false.
35. _________ is one of the semantic rules of E → E1 and E2. (a) E1.true = E.false (b) E1.false = E2.false (c) E2.true = E.true (d) E2.false = E1.false. Ans: E2.true = E.true.
36. _________ is one of the semantic rules of E → E1 and E2. (a) E1.true = newlabel (b) E1.false = E2.false (c) E2.true = E1.true (d) E2.false = E.true. Ans: E1.true = newlabel.
37. _________ is one of the semantic rules of E → not E1. (a) E.true = E1.false (b) E1.true = E.false (c) E.false = E2.true (d) E.code = E.place. Ans: E1.true = E.false.
38. _________ is one of the semantic rules of E → not E1. (a) E1.true = E.false (b) E1.true = E2.false (c) E.false = E2.true (d) E.code = E.place. Ans: E1.true = E.false.
39. _________ is one of the semantic rules of E → not E1. (a) E.truelist = E1.falselist (b) E1.truelist = E2.falselist (c) E.falselist = E2.truelist (d) E.code = E.place. Ans: E.truelist = E1.falselist.
40. _________ is one of the semantic rules of E → not E1. (a) E1.truelist = E1.falselist (b) E1.truelist = E2.falselist (c) E.falselist = E1.truelist (d) E.code = E.place. Ans: E.falselist = E1.truelist.
41. ________ is one of the semantic rules of E → (E1). (a) E.truelist = E1.truelist (b) E1.truelist = E2.falselist (c) E.falselist = E2.truelist (d) E.code = E.place. Ans: E.truelist = E1.truelist.
42. _________ is one of the semantic rules of E → (E1). (a) E2.truelist = E1.truelist (b) E1.truelist = E2.falselist (c) E.falselist = E1.falselist (d) E.code = E.place. Ans: E.falselist = E1.falselist.
43. The translation of S → begin L end is _____. (a) S.list = L.nextlist (b) S.nextlist = L.nextlist (c) L.list = S.listnext (d) L.nextlist = S.list. Ans: S.nextlist = L.nextlist.
44. The translation of Elist → Elist, E is (a) append E.code to the end of the queue (b) append E.place to the beginning of the queue (c) append E.code to the beginning of the queue (d) append E.place to the end of the queue. Ans: append E.place to the end of the queue.
45. The translation of Elist → E is (a) initialize E.code to the end of the queue (b) initialize E.place to the beginning of the queue (c) initialize E.code to the beginning of the queue (d) initialize the queue to contain only E.place. Ans: initialize the queue to contain only E.place.
46. _______ is the input of the code generation phase. (a) optimized code (b) target code (c) source program (d) optimized object code. Ans: optimized code.
47. A transformation performed only within a basic block is called _______. (a) local (b) global (c) preserve (d) optimization. Ans: local.
48. ______ is a transformation of copy statements. (a) copy propagation (b) copy transformation (c) copy for long (d) copy elimination. Ans: copy propagation.
49. The useless-code transformation is called ______. (a) usecode elimination (b) dead elimination (c) useless code elimination (d) Deadcode elimination. Ans: Deadcode elimination.
50. Using constants and deducing values at compile time is called ______. (a) dead code elimination (b) copy propagation (c) constant folding (d) constant propagation. Ans: constant folding.
UNIT-V

1. Decreasing the amount of code in an inner loop is called ______. (a) Induction (b) Reduction (c) Loop motion (d) Code motion. Ans: Code motion.
2. _______ is one way of loop optimization. (a) Induction variable elimination (b) Copy propagation (c) Deadcode elimination (d) Constant folding. Ans: Induction variable elimination.
3. _______ is a loop optimization technique. (a) Variable elimination (b) Reduction in strength (c) Deadcode elimination (d) Constant folding. Ans: Reduction in strength.
4. The expansion of DAG is _______. (a) Directed Acyclic Graph (b) Directed Action Graph (c) Direction Asymmetric Graph (d) Direction Action Graph. Ans: Directed Acyclic Graph.
5. Algebraic transformation includes ______. (a) Deduction (b) Algebraic identities (c) Constant folding (d) Reduction in strength. Ans: Algebraic identities.
6. The output of the code generation phase is _______. (a) Optimized code (b) Intermediate code (c) Machine level language (d) Token. Ans: Machine level language.
7. _______ is the input of the code generation phase. (a) Optimized code (b) Intermediate code (c) Machine level language (d) Token. Ans: Intermediate code.
8. The use of the symbol table is (a) to determine the run time addresses of the data objects (b) to determine the run time value of the data (c) to determine the compile time value of the data (d) to determine the compile time addresses of the data objects. Ans: to determine the run time addresses of the data objects.
9. _______ is a linear representation of intermediate code. (a) Prefix notation (b) Infix notation (c) Postfix notation (d) RP notation. Ans: Postfix notation.
10. _______ is a representation of three-address code. (a) Quadruples (b) Indirect quadruples (c) Postfix notation (d) Linear. Ans: Quadruples.
11. _______ is the virtual machine representation. (a) Sequence of commands (b) Stack machine code (c) Stack machine (d) Stack code. Ans: Stack machine code.
12. _____ is a graphical representation of three-address code. (a) DEG tree (b) Parsing (c) Syntax trees (d) Linear tree. Ans: Syntax trees.
13. _____ is a graphical representation of intermediate code. (a) DAG (b) Parsing (c) Semantic tree (d) Linear tree. Ans: DAG.
14. _____ is the output form of a target program. (a) Intermediate code (b) Linking library functions (c) Linking machine code (d) Absolute machine code. Ans: Absolute machine code.
15. _____ is one of the output forms of a target program. (a) Intermediate code (b) Linking library functions (c) Relocatable machine code (d) Intermediate code. Ans: Relocatable machine code.
16. Semantic checking is done in the ____. (a) Intermediate code generator (b) Lexical analyzer (c) Syntax analyzer (d) Code generator. Ans: Code generator.
17. Mapping names to addresses of data objects is done by the ________. (a) intermediate code generator (b) code generator (c) code optimizer (d) lexical analysis. Ans: code generator.
18. Deducing the number of jumping labels is done by ________. (a) Backpatching (b) Quadruple (c) Triple (d) Indirect triple. Ans: Backpatching.
19. Speed is increased based on instruction selection by using ________. (a) Assignment (b) Machine idioms (c) Structure (d) Register. Ans: Machine idioms.
20. During register allocation, (a) we select the variables that will reside in registers (b) we pick the specific register a variable resides in (c) we choose registers (d) we allocate constant pairs to registers. Ans: we select the variables that will reside in registers.
21. During register assignment, (a) we select the variables that will reside in registers (b) we pick the specific register that a variable resides in (c) we choose registers (d) we allocate constant pairs to registers. Ans: we pick the specific register that a variable resides in.
22. SRDA stands for ______. (a) Shift Right Double Arithmetic (b) Shift Round Direct Arithmetic (c) Scan Right Double Arithmetic (d) Scan Round Direct Arithmetic. Ans: Shift Right Double Arithmetic.
23. ______ is used to improve efficiency. (a) Choice of run (b) Syntax order (c) Semantic order (d) Choice of evaluation order. Ans: Choice of evaluation order.
24. ________ is the two-address instruction form. (a) op source, destination (b) source op destination (c) op destination, source (d) source, destination op. Ans: op source, destination.
25. ADD is an _________. (a) ADD to register (b) ADD destination to memory (c) ADD to memory (d) ADD source to destination. Ans: ADD source to destination.
26. SUB is a _________. (a) SUB to register (b) SUB destination to memory (c) SUB to memory (d) SUB source from destination. Ans: SUB source from destination.
27. For absolute mode the added cost is _______. (a) 1 (b) 0 (c) 2 (d) 3. Ans: 1.
28. For register mode the added cost is _______. (a) 1 (b) 0 (c) 2 (d) 3. Ans: 0.
29. For indexed mode the added cost is _______. (a) 1 (b) 0 (c) 2 (d) 3. Ans: 1.
30. For indirect indexed mode the added cost is _______. (a) 1 (b) 0 (c) 2 (d) 3. Ans: 1.
31. The form for absolute mode is _______. (a) c(R) (b) *R (c) R (d) M. Ans: M.
32. The form for register mode is _______. (a) c(R) (b) *R (c) R (d) M. Ans: R.
33. The form for indexed mode is _______. (a) c(R) (b) *R (c) R (d) M. Ans: c(R).
34. The form for indirect register mode is _______. (a) c(R) (b) *R (c) R (d) M. Ans: *R.
35. The form for indirect indexed mode is _______. (a) *c(R) (b) *R (c) R (d) M. Ans: *c(R).
36. The address of register mode is _______. (a) c(R) (b) *R (c) R (d) M. Ans: R.
37. The address of indexed mode is _______. (a) c + contents of (R) (b) R (c) contents of R (d) M. Ans: c + contents of (R).
38. The address of indirect register mode is _______. (a) c + contents of (R) (b) R (c) contents of (R) (d) M. Ans: contents of (R).
39. The address of indirect indexed mode is _______. (a) contents (c + contents of (R)) (b) R (c) contents of R (d) M. Ans: contents (c + contents of (R)).
40. The cost of MOV R0,R1 is _______. (a) 1 (b) 0 (c) 2 (d) 3. Ans: 1.
41. The cost of MOV R5,M is _______. (a) 1 (b) 0 (c) 2 (d) 3. Ans: 2.
42. The cost of ADD #1,R3 is _______. (a) 1 (b) 0 (c) 2 (d) 3. Ans: 2.
43. The cost of SUB 4(R0),*12(R1) is _______. (a) 1 (b) 0 (c) 2 (d) 3. Ans: 3.
44. The cost of MOV b,a and ADD c,a is _______. (a) 1 (b) 0 (c) 6 (d) 3. Ans: 6.
45. The cost of ADD R2,R1 and MOV R1,a is _______. (a) 1 (b) 0 (c) 2 (d) 3. Ans: 3.
46. getreg denotes _______. (a) to determine the location L (b) to determine the value (c) to determine the register (d) to determine the memory. Ans: to determine the location L.
47. The function of the register descriptor is _______. (a) to keep track of the location (b) to keep track of what is currently in each register (c) to keep track of the register (d) to keep track of the descriptor value. Ans: to keep track of what is currently in each register.
48. The function of the address descriptor is _______. (a) to keep track of the location where the current value of a name can be found (b) to keep track of what is currently in each register (c) to keep track of the register (d) to keep track of the descriptor value. Ans: to keep track of the location where the current value of a name can be found.
49. For the statement t = a - b, the value of the address descriptor for R0 is _______. (a) R0 contains t (b) u in R0 (c) t in R0 (d) contains u. Ans: t in R0.