COMPILER DESIGN
For COMPUTER SCIENCE
SYLLABUS
Lexical analysis, parsing, syntax-directed translation. Runtime environments.
Intermediate code generation.
© Copyright Reserved by Gateflix.in No part of this material should be copied or reproduced without permission
CONTENTS
1. INTRODUCTION TO COMPILERS
2. PARSING
3. SYNTAX-DIRECTED TRANSLATION
   3.1 Introduction
   3.2 Synthesized attributes
   3.3 Syntax trees
   3.4 Translation schemes
   3.5 Top-down translation
   3.6 Run time environment
   Gate Questions
4. INTERMEDIATE CODE GENERATION
   4.1 Introduction
   4.2 Three address code
   Gate Questions
1 INTRODUCTION TO COMPILERS
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct execution

Advantages:
- Modifications to the user program can easily be made and implemented as execution proceeds.
- Debugging a program and finding errors is a simpler task for an interpreted program.
- The interpreter for the language makes it machine independent.

Disadvantages:
- Execution of the program is slower.
- Memory consumption is higher.
(for example, that a variable is used consistently with its declaration, or that parameters to a function call match the function definition).

There are two parts to compilation:
1) Analysis: It breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
2) Synthesis: It constructs the desired target program from the intermediate representation.
Of the two parts, synthesis requires the most specialized techniques. During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree. For example, a syntax tree for an assignment statement is shown in Fig. 1.4.

1.2.4 Intermediate code generation:
The output of the syntax analyzer is some representation of a parse tree. The intermediate code generation phase transforms this parse tree into an intermediate-language representation of the source program.

1.2.6 Code optimization:
This is an optional phase designed to improve the intermediate code so that the output runs faster and takes less space.
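To make the intermediate code generation phase concrete, here is a minimal Python sketch; the Node class, the gen helper and the temporary names t1, t2 are invented for illustration and are not part of the text. It walks a syntax tree and emits three-address code for the assignment id1 := id2 + id3 * 60 discussed later in this chapter:

```python
import itertools

class Node:
    """Syntax-tree node: leaves carry an identifier or constant, inner nodes an operator."""
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right

_temps = itertools.count(1)

def gen(node, code):
    """Post-order walk that emits three-address statements; returns the name holding the result."""
    if node.left is None and node.right is None:
        return node.op                        # leaf: an id or a constant
    left = gen(node.left, code)
    right = gen(node.right, code)
    temp = f"t{next(_temps)}"
    code.append(f"{temp} := {left} {node.op} {right}")
    return temp

# The assignment id1 := id2 + id3 * 60:
expr = Node("+", Node("id2"), Node("*", Node("id3"), Node("60")))
code = []
code.append(f"id1 := {gen(expr, code)}")
print("\n".join(code))
```

The post-order walk guarantees that operands are computed before the operator that uses them, which is exactly the ordering property of three-address code.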
1.3.4 Error Detection and Reporting
Each phase can encounter errors. However, after detecting an error, a phase must somehow deal with it, so that compilation can proceed and further errors in the source program can be detected. A compiler that stops when it finds the first error is not as helpful as it could be.
- The lexical phase can detect errors where the characters remaining in the input do not form any token of the language.
- Errors where the token stream violates the structure rules (syntax) of the language are determined by the syntax analysis phase.

1.3.5 Retargeting
It is the process of creating more compilers for the same source language, but for different target machines.

This code moves the contents of the address a into register 1, then adds the constant 2 to it, treating the contents of register 1 as a fixed-point number, and finally stores the result in the location named by b; thus it computes b = a + 2.

1.3.7 Two-Pass Assembly
The simplest form of assembler makes two passes over the input, where a pass consists of reading an input file once. In the first pass, all the identifiers that denote storage locations are found and stored in a symbol table. In the second pass, the assembler scans the input again. This time, it translates each operation code into the sequence of bits representing that operation in machine language, and it translates each identifier representing a location into the address given for that identifier in the symbol table.
The output of the second pass is usually relocatable machine code, meaning that it can be loaded starting at any location L in memory; i.e., if L is added to all addresses in the code, then all references will be correct. Thus, the output of the assembler must distinguish those portions of instructions that refer to addresses that can be relocated.
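The two passes described above can be sketched as follows; the opcode table, the bit encodings, and the toy one-operand assembly syntax are invented for illustration:

```python
# Sketch of the two-pass scheme above, for a toy one-operand assembly
# language; the opcode table and encodings are invented for illustration.
OPCODES = {"LOAD": "0001", "ADD": "0010", "STORE": "0011"}

def assemble(lines):
    # Pass 1: assign each identifier a storage address in the symbol table.
    symtab, next_addr = {}, 0
    for line in lines:
        _, ident = line.split()
        if ident not in symtab:
            symtab[ident] = next_addr
            next_addr += 1
    # Pass 2: translate opcodes and identifiers using the symbol table.
    code = [(OPCODES[op], symtab[ident])
            for op, ident in (line.split() for line in lines)]
    return symtab, code

symtab, code = assemble(["LOAD a", "ADD two", "STORE b"])   # roughly b = a + 2
print(symtab)   # {'a': 0, 'two': 1, 'b': 2}
print(code)
```

Note that pass 2 can resolve a forward reference to b even though b is only defined later, precisely because pass 1 has already filled the symbol table.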
more modern flow-of-control and data-structure facilities. For example, such a preprocessor might provide the user with built-in macros for constructs like while-statements or if-statements, where none exist in the programming language itself.
Language Extensions: These processors attempt to add capabilities to the language by what amounts to built-in macros.

1.3.9 Phases vs. Passes:
A scan of the complete text from left to right is called a "pass". A translation may be implemented in one or more passes.
- Multi-pass: requires less space, but is slower.
- Single-pass: faster, but requires more space.

A token is a cohesive sequence of characters, such as an identifier, a keyword (if, while, etc.), a punctuation character, or a multi-character operator like :=. The character sequence forming a token is called the lexeme for the token.
Certain tokens will be augmented by a "lexical value". For example, when an identifier like rate is found, the lexical analyzer not only generates a token, say id, but also enters the lexeme rate into the symbol table, if it is not already there. The lexical value associated with this occurrence of id points to the symbol-table entry for rate.
In this section, we shall use id1, id2, and id3 for position, initial, and rate, respectively, to emphasize that the internal representation of an identifier is different from the character sequence forming the identifier. The representation of assignment statement (1.1) after lexical analysis is therefore suggested by
id1 := id2 + id3 * 60    (1.2)

1.3.12 Syntax Analysis Phase
The syntax analysis phase imposes a hierarchical structure on the token stream, as shown in Fig. 1.4.
↓
Code optimizer
    temp1 := id3 * 60.0
    id1 := id2 + temp1
↓
Code generator
    MOVF id3, R2
    MULF #60.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1
Fig. 1.6

1.3.13 The Synthesis Phases
1.3.14 Advantages of Code Optimization:
- Improves efficiency
- Occupies less memory
- Executes faster
Code optimization applied on three-address code is called machine-independent optimization.

Interpreter → 50 ti (better for a single instruction)

1.3.16 Advantages of an Interpreter:
- Certain language features are supported by an interpreter rather than a compiler.
- "Portability"
A token has two parts:
1. Type of the token. Type: variable, operator, keyword, constant.
2. Value of the token. Value: name of the variable, the current variable, or a pointer to the symbol table.
In a compiler, linear analysis is called lexical analysis or scanning. For example, in lexical analysis the characters in the assignment statement
position := initial + rate * 60
would be grouped into the following tokens:
1. The identifiers 'position', 'initial', 'rate'.
2. The assignment symbol ':=' and the operators '+', '*'.
3. The number 60.
The blanks separating the characters of these tokens would normally be eliminated during lexical analysis.
The lexical analyzer's main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis. This interaction, summarized schematically in Fig. 1.9, is commonly implemented by making the lexical analyzer a subroutine or a coroutine of the parser. Upon receiving a "get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.
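The grouping above can be reproduced with a small scanner. In this Python sketch the token-class names are our own; it uses one regular expression per token class and, as the text notes, recognizes and then discards the blanks:

```python
import re

# A minimal scanner for the statement above; token-class names are our own.
TOKEN_SPEC = [
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("NUM",    r"\d+"),
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("WS",     r"\s+"),                   # blanks are recognized, then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(text) if m.lastgroup != "WS"]

print(scan("position := initial + rate * 60"))
```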
The scanner is responsible for doing simple tasks, while the lexical analyzer proper does the more complex operations.

1.4.3 Typical Tasks of the Scanner (Lexical Analyzer)
- Scan input
- Find integer and floating-point constants
- Remove comments
- Remove white space
- Treat white space appropriately in the form of blank, tab and newline characters
- Count the number of lines
- Find string and character constants
- Pass tokens to the parser

Example: The C statement if (++x==5) foo(3); is tokenized as
IF ( ++ ID == 5 ) ID ( INT ) ;

1.4.4 Issues in Lexical Analysis
There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing:
1. Simpler design is perhaps the most important consideration.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to construct a specialized and potentially more efficient processor for the task.
3. Compiler portability is enhanced. Input-alphabet peculiarities and other device-specific anomalies can be restricted to the lexical analyzer.

1.4.5 Attributes for Tokens
The lexical analyzer collects information about tokens into their associated attributes. The tokens influence parsing decisions; the attributes influence the translation of tokens. As a practical matter, a token usually has only a single attribute: a pointer to the symbol-table entry in which the information about the token is kept. The pointer becomes the attribute for the token.
For diagnostic purposes, we may be interested in both the lexeme for an identifier and the line number on which it was first seen. Both these items of information can be stored in the symbol-table entry for the identifier.

1.4.6 Lexical Errors
Error handling is very localized with respect to the input source.
Example: while (x := 0) do generates no lexical errors in Pascal.
In what situations do errors occur?
- A prefix of the remaining input does not match any defined token.
Possible error-recovery actions are:
1. deleting an extraneous character
2. inserting a missing character
3. replacing an incorrect character by a correct character
4. transposing two adjacent characters

If no prefix of the input string forms a valid token, a lexical error has occurred. When this happens, the lexer will usually report an error. At this point, it may stop reading the input, or it may attempt continued lexical analysis by skipping characters until a valid prefix is found. The purpose of the latter approach is to try to find further lexical errors in the same input, so several of these can be corrected by the user before re-running the lexer. Some of these subsequent errors may, however, not be real errors, but may be caused by the lexer not skipping enough characters (or skipping too many) after the first error is found.
If the string fi is encountered in a C program for the first time in the context fi( a == f(x) )........
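The skip-and-retry recovery just described can be sketched as follows; the token definitions and the error-message format are invented for illustration:

```python
import re

# Sketch of skip-and-retry recovery: on a lexical error, delete successive
# characters until lexing can resume. Token definitions are illustrative.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[()=+;]")

def lex_with_recovery(text):
    tokens, errors, i = [], [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        m = TOKEN.match(text, i)
        if m:
            tokens.append(m.group())
            i = m.end()
        else:
            errors.append(f"illegal character {text[i]!r} at position {i}")
            i += 1                        # delete one character and retry
    return tokens, errors

toks, errs = lex_with_recovery("x = 1 # y @ 2;")
print(toks)   # ['x', '=', '1', 'y', '2', ';']
print(errs)
```

As the text warns, this kind of recovery may produce spurious follow-on errors if too few or too many characters are skipped.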
A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid identifier, the lexical analyzer must return the token for an identifier and let some other phase of the compiler handle any error.
But suppose a situation does arise in which the lexical analyzer is unable to proceed because none of the patterns for tokens matches a prefix of the remaining input. Perhaps the simplest recovery strategy is "panic mode" recovery: we delete successive characters from the remaining input until the lexical analyzer can find a well-formed token. This recovery technique may occasionally confuse the parser, but in an interactive computing environment it may be quite adequate.

1.4.7 Specification of Tokens
Regular expressions are an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions will serve as names for sets of strings.

1.4.8 Regular Expressions
A regular expression is a notation to specify a regular set. In Pascal, an identifier is a letter followed by zero or more letters or digits. With regular expression notation, we define Pascal identifiers as
letter (letter | digit)*
The following rules define the regular expressions over an alphabet ∑:
1. ∈ is a regular expression denoting {∈}.
2. If x is a symbol in ∑, then x is a regular expression denoting {x}.
3. Suppose r and s are regular expressions denoting L(r) and L(s). Then:
   (i) (r) | (s) is a regular expression denoting L(r) ∪ L(s)
   (ii) (r)(s) is a regular expression denoting L(r)L(s)
   (iii) (r)* is a regular expression denoting (L(r))*
   (iv) (r) is a regular expression denoting L(r)

1.4.10 Regular Definitions
If ∑ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form
d1 → r1
d2 → r2
...
dn → rn
where each di is a distinct name, and each ri is a regular expression over the symbols ∑ ∪ {d1, d2, ..., di-1}, i.e., the basic symbols and the previously defined names. By restricting each ri to ∑ and the previously defined names, we can construct a regular expression over ∑ for any ri by repeatedly replacing regular-expression names by the expressions they denote. If ri used dj for some j ≥ i, then ri might be recursively defined, and this substitution process would not terminate.

Example: A regular definition for Pascal identifiers:
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit)*

1.4.11 Recognition of Tokens
We have learned how to express patterns using regular expressions. Now we must study how to take the patterns for all the needed tokens and build a piece of code that examines the input string and finds a prefix that is a lexeme matching one of the patterns.
stmt → if expr then stmt | if expr then stmt else stmt | ε
expr → term relop term | term
term → id | number
The terminals of the grammar, which are if, then, else, relop, id and number, are the names of tokens as far as the lexical analyzer is concerned; the patterns for the tokens are described using regular definitions:
digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
letter → [A-Za-z]
id → letter (letter | digit)*
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
In addition to the first approach, the design and use of an automatic generator, we also consider techniques that are helpful in manual design.
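The regular definitions above translate almost directly into a working lexer. In this Python sketch the function name and the trick of matching keywords as identifiers and then reclassifying them are our own choices; the patterns follow the definitions given:

```python
import re

# The regular definitions above, transcribed into Python regexes (a sketch;
# the function name and the reclassification of keywords are our own choices).
KEYWORDS = {"if", "then", "else"}
NUMBER = r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"
ID     = r"[A-Za-z][A-Za-z0-9]*"
RELOP  = r"<=|>=|<>|<|>|="
MASTER = re.compile(rf"(?P<NUMBER>{NUMBER})|(?P<ID>{ID})|(?P<RELOP>{RELOP})|(?P<WS>\s+)")

def tokens(text):
    out = []
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "WS":
            continue                      # white space only separates tokens
        if kind == "ID" and lexeme in KEYWORDS:
            kind = lexeme.upper()         # keywords match as ids, then get reclassified
        out.append((kind, lexeme))
    return out

print(tokens("if x <= 60 then y else z"))
```

Note that the two-character relops <=, >= and <> are listed before the one-character ones so that the longest match wins.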
GATE QUESTIONS
Q.1 The number of tokens in the following C statement
printf("i = %d, &i = %x", i, &i);
is
a) 3 b) 26 c) 10 d) 21
[GATE-2000]

Q.2 Which of the following is not an advantage of using shared, dynamically linked libraries as opposed to using statically linked libraries?
a) Smaller sizes of executable files
b) Lesser overall page fault rate in the system
c) Faster program startup
d) Existing programs need not be relinked to take advantage of newer versions
[GATE-2003]

Q.3 Consider a program P that consists of two source modules M1 and M2 contained in two different files. If M1 contains a reference to a function defined in M2, the reference will be resolved at
a) edit time b) compile time c) link time d) load time
[GATE-2004]

Q.4 Which of the following statements are true?
1. There exist parsing algorithms for some programming languages whose complexities are less than θ(n³).
2. A programming language which allows recursion can be implemented with static storage allocation.
3. No L-attributed definition can be evaluated in the framework of bottom-up parsing.
4. Code-improving transformations can be performed at both source language and intermediate code level.
a) 1 and 2 b) 1 and 4 c) 3 and 4 d) 1, 3 and 4
[GATE-2009]

Q.5 Which data structure in a compiler is used for managing information about variables and their attributes?
a) Abstract syntax tree b) Symbol table
c) Semantic stack d) Parser table
[GATE-2010]

Q.6 In a compiler, keywords of a language are recognized during
a) parsing of the program
b) the code generation
c) the lexical analysis of the program
d) data flow analysis
[GATE-2011]

Q.7 Match the following:
List-I
(P) Lexical analysis
(Q) Top down parsing
(R) Semantic analysis
(S) Runtime environments
List-II
(i) Leftmost derivation
(ii) Type checking
(iii) Regular expressions
(iv) Activation records
a) P ↔ i, Q ↔ ii, R ↔ iv, S ↔ iii
b) P ↔ iii, Q ↔ i, R ↔ ii, S ↔ iv
c) P ↔ ii, Q ↔ iii, R ↔ i, S ↔ iv
d) P ↔ iv, Q ↔ i, R ↔ ii, S ↔ iii
[GATE-2016]

Q.8 Match the inputs (in the left column) to the compiler phase (in the right column) that processes them:
List-I                            List-II
P) Syntax tree                    i) Code generator
Q) Character stream               ii) Syntax analyzer
R) Intermediate representation    iii) Semantic analyzer
S) Token stream                   iv) Lexical analyzer
a) P→(ii), Q→(iii), R→(iv), S→(i)
b) P→(ii), Q→(i), R→(iii), S→(iv)
c) P→(iii), Q→(iv), R→(i), S→(ii)
d) P→(i), Q→(iv), R→(ii), S→(iii)
[GATE-2017]

ANSWER KEY:
1  2  3  4  5  6  7  8
c  c  c  b  b  c  b  c
EXPLANATIONS
Q.1 (c)
The lexical analyzer breaks the statement printf("i = %d, &i = %x", i, &i); into ten tokens: printf, (, the string literal "i = %d, &i = %x", a comma, i, a comma, &, i, ), and ;. A string literal counts as a single token, which is why the answer is 10.

Q.2 (c)
Dynamic linking makes program startup slower, not faster: the required libraries must be located, loaded and linked at run time. The remaining options are genuine advantages of dynamic linking: executables are smaller, code shared among processes lowers the overall page-fault rate, and existing programs need not be relinked to take advantage of newer library versions.

Q.3 (c)
To link the modules, the modules first need to be individually compiled. Once the modules are compiled, linking is done; a reference from M1 to a function defined in M2 is therefore resolved at link time.

Q.4 (b)
Statement 1 is true: parsing algorithms exist for some programming languages whose complexity is less than θ(n³).
Statement 2 is false: a programming language allowing recursion cannot be implemented with static storage allocation.
Statement 3 is false: L-attributed definitions can be evaluated in the framework of bottom-up parsing.
Statement 4 is true: code-improving transformations can be performed at both the source-language level and the intermediate-language level.

Q.5 (b)
The symbol table is the data structure in a compiler devised to manage information about variables and their attributes.

Q.6 (c)
Keywords are recognized during the lexical analysis of the program.

Q.8 (c)
The character stream is processed by the lexical analyzer, the token stream by the syntax analyzer, the syntax tree by the semantic analyzer, and the intermediate representation by the code generator.
2 PARSING
Phrase-Level Recovery: On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by a string that allows the parser to continue.
Error Productions: If we have a good idea of the common errors that might be encountered, we can augment the grammar for the language at hand with productions that generate the likely erroneous constructs. We then use the grammar augmented by these error productions to construct a parser.
Global Correction: Given an incorrect string x and grammar G, the global correction algorithms will find a parse tree for a related string y, such that the number of insertions, deletions and changes of tokens required to transform x into y is as small as possible.

2.3 Context-Free Grammars
Like regular expressions, context-free grammars describe sets of strings, i.e., languages. Additionally, a context-free grammar also defines structure on the strings in the language it defines.
A language is defined over some alphabet, for example the set of tokens produced by a lexer or the set of alphanumeric characters. The symbols in the alphabet are called terminals.
A context-free grammar recursively defines several sets of strings. Each set is denoted by a name, which is called a nonterminal. The set of nonterminals is disjoint from the set of terminals. One of the nonterminals is chosen to denote the language described by the grammar; this is called the start symbol of the grammar. The sets are described by a number of productions. Each production describes some of the possible strings that are contained in the set denoted by a nonterminal. A production has the form
N → X1 ... Xn
where N is a nonterminal and X1, ..., Xn are zero or more symbols, each of which is either a terminal or a nonterminal. The intended meaning of this notation is that the set denoted by N contains strings that are obtained by concatenating strings from the sets denoted by X1, ..., Xn. In this setting, a terminal denotes a singleton set, just like alphabet characters in regular expressions. We will, when no confusion is likely, equate a nonterminal with the set of strings it denotes.
Some examples:
A → a
says that the set denoted by the nonterminal A contains the one-character string a.
A → aA
says that the set denoted by A contains all strings formed by putting an a in front of a string taken from the set denoted by A. Together, these two productions indicate that A contains all non-empty sequences of a's and is hence (in the absence of other productions) equivalent to the regular expression a+.
We can define a grammar equivalent to the regular expression a* by the two productions
B → ε
B → aB
A context-free grammar is a four-tuple denoted as
G = (V, T, P, S)
where
1. V is a finite set of symbols called nonterminals or variables,
2. T is a set of symbols called terminals,
3. P is a set of productions, and
4. S is a member of V, called the start symbol.
Example:
Terminals: id, +, -, *, /, (, )
Nonterminals: expr, op
Productions:
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → + | - | * | /
The start symbol: expr
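The four-tuple G = (V, T, P, S) of the example can be encoded directly. The following sketch (the function name and the search bound are invented for illustration) tests whether a string of terminals is derivable by a bounded breadth-first search over leftmost sentential forms:

```python
from collections import deque

# The grammar G = (V, T, P, S) from the example, encoded directly.
V = {"expr", "op"}
T = {"id", "+", "-", "*", "/", "(", ")"}
P = {
    "expr": [["expr", "op", "expr"], ["(", "expr", ")"], ["-", "expr"], ["id"]],
    "op":   [["+"], ["-"], ["*"], ["/"]],
}
S = "expr"

def derivable(target, max_len=5):
    """Bounded breadth-first search over leftmost sentential forms."""
    seen, queue = set(), deque([(S,)])
    while queue:
        form = queue.popleft()
        if len(form) > max_len or form in seen:
            continue
        seen.add(form)
        if all(sym in T for sym in form):
            if list(form) == target:
                return True
            continue
        i = next(k for k, sym in enumerate(form) if sym in V)  # leftmost nonterminal
        for rhs in P[form[i]]:
            queue.append(form[:i] + tuple(rhs) + form[i + 1:])
    return False

print(derivable(["id", "+", "id"]), derivable(["id", "id"]))   # True False
```

The length bound is sound here because no production in this grammar shrinks a sentential form, so any derivation of a short string passes only through short forms.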
2.3.1 Notational Conventions
1. These symbols are terminals:
   i. Lower-case letters early in the alphabet, e.g., a, b, c.
   ii. Operator symbols, e.g., +, -, etc.
   iii. Punctuation symbols, e.g., parentheses, comma, etc.
   iv. The digits 0, 1, 2, ..., 9.
   v. Boldface strings such as id or if.
2. These symbols are nonterminals:
   i. Upper-case letters early in the alphabet, e.g., A, B, C.
   ii. The letter S, which, when it appears, is usually the start symbol.
   iii. Lower-case italic names such as expr or stmt.
3. Upper-case letters late in the alphabet, e.g., X, Y, Z, represent grammar symbols, i.e., either nonterminals or terminals.
4. Lower-case letters late in the alphabet, e.g., u, v, ..., z, represent strings of terminals.
5. Lower-case Greek letters, α, β, γ, represent strings of grammar symbols. Thus a generic production could be written as A → α.
6. If A → α1, A → α2, ..., A → αk are all productions with A on the left, we may write A → α1 | α2 | ... | αk; the αi are the alternatives for A.
7. Unless otherwise stated, the left side of the first production is the start symbol.

Example: For the grammar Expr → Expr + Expr | Expr * Expr | id, the string id + id * id has the derivations

LMD:                           RMD:
Expr ⇒ Expr + Expr             Expr ⇒ Expr * Expr
     ⇒ id + Expr                    ⇒ Expr * id
     ⇒ id + Expr * Expr             ⇒ Expr + Expr * id
     ⇒ id + id * Expr               ⇒ Expr + id * id
     ⇒ id + id * id                 ⇒ id + id * id

2.4 Derivations
The basic idea of derivation is to consider productions as rewrite rules. Whenever we have a nonterminal, we can replace it by the right-hand side of any production in which the nonterminal appears on the left-hand side. We can do this anywhere in a sequence of symbols (terminals and nonterminals) and repeat doing so until we have only terminals left. The resulting sequence of terminals is a string in the language defined by the grammar.
Formally, we define the derivation relation ⇒ by the three rules
1. αNβ ⇒ αγβ   if there is a production N → γ
2. α ⇒ α
3. α ⇒ γ       if there is a β such that α ⇒ β and β ⇒ γ
where α, β and γ are (possibly empty) sequences of grammar symbols (terminals and nonterminals).
The first rule states that using a production as a rewrite rule (anywhere in a sequence of grammar symbols) is a derivation step. The second states that the derivation relation is reflexive, i.e., that a sequence derives itself. The third rule describes transitivity, i.e., that a sequence of derivations is in itself a derivation.
We can use derivation to formally define the language that a context-free grammar generates.

Example: Derivation of the string aabbbcc using the grammar
T → R
T → aTc
R → ε
R → RbR

T ⇒ aTc
  ⇒ aaTcc
  ⇒ aaRcc
  ⇒ aaRbRcc
  ⇒ aaRbcc
  ⇒ aaRbRbcc
  ⇒ aaRbRbRbcc
  ⇒ aaRbbRbcc
  ⇒ aabbRbcc
  ⇒ aabbbcc

A leftmost derivation always chooses the leftmost nonterminal to rewrite. A leftmost derivation of the same string:
T ⇒ aTc
  ⇒ aaTcc
  ⇒ aaRcc
  ⇒ aaRbRcc
  ⇒ aaRbRbRcc
  ⇒ aaRbRbRbRcc
  ⇒ aabRbRbRcc
  ⇒ aabbRbRcc
  ⇒ aabbbRcc
  ⇒ aabbbcc

2.4.2 Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Put another way, an ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence. Example (with E → E + E | id):
E ⇒ E + E ⇒ id + E ⇒ id + id
For certain types of parsers, it is desirable that the grammar be made unambiguous, for if it is not, we cannot uniquely determine which parse tree to select for a sentence.
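A derivation like the one of aabbbcc above can be verified mechanically. In this sketch (the function name is our own) each step must be obtainable from the previous sentential form by rewriting exactly one nonterminal with a production of the grammar:

```python
# Mechanically checking the derivation of aabbbcc. Upper-case letters are
# the nonterminals T and R; "" stands for the empty right-hand side ε.
P = {"T": ["R", "aTc"], "R": ["", "RbR"]}

def one_step(frm, to):
    """Is `to` obtainable from `frm` by rewriting exactly one nonterminal?"""
    for i, sym in enumerate(frm):
        if sym in P:
            for rhs in P[sym]:
                if frm[:i] + rhs + frm[i + 1:] == to:
                    return True
    return False

steps = ["T", "aTc", "aaTcc", "aaRcc", "aaRbRcc", "aaRbcc",
         "aaRbRbcc", "aaRbRbRbcc", "aaRbbRbcc", "aabbRbcc", "aabbbcc"]
ok = all(one_step(a, b) for a, b in zip(steps, steps[1:]))
print("valid derivation:", ok)   # valid derivation: True
```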
Certain constraints on the input, such as the requirement that identifiers be declared before they are used, cannot be described by a context-free grammar. The sequences of tokens accepted by a parser form a superset of the programming language; subsequent phases must analyze the output of the parser to ensure compliance with rules that are not checked by the parser.
Design:
1. Rules for tokens are described by regular expressions.
2. Convert the regular expression (R.E.) to a finite automaton.

2.4.4 Regular Expressions vs. Context-Free Grammars
Every construct that can be described by a regular expression can also be described by a context-free grammar. For example, the regular expression (a|b)*abb and the grammar
A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε
describe the same language, the set of strings of a's and b's ending in abb. We can mechanically convert a nondeterministic finite automaton (NFA) into a grammar that generates the same language as recognized by the NFA.

Example 2.6: Consider the grammar
S → (S)S | ε
It may not be initially apparent, but this simple grammar generates all strings of balanced parentheses, and only such strings. To see this, we shall show first that every sentence derivable from S is balanced, and then that every balanced string is derivable from S.
To show that every sentence derivable from S is balanced, we use an inductive proof on the number of steps in a derivation. For the basis step, we note that the only string of terminals derivable from S in one step is the empty string, which surely is balanced. Now assume that all derivations of fewer than n steps produce balanced sentences, and consider a leftmost derivation of exactly n steps. Such a derivation must be of the form
S ⇒ (S)S ⇒* (x)S ⇒* (x)y
The derivations of x and y from S take fewer than n steps so, by the inductive hypothesis, x and y are balanced. Therefore, the string (x)y must be balanced. We have thus shown that any string derivable from S is balanced.
We must next show that every balanced string is derivable from S. To do this we use induction on the length of a string. For the basis step, the empty string is derivable from S. Now assume that every balanced string of length less than 2n is derivable from S, and consider a balanced string w of length 2n, n ≥ 1. Surely w begins with a left parenthesis. Then w can be written as (x)y where both x and y are balanced. Since x and y are of length less than 2n, they are derivable from S by the inductive hypothesis. Thus, we can find a derivation of the form
S ⇒ (S)S ⇒* (x)S ⇒* (x)y
proving that w = (x)y is also derivable from S.

2.4.5 Rewriting ambiguous expression grammars
If we have an ambiguous grammar
E → E ⊕ E
E → num
we can rewrite it to an unambiguous grammar that generates the correct structure. As this depends on the associativity of ⊕, we use different rewrite rules for different associativities.
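The grammar S → (S)S | ε of Example 2.6 can also be checked operationally. This recursive-descent recognizer (a sketch, with invented names) mirrors the (x)y decomposition used in the proof: each call strips one balanced string off the front of the input and returns the rest, or None on failure:

```python
# The grammar S → (S)S | ε as a recursive-descent recognizer; names are our own.
def parse_S(s):
    if s.startswith("("):
        rest = parse_S(s[1:])             # derive the inner x of (x)y
        if rest is None or not rest.startswith(")"):
            return None                   # unmatched '('
        return parse_S(rest[1:])          # derive the trailing y
    return s                              # ε-production: consume nothing

def balanced(s):
    return parse_S(s) == ""

print(balanced("(()())"), balanced("(()"))   # True False
```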
If ⊕ is left-associative, we make the grammar left-recursive by having a recursive reference to the left only of the operator symbol:
E → E ⊕ E'
E → E'
E' → num
Now the expression 2 ⊕ 3 ⊕ 4 can only be parsed with the left-grouped structure (2 ⊕ 3) ⊕ 4. We get a slightly more complex syntax tree than before, but not enormously so.
We handle right-associativity in a similar fashion: we make the offending production right-recursive:
E → E' ⊕ E
E → E'
E' → num
Non-associative operators are handled by non-recursive productions:
E → E' ⊕ E'
E → E'
E' → num
Note that the latter transformation actually changes the language that the grammar generates, as it makes expressions of the form num ⊕ num ⊕ num illegal.
So far, we have handled only cases where an operator interacts with itself. This is easily extended to the case where several operators with the same precedence and associativity interact with each other, as for example + and -:
E → E + E'
E → E - E'
E → E'
E' → num
Operators with the same precedence must have the same associativity for this to work, as mixing left-recursive and right-recursive productions for the same nonterminal makes the grammar ambiguous. As an example, the grammar
E → E + E'
E → E' - E
E → E'
E' → num
seems like an obvious generalization of the principles used above, giving + and - the same precedence but different associativity. Not only is the grammar ambiguous, it does not even accept the intended language: for example, the string num + num - num is not derivable by this grammar.
In general, there is no obvious way to resolve ambiguity in an expression like 1+2⊗3, where + is left-associative and ⊗ is right-associative (or vice versa). Hence, most programming languages (and most parser generators) require operators at the same precedence level to have identical associativity.
We also need to handle operators with different precedence. This is done by using a nonterminal for each precedence level; the following grammar uses this idea to give an unambiguous expression grammar:
Exp → Exp + Exp2
Exp → Exp - Exp2
Exp → Exp2
Exp2 → Exp2 * Exp3
Exp2 → Exp2 / Exp3
Exp2 → Exp3
Exp3 → num
Exp3 → (Exp)
Grammar: Unambiguous expression grammar.
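The unambiguous precedence grammar maps naturally onto one parsing function per precedence level. This Python sketch (an evaluator with invented names) illustrates the point, realizing the left-recursive productions as iteration:

```python
import re

# The unambiguous precedence grammar, as a recursive-descent evaluator.
# One function per precedence level; left recursion becomes iteration.
def evaluate(src):
    toks = re.findall(r"\d+|[-+*/()]", src)
    pos = [0]

    def peek():
        return toks[pos[0]] if pos[0] < len(toks) else None

    def eat():
        pos[0] += 1
        return toks[pos[0] - 1]

    def exp():                      # Exp → Exp + Exp2 | Exp - Exp2 | Exp2
        v = exp2()
        while peek() in ("+", "-"):
            v = v + exp2() if eat() == "+" else v - exp2()
        return v

    def exp2():                     # Exp2 → Exp2 * Exp3 | Exp2 / Exp3 | Exp3
        v = exp3()
        while peek() in ("*", "/"):
            v = v * exp3() if eat() == "*" else v / exp3()
        return v

    def exp3():                     # Exp3 → num | (Exp)
        if peek() == "(":
            eat(); v = exp(); eat()   # consume '(' Exp ')' (well-formed input assumed)
            return v
        return int(eat())

    return exp()

print(evaluate("2-3-4"))      # -5: '-' groups to the left
print(evaluate("2+3*4"))      # 14: '*' binds tighter than '+'
```

The iteration in exp and exp2 is what makes the operators group to the left, exactly as the left-recursive productions prescribe.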
operator of a certain precedence level, then its subexpressions cannot use operators of lower precedence (unless these are inside parentheses). Hence, the productions for a non-terminal corresponding to a particular precedence level refer only to non-terminals that correspond to the same or higher precedence levels, unless parentheses or similar bracketing constructs disambiguate the use of these. The grammar below shows how these rules are used to make an unambiguous version of the grammar.

2.4.6 Elimination of left recursion

A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α. Top-down parsing methods cannot handle left recursive grammars, so a transformation that eliminates left recursion is needed. Consider the following grammar for arithmetic expressions.
E → E + T | T
T → T * F | F
F → (E) | id
Eliminating the immediate left recursion (productions of the form A → Aα) from the productions for E and then for T, we obtain
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
In general, let the productions be of the form A → Aα | β. No matter how many A-productions there are, we can eliminate immediate left recursion from them by the following technique. First, we group the A-productions as
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
where no βi begins with an A. Then we replace the A-productions by
A → β1A' | β2A' | ... | βnA'
A' → α1A' | α2A' | ... | αmA' | ε
The non-terminal A generates the same strings as before but is no longer left recursive. This procedure eliminates all immediate left recursion from the A and A' productions (provided no αi is ε), but it does not eliminate left recursion involving derivations of two or more steps.

Example 2.7
S → Aa | b
A → Ac | Sd | ε
The non-terminal S is left recursive because S ⇒ Aa ⇒ Sda, but it is not immediately left recursive. The following algorithm will systematically eliminate left recursion from any grammar. It is guaranteed to work if the grammar has no cycles (derivations of the form A ⇒+ A) or ε-productions.

Algorithm 1: Eliminating left recursion.

Input: Grammar G with no cycles or ε-productions.
Output: An equivalent grammar with no left recursion.
Method: Apply the following steps to G. Note that the resulting non-left-recursive grammar may have ε-productions.

1. Arrange the non-terminals in some order A1, A2, ..., An.
2. for i := 1 to n do begin
3.   for j := 1 to i - 1 do begin
4.     Replace each production of the form Ai → Aj γ by the productions
       Ai → δ1γ | δ2γ | ... | δkγ,
       where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions;
5.   end
6.   Eliminate the immediate left recursion among the Ai-productions
7. end
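The grouping-and-replacement technique above can be sketched in code. The following is a minimal illustration (not the book's algorithm as printed), assuming productions are represented as lists of symbol names and "ε" marks the empty string:

```python
# A sketch of the immediate-left-recursion transformation described above:
# A -> A a1 | ... | A am | b1 | ... | bn  becomes
# A -> b1 A' | ... | bn A'  and  A' -> a1 A' | ... | am A' | ε.
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Return new productions for `nonterminal` plus a fresh primed
    non-terminal, assuming no alternative begins with ε."""
    primed = nonterminal + "'"
    alphas = [p[1:] for p in productions if p and p[0] == nonterminal]
    betas  = [p for p in productions if not p or p[0] != nonterminal]
    if not alphas:                      # no immediate left recursion
        return {nonterminal: productions}
    return {
        nonterminal: [beta + [primed] for beta in betas],
        primed:      [alpha + [primed] for alpha in alphas] + [["ε"]],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | ε
new = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(new["E"])    # [['T', "E'"]]
print(new["E'"])   # [['+', 'T', "E'"], ['ε']]
```

Running this on the expression grammar reproduces the E → TE', E' → +TE' | ε pair shown above.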
The reason the procedure works is that after the i-th iteration of the outer for loop in step (2), any production of the form Ak → Alα, where k < i, must have l > k. As a result, on the next iteration, the inner loop (on j) progressively raises the lower limit on m in any production Ai → Amα, until we must have m ≥ i. Then, eliminating immediate left recursion for the Ai-productions forces m to be greater than i.

Example 2.8
Let us apply this procedure to the grammar of Example 2.7. Technically, Algorithm 1 is not guaranteed to work, because of the ε-production. But in this case the production A → ε turns out to be harmless.
We order the non-terminals S, A. There is no immediate left recursion among the S-productions, so nothing happens during step 2 for the case i = 1. For i = 2, we substitute the S-productions in A → Sd to obtain the following A-productions.
A → Ac | Aad | bd | ε
Eliminating the immediate left recursion among the A-productions yields the following grammar.
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε

2.4.7 Left factoring

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing. The basic idea is that when it is not clear which of two alternative productions to use to expand a non-terminal A, we may be able to rewrite the A-productions to defer the decision until we have seen enough of the input to make the right choice. For example, if we have the two productions
stmt → if expr then stmt else stmt | if expr then stmt
then on seeing the input token if, we cannot immediately tell which production to choose to expand stmt. In general, if A → αβ1 | αβ2 are two A-productions, and the input begins with a non-empty string derived from α, we do not know whether to expand A to αβ1 or to αβ2. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β1 or to β2. That is, left-factored, the original productions become
A → αA'
A' → β1 | β2

Algorithm 2: Left factoring a grammar.
Input: Grammar G.
Output: An equivalent left-factored grammar.
Method: For each non-terminal A, find α, the longest prefix common to two or more of its alternatives. If α ≠ ε, i.e., there is a nontrivial common prefix, replace all the A-productions
A → αβ1 | αβ2 | ... | αβn | γ,
where γ represents all alternatives that do not begin with α, by
A → αA' | γ
A' → β1 | β2 | ... | βn
Here A' is a new non-terminal. Repeatedly apply this transformation until no two alternatives for a non-terminal have a common prefix.

Example 2.9: The following grammar abstracts the dangling-else problem:
S → iEtS | iEtSeS | a          --------- 1
E → b
Here i, t, and e stand for if, then and else; E and S stand for "expression" and "statement." Left-factored, this grammar becomes:
S → iEtSS' | a
S' → eS | ε          --------- 2
E → b
Thus, we may expand S to iEtSS' on input i, and wait until iEtS has been seen to decide
whether to expand S' to eS or to ε. Of course, grammars 1 and 2 are both ambiguous, and on input e, it will not be clear which alternative for S' should be chosen.
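One step of the left-factoring transformation of Algorithm 2 can be sketched as follows. This is an illustrative sketch, not the book's code; productions are lists of symbols and "ε" marks the empty alternative:

```python
# One left-factoring step: pull out the longest prefix common to the
# alternatives that share a first symbol with the first alternative.
def common_prefix(seqs):
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    return prefix

def left_factor_once(nonterminal, alternatives):
    sharing = [a for a in alternatives
               if a and alternatives[0] and a[0] == alternatives[0][0]]
    alpha = common_prefix(sharing)
    if len(sharing) < 2 or not alpha:
        return {nonterminal: alternatives}       # nothing to factor
    primed = nonterminal + "'"
    rest = [a for a in alternatives if a not in sharing]
    tails = [a[len(alpha):] or ["ε"] for a in sharing]
    return {nonterminal: [alpha + [primed]] + rest, primed: tails}

# S -> iEtS | iEtSeS | a   becomes   S -> iEtS S' | a,  S' -> ε | eS
g = left_factor_once("S", [list("iEtS"), list("iEtSeS"), ["a"]])
print(g["S"])    # [['i', 'E', 't', 'S', "S'"], ['a']]
print(g["S'"])   # [['ε'], ['e', 'S']]
```

Applied to grammar 1 above, it produces exactly the left-factored grammar 2.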
2.5 Top-Down Parsing

Parsing Methods
1. Universal: There exist algorithms that can parse any context-free grammar. These algorithms are too inefficient to be used in practice.
2. Top-down parsing: Build the parse tree from the root to the leaves (using a leftmost derivation). Examples are recursive descent and LL parsers.
3. Bottom-up parsing: Build the parse tree from the leaves to the root. Examples are operator precedence parsing and LR (SLR, canonical LR, and LALR) parsing.

Top-down parsing may involve backtracking (i.e., repeated scanning of the input), because in the attempt to obtain the leftmost derivation of the input string w, a parser may encounter a situation in which a non-terminal A is to be derived next, and there are multiple A-productions, such as A → α1 | α2 | ... | αn. In such a situation the parser picks one A-production to derive A, and if this derivation finally leads to the derivation of w, then the parser announces the successful completion of parsing. Otherwise, the parser resets the input pointer to where it was when the non-terminal A was derived, and it tries another A-production. The parser continues this until it either announces the successful completion of parsing or reports failure after trying all of the alternatives.
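The try-and-reset behaviour described above can be sketched as a toy backtracking parser. The grammar and input here are our own illustration, not from the text:

```python
# A toy backtracking top-down parser: try each A-production in turn,
# resetting the input position on failure.
GRAMMAR = {"S": [["a", "S", "b"], ["c"]]}   # S -> aSb | c

def parse(symbol, s, i):
    """Try to match `symbol` against a prefix of s[i:]; return the new
    input position on success, or None on failure."""
    if symbol not in GRAMMAR:                 # terminal: must match input
        return i + 1 if i < len(s) and s[i] == symbol else None
    for production in GRAMMAR[symbol]:        # try alternatives in order
        j = i
        for sym in production:
            j = parse(sym, s, j)
            if j is None:
                break                         # backtrack: restart from i
        else:
            return j
    return None

print(parse("S", "aacbb", 0) == len("aacbb"))  # True: the whole string parses
```

Note that, as the text warns, such a parser loops forever on a left recursive grammar: expanding S would call parse for S again without consuming input.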
2.5.2 Recursive-Descent Parsing

Recursive-descent parsing may involve backtracking, that is, making repeated scans of the input; on failure, the parser backtracks and tries another alternative. However, backtracking parsers are not seen frequently. One reason is that backtracking is rarely needed to parse programming language constructs. A left recursive grammar can cause a recursive descent parser, even one with backtracking, to go into an infinite loop.

2.5.3 Predictive Parsing

Predictive parsing is the special case of recursive descent parsing in which no backtracking is required. In many cases, by carefully writing a grammar, eliminating left recursion from it, and left factoring the resulting grammar, we can obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking, i.e., a predictive parser.
A (recursive) procedure is associated with each non-terminal of the grammar. A production is chosen by looking at the lookahead symbol (the currently scanned input token). A non-terminal in the right side of the production results in a call to the corresponding procedure for that non-terminal. We can model predictive parsing using transition diagrams.

Example
E → E + T | T
T → T * F | F
F → (E) | id
Eliminating the immediate left recursion from the productions for E and then for T, we obtain
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
The transition diagram for this grammar is shown in the accompanying figure.

2.5.4 Non-Recursive Predictive Parsing

The key problem during predictive parsing is that of determining the production to be applied for a non-terminal. The non-recursive parser in Fig. 2.10 looks up the production to be applied in a parsing table. In what follows, we shall see how the table can be constructed directly from certain grammars.

Fig. 2.10 Model of a non-recursive predictive parser

A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream. The input buffer contains the string to be parsed, followed by $, a symbol used as a right end marker to indicate the end of the input string. The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack.
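The recursive-descent predictive parser of Section 2.5.3 can be sketched for the transformed grammar above, with one procedure per non-terminal choosing a production by the lookahead token alone. This is our illustration, not the book's code:

```python
# A predictive recursive-descent parser for
# E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id.
class Parser:
    def __init__(self, tokens):
        self.toks = tokens + ["$"]   # $ is the right end marker
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def match(self, t):              # consume one expected token
        if self.peek() != t:
            raise SyntaxError("expected " + t)
        self.pos += 1

    def E(self):                     # E -> T E'
        self.T(); self.Eprime()

    def Eprime(self):                # E' -> + T E' | ε
        if self.peek() == "+":
            self.match("+"); self.T(); self.Eprime()

    def T(self):                     # T -> F T'
        self.F(); self.Tprime()

    def Tprime(self):                # T' -> * F T' | ε
        if self.peek() == "*":
            self.match("*"); self.F(); self.Tprime()

    def F(self):                     # F -> ( E ) | id
        if self.peek() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

p = Parser(["id", "+", "id", "*", "id"])
p.E()
print(p.peek())   # "$" -- the whole input was consumed
```

Because the grammar is left-factored and free of left recursion, each procedure decides what to do from one lookahead token and never backtracks.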
Initially, the stack contains the start symbol of the grammar on top of $. The parsing table is a two-dimensional array M[A, a], where A is a non-terminal and a is a terminal or the symbol $.
The parser is controlled by a program that behaves as follows. The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the action of the parser. There are three possibilities.
a. If X = a = $, the parser halts and announces successful completion of parsing.
b. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
c. If X is a non-terminal, the program consults entry M[X, a] of the parsing table. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.
The behavior of the parser can be described in terms of its configurations, which give the stack contents and the remaining input.

2.5.5 First sets

The first set of a string α is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in the first set of α. To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set.
i) If X is a terminal, then FIRST(X) is {X}.
ii) If X → ε is a production, then add ε to FIRST(X).
iii) If X is a non-terminal and X → Y1Y2...Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi), and ε is in FIRST(Y1), ..., FIRST(Yi-1); that is, Y1...Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X). If Y1 does not derive ε, then we add nothing more to FIRST(X); but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.

Example:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(E) = {(, id}
FIRST(E') = {+, ε}
FIRST(T) = {(, id}
FIRST(T') = {*, ε}
FIRST(F) = {(, id}

2.5.6 Follow sets

The follow set of a non-terminal A is the set of terminals that can appear immediately to the right of A in some sentential form; namely, if S ⇒* αAaβ, then a is in the follow set of A. To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set.
i) Place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.
ii) If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
iii) If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).

Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FOLLOW(E) = {), $}
FOLLOW(E') = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}
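The fixed-point rules for FIRST and FOLLOW given above can be sketched directly in code for the running grammar. This is an illustrative sketch; the empty production is written [] and ε is spelled "ε":

```python
# Fixed-point computation of FIRST and FOLLOW for the expression grammar.
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],        # [] is the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
NT = set(G)

def first_of(seq, FIRST):
    """FIRST of a symbol string, given the FIRST sets of non-terminals."""
    out = set()
    for X in seq:
        f = FIRST[X] if X in NT else {X}
        out |= f - {"ε"}
        if "ε" not in f:
            return out
    return out | {"ε"}                   # every symbol can vanish

FIRST = {A: set() for A in NT}
FOLLOW = {A: set() for A in NT}
FOLLOW["E"].add("$")                     # rule (i) for FOLLOW
changed = True
while changed:                           # iterate rules to a fixed point
    changed = False
    for A, prods in G.items():
        for p in prods:
            for t in first_of(p, FIRST):         # grow FIRST(A)
                if t not in FIRST[A]:
                    FIRST[A].add(t); changed = True
            for k, B in enumerate(p):            # rules (ii), (iii)
                if B in NT:
                    tail = first_of(p[k + 1:], FIRST)
                    new = (tail - {"ε"}) | (FOLLOW[A] if "ε" in tail else set())
                    if not new <= FOLLOW[B]:
                        FOLLOW[B] |= new; changed = True

print(sorted(FIRST["E"]))    # ['(', 'id']
print(sorted(FOLLOW["T"]))   # ['$', ')', '+']
```

The computed sets agree with the tables above, e.g. FOLLOW(F) = {+, *, ), $}.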
Construction of Predictive Parsing Table

Input: Grammar G.
Output: Parsing table M.
Method:
i) For each production A → α, do steps (ii) and (iii).
ii) For each terminal a in FIRST(α), add A → α to M[A, a].
iii) If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
iv) Make each undefined entry of M be error.

Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {), $}
FOLLOW(T) = FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}

A predictive parsing table for this grammar is shown in Fig. 2.11. Blanks are error entries; non-blanks indicate a production with which to expand the top non-terminal on the stack.

Non-terminal |   id    |    +     |    *      |   (    |   )    |   $
E            | E→TE'   |          |           | E→TE'  |        |
E'           |         | E'→+TE'  |           |        | E'→ε   | E'→ε
T            | T→FT'   |          |           | T→FT'  |        |
T'           |         | T'→ε     | T'→*FT'   |        | T'→ε   | T'→ε
F            | F→id    |          |           | F→(E)  |        |

Fig. 2.11 Parsing table M for the arithmetic expression grammar

With input id + id * id the predictive parser makes the sequence of moves in Fig. 2.12. The input pointer points to the leftmost symbol of the string in the Input column. If we observe the actions of this parser carefully, we see that it is tracing out a leftmost derivation for the input; that is, the productions output are those of a leftmost derivation.

Stack     | Input           | Move
$E        | id + id * id$   | expand using E → TE'
$E'T      | id + id * id$   | expand using T → FT'
$E'T'F    | id + id * id$   | expand using F → id
$E'T'id   | id + id * id$   | pop id and advance one position in the input
$E'T'     | + id * id$      | expand using T' → ε
$E'       | + id * id$      | expand using E' → +TE'
$E'T+     | + id * id$      | pop + and advance one position in the input
$E'T      | id * id$        | expand using T → FT'
$E'T'F    | id * id$        | expand using F → id
$E'T'id   | id * id$        | pop id and advance one position in the input
$E'T'     | * id$           | expand using T' → *FT'
$E'T'F*   | * id$           | pop * and advance one position in the input
$E'T'F    | id$             | expand using F → id
$E'T'id   | id$             | pop id and advance one position in the input
$E'T'     | $               | expand using T' → ε
$E'       | $               | expand using E' → ε
$         | $               | announce successful completion of parsing

Fig. 2.12 Moves made by the predictive parser on input id + id * id

2.6 LL (1) Grammars

A grammar is an LL (1) grammar if its LL (1) parsing table has no multiply-defined
entries. No ambiguous or left recursive grammar can be LL (1).
In the acronym LL (1), the first L stands for the left-to-right scan of the input, the second L stands for the leftmost derivation, and the (1) indicates that the next input symbol is used to decide the next parsing action (i.e., the length of the lookahead is 1). In the LL (1) parsing system, parsing is done by scanning the input from left to right, and an attempt is made to derive the input string in a leftmost order. The next input symbol is used to decide what is to be done next in the parsing process.
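The "no multiply-defined entries" test can be sketched as follows for the running grammar. The FIRST and FOLLOW sets below are the ones computed earlier; the code is our illustration, not the book's:

```python
# Build M[A, a] from FIRST/FOLLOW and report whether the grammar is LL(1),
# i.e., whether any table cell receives two productions.
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],        # [] is the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", "ε"}, "T": {"(", "id"},
         "T'": {"*", "ε"}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of(seq):
    out = set()
    for X in seq:
        f = FIRST.get(X, {X})            # terminals are their own FIRST
        out |= f - {"ε"}
        if "ε" not in f:
            return out
    return out | {"ε"}

M, ll1 = {}, True
for A, prods in G.items():
    for p in prods:
        f = first_of(p)
        for a in (f - {"ε"}) | (FOLLOW[A] if "ε" in f else set()):
            if (A, a) in M:
                ll1 = False              # multiply-defined entry
            M[(A, a)] = p

print(ll1)                  # True: the grammar is LL(1)
print(M[("E'", "+")])       # ['+', 'T', "E'"]
```

Running the same construction on an ambiguous grammar (such as the dangling-else grammar of Example 2.13) sets the flag to False, because M[S', e] receives two productions.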
For a grammar to be LL (1), the following conditions must be satisfied for every pair of productions A → α | β:
FIRST(α) ∩ FIRST(β) = ∅; and
if FIRST(α) contains ε (and FIRST(β) does not), then FIRST(β) ∩ FOLLOW(A) = ∅.

Example: Find FIRST and FOLLOW for the given grammar
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Then:
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {), $}
FOLLOW(T) = FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}
For example, id and left parenthesis are added to FIRST(F) by rule (iii) in the definition of FIRST with i = 1 in each case, since FIRST(id) = {id} and FIRST( ( ) = {(} by rule (i). Then by rule (iii) with i = 1, the production T → FT' implies that id and left parenthesis are in FIRST(T) as well. As another example, ε is in FIRST(E') by rule (ii).
To compute FOLLOW sets, we put $ in FOLLOW(E) by rule (i) for FOLLOW. By rule (ii) applied to production F → (E), the right parenthesis is also in FOLLOW(E). By rule (iii) applied to production E → TE', $ and the right parenthesis are in FOLLOW(E'). Since E' ⇒* ε, they are also in FOLLOW(T). For a last example of how the FOLLOW rules are applied, the production E → TE' implies, by rule (ii), that everything other than ε in FIRST(E') must be placed in FOLLOW(T). We have already seen that $ is in FOLLOW(T).

2.6.1 Some examples of finding FIRST

Example: S → AB
A → a | b
B → d | c
Solution:
FIRST(S) = FIRST(A) = {a, b}
FIRST(A) = {a, b}
FIRST(B) = {d, c}

Example: S → ABc
A → a | b | ε
B → d | e | ε
Solution:
FIRST(S) = {a, b, d, e, c}
FIRST(A) = {a, b, ε}
FIRST(B) = {d, e, ε}

Example: S → ABCD
A → d | ε
B → e | l | ε
C → f | g
D → m | ε
Solution:
FIRST(S) = FIRST(A) + FIRST(BCD) = {d, e, l, f, g}
Here m is not included because C contains only terminals and does not derive ε, so FIRST(D) is not reached.
Some examples of finding FOLLOW

1. S → ABCDE
A → a | ε
B → b | ε
C → c | ε
D → d | ε
E → e | ε
Solution:
FOLLOW(S) = {$}
FOLLOW(A) = {b, c, d, e, $}
FOLLOW(B) = {c, d, e, $}
FOLLOW(C) = {d, e, $}
FOLLOW(D) = {e, $}
FOLLOW(E) = {$}
FOLLOW(A) = FIRST(BCDE): B contains ε, so proceed to the following symbols (the non-terminals after B may all vanish); E also contains ε, so add $ as well.

2. S → ABCDE
A → a | ε
B → b | ε
C → c | ε
D → d | ε
E → e
Solution:
Same as the grammar above, except that E → e no longer derives ε. Since E cannot vanish, $ no longer propagates leftwards: FOLLOW(A) = {b, c, d, e}, FOLLOW(B) = {c, d, e}, FOLLOW(C) = {d, e}, FOLLOW(D) = {e}, while FOLLOW(S) and FOLLOW(E) are still {$}.

3. S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε
Solution:
FOLLOW(S) = {$}
FOLLOW(B) = FIRST(D) + {h}   (h is the terminal that comes after D, and D can derive ε)
          = FIRST(E) + FIRST(F) + {h}
          = {g, f, h}   (since E and F contain ε)
FOLLOW(C): B ends with C, so FOLLOW(C) = FOLLOW(B) = {g, f, h}
FOLLOW(D) = {h}
FOLLOW(E) = {f, h}
Important observation: h came into FOLLOW(E) because F contains ε, so we go on to the terminal h that follows.
FOLLOW(F) = FOLLOW(D) = {h}

2.6.2 Construction of Predictive Parsing Tables

The following algorithm can be used to construct a predictive parsing table for a grammar G. The idea behind the algorithm is the following. Suppose A → α is a production with a in FIRST(α). Then the parser will expand A by α when the current input symbol is a.
The only complication occurs when α = ε or α ⇒* ε. In this case, we should again expand A by α if the current input symbol is in FOLLOW(A), or if the $ on the input has been reached and $ is in FOLLOW(A).

Algorithm for construction of a predictive parsing table.
Input: Grammar G.
Output: Parsing table M.
Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.

Example: Let us apply the algorithm above to the arithmetic expression grammar. Since FIRST(TE') = FIRST(T) = {(, id}, production E → TE' causes M[E, (] and M[E, id] to acquire the entry E → TE'. Production E' → +TE' causes M[E', +] to acquire E' → +TE'. Production E' → ε causes M[E', )] and M[E', $] to acquire E' → ε, since FOLLOW(E') = {), $}.

2.6.3 LL (1) Grammars and Parsing Tables

The algorithm for construction of a predictive parser table can be applied to any grammar G to produce a parsing table M. For some grammars, however, M may have some entries that are multiply defined. For example, if G is left recursive or ambiguous, then M will have at least one multiply-defined entry.

Example 2.13: Let us consider the grammar
S → iEtSS' | a
S' → eS | ε
E → b
The parsing table for this grammar is shown in Fig. 2.13.

Non-terminal |  a   |  b   |       e        |     i      |  t  |  $
S            | S→a  |      |                | S→iEtSS'   |     |
S'           |      |      | S'→eS, S'→ε    |            |     | S'→ε
E            |      | E→b  |                |            |     |

Fig. 2.13 Parsing table M for the dangling-else grammar

The entry for M[S', e] contains both S' → eS and S' → ε, since FOLLOW(S') = {e, $}. The grammar is ambiguous, and the ambiguity is manifested by a choice in what production to use when an e (else) is seen. We can resolve the ambiguity if we choose S' → eS. This choice corresponds to associating each else with the closest previous then. Note that the choice S' → ε would prevent e from ever being put on the stack or removed from the input, and is therefore surely wrong.

A grammar whose parsing table has no multiply-defined entries is said to be LL (1). The first "L" in LL (1) stands for scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one input symbol of lookahead at each step to make parsing action decisions.
LL (1) grammars have several distinctive properties. No ambiguous or left recursive grammar can be LL (1). It can also be shown that a grammar G is LL (1) if and only if whenever A → α | β are two distinct productions of G, the following conditions hold:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).

The main difficulty in using predictive parsing is in writing a grammar for the source language such that a predictive parser can be constructed from the grammar. Although left-recursion elimination and left factoring are easy to do, they make the resulting grammar hard to read and difficult to use for translation purposes. To alleviate some of this difficulty, a common organization for a parser in a compiler is to use a predictive parser for control constructs and to use operator precedence for expressions. However, if an LR parser generator is available, one can get all the benefits of predictive parsing and operator precedence automatically.
On the erroneous input ) id * + id, the parser and error recovery mechanism of Fig. 2.14 behave as in Fig. 2.15.
Stack    | Input          | Remark
$E       | ) id * + id$   | error, skip )
$E       | id * + id$     | id is in FIRST(E)
$E'T     | id * + id$     |
$E'T'F   | id * + id$     |
$E'T'id  | id * + id$     |
$E'T'    | * + id$        |
$E'T'F*  | * + id$        |
$E'T'F   | + id$          | error, M[F, +] = synch
$E'T'    | + id$          | F has been popped
$E'      | + id$          |
$E'T+    | + id$          |
$E'T     | id$            |
$E'T'F   | id$            |
$E'T'id  | id$            |
$E'T'    | $              |
$E'      | $              |
$        | $              |

Fig. 2.15 Parsing and error recovery moves made by the predictive parser

Error recovery routines may change, insert, or delete symbols on the input and issue appropriate error messages. They may also pop from the stack. It is questionable whether we should permit alteration of stack symbols or the pushing of new symbols onto the stack, since then the steps carried out by the parser might not correspond to the derivation of any word in the language at all. In any event, we must be sure that there is no possibility of an infinite loop. Checking that any recovery action eventually results in an input symbol being consumed (or the stack being shortened if the end of the input has been reached) is a good way to protect against such loops.
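The panic-mode idea behind the synch entries of Fig. 2.15 can be sketched as follows. This is our own illustration of the recovery step, not the book's algorithm; here the synch set of a non-terminal is taken to be its FOLLOW set:

```python
# Panic-mode recovery sketch: on an error at `nonterminal`, skip input
# tokens until one in the synch (FOLLOW) set appears; the caller may then
# pop the non-terminal and continue parsing.
def recover(nonterminal, tokens, pos, follow):
    """Return the input position at which parsing can resume."""
    while pos < len(tokens) and tokens[pos] not in follow:
        pos += 1            # each skip consumes an input symbol
    return pos              # caller pops `nonterminal` here

# + is in FOLLOW(F), so F is popped without consuming any input,
# matching the M[F, +] = synch move in Fig. 2.15.
print(recover("F", ["+", "id", "$"], 0, {"+", "*", ")", "$"}))  # 0
```

Because every loop iteration consumes an input symbol, the recovery cannot loop forever, which is exactly the safeguard the text recommends.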
2.6.5 Bottom-Up Parsing

Consider the grammar S → aAcBe, A → Ab | b, B → d and the input string x = abbcde. To reduce x to S, find a production whose right side is a (leftmost) substring of x. Clearly A → b applies:
abbcde ⇒ aAbcde since A → b
       ⇒ aAcde  since A → Ab
       ⇒ aAcBe  since B → d
       ⇒ S      since S → aAcBe
Consider the grammar:
E → E + E | E * E | id
and the rightmost derivation:
E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id
The handles of the sentential forms occurring in this derivation are shown in the following table.

Sentential Form | Handle
id + id * id    | id, at the position preceding +
E + id * id     | id, at the position following +
E + E * id      | id, at the position following *
E + E * E       | E * E, at the position following +
E + E           | E + E, at the position preceding the end marker

Fig. 2.16 Reductions made by a shift-reduce parser

Therefore, bottom-up parsing is an attempt to detect the handle of a right sentential form, and whenever a handle is detected, the reduction is performed. This is equivalent to performing a rightmost derivation in reverse, and is called "handle pruning".

2.6.6 Shift-Reduce Parsing

It is a general style of bottom-up syntax analysis.
An easy-to-implement form of shift-reduce parsing is called operator precedence parsing.
Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left of that production; if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.
A more general method of shift-reduce parsing is called LR parsing.

2.6.7 Stack Implementation of Shift-Reduce Parsing

A parser goes on shifting input symbols onto the stack until a handle appears on top of the stack; when a handle appears, it performs a reduction. This implementation makes use of a stack to hold grammar symbols and an input buffer to hold the string w to be parsed, which is terminated by the right end marker $, the same symbol used to mark the bottom of the stack. The stack is initially empty, and the buffer contains the entire string w. The parser shifts zero or more symbols from the input onto the stack until a handle β appears on the top of the stack. The parser then reduces β to the left side of the appropriate production. This cycle is repeated until the input is empty, giving the configuration ($S, $). If the parser enters ($S, $), it announces the successful completion of parsing. Thus, the primary operations of the parser are shift and reduce.

2.6.8 Stack Operations

Shift: shift the next input symbol onto the top of the stack.
Reduce: replace the handle at the top of the stack with the corresponding non-terminal.
Accept: announce successful completion of the parsing.
Error: call an error recovery routine.

Example:
E → E + T | T
T → T * F | F
F → (E) | id
Take the input string id + id * id. The various steps in parsing this string are shown in the following table in terms of the contents of the stack and the unspent input.
Stack    | Input          | Move
$        | id + id * id$  | shift id
$id      | + id * id$     | reduce by F → id
$F       | + id * id$     | reduce by T → F
$T       | + id * id$     | reduce by E → T
$E       | + id * id$     | shift +
$E+      | id * id$       | shift id
$E+id    | * id$          | reduce by F → id
$E+F     | * id$          | reduce by T → F
$E+T     | * id$          | shift *
$E+T*    | id$            | shift id
$E+T*id  | $              | reduce by F → id
$E+T*F   | $              | reduce by T → T * F
$E+T     | $              | reduce by E → E + T
$E       | $              | accept

Fig. 2.17 Configurations of the shift-reduce parser on input id + id * id
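The stack mechanics of Fig. 2.17 can be replayed in a few lines. This sketch scripts the move sequence explicitly rather than choosing moves from a parsing table, so it only illustrates how shift and reduce transform the stack:

```python
# Replay the shift-reduce moves of Fig. 2.17 on id + id * id.
tokens = ["id", "+", "id", "*", "id", "$"]
# ("s",) = shift; ("r", lhs, n) = reduce: pop an n-symbol handle, push lhs.
moves = [("s",), ("r", "F", 1), ("r", "T", 1), ("r", "E", 1),
         ("s",), ("s",), ("r", "F", 1), ("r", "T", 1),
         ("s",), ("s",), ("r", "F", 1), ("r", "T", 3), ("r", "E", 3)]

stack, i = ["$"], 0
for m in moves:
    if m[0] == "s":                    # shift the next input symbol
        stack.append(tokens[i]); i += 1
    else:                              # reduce: pop the handle, push LHS
        _, lhs, n = m
        del stack[-n:]
        stack.append(lhs)

print(stack, tokens[i])    # ['$', 'E'] $  -> accept
```

After the final reduction by E → E + T the configuration is ($E, $), the accepting configuration described in 2.6.7.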
2.6.9 Viable Prefixes

The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.
An equivalent definition of a viable prefix is that it is a prefix of a right sentential form that does not continue past the right end of the rightmost handle of that sentential form.

2.6.10 Conflicts During Shift-Reduce Parsing

The stack would have to be scanned to see whether a handle appears on it. That is why LR parsers use a state to uniquely identify a part of a handle (a viable prefix), so that stack scanning becomes unnecessary.
For some grammars, every shift-reduce parser can reach a configuration in which the parser, knowing the entire stack contents and the next input symbol, cannot decide whether to shift or reduce (a shift/reduce conflict), or cannot decide which of several reductions to make (a reduce/reduce conflict). These grammars are not in the LR(k) class of grammars; we refer to them as non-LR grammars.

2.6.11 Shift/Reduce Conflict

An ambiguous grammar can never be LR. Consider the following grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
If we have a shift-reduce parser in the configuration
STACK: ... if expr then stmt      INPUT: else ... $
we cannot tell whether if expr then stmt is the handle, no matter what appears below it on the stack. Here is a shift/reduce conflict. Depending on what follows else on the input, it might be correct to reduce if expr then stmt to stmt, or it might be correct to shift else and then look for another stmt to complete the alternative if expr then stmt else stmt. Thus we cannot tell whether to shift or reduce in this case, so the grammar is not LR(1).

2.6.12 Reduce/Reduce Conflict

Consider the grammar:
1. stmt → id ( parameter_list )
2. stmt → expr := expr
3. parameter_list → parameter_list , parameter
4. parameter_list → parameter
5. parameter → id
6. expr → id ( expr_list )
7. expr → id
8. expr_list → expr_list , expr
9. expr_list → expr
A statement beginning with A(I,J) would appear as the token stream id ( id , id ) to the parser. After shifting the first three tokens onto the stack, a shift-reduce parser would be in the configuration
STACK: ... id ( id      INPUT: , id ) ...
It is evident that the id on top of the stack must be reduced, but by which production? The correct choice is production (5) if A is a procedure and production (7) if A is an array. The stack does not tell; the information in the symbol table obtained from the declaration of A has to be used. Hence a reduce/reduce conflict arises here.

2.6.13 Operator-Precedence Parsing

It is a form of shift-reduce parsing which is easy to implement. The grammars it handles have the property (among other essential requirements) that no production right side is ε or has two adjacent non-terminals. A grammar with these properties is called an operator grammar.
In operator precedence parsing, we define three disjoint precedence relations, <·, =, and ·>, between certain pairs of terminals. These precedence relations guide the selection of handles and have the following meanings.

Relation | Meaning
a <· b   | a "yields precedence to" b
a = b    | a "has the same precedence as" b
a ·> b   | a "takes precedence over" b
Fig. 2.18

The common way of determining what precedence relations should hold between a pair of terminals is the notions of associativity and precedence of operators. For example, if * is to have higher precedence than +, we make + <· * and * ·> +. Consider the following operator grammar:
E → E + E | E * E | (E) | id
with the assumptions:
1. * is of highest precedence and left associative
2. + is of lowest precedence and left associative
The operator-precedence relations for this grammar are shown in the following table.

     |  +  |  *  |  (  |  )  | id  |  $
  +  | ·>  | <·  | <·  | ·>  | <·  | ·>
  *  | ·>  | ·>  | <·  | ·>  | <·  | ·>
  (  | <·  | <·  | <·  |  =  | <·  |
  )  | ·>  | ·>  |     | ·>  |     | ·>
 id  | ·>  | ·>  |     | ·>  |     | ·>
  $  | <·  | <·  | <·  |     | <·  |

2.6.14 Using Operator Precedence Relations

The intention of the precedence relations is to delimit the handle of a right-sentential form, with <· marking the left end, = appearing in the interior of the handle, and ·> marking the right end.
To be more precise, suppose we have a right-sentential form of an operator grammar. The fact that no adjacent non-terminals appear on the right sides of productions implies that no right-sentential form will have two adjacent non-terminals either. Thus, we may write the right-sentential form as β0a1β1 ... anβn, where each βi is either ε (the empty string) or a single non-terminal, and each ai is a single terminal.
Suppose that between ai and ai+1 exactly one of the relations <·, =, and ·> holds. Further, let us use $ to mark each end of the string, and define $ <· b and b ·> $ for all terminals b.
Now suppose we remove the non-terminals from the string and place the correct relation <·, =, or ·> between each pair of terminals and between the endmost terminals and the $'s marking the ends of the string.
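The relation-insertion step just described can be sketched in code. The table below encodes only the entries of the precedence relations needed for this grammar's terminals, written "<." and ".>" in ASCII:

```python
# Insert the precedence relations between consecutive terminals of a
# string (with $ at both ends), as Section 2.6.14 describes.
REL = {("$", "id"): "<.", ("id", "+"): ".>", ("+", "id"): "<.",
       ("id", "*"): ".>", ("*", "id"): "<.", ("id", "$"): ".>",
       ("+", "*"): "<.", ("*", "+"): ".>", ("+", "+"): ".>",
       ("*", "*"): ".>", ("*", "$"): ".>", ("+", "$"): ".>",
       ("$", "+"): "<.", ("$", "*"): "<."}

def annotate(terminals):
    s = ["$"] + terminals + ["$"]
    out = ["$"]
    for a, b in zip(s, s[1:]):        # one relation between each pair
        out += [REL[(a, b)], b]
    return " ".join(out)

print(annotate(["id", "+", "id", "*", "id"]))
# $ <. id .> + <. id .> * <. id .> $
```

The output reproduces the annotated string used in the handle-finding example that follows.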
For example, suppose we initially have the right-sentential form id + id * id and the precedence relations are those given in Fig. 2.19. These relations are some of those that we would choose to parse according to the grammar.

       id    +     *     $
id           .>    .>    .>
+      <.    .>    <.    .>
*      <.    .>    .>    .>
$      <.    <.    <.

Fig. 2.19 Operator-precedence relations

Then the string with the precedence relations inserted is:
$ <. id .> + <. id .> * <. id .> $
For example, <. is inserted between the leftmost $ and id since <. is the entry in row $ and column id. The handle can be found by the following process.
1. Scan the string from the left end until the first .> is encountered. In the string above, this occurs between the first id and +.
2. Then scan backwards (to the left) over any ='s until a <. is encountered. Here, we scan backward to $.
3. The handle contains everything to the left of the first .> and to the right of the <. encountered in step (2), including any intervening or surrounding nonterminals. (The inclusion of surrounding nonterminals is necessary so that two adjacent nonterminals do not appear in a right-sentential form.)
Here the handle is the first id. If we are dealing with the grammar for arithmetic expressions, we then reduce id to E. At this point we have the right-sentential form E + id * id. After reducing the two remaining id's to E by the same steps, we obtain the right-sentential form E + E * E.
Consider now the string $ + * $ obtained by deleting the nonterminals. Inserting the precedence relations, we get
$ <. + <. * .> $
indicating that the left end of the handle lies between + and * and the right end between * and $. These precedence relations indicate that, in the right-sentential form E + E * E, the handle is E * E. Note how the E's surrounding the * become part of the handle.
Since the nonterminals do not influence the parse, we need not worry about distinguishing among them. A single marker "nonterminal" can be kept on the stack of a shift-reduce parser to indicate placeholders for attribute values.
It may appear from the discussion above that the entire right-sentential form must be scanned at each step to find the handle. Such is not the case if we use a stack to store the input symbols already seen and if the precedence relations are used to guide the actions of a shift-reduce parser. If the precedence relation <. or = holds between the topmost terminal symbol on the stack and the next input symbol, the parser shifts; it has not yet found the right end of the handle. If the relation .> holds, a reduction is called for. At this point the parser has found the right end of the handle, and the precedence relations can be used to find the left end of the handle in the stack.
If no precedence relation holds between a pair of terminals (indicated by a blank entry in Fig. 2.19), then a syntactic error has been detected and an error-recovery routine must be invoked, as discussed later in this section. The above ideas can be formalized by the following algorithm.

2.6.15 Operator Precedence Parsing Algorithm

Input: An input string w and a table of precedence relations.
Output: If w is well formed, a skeletal parse tree, with a placeholder nonterminal E labeling all interior nodes; otherwise, an error indication.
Method: Initially, the stack contains $ and the input buffer contains the string w$. To parse, we execute the program in Fig. 2.20.
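As an illustrative sketch (not part of the original text), the handle-finding process above can be mechanized. The dictionary below encodes the relations of Fig. 2.19, with '<' and '>' standing for <. and .>; `find_handle` is a hypothetical helper name.

```python
# Sketch: locate the handle of a right-sentential form (nonterminals
# removed) using the operator-precedence relations of Fig. 2.19.
# '<' stands for <. , '>' for .> ; missing entries are errors.
PREC = {
    'id': {'+': '>', '*': '>', '$': '>'},
    '+':  {'id': '<', '+': '>', '*': '<', '$': '>'},
    '*':  {'id': '<', '+': '>', '*': '>', '$': '>'},
    '$':  {'id': '<', '+': '<', '*': '<'},
}

def find_handle(terminals):
    s = ['$'] + terminals + ['$']
    # rels[i] is the relation between s[i] and s[i+1]
    rels = [PREC[s[i]].get(s[i + 1]) for i in range(len(s) - 1)]
    right = rels.index('>')        # scan left-to-right for the first .>
    left = right
    while rels[left - 1] != '<':   # scan back over = entries to the <.
        left -= 1
    return s[left:right + 1]       # the terminals of the handle
```

For id + id * id this returns ['id'] (the first id is the handle), and for the string + * obtained after the id-reductions it returns ['*'], matching the discussion above.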
1) Set ip to point to the first symbol of w$;
2) repeat forever
3)   if $ is on top of the stack and ip points to $ then
4)     return
     else begin
5)     let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by ip;
6)     if a <. b or a = b then begin
7)       push b onto the stack;
8)       advance ip to the next input symbol;
       end;
9)     else if a .> b then /* reduce */
10)      repeat
11)        pop the stack
12)      until the top stack terminal is related by <. to the terminal most recently popped
13)    else error()
     end

Fig 2.20 Operator-precedence parsing algorithm

Checking whether a grammar is LL(1):
Important observation: suppose the given grammar is an ambiguous grammar. Then it is not LR(0), LL(1), CLR(1), SLR(1), or LALR(1).

2.6.16 LL(1) Grammar (Validations):
1. A → α1 | α2 | α3 ... : first(α1), first(α2), first(α3), ... must be mutually disjoint, i.e., first(αi) ∩ first(αj) = ∅ for i ≠ j.
2. A → α | ε : first(α) ∩ follow(A) = ∅.
3. The given grammar should not be left recursive.
Only if all three conditions are satisfied is the given grammar LL(1).
Observation: for a given grammar, if the LL(1) parsing table can be constructed without any multiple entries, then the grammar is LL(1).

Example: Check whether the given grammar is LL(1)?
M → Mzc | h
Solution: The given grammar is not LL(1), because it is a left-recursive grammar.

Example: Check whether the given grammar is LL(1)?
S → aAa | ε
A → abS | c
Solution:
First method of checking: S → aAa | ε is in the form A → α | ε, so first(α) ∩ follow(A) = ∅ must hold (only then is it LL(1)):
first(aAa) ∩ follow(S) = {a} ∩ {$, a}   ($ is in follow(S) because S is the start symbol; a is in follow(S) because of S → aAa)
= {a} ≠ ∅
So it is not LL(1).
Second method of checking: check for multiple entries in the parsing table. Under input symbol a, both S → aAa (since a ∈ first(aAa)) and S → ε (since a ∈ follow(S)) are entered in row S. Because of the multiple entries it is not LL(1).

2.6.17 Operator-Precedence Relations from Associativity and Precedence

Operator precedence parsers usually do not store the precedence table with the relationships; rather they are implemented in a special way. Operator precedence parsers use precedence functions that map terminal symbols to integers, and so the precedence relations between the symbols are implemented by numerical comparison.
Not every table of precedence relations has precedence functions, but in practice for most grammars such functions can be designed.
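The shift-reduce loop of Fig. 2.20 can be sketched in a few lines. This is an illustrative sketch only: `PREC` encodes the id/+/* relations of Fig. 2.19, with '<', '=', '>' standing for <., =, .>, and `op_precedence_parse` is a hypothetical name.

```python
# Sketch of the Fig. 2.20 driver: shift on <. or = , reduce on .> .
PREC = {
    'id': {'+': '>', '*': '>', '$': '>'},
    '+':  {'id': '<', '+': '>', '*': '<', '$': '>'},
    '*':  {'id': '<', '+': '>', '*': '>', '$': '>'},
    '$':  {'id': '<', '+': '<', '*': '<'},
}

def op_precedence_parse(tokens):
    stack, buf, ip = ['$'], tokens + ['$'], 0
    reductions = []                      # handles, in the order reduced
    while True:
        a, b = stack[-1], buf[ip]        # top terminal and current input
        if a == '$' and b == '$':
            return reductions
        rel = PREC[a].get(b)
        if rel in ('<', '='):            # shift
            stack.append(b)
            ip += 1
        elif rel == '>':                 # reduce: pop back to the <.
            handle = []
            while True:
                handle.append(stack.pop())
                if PREC[stack[-1]].get(handle[-1]) == '<':
                    break
            reductions.append(list(reversed(handle)))
        else:
            raise SyntaxError(f'no relation between {a!r} and {b!r}')
```

On id + id * id the id's are reduced first, then * before +, exactly the handle order derived in the discussion above.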
1. If operator θ1 has higher precedence than operator θ2, make θ1 .> θ2 and θ2 <. θ1. For example, if * has higher precedence than +, make * .> + and + <. *. These relations ensure that, in an expression of the form E+E*E+E, the central E*E is the handle that will be reduced first.
2. If θ1 and θ2 are operators of equal precedence (they may in fact be the same operator), then make θ1 .> θ2 and θ2 .> θ1 if the operators are left-associative, or make θ1 <. θ2 and θ2 <. θ1 if they are right-associative. For example, if + and - are left-associative, then make + .> +, + .> -, - .> -, and - .> +. If ↑ is right-associative, then make ↑ <. ↑. These relations ensure that E-E+E will have the handle E-E selected and E↑E↑E will have the last E↑E selected.
3. Make θ <. id, id .> θ, θ <. (, ( <. θ, ) .> θ, θ .> ), θ .> $, and $ <. θ for all operators θ. Also make
( = )     $ <. (    $ <. id
( <. (    id .> $   ) .> $
( <. id   id .> )   ) .> )
These rules ensure that both id and (E) will be reduced to E. Also, $ serves as both the left and right endmarker, causing handles to be found between $'s whenever possible.

Example 2.17: Figure 2.21 contains the operator-precedence relations for the grammar for arithmetic expressions, assuming
1. ↑ is of highest precedence and right-associative,
2. * and / are of next highest precedence and left-associative, and
3. + and - are of lowest precedence and left-associative.
(Blanks denote error entries.) The reader should try out the table to see that it works correctly, ignoring problems with unary minus for the moment. Try the table on the input id * (id + id) - id / id, for example.

      +     -     *     /     ↑     id    (     )     $
+     .>    .>    <.    <.    <.    <.    <.    .>    .>
-     .>    .>    <.    <.    <.    <.    <.    .>    .>
*     .>    .>    .>    .>    <.    <.    <.    .>    .>
/     .>    .>    .>    .>    <.    <.    <.    .>    .>
↑     .>    .>    .>    .>    <.    <.    <.    .>    .>
id    .>    .>    .>    .>    .>                .>    .>
(     <.    <.    <.    <.    <.    <.    <.    =
)     .>    .>    .>    .>    .>                .>    .>
$     <.    <.    <.    <.    <.    <.    <.

Fig. 2.21 Operator-precedence relations

2.6.18 Precedence Functions

Compilers using operator-precedence parsers need not store the table of precedence relations. In most cases, the table can be encoded by two precedence functions f and g that map terminal symbols to integers. We attempt to select f and g so that, for all symbols a and b,
1. f(a) < g(b) whenever a <. b,
2. f(a) = g(b) whenever a = b, and
3. f(a) > g(b) whenever a .> b.
Thus the precedence relation between a and b can be determined by a numerical comparison between f(a) and g(b). Note, however, that the error entries of the matrix are obscured, since one of relations (1)-(3) holds no matter what f(a) and g(b) are. The loss of error detection capability is generally not considered serious enough to prevent the use of precedence functions where possible; errors can still be caught when a reduction is called for and no handle can be found.
Not every table of precedence relations has precedence functions to encode it, but in practical cases the functions usually exist.

Example 2.18: The precedence table of Fig. 2.21 has the following pair of precedence functions:

      +    -    *    /    ↑    (    )    id   $
f     2    2    4    4    4    0    6    6    0
g     1    1    3    3    5    5    0    5    0

Fig. 2.22
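As a sketch of how a parser uses such functions, the f and g values of Fig. 2.22 can be stored as plain dictionaries and compared numerically (↑ is written here as '^'; `relation` is a hypothetical helper name):

```python
# Precedence functions of Fig. 2.22; the relation between a and b is
# recovered by comparing f[a] with g[b].
f = {'+': 2, '-': 2, '*': 4, '/': 4, '^': 4, '(': 0, ')': 6, 'id': 6, '$': 0}
g = {'+': 1, '-': 1, '*': 3, '/': 3, '^': 5, '(': 5, ')': 0, 'id': 5, '$': 0}

def relation(a, b):
    if f[a] < g[b]:
        return '<'        # a <. b
    if f[a] > g[b]:
        return '>'        # a .> b
    return '='            # a = b
```

For example, relation('+', '*') is '<' and relation('*', '+') is '>', as in Fig. 2.21. Note that relation('id', 'id') yields '>' even though the matrix has an error entry there, illustrating the loss of error detection discussed next.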
For example, * <. id, and f(*) < g(id). Note that f(id) > g(id) suggests that id .> id; but, in fact, no precedence relation holds between id and id. Other error entries in Fig. 2.21 are similarly replaced by one or another precedence relation.
A simple method for finding precedence functions for a table, if such functions exist, is the following.

Algorithm 4: Constructing precedence functions.
Input: An operator precedence matrix.
Output: Precedence functions representing the input matrix, or an indication that none exist.
Method:
1. Create symbols fa and ga for each a that is a terminal or $.
2. Partition the created symbols into as many groups as possible, in such a way that if a = b, then fa and gb are in the same group.
3. Create a directed graph whose nodes are the groups found in (2). For any a and b, if a <. b, place an edge from the group of gb to the group of fa; if a .> b, place an edge from the group of fa to the group of gb.
4. If the graph constructed has a cycle, then no precedence functions exist. If there are no cycles, let f(a) be the length of the longest path beginning at the group of fa, and let g(b) be the length of the longest path from the group of gb.
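Algorithm 4 can be sketched as follows. This is an illustrative sketch under stated assumptions: groups are formed with a simple union-find, longest paths are memoized, and the input table is assumed to admit functions (i.e., the group graph is acyclic). `REL` below encodes the relations of Fig. 2.19.

```python
# Sketch of Algorithm 4: derive precedence functions f and g from a
# relation table by longest-path lengths in the group graph.
from functools import lru_cache

def precedence_functions(terminals, rel):
    # rel[(a, b)] in {'<', '=', '>'}; absent pairs are error entries.
    parent = {}                           # union-find for step (2)
    def find(x):
        while parent.setdefault(x, x) != x:
            x = parent[x]
        return x
    for (a, b), r in rel.items():         # a = b puts fa and gb together
        if r == '=':
            parent[find(('f', a))] = find(('g', b))
    edges = {}                            # step (3): group graph
    for (a, b), r in rel.items():
        if r == '<':
            edges.setdefault(find(('g', b)), set()).add(find(('f', a)))
        elif r == '>':
            edges.setdefault(find(('f', a)), set()).add(find(('g', b)))
    @lru_cache(maxsize=None)              # step (4): longest paths
    def longest(node):
        return max((1 + longest(s) for s in edges.get(node, ())), default=0)
    f = {a: longest(find(('f', a))) for a in terminals}
    g = {a: longest(find(('g', a))) for a in terminals}
    return f, g

# The relations of Fig. 2.19 (terminals id, +, *, and $):
REL = {('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
       ('+', 'id'): '<', ('+', '+'): '>', ('+', '*'): '<', ('+', '$'): '>',
       ('*', 'id'): '<', ('*', '+'): '>', ('*', '*'): '>', ('*', '$'): '>',
       ('$', 'id'): '<', ('$', '+'): '<', ('$', '*'): '<'}
f, g = precedence_functions(['+', '*', 'id', '$'], REL)
```

The resulting f and g satisfy f(a) < g(b) exactly where a <. b held, and f(a) > g(b) exactly where a .> b held.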
possible strings that could be popped in lines (10)-(12) of Fig. 2.20. Any such string b1 b2 ... bk must have = relations holding between adjacent symbols, so that a <. b1 = b2 = ... = bk for some symbol a.
If an operator precedence table tells us that there are only a finite number of sequences of terminals related by =, then we can handle these strings on a case-by-case basis. For each such string x we can determine in advance a minimum-distance legal right side y and issue a diagnostic implying that x was found when y was intended.
It is easy to determine all strings that could be popped from the stack in steps (10)-(12) of Fig. 2.20. These are evident in the directed graph whose nodes represent the terminals, with an edge from a to b if and only if a = b. Then the possible strings are the labels of the nodes along paths in this graph. Paths consisting of a single node are possible.
However, in order for a path b1 b2 ... bk to be "poppable" on some input, there must be a symbol a (possibly $) such that a <. b1. Call such a b1 initial. Also, there must be a symbol c (possibly $) such that bk .> c. Call bk final. Only then could a reduction be called for with b1 b2 ... bk the sequence of symbols popped. If the graph has a path from an initial to a final node containing a cycle, then there is an infinity of strings that might be popped; otherwise, there are only a finite number.

Example 2.19:
Let us reconsider the grammar for arithmetic expressions:
E → E + E | E - E | E * E | E / E | E ↑ E | (E) | -E | id
The precedence matrix for this grammar was shown in Fig. 2.21; its graph is given in Fig. 2.24.

Fig. 2.24 Graph for precedence matrix of Fig. 2.21

There is only one edge, because the only pair related by = is the left and right parenthesis. All but the right parenthesis are initial, and all but the left parenthesis are final. Thus the paths from an initial to a final node are the paths +, -, *, /, id, and ↑ of length one, and the path from ( to ) of length two. There is but a finite number, and each corresponds to the terminals of some production's right side in the grammar. Thus the error checker for reductions needs to check that the proper set of nonterminal markers appears among the terminal strings being reduced. Specifically, the checker does the following:
1. If +, -, *, /, or ↑ is reduced, it checks that nonterminals appear on both sides. If not, it issues the diagnostic missing operand.
2. If id is reduced, it checks that there is no nonterminal to the right or left. If there is, it can warn missing operator.
3. If ( ) is reduced, it checks that there is a nonterminal between the parentheses. If not, it can say no expression between parentheses. Also, it must check that no nonterminal appears on either side of the parentheses. If one does, it issues the same diagnostic as in (2).
If there is an infinity of strings that may be popped, error messages cannot be tabulated on a case-by-case basis. We might use a general routine to determine whether some production right side is close (say distance 1 or 2, where distance is measured in terms of tokens, rather than characters, inserted, deleted, or changed) to the popped string and, if so, issue a specific diagnostic on the assumption that that production was intended. If no
production is close to the popped string, we can issue a general diagnostic to the effect that "something is wrong in the current line".

2.6.21 Handling Shift/Reduce Errors

We must now discuss the other way in which the operator-precedence parser detects errors. When consulting the precedence matrix to decide whether to shift or reduce (lines 8 and 9 of Fig. 2.20), we may find that no relation holds between the top stack symbol and the first input symbol. For example, suppose a and b are the two top stack symbols (b is at the top), c and d are the next two input symbols, and there is no precedence relation between b and c. To recover, we must modify the stack, the input, or both. We may change symbols, insert symbols onto the input or stack, or delete symbols from the input or stack. If we insert or change, we must be careful that we do not get into an infinite loop, where, for example, we perpetually insert symbols at the beginning of the input without being able to reduce or to shift any of the inserted symbols.
One approach that will assure us of no infinite loop is to guarantee that after recovery the current input symbol can be shifted (and, if the current input is $, to guarantee that no symbol is placed on the input and that the stack is eventually shortened). For example, given ab on the stack and cd on the input, if a <. c or a = c we might pop b from the stack. Another choice is to delete c from the input if b <. d or b = d. A third choice is to find a symbol e such that b <. e <. c (or with = in place of <.) and insert e in front of c on the input. More generally, we might insert a string of symbols e1 e2 ... en with b <. e1 <. e2 <. ... <. en <. c if a single symbol for insertion could not be found. The exact action chosen should reflect the compiler designer's intuition regarding what error is likely in each case.
For each blank entry in the precedence matrix we must specify an error-recovery routine; the same routine could be used in several places. Then, when the parser consults the entry for a and b in step (6) of Fig. 2.20 and no precedence relation holds between a and b, it finds a pointer to the error-recovery routine for this error.

Example 2.20:
Consider the precedence matrix of Fig. 2.21 again. In Fig. 2.25, we show the rows and columns of this matrix that have one or more blank entries, and we have filled in these blanks with the names of error-handling routines.

      id    (     )     $
id    e3    e3    .>    .>
(     <.    <.    =     e4
)     e3    e3    .>    .>
$     <.    <.    e2    e1

Fig. 2.25 Operator-precedence matrix with error entries

The substance of these error-handling routines is as follows:
e1: /* called when the whole expression is missing */
insert id onto the input;
issue diagnostic: "missing operand"
e2: /* called when an expression begins with a right parenthesis */
delete ) from the input;
issue diagnostic: "unbalanced right parenthesis"
e3: /* called when id or ) is followed by id or ( */
insert + onto the input;
issue diagnostic: "missing operator"
e4: /* called when an expression ends with a left parenthesis */
pop ( from the stack;
issue diagnostic: "missing right parenthesis"
Let us consider how this error-handling mechanism would treat the erroneous input id + ). The initial actions taken by the parser are to shift id, reduce it to E (we again use E for anonymous nonterminals on the stack), and then to shift the +. We now have the configuration
STACK        INPUT
$E +         )$

Since + .> ), a reduction is called for, and the handle is +. The error checker for reductions is required to inspect for E's to the left and right. Finding one missing, it issues the diagnostic missing operand and does the reduction anyway. Our configuration is now

$E           )$

There is no precedence relation between $ and ), and the entry in Fig. 2.25 for this pair of symbols is e2. Routine e2 causes the diagnostic unbalanced right parenthesis to be printed and removes the right parenthesis from the input. We are left with the final configuration for the parser:

$E           $

2.7 LR Parsing

The driver routine is the same for all LR parsers; only the parsing table changes from one parser to another. The parsing program uses a stack to store a string of the form s0 X1 s1 X2 s2 ... Xm sm, where sm is on top; each Xi is a grammar symbol and each si is a state. The parsing table consists of two parts, a parsing action function and a goto function. The program driving the LR parser behaves as follows: it determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai], the parsing action table entry for state sm and input ai, which can have one of four values:
i. shift s, where s is a state
ii. reduce by a grammar production A → β
iii. accept
iv. error
end
else if action[s, a] = reduce A → β then begin
  pop 2 * |β| symbols off the stack;
  let s' be the state now on top of the stack;
  push A then goto[s', A] on top of the stack;
  output the production A → β
end
else if action[s, a] = accept then
  return
else error()
end

Example:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id

                 ACTION                        GOTO
State    id    +     *     (     )     $     E    T    F
0        s5                s4                1    2    3
1              s6                      acc
2              r2    s7          r2    r2
3              r4    r4          r4    r4
4        s5                s4                8    2    3
5              r6    r6          r6    r6
6        s5                s4                     9    3
7        s5                s4                          10
8              s6                s11
9              r1    s7          r1    r1
10             r3    r3          r3    r3
11             r5    r5          r5    r5

Fig. 2.28 Parsing table for the expression grammar

Consider parsing the input id * id + id. The first action is a shift: id and the state symbol 5 have both been pushed onto the stack, and id has been removed from the input. Then * becomes the current input symbol, and the action of state 5 on input * is to reduce by F → id. Two symbols are popped off the stack (one state symbol and one grammar symbol). State 0 is then exposed. Since the goto of state 0 on F is 3, F and 3 are pushed onto the stack. We now have the configuration in line (3):

Line   Stack             Input       Moves
1      0                 id*id+id$   shift
2      0 id 5            *id+id$     reduce by F → id
3      0 F 3             *id+id$     reduce by T → F
4      0 T 2             *id+id$     shift
5      0 T 2 * 7         id+id$      shift
6      0 T 2 * 7 id 5    +id$        reduce by F → id
7      0 T 2 * 7 F 10    +id$        reduce by T → T * F
8      0 T 2             +id$        reduce by E → T
9      0 E 1             +id$        shift
10     0 E 1 + 6         id$         shift
11     0 E 1 + 6 id 5    $           reduce by F → id
12     0 E 1 + 6 F 3     $           reduce by T → F
13     0 E 1 + 6 T 9     $           reduce by E → E + T
14     0 E 1             $           accept
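The LR driving program just described can be sketched directly from the table of Fig. 2.28. This is an illustrative sketch: state and production numbers are those of the example above, and `lr_parse` is a hypothetical name.

```python
# Sketch of the LR driver, using the SLR table of Fig. 2.28.
GRAMMAR = {1: ('E', 3), 2: ('E', 1), 3: ('T', 3),   # production -> (LHS, |rhs|)
           4: ('T', 1), 5: ('F', 3), 6: ('F', 1)}
ACTION = {
    (0, 'id'): ('s', 5), (0, '('): ('s', 4),
    (1, '+'): ('s', 6), (1, '$'): ('acc', None),
    (2, '+'): ('r', 2), (2, '*'): ('s', 7), (2, ')'): ('r', 2), (2, '$'): ('r', 2),
    (3, '+'): ('r', 4), (3, '*'): ('r', 4), (3, ')'): ('r', 4), (3, '$'): ('r', 4),
    (4, 'id'): ('s', 5), (4, '('): ('s', 4),
    (5, '+'): ('r', 6), (5, '*'): ('r', 6), (5, ')'): ('r', 6), (5, '$'): ('r', 6),
    (6, 'id'): ('s', 5), (6, '('): ('s', 4),
    (7, 'id'): ('s', 5), (7, '('): ('s', 4),
    (8, '+'): ('s', 6), (8, ')'): ('s', 11),
    (9, '+'): ('r', 1), (9, '*'): ('s', 7), (9, ')'): ('r', 1), (9, '$'): ('r', 1),
    (10, '+'): ('r', 3), (10, '*'): ('r', 3), (10, ')'): ('r', 3), (10, '$'): ('r', 3),
    (11, '+'): ('r', 5), (11, '*'): ('r', 5), (11, ')'): ('r', 5), (11, '$'): ('r', 5),
}
GOTO = {(0, 'E'): 1, (0, 'T'): 2, (0, 'F'): 3,
        (4, 'E'): 8, (4, 'T'): 2, (4, 'F'): 3,
        (6, 'T'): 9, (6, 'F'): 3, (7, 'F'): 10}

def lr_parse(tokens):
    stack = [0]                        # state stack; grammar symbols implicit
    buf, ip, output = tokens + ['$'], 0, []
    while True:
        kind, arg = ACTION.get((stack[-1], buf[ip]), ('err', None))
        if kind == 's':                # shift: push state, advance input
            stack.append(arg)
            ip += 1
        elif kind == 'r':              # reduce: pop |rhs| states, push goto
            lhs, n = GRAMMAR[arg]
            del stack[len(stack) - n:]
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(arg)         # emit the production number used
        elif kind == 'acc':
            return output
        else:
            raise SyntaxError(f'unexpected {buf[ip]!r}')
```

On id * id + id this emits the production numbers 6, 4, 6, 3, 2, 6, 4, 1, i.e., exactly the reductions of lines (2)-(13) in the trace above.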
An LR parser does not have to scan the entire stack to know when the handle appears on top. Rather, the state symbol on top of the stack contains all the information it needs. It is a remarkable fact that if it is possible to recognize a handle knowing only the grammar symbols on the stack, then there is a finite automaton that can, by reading the grammar symbols on the stack from top to bottom, determine what handle, if any, is on top of the stack. The goto function of an LR parsing table is essentially such a finite automaton. The automaton need not, however, read the stack on every move. The state symbol stored on top of the stack is the state the handle-recognizing finite automaton would be in if it had read the grammar symbols of the stack from bottom to top. Thus, the LR parser can determine from the state on top of the stack everything that it needs to know about what is in the stack.
Another source of information that an LR parser can use to help make its shift-reduce decisions is the next k input symbols. The cases k = 0 and k = 1 are of practical interest, and we shall only consider LR parsers with k ≤ 1 here. For example, the action table in Fig. 2.28 uses one symbol of lookahead. A grammar that can be parsed by an LR parser examining up to k input symbols on each move is called an LR(k) grammar.

                 Action                        Goto
State    id    +     *     (     )     $     E    T    F
0        s5                s4                1    2    3
1              s6                      acc
2              r2    s7          r2    r2
3              r4    r4          r4    r4
4        s5                s4                8    2    3
5              r6    r6          r6    r6
6        s5                s4                     9    3
7        s5                s4                          10
8              s6                s11
9              r1    s7          r1    r1
10             r3    r3          r3    r3
11             r5    r5          r5    r5

Fig 2.29 Parsing table for expression grammar

There is a significant difference between LL and LR grammars. For a grammar to be LR(k), we must be able to recognize the occurrence of the right side of a production, having seen all of what is derived from that right side, with k input symbols of lookahead. This requirement is far less stringent than that for LL(k) grammars, where we must be able to recognize the use of a production seeing only the first k symbols of what its right side derives. Thus, LR grammars can describe more languages than LL grammars.

2.8 SLR Parsing

"Simple LR" or SLR is a method to construct an LR parsing table from a grammar. In the SLR method, one symbol of lookahead information is used, but we usually omit the "(1)" after SLR.

2.8.1 LR(0) Items

An LR(0) item (item for short) of a grammar G is a production of G with a dot at some position of the right side. Thus production A → XYZ yields the four items
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
The production A → ε generates only one item, A → .
The first item above indicates that we hope to see a string derivable from XYZ next on
the input. The second item indicates that we have just seen on the input a string derivable from X and that we hope next to see a string derivable from YZ. The central idea in the SLR method is to first construct from the grammar a deterministic finite automaton to recognize viable prefixes. One collection of sets of LR(0) items, which we call the canonical LR(0) collection, provides the basis for constructing SLR parsers. To construct the canonical LR(0) collection for a grammar, we define an augmented grammar and two functions, closure and goto.

2.8.2 Augmented Grammar

If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and production S' → S. The purpose of this new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input. That is, acceptance occurs when and only when the parser is about to reduce by S' → S.

2.8.3 The Closure Operation

If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production, then add the item B → .γ to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I).

function closure(I);
begin
  J := I;
  repeat
    for each item A → α.Bβ in J and each production B → γ of G
      such that B → .γ is not in J do
        add B → .γ to J
  until no more items can be added to J;
  return J
end

Example:
Consider the augmented expression grammar
E' → E
E → E + T | T
T → T * F | F
F → (E) | id
If I is the set of the one item {[E' → .E]}, then closure(I) contains the items
E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id
Here, E' → .E is put in closure(I) by rule (1). Since there is an E immediately to the right of a dot, by rule (2) we add the E-productions with dots at the left end, that is, E → .E + T and E → .T. Now there is a T immediately to the right of a dot, so we add T → .T * F and T → .F. Next, the F to the right of a dot forces F → .(E) and F → .id to be added. No other item is put into closure(I) by rule (2).

2.8.4 The Goto Operation

The second useful function is goto(I, X), where I is a set of items and X is a grammar symbol. goto(I, X) is defined to be the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I. Intuitively, if I is the set of items that are valid for some viable prefix γ, then goto(I, X) is the set of items that are valid for the viable prefix γX.

Example:
If I is the set of two items {[E' → E.], [E → E. + T]}, then goto(I, +) consists of
E → E + .T
T → .T * F
T → .F
F → .(E)
F → .id
We computed goto(I, +) by examining I for items with + immediately to the right of the
dot. E’ →E is not such an item, but E → E + T T → .F
is. We moved the dot over the + to get {E → F → (.E)
E + T} and then took the closure of this set. F → .id
I5: F → id.
2.8.5 The Set LR(0) items Construction I6: E → E. + T
T → .T * F
The algorithm to construct C, the canonical T → .F
collection of sets of LR(0) items for an F → (.E)
augmented grammar G’. The algorithm is F → .id
shown below.
Procedure items(G’): I7: T → T * F.
begin F → (.E)
C: {closure ({[S’ → S]})}. F → .id
repeat
for each set of items I in C and each I7: T → T * F.
grammar symbol X
such that goto(I, X) is not empty and not in I8: F → (E.)
C do E → E + .T
add goto(I, X) to C I9: E → .E + T.
until no more sets of items can be added to T → T. * T
C
end I10: T →T*F
I11: F → (E).
Example
Consider the augmented expression The goto function for this set of items is
grammar shown of items is shown as the transition
E’ →E diagram of a deterministic finite
E →E + T | T automation in the following figure
T →T * F | F
F → (E) | id
The canonical collection of sets of LR (0)
items for this grammar is shown below.
I0: E ' .E
E .E T
T .T * F
T .F
F .(E)
F .id
I1: E ' E.
E E. T
I2: E T.
T T.*F
I3: T F.
I4: F (E.)
E → .E + T
E → .T Fig. 2.30 Transition diagram of DFA D
T → .T * F for viable prefixes
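The closure, goto, and sets-of-items constructions above can be sketched compactly. This is an illustrative sketch for the augmented expression grammar: an item is represented as a (lhs, rhs, dot-position) triple, and the function names are hypothetical.

```python
# Sketch: LR(0) closure, goto, and the canonical collection for the
# augmented expression grammar. An item is (lhs, rhs, dot-position).
GRAMMAR = [
    ("E'", ('E',)),
    ('E', ('E', '+', 'T')), ('E', ('T',)),
    ('T', ('T', '*', 'F')), ('T', ('F',)),
    ('F', ('(', 'E', ')')), ('F', ('id',)),
]

def closure(items):
    items = set(items)
    while True:
        new = {(lhs, rhs, 0)
               for (l, r, d) in items if d < len(r)       # dot not at end
               for (lhs, rhs) in GRAMMAR if lhs == r[d]}  # B right of dot
        if new <= items:
            return items
        items |= new

def goto(items, X):
    # move the dot over X, then close the resulting kernel
    return closure({(l, r, d + 1) for (l, r, d) in items
                    if d < len(r) and r[d] == X})

def canonical_collection():
    c = [closure({("E'", ('E',), 0)})]                    # I0
    while True:
        added = False
        for I in list(c):
            for X in {r[d] for (l, r, d) in I if d < len(r)}:
                g = goto(I, X)
                if g and g not in c:
                    c.append(g)
                    added = True
        if not added:
            return c
```

The closure of {[E' → .E]} contains the seven items listed earlier, and the collection contains the twelve sets I0-I11.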
Explanation for point (3) above:
State I1 contains the reduce item E' → E. together with the shift item E → E. + T. But this is not an SR conflict, because E' → E is the dummy (accepting) production, so I1 is not an inadequate state. (The handling of E' → E. also shows the power of LR(0).)

             OK                                    NOT OK
[LR(0)]      No SR and no RR conflicts:            SR (or) RR conflicts present:
             it is LR(0).                          it is not LR(0), (or) the given
                                                   grammar is ambiguous.
[SLR(1)]     No SR (or) RR conflicts and no        The given grammar is ambiguous,
             multiple entries in the parsing       or SR or RR conflicts are present.
             table: it is SLR(1).
[CLR(1)      LR(0) + lookahead, i.e., LR(0)        shown below
 or LR(1)]   items with one symbol of lookahead.
[LALR(1)]                                          Lookaheads differ on merging:
                                                   it is not LALR.
[LL(1)]      It is free of left recursion and      An ambiguous grammar is
             left factoring; it is unambiguous.    not LL(1).
[LL(2)]      C-language

2.8.6 Construction of SLR Parsing Table

Now we shall study how to construct the SLR parsing action and goto functions from the deterministic finite automaton that recognizes viable prefixes. Given a grammar G, we augment G to produce G', and from G' we construct C, the canonical collection of sets of items for G'. We construct action, the parsing action function, and goto, the goto function, from C using the following algorithm.

Algorithm: Constructing an SLR parsing table.
Input: An augmented grammar G'.
Output: The SLR parsing table functions action and goto for G'.
Method:
i. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G'.
ii. State i is constructed from Ii. The parsing actions for state i are determined as follows:
a) If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a terminal.
b) If [A → α.] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A); here A may not be S'.
c) If [S' → S.] is in Ii, then set action[i, $] to "accept".
If any conflicting actions are generated by the above rules, we say that the grammar is not SLR(1). The algorithm fails to produce a parser in this case.
iii. The goto transitions for state i are constructed for all nonterminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
iv. All entries not defined by rules (ii) and (iii) are made "error".
v. The initial state of the parser is the one constructed from the set of items containing [S' → .S].

Example:
Let us construct the SLR table for the following grammar:
E' → E
E → E + T | T
T → T * F | F
F → (E) | id
The canonical collection of sets of LR(0) items for this grammar was already given above. First consider the set of items I0:
E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id
The item F → .(E) gives rise to the entry action[0, (] = shift 4, and the item F → .id to the entry action[0, id] = shift 5. Other items in I0 yield no actions. Now consider I1:
E' → E.
E → E. + T
The first item yields action[1, $] = accept; the second yields action[1, +] = shift 6. Next consider I2:
E → T.
T → T. * F
Since FOLLOW(E) = {$, +, )}, the first item makes action[2, $] = action[2, +] = action[2, )] = reduce E → T. The second item makes action[2, *] = shift 7. Continuing in this fashion, we obtain the parsing action and goto tables that were shown in Fig. 2.28.

2.8.7 Canonical LR Parsing

First, a brief recap:
SLR = "Simple LR" is the weakest, but easiest to implement, LR method.
An SLR grammar is one for which an SLR parser can be constructed.
An LR(0) item is a grammar production with a dot somewhere in the RHS; e.g., for A → XYZ we have
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
Augment the grammar with a new production S' → S, where S is the start symbol (S' serves a role similar to $).
The initial item is S' → .S.
Compute the closure and goto functions.
Closure: given a set of items I,
1. all items in I are initially in closure(I);
2. if A → α.Bβ is in closure(I) and B → γ is a production, add B → .γ to closure(I), if not already there;
3. repeat until no new items are generated.

Example:
E → E + T | T
T → T * F | F
F → (E) | id
Augmented:
E' → E
E → E + T | T
T → T * F | F
F → (E) | id

2.8.9 Building the Canonical LR(0) Collection: Closure

Initial closure(I): {[E' → .E]}
[E' → .E] matches A → α.Bβ (with α = β = ε), so we add E → .E + T and E → .T.
New closure(I): {[E' → .E], [E → .E + T], [E → .T]}
Final closure(I): {[E' → .E], [E → .E + T], [E → .T], [T → .T * F], [T → .F], [F → .(E)], [F → .id]}
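The reduce entries of the SLR construction rely on FOLLOW sets (e.g., FOLLOW(E) = {$, +, )} in the example above). A fixpoint computation can be sketched as follows; this is an illustrative sketch, and it exploits the fact that this grammar has no ε-productions, which keeps the rules simple.

```python
# Sketch: FIRST and FOLLOW for the augmented expression grammar
# (no ε-productions, so FIRST of a string is FIRST of its first symbol).
GRAMMAR = [("E'", ('E',)),
           ('E', ('E', '+', 'T')), ('E', ('T',)),
           ('T', ('T', '*', 'F')), ('T', ('F',)),
           ('F', ('(', 'E', ')')), ('F', ('id',))]
NONTERM = {"E'", 'E', 'T', 'F'}

def compute_first():
    first = {x: set() for x in NONTERM}
    sym_first = lambda s: first[s] if s in NONTERM else {s}
    changed = True
    while changed:                 # iterate to a fixpoint
        changed = False
        for lhs, rhs in GRAMMAR:
            add = sym_first(rhs[0]) - first[lhs]
            if add:
                first[lhs] |= add
                changed = True
    return first

def compute_follow():
    first = compute_first()
    sym_first = lambda s: first[s] if s in NONTERM else {s}
    follow = {x: set() for x in NONTERM}
    follow["E'"].add('$')          # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, X in enumerate(rhs):
                if X not in NONTERM:
                    continue
                # symbol after X contributes FIRST; at the end, FOLLOW(lhs)
                add = (sym_first(rhs[i + 1]) if i + 1 < len(rhs)
                       else follow[lhs]) - follow[X]
                if add:
                    follow[X] |= add
                    changed = True
    return follow
```

Here compute_follow()['E'] is {'$', '+', ')'}, matching the reduce entries for E → T in state 2.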
Example: let I = {[E' → E.], [E → E. + T]}. The item E → E. + T matches A → α.Xβ with X = +, so add [E → E + .T] to goto(I, +):
goto(I, +) = {[E → E + .T]}
Then add the closure of [E → E + .T]:
goto(I, +) = {[E → E + .T], [T → .T * F], [T → .F], [F → .(E)], [F → .id]}

2.8.11 Construction of the Sets of LR(1) Items

(The following sets are LR(1) items for the grammar S' → S, S → CC, C → cC | d; each item carries a lookahead symbol.)

I1: S' → S., $

I2: S → C.C, $
    C → .cC, $
    C → .d, $

I3: C → c.C, c | d
    C → .cC, c | d
    C → .d, c | d

I4: C → d., c | d
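LR(1) closure differs from LR(0) closure only in propagating lookaheads: for an item [A → α.Bβ, a] we add [B → .γ, b] for each b in FIRST(βa). A sketch (with hypothetical function names) for the grammar S' → S, S → CC, C → cC | d:

```python
# Sketch: LR(1) closure for S' -> S, S -> CC, C -> cC | d.
# An item is (lhs, rhs, dot-position, lookahead).
GRAMMAR = [("S'", ('S',)), ('S', ('C', 'C')), ('C', ('c', 'C')), ('C', ('d',))]
NONTERM = {"S'", 'S', 'C'}

def first(sym):
    # FIRST of a single symbol; this grammar has no ε-productions
    if sym not in NONTERM:
        return {sym}
    return {t for lhs, rhs in GRAMMAR
              if lhs == sym and rhs[0] != sym
              for t in first(rhs[0])}

def closure(items):
    items = set(items)
    while True:
        new = set()
        for (l, r, d, a) in items:
            if d < len(r) and r[d] in NONTERM:
                beta = r[d + 1:]
                # FIRST(beta a): FIRST of beta's head, or {a} if beta empty
                looks = first(beta[0]) if beta else {a}
                for lhs, rhs in GRAMMAR:
                    if lhs == r[d]:
                        new |= {(lhs, rhs, 0, b) for b in looks}
        if new <= items:
            return items
        items |= new

I0 = closure({("S'", ('S',), 0, '$')})
```

I0 contains [C → .d, c] and [C → .d, d] but not [C → .d, $]; the $-lookahead versions of the C-items arise only after the first C has been seen, as in set I2 above.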
2.9 LALR Parsing

Most LR-parser generators use an extended version of the SLR construction called LALR(1). The "LA" in the abbreviation is short for "lookahead", and the (1) indicates that the lookahead is one symbol, i.e., the next input symbol.
If we compare the SLR(1) parser with CLR(1), we find that the CLR(1) parser is more powerful. But CLR(1) has a greater number of states than the SLR(1) parser; hence its storage requirement is also greater than the SLR(1) parser's.
We can devise a parser that is intermediate between the two; that is, the parser's power will be in between that of SLR(1) and CLR(1), and its storage requirement will be the same as SLR(1)'s. Such a parser, LALR(1), is more useful: since each of its states corresponds to a set of LR(1) items, the information about the lookahead is available in the state itself, making it more powerful than the SLR parser.

2.9.1 Construction of LALR Parsing Table

The steps for constructing an LALR parsing table are given below.
1. Obtain the canonical collection of sets of LR(1) items.
2. If more than one set of LR(1) items exists in the canonical collection having identical cores of LR(0) items but different lookaheads, combine these sets of LR(1) items to obtain a reduced collection, C', of sets of LR(1) items.
3. Construct the parsing table by using this reduced collection.

There are two ways to construct LALR(1) parsing tables. The first (and certainly more obvious) way is to construct the LR(1) table and merge the sets manually. This is sometimes referred to as the "brute force" way. If you don't mind first finding all the multitude of states required by the canonical parser, compressing the LR table into the LALR version is straightforward.
1. Construct all canonical LR(1) states.
2. Merge those states that are identical if the lookaheads are ignored; i.e., two states being merged must have the same number of items and the items must have the same core (the same productions, differing only in lookahead). The lookahead on merged items is the union of the lookaheads from the states being merged.
3. The successor function for the new LALR(1) state is the union of the successors of the merged states. If two configurations have the same core, then the original successors must have the same core as well, and thus the new state has the same successors.
4. The action and goto entries are constructed from the LALR(1) states as for the canonical LR(1) parser.

2.9.2 LALR Table Construction

S' → S
S → XX
X → aX
X → b
Looking at the configuration sets, we saw that states 3 and 6 can be merged, and so can 4 and 7, and 8 and 9. Now we build the LALR(1) table with the remaining seven states:

State on          Action              Goto
top of stack   a      b      $      S     X
0              s36    s47           1     2
1                            acc
2              s36    s47                 5
36             s36    s47                 89
47             r3     r3     r3
5                            r1
89             r2     r2     r2

2.9.3 The More Clever Approach

Having to compute the LR(1) configuration sets first means we won't save any time or
effort in building an LALR parser. However, the work wasn't all for naught: when the parser is executing, it can work with the compressed table, thereby saving memory. The difference can be an order of magnitude in the number of states.
There is, however, a more efficient strategy for building the LALR(1) states, called step-by-step merging. The idea is that you merge the configuration sets as you go, rather than waiting until the end to find the identical ones. Sets of states are constructed as in the LR(1) method, but at each point where a new set is spawned, you first check whether it may be merged with an existing set. This means examining the other states to see if one with the same core already exists. If so, you merge the new set with the existing one; otherwise you add it normally.
Here is an example of this method in action:
S' → S
S → V = E
E → F | E + F
F → V | int | (E)
V → id
Start building the LR(1) collection of configuration sets as you would normally:
I0: S' → •S, $
    S → •V = E, $
    V → •id, =
I1: S' → S•, $
I2: S → V• = E, $
I3: V → id•, =
I4: S → V = •E, $
    E → •F, $/+
    E → •E + F, $/+
    F → •V, $/+
    F → •int, $/+
    F → •(E), $/+
    V → •id, $/+
I5: S → V = E•, $
    E → E• + F, $/+
I6: E → F•, $/+
I7: F → V•, $/+
I8: F → int•, $/+
I9: F → (•E), $/+
    E → •F, )/+
    E → •E + F, )/+
    F → •V, )/+
    F → •int, )/+
    F → •(E), )/+
    V → •id, )/+
I10: F → (E•), $/+
    E → E• + F, )/+
When we construct state I11, we get something we've seen before:
I11: E → F•, )/+
It has the same core as I6, so rather than add a new state, we merge it with that one to get:
I611: E → F•, $/+/)
We have a similar situation at state I12, which can be merged with state I7. The algorithm continues like this, merging into existing states where possible and adding new states only when necessary. When we finish creating the sets, we construct the table just as in LR(1).
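The core-merging step used by both constructions can be sketched in Python. The item representation here is illustrative: an LR(1) item is a (production, dot, lookahead) tuple, and states with the same core are combined with their lookaheads unioned:

```python
# Sketch of LALR(1) state merging: states whose items agree when lookaheads
# are ignored (the "core") are merged, and lookaheads are unioned per item.

def merge_states(lr1_states):
    """lr1_states: list of sets of (prod, dot, lookahead) tuples."""
    merged = {}  # core -> dict mapping (prod, dot) -> set of lookaheads
    for state in lr1_states:
        core = frozenset((prod, dot) for prod, dot, _ in state)
        bucket = merged.setdefault(core, {})
        for prod, dot, la in state:
            bucket.setdefault((prod, dot), set()).add(la)
    # Rebuild each merged state as a set of items with unioned lookaheads
    return [{(prod, dot, la) for (prod, dot), las in bucket.items() for la in las}
            for bucket in merged.values()]

# Example: two X -> c.X states that differ only in lookaheads
s1 = {(("X", "cX"), 1, "c"), (("X", "cX"), 1, "d")}
s2 = {(("X", "cX"), 1, "$")}
print(len(merge_states([s1, s2])))  # 1: the two states share a core
```

The merged state carries the lookahead set {c, d, $}, exactly as in the I611 example above.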
parser and it is conflict-free, it implies the grammar is LALR(1), and vice-versa.
LALR(1) is a subset of LR(1) and a superset of SLR(1). A grammar that is not LR(1) is definitely not LALR(1), since whatever conflict occurred in the original LR(1) parser will still be present in the LALR(1) parser. A grammar that is LR(1) may or may not be LALR(1), depending on whether merging introduces conflicts. A grammar that is SLR(1) is definitely LALR(1). A grammar that is not SLR(1) may or may not be LALR(1), depending on whether the more precise lookaheads resolve the SLR(1) conflicts.
LALR(1) has proven to be the most used variant of the LR family. The weakness of the SLR(1) and LR(0) parsers means they are only capable of handling a small set of grammars. The expansive memory needs of LR(1) caused it to languish for several years as a theoretically interesting but intractable approach. It was the advent of LALR(1) that offered a good balance between the power of the specific lookaheads and table size. The popular tools yacc and bison generate LALR(1) parsers, and most programming language constructs can be described with an LALR(1) grammar (perhaps with a little grammar massaging or parser trickery to skirt some isolated issues).

2.10 Error Handling

As in LL(1) parsing tables, we can implement error processing for any of the variations of LR parsing by placing appropriate actions in the parse table. Here is a parse table for a simple arithmetic expression grammar with error actions inserted into what would have been the blank entries in the table.
E → E + E | E * E | (E) | id

State on     id    +     *     (     )     $     E
top of
stack
0            s3    e1    e1    s2    e2    e1    1
1            e3    s4    s5    e3    e2    acc
2            s3    e1    e1    s2    e2    e1    6
3            e3    r4    r4    e3    r4    r4
4            s3    e1    e1    s2    e2    e1    7
5            s3    e1    e1    s2    e2    e1    8
6            e3    s4    s5    e3    s9    e4
7            e3    r1    s5    e3    r1    r1
8            e3    r2    r2    e3    r2    r2
9            e3    r3    r3    e3    r3    r3

Error e1 is called from states 0, 2, 4, 5 when we encounter an operator. All of these states expect to see the beginning of an expression, i.e., an id or a left parenthesis. One way to fix this is for the parser to act as though id had been seen in the input and shift state 3 onto the stack (the successor for id in these states), effectively faking that the necessary token was found. The error message printed might be something like "missing operand".
Error e2 is called from states 0, 1, 2, 4, 5 on finding a right parenthesis where we were expecting either the beginning of a new expression (or, potentially, the end of input for state 1). A possible fix: remove the right parenthesis from the input and discard it. The message printed could be "unbalanced right parenthesis".

2.10.1 The parser generator Yacc

First, a file, say translate.y, containing a Yacc specification of the translator is prepared. The UNIX system command yacc translate.y transforms the file translate.y into a C program called y.tab.c using the LALR method outlined above. The program y.tab.c is a representation of an LALR parser written in C, along with other C routines that the user may have prepared. By compiling y.tab.c along with the ly library
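The e1 and e2 recovery actions described in §2.10 can be sketched as follows. The routine signatures are illustrative, not part of any real parser generator:

```python
# Hedged sketch of the two error routines from the table in section 2.10:
# e1 fakes a missing id and shifts its successor state 3; e2 discards an
# unbalanced ')'. A real driver would call these from the error entries.

def e1(stack, pos):
    """Called where an expression was expected but an operator appeared."""
    print("error: missing operand")
    stack.append(3)      # pretend an id was shifted (state 3 is id's successor)
    return pos           # input position unchanged: the id was only faked

def e2(stack, pos):
    """Called on a ')' that cannot start or continue an expression."""
    print("error: unbalanced right parenthesis")
    return pos + 1       # skip the offending token and resume

stack = [0]
pos = e1(stack, 0)       # stack becomes [0, 3], pos stays 0
pos = e2(stack, pos)     # pos advances past the stray ')'
```

Both routines patch the configuration just enough for parsing to continue, which is the essence of table-driven error recovery.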
10: F → id        Predict(10) = { id }

         +    -    *    /    (    )    id    $
    E                        1         1
    Q    2    3                   4          4
    T                        5         5
    R    8    8    6    7         8          8
    F                        9         10
B → bC | bB
C → Bx | Cy | z
Solution.
The grammar is not LL(1): both B-productions begin with b, so the LL(1) table cell for B on lookahead b holds more than one production (and C → Cy is left recursive).

8) A → BCD
   B → W | BX
   C → YCZ | mB
   D → DB | a
Solution.
It is not LL(1): B → BX (and D → DB) is left recursive.

Check whether the following grammar is LR(0):
S' → S#
S → dA | aB
A → bA | C
Solution.
The item S' → .S# in the initial item set is not counted as a shift-reduce conflict, since S' → S# is the augmented production.
Since no SR or RR conflicts arise in the construction above, the grammar is LR(0).

2.10.2 Construction of the LR parsing table

Algorithm: Construction of the canonical LR parsing table.
Input: An augmented grammar G'.
Output: The canonical LR parsing table functions action and goto for G'.
Method:
i) Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'.
ii) State i of the parser is constructed from Ii. The parsing actions for state i are determined as follows:
   a) If [A → α.aβ, b] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here, a is required to be a terminal.
   b) If [A → α., a] is in Ii and A ≠ S', then set action[i, a] to "reduce A → α".
   c) If [S' → S., $] is in Ii, then set action[i, $] to "accept".
   If a conflict results from the above rules, the grammar is said not to be LR(1), and the algorithm is said to fail.
iii) The goto transitions for state i are determined as follows: if goto(Ii, A) = Ij, then goto[i, A] = j.
iv) All entries not defined by rules (ii) and (iii) are made "error".
v) The initial state of the parser is the one constructed from the set containing the item [S' → .S, $].

Example: Consider the following augmented grammar:
S' → S
S → CC
C → cC | d
The canonical LR parsing table for this grammar is shown below (productions numbered 1: S → CC, 2: C → cC, 3: C → d):

State       Action           Goto
         c     d      $     S    C
0        s3    s4           1    2
1                     acc
2        s6    s7                5
3        s3    s4                8
4        r3    r3
5                     r1
6        s6    s7                9
7                     r3
8        r2    r2
9                     r2
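A table-driven parse with this table can be sketched in Python; the encoding of the action and goto functions as dictionaries is illustrative:

```python
# Table-driven LR parse using the canonical table above.
# Productions: 1: S -> CC, 2: C -> cC, 3: C -> d.

ACTION = {
    (0, 'c'): ('s', 3), (0, 'd'): ('s', 4),
    (1, '$'): ('acc',),
    (2, 'c'): ('s', 6), (2, 'd'): ('s', 7),
    (3, 'c'): ('s', 3), (3, 'd'): ('s', 4),
    (4, 'c'): ('r', 3), (4, 'd'): ('r', 3),
    (5, '$'): ('r', 1),
    (6, 'c'): ('s', 6), (6, 'd'): ('s', 7),
    (7, '$'): ('r', 3),
    (8, 'c'): ('r', 2), (8, 'd'): ('r', 2),
    (9, '$'): ('r', 2),
}
GOTO = {(0, 'S'): 1, (0, 'C'): 2, (2, 'C'): 5, (3, 'C'): 8, (6, 'C'): 9}
PRODS = {1: ('S', 2), 2: ('C', 2), 3: ('C', 1)}  # lhs, rhs length

def lr_parse(tokens):
    stack, pos = [0], 0
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            return False                      # undefined entry: error
        if act[0] == 'acc':
            return True
        if act[0] == 's':                     # shift: push successor state
            stack.append(act[1]); pos += 1
        else:                                 # reduce: pop rhs, push goto state
            lhs, n = PRODS[act[1]]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])

print(lr_parse(list("cdd") + ['$']))  # True
print(lr_parse(list("cdc") + ['$']))  # False
```

The string cdd (one C derived as cd, one as d) is accepted; cdc is rejected at the undefined entry for state 6 on $.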
GATE QUESTIONS
Q.1 Which of the following statements is false?
a) An unambiguous grammar has same leftmost & rightmost derivation
b) An LL(1) parser is a top-down parser
c) LALR is more powerful than SLR
d) An ambiguous grammar can never be LR(k) for any k
[GATE-2001]

Q.2 Which of the following suffices to convert an arbitrary CFG to an LL(1) grammar?
a) Removing left recursion alone
b) Factoring the grammar alone
c) Removing left recursion and factoring the grammar
d) None of the above
[GATE-2003]

Q.3 Assume that the SLR parser for a grammar G has n1 states and the LALR parser for G has n2 states. The relationship between n1 and n2 is
a) n1 is necessarily less than n2
b) n1 is necessarily equal to n2
c) n1 is necessarily greater than n2
d) None of these
[GATE-2003]

Q.4 Consider the grammar shown below
S → iEtSS' | a
S' → eS | ε
E → b
In the predictive parser table M of this grammar, the entries M[S', e] and M[S', $] respectively are
a) {S' → eS} and {S' → ε}
b) {S' → eS} and { }
c) {S' → ε} and {S' → ε}
d) {S' → eS, S' → ε} and {S' → ε}
[GATE-2003]

Q.5 Consider the grammar shown below:
S → CC
C → cC | d
This grammar is
a) LL(1)
b) SLR(1) but not LL(1)
c) LALR(1) but not SLR(1)
d) LR(1) but not LALR(1)
[GATE-2003]

Q.6 Which of the following grammar rules violate the requirements of an operator grammar? P, Q, R are non-terminals, and r, s, t are terminals.
1. P → QR          2. P → QsR
3. P → ε           4. P → QtRr
a) 1 only          b) 1 and 3
c) 2 and 3         d) 3 and 4
[GATE-2004]

Q.7 The grammar A → AA | (A) | ε is not suitable for predictive parsing because the grammar is
a) ambiguous
b) left-recursive
c) right-recursive
d) an operator grammar
[GATE-2005]

Q.8 Consider the following grammar.
S → S * E
S → E
E → F + E
E → F
F → id
Consider the following LR(0) items corresponding to the grammar above.
1. S → S * .E
2. E → F. + E
3. E → F + .E
Given the items above, which two of them will appear in the same set in the canonical sets-of-items for the grammar?
a) 1 and 2         b) 2 and 3
c) 1 and 3         d) None of these
[GATE-2006]
Q.9 Consider the following grammar
S → FR
R → *S | ε
F → id
In the predictive parser table M of the grammar, the entries M[S, id] and M[R, $] are respectively
a) {S → FR} and {R → ε}
b) {S → FR} and { }
c) {S → FR} and {R → *S}
d) {F → id} and {R → ε}
[GATE-2006]

Statements for Linked Answer Questions 10 and 11

Q.10 Which one of the following grammars generates the language L = {a^i b^j | i ≠ j}?
a) S → AC | CB
   C → aCb | a | b
   A → aA | ε
   B → Bb | ε
b) S → aS | Sb | a | b
c) S → AC | CB
   C → aCb | ε
   A → aA | ε
   B → Bb | ε
d) S → AC | CB
   C → aCb | ε
   A → aA | a
   B → bB | b
[GATE-2006]

Q.11 In the correct grammar above, what is the length of the derivation (number of steps starting from S) to generate the string a^l b^m with l ≠ m?
a) max(l, m) + 2       b) l + m + 2
c) l + m + 3           d) max(l, m) + 3
[GATE-2006]

Q.12 Which one of the following is a top-down parser?
a) Recursive descent parser
b) Operator precedence parser
c) An LR(k) parser
d) An LALR(k) parser
[GATE-2007]

Q.13 Consider the grammar with non-terminals N = {S, C, S1}, terminals T = {a, b, i, t, e}, with S as the start symbol and the following set of rules:
S → iCtSS1 | a
S1 → eS | ε
C → b
The grammar is not LL(1) because
a) it is left recursive
b) it is right recursive
c) it is ambiguous
d) it is not context-free
[GATE-2007]

Q.14 Consider the following two statements:
P: Every regular grammar is LL(1).
Q: Every regular set has an LR(1) grammar.
Which of the following is true?
a) Both P and Q are true
b) P is true and Q is false
c) P is false and Q is true
d) Both P and Q are false
[GATE-2007]

Statements for Linked Answer Questions 15 and 16

Consider the CFG with {S, A, B} as the non-terminal alphabet, {a, b} as the terminal alphabet, S as the start symbol and the following set of production rules:
S → aB        S → bA
B → b         A → a
B → bS        A → aS
B → aBB       A → bAA

Q.15 Which of the following strings is generated by the grammar?
a) aaaabb          b) aabbbb
c) aabbab          d) abbbba
[GATE-2007]

Q.16 For the correct answer string to Q.15, how many derivation trees are there?
a) 1               b) 2
c) 3               d) 4
[GATE-2007]

Q.17 Which of the following describes a handle (as applicable to LR-parsing) appropriately?
a) It is the position in a sentential form where the next shift or reduce operation will occur
b) It is a non-terminal whose production will be used for reduction in the next step
c) It is a production that may be used for reduction in a future step along with a position in the sentential form where the next shift or reduce operation will occur
d) It is the production that will be used for reduction in the next step along with a position in the sentential form where the right hand side of the production may be found
[GATE-2008]

Q.18 An LALR(1) parser for a grammar G can have shift-reduce (S-R) conflicts if and only if
a) the SLR(1) parser for G has S-R conflicts
b) the LR(1) parser for G has S-R conflicts
c) the LR(0) parser for G has S-R conflicts
d) the LALR(1) parser for G has reduce-reduce conflicts
[GATE-2008]

Q.19 The grammar S → aSa | bS | c is
a) LL(1) but not LR(1)
b) LR(1) but not LL(1)
c) Both LL(1) and LR(1)
d) Neither LL(1) nor LR(1)
[GATE-2010]

Statements for Linked Answer Questions 20 and 21

For the grammar below, a partial LL(1) parsing table is also presented along with the grammar. Entries that need to be filled are indicated as E1, E2 and E3. ε is the empty string, $ indicates end of input, and | separates alternate right hand sides of productions.
S → aAbB | bAaB | ε
A → S
B → S

        a          b          $
S       E1         E2         S → ε
A       A → S      A → S      error
B       B → S      B → S      E3

Q.20 The FIRST and FOLLOW sets for the non-terminals A and B are
a) FIRST(A) = {a, b, ε} = FIRST(B)
   FOLLOW(A) = {a, b}
   FOLLOW(B) = {a, b, $}
b) FIRST(A) = {a, b, $}
   FIRST(B) = {a, b, ε}
   FOLLOW(A) = {a, b}
   FOLLOW(B) = {$}
c) FIRST(A) = {a, b, ε} = FIRST(B)
   FOLLOW(A) = {a, b}
   FOLLOW(B) = ∅
d) FIRST(A) = {a, b} = FIRST(B)
   FOLLOW(A) = {a, b}
   FOLLOW(B) = {a, b}
[GATE-2012]

Q.21 The appropriate entries for E1, E2 and E3 are
a) E1: S → aAbB, A → S
   E2: S → bAaB, B → S
   E3: B → S
b) E1: S → aAbB, S → ε
   E2: S → bAaB, S → ε
   E3: S → ε
c) E1: S → aAbB, S → ε
   E2: S → bAaB, S → ε
   E3: B → S
d) E1: A → S, S → ε
   E2: B → S, S → ε
   E3: B → S
[GATE-2012]

Q.22 What is the maximum number of reduce moves that can be taken by a bottom-up parser for a grammar with no epsilon- and unit-productions (i.e., of type A → ε and A → a) to parse a string with n tokens?
a) n/2             b) n - 1
c) 2n - 1          d) 2^n
[GATE-2013]
Q.23 Consider the following two sets of LR(1) items of an LR(1) grammar.
X → c.X, c/d      X → c.X, $
X → .cX, c/d      X → .cX, $
X → .d, c/d       X → .d, $
Which of the following statements related to merging of the two sets in the corresponding LALR parser is/are false?
1. Cannot be merged since lookaheads are different.
2. Can be merged but will result in S-R conflict.
3. Can be merged but will result in R-R conflict.
4. Cannot be merged since goto on c will lead to two different sets.
a) Only 1          b) Only 2
c) 1 and 4         d) 1, 2, 3 and 4
[GATE-2013]

Q.24 A canonical set of items is given below
S → L. > R
Q → R.
On input symbol < the set has
a) a shift-reduce conflict and a reduce-reduce conflict
b) a shift-reduce conflict but not a reduce-reduce conflict
c) a reduce-reduce conflict but not a shift-reduce conflict
d) neither a shift-reduce nor a reduce-reduce conflict
[GATE-2014]

Q.25 Consider the grammar defined by the following production rules, with two operators * and +
S → T * P
T → U | T * U
P → Q + P | Q
Q → Id
U → Id
Which one of the following is TRUE?
a) + is left associative, while * is right associative
b) + is right associative, while * is left associative
c) Both + and * are right associative
d) Both + and * are left associative
[GATE-2014]

Q.26 Which one of the following is true at any valid state in shift-reduce parsing?
a) Viable prefixes appear only at the bottom of the stack and not inside
b) Viable prefixes appear only at the top of the stack and not inside
c) The stack contains only a set of viable prefixes
d) The stack never contains viable prefixes
[GATE-2015]

Q.27 Match the following:
(P) Lexical analysis          (1) Graph coloring
(Q) Parsing                   (2) DFA minimization
(R) Register allocation       (3) Post-order traversal
(S) Expression evaluation     (4) Production tree
Codes:
     P    Q    R    S
a)   2    3    1    4
b)   2    1    4    3
c)   2    4    1    3
d)   2    3    4    1
[GATE-2015]

Q.28 Among simple LR (SLR), canonical LR, and look-ahead LR (LALR), which of the following pairs identify the method that is very easy to implement and the method that is the most powerful, in that order?
a) SLR, LALR
b) Canonical LR, LALR
c) SLR, canonical LR
d) LALR, canonical LR
[GATE-2015]

Q.29 Which of the following statements about parsers is/are CORRECT?
I. Canonical LR is more powerful than SLR.
II. SLR is more powerful than LALR.
III. SLR is more powerful than canonical LR.
a) I only          b) II only
c) III only        d) II and III only
[GATE-2017]

Q.30 Consider the following expression grammar G:
E → E - T | T
T → T + F | F
F → (E) | id
Which of the following grammars is not left recursive, but is equivalent to G?
a) E → E - T | T
   T → T + F | F
   F → (E) | id
b) E → TE'
   E' → -TE' | ε
   T → T + F | F
   F → (E) | id
c) E → TX
   X → -TX | ε
   T → FY
   Y → +FY | ε
   F → (E) | id
d) E → TX | (TX)
   X → -TX | +TX | ε
   T → id
[GATE-2017]

Q.31 Consider the following grammar:
P → xQRS
Q → yz | z
R → w | ε
S → y
What is FOLLOW(Q)?
a) {R}             b) {w}
c) {w, y}          d) {w, $}
[GATE-2017]
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
(a) (c) (b) (d) (d) (b) (b) (c) (a) (d) (a) (a) (a) (a)
15 16 17 18 19 20 21 22 23 24 25 26 27 28
(c) (b) (d) (b) (c) (a) (c) (c) (d) (d) (b) (b) (c) (c)
29 30 31
(a) (c) (c)
EXPLANATIONS
Q.1 (a)
Statement (a) is false: an unambiguous grammar has a unique leftmost and a unique rightmost derivation, but they are not the same derivation.
Statement (b) is true: LL(1) parsing is top-down, so an LL(1) parser is a top-down parser.
Statement (c) is true: in order of power, LR(1) > LALR(1) > SLR(1).
Statement (d) is true: an ambiguous grammar can never be LR(k) for any k.

Q.2 (c)
LL(1) parsing is top-down. To make an arbitrary CFG suitable for LL(1) parsing, we must remove left recursion and left-factor the grammar. This is how an arbitrary CFG is converted to an LL(1) grammar.

Q.3 (b)
The LALR states are obtained by merging LR(1) states with identical cores, and the resulting collection has exactly as many states as the SLR parser for the same grammar. Hence the number of states in the SLR parser equals the number of states in the LALR parser: n1 = n2.

Q.4 (d)
Construct the predictive parser table (M):

Non-terminal    a        b       e                  i             t    $
S               S → a            ...                S → iEtSS'
S'                               S' → eS, S' → ε                       S' → ε
E                        E → b

So M[S', e] = {S' → eS, S' → ε} and M[S', $] = {S' → ε}.

Q.5 (d)
The grammar S → CC, C → cC | d is a canonical LR grammar. Recall:
1. A canonical LR grammar is also known as an LR(1) grammar.
2. Every SLR(1) grammar is an LR(1) grammar.
3. Not every LR(1) grammar is LALR(1).

Q.6 (b)
In P → QR there is no terminal symbol between the two non-terminals, which violates the operator-grammar requirement. Also, an operator grammar may not have any ε-production, so P → ε violates the requirements as well.

Q.7 (b)
The grammar is left recursive (A → AA), and predictive parsing cannot handle a left-recursive grammar.

Q.8 (c)
The grammar is
S → S * E
S → E
E → F + E
E → F
F → id
The LR(0) items given are
i) S → S * .E
ii) E → F. + E
iii) E → F + .E
From items (i) and (iii),
we see that in both productions the dot is immediately to the left of E. Hence the two items belong to the same set in the canonical sets of items for the grammar.

Q.9 (a)
Construct the predictive parser table:

Non-terminal    id         *          $
S               S → FR
R                          R → *S     R → ε
F               F → id

So M[S, id] = {S → FR} and M[R, $] = {R → ε}.

Q.10 (d)
The condition i ≠ j in L = {a^i b^j | i ≠ j} says that the number of a's must differ from the number of b's, so we need a grammar that can never generate equal numbers of a's and b's.
Option (a): S ⇒ CB ⇒ aB ⇒ aBb ⇒ ab. Here the number of a's equals the number of b's, so it is not the required grammar.
Option (b): S ⇒ aS ⇒ ab. Again equal numbers of a's and b's, so it is not the required grammar.
Option (c): S ⇒ AC ⇒ C ⇒ aCb ⇒ ab. Again equal numbers, so it is not the required grammar.
Option (d): A generates at least one a and B generates at least one b. S → AC generates a^k a^n b^n with k ≥ 1 (more a's), and S → CB generates a^n b^n b^k with k ≥ 1 (more b's). So the number of a's always differs from the number of b's, and every string a^i b^j with i ≠ j is generated.

Q.11 (a)
For the grammar in option (d) of the previous question, the string a^l b^m with l ≠ m is derived in max(l, m) + 2 steps. For example, aab (l = 2, m = 1) is derived as
S ⇒ AC ⇒ aC ⇒ aaCb ⇒ aab
which takes 4 = max(2, 1) + 2 steps.

Q.12 (a)
Top-down parsing analyzes the input by hypothesizing a parse-tree structure from the start symbol and then checking whether the input is compatible with that hypothesis. A recursive descent parser is a top-down parser; operator precedence, LR(k) and LALR(k) parsers are bottom-up.

Q.13 (a)
Consider the following set of rules:
S → iCtSS1 | a
S1 → eS | ε
C → b
Step 1: S → iCtSS1 | a
Step 2: S → ibtSS1 | a (substituting C → b)
In the second step, we can observe that the terminals are i, b and t while the left non-terminal is S. This implies that from the second step onwards the grammar is left recursive.

Q.14 (a)
Statement P is true: every regular grammar is recognized by an LL(1) parser, so every regular grammar is LL(1).
Statement Q is true: in terms of power, LR(1) > LALR(1) > SLR(1), and LR(1) > LL(1). Every regular set is accepted by an LR(1) parser, and therefore every regular set has an LR(1) grammar.

Q.15 (c)
Derive a string generated by the grammar:
S ⇒ aB ⇒ aaBB ⇒ aabB ⇒ aabbS ⇒ aabbaB ⇒ aabbab
We get the string aabbab.

Q.16 (b)
The string aabbab has a total of 2 derivation trees.

Q.17 (d)
As per the definition, a handle is a production that will be used for reduction in the next step, along with a position in the sentential form where the right hand side of the production may be found.

Q.18 (b)
An LALR(1) parser can have a shift-reduce conflict if and only if the LR(1) parser for G has a shift-reduce conflict.
Let us examine the LALR construction in detail. Three questions arise:
1. whether LALR is equivalent to SLR,
2. shift-reduce conflicts in LR(1) versus LALR,
3. reduce-reduce conflicts in LR(1) versus LALR.
Take an example grammar:
S' → S
S → Bbb
S → aab
S → bBa
B → a
First of all we work out the SLR parser. Generating the items:
I0: S' → .S
    S → .Bbb
    S → .aab
    S → .bBa
    B → .a
I1: S' → S.
I2: S → B.bb
I3: S → a.ab
    B → a.
I4: S → b.Ba
    B → .a
I5: S → Bb.b
I6: S → aa.b
I7: S → bB.a
I8: B → a.
I9: S → Bbb.
I10: S → aab.
I11: S → bBa.
In item I3 there is a shift/reduce conflict (shift on a against the reduction B → a). To see whether it survives, we construct the LR(1) parser. Generating the LR(1) items:
I0: S' → .S, $
    S → .Bbb, $
    S → .aab, $
    S → .bBa, $
    B → .a, b
I1: S' → S., $
I2: S → B.bb, $
I3: S → a.ab, $
    B → a., b
I4: S → b.Ba, $
    B → .a, a
I5: S → Bb.b, $
I6: S → aa.b, $
I7: S → bB.a, $
I8: B → a., a
I9: S → Bbb., $
I10: S → aab., $
I11: S → bBa., $
In item I3 we now find that there is no shift/reduce conflict: the reduction B → a applies only on lookahead b, while the shift is on a. Here LALR = LR(1), since no merging was required. We also see that the number of states in the SLR and LR(1) parsers is the same, which need not hold in other cases.
Now let us take another example, showing that a reduce/reduce conflict may arise in the LALR parser even if it is not present in the LR(1) parser. We parse the following grammar with LR(1) first, and then feed the result to the LALR construction:
S' → S
S → aAd
S → bBd
S → aBe
S → bAe
A → c
B → c
Generating the LR(1) items of the above grammar:
I0: S' → .S, $
    S → .aAd, $
    S → .bBd, $
    S → .aBe, $
    S → .bAe, $
I1: S' → S., $
I2: S → a.Ad, $
    S → a.Be, $
    A → .c, d
    B → .c, e
I3: S → b.Bd, $
    S → b.Ae, $
    A → .c, e
    B → .c, d
I4: S → aA.d, $
I5: S → aB.e, $
I6: A → c., d
    B → c., e
I7: S → bB.d, $
I8: S → bA.e, $
I9: B → c., d
    A → c., e
I10: S → aAd., $
I11: S → aBe., $
I12: S → bBd., $
I13: S → bAe., $
The items of interest are I6 and I9. When we build the parsing table for LR(1), we get (partly filled, with r6 reducing A → c and r7 reducing B → c):

State        ...    d      e
...
I6           ...    r6     r7
...
I9           ...    r7     r6
On reduction to the LALR parsing table, I6 and I9 merge into I69 and the table becomes (partly filled):

State        ...    d        e
...
I69          ...    r6/r7    r7/r6
...

So we find that the LALR parser gains a reduce-reduce conflict, whereas the corresponding LR(1) parser was free of it. This is proof enough that LALR is less potent than LR(1). But since we have already seen that LALR is free of shift-reduce conflicts whenever the corresponding LR(1) parser is, whereas SLR (or LR(0)) is not necessarily free of shift-reduce conflicts, the LALR parser is more potent than the SLR parser:
Shift-reduce conflicts present in SLR
⇓
some of them are resolved in LR(1)
⇓
all those resolved are preserved in LALR.
So we have answered all the queries on LALR that we raised intuitively.

Q.19 (c)
S → aSa | bS | c
The grammar is LL(1) because the FIRST sets of the alternatives are pairwise disjoint:
FIRST(aSa) ∩ FIRST(bS) = {a} ∩ {b} = ∅
FIRST(bS) ∩ FIRST(c) = {b} ∩ {c} = ∅
FIRST(c) ∩ FIRST(aSa) = {c} ∩ {a} = ∅
As the grammar is LL(1), it is also LR(1), because every LL(1) grammar is LR(1).

Q.20 (a)
FIRST(S) = FIRST(aAbB) ∪ FIRST(bAaB) ∪ {ε} = {a} ∪ {b} ∪ {ε} = {a, b, ε}
FIRST(A) = FIRST(S) = {a, b, ε}
FIRST(B) = FIRST(S) = {a, b, ε}
FOLLOW(S) = {$} ∪ FOLLOW(A) ∪ FOLLOW(B)
FOLLOW(A) = {a, b}
FOLLOW(B) = FOLLOW(S) = {$, a, b}

Q.21 (c)
FIRST(aAbB) = {a}, so S → aAbB goes in [S, a]; FIRST(bAaB) = {b}, so S → bAaB goes in [S, b].
The production S → ε is placed as per FOLLOW(S) = {a, b, $}, so S → ε appears in [S, a], [S, b] and [S, $].
Hence entry E1 for [S, a] = {S → aAbB, S → ε} and entry E2 for [S, b] = {S → bAaB, S → ε}.
Entry E3 for [B, $] = {B → S}: the production B → S is placed as per FIRST(S) = {a, b, ε}, so B → S goes in [B, a] and [B, b]; and since FIRST(S) contains ε, it is also placed as per FOLLOW(B), which contains $, so B → S goes in [B, $].

Q.22 (c)
The maximum number of reduce moves that can be taken by a bottom-up parser for a grammar with no epsilon- and unit-productions (i.e., of type A → ε and A → a) to parse a string with n tokens is 2n - 1.

Q.24 (d)
S → L. > R
Q → R.
On input < the set has neither a shift-reduce conflict (the only shift is on >) nor a reduce-reduce conflict (there is only one complete item).
S → T * P
T → U | T * U
P → Q + P | Q
Q → Id
U → Id
The production rule T → T * U is left recursive, so * is a left-associative operator.
The production rule P → Q + P is right recursive, so + is a right-associative operator.
Q.29 (a)
Canonical LR is the most powerful of all the LR(k) parsers, so only statement I is correct.
Q.30 (c)
E → E - T | T
T → T + F | F
F → (E) | id
There are two left recursions, and both have to be removed. The conversion gives:
E → TE'
E' → -TE' | ε
T → FT'
T' → +FT' | ε
F → (E) | id
Renaming E' as X and T' as Y, we get option (c):
E → TX
X → -TX | ε
T → FY
Y → +FY | ε
F → (E) | id
Q.31 (c)
FOLLOW(Q) = FIRST(RS) = (FIRST(R) - {ε}) ∪ FIRST(S)   (since R is nullable)
          = {w} ∪ {y} = {w, y}
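This computation can be checked mechanically. Below is a small fixpoint sketch in Python for the grammar of Q.31; the grammar encoding and helper names are ours, not from any library:

```python
# Fixpoint computation of FIRST and FOLLOW for P -> xQRS, Q -> yz | z,
# R -> w | eps, S -> y, verifying that FOLLOW(Q) = {w, y}.

EPS = 'eps'
GRAMMAR = {                       # nonterminal -> list of right-hand sides
    'P': [['x', 'Q', 'R', 'S']],
    'Q': [['y', 'z'], ['z']],
    'R': [['w'], [EPS]],
    'S': [['y']],
}
NONTERMS = set(GRAMMAR)

def first_of_seq(seq, first):
    out = set()
    for sym in seq:
        f = first[sym] if sym in NONTERMS else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                  # every symbol in seq can derive eps
    return out

first = {n: set() for n in NONTERMS}
changed = True
while changed:                    # iterate until FIRST sets stop growing
    changed = False
    for lhs, rhss in GRAMMAR.items():
        for rhs in rhss:
            f = first_of_seq([s for s in rhs if s != EPS] or [EPS], first)
            if not f <= first[lhs]:
                first[lhs] |= f; changed = True

follow = {n: set() for n in NONTERMS}
follow['P'].add('$')              # start symbol gets the end marker
changed = True
while changed:
    changed = False
    for lhs, rhss in GRAMMAR.items():
        for rhs in rhss:
            syms = [s for s in rhs if s != EPS]
            for i, sym in enumerate(syms):
                if sym not in NONTERMS:
                    continue
                f = first_of_seq(syms[i + 1:] or [EPS], first)
                add = (f - {EPS}) | (follow[lhs] if EPS in f else set())
                if not add <= follow[sym]:
                    follow[sym] |= add; changed = True

print(sorted(follow['Q']))        # ['w', 'y']
```

Since R is nullable, FIRST(RS) pulls in FIRST(S), and the program reproduces FOLLOW(Q) = {w, y}.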
3 SYNTAX DIRECTED TRANSLATION
The value of a synthesized attribute at a Function in semantic rules will often be
node is computed from the values of written as expressions.
attributes at the children of that node in Occasionally, the only purpose of a
the parse tree. semantic rule in a syntax-directed
The value of an inherited attribute is definition is to create a side effect such
computed from the values of attributes at the siblings and parent of that node. Semantic rules set up dependencies between attributes; these dependencies are represented by a graph. From the dependency graph, we derive an evaluation order for the semantic rules. Evaluation of the semantic rules defines the values of the attributes at the nodes in the parse tree for the input string.

A semantic rule may also have side effects, e.g., printing a value or updating a global variable. Of course, an implementation need not explicitly construct a parse tree or a dependency graph; it just has to produce the same output for each input string. The process of computing the attribute values at the nodes is called annotating or decorating the parse tree.

3.1.2 Form of a syntax-directed definition

In a syntax-directed definition, each grammar production A → α has associated with it a set of semantic rules of the form b := f(c1, c2, ..., ck), where f is a function, and either
1. b is a synthesized attribute of A and c1, c2, ..., ck are attributes belonging to the grammar symbols of the production, or
2. b is an inherited attribute of one of the grammar symbols on the right side of the production and c1, c2, ..., ck are attributes belonging to the grammar symbols of the production.

In either case, we say that attribute b depends on attributes c1, c2, ..., ck. An attribute grammar is a syntax-directed definition in which the functions in semantic rules cannot have side effects.

Semantic rules may also be written as procedure calls or program fragments. They can be thought of as rules defining the values of dummy synthesized attributes of the non-terminal on the left side of the associated production; the dummy attribute and the := sign in the semantic rule are not shown.

Example 3.1:
The syntax-directed definition in Fig. 3.2 is for a desk-calculator program. This definition associates an integer-valued synthesized attribute called val with each of the non-terminals E, T and F. For each E-, T- and F-production, the semantic rule computes the value of attribute val for the non-terminal on the left side from the values of val for the non-terminals on the right side.

Production      Semantic Rules
L → En          print(E.val)
E → E1 + T      E.val := E1.val + T.val
E → T           E.val := T.val
T → T1 * F      T.val := T1.val × F.val
T → F           T.val := F.val
F → (E)         F.val := E.val
F → digit       F.val := digit.lexval
Fig 3.2 Syntax-directed definition of a simple desk calculator

The token digit has a synthesized attribute lexval whose value is assumed to be supplied by the lexical analyzer. The rule associated with the production L → En for the starting non-terminal L is just a procedure that prints as output the value of the arithmetic expression generated by E; we can think of this rule as defining a dummy attribute for the non-terminal L. A yacc specification for this desk calculator illustrates translation during LR parsing.

In a syntax-directed definition, terminals are assumed to have synthesized attributes only, as the definition does not provide any semantic rules for terminals. Values for
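As a concrete illustration, here is a small Python sketch (the Node class and production tags are our own, not from the text) that evaluates the synthesized attribute val of Fig. 3.2 bottom up over a hand-built parse tree for 3*5+4:

```python
# Sketch: evaluating the desk-calculator SDD of Fig. 3.2.
# Each branch of val() mirrors one semantic rule of the definition.

class Node:
    def __init__(self, prod, children=(), lexval=None):
        self.prod = prod          # which production built this node
        self.children = children
        self.lexval = lexval      # only used at 'digit' leaves

def val(n):
    """Compute the synthesized attribute val bottom up."""
    if n.prod == "F->digit":
        return n.lexval                        # F.val := digit.lexval
    if n.prod in ("E->T", "T->F", "F->(E)"):
        return val(n.children[0])              # copy rules
    if n.prod == "E->E+T":
        return val(n.children[0]) + val(n.children[1])
    if n.prod == "T->T*F":
        return val(n.children[0]) * val(n.children[1])
    raise ValueError(n.prod)

# parse tree for 3*5+4: E -> E + T, where the left E derives T -> T * F
d3 = Node("F->digit", lexval=3)
d5 = Node("F->digit", lexval=5)
d4 = Node("F->digit", lexval=4)
t35 = Node("T->T*F", (Node("T->F", (d3,)), d5))
e = Node("E->E+T", (Node("E->T", (t35,)), Node("T->F", (d4,))))
print(val(e))   # prints 19, matching Example 3.3
```

The evaluation order falls out of the recursion: children are evaluated before their parent, exactly as an S-attributed definition requires.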
© Copyright Reserved by Gateflix.in No part of this material should be copied or reproduced without permission
attributes of terminals are usually supplied by the lexical analyzer, as discussed. Furthermore, the start symbol is assumed not to have any inherited attributes, unless otherwise stated.

3.1.3 Dependency Graph

If an attribute b depends on an attribute c, then the semantic rule for b at a node must be evaluated after the semantic rule that defines c. The dependencies among the nodes can be depicted by a directed graph called a dependency graph.

Dependency Graph: a directed graph indicating the interdependencies among the synthesized and inherited attributes of the various nodes in a parse tree.

Before constructing a dependency graph for a parse tree, we put each semantic rule into the form b := f(c1, c2, ..., ck) by introducing a dummy synthesized attribute b for each semantic rule that consists of a procedure call. The graph has a node for each attribute and an edge to the node for b from the node for c if attribute b depends on attribute c. In more detail, the dependency graph is constructed as follows:

for each node n in the parse tree do
  for each attribute a of the grammar symbol at n do
    construct a node in the dependency graph for a;
for each node n in the parse tree do
  for each semantic rule b := f(c1, c2, ..., ck) associated with the production used at n do
    for i := 1 to k do
      construct an edge from ci to b;

For example, suppose A.a := f(X.x, Y.y) is a semantic rule for the production A → XY. This rule defines a synthesized attribute A.a that depends on the attributes X.x and Y.y. If this production is used in the parse tree, then there will be three nodes A.a, X.x and Y.y in the dependency graph, with an edge to A.a from X.x since A.a depends on X.x, and an edge to A.a from Y.y since A.a also depends on Y.y.

If the production A → XY has the semantic rule X.i := g(A.a, Y.y) associated with it, then there will be an edge to X.i from A.a and also an edge to X.i from Y.y, since X.i depends on both A.a and Y.y.

Example 3.2:
Whenever the following production is used in a parse tree, we add the edges shown in Fig. 3.3 to the dependency graph.

E → E1 + E2    E.val := E1.val + E2.val

The three nodes of the dependency graph marked by • represent the synthesized attributes E.val, E1.val, and E2.val at the corresponding nodes in the parse tree. The edge to E.val from E1.val shows that E.val depends on E1.val, and the edge to E.val from E2.val shows that E.val also depends on E2.val. The dotted lines represent the parse tree and are not part of the dependency graph.

Fig.3.3 E.val is synthesized from E1.val and E2.val

The synthesized attribute E.val depends on E1.val and E2.val; hence the two edges, one each from E1.val and E2.val.

3.1.4 Evaluation Order

Any topological sort of the dependency graph gives a valid order in which the semantic rules must be evaluated. A topological sort of a directed acyclic graph is any ordering m1, m2, ..., mk of the nodes of the graph such that edges go from nodes earlier in the ordering to later nodes; that is, if mi → mj, then mi appears before mj in the ordering.

For the declaration real id1, id2, id3 discussed below, one evaluation order obtained from a topological sort is:

a4 := real
a5 := a4
addtype(id3.entry, a5)
a7 := a5
addtype(id2.entry, a7)
a9 := a7
addtype(id1.entry, a9)
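The graph-then-sort procedure above can be sketched with Python's standard graphlib module (the attribute names and dependencies below are illustrative, modeled on the declaration real id1, id2, id3 of Fig. 3.4):

```python
# Sketch: obtain an evaluation order for semantic rules by topologically
# sorting the dependency graph. Attribute names are illustrative.

from graphlib import TopologicalSorter   # Python 3.9+

# Map each attribute to the attributes it depends on (its predecessors),
# i.e. the c's of its defining rule b := f(c1, ..., ck).
deps = {
    "T.type": [],
    "L.in":   ["T.type"],          # L.in := T.type
    "L1.in":  ["L.in"],            # L1.in := L.in
    "L2.in":  ["L1.in"],
    "a5":     ["L.in"],            # dummy attrs standing for addtype(...)
    "a7":     ["L1.in"],
    "a9":     ["L2.in"],
}

# static_order() yields every node after all of its predecessors
order = list(TopologicalSorter(deps).static_order())
print(order)   # e.g. starts with 'T.type', then 'L.in', ...
```

Any order respecting the edges is acceptable, which is exactly the freedom a topological sort leaves open.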
Fig.3.4 Dependency graph for parse tree

The figure shows the dependency graph for the statement real id1, id2, id3 along with the parse tree. Procedure calls can be thought of as rules defining the values of dummy synthesized attributes of the nonterminal on the left side of the associated production. Each of the semantic rules addtype(id.entry, L.in) associated with the L-productions leads to the creation of a dummy attribute.

The translation specified by a syntax-directed definition can be made precise as follows. The underlying grammar is used to construct a parse tree for the input. The dependency graph is constructed as discussed above. From a topological sort of the dependency graph, we obtain an evaluation order for the semantic rules. Evaluation of the semantic rules in this order yields the translation of the input string.

A syntax-directed definition is said to be circular if the dependency graph for some parse tree generated by its grammar has a cycle.

3.2 Synthesized attributes

A syntax-directed definition that uses only synthesized attributes is said to be an S-attributed definition. A parse tree for an S-attributed definition can be annotated by evaluating the semantic rules for the attributes at each node, children before parents.

S-attributed grammars are a class of attribute grammars, comparable with L-attributed grammars but characterized by having no inherited attributes at all. Inherited attributes, which must be passed down from parent nodes to children nodes of the abstract syntax tree during semantic analysis, pose a problem for bottom-up parsing. Synthesized attributes are used extensively in practice.

Example 3.3:
The S-attributed definition in Example 3.1 specifies a desk calculator that reads an input line containing an arithmetic expression involving digits, parentheses, and the operators + and *, followed by a newline character n, and prints the value of the expression. For example, given the expression 3*5+4 followed by a newline, the program prints the value 19. Fig. 3.5 contains an annotated parse tree for the input 3*5+4n. The output, printed at the root of the tree, is the value of E.val at the first child of the root.

Fig.3.5 Annotated parse tree for 3*5+4n

To see how attribute values are computed, consider the leftmost bottommost interior node, which corresponds to the use of the production F → digit. The corresponding semantic rule, F.val := digit.lexval, defines the attribute F.val at that node to have the value 3, because the value of digit.lexval at the child of this node is 3. Similarly, at the parent of this F-node, the attribute T.val has the value 3.
Now consider the node for the production T → T1*F. The value of the attribute T.val at this node is defined by

Production      Semantic rule
T → T1 * F      T.val := T1.val × F.val

When we apply the semantic rule at this node, T1.val has the value 3 from the left child and F.val has the value 5 from the right child. Thus, T.val acquires the value 15 at this node. The rule associated with the production for the starting nonterminal L → En prints the value of the expression generated by E.

3.2.1 Inherited attributes

An inherited attribute is one whose value is defined in terms of attributes at the parent and/or siblings of a node. It is used for determining the context in which a construct appears; inherited attributes are convenient for expressing the dependence of a programming-language construct on that context. For example, we can use an inherited attribute to keep track of whether an identifier appears on the left or right side of an assignment, in order to decide whether the address or the value of the identifier is needed.

Although it is always possible to rewrite a syntax-directed definition to use only synthesized attributes, it is often more natural to use a syntax-directed definition with inherited attributes. An inherited-attribute system may always be replaced by an S-attributed system, but the inherited attributes are frequently the more natural choice.

In the following example, an inherited attribute distributes type information to the various identifiers in a declaration.

Example 3.4: A declaration generated by the nonterminal D in the syntax-directed definition below consists of the keyword int or real, followed by a list of identifiers. The nonterminal T has a synthesized attribute type, whose value is determined by the keyword in the declaration. The semantic rule L.in := T.type, associated with the production D → TL, sets the inherited attribute L.in to the type in the declaration. The rules then pass this type down the parse tree using the inherited attribute L.in. The rules associated with the productions for L call the procedure addtype to add the type of each identifier to its entry in the symbol table (pointed to by attribute entry).

D → T L       L.in := T.type
T → real      T.type := real
T → int       T.type := int
L → L1 , id   L1.in := L.in; addtype(id.entry, L.in)
L → id        addtype(id.entry, L.in)

The figure shows an annotated parse tree for the sentence real id1, id2, id3. The value of L.in at the three L-nodes gives the type of the identifiers id1, id2 and id3. These values are determined by computing the value of the attribute T.type at the left child of the root and then evaluating L.in top-down at the three L-nodes in the right subtree of the root. At each L-node we also call the procedure addtype to insert into the symbol table the fact that the identifier at the right child of that node has type real.
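Example 3.4's top-down flow of L.in can be sketched by passing the inherited attribute as a parameter (the symtab dictionary and function names below are illustrative, not from the text):

```python
# Sketch: the inherited attribute L.in carries T.type down the list of
# identifiers; addtype records each identifier's type.

symtab = {}

def addtype(entry, typ):
    symtab[entry] = typ

def decl(type_kw, ids):
    t_type = type_kw           # T.type := real (or int)
    L(ids, t_type)             # D -> T L : L.in := T.type

def L(ids, inh):               # L -> L1 , id  |  id
    if len(ids) > 1:
        L(ids[:-1], inh)       # L1.in := L.in (copy rule)
    addtype(ids[-1], inh)      # addtype(id.entry, L.in)

decl("real", ["id1", "id2", "id3"])
print(symtab)   # {'id1': 'real', 'id2': 'real', 'id3': 'real'}
```

The parameter inh plays exactly the role of L.in: it is computed at the parent and handed down unchanged to each L-node.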
Fig. 3.6 Parse tree with inherited attribute in at each node labeled L

3.2.2 Construction of Syntax Trees

Another simplification found in syntax trees is that chains of single productions may be collapsed.
1. mknode(op, left, right) creates an operator node with label op and two fields containing pointers to left and right.
2. mkleaf(id, entry) creates an identifier node with label id and a field containing entry, a pointer to the symbol-table entry for the identifier.
3. mkleaf(num, val) creates a number node with label num and a field containing val, the value of the number.

Fig.3.9 Syntax tree for a-4+c

The tree is constructed bottom up. The function calls mkleaf(id, entrya) and mkleaf(num, 4) construct the leaves for a and 4; the pointers to these nodes are saved using p1 and p2. The call mknode('-', p1, p2) then constructs the interior node with the leaves for a and 4 as children. After two more steps, p5 is left pointing to the root.

3.3.2 A Syntax-Directed Definition for Constructing Syntax Trees

Figure 3.10 contains an S-attributed definition for constructing a syntax tree for an expression containing the operators + and -. It uses the underlying productions of the grammar to schedule the calls of the functions mknode and mkleaf that construct the tree. The synthesized attribute nptr for E and T keeps track of the pointers returned by the function calls.

Example 3.6:
An annotated parse tree depicting the construction of a syntax tree for the expression a-4+c is shown in Fig. 3.11; the parse tree is shown dotted. The parse-tree nodes labeled by the nonterminals E and T use the synthesized attribute nptr to hold a pointer to the syntax-tree node for the expression represented by the nonterminal. The semantic rules associated with the productions T → id and T → num define attribute T.nptr to be a pointer to a new leaf for an identifier and a number, respectively. The attributes id.entry and num.val are the lexical values assumed to be returned by the lexical analyzer with the tokens id and num. When an expression E is a single term, corresponding to a use of the production E → T, the copy rule E.nptr := T.nptr applies.
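A minimal sketch of the three constructor functions (the dictionary record layout is an assumption, not from the text), building the syntax tree for a-4+c bottom up as in Fig. 3.9:

```python
# Sketch: mknode/mkleaf from section 3.3, with nodes as plain dicts.

def mknode(op, left, right):
    """Operator node with label op and pointers to its two children."""
    return {"label": op, "left": left, "right": right}

def mkleaf(label, value):
    """Leaf node: value is a symbol-table entry (id) or a number (num)."""
    return {"label": label, "value": value}

# bottom-up construction of the syntax tree for a-4+c
p1 = mkleaf("id", "entry_a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)        # node for a-4
p4 = mkleaf("id", "entry_c")
p5 = mknode("+", p3, p4)        # p5 points to the root
print(p5["label"])              # +
```

The order of the calls mirrors the schedule imposed by the S-attributed definition: a node is created only after the nodes for its subexpressions exist.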
A dag identifies the common subexpressions of an expression, which in a syntax tree would be represented as duplicated subtrees. Consider the expression a + a*(b - c) + (b - c)*d. The construction makes a leaf or interior node only if an identical one is not already present; otherwise it returns a pointer to the existing node. The leaf for a has two parents, because a is common to the two subexpressions a and a*(b-c). Likewise, both occurrences of the common subexpression b-c are represented by the same node, which also has two parents.

Fig. 3.11 Construction of a syntax tree for a-4+c
(1)  p1 := mkleaf(id, a);
(2)  p2 := mkleaf(id, a);
(3)  p3 := mkleaf(id, b);
(4)  p4 := mkleaf(id, c);
(5)  p5 := mknode('-', p3, p4);
(6)  p6 := mknode('*', p2, p5);
(7)  p7 := mknode('+', p1, p6);
(8)  p8 := mkleaf(id, b);
(9)  p9 := mkleaf(id, c);
(10) p10 := mknode('-', p8, p9);
(11) p11 := mkleaf(id, d);
(12) p12 := mknode('*', p10, p11);
(13) p13 := mknode('+', p7, p12);
Fig. 3.13 Instructions for constructing the dag

When the call mkleaf(id, a) is repeated on line 2, the node constructed by the previous call mkleaf(id, a) is returned, so p1 = p2. Similarly, the nodes returned on lines 8 and 9 are the same as those returned on lines 3 and 4, respectively. Hence, the node returned on line 10 must be the same one constructed by the call of mknode on line 5.

In many applications, nodes are implemented as records stored in an array, as in Fig. 3.14; each record has a label field that determines the nature of the node. We can refer to a node by its index or position in the array. The integer index of a node is often called a value number, for historical reasons. For example, using value numbers, we can say node 3 has label +, its left child is node 1, and its right child is node 2. The following algorithm can be used to create nodes for a dag representation of an expression.

Fig.3.14 Nodes in a dag for i := i + 10 allocated from an array

Algorithm 3.1 Value-number method for constructing a node in a dag.

Suppose that nodes are stored in an array, as in Fig. 3.14, and that each node is referred to by its value number. Let the signature of an operator node be a triple <op, l, r> consisting of its label op, left child l, and right child r.

Input: Label op, node l, and node r.
Output: A node with signature <op, l, r>.
Method: Search the array for a node m with label op, left child l, and right child r. If there is such a node, return m; otherwise, create a new node n with label op, left child l, right child r, and return n.

An obvious way to determine whether node m is already in the array is to keep all previously created nodes on a list and to check each node on the list to see if it has the desired signature. The search for m can be made more efficient by using k lists, called buckets, and a hashing function h to determine which bucket to search. The hash function h computes the number of a bucket from the values of op, l and r; it always returns the same bucket number, given the same arguments. If m is not in the bucket h(op, l, r), then a new node n is created and added to this bucket, so subsequent searches will find it there. Several signatures may hash into the same bucket number, but in practice we expect each bucket to contain a small number of nodes.

Each bucket can be implemented as a linked list, as shown in Fig. 3.15. Each cell in a linked list represents a node. The bucket headers, consisting of pointers to the first cell in a list, are stored in an array. The bucket number returned by h(op, l, r) is an index into this array of bucket headers.
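Algorithm 3.1 can be sketched in a few lines if we let a Python dictionary stand in for the hash table of buckets (variable names below are illustrative):

```python
# Sketch of the value-number method: a dict maps each signature
# <op, l, r> to the value number of the node already built for it.

nodes = []     # nodes[i] is the record for value number i
table = {}     # signature -> value number

def node(op, l=None, r=None):
    sig = (op, l, r)
    if sig in table:             # node m with this signature exists
        return table[sig]
    nodes.append(sig)            # otherwise create a new node n
    table[sig] = len(nodes) - 1
    return table[sig]

# dag for i := i + 10 (Fig. 3.14): the leaf for i is shared
i_leaf = node("id", "i")
ten    = node("num", 10)
plus   = node("+", i_leaf, ten)
assign = node(":=", node("id", "i"), plus)   # reuses the leaf for i
print(len(nodes))   # 4 nodes in total, because the i leaf is not duplicated
```

The dict lookup plays the role of searching the bucket h(op, l, r); in both cases, equal signatures always reach the same place.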
procedure dfvisit(n: node);
begin
  for each child m of n, from left to right do begin
    evaluate inherited attributes of m;
    dfvisit(m)
  end;
  evaluate synthesized attributes of n
end

Fig. 3.16 Depth-first evaluation order for attributes at the nodes of a parse tree
Whenever a reduction is made, the values of the new synthesized attributes are computed from the attributes appearing on the stack for the grammar symbols on the right side of the reducing production. The following figure shows a parser stack with a field for one attribute value.

State stack    Value stack
...            ...
X              X.x
Y              Y.y
Z              Z.z
...            ...
Fig.3.17 Parser stack with a field for synthesized attributes

The current top of the stack is indicated by the pointer top. We assume that synthesized attributes are evaluated just before each reduction. Suppose the semantic rule A.a := f(X.x, Y.y, Z.z) is associated with the production A → XYZ. Before XYZ is reduced to A, the value of the attribute Z.z is in val[top], that of Y.y in val[top-1], and that of X.x in val[top-2]. If a symbol has no attribute, then the corresponding entry is undefined. After the reduction, top is decremented by 2, the state covering A is put in state[top] (i.e., where X was), and the value of the synthesized attribute A.a is put in val[top].

Example: Consider the syntax-directed definition of the desk calculator, implemented by the following code fragments.

Production      Code Fragment
L → En          print(val[top])
E → E1 + T      val[ntop] := val[top-2] + val[top]
E → T
T → T1 * F      val[ntop] := val[top-2] × val[top]
T → F
F → (E)         val[ntop] := val[top-1]
F → digit       F.val := digit.lexval

The following figure shows the sequence of moves made by the parser on input 3*5+4n. The contents of the state and val fields of the parsing stack are shown after each move (a dash marks an undefined entry, e.g. for an operator symbol). Consider the sequence of events on seeing the input symbol 3:

- On the first move, the parser shifts the state corresponding to the token digit (whose attribute value is 3) onto the stack.
- On the second move, the parser reduces by the production F → digit and implements the semantic rule F.val := digit.lexval.
- On the third move, the parser reduces by T → F. No code fragment is associated with this production, so the val array is left unchanged.

Note that after each reduction the top of the val stack contains the attribute value associated with the left side of the reducing production.

Input     State     Val       Production used
3*5+4n
*5+4n     3         3
*5+4n     F         3         F → digit
*5+4n     T         3         T → F
5+4n      T *       3 -
+4n       T * 5     3 - 5
+4n       T * F     3 - 5     F → digit
+4n       T         15        T → T*F
+4n       E         15        E → T
4n        E +       15 -
n         E + 4     15 - 4
n         E + F     15 - 4    F → digit
n         E + T     15 - 4    T → F
n         E         19        E → E1 + T
          E n       19 -
          L         19        L → En

3.4 Translation Schemes

A translation scheme is a CFG in which semantic actions occur within the right-hand sides of productions. A translation scheme that maps infix expressions to postfix notation is given below.
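The stack discipline for a single reduction can be sketched as follows (a hypothetical helper for the reduction E → E1 + T only; the surrounding shift-reduce driver is omitted):

```python
# Sketch: updating the val stack at a reduction, as in Fig. 3.17.
# Just before reducing by E -> E1 + T, the three rightmost entries of
# val hold E1.val, the (undefined) entry for '+', and T.val.

val = []

def reduce_E_plus_T():
    """E -> E1 + T :  val[ntop] := val[top-2] + val[top]"""
    a = val[-3] + val[-1]   # E1.val is two below the top, T.val on top
    del val[-3:]            # pop the three right-side entries
    val.append(a)           # push E.val where E1 used to be

# stack state after parsing "15 + 4" (None: '+' has no attribute)
val[:] = [15, None, 4]
reduce_E_plus_T()
print(val)                  # [19]
```

After the reduction the top of val holds the attribute of the left-side nonterminal, exactly the invariant noted in the text.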
A translation scheme is a context-free grammar in which attributes are associated with the grammar symbols and semantic actions enclosed between braces {} are inserted within the right sides of productions. We shall use translation schemes in this chapter as a useful notation for specifying translation during parsing.

The following translation scheme maps infix expressions into postfix form:

E → T R
R → addop T {print(addop.lexeme)} R1 | ε
T → num {print(num.val)}

Fig 3.20 Parse tree for 9-5+2 with actions attached

Figure 3.20 shows the parse tree for the input 9-5+2 with each semantic action attached as the appropriate child of the node corresponding to the left side of its production. In effect, we treat actions as though they are terminal symbols, a viewpoint that is a convenient mnemonic for establishing when the actions are to be executed. We have taken the liberty of showing the actual numbers and additive operator in place of the tokens num and addop. When performed in depth-first order, the actions in Fig. 3.20 print the postfix output 9 5 - 2 +.

When designing a translation scheme, we must observe some restrictions to ensure that an attribute value is available whenever an action refers to it. These restrictions, motivated by L-attributed definitions, ensure that an action does not refer to an attribute that has not yet been computed.

The easiest case occurs when only synthesized attributes are needed. For this case, we can construct the translation scheme by creating an action consisting of an assignment for each semantic rule and placing this action at the end of the right side of the associated production. For example, the production and semantic rule

T → T1 * F    T.val := T1.val × F.val

yield the following production and semantic action:

T → T1 * F {T.val := T1.val × F.val}

If we have both inherited and synthesized attributes, we must be more careful:
1. An inherited attribute for a symbol on the right side of a production must be computed in an action before that symbol.
2. An action must not refer to a synthesized attribute of a symbol to the right of the action.
3. A synthesized attribute for the non-terminal on the left can only be computed after all the attributes it references have been computed. The action computing such an attribute can usually be placed at the end of the right side of the production.

In the next two sections, we show how a translation scheme satisfying these three requirements can be implemented by generalizations of top-down and bottom-up parsers.

The following translation scheme (Fig. 3.18, a non-L-attributed syntax-directed definition) does not satisfy the first of these three requirements:

S → A1 A2 {A1.in := 1; A2.in := 2}
A → a {print(A.in)}
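Executing the scheme's actions during a recursive-descent parse gives exactly this postfix output. A sketch (tokenization and error handling omitted; function names are our own):

```python
# Sketch: running the actions of the translation scheme
#   E -> T R ;  R -> addop T {print(addop.lexeme)} R1 | eps ;
#   T -> num {print(num.val)}
# during a depth-first (recursive-descent) parse yields postfix.

def to_postfix(tokens):
    out, pos = [], 0

    def T():
        nonlocal pos
        out.append(tokens[pos])   # action {print(num.val)}
        pos += 1

    def R():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]; pos += 1
            T()
            out.append(op)        # action {print(addop.lexeme)}
            R()                   # R1

    T(); R()                      # E -> T R
    return " ".join(out)

print(to_postfix(["9", "-", "5", "+", "2"]))   # 9 5 - 2 +
```

Because each action sits between the same symbols as in the productions, the depth-first order of the parse performs them exactly when the tree traversal in Fig. 3.20 would.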
We find that the inherited attribute A1.in in the second production is not defined when an attempt is made to print its value during a depth-first traversal of the parse tree for the input string aa. That is, a depth-first traversal starts at S and visits the subtrees for A1 and A2 before the values of A1.in and A2.in are set. If the action defining the values of A1.in and A2.in is embedded before the A's on the right side of S → A1 A2, instead of after, then A.in will be defined each time print(A.in) occurs.

It is always possible to start with an L-attributed syntax-directed definition and construct a translation scheme that satisfies the three requirements above. The next example illustrates this construction. Given the input

E sub 1 .val

EQN places E, 1, and .val in the relative positions and sizes shown in Fig. 3.19. Notice that the subscript 1 is printed in a smaller size and font, and is moved down relative to E and .val.

Fig. 3.19 Syntax-directed placement of boxes

The syntax-directed definition is:

Production        Semantic Rules
S → B             B.ps := 10
                  S.ht := B.ht
B → B1 B2         B1.ps := B.ps
                  B2.ps := B.ps
                  B.ht := max(B1.ht, B2.ht)
B → B1 sub B2     B1.ps := B.ps
                  B2.ps := shrink(B.ps)
                  B.ht := disp(B1.ht, B2.ht)
B → text          B.ht := text.h × B.ps

After putting the actions in the right place, we obtain the translation scheme:

S → {B.ps := 10}
    B {S.ht := B.ht}
B → {B1.ps := B.ps}
    B1 {B2.ps := B.ps}
    B2 {B.ht := max(B1.ht, B2.ht)}
B → {B1.ps := B.ps}
    B1
    sub {B2.ps := shrink(B.ps)}
    B2 {B.ht := disp(B1.ht, B2.ht)}
B → text {B.ht := text.h × B.ps}

Fig.3.21 Translation scheme constructed from Fig. 3.20

The scheme follows the three requirements given above. For readability, each grammar symbol in a production is written on a separate line and the actions are shown to the right. Thus,

S → {B.ps := 10} B {S.ht := B.ht}

is written as

S → {B.ps := 10}
    B {S.ht := B.ht}

Note that the actions setting the inherited attributes B1.ps and B2.ps appear just before B1 and B2 on the right sides of productions.

Differences between S-attributed and L-attributed definitions:

S-attributed:
1. Uses synthesized attributes only.
2. Semantic actions can be placed at the end of rules.
3. Attributes can be evaluated during bottom-up parsing.
4. Both bottom-up and top-down parsing play a role.

L-attributed:
1. Uses both synthesized and inherited attributes.
2. Each inherited attribute is restricted to inherit only from the parent or left siblings.
3. Semantic actions can be placed anywhere on the right-hand side.
4. Attributes are evaluated by traversing the parse tree depth first, left to right.

Bottom-Up Evaluation of Inherited Attributes
The method presented here implements many (but not all) L-attributed definitions based on LR(1) grammars. It is a generalization of the bottom-up translation technique.

3.4.1 Removing Embedded Actions from Translation Schemes

In the translation scheme for mapping infix to postfix, we needed to embed actions at various places within the right sides. To begin our discussion of how inherited attributes can be handled bottom up, we introduce a transformation that makes all embedded actions in a translation scheme occur at the right ends of their productions.

The transformation inserts new marker nonterminals generating ε into the base grammar. We replace each embedded action by a distinct marker nonterminal M and attach the action to the end of the production M → ε. For example, the translation scheme

E → T R
R → + T {print('+')} R | - T {print('-')} R | ε
T → num {print(num.val)}

is transformed using marker nonterminals M and N into

E → T R
R → + T M R | - T N R | ε
T → num {print(num.val)}
M → ε {print('+')}
N → ε {print('-')}

The grammars in the two translation schemes accept exactly the same language and, by drawing a parse tree with extra nodes for the actions, we can show that the actions are performed in the same order. Actions in the transformed translation scheme terminate productions, so they can be performed just before the right side is reduced during bottom-up parsing.

3.4.2 Inheriting Attributes on the Parser Stack

A bottom-up parser reduces the right side of production A → XY by removing X and Y from the top of the parser stack and replacing them by A. Suppose X has a synthesized attribute X.s. Since the value of X.s is already on the parser stack before any reductions take place in the subtree below Y, this value can be inherited by Y. That is, if the inherited attribute Y.i is defined by the copy rule Y.i := X.s, then the value X.s can be used where Y.i is called for.

As an example, consider the declaration real p, q, r. We show how the value of attribute T.type can be accessed when the productions for L are applied. The translation scheme we wish to implement is

D → T {L.in := T.type} L
T → int {T.type := integer}
T → real {T.type := real}
L → {L1.in := L.in} L1 , id {addtype(id.entry, L.in)}
L → id {addtype(id.entry, L.in)}

Fig. 3.23 At each node for L, L.in = T.type

If we ignore the actions in the above translation scheme, the sequence of moves made by the parser on the input of Fig. 3.23 is as in Fig. 3.24. For clarity, we show the corresponding grammar symbol instead of a stack state, and the actual identifier instead of the token id.

Input          State        Production used
real p, q, r
p, q, r        real
p, q, r        T            T → real
, q, r         T p
, q, r         T L          L → id
, q, r         T L ,
, r            T L , q
, r            T L          L → L , id
r              T L ,
               T L , r
               T L          L → L , id
               D            D → TL
Fig.3.24
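The idea of section 3.4.2 can be sketched for the single reduction L → id (the symtab dictionary and helper name are illustrative, not from the text):

```python
# Sketch: during bottom-up parsing of D -> T L, the value T.type sits at
# a known depth in the val stack, so the copy rule L.in := T.type can be
# replaced by a direct stack access.

symtab = {}
val = ["real", "p"]        # val stack just before reducing by L -> id

def reduce_L_id():
    # addtype(val[top], val[top-1]): id.entry on top, T.type just below
    symtab[val[-1]] = val[-2]
    val[-1:] = [None]      # replace id's entry by L (no attribute kept)

reduce_L_id()
print(symtab)              # {'p': 'real'}
```

Because T stays on the stack below every right side for L, the same trick works for L → L , id with val[top-3] instead of val[top-1].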
The parser stack is implemented as a pair of arrays state and val. If state[i] is for grammar symbol X, then val[i] holds a synthesized attribute X.s. The contents of the state array are shown in Fig. 3.24. Note that every time the right side of a production for L is reduced in Fig. 3.24, T is in the stack just below the right side. We can use this fact to access the attribute value T.type.

The implementation in Fig. 3.25 uses the fact that attribute T.type is at a known place in the val stack, relative to the top. Let top and ntop be the indices of the top entry in the stack just before and after a reduction takes place, respectively. From the copy rules defining L.in, we know that T.type can be used in place of L.in.

When the production L → id is applied, id.entry is on top of the val stack and T.type is just below it. Hence, addtype(val[top], val[top-1]) is equivalent to addtype(id.entry, T.type). Similarly, since the right side of the production L → L , id has three symbols, T.type appears in val[top-3] when this reduction takes place. The copy rules involving L.in are eliminated, because the value of T.type in the stack is used instead.

Production    Code Fragment
D → T L
T → int       val[ntop] := integer
T → real      val[ntop] := real
L → L , id    addtype(val[top], val[top-3])
L → id        addtype(val[top], val[top-1])
Fig.3.25 The value of T.type is used in place of L.in

3.4.3 Replacing Inherited by Synthesized Attributes

It is sometimes possible to avoid the use of inherited attributes by changing the underlying grammar. For example, a declaration in Pascal can consist of a list of identifiers followed by a type, e.g., m, n : integer. A grammar for such declarations may include productions of the form

D → L : T
T → integer | char
L → L , id | id

Since the identifiers are generated by L but the type is not in the subtree for L, we cannot associate the type with an identifier using synthesized attributes alone. In fact, if nonterminal L inherits a type from the T to its right in the first production, we get a syntax-directed definition that is not L-attributed, so translations based on it cannot be done during parsing.

A solution to this problem is to restructure the grammar to include the type as the last element of the list of identifiers:

D → id L
L → , id L | : T
T → integer | char

Now the type can be carried along as a synthesized attribute L.type. As each identifier is generated by L, its type can be entered into the symbol table.

3.4.4 A Difficult Syntax-Directed Definition

Algorithm 3.3, for implementing inherited attributes during bottom-up parsing, extends to some, but not all, LR grammars. The L-attributed definition in Fig. 3.26 is based on a simple LR(1) grammar, but it cannot be implemented in this way during LR parsing. Nonterminal L in L → ε inherits the count of the number of 1's generated by S. Since the production L → ε is the first that a bottom-up parser would reduce by, the translator at that time cannot know the number of 1's in the input.

Production    Semantic Rule
S → L         L.count := 0
L → L1 1      L1.count := L.count + 1
L → ε         print(L.count)
Fig.3.26 Difficult syntax-directed definition

3.5 Top-Down Translation

In this section, L-attributed definitions will be implemented during predictive parsing.
We work with translation schemes rather than syntax-directed definitions so that we can be explicit about the order in which actions and attribute evaluations take place. We also extend the algorithm for left-recursion elimination to translation schemes with synthesized attributes.

3.5.1 Eliminating Left Recursion from a Translation Scheme

Since most arithmetic operators associate to the left, it is natural to use left-recursive grammars for expressions. We now extend the algorithm for eliminating left recursion to allow for attributes when the underlying grammar of a translation scheme is transformed. The transformation applies to translation schemes with synthesized attributes. The next example motivates the transformation.

Example 3.10: The new scheme produces the annotated parse tree of Fig. 3.29 for the expression 9-5+2. The arrows in the figure suggest a way of determining the value of the expression.

E → E1 + T  {E.val := E1.val + T.val}
E → E1 - T  {E.val := E1.val - T.val}
E → T       {E.val := T.val}
T → (E)     {T.val := E.val}
T → num     {T.val := num.val}

The action {R1.i := R.i - T.val} subtracts 5 from 9, and a similar action adds 2 to the value of 9-5, yielding the result R.i = 6 at the bottom node for R. The result is needed at the root as the value of E.val; the synthesized attribute s for R, not shown in Fig. 3.29, is used to copy the result up to the root.
For top-down parsing, we can assume that an action is executed at the time that a symbol in the same position would be expanded. Thus, in the second production of Fig. 3.28, the first action is performed after T has been fully expanded to terminals, and the second after R1 has been fully expanded.

E → T    {R.i := T.val}
     R    {E.val := R.s}
R → + T  {R1.i := R.i + T.val}
      R1  {R.s := R1.s}
R → - T  {R1.i := R.i - T.val}
      R1  {R.s := R1.s}
R → ε    {R.s := R.i}
T → ( E ) {T.val := E.val}
T → num   {T.val := num.val}

Fig. 3.28 Transformed translation scheme with right-recursive grammar
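The transformed scheme above maps directly onto a recursive-descent evaluator in which R's inherited attribute i becomes a function parameter and its synthesized attribute s becomes the return value. A minimal Python sketch (the tokenizer and function names are our own, not from the text):

```python
# Sketch of the scheme of Fig. 3.28: E -> T R, where R carries an inherited
# attribute i (the value computed so far) and returns a synthesized attribute s.

def tokenize(text):
    # Single-digit numbers and the symbols + - ( ) suffice for the example.
    return [ch for ch in text if not ch.isspace()]

def parse_E(tokens):
    t_val = parse_T(tokens)
    return parse_R(tokens, t_val)            # R.i := T.val; E.val := R.s

def parse_R(tokens, i):
    if tokens and tokens[0] in '+-':
        op = tokens.pop(0)
        t_val = parse_T(tokens)
        i1 = i + t_val if op == '+' else i - t_val   # R1.i := R.i +/- T.val
        return parse_R(tokens, i1)           # R.s := R1.s
    return i                                 # R -> epsilon: R.s := R.i

def parse_T(tokens):
    if tokens[0] == '(':
        tokens.pop(0)
        val = parse_E(tokens)
        tokens.pop(0)                        # consume ')'
        return val                           # T.val := E.val
    return int(tokens.pop(0))                # T -> num: T.val := num.val

print(parse_E(tokenize('9-5+2')))            # 6, the left-associative (9-5)+2
```

Note how left associativity is preserved even though the grammar is right-recursive: each partial result is passed down through the inherited attribute.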
An inherited attribute of a symbol must be computed by an action appearing before the symbol, and a synthesized attribute of the non-terminal on the left must be computed after all the attributes it depends on have been computed.
In order to adapt other left-recursive translation schemes for predictive parsing, we shall express the use of attributes R.i and R.s in Fig. 3.28 more abstractly. Suppose we have the following translation scheme

A → A1 Y  {A.a := g(A1.a, Y.y)}
A → X     {A.a := f(X.x)}

Each grammar symbol has a synthesized attribute written using the corresponding lower-case letter, and f and g are arbitrary functions. The generalization to additional A-productions and to productions with strings in place of symbols X and Y can be done as below.
The algorithm for eliminating left recursion constructs the following grammar

A → X R
R → Y R | ε

Taking the semantic actions into account, the transformed scheme becomes

A → X   {R.i := f(X.x)}
     R   {A.a := R.s}
R → Y   {R1.i := g(R.i, Y.y)}
     R1  {R.s := R1.s}
R → ε   {R.s := R.i}

The transformed scheme uses attributes i and s for R, as in Fig. 3.28. To see why the results are the same, consider the two annotated parse trees in Fig. 3.30. The value of R.i at the bottom is passed up unchanged as R.s, and it becomes the correct value of A.a at the root (R.s is not shown in Fig. 3.30(b)).

Fig. 3.30 Two ways of computing an attribute value

Example 3.11: If the syntax-directed definition for constructing syntax trees is converted into a translation scheme, then the productions and semantic actions for E become:

E → E1 + T  {E.nptr := mknode('+', E1.nptr, T.nptr)}
E → E1 - T  {E.nptr := mknode('-', E1.nptr, T.nptr)}
E → T       {E.nptr := T.nptr}

When left recursion is eliminated from this translation scheme, non-terminal E corresponds to A, the strings +T and -T in the first two productions correspond to Y, and non-terminal T in the third production corresponds to X. The transformed translation scheme is shown below.

E → T    {R.i := T.nptr}
     R    {E.nptr := R.s}
R → + T  {R1.i := mknode('+', R.i, T.nptr)}
      R1  {R.s := R1.s}
R → - T  {R1.i := mknode('-', R.i, T.nptr)}
      R1  {R.s := R1.s}
R → ε    {R.s := R.i}
T → ( E ) {T.nptr := E.nptr}
T → id    {T.nptr := mkleaf(id, id.entry)}
T → num   {T.nptr := mkleaf(num, num.val)}

Fig. 3.30 Transformed translation scheme for constructing syntax trees

Figure 3.30 shows how the actions construct a syntax tree for a-4+c. Synthesized attributes are shown to the right of the node for a grammar symbol, and inherited attributes are shown to the left. A leaf in the syntax tree is constructed by actions associated with the productions T → id and T → num. At the leftmost T, attribute T.nptr points to the leaf for a. A pointer to the node for a is inherited as attribute R.i on the right side of E → T R.
When the production R → - T R1 is applied at the right child of the root, R.i points to the node for a, and T.nptr to the node for 4. The node for a-4 is constructed by applying mknode to the minus operator and these pointers. Finally, when production R → ε is applied, R.i points to the root of the entire syntax tree. The entire tree is returned through the s attributes of the nodes for R (not shown in the figure) until it becomes the value of E.nptr.

Fig. 3.32 Use of inherited attributes to construct syntax trees

3.5.2 Design of a Predictive Translator

The following algorithm generalizes the construction of predictive parsers to implement a translation scheme based on a grammar suitable for top-down parsing.

Algorithm: Construction of a predictive syntax-directed translator.
Input: A syntax-directed translation scheme with an underlying grammar suitable for predictive parsing.
Output: Code for a syntax-directed translator.
Method: The technique is a modification of the predictive parser construction.
1. For each non-terminal A, construct a function that has a formal parameter for each inherited attribute of A and that returns the values of the synthesized attributes of A (possibly as a record, as a pointer to a record with a field for each attribute, or using the call-by-reference mechanism for passing parameters). For simplicity, we assume that each non-terminal has just one synthesized attribute. The function for A has a local variable for each attribute of each grammar symbol that appears in a production for A.
2. The function for non-terminal A decides what production to use based on the current input symbol.
3. The code associated with each production does the following: we consider the tokens, non-terminals, and actions on the right side of the production from left to right.
   i. For token X with synthesized attribute x, save the value of x in the variable declared for X.x; then generate a call to match token X and advance the input.
   ii. For non-terminal B, generate an assignment c := B(b1, b2, ..., bk) with a function call on the right side, where b1, b2, ..., bk are the variables for the inherited attributes of B and c is the variable for the synthesized attribute of B.
   iii. For an action, copy the code into the parser, replacing each reference to an attribute by the variable for that attribute.

3.5.3 Activation Records

Information needed by a single execution of a procedure is managed using a contiguous block of storage called an activation record, consisting of the collection of fields shown in Fig. 3.33.
Not all languages, nor all compilers, use all of these fields; often registers can take the place of one or more of them. For languages like Pascal and C it is customary to push the activation record of a procedure on the run-time stack when the procedure is called and to pop the activation record off the stack when control returns to the caller.
The purpose of the fields of an activation record is as follows, starting from the field for temporaries.
1. Temporary values, such as those arising in the evaluation of expressions, are stored in the field for temporaries.
2. The field for local data holds data that is local to an execution of a procedure. The layout of this field is discussed below.
3. The field for saved machine status holds information about the state of the machine just before the procedure is called. This information includes the values of the program counter and machine registers that have to be restored when control returns from the procedure.
4. The optional access link is used to refer to nonlocal data held in other activation records. For a language like Fortran access links are not needed, because nonlocal data is kept in a fixed place. Access links, or the related "display" mechanism, are needed for Pascal.
5. The optional control link points to the activation record of the caller.

returned value
actual parameters
optional control link
optional access link
saved machine status
local data
temporaries
Fig. 3.33 A general activation record

6. The field for actual parameters is used by the calling procedure to supply parameters to the called procedure. We show space for parameters in the activation record, but in practice parameters are often passed in machine registers for greater efficiency.
7. The field for the returned value is used by the called procedure to return a value to the calling procedure. Again, in practice this value is often returned in a register for greater efficiency.
The sizes of each of these fields can be determined at the time a procedure is called. In fact, the sizes of almost all fields can be determined at compile time. An exception occurs if a procedure may have a local array whose size is determined by the value of an actual parameter, available only when the procedure is called at run time.

3.6 Run Time Environment

When one starts running a program, some data is only available at run time, so we must relate the static text of a program to the actions that must occur at run time to implement the program.
We need to understand the relationship between names and data objects (address and value).
Allocation and de-allocation are managed by the run-time support package, which consists of routines loaded with the generated target code.
Each execution of a procedure is referred to as an activation of the procedure.
If the procedure is recursive, several of its activations may be alive at the same time.

3.6.1 Procedures

A procedure definition is a declaration that associates an identifier (the procedure name) with a statement (the procedure body).
When a procedure name appears in an executable statement, the procedure is called at that point.
Formal parameters are the ones that appear in the declaration. Actual parameters are the ones that appear when the procedure is called.

3.6.2 Activation Trees

Control flows sequentially.
Execution of a procedure starts at the beginning of its body.
It returns control to the place where the procedure was called from.
A tree, called an activation tree, can be used to depict the way control enters and leaves activations.
- The root represents the activation of the main program.
- Each node represents an activation of a procedure.
- The node a is the parent of b if control flows from a to b.
- The node a is to the left of node b if the lifetime of a occurs before that of b.
Each execution of a procedure body is referred to as an activation of the procedure. The lifetime of an activation of a procedure p is the sequence of steps during the execution of the procedure body. If a and b are procedure activations, then their lifetimes are either non-overlapping or nested. A procedure is recursive if a new activation can begin before an earlier activation of the same procedure has ended.
- The activation tree above is for a program that sorts numbers.
- It calls the read function (r). Then it calls the quicksort function (q(1,9)), which in turn calls the partition function (p(1,9)), which partitions the array, returns index 4, and results in further calls as shown.
The flow of control of the call quicksort(1,9) would be:
- execution begins
- enter readarray
- exit readarray
- enter quicksort(1,9)
- enter partition(1,9)
- exit partition(1,9)
- enter quicksort(1,3)
- exit quicksort(1,3)
- enter quicksort(5,9)
- exit quicksort(5,9)
- exit quicksort(1,9)

3.6.4 Control Stacks
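A control stack keeps track of live procedure activations: an activation is pushed when it begins and popped when it ends, so at any moment the stack holds the path from the root of the activation tree down to the currently executing activation. A minimal Python sketch (the function names are our own) replaying the quicksort trace above:

```python
# Control stack sketch: push an activation on "enter", pop it on "exit".
# The trace is the quicksort(1,9) call sequence from Section 3.6.2.

control_stack = []

def enter(activation):
    control_stack.append(activation)

def leave(activation):
    # Activations end in last-in, first-out order, so only the top can end.
    assert control_stack[-1] == activation
    control_stack.pop()

enter('sort')                       # main program
enter('r'); leave('r')              # readarray
enter('q(1,9)')
enter('p(1,9)'); leave('p(1,9)')    # partition
enter('q(1,3)'); leave('q(1,3)')
enter('q(5,9)'); leave('q(5,9)')
leave('q(1,9)')
print(control_stack)                # ['sort']: only the main program is still live
```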
applies when the name appears in the text of a program. The portion of the program to which a declaration applies is called the scope of that declaration. There are two types of scope: local and non-local. At compile time, the symbol table can be used to find the declaration that applies to an occurrence of a name; a symbol table entry is created whenever a declaration is encountered.

3.6.6 Binding of Names

The same name may denote different data objects (storage locations) at runtime. The term environment refers to a function that maps a name to a storage location, and the association is referred to as a binding. The state refers to a function that maps a storage location to the value held there.
An assignment changes the state but not the environment. A binding is the dynamic counterpart of a declaration. A local name in a procedure can have a different binding in each activation of the procedure.

3.6.7 Static and Dynamic Notions

Static Notion               Dynamic Counterpart
Definition of a procedure   Activations of the procedure
Declaration of a name       Bindings of the name
Scope of a declaration      Lifetime of a binding

Code
Static Data
Heap
  ↓
  ↑
Stack

3.6.8 Storage Organization

Most block-structured languages require manipulation of runtime structures to maintain efficient access to appropriate data and machine resources.

3.6.9 Subdivision of Runtime Memory

Runtime storage can be subdivided to hold:
i. Target code: static (as its size can be determined at compile time)
ii. Static data objects: static
iii. Dynamic data objects: heap
iv. Automatic data objects: stack
An advantage of statically allocating a data object is that its address can be compiled into the target code.

Static Allocation

Statically allocated names are bound to storage at compile time. Storage bindings of statically allocated names never change, so even if a name is local to a procedure, the name is always bound to the same storage. The compiler uses the type of a name (retrieved from the symbol table) to determine the storage size required. The required number of bytes (possibly aligned) is set aside for the name. The address of the storage is fixed at compile time.

Limitations:
- The size required must be known at compile time.
- Recursive procedures cannot be implemented, as all locals are statically allocated.
- No data structure can be created dynamically, as all data is static.

Stack-Dynamic Allocation

- Storage is organized as a stack.
- Activation records are pushed and popped.
- Locals and parameters are contained in the activation records for the call. This means locals are bound to fresh storage on every call.
- If we have a stack growing downwards, we just need a stack_top pointer.
- To allocate a new activation record, we just increase stack_top.
- To deallocate an existing activation record, we just decrease stack_top.

Heap Allocation

Some languages do not have tree-structured allocation. In these cases, activations have to be allocated on the heap. This allows strange situations, like callee activations that live longer than their callers' activations; this is not common.
The heap is used for allocating space for objects created at run time, for example, nodes of dynamic data structures such as linked lists and trees.

3.6.10 Parameter Passing Techniques

Call by value:

Call by reference:

In call-by-reference evaluation (also referred to as pass-by-reference), a function receives an implicit reference to a variable used as argument, rather than a copy of its value. This typically means that the function can modify the variable used as argument.

Example:
procedure swap(var x, y: integer);
var
  temp: integer;
begin
  temp := x;
  x := y;
  y := temp;
end;
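The Pascal swap above works because var parameters are passed by reference. The contrast with call by value can be sketched in Python, which passes object references by value; a one-element list serves as a mutable cell to emulate a reference parameter (this encoding is our own):

```python
# Call by value vs. (emulated) call by reference.

def swap_by_value(x, y):
    x, y = y, x          # rebinds locals only; the caller's variables are untouched

def swap_by_reference(x_cell, y_cell):
    x_cell[0], y_cell[0] = y_cell[0], x_cell[0]   # updates the shared storage

a, b = 1, 2
swap_by_value(a, b)
print(a, b)                    # 1 2, unchanged

a_cell, b_cell = [1], [2]
swap_by_reference(a_cell, b_cell)
print(a_cell[0], b_cell[0])    # 2 1, swapped
```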
environment undefined, depending on which of the aliased arguments is copied back first.
The results of call by reference and call by copy-restore also differ when there is an exception during execution of the function. When the reference is passed to the callee uninitialized, this evaluation strategy may be called call-by-result.

Call by name:

Call by need:
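As standard background for these two strategies: call by name re-evaluates the argument expression at every use, while call by need evaluates it at most once and caches the result. A Python sketch contrasting the two (the thunk encoding and all names are our own):

```python
# Call by name vs. call by need, with arguments modeled as zero-argument thunks.

calls = {'n': 0}

def arg():                        # stands for an expensive argument expression
    calls['n'] += 1
    return 21

def double_by_name(thunk):
    return thunk() + thunk()      # two uses: the expression is evaluated twice

def double_by_need(thunk):
    cache = []
    def force():
        if not cache:
            cache.append(thunk()) # evaluated only on first use, then cached
        return cache[0]
    return force() + force()      # two uses: the expression is evaluated once

calls['n'] = 0
print(double_by_name(arg), calls['n'])   # 42 2
calls['n'] = 0
print(double_by_need(arg), calls['n'])   # 42 1
```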
GATE QUESTIONS
Q.1 In a bottom-up evaluation of a syntax-directed definition, inherited attributes can
a) always be evaluated
b) be evaluated only if the definition is L-attributed
c) be evaluated only if the definition has synthesized attributes
d) never be evaluated
[GATE-2003]

Q.2 Consider the translation scheme shown below:
S → T R
R → + T {print('+');} R | ε
T → num {print(num.val);}
Here, num is a token that represents an integer and num.val represents the corresponding integer value. For an input string '9 + 5 + 2', this translation scheme will print
a) 9 + 5 + 2    b) 9 5 + 2 +
c) 9 5 2 + +    d) + + 9 5 2
[GATE-2003]

Q.3 Consider the syntax-directed definition shown below.
S → id := E  {gen(id.place = E.place;);}
E → E1 + E2  {t = newtemp(); gen(t = E1.place + E2.place;); E.place = t;}
E → id       {E.place = id.place;}
Here, gen is a function that generates the output code, and newtemp is a function that returns the name of a new temporary variable on every call. Assume that ti's are the temporary variable names generated by newtemp. For the statement 'X = Y + Z', the 3-address code sequence generated by this definition is
a) X = Y + Z
b) t1 = Y + Z; X = t1
c) t1 = Y; t2 = t1 + Z; X = t2
d) t1 = Y; t2 = Z; t3 = t1 + t2; X = t3
[GATE-2003]

Q.4 Consider the grammar rule E → E1 - E2 for arithmetic expressions. The code generated is targeted to a CPU having a single user register. The subtraction operation requires the first operand to be in the register. If E1 and E2 do not have any common sub-expression, in order to get the shortest possible code
a) E1 should be evaluated first
b) E2 should be evaluated first
c) evaluation of E1 and E2 should necessarily be interleaved
d) order of evaluation of E1 and E2 is of no consequence
[GATE-2004]

Q.5 Consider the following translation scheme:
S → E R
R → * E {print('*');} R | ε
E → F + E {print('+');} | F
F → (S) | id {print(id.value);}
Here, id is a token that represents an integer and id.value represents the corresponding integer value. For an input '2*3+4', this translation scheme prints
a) 2 * 3 + 4    b) 2 * + 3 4
c) 2 3 * 4 +    d) 2 3 4 + *
[GATE-2006]

Q.6 In a simplified computer, the instructions are:
OP Ri, Rj - performs Ri OP Rj and stores the result in register Ri.
OP m, Rj - performs val OP Rj and stores the result in Rj; val denotes the content of memory location m.
MOV m, Ri - moves the content of memory location m to register Ri.
MOV Ri, m - moves the content of register Ri to memory location m.
The computer has only two registers, and OP is either ADD or SUB. Consider the following basic block:
t1 = a + b
t2 = c + d
t3 = e - t2
t4 = t1 - t3
Assume that all operands are initially in memory. The final value of the computation should be in memory. What is the minimum number of MOV instructions in the code generated for this basic block?
a) 2    b) 3
c) 5    d) 6
[GATE-2006]

Q.7 [The statement and options a)-c) of this question are given as figures.]
d) None of these
[GATE-2011]

Q.8 Consider the expression tree shown. Each leaf represents a numerical value, which can either be 0 or 1. Over all possible choices of the values at the leaves, the maximum possible value of the expression represented by the tree is ___.
Q.11 Consider the following Syntax Directed Translation Scheme (SDTS), with non-terminals {S, A} and terminals {a, b}.
S → aA {print 1}
S → a {print 2}
A → Sb {print 3}
Using the above SDTS, the output printed by a bottom-up parser, for the input aab, is:
a) 1 3 2    b) 2 2 3
c) 2 3 1    d) syntax error
[GATE-2016]
ANSWER KEY:
1    2    3    4    5    6    7    8    9    10   11
(b)  (b)  (b)  (b)  (d)  (b)  (b)  6    (c)  9    (c)
EXPLANATIONS
Q.1 (b)
A syntax-directed definition is a generalization of a context-free grammar in which each grammar symbol has an associated set of attributes. This set of attributes for a grammar symbol is partitioned into two subsets, called the synthesized and inherited attributes of that grammar symbol. Each production rule has an associated set of semantic rules. Semantic rules set up dependencies between attributes, which can be represented by a dependency graph; this dependency graph determines the evaluation order of the semantic rules. Evaluation of a semantic rule defines the value of an attribute, but a semantic rule may also have side effects, such as printing a value. In a bottom-up evaluation, inherited attributes can be evaluated only if the definition is L-attributed.
Example: symbol T is associated with a synthesized attribute type; symbol L is associated with an inherited attribute in.

Q.2 (b)
Let's make a tree representing the values to get the required output. The translation scheme prints 9 5 + 2 +.

Q.3 (b)
Three-address code has the general form X := Y op Z, where X, Y and Z represent names, contents, or compiler-generated temporaries, and op represents an operator (a fixed- or floating-point arithmetic operator, or a logical operator). In the given definition, the production E → id sets E.place = id.place without emitting any code, so Y and Z are used directly as operands. The rule for E → E1 + E2 then emits t1 = Y + Z, and S → id = E emits X = t1. The generated sequence is therefore t1 = Y + Z; X = t1.

Q.4 (b)
This can be answered by evaluating the options. If E1 is evaluated first, then to evaluate E2 and bring the operands into the single register for subtraction, extra move and store operations are needed. If instead E2 is evaluated first, E1 can be evaluated next, leaving its value in the register as the first operand, and the subtraction is performed directly.

Q.5 (d)
Let's construct a parse tree to get the string. For the input '2*3+4', this translation scheme prints 2 3 4 + *.
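The print order in Q.2 can also be checked mechanically with a recursive-descent sketch of the scheme (our own encoding; each action is executed exactly where it appears in the production):

```python
# Simulate S -> T R ; R -> + T {print('+')} R | epsilon ; T -> num {print(num.val)}

def translate(tokens, out):
    def T():
        out.append(tokens.pop(0))        # T -> num {print(num.val)}
    def R():
        if tokens and tokens[0] == '+':
            tokens.pop(0)                # match '+'
            T()
            out.append('+')              # the action between T and the nested R
            R()
    T()                                  # S -> T R
    R()
    return ' '.join(out)

print(translate('9 + 5 + 2'.split(), []))   # 9 5 + 2 +
```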
Q.6 (b)
This is how we obtain 3 MOV instructions in the code generated for the basic block.

Q.7 (b)

Q.8 (6)
As per the tree above, where the leaves have been given values, the maximum possible value of the expression represented by the tree is 6.
4 INTERMEDIATE CODE GENERATION
the nodes of the tree in which a node appears immediately after its children. The postfix notation for the syntax tree in Fig. 4.2(a) is a b c uminus * b c uminus * + assign.

4.2 Three-Address Code

Three-address code is a sequence of statements of the general form
x := y op z
where x, y, z are names, constants, or compiler-generated temporaries; op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on Boolean-valued data. Note that no built-up arithmetic expressions are permitted, as there is only one operator on the right side of a statement. Thus a source language expression like x + y*z might be translated into the sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names.

4.2.1 Types of Three-Address Statements

Three-address code is a form of IR similar to assembler for an imaginary machine. Three-address statements are similar to assembly code: statements can have symbolic labels, and there are statements for flow of control. A symbolic label represents the index of a three-address statement in the array holding the intermediate code. Actual indices can be substituted for the labels either by making a separate pass, or by using "backpatching". Each three-address code instruction has the form
x := y op z
1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary operations include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual sequence.
6. param x and call p, n for procedure calls, and return y, where y, representing a returned value, is optional. Their typical use is as the sequence of three-address statements
param x1
param x2
...
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n, indicating the number of actual parameters in "call p, n", is not redundant because calls can be nested.
7. Indexed assignments of the form x := y[i] and x[i] := y. The first of these sets x to the value in the location i memory units beyond location y. The statement x[i] := y sets the contents of the location i units beyond x to the value of y. In both these instructions, x, y, and i refer to data objects.
8. Address and pointer assignments of the form x := &y, x := *y, and *x := y. The first of these sets the value of x to the location of y. Presumably y is a name, perhaps a temporary, that denotes an expression with an l-value such as A[i,j], and x is a pointer name or
temporary. That is, the r-value of x is the l-value (location) of some object. In the statement x := *y, presumably y is a pointer or a temporary whose r-value is a location. The r-value of x is made equal to the contents of that location. Finally, *x := y sets the r-value of the object pointed to by x to the r-value of y.
The choice of allowable operators is an important issue in the design of an intermediate form. The operator set must clearly be rich enough to implement the operations in the source language. A small operator set is easier to implement on a new target machine. However, a restricted instruction set may force the front end to generate long sequences of statements for some source language operations. The optimizer and code generator may then have to work harder if good code is to be generated.

4.2.2 Syntax-Directed Translation into Three-Address Code

We give below an S-attributed definition generating three-address code from assignments. When three-address code is generated, temporary names are made up for the interior nodes of a syntax tree. The value of non-terminal E on the left side of E → E1 + E2 will be computed into a new temporary t. In general, the three-address code for id := E consists of code to evaluate E into some temporary t, followed by the assignment id.place := t. If an expression is a single identifier, say y, then y itself holds the value of the expression.
We use the following attributes:
1. S.code represents the 3-address code for the assignment S.
2. E.place is the name that will hold the value of E.
3. E.code represents the sequence of 3-address statements evaluating E.
4. id.place is the name that holds the value of id, which may be found by consulting the symbol table at id.entry.
The S-attributed definition in Fig. 4.3 generates three-address code for assignment statements. Given input a := b * -c + b * -c, it produces the code in Fig. (a). The synthesized attribute S.code represents the three-address code for the assignment S; the non-terminal E has the two attributes E.place and E.code. The function newtemp returns a sequence of distinct names t1, t2, ... in response to successive calls.
For convenience, we use the notation gen(x ':=' y '+' z) in Fig. 4.3 to represent the three-address statement x := y + z. Expressions appearing instead of variables like x, y, and z are evaluated when passed to gen, and quoted operators or operands, like +, are taken literally. In practice, three-address statements might be sent to an output file, rather than built up into the code attributes.
Flow-of-control statements can be added to the language of assignments in Fig. 4.3 by productions and semantic rules like the ones for while statements. The code for S → while E do S1 is generated using new attributes S.begin and S.after to mark the first statement in the code for E

PRODUCTION     SEMANTIC RULES
S → id := E    S.code := E.code || gen(id.place ':=' E.place)
E → E1 + E2    E.place := newtemp;
               E.code := E1.code || E2.code || gen(E.place ':=' E1.place '+' E2.place)
E → E1 * E2    E.place := newtemp;
               E.code := E1.code || E2.code || gen(E.place ':=' E1.place '*' E2.place)
E → - E1       E.place := newtemp;
               E.code := E1.code || gen(E.place ':=' 'uminus' E1.place)
E → (E1)       E.place := E1.place;
               E.code := E1.code
E → id         E.place := id.place;
               E.code := ''
Fig. 4.3 Syntax-directed definition to produce three-address code for assignments

and the statement following the code for S, respectively. These attributes represent labels created by a function newlabel that returns a new label every time it is called. Note that S.after becomes the label of the statement that comes after the code for the while statement. We assume that a non-zero expression represents true; that is, when the value of E becomes zero, control leaves the while statement. Expressions that govern the flow of control may in general be boolean expressions containing relational and logical operators.
For example, associated with the production E → - E1 is the semantic rule
E.code := E1.code || 'uminus'
In general, the intermediate form produced by the syntax-directed translations in this chapter can be changed by making similar modifications to the semantic rules.

4.2.3 Implementations of Three-Address Statements

A three-address statement is an abstract form of intermediate code. In a compiler, these statements can be implemented as records with fields for the operator and the operands. Three such representations are quadruples, triples, and indirect triples.

4.2.4 Quadruples
Three-address statements can also be represented by records with only three fields: op, arg1 and arg2, as in Fig. 4.5(b).

    op       arg1   arg2   result
(0) uminus   c             t1
(1) *        b      t1     t2
(2) uminus   c             t3
(3) *        b      t3     t4
(4) +        t2     t4     t5
(5) :=       t5            a

The triples in Fig. 4.5(b) correspond to the quadruples in Fig. 4.5(a). Note that the copy statement a := t5 is encoded in the triple representation by placing a in the arg1 field and using the operator assign.
A ternary operation like x[i] := y requires two entries in the triple structure, as shown in Fig. 4.6(a), while x := y[i] is naturally represented as two operations in Fig. 4.6(b).

    op    arg1   arg2
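The definition of Fig. 4.3 can be exercised as a small generator whose output matches the quadruple contents above. A Python sketch (the AST encoding and function names are our own):

```python
# Sketch of the S-attributed definition of Fig. 4.3: each expression node
# yields (place, code), where code is a list of three-address statements.

temp_count = 0

def newtemp():
    global temp_count
    temp_count += 1
    return f't{temp_count}'

def gen_expr(node):
    # node is ('id', name) | ('uminus', e) | (op, e1, e2) with op '+' or '*'
    if node[0] == 'id':
        return node[1], []                       # E -> id: no code emitted
    if node[0] == 'uminus':
        p1, c1 = gen_expr(node[1])
        t = newtemp()
        return t, c1 + [f'{t} := uminus {p1}']   # E -> - E1
    p1, c1 = gen_expr(node[1])
    p2, c2 = gen_expr(node[2])
    t = newtemp()
    return t, c1 + c2 + [f'{t} := {p1} {node[0]} {p2}']   # E -> E1 op E2

def gen_assign(target, expr):
    p, c = gen_expr(expr)
    return c + [f'{target} := {p}']              # S -> id := E

# a := b * -c + b * -c
tree = ('+', ('*', ('id', 'b'), ('uminus', ('id', 'c'))),
             ('*', ('id', 'b'), ('uminus', ('id', 'c'))))
code = gen_assign('a', tree)
print('\n'.join(code))   # t1 := uminus c ... a := t5, as in the quadruples
```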
GATE QUESTIONS
Q.1 Consider the following C code segment:
for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++) {
        if (i % 2) {
            x += (4*j + 5*i);
            y += (7 + 4*j);
        }
    }
}
Which one of the following is false?
a) The code contains loop-invariant computation
b) There is scope of common sub-expression elimination in this code
c) There is scope of strength reduction in this code
d) There is scope of dead code elimination in this code
[GATE-2006]

Q.2 Some code optimizations are carried out on the intermediate code because
a) they enhance the portability of the compiler to other target processors
b) program analysis is more accurate on intermediate code than on machine code
c) the information from data flow analysis cannot be used for optimization
d) the information from the front end cannot be used for optimization
[GATE-2008]

Q.3 Which one of the following is FALSE?
a) A basic block is a sequence of instructions where control enters the sequence at the beginning and exits at the end.
b) Available expression analysis can be used for common subexpression elimination.
c) Live variable analysis can be used for dead code elimination.
d) x = 4 * 5 ⇒ x = 20 is an example of common subexpression elimination.
[GATE-2014]

Q.4 Which one of the following is NOT performed during compilation?
a) Dynamic memory allocation
b) Type checking
c) Symbol table management
d) Inline expansion
[GATE-2014]

Q.5 For a C program accessing X[i][j][k], the following intermediate code is generated by a compiler. Assume that the size of an integer is 32 bits and the size of a character is 8 bits.
t0 = i * 1024
t1 = j * 32
t2 = k * 4
t3 = t1 + t0
t4 = t3 + t2
t5 = X[t4]
Which one of the following statements about the source code for the C program is CORRECT?
a) X is declared as "int X[32][32][8]".
b) X is declared as "int X[4][1024][32]".
c) X is declared as "char X[4][32][8]".
d) X is declared as "char X[32][16][2]".
[GATE-2014]

Q.6 Which of the following statements are CORRECT?
1) Static allocation of all data areas by a compiler makes it impossible to implement recursion.
2) Automatic garbage collection is essential to implement recursion.
3) Dynamic allocation of activation records is essential to implement recursion.
4) Both heap and stack are essential to implement recursion.
a) 1 and 2 only b) 2 and 3 only
c) 3 and 4 only d) 1 and 3 only
[GATE -2014]

Q.7 Consider the basic block given below.
    a = b + c
    c = a + d
    d = b + c
    e = d - b
    a = e + b
The minimum number of nodes and edges present in the DAG representation of the above basic block respectively are
a) 6 and 6 b) 8 and 10
c) 9 and 12 d) 4 and 4
[GATE -2014]

Q.8 A variable x is said to be live at a statement Si in a program if the following three conditions hold simultaneously:
1. There exists a statement Sj that uses x
2. There is a path from Si to Sj in the flow graph corresponding to the program
3. The path has no intervening assignment to x, including at Si and Sj
(The control flow graph accompanying this question is not reproduced here.)
The variables which are live both at the statement in basic block 2 and at the statement in basic block 3 of the above control flow graph are
a) p, s, u b) r, s, u
c) r, u d) q, v
[GATE -2015]

Q.9 The least number of temporary variables required to create a three-address code in static single assignment form for the expression q + r/3 + s - t * 5 + u * v/w is ________.
[GATE -2015]

Q.10 In the context of abstract-syntax-tree (AST) and control-flow-graph (CFG), which one of the following is TRUE?
a) In both AST and CFG, let node N2 be the successor of node N1. In the input program, the code corresponding to N2 is present after the code corresponding to N1.
b) For any input program, neither AST nor CFG will contain a cycle.
c) The maximum number of successors of a node in an AST and a CFG depends on the input program.
d) Each node in AST and CFG corresponds to at most one statement in the input program.
[GATE -2015]

Q.11 Consider the intermediate code given below.
    (1) i = 1
    (2) j = 1
    (3) t1 = 5 * i
    (4) t2 = t1 + j
    (5) t3 = 4 * t2
    (6) t4 = t3
    (7) a[t4] = -1
    (8) j = j + 1
    (9) if j <= 5 goto (3)
    (10) i = i + 1
    (11) if i < 5 goto (2)
The number of nodes and edges in the control-flow-graph constructed for the above code, respectively, are
a) 5 and 7 b) 6 and 7
c) 5 and 5 d) 7 and 8
[GATE -2015]

Q.12 Consider the following code segment.
    x = u - t;
    y = x * v;
    x = y + w;
    y = t - z;
    y = x * y;
The minimum number of total variables required to convert the above code segment to static single assignment form is _________.
[GATE -2016]

Q.13 A student wrote two context-free grammars G1 and G2 for generating a single C-like array declaration. The dimension of the array is at least one. For example, int a[10][3];
The grammars use D as the start symbol, and use six terminal symbols int ; id [ ] num.

    Grammar G1          Grammar G2
    D → int L;          D → int L;
    L → id [E           L → id E
    E → num]            E → E [num]
    E → num][E          E → [num]

Which of the grammars correctly generate the declaration mentioned above?
a) Both G1 and G2
b) Only G1
c) Only G2
d) Neither G1 nor G2
[GATE -2016]

Q.14 Consider the following intermediate program in three address code
    p = a - b
    q = p * c
    p = u * v
    q = p + q
Which one of the following corresponds to a static single assignment form of the above code?

a)  p1 = a - b        b)  p3 = a - b
    q1 = p1 * c           q4 = p3 * c
    p1 = u * v            p4 = u * v
    q1 = p1 + q1          q5 = p4 + q4

c)  p1 = a - b        d)  p1 = a - b
    q1 = p2 * c           q1 = p * c
    p3 = u * v            p2 = u * v
    q2 = p4 + q3          q2 = p + q
[GATE -2017]
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
d b d a a d a c 3 c b 7 a b
EXPLANATIONS
Q.1 (d)
In the given code, 4*j is a common sub-expression of the two assignments, 5*i is invariant in the inner loop, and the multiplications can be strength-reduced, so options (a), (b) and (c) all hold. There is no dead code in the given code segment, so (d) is false.

Q.2 (b)
Code optimizations are generally carried out on the intermediate code because program analysis, and hence optimization, is more accurate on intermediate code than on machine code. Working on intermediate code also enhances the portability of the compiler to other target processors.

Q.3 (d)
x = 4 * 5 ⇒ x = 20 is not an example of common sub-expression elimination but of constant folding. In constant folding, an expression consisting of constants is replaced by its final value at compile time, rather than being computed at run time.

Q.4 (a)
Symbol table management is done during compilation to store and retrieve information about tokens. Type checking is one of the checks performed during the semantic analysis phase of compilation. Inline expansion is a compiler optimization that replaces a function call by the body of the respective function. Dynamic memory allocation happens when an executing program requests the operating system to give it a block of main memory, so it is performed during run time, not during compile time.
Option (a) is the answer.

Q.5 (a)
It is given that the size of int is 4 B and of char is 1 B, and the memory is byte addressable.
Let the array be declared as Type X[A][B][C] (where Type = int/char and A, B, C are natural numbers).
From t0 = i*1024, we conclude that B*C*(size of Type) = 1024.
From t1 = j*32, we conclude that C*(size of Type) = 32.
From t2 = k*4, we conclude that size of Type = 4.
⇒ Type = int, C = 8, and B = 32.

Q.6 (d)
To implement recursion, activation records should be allocated dynamically; this dynamic allocation is done from the runtime stack. The heap is essential to allocate memory for data structures at run time, not for recursion.
So statements 1 and 3 are correct.

Q.7 (a)
The given basic block can be rewritten (in terms of the initial values of b, c, d) as
    a = b + c    ⟹   a = b + c
    c = a + d    ⟹   c = b + c + d
    d = b + c    ⟹   d = b + b + c + d = 2b + c + d
    e = d - b    ⟹   e = b + b + c + d - b = b + c + d
    a = e + b    ⟹   a = b + b + c + d = 2b + c + d
From the above simplification it is visible that e is the same as c, and the final value of a is the same as d. So the final basic block can be written as follows:
    a = b + c
    c = a + d
    d = 2b + c + d
    e = c
    a = d
The DAG generated for the above basic block has 6 nodes and 6 edges. (The DAG figure is not reproduced here.)
Q.10 (c)
Option (a) is not true when the CFG contains a cycle.
Option (b) is false, as a CFG can contain a cycle.
Option (d) is false, as a single node can contain a block of statements.
Q.14 (b)
Considering each option:
In option (a), p1 and q1 are assigned again, which is not allowed under static single assignment.
In option (c), in the 2nd line it should be q1 = p1 * c.
In option (d), in the 2nd line it should be q1 = p1 * c.
The correct code is
    p3 = a - b
    q4 = p3 * c
    p4 = u * v
    q5 = p4 + q4
ASSIGNMENT QUESTIONS
Q.1 Cross compiler is a compiler
a) Which is written in a language that is different from the source language
b) That generates object code for its host machine
c) Which is written in a language that is the same as the source language
d) That runs on one machine but produces object code for another machine

Q.2 Incremental compiler is a compiler
a) Which is written in a language that is different from the source language
b) That generates object code for its host machine
c) Which is written in a language that is the same as the source language
d) That allows a modified portion of a program to be recompiled

Q.3 For which of the following reasons is an interpreter preferred to a compiler?
a) It takes less time to execute.
b) It is much helpful in the initial stages of program development.
c) Debugging can be faster and easier.
d) It needs less computer resources.

Q.4 For which of the following reasons is a compiler preferable to an interpreter?
a) It can generate stand-alone programs that often take less time for execution.
b) It is much helpful in the initial stages of program development.
c) Debugging can be faster and easier.
d) If one changes a statement, only that statement needs recompilation.

Q.5 The cost of developing a compiler is proportional to the
a) Complexity of the source language
b) Complexity of the architecture of the target machine
c) Flexibility of the available instruction set
d) None of the above

Q.6 An ideal compiler should
a) Be smaller in size
b) Take less time for compilation
c) Be written in a high level language
d) Produce object code that is smaller in size and executes faster

Q.7 An optimizing compiler
a) Is optimized to occupy less space
b) Is optimized to take less time for execution
c) Optimizes the code
d) None of the above

Q.8 In a compiler, grouping of characters into tokens is done by the
a) Scanner
b) Parser
c) Code generator
d) Code optimizer

Q.9 Whether a given pattern constitutes a token or not
a) Depends on the source language
b) Depends on the target language
c) Depends on the compiler
d) None of the above comments is true
Q.10 A grammar will be meaningless if the
a) Terminal set and the non-terminal set are not disjoint
b) Left hand side of a production is a single terminal
c) Left hand side of a production has no non-terminal
d) Left hand side of a production has more than two non-terminals

Q.11 Which of the following grammars are not phrase-structured?
a) Regular
b) Context-free
c) Context-sensitive
d) None of the above

Q.12 Which of the following is the most general phrase-structured grammar?
a) Regular
b) Context-free
c) Context-sensitive
d) None of the above

Q.13 In a context-sensitive grammar,
a) Ɛ can't be the right hand side of any production
b) The number of grammar symbols on the left hand side of a production can't be greater than the number of grammar symbols on the right hand side
c) The number of grammar symbols on the left hand side of a production can't be greater than the number of terminals on the right hand side
d) The number of grammar symbols on the left hand side of a production can't be greater than the number of non-terminals on the right hand side

Q.14 In a context-free grammar,
a) Ɛ can't be the right hand side of any production
b) Terminal symbols can't be present in the left hand side of any production
c) The number of grammar symbols in the left hand side is not greater than the number of grammar symbols in the right hand side
d) All of the above

Q.15 If w is a string of terminals and A, B are two non-terminals, then which of the following are right-linear grammars?
a) A → Bw
b) A → Bw | w
c) A → wB | w
d) None of the above

Q.16 If a is a terminal and S, A, B are three non-terminals, then which of the following are regular grammars?
a) S → aB | a      b) A → aB | a
   A → aS | b         B → bA | b
c) A → Ba | Bab    d) A → abB | aB

Q.17 Representing the syntax by a grammar is advantageous because
a) It is concise
b) It is accurate
c) Automation becomes easy
d) Intermediate code can be generated easily and efficiently

Q.18 CFG can be recognized by a
a) Push-down automaton
b) 2-way linear bounded automaton
c) Finite state automaton
d) None of the above

Q.19 CSG can be recognized by a
a) Push-down automaton
b) 2-way linear bounded automaton
c) Finite state automaton
d) None of the above

Q.20 Choose the correct statements.
a) Sentence of a grammar is a sentential form without any terminals.
b) Sentence of a grammar should be derivable from the start state.
c) Sentence of a grammar should be the frontier of a derivation tree, in which the root node has the start state as the label.
d) All of the above

Q.21 A grammar can have
a) A non-terminal A that can't derive any string of terminals
b) A non-terminal A that can be present in any sentential form
c) Ɛ as the only symbol on the left hand side of a production
d) None of the above

Q.22 A top-down parser generates
a) Left-most derivation
b) Right-most derivation
c) Right-most derivation in reverse
d) Left-most derivation in reverse

Q.23 A bottom-up parser generates
a) Left-most derivation
b) Right-most derivation
c) Right-most derivation in reverse
d) Left-most derivation in reverse

Q.24 A given grammar is said to be ambiguous if
a) Two or more productions have the same non-terminal on the left hand side
b) A derivation tree has more than one associated sentence
c) There is a sentence with more than one derivation tree corresponding to it
d) Parentheses are not present in the grammar

Q.25 The grammar E → E + E | E * E | a, is
a) Ambiguous
b) Unambiguous
c) Ambiguous or not, depending on the given sentence
d) None of the above

Q.26 Choose the correct sentence.
a) Language corresponding to a given grammar is the set of all strings that can be generated by the given grammar.
b) A given language is ambiguous if no unambiguous grammar exists for it.
c) Two different grammars may generate the same language.
d) None of the above

Q.27 Consider the grammar
    S → ABC | Abc
    BA → AB
    Bb → bb
    Ab → ab
    Aa → aa
Which of the following sentences can be derived by this grammar?
a) abc b) aab
c) abcc d) abbc

Q.28 The language generated by the above grammar is the set of all strings, made up of a, b, c, such that
a) The number of a's, b's, and c's will be equal
b) a's always precede b's
c) b's always precede c's
d) The number of a's, b's and c's are the same, and the a's precede the b's, which precede the c's

Q.29 In an incompletely specified automaton
a) No edge should be labeled Ɛ
b) From any given state, there can't be any token leading to two different states
c) Some states have no transition on some tokens
d) Start state may not be there

Q.30 The main difference between a DFSA and an NDFSA is
a) In DFSA, Ɛ transitions may be present
b) In NDFSA, Ɛ transitions may be present
c) In DFSA, from any given state, there can't be any alphabet leading to two different states
d) In NDFSA, from any given state, there can't be any alphabet leading to two different states

Q.31 Two finite state machines are said to be equivalent if they
a) Have the same number of states
b) Have the same number of edges
c) Have the same number of states and edges
d) Recognize the same set of tokens

Q.32 Choose the correct answer. FORTRAN is a
a) Regular language
b) Context-free language
c) Context-sensitive language
d) Turing language

Q.33 If two finite state machines M and N are isomorphic, then M can be transformed to N by relabeling
a) The states alone
b) The edges alone
c) Both the states and edges
d) None of the above

Q.34 In a syntax directed translation scheme, if the value of an attribute of a node is a function of the values of the attributes of its children, then it is called a
a) Synthesized attribute
b) Inherited attribute
c) Canonical attribute
d) None of the above

Q.35 Synthesized attribute can easily be simulated by an
a) LL grammar
b) Ambiguous grammar
c) LR grammar
d) None of the above

Q.36 For which of the following situations is inherited attribute a natural choice?
a) Evaluation of arithmetic expressions
b) Keeping track of variable declarations
c) Checking for the correct use of L-values and R-values
d) All of the above

Q.37 The graph depicting the inter-dependencies of the attributes of different nodes in a parse tree is called a
a) Flow graph
b) Dependency graph
c) Karnaugh's graph
d) Steffi graph

Q.38 Choose the correct statements.
a) Topological sort can be used to obtain an evaluation order of a dependency graph.
b) Evaluation order for a dependency graph dictates the order in which the semantic rules are done.
c) Code generation depends on the order in which the semantic actions are performed.
d) Only (a) and (c) are correct.

Q.39 A syntax tree
a) Is another name for a parse tree
b) Is a condensed form of parse tree
c) Should not have keywords as leaves
d) None of the above

Q.40 Syntax directed translation scheme is desirable because
a) It is based on the syntax
b) Its description is independent of any implementation
c) It is easy to modify
d) Only (a) and (c) are correct

Q.41 Which of the following is not an intermediate code form?
a) Postfix notation
b) Syntax tree
c) Tree address code
d) Quadruples
Q.42 Three address codes can be implemented by
a) Indirect triples
b) Direct triples
c) Quadruples
d) None of the above

Q.43 Three address code involves
a) Exactly 3 addresses
b) At the most 3 addresses
c) No unary operators
d) None of the above

Q.44 Symbol table can be used for
a) Checking type compatibility
b) Suppressing duplicate error messages
c) Storage allocation
d) None of the above

Q.45 The best way to compare the different implementations of symbol table is to compare the time required to
a) Add a new name
b) Make an inquiry
c) Add a new name and make an inquiry
d) None of the above

Q.46 Which of the following symbol table implementations is based on the property of locality of reference?
a) Linear list
b) Search tree
c) Hash table
d) Self-organizing list

Q.47 Which of the following symbol table implementations is best suited if access time is to be minimum?
a) Linear list
b) Search tree
c) Hash table
d) Self-organizing list

Q.48 Which of the following symbol table implementations makes efficient use of memory?
a) List
b) Search tree
c) Hash table
d) Self-organizing list

Q.49 Access time of the symbol table will be logarithmic if it is implemented by a
a) Linear list
b) Search tree
c) Hash table
d) Self-organizing list

Q.50 An ideal compiler should
a) Detect errors
b) Detect and report errors
c) Detect, report and correct errors
d) None of the above

Q.51 Which of the following is not a source of error?
a) Faulty design specification
b) Faulty algorithm
c) Compilers themselves
d) None of the above

Q.52 Any transcription error can be repaired by
a) Insertion alone
b) Deletion alone
c) Insertion and deletion alone
d) Replacement alone

Q.53 Hamming distance is a
a) Theoretical way of measuring errors
b) Technique for assigning codes to a set of items known to occur with a given probability
c) Technique for optimizing the intermediate code
d) None of the above

Q.54 Error repair may
a) Theoretical way of measuring errors
b) Technique for assigning codes to a set of items known to occur with a given probability
c) Technique for optimizing the intermediate code
d) None of the above

Q.55 A parser with the valid prefix property is advantageous because
a) It detects errors as early as possible
b) It detects errors as and when they occur
c) It limits the amount of erroneous output passed to the next phase
d) All of the above

Q.56 The advantage of panic mode of error recovery is that
a) It is simple to implement
b) It is very effective
c) It never gets into an infinite loop
d) None of the above

Q.57 To recover from an error, the operator precedence parser may
a) Insert symbols onto the stack
b) Insert symbols onto the input
c) Delete symbols from the stack
d) Delete symbols from the input

Q.58 Which of the following optimization techniques are typically applied on loops?
a) Removal of invariant computation
b) Elimination of induction variables
c) Peephole optimization
d) Constant folding

Q.59 The technique of replacing run time computations by compile time computations is called
a) Constant folding
b) Code hoisting
c) Peephole optimization
d) Invariant computation

Q.60 The graph that shows the basic blocks and their successor relationship is called a
a) Control graph
b) Flow graph
c) DAG
d) Hamiltonian graph

Q.61 Reduction in strength means
a) Replacing run time computation by compile time computation
b) Removing loop invariant computation
c) Removing common sub-expressions
d) Replacing a costly operation by a relatively cheaper one

Q.62 A basic block can be analyzed by a
a) DAG
b) Graph which may involve cycles
c) Flow-graph
d) None of the above

Q.63 ud-chaining is useful for
a) Determining whether a particular definition is used anywhere or not
b) Constant folding
c) Checking whether a variable is used without prior assignment
d) None of the above

Q.64 Which of the following concepts can be used to identify loops?
a) Dominators
b) Unrolling
c) Depth first ordering
d) None of the above

Q.65 Which of the following are not loop optimization techniques?
a) Jamming
b) Unrolling
c) Induction variable elimination
d) None of the above

Q.66 Running time of a program depends on the
a) Way the registers are used
b) Order in which computations are performed
c) Way the addressing modes are used
d) Usage of machine idioms

Q.67 du-chaining
a) Stands for use definition chaining
b) Is useful for copy propagation removal
c) Is useful for induction variable removal
d) None of the above

Q.68 Which of the following comments about peep-hole optimization are true?
a) It is applied to a small part of the code.
b) It can be used to optimize intermediate code.
c) To get the best out of this technique, it has to be applied repeatedly.
d) It can be applied to a portion of the code that is not contiguous.

Q.69 Shift-reduce parsers are
a) Top-down parsers
b) Bottom-up parsers
c) May be top-down or bottom-up parsers
d) None of the above

Q.70 Recursive descent parsing is an example of
a) Top-down parsing
b) Bottom-up parsing
c) May be top-down or bottom-up parsing
d) None of the above

Q.71 In operator precedence parsing, precedence relations are defined
a) For all pairs of non-terminals
b) For all pairs of terminals
c) To delimit the handle
d) None of the above

Q.72 For parsing arithmetic expressions involving (, ), + and -, the precedence relation between - and - will be <, if
a) We are talking of unary minus
b) Minus is right associative
c) Minus is left associative
d) A ( comes in between

Q.73 For the previous problem, the precedence relation between ) and ( will be
a) >
b) <
c) =
d) Undefined

Q.74 LR stands for
a) Left to right
b) Left to right reduction
c) Right to left
d) Left to right and right-most derivation in reverse

Q.75 LR parsers are attractive because
a) They can be constructed to recognize CFGs corresponding to almost all programming constructs
b) They do not backtrack
c) They detect errors as and when they occur
d) All of the above

Q.76 Which of the following is the most powerful parser?
a) SLR
b) LALR
c) Canonical LR
d) Operator-precedence

Q.77 Choose the correct statements.
a) There are CFGs that are not LR
b) An ambiguous grammar can never be LR
c) An ambiguous grammar can be LR
d) Any CFG has to be LR

Q.78 YACC builds up
a) SLR parsing table
b) Canonical LR parsing table
c) LALR parsing table
d) None of the above

Q.79 Choose the correct statements.
a) LL(k) grammar has to be a CFG
b) LL(k) grammar has to be
unambiguous
c) There are LL(k) grammars that are not context free
d) LL(k) grammars cannot have left recursive non-terminals

Q.80 Consider an Ɛ-free CFG. If for every pair of productions A → u and A → v
a) If FIRST(u) ∩ FIRST(v) is empty, then the CFG has to be LL(1)
b) If the CFG is LL(1), then FIRST(u) ∩ FIRST(v) has to be empty
c) If FIRST(u) ∩ FIRST(v) is empty, then the CFG cannot be LL(1)
d) None of the above

Q.81 LR(k) grammar
a) Can only examine a maximum of k input symbols
b) Can be used to identify handles
c) Can be used to identify the production associated with a handle
d) Covers the LL(k) class

Q.82 The set of all viable prefixes of right sentential forms of a given grammar
a) Can be recognized by a finite state machine
b) Cannot be recognized by a finite state machine
c) Can be used to control an LR(k) parser
d) None of the above

Q.83 The 'k' in LR(k) cannot be
a) 0
b) 1
c) 2
d) None of the above

The next three questions are based on the following grammar
    E → E/X | X
    X → T-X | X*T | T
    T → T+F | F
    F → (E) | id
(id stands for identifier)

Q.84 This grammar is
a) Unambiguous
b) Ambiguous
c) Context-free
d) None of the above

Q.85 The above grammar is used to generate all valid arithmetic expressions in a hypothetical language in which
a) / associates from the left
b) - associates from the left
c) + associates from the left
d) * associates from the left

Q.86 The above grammar is used to generate all valid arithmetic expressions in a hypothetical language in which
a) + has the highest precedence
b) * has the highest precedence
c) - has the highest precedence
d) / has the highest precedence

Q.87 Back patching is useful for handling
a) Conditional jumps
b) Unconditional jumps
c) Backward references
d) Forward references

Let x be a string and let A be a non-terminal. FIRSTk(x) is the set of all leading terminal strings, of length k or less, in the strings derivable from x. FOLLOWk(A) is the set of all derivable terminal strings, of length k or less, that can follow A in some left-most sentential form.
The next three questions are based on the above definition.

Q.88 Consider the grammar
    E → TE'
    E' → +TE' | Ɛ
    T → FT'
    T' → *FT' | Ɛ
    F → (E) | id
FIRST1(E) will be same as that of
a) FIRST1(T)
b) FIRST1(F)
c) FIRST1(T')
d) All of the above

Q.89 FOLLOW1(F) is
a) { + , * , ) , $ }
b) { + , ) , $ }
c) { * , ) , $ }
d) { + , ( , ) , * }

Q.90 Which of the following remarks logically follow?
a) FIRSTk(Ɛ) = {Ɛ}
b) If FOLLOWk(A) contains Ɛ, then A is the start symbol
c) If A → w is a production in the given grammar G, then FIRSTk(A) contains FIRSTk(w)
d) If A → w is a production in the given grammar G, then FIRSTk(w) contains FIRSTk(A)

Q.91 Merging states with a common core may produce ___ conflicts, but does not produce ___ conflicts in an LALR parser.
a) Reduce-reduce; shift-reduce
b) Shift-reduce; reduce-reduce
c) Shift-reduce; shift-reduce
d) None of the above

Q.92 For a CFG, FOLLOW(A) is the set of all terminals that can immediately appear to the right of the non-terminal A in some sentential form. We define two sets LFOLLOW(A) and RFOLLOW(A) by replacing the word "sentential" by "left most sentential" and "right most sentential" respectively in the definition of FOLLOW(A).
a) FOLLOW(A) and LFOLLOW(A) may be different
b) FOLLOW(A) and RFOLLOW(A) are always the same
c) All the three are the same
d) All the three are different

Q.94 A shift reduce parser carries out the actions specified within braces immediately after reducing with the corresponding rule of grammar
    S → xxW {print "1"}
    S → y {print "2"}
    W → Sz {print "3"}
What is the translation of xxxxyzz using the syntax directed translation scheme described by the above rules?
a) 23131
b) 11233
c) 11231
d) 33211

Q.95 Which of the following features cannot be captured by CFG?
a) Syntax of if-then-else statements
b) Syntax of recursive procedures
c) Whether a variable is declared before its use
d) Matching nested parentheses

Q.96 A language L allows the declaration of arrays whose sizes are not known during compilation. It is required to make efficient use of memory. Which one of the following is true?
a) A compiler using static memory allocation can be written for L.
b) A compiler cannot be written for L; an interpreter must be used.
c) A compiler using dynamic allocation can be written for L.
d) None of the above
ANSWER KEY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
(d) (d) (b) (a) (a) (a) (c) (a) (a) (a) (d) (c) (a) (b) (c)
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
(b) (a) (a) (b) (b) (a) (a) (c) (c) (a) (a) (a) (d) (c) (c)
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
(d) (b) (a) (a) (c) (b) (b) (b) (b) (a) (d) (b) (b) (a) (c)
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
(d) (c) (a) (b) (c) (d) (c) (a) (a) (a) (a) (a) (b) (a) (b)
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
(d) (a) (a) (a) (d) (a) (b) (a) (b) (a) (c) (a) (d) (d) (a)
76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
(c) (a) (c) (a) (a) (a) (a) (d) (a) (a) (a) (d) (a) (a) (a)
91 92 93 94 95 96
(a) (a) (b) (a) (c) (c)
EXPLANATIONS
Q.25 (a)
Consider the string a+a*a. It can be derived as
    E → E+E → E+E*E → a+E*E → a+a*E → a+a*a
or
    E → E*E → E+E*E → a+E*E → a+a*E → a+a*a
Since we have a string that can be derived in more than one way, the given grammar is ambiguous.

Q.27 (a)
abc can be derived as follows:
    S → Abc → abc, using {Ab → ab}
As we see, any production from the start state has to end in c, so aab is impossible. Options (c) and (d) are also not possible.

Q.28 (d)
Generate some of the strings that can be derived from the start state and verify that they fall into the category covered by option (d).

Q.47 (c)
If memory space is not the constraint, then by increasing the number of bins to K, the access time can be reduced by a factor of K; the average number of items in a bin decreases as the number of bins increases. In the case of a list, access time will be proportional to n, the number of items, but we will be using only as much memory space as is absolutely necessary. In the case of a search tree implementation, the access time will be logarithmic.

Q.69 (b)
Any shift-reduce parser typically works by shifting entries onto the stack. If a handle is found on the top of the stack, it is popped and replaced by the corresponding left hand side of the production. If ultimately we have only the starting non-terminal on the stack when there are no more tokens to be scanned, the parsing is successful. So it is bottom-up.

Q.94 (a)
The right most derivation of the string xxxxyzz is
    S → xxW → xxSz → xxxxWz → xxxxSzz → xxxxyzz.
A shift reduce parser performs the right-most derivation in reverse. So first it reduces y to S, by the production S → y; as a consequence, 2 is immediately printed. Next, Sz is reduced to W, by the production W → Sz, so 3 is printed. Proceeding this way, we get the output string 23131.

Q.95 (c)
It is because it is equivalent to recognizing wcw, where the first w is the declaration and the second is its use. wcw is not a context-free language.