CD Unit-2 Part 1
COMPILER DESIGN
UNIT -II
Syntax Analysis: Introduction, Context-Free Grammars, Writing a Grammar, Top-Down Parsing,
Bottom-Up Parsing, Introduction to LR Parsing, Simple LR, More Powerful LR Parsers, Using
Ambiguous Grammars and Parser Generators.
-------------------------------------------------------------------------------------------------------------------------------
Syntax Analysis Introduction:
Syntax analysis is the second stage or phase of the compiler.
It accepts tokens as input and produces a parse tree as output. It is also known as parsing in a
compiler.
Example:
INPUT: Tokens
(id1) = (id2) + (id3) * (id4)
Top-Down Parsing: Top-down parsing is also known as predictive parsing. This technique starts
with the root symbol of a grammar and moves down by expanding non-terminal symbols into
terminal symbols.
Bottom-Up Parsing: Bottom-up parsing is also known as shift-reduce parsing. It begins with the
input string and builds upwards to the root symbol by combining input symbols according to
grammar rules.
Syntax Trees: Syntax trees (often drawn as parse trees) are hierarchical structures that
represent the grammatical structure of a sentence.
Error Detection: It is an essential part of syntax analysis. Error detection occurs when the parser
encounters a symbol sequence that violates grammar rules.
Derivation Tree:
A derivation tree is a graphical representation of the derivation of a string using the production
rules of a given CFG.
It is a simple way to show how a string can be derived from a given set of production rules. The
derivation tree is also called a parse tree.
A parse tree follows the precedence of operators: the deepest sub-tree is traversed first, so the
operator in a parent node has lower precedence than the operator in its sub-tree.
A parse tree has the following properties:
1. The root node is always labelled with the start symbol.
2. The derivation is read from left to right.
3. The leaf nodes are always terminal symbols.
4. The interior nodes are always non-terminal symbols.
Example:
Production rules:
1. E → E + E
2. E → E * E
3. E → a | b | c
Input:
a*b+c
(Steps 1–5: the parse tree for a*b+c is built step by step; figures omitted.)
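Since the step figures are omitted, here is a rough sketch (not from the original notes) of the finished parse tree for a*b+c, written as nested Python tuples: + sits at the root and the a*b sub-tree lies deeper, matching the precedence rule stated above.

# A minimal sketch (assumed representation, not from the original notes):
# each node is (label, children); leaves are plain strings.
parse_tree = (
    "E", [
        ("E", [("E", ["a"]), "*", ("E", ["b"])]),   # deeper sub-tree: a * b
        "+",
        ("E", ["c"]),
    ],
)

def leaves(node):
    """Collect the terminal symbols (leaves) from left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    out = []
    for child in children:
        out.extend(leaves(child))
    return out

print("".join(leaves(parse_tree)))   # prints: a*b+c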
Example:
Consider the grammar G with production:
S → aSS (Rule: 1)
S→b (Rule: 2)
Derive the string w = ‘aababbb’ using leftmost derivation.
Sol:
S ⇒ aSS (Rule: 1)
⇒ aaSSS (Rule: 1)
⇒ aabSS (Rule: 2)
⇒ aabaSSS (Rule: 1)
⇒ aababSS (Rule: 2)
⇒ aababbS (Rule: 2)
⇒ aababbb (Rule: 2)
To obtain the string w by leftmost derivation, the rules are applied in the sequence 1121222.
Example:
Consider the grammar G with production:
S → aSS (Rule: 1)
S→b (Rule: 2)
Derive the string w = ‘aababbb’ using rightmost derivation.
Sol:
S ⇒ aSS (Rule: 1)
⇒ aSb (Rule: 2)
⇒ aaSSb (Rule: 1)
⇒ aaSaSSb (Rule: 1)
⇒ aaSaSbb (Rule: 2)
⇒ aaSabbb (Rule: 2)
⇒ aababbb (Rule: 2)
To obtain the string w by rightmost derivation, the rules are applied in the sequence 1211222.
Mixed Derivation: In a mixed derivation the string is obtained by applying a production to either
the leftmost or the rightmost variable, as required, in each successive step.
S ⇒ aSS (Rule: 1)
⇒ aSb (Rule: 2)
⇒ aaSSb (Rule: 1)
⇒ aaSaSSb (Rule: 1)
⇒ aabaSSb (Rule: 2)
⇒ aabaSbb (Rule: 2)
⇒ aababbb (Rule: 2)
To obtain the string w by mixed derivation, the rules are applied in the sequence 1211222.
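The three derivations above differ only in which occurrence of S is rewritten at each step. The following Python sketch (illustrative only; the helper name derive is my own) replays a numbered rule sequence on the grammar S → aSS | b, rewriting the leftmost or rightmost S each time, and reproduces the sequences 1121222 and 1211222.

# Grammar S -> aSS (rule 1) | b (rule 2); assumed helper, not from the notes.
RULES = {1: "aSS", 2: "b"}

def derive(sequence, which="left"):
    """Apply the numbered rules to the leftmost or rightmost 'S' in turn."""
    form = "S"
    for rule in sequence:
        pos = form.find("S") if which == "left" else form.rfind("S")
        form = form[:pos] + RULES[rule] + form[pos + 1:]
    return form

print(derive([1, 1, 2, 1, 2, 2, 2], "left"))    # aababbb (leftmost)
print(derive([1, 2, 1, 1, 2, 2, 2], "right"))   # aababbb (rightmost)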
Example:
Consider the grammar G with productions:
E → E + E (Rule: 1)
E → E * E (Rule: 2)
E → (E) (Rule: 3)
E → id (Rule: 4)
Derive the strings (i) id * id + id and (ii) (id + id) * id using leftmost derivation.
Solution:
(i)
E⇒E+E (Rule: 1)
⇒E*E+E (Rule: 2 and left derivation)
⇒ id * id + id (Rule: 4 and left derivation)
(ii)
E⇒E*E (Rule: 2)
⇒ (E) * E (Rule: 3 and left derivation)
⇒ (E + E) * E (Rule: 1 and left derivation)
⇒ (id + id) * id (Rule: 4)
Grammar:
Grammar is defined as a system of language rules that allows individual words (tokens) to be
combined into larger constructs with complex meanings.
A grammar consists of a number of productions.
Each production has an abstract symbol called a nonterminal as its left-hand side, and a
sequence of one or more nonterminal and terminal symbols as its right-hand side.
Writing a grammar suitable for a parser involves several steps, including eliminating ambiguity, eliminating left recursion, and left factoring:
Eliminating ambiguity:
Ambiguity:
A grammar is said to be ambiguous if there exists more than one leftmost derivation or more
than one rightmost derivation or more than one parse tree for the given input string. If the
grammar is not ambiguous, then it is called unambiguous.
If the grammar has ambiguity, then it is not good for compiler construction.
Example:
Let us consider a grammar G with the production rule
1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → ε | 0 | 1 | 2 | ... | 9
Solution:
For the string "3 * 2 + 5", the above grammar can generate two parse trees by leftmost derivation:
Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is ambiguous.
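To see why this ambiguity is harmful, note that the two parse trees impose different groupings, (3 * 2) + 5 and 3 * (2 + 5), which evaluate to different values. A tiny illustration (my own, not part of the notes):

# The two parse trees for "3 * 2 + 5" impose different groupings.
tree1 = ("+", ("*", 3, 2), 5)   # (3 * 2) + 5  -> 11
tree2 = ("*", 3, ("+", 2, 5))   # 3 * (2 + 5)  -> 21

def evaluate(node):
    """Evaluate a tuple-encoded expression tree."""
    if isinstance(node, int):
        return node
    op, left, right = node
    left_val, right_val = evaluate(left), evaluate(right)
    return left_val + right_val if op == "+" else left_val * right_val

print(evaluate(tree1), evaluate(tree2))   # 11 21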
To rewrite a grammar from ambiguous to unambiguous, the following issues must be handled:
1. Precedence:
If different operators are used, we consider the precedence of the operators. The two
important characteristics are:
The productions at higher levels have operators with lower priority.
The productions at lower levels have operators with higher priority.
For example, rewriting E → E + E | E * E | id as E → E + T | T, T → T * F | F, F → id places
+ (lower precedence) at the higher level and * (higher precedence) at the lower level.
2. Associativity:
If the same precedence operators are in production, then we will have to consider the associativity.
If the associativity is left to right, then we use left recursion in the production.
If the associativity is right to left, then we use right recursion in the production.
For example, the grammar E → E – E | id is ambiguous. The language of this grammar contains
{ id, id-id, id-id-id, … }.
Since – is left-associative, to make the above grammar unambiguous we introduce a new variable
P on the right side of the production. The grammar becomes:
E → E – P
E → P
P → id
The above grammar is now unambiguous and will contain only one parse tree for the above
expression.
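As a sketch of how the rewritten grammar fixes associativity (my own illustration, with an assumed function name), the left recursion in E → E – P corresponds to a loop that nests earlier operands to the left, so id-id-id is grouped as (id-id)-id.

# Sketch: parse id-id-id with E -> E - P | P, P -> id (left-associative).
def parse_expression(tokens):
    """Return a nested tuple showing the grouping produced by the grammar."""
    pos = 0
    assert tokens[pos] == "id"
    node = "id"                      # E -> P -> id
    pos += 1
    while pos < len(tokens) and tokens[pos] == "-":
        assert tokens[pos + 1] == "id"
        node = ("-", node, "id")     # E -> E - P nests to the left
        pos += 2
    return node

print(parse_expression(["id", "-", "id", "-", "id"]))
# ('-', ('-', 'id', 'id'), 'id')  i.e. (id - id) - id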
Eliminating left recursion:
A grammar such as E -> E + E | E * E | id is left-recursive, because a non-terminal appears as
the leftmost symbol on the right side of its own production. Top-down parsers cannot handle left
recursion, so a left-recursive production A → Aα | β is rewritten as A → βA′, A′ → αA′ | ε.
Example:
Elimination of Left Recursion of A → ABd | Aa | a
B → Be | b
Solution:
A → aA′
A′ → BdA′ | aA′ | ε
B → bB′
B′ → eB′ | ε
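The same transformation can be expressed as a small Python helper (a sketch with assumed names, not part of the notes): split the bodies of A into left-recursive ones (A → Aα) and the rest (A → β), then emit A → βA′ and A′ → αA′ | ε.

# Eliminate immediate left recursion: A -> A a1 | ... | b1 | ...
# becomes A -> b1 A' | ..., A' -> a1 A' | ... | epsilon.
def eliminate_left_recursion(head, bodies):
    recursive = [b[1:] for b in bodies if b and b[0] == head]       # the alphas
    non_recursive = [b for b in bodies if not b or b[0] != head]    # the betas
    if not recursive:
        return {head: bodies}
    new_head = head + "'"
    return {
        head: [b + [new_head] for b in non_recursive],
        new_head: [a + [new_head] for a in recursive] + [["epsilon"]],
    }

print(eliminate_left_recursion("A", [["A", "B", "d"], ["A", "a"], ["a"]]))
# {'A': [['a', "A'"]], "A'": [['B', 'd', "A'"], ['a', "A'"], ['epsilon']]}
print(eliminate_left_recursion("B", [["B", "e"], ["b"]]))
# {'B': [['b', "B'"]], "B'": [['e', "B'"], ['epsilon']]}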
Left factoring:
Left factoring is a process by which a grammar with common prefixes is transformed to make it
suitable for top-down parsers.
With a common prefix, a top-down parser cannot decide which production must be chosen to parse
the string in hand. To remove this confusion, we use left factoring.
The grammar obtained after the process of left factoring is called a left-factored grammar.
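A rough Python sketch of left factoring a single non-terminal (assumed helper names; the classic grammar S → iEtS | iEtSeS | a is used as input): alternatives are grouped by their first symbol, the longest common prefix is pulled out, and a new primed non-terminal takes the differing suffixes.

# Left-factor one nonterminal: alternatives that share a common prefix
# A -> a b1 | a b2 | g   become   A -> a A' | g,  A' -> b1 | b2.
def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every body."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    return prefix

def left_factor(head, bodies):
    new_rules = {head: []}
    by_first = {}
    for body in bodies:                       # group alternatives by first symbol
        by_first.setdefault(body[0] if body else "", []).append(body)
    for group in by_first.values():
        if len(group) == 1:                   # nothing to factor in this group
            new_rules[head].append(group[0])
            continue
        prefix = common_prefix(group)
        new_head = head + "'"                 # one primed name suffices here
        new_rules[head].append(prefix + [new_head])
        new_rules[new_head] = [body[len(prefix):] or ["epsilon"] for body in group]
    return new_rules

# Classic example: S -> iEtS | iEtSeS | a
print(left_factor("S", [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]]))
# {'S': [['i', 'E', 't', 'S', "S'"], ['a']], "S'": [['epsilon'], ['e', 'S']]}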
1. Top-Down Parser:
The top-down parser is the parser that generates the parse tree for the given input string with the
help of grammar productions by expanding the non-terminals, i.e. it starts from the start symbol
and ends with the terminal symbols.
It uses leftmost derivation.
Example:
S -> aABe
A -> Abc | b
B -> d
Input:
abbcde$
S ⇒ aABe
⇒ aAbcBe
⇒ abbcBe
⇒ abbcde
Recursive-descent parsers:
Recursive-descent parsers are a type of top-down parser that uses a set of recursive
procedures to parse the input.
Each non-terminal symbol in the grammar corresponds to a procedure that parses input for
that symbol.
Backtracking parsers:
Backtracking parsers are a type of top-down parser that can handle non-deterministic grammars.
When a parsing decision leads to a dead end, the parser can backtrack and try another
alternative.
Backtracking parsers are not as efficient as other top-down parsers because they can potentially
explore many parsing paths.
Example:
The following is a grammar rule:
S -> r P d
P -> m | m n
Input string: 'rmnd'.
Building the parse tree will start with the symbol S.
The next leaf of the parse tree, 'P', has two production rules. Expanding the symbol 'P' with its
first production, i.e. 'm', we get the corresponding parse tree.
We get a mismatch when comparing the third input symbol 'n' against the next leaf labelled 'd'.
Therefore, the parser must return to the symbol 'P' and look for the other alternative.
The other alternative for the production of 'P' is 'mn'. The input pointer must also be backtracked
and reset to the position it was at before we expanded the symbol 'P'; that is, the input pointer
must point to 'm' again.
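The backtracking behaviour described above can be sketched in a few lines of Python (function names are my own): P yields its alternatives in order, and S retries the rest of the input for each one, which is exactly the backtrack from P → m to P → mn on the input rmnd.

# Grammar: S -> r P d ;  P -> m | m n   (alternatives tried in this order)
def parse_P(inp, pos):
    """Yield every position P can reach from pos, in the order tried."""
    for alternative in ("m", "mn"):
        if inp.startswith(alternative, pos):
            yield pos + len(alternative)

def parse_S(inp):
    if not inp.startswith("r"):
        return False
    for pos in parse_P(inp, 1):              # try each alternative of P
        if inp.startswith("d", pos) and pos + 1 == len(inp):
            return True                      # whole input consumed
        # mismatch: fall through, i.e. backtrack and try the next alternative
    return False

print(parse_S("rmnd"))   # True, but only after backtracking on P -> m
print(parse_S("rmd"))    # True, the first alternative succeeds immediately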
Non-backtracking parsers:
Non-backtracking is a technique used in top-down parsing to ensure that the parser doesn’t
revisit already-explored paths in the parse tree during the parsing process.
This is achieved by using a predictive parsing table that is constructed in advance and selecting
the appropriate production rule based on the top non-terminal symbol on the parser’s stack and
the next input symbol.
By not backtracking, predictive parsers are more efficient than other types of top-down parsers,
although they may not be able to handle all grammars.
Predictive parsers:
Predictive parsers are top-down parsers that use a parsing table to predict which production rule to
apply based on the next input symbol.
Predictive parsers are also called LL parsers because they scan the input from left to right and
construct a leftmost derivation of the input string.
First(): The set of terminal symbols that can begin a string derived from a variable.
FIRST sets for the expression grammar of the example below:
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }
Follow(): The set of terminal symbols that can follow a variable during the derivation process.
Rule 1: For the start symbol S,
Follow(S) = {$}
Rule 2: For any production rule A → αB,
Follow(B) = Follow(A)
Rule 3: For any production rule A → αBβ,
Follow(B) = First(β), if Є ∉ First(β)
Follow(B) = { First(β) – Є } ∪ Follow(A), if Є ∈ First(β)
Example:
Production Rules:
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id
FOLLOW Set
FOLLOW(E) = { $ , ) }
FOLLOW(E’) = FOLLOW(E) = { $, ) }
FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E) = { + , $ , ) }
FOLLOW(T’) = FOLLOW(T) = {+,$,)}
FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T) = { *, +, $, ) }
Production          First           Follow
E → TE′             { (, id }       { $, ) }
E′ → +TE′ | ε       { +, ε }        { $, ) }
T → FT′             { (, id }       { +, $, ) }
T′ → *FT′ | ε       { *, ε }        { +, $, ) }
F → (E) | id        { (, id }       { *, +, $, ) }
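These sets can be checked mechanically. Below is a Python sketch (assumed names, with "eps" standing for Є) of the usual fixed-point computation of FIRST and FOLLOW for this expression grammar; it keeps growing the sets until nothing changes.

# Grammar: E -> TE'; E' -> +TE' | eps; T -> FT'; T' -> *FT' | eps; F -> (E) | id
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["eps"]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["eps"]],
    "F":  [["(", "E", ")"], ["id"]],
}
NONTERMINALS = set(GRAMMAR)

def first_of(symbols, first):
    """FIRST of a sentential form, given FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        result |= sym_first - {"eps"}
        if "eps" not in sym_first:
            return result
    return result | {"eps"}

def compute_first_follow(start="E"):
    first = {n: set() for n in NONTERMINALS}
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:                               # iterate to a fixed point
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                before = len(first[head])        # FIRST(head) grows from FIRST(body)
                first[head] |= first_of(body, first)
                changed |= len(first[head]) != before
                for i, sym in enumerate(body):   # FOLLOW rules 2 and 3
                    if sym not in NONTERMINALS:
                        continue
                    rest = first_of(body[i + 1:], first)
                    before = len(follow[sym])
                    follow[sym] |= rest - {"eps"}
                    if "eps" in rest:
                        follow[sym] |= follow[head]
                    changed |= len(follow[sym]) != before
    return first, follow

first, follow = compute_first_follow()
print(first["E"], follow["T'"])   # FIRST(E) = { (, id },  FOLLOW(T') = { +, ), $ }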
Predictive parsing table M:
           id           +             *              (            )           $
E          E → TE′                                   E → TE′
E′                      E′ → +TE′                                 E′ → ε      E′ → ε
T          T → FT′                                   T → FT′
T′                      T′ → ε        T′ → *FT′                   T′ → ε      T′ → ε
F          F → id                                    F → (E)
Stack        Input              Action
$E           id + id * id $     E → TE′
$E′T         id + id * id $     T → FT′
$E′T′F       id + id * id $     F → id
$E′T′id      id + id * id $     Remove id
$E′T′        + id * id $        T′ → ε
$E′          + id * id $        E′ → +TE′
$E′T+        + id * id $        Remove +
$E′T         id * id $          T → FT′
$E′T′F       id * id $          F → id
$E′T′id      id * id $          Remove id
$E′T′        * id $             T′ → *FT′
$E′T′F*      * id $             Remove *
$E′T′F       id $               F → id
$E′T′id      id $               Remove id
$E′T′        $                  T′ → ε
$E′          $                  E′ → ε
$            $                  Accept
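The trace above is exactly what a table-driven predictive parser produces. A compact Python sketch (assumed names; M is the parsing table reconstructed above): push $ and E, then repeatedly match the terminal on top of the stack or replace the non-terminal on top using M[top, lookahead].

# LL(1) table for E -> TE'; E' -> +TE' | eps; T -> FT'; T' -> *FT' | eps; F -> (E) | id
M = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$", "E"]
    i = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[i]
        if top not in NONTERMINALS:          # terminal (or $): must match input
            if top != lookahead:
                return False
            i += 1
        else:
            body = M.get((top, lookahead))
            if body is None:                 # no table entry: syntax error
                return False
            stack.extend(reversed(body))     # push RHS, leftmost symbol on top
    return i == len(tokens)

print(predictive_parse(["id", "+", "id", "*", "id"]))   # True (accepted)
print(predictive_parse(["id", "+", "*", "id"]))         # False (rejected)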
2. Bottom-Up Parser:
The bottom-up parser builds the parse tree for the input string by starting from the leaves (the
input symbols) and working up to the root (the start symbol), reducing handles in the reverse
order of a rightmost derivation.
Example (the grammar S → aABe, A → Abc | b, B → d and input abbcde$ from above):
abbcde$   (reduce A → b)
aAbcde$   (reduce A → Abc)
aAde$     (reduce B → d)
aABe$     (reduce S → aABe)
S
Shift-reduce parsing:
Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are known as shift-
step and reduce-step.
Shift step: This involves moving symbols from the input buffer onto the stack.
Reduce step: If a handle appears on top of the stack, it is reduced using the appropriate
production rule.
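A toy Python sketch of the shift and reduce steps (my own simplification, not the full LR algorithm): it parses id + id * id with the unambiguous grammar E → E + T | T, T → T * F | F, F → id, reducing whenever a handle is on top of the stack and using one symbol of lookahead to decide between shifting and reducing.

# Toy shift-reduce parser for E -> E + T | T ; T -> T * F | F ; F -> id.
def shift_reduce_parse(tokens):
    stack, rest = [], tokens + ["$"]
    while True:
        lookahead = rest[0]
        # Reduce whenever a handle sits on top of the stack.
        if stack[-1:] == ["id"]:
            stack[-1:] = ["F"]                          # F -> id
        elif stack[-3:] == ["T", "*", "F"]:
            stack[-3:] = ["T"]                          # T -> T * F
        elif stack[-1:] == ["F"]:
            stack[-1:] = ["T"]                          # T -> F
        elif stack[-3:] == ["E", "+", "T"] and lookahead in ("+", "$"):
            stack[-3:] = ["E"]                          # E -> E + T
        elif (stack[-1:] == ["T"] and stack[-3:] != ["E", "+", "T"]
              and lookahead in ("+", "$")):
            stack[-1:] = ["E"]                          # E -> T
        elif lookahead == "$":
            return stack == ["E"]                       # accept iff only E remains
        else:
            stack.append(rest.pop(0))                   # shift the next token

print(shift_reduce_parse(["id", "+", "id", "*", "id"]))   # True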
LR Parser:
The LR parser is a non-recursive, shift-reduce, bottom-up parser.
LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the
input stream; R stands for the construction of right-most derivation in reverse, and k denotes
the number of lookahead symbols to make decisions.
There are three widely used algorithms available for constructing an LR parser:
1. SLR(1) – Simple LR parser
2. LALR(1) – Look-Ahead LR parser
3. CLR(1) – Canonical LR parser
Questions:
1. Explain the term syntax analysis.
2. What are the important terminologies used in syntax analysis?
3. Construct a predictive parsing table for the following grammar and check whether the string
id + id * id is accepted or not.