CD Unit-2 Part 1

Download as pdf or txt
Download as pdf or txt
You are on page 1of 26

1

COMPILER DESIGN
UNIT -II
Syntax Analysis: Introduction, Contex-Free Grammers, writing a Grammar, Top-Down Parsing,
Bottom-Up Parsing, Introduction to LR Parsing, Simple LR, More Powerful LR Parsers, Using
Ambiguous Grammers and Parser Generators.
-------------------------------------------------------------------------------------------------------------------------------
Syntax Analysis Introduction:
 The Syntax analysis is the second stage or phase of the compiler.
 It accepts tokens as input and provides a parse tree as output. It is also known as parsing in a
compiler.

Roles and Responsibilities of Syntax Analyzer


 Note syntax errors.

 Helps in building a parse tree.

 Acquire tokens from the lexical analyzer.

 Scan the syntax errors, if any.

Example:
INPUT: Tokens
(id1) = (id2) + (Id3)*(id4)

OUTPUT: Parse Tree

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


2

Important Terminologies Used in Syntax Analysis:

Top-Down Parsing: Top-down parsing is also known as predictive parsing. This technique starts
with the root symbol of a grammar and moves down by expanding non-terminal symbols into
terminal symbols.

Bottom-Up Parsing: Bottom-up parsing is also known as shift-reduce parsing. It begins with the
input string and builds upwards to the root symbol by combining input symbols according to
grammar rules.

Syntax Trees: Syntax trees are often considered parse trees. These hierarchical structures
represent the grammatical structure of a sentence.

Error Detection: It is an essential part of syntax analysis. Error detection occurs when the parser
encounters a symbol sequence that violates grammar rules.

Context-Free Grammar (CFG):


Context-Free Grammar (CFG) plays an important role in describing the syntax of programming
languages.
A context-free grammar is a set of recursive rules used to generate patterns of strings.

A context-free grammar can be described by a four-element tuple (V, Σ, R, S), where

 V is a finite set of variables (which are non-terminal);


 Σ or T is a finite set (disjoint from V) of terminal symbols;
 R is a set of production rules where each production rule maps a variable to a string.
 S is a start symbol.

Terminals include various symbols, such as:


Lowercase letters (eg., a, b, c)
Operators like +, –
Punctuation symbols (eg., comma, parenthesis)
Digits (eg., 0, 1, 2)
Boldface letters or keywords (eg., id)

Non Terminals include various symbols, such as:


Uppercase letters (eg., A, B, C)
Lowercase italicized names (eg., expr, stmt)

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


3

Derivation Tree:
 Derivation tree is a graphical representation for the derivation of the given production rules for a
given CFG.
 It is the simple way to show how the derivation can be done to obtain some string from a given
set of production rules. The derivation tree is also called a parse tree.
 Parse tree follows the precedence of operators. The deepest sub-tree traversed first. So, the
operator in the parent node has less precedence over the operator in the sub-tree.
 A parse tree contains the following properties:
1. The root node is always a node indicating start symbols.
2. The derivation is read from left to right.
3. The leaf node is always terminal nodes.
4. The interior nodes are always the non-terminal nodes.
Example:

Production rules:

1. E = E + E
2. E = E * E
3. E = a | b | c

Input:

a*b+c

Step 1:

Step 2:

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


4

Step 2:

Step 4:

Step 5:

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


5

Types of Derivation Tree:


There are three types of Derivation trees follows:
 Leftmost Derivation tree

 Rightmost derivation tree

 Mixed derivation tree

Leftmost derivation: A leftmost derivation is obtained by applying production to the leftmost


variable in each successive step.

Example:
Consider the grammar G with production:
S → aSS (Rule: 1)
S→b (Rule: 2)
Compute the string w = ‘aababbb’ with left most derivation.
Sol:
S ⇒ aSS (Rule: 1)
⇒ aaSSS (Rule: 1)
⇒ aabSS (Rule: 2)
⇒ aabaSSS (Rule: 1)
⇒ aababSS (Rule: 2)
⇒ aababbS (Rule: 2)
⇒ aababbb (Rule: 2)
To obtain the string ‘w’ with “leftmost derivation”, follows “1121222” sequence rules.

Rightmost derivation: A rightmost derivation is obtained by applying production to the rightmost


variable in each step.

Example:
Consider the grammar G with production:
S → aSS (Rule: 1)
S→b (Rule: 2)
Compute the string w = ‘aababbb’ with right most derivation.
Sol:

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


6

S ⇒ aSS (Rule: 1)
⇒ aSb (Rule: 2)
⇒ aaSSb (Rule: 1)
⇒ aaSaSSb (Rule: 1)
⇒ aaSaSbb (Rule: 2)
⇒ aaSabbb (Rule: 2)
⇒ aababbb (Rule: 2)
To obtain the string ‘w’ with “Rightmost derivation”, follows “1211222” sequence rules.

Mixed Derivation: In a mixed derivation the string is obtained by applying production to the
leftmost variable and rightmost variable simultaneously as per the requirement in each successive
step.
S ⇒ aSS (Rule: 1)
⇒ aSb (Rule: 2)
⇒ aaSSb (Rule: 1)
⇒ aaSaSSb (Rule: 1)
⇒ aabaSSb (Rule: 2)
⇒ aabaSbb (Rule: 2)
⇒ aababbb (Rule: 2)
To obtain the string ‘w’ with “mixed derivation”, follows “1211222” sequence rules.

Example 1: A grammar G which is context-free has the productions


S → aAB (Rule: 1)
A → Bba (Rule: 2)
B → bB (Rule: 3)
B→c (Rule: 4)
Compute the string w = ‘acbabc’ with left most derivation.
S ⇒ aAB (Rule: 1)
⇒ aBbaB (Rule: 2)
⇒ acbaB (Rule: 4)
⇒ acbabB (Rule: 3)
⇒ acbabc (Rule: 4)
Left most derivation Tree to obtain the string ‘w’ as follows;

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


7

Example 2: A grammar G which is context-free has the productions


S→a (Rule: 1)
S → aAS (Rule: 2)
A → bS (Rule: 3)
Compute the string w = ‘abaabaa’ with left most derivation.
S ⇒ aAS (Rule: 2)
⇒ abSS (Rule: 3)
⇒ abaS (Rule: 1)
⇒ abaaAS (Rule: 2)
⇒ abaabSS (Rule: 3)
⇒ abaabaS (Rule: 1)
⇒ abaabaa (Rule: 1)

Example 3: A grammar G = (N, T, P, S) with N = {E}, S = E, T = {id, +, *, c}


E→E+E (Rule: 1)
E→E*E (Rule: 2)
E→(E) (Rule: 3)
E → id (Rule: 4)
Obtain the derivation tree of expressions:
(i) id * id + id
(ii) (id + id) * id

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


8

Solution:
(i)
E⇒E+E (Rule: 1)
⇒E*E+E (Rule: 2 and left derivation)
⇒ id * id + id (Rule: 4 and left derivation)

(ii)

E⇒E*E (Rule: 2)
⇒ (E) * E (Rule: 3 and left derivation)
⇒ (E + E) * E (Rule: 1 and left derivation)
⇒ (id + id) * id (Rule: 4)

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


9

Grammar:
 Grammar is defined as a system of language rules that allows you to combine individual words
to make complex meanings. By applying grammar rules.
 A grammar consists of a number of productions.
 Each production has an abstract symbol called a nonterminal as its left-hand side, and a
sequence of one or more nonterminal and terminal symbols as its right-hand side.
 There are four categories in writing a grammar:

1. Regular Expression Vs Context Free Grammar


2. Eliminating ambiguous grammar.
3. Eliminating left-recursion (LR)
4. Left-factoring (LF).

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


10

Eliminating ambiguity:

Ambiguity:
 A grammar is said to be ambiguous if there exists more than one leftmost derivation or more
than one rightmost derivation or more than one parse tree for the given input string. If the
grammar is not ambiguous, then it is called unambiguous.

 If the grammar has ambiguity, then it is not good for compiler construction.

Example:
Let us consider a grammar G with the production rule
1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → ε | 0 | 1 | 2 | ... | 9
Solution:
For the string "3 * 2 + 5", the above grammar can generate two parse trees by leftmost derivation:

Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is ambiguous.

Eliminating ambiguity (Make it Unambiguous):

Re-writing grammar from ambiguous to unambiguous for that some elimination is required.

We can remove ambiguity we can follow two properties –

1. Precedence:
If different operators are used, we will consider the precedence of the operators. The three
important characteristics are :
 The production at higher levels will have operators with less priority.
 The production at lower levels will have operators with higher priority.

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


11

2. Associativity:
If the same precedence operators are in production, then we will have to consider the associativity.

 If the associativity is left to right, then we have to prompt a left recursion in the production.
 If the associativity is right to left, then we have to prompt the right recursion in the
productions.

Example 1: Consider the ambiguous grammar and make it Unambiguous.


EE-E
Eid

The language in the grammar will contain { id, id-id, id-id-id, ….}

if we want to derive the string id-id-id.


Let’s consider a single value of id=3 to get more insights.
Apply left recursion The result should be :3-3-3 =-3.
Apply right recursion The result should be :3-3-3 =3

So, to make the above grammar unambiguous, in this right side of the production use random
variable P.The grammar becomes :

EE – P
EP
Pid
The above grammar is now unambiguous and will contain only one Parse Tree for the above
expression.

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


12

Example 2: Consider the grammar

E -> E + E | E * E | id

Clearly, the above grammar is ambiguous for the string “id+id*id”

So, to make the above grammar unambiguous

E -> E + P // + is at higher level and left associative


E -> P
P -> P * Q // * is at lower level and left associative
P -> Q
Q -> id

What is Left Recursion (LR) and how it is eliminated?


A Grammar G (V, T, P, S) is left recursive if it has a production in the form.
A → A α |β.
The above Grammar is left recursive because the left of production is occurring at a first position
on the right side of production. It can eliminate left recursion by replacing a pair of production with
A → βA′
A → αA′|ϵ

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


13

Example:
Elimination of Left Recursion of AABd | Aa | a
BBe | b
Solution:
A  aA′
A′BdA’| aA’ | ϵ
BbB’
B’eB’ | ϵ

The general form for left recursion is


A → Aα1|Aα2| … . |Aαm|β1|β2| … . . βn
can be replaced by
A → β1A′|β2A′| … . . | … . . |βnA′
A’ → α1A′|α2A′| … . . |αmA′|ε

Left factoring (LF):

Left factoring is a process by which the grammar with common prefixes is transformed to make it
useful for Top-down parsers.

Example: A → αβ1 / αβ2 / αβ3

Reasons for use Left factoring:

 This kind of grammar creates a problematic situation for Top-down parsers.

 Top-down parsers cannot decide which production must be chosen to parse the string in hand.
To remove this confusion, we use left factoring.

Rules for left factoring:

1. We make one production for each common prefixes.


2. The common prefix may be a terminal or a non-terminal or a combination of both.
3. Rest of the derivation is added by new productions.

The grammar obtained after the process of left factoring is called as Left Factored Grammar.

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


14

Types of Parsers in Compiler Design:


The parser is that phase of the compiler which takes a token string as input and with the help of
existing grammar, converts it into the corresponding Intermediate Representation (IR). The parser
is also known as Syntax Analyzer.

The parser is mainly classified into two categories, i.e.


1.Top-down Parser
2.Bottom-up Parser.

1.Top-Down Parser:
 The top-down parser is the parser that generates parse tree for the given input string with the
help of grammar productions by expanding the non-terminals i.e. it starts from the start symbol
and ends with the terminals symbol
 It uses for left most derivation.

Example:
S -> aABe
A -> Abc | b
B -> d

Input:
abbcde$

S a A B e
SaAbcBe
S abbcde

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


15

Top-down parsers can be classified based on their approach to parsing as follows:

Recursive-descent parsers:

 Recursive-descent parsers are a type of top-down parser that uses a set of recursive
procedures to parse the input.

 Each non-terminal symbol in the grammar corresponds to a procedure that parses input for
that symbol.

Backtracking parsers:

 Backtracking parsers are a type of top-down parser that can handle non-deterministic grammar.

 When a parsing decision leads to a dead end, the parser can backtrack and try another
alternative.

 Backtracking parsers are not as efficient as other top-down parsers because they can potentially
explore many parsing paths.

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


16

Example:
The following is a grammar rule:
S -> r P d
P -> m | m n
Input string: 'rmnd.'
Building the parse tree will start with the symbol S.

The next leaf of the parse tree 'P' has two production rules. Let's expand the symbol 'P' with the
given first production of 'P,' i.e., 'm' we get the following parse tree.

We get a mismatch when comparing the third input symbol 'n' against the next leaf labeled 'd'.
Therefore, the parser must return to the symbol 'P' and look for the other alternative.

The other alternative for the production of 'P' is 'mn .'Also, the input pointer must be back-tracked
and reset to the position where it was first before we expanded the symbol 'P .'It means the input
pointer must point to 'm' again.

Now we reached Input string: 'rmnd.' Using backtracking method.

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


17

Non-backtracking parsers:

 Non-backtracking is a technique used in top-down parsing to ensure that the parser doesn’t
revisit already-explored paths in the parse tree during the parsing process.

 This is achieved by using a predictive parsing table that is constructed in advance and selecting
the appropriate production rule based on the top non-terminal symbol on the parser’s stack and
the next input symbol.

 By not backtracking, predictive parsers are more efficient than other types of top-down parsers,
although they may not be able to handle all grammar.

Predictive parsers:

 Predictive parsers are top-down parsers that use a parsing to predict which production rule to
apply based on the next input symbol.

 Predictive parsers are also called LL parsers because they construct a left-to-right, leftmost
derivation of the input string.

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


18

Prime Requirement of LL(1)


The following are the prime requirements for the LL(1) parser:
 The grammar must have no left factoring and no left recursion.
 FIRST() & FOLLOW()
 Parsing Table
 Stack Implementation
 Parse Tree

FIRST() and FOLLOW() calculation on each non-terminal:


 First(): The first terminal symbol is referred to as the First(). if there is a variable, and we
attempt to derive all the strings from that variable.
Rule 1: For a production rule X → ∈,
First(X) = { ∈ }
Rule 2: For a production rule X →ab,
First(X) = { a }
Rule 3: For a production rule X →BC,
if First(B) does not contain ∈ First(X) = First(B)
if First(B) contains ∈ then First(A)=First(B) U First(C)
Example:
Production Rules:
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id

FIRST set
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


19

 Follow(): The terminal symbol that follows a variable throughout the derivation process.
Rule 1: For the start symbol S,
Follow(S) = {$}
Rule 2: For any production rule A → αB
Follow(B) = Follow(A)
Rule 3: For any production rule A → αBβ,
Follow(B) = First(β) If Є ∉ First(β)
Follow(B) = { First(β) – ∈ } ∪ Follow(A) If Є ∈ First(β),

Example:
Production Rules:
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id

FOLLOW Set
FOLLOW(E) = { $ , ) }
FOLLOW(E’) = FOLLOW(E) = { $, ) }
FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E) = { + , $ , ) }
FOLLOW(T’) = FOLLOW(T) = {+,$,)}
FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T) = { *, +, $, ) }

Parsing Table making in LL(1):


Production Rules:
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id

First() and Follow () sets are

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


20

First Follow

E –> TE’ { id, ( } { $, ) }

E’ –> +TE’/ε { +, ε } { $, ) }

T –> FT’ { id, ( } { +, $, ) }

T’ –> *FT’/ε { *, ε } { +, $, ) }

F –> id/(E) { id, ( } { *, +, $, ) }

Now, the LL(1) Parsing Table is:

id + * ( ) $

E E –> TE’ E –> TE’

E’ E’ –> +TE’ E’ –> ε E’ –> ε

T T –> FT’ T –> FT’

T’ T’ –> ε T’ –> *FT’ T’ –> ε T’ –> ε

F F –> id F –> (E)

Checking Acceptance of String id + id * id using Predictive Parsing Program:

Stack Input Action

$E id + id * id $ E → TE′

$E′T id + id * id $ T → FT′

$E′T′F id + id * id $ F → id

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


21

Stack Input Action

$E′T′id id + id * id $ Remove id

$E′T′ +id * id $ T′ → ε

$E′ +id * id $ E′ → +TE′

$E′T + +id * id $ Remove +

$E′T id * id $ T → FT′

$E′T′F id * id $ F → id

$E′T′id id * id $ Remove id

$E′T′ * id $ T′ →* FT′

$E′T′F * * id $ Remove *

$E′T′F id $ F → id

$E′T′id id $ Remove id

$E′T′ $ T′ → ε

$E′ $ E′ → ε

$ $ Accept

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


22

Bottom-up Parser(Shift Reduce Parser):


 Bottom-up parsing starts from the leaf nodes of a tree and works in upward direction till it
reaches the root node.
 Bottom-up Parsers / Shift Reduce Parsers Build the parse tree from leaves to root.
 It will follow rightmost derivation in reverse order.
Example:
S -> aABe
A -> Abc | b
B -> d
Input – abbcde$

abbcde$ (Ab)
aAbcde$ (AAbc)
aAde$ ( Bd)
aABe$ (SaABe)
S

Rightmost derivation reverse order


SaABe
SaAde
SaAbcde
Sabbcde

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


23

Classification of Bottom-up Parsers:

Shift-reduce parsing:
Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are known as shift-
step and reduce-step.

Shift step: This involves moving symbols from the input buffer onto the stack.
Reduce step : If the handle appears on top of the stack then, its reduction by using appropriate
production rule is done.

LR Parser:
 The LR parser is a non-recursive, shift-reduce, bottom-up parser.
 LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the
input stream; R stands for the construction of right-most derivation in reverse, and k denotes
the number of lookahead symbols to make decisions.
There are three widely used algorithms available for constructing an LR parser:
1. SLR(1) – Simple LR Parser:

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


24

 Works on smallest class of grammar


 Few number of states, hence very small table
 Simple and fast construction
2. LR(1) – LR Parser:
 Works on complete set of LR(1) Grammar
 Generates large table and large number of states
 Slow construction
3. LALR(1) – Look-Ahead LR Parser:
 Works on intermediate size of grammar
 Number of states are same as in SLR(1)

Example 1 – Consider the grammar


S –> S + S
S –> S * S
S –> id
Perform Shift Reduce parsing for input string “id + id + id”.

Questions:
1. Explain the term -Syntax analysis.
2. What are the Important Terminologies Used in Syntax Analysis.

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


25

3. Define Context-Free Grammar (CFG).


4. Explain the concept of derivation tree.
5. Explain in brief about leftmost, rightmost and mixed derivations.
6. Consider the grammar below-EE+E | E-E | E*E | E\E | a | b obtains leftmost and rightmost
derivation for the string a + b* a-b.
7. Consider Sa | ^ | (T)
TT, S | S
a) Draw parse trees for (a,a) and (a,(a,a))
b) Construct leftmost and rightmost derivation for (a,e) and (a,(a,e))
8. Difference between Regular Expression Vs Context Free Grammar.
9. Define ambiguous grammar. Check whether the grammar SaAB, AbC|cd, Ccd, Bc|d
is ambiguous or not.
10. Explain about the Eliminating ambiguous grammar.
11. What is Left Recursion (LR) and how it is eliminated for the following grammar
EE + T| T
TT*F |F
F (E) | id
12. Explain Left-factoring (LF) and how it is applied for the following grammar
Aab1|ab2|ab3

13. Explain various parsing techniques used in compiler.


14. What is the difference between top-down parser and bottom-up parser?
15. What is Backtracking? Show the Backtracking in the following given grammar
S -> r P d
P -> m | m n
Input string: 'rmnd.'
16. Explain non-recursive predictive parsers. Draw the block diagram of it.
17. List out the rules for FIRST and FOLLOW ? Construct FIRST and FOLLOW for the grammar
E E+T | T
T T * F | F
F(E) | id
18. Construct predictive parsing table for the the grammar
E E+T | T
T T * F | F
F(E) | id

Construct a Predictive Parsing table for the following grammar & also
check whether string
id + id * id is accepted or not.
19.

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT


26

What is Shift-reduce parsing? Consider the grammar


S –> S + S
S –> S * S
S –> id
Perform Shift Reduce parsing for input string “id + id + id”.
20.
21.
***

K. RAMANA REDDY ASSOC.PROFESSOR CMREC-CSE DEPT

You might also like