0% found this document useful (0 votes)
6 views70 pages

Compiler Design CS_4

The document outlines important attendance guidelines for students, emphasizing the need for active participation and response during class sessions. It provides a comprehensive overview of Context-Free Grammar (CFG), including its definition, importance, and examples of grammar rules for various languages. Additionally, it discusses derivations, parse trees, and the concept of ambiguity in grammar.

Uploaded by

aryanvarshney782
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views70 pages

Compiler Design CS_4

The document outlines important attendance guidelines for students, emphasizing the need for active participation and response during class sessions. It provides a comprehensive overview of Context-Free Grammar (CFG), including its definition, importance, and examples of grammar rules for various languages. Additionally, it discusses derivations, parse trees, and the concept of ambiguity in grammar.

Uploaded by

aryanvarshney782
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 70

1

2
IMP Note to Self
IMP Note to Students
 It is important to know that just login to the session does not
guarantee the attendance.
 Once you join the session, continue till the end to consider
you as present in the class.
 IMPORTANTLY, you need to make the class more interactive
by responding to Professors queries in the session.
 Whenever Professor calls your number / name ,you need
to respond, otherwise it will be considered as ABSENT
Objectives
To Understand
Context - Free Grammar
Derivations – LMD, RMD
Ambiguity
First and Follow Sets

5
SyntaxJustAnalysis & Parsing
to recapitulate the definitions:
1. Syn-tax
The way in which words are stringed together to form
phrases, clauses and sentences
2. Syntax analysis:
The task concerned with fitting a sequence of tokens
into a specified syntax.
3. Parsing:
To break a sentence down into its component parts of
speech with an explanation of the form, function, and
syntactical relationship of each part
.

6
Syntax Analysis
The step two of Front end in Compilation
Every programming Language has RULES that prescribe the
syntactic structure (i.e. the way in which words and phrases can
be organized to form sentences.)
These rules can be formally specified using :

CONTEXT FREE GRAMMARS


( or GRAMMAR for Short)

7
Intro to Context Free Grammar (CFG)
Essential formalism for describing the structure of the programs
in a programming Language
A recipe for constructing elements of a set of strings of symbols
( or tokens)
Consists of a set production rules and a start symbol
Each production rule:
– Defines a named syntactic construct using Two parts
1. An LHS (with a syntactic Construct) and
2. An RHS (showing the possible form of the construct)
– Connected by an Arrow symbol ( meaning “ is defined
as” )

sentence → subject predicate ‘.’ 8


Importance of the Grammar

Grammar offers significant advantages to both language


Designers and Compiler writers:
1. Gives precise, yet easy-to-understand, syntactic
specification of Programming Languages
2. Helps construct efficient parsers automatically ( not in all
cases – of course)
3. Provides a good structure to the language which in turn
helps generate correct object code
4. Makes language open ended easily amenable to add
additional constructs at a later date

9
Context Free Grammar – Formal Definition
A context-free grammar G = (T, N, S, P) consists of:
1. T, a set of terminals (scanner tokens - symbols that may not
appear on the left side of rule,).
2. N, a set of nonterminals (syntactic variables generated by
productions – symbols on left or right of rule).
3. S, a designated start nonterminal.
4. P, a set of productions (Rules). Each production has the
form, A::=  , where A is a nonterminal and  is a sentential
form , i.e., a string of zero or more grammar symbols
(terminals / nonterminals)

10
CFG – An Example
A grammar derives strings by
1. Beginning with the Start Symbol and
2. Repeatedly replacing a nonterminal by the right side of the
production rule for that nonterminal
Example:
expr  expr op expr
expr  ( expr )  The terminal symbols are
expr  – expr
id, num, (, ), +, –, *, /, 
expr  id | num
op  [ +, –, *, /,  ]  The non-terminal symbols
are expr op
 The start symbol is expr
11
Notational Conventions
1. These Symbols are terminals
1. Lower case letters early in the alphabet ( like a, b, c)
2. Operator symbols such as +, - etc..
3. Punctuation symbols like ,(comma), (, ), etc
4. The digits 0, 1, ………..9
5. Boldface strings such as if, id etc…
2. These Symbols are nonterminals
1. Uppercase letters early in the alphabet ( like A, B, C)
2. The letter S when it appears, is usually the Start
Symbol
3. Lower case letters late in the alphabet chiefly u, v, …. ……. z
represent string of terminals

………. Contd.
12
Notational Conventions
4. Uppercase letters late in the alphabet such as X, Y, Z
represent grammar symbols that is either terminal or
nonterminal
5. Lower case Greek letters α, β, γ represent strings of grammar
symbols
6. If A → α1 ,, A → α2 …….. A → αn are all Productions with A on
the LHS ( known as A-productions), then, A → α1 | α2 …| αn
(α1 , α2 … αn are alternatives of A)
7. Unless otherwise stated , the LHS of the first production is the
start nonterminal s

13
Use of Conventional Notation –
Example
Remember the previous Example:
expr  expr op expr
expr  ( expr )
expr  - expr
expr  id | num
op  [ +, -, *, /,  ]
Using the notations described so far, this grammar can be
written as:
E → E A E | ( E ) | id | num
A→+|–|*|↑

14
Grammar for a Tiny Language
program ::= statement ; Statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) stmt
expr ::= id | int | expr + expr
Id ::= a | b | c | i | j | k | n | x | y | z
int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

15
Construct the CFG for the language having any
number of a's over the set ∑= {a}.

As we know the regular expression for the above language is


r.e. = a*

Production rule for the Regular expression is as follows:


S → aS rule 1
S→ε rule 2

Now if we want to derive a string "aaaaaa", we can start with


start symbols.
16
1. S
2. aS
3. aaS rule 1
4. aaaS rule 1
5. aaaaS rule 1
6. aaaaaS rule 1
7. aaaaaaS rule 1
8. aaaaaaε rule 2
9. aaaaaa

17
Example
Construct a CFG for a language L = {wcwR | where w € (a, b)*}.

The string that can be generated for a given language is {aacaa,


bcb, abcba, bacab, abbcbba, ....}

The grammar could be:


S → aSa rule 1
S → bSb rule 2
S→c rule 3

18
Now if we want to derive a string "abbcbba", we can start with
start symbols.
S → aSa
S → abSba from rule 2
S → abbSbba from rule 2
S → abbcbba from rule 3

19
Example :
Construct a CFG for the language L = anb2n where n>=1.

The string that can be generated for a given language is {abb,


aabbbb, aaabbbbbb....}.

The grammar could be:


S → aSbb | abb

20
Now if we want to derive a string "aabbbb", we can start with
start symbols.

S → aSbb
S → aabbbb

21
Regular Expressions and Grammars
EVERY construct that can be described by an RE can also be
described by a grammar
Example : RE ( a |b )* abb can be described as:
A0 → aA0 | bA0 | aA1 A1 → bA2 A2 → bA3 A3 → ε

Then why use RE for Lex & CFG for syntax analysis ?
– Lexical rules are frequently quite simple & we do not need
powerful grammars to describe them
– RE’s are more concise and easy to understand for tokens
– RE’s provide for automatic construction of efficient Lexical
Analyzers
22
Derivation
A way of showing how an input sentence is recognized with a
grammar Now let us derive a structure
Can also be thought of as a type of algebraicctr + ( a – 5)using
substitution /2
the grammar
Remember the grammar: expr
1. expr  expr op expr expr op expr
2. expr  ( expr ) expr + expr
3. expr  - expr Expr + expr op exr
4. expr  id | num expr + expr / expr
5. op  [ +, -, *, /,  ]
expr + expr / num
expr + (expr) / num
expr + (expr – expr) / num
id = id + ( id – num) / num
23
Derivations – Some definitions
Uses productions to derive more complex Expressions
The symbol:
– > denotes “ derives in one step”.
– *
– > denotes “ derives in zero or more steps”

denotes “ derives in one or more steps”
*
+
>β)
+
> (α >β)
Given the grammar G with start symbol S, a string ω is part of
>
* if S
the language L (G), If and only ω
>
Further, if S * α and α contains
Sentence of G
– Only terminals, then it is a –
Sentential form of G
– nonterminals too, then it is a –
24
Derivation – Another Example
Consider the following Grammar a simple ; terminated
expression consisting of interspersed numbers and +
1: stmt → expr ‘;’ Stmt > expr ;
2: expr → factor ‘ +’ factor
3: | factor > factor + factor ;
4: factor → number > Number + factor ;
 Let us try and come up with > Number + Number ;
a derivation to prove that an  The process is called a
input sentence (like 1 + 4 ;) Leftmost Derivation because
can be recognized from this the left most nonterminal is
grammar always replaced 25
Leftmost Derivation
Start with the topmost symbol in the grammar
Replace the leftmost nonterminal in the partially parsed sentence
with the equivalent
L production’s RHS in each step
The operator > signifies left derivation
L
X
> Y means X derives Y by leftmost Derivation
Each of the step (Partial Derivation) is called Viable Prefix
Portion of the viable prefix that is replaced at each step is called
the Handle

26
Leftmost Derivation Example

27
Leftmost Derivation Example

28
Rightmost Derivation
Replace the rightmost nonterminal in the partially parsed
R
sentence > may signify Right Derivation
The operator
R
X > Y means X derives Y by rightmost Derivation
It is possible to start with the start symbol and do rightmost
derivation

29
Rightmost Derivation – Another Example

30
Rightmost Derivation – Another Example

31
Parse Tree
WHAT ?
A graphical representation for a derivation
sequence showing how to derive the string of a
language from grammar starting from Start symbol
WHY ?
Each stage of the compiler has two purposes:
– Detect and filter out some class of errors
– Compute some new information or translate the
representation of the program to make things easier for later
stages
Recursive structure of tree suits recursive structure of language
definition
33
HOW ?
Parse Tree (Contd.)
Tree nodes represent symbols of the grammar
(nonterminals or terminals) and tree edges represent
derivation steps
Example:
Given the production: expr
expr  expr op expr
The corresponding Parse tree is :
It’s Properties are: expr op expr
1. The root is labeled by a start symbol
2. Each leaf is labeled by a token or by ε
3. Each interior node is labeled by a nonterminal
34
Parse Tree Example
Remember the grammar: The corresponding
expr  expr op
expr → expr + expr expr Parse tree is
 +( expr
expr→ expr expr/ expr
) expr
expr
expr  - expr
expr → expr + (expr) / expr
expr + expr
expr  id, num
expr → expr + (expr – expr) / expr (expr ) / Expr
Id
op  [ +, -, *, /,
id → id + ( id – num) / num
 ] (ctr)
expr – expr num
(2)
And the String derived from it
ctr + ( a – 5) / 2 Id num
Using the derivation (a) (5)
35
Parse Tree Example (Continued)
the Yield of the tree expr (ctr)
– The string derived or
generated from the expr + expr
nonterminal at the root of
the tree Id (expr ) / Expr
– Obtained by reading the (ctr)
leaves of the parse tree expr – expr num
read from left to right (2)
In this Example
Id num
(a) (5)
ctr = ctr + (a – 5) / 2
36
Parse Tree – Another Example

37
Another Derivation Example
Given the grammar : E  E + E | E  E | ( E ) | - E | num
Find a derivation for the expression: 6 + 2  4
E E E E

E + E E + E E + E
6 + (2 * 4 ) = 14 
E E 6 E  E

Which derivation tree is correct? 2 4


E E E E

E  E E  E E  E

(6 + 2) * 4 = 32 E + E E + E 4

6 2
According
38 to above grammar BOTH are correct!!!
Another Derivation Example
Given the grammar : E  E + E | E  E | ( E ) | - E | id
Find a derivation for the expression: 6 + 2  4
E E E E

E + E E + E E + E
6 + (2 * 4 ) = 14 
E E 6 E  E

Which derivation tree is correct? 2 4


E E E E

E  E E  E E  E

(6 + 2) * 4 = 32 E + E E + E 4

6 2
According
39 to above grammar BOTH are correct!!!
Ambiguity in Grammar
A grammar that produces more than one parse tree for any
input sentence
– Happens because the derivation can be either
Leftmost or Rightmost
Can also happen because of the way the grammar is defined
Classic example — the if-then-else problem
– Stmt → if Expr then Stmt
– | if Expr then Stmt else Stmt
– | … other stmts …

40
Stmt → If Expr then Stmt
| if Expr then Stmt else Stmt
Ambiguous IF | … other stmts
This sentential form has two derivations
if Expr1 then if Expr2 then Stmt1 else Stmt2

41
Stmt → If Expr then Stmt
| if Expr then Stmt else Stmt
Ambiguous IF
This sentential form has two derivations
| … other stmts

if Expr1 then if Expr2 then Stmt1 else Stmt2

42
Check whether the given grammar G is ambiguous or not.
1. S → aSb | SS
2. S → ε

Check whether the given grammar G is ambiguous or not.


1.A → AA
2.A → (A)
3.A → a

43
44
45
Recursion In Grammars
Very Common in grammars
Generally 3 types of recursions are Possible:
1. A ::= Ab (Left Recursion)
2. B ::= cB (Right Recursion)
3. C ::= dCf (Middle Recursion or Self-embedding)
Facts:
If a grammar contains no middle recursion then the language
it generates is regular.
If there is no recursion at all in a grammar, then it is finite and
therefore regular.

46
Left Recursion
Consider E  E + T | T A top-down parser might loop
the T  T  F | F forever when parsing an
grammar: F  ( E ) | id expression using this grammar
E E E E

E + T E + T E + T

E + T E + T

E + T

 A grammar that has at least one production of the form


A  A is a left recursive grammar
Top Down parsing cannot handle left-recursive grammars
47
Eliminating left Recursion
First group the A productions as A A1 |
A2| …..|Am|1 |2 |.. n |
Where no i begins with an A.
Then replace the A-productions by A 1 A’|2 A’|……n A’|
A’ 1 A’| 2 A’|… n A’| 
This eliminates all immediate left recursions from A and A’
productions
BUT
It does not eliminate left recursions involving derivations of two or
more steps

48
Left Recursion ( Contd.)
Left-recursion can often be eliminated by rewriting the grammar
EE+T|T
This left-recursive grammar: T  T  F | F
F  ( E ) | id
Can be re-written to eliminate the immediate left recursion:
E  TE’
E’  +TE’ | 
T  FT’
T’  FT’ | 
F  ( E ) | id 49
Properties of Grammars – A Summary
Epsilon Productions
– Has at least one production of the form “E → ε ”,
Right( / Left) Linear Grammar:
– Grammars where all the RHS of all productions has at most
one non-terminal and that nonterminal appears rightmost
(/ Leftmost.)
– E → x E” or “E → x”.
Left-Recursive Grammar:
– A grammar that can generate “E → E + X” (for example).
Similarly, “Right-recursive grammars.”
Ambiguous Grammar
– More than one parse tree is possible for a specific sentence.
50
Limitations of CFG
CFGs cannot express context conditions
For Example:
1. Every name must be declared before it is used
2. The operands of an expression must have compatible
types
Possible solutions
Use context-sensitive grammars
– Too complicated
Check context conditions during semantic analysis

51
Syntax Analysis
Problem Statement:
To find a derivation sequence in a grammar G for the input
token stream
or
To conclude that none exists

Solution
 Build parse tree for the string ( of tokens) based
on grammar rules
 The process is known as P A R S I N G
52
PARSING
Constructs the syntax tree for a given sequence of tokens
using grammar rules and productions such that:
1. The top node is labeled with the start symbol of the
grammar
2. Leaf nodes are labeled with terminals and inner nodes
with nonterminals
3. the children of the inner node labeled N correspond to
the members of an alternative of N, in the same order
as they occur in that alternative
4. The terminals labeling the leaf nodes correspond to the
sequence of tokens, in the same order they occur in
the input

………. Contd.
53
The Role of a Parser
Lexical Rest of the
PARSER
analysis front end
Symbol Table
The parser continues the process of translation and validation
started by the Lexical Analyzer:
1. Analyzing the phrase structure of the program,
2. Adding hierarchical (tree) structure to the flat stream of
AND by the lexical Analyzer,
tokens produced
3. Outputting a data structure conveying the program's
syntax to subsequent compile phases,
4. Reporting errors when the program does not match the
syntactic structure of the source language

54
FIRST & FOLLOW
The two functions represent a set of terminal symbols,
that aid us in constructing Predictive parser by helping us fill
the Parsing Table with valid entries
Set of terminal symbols yielded by FOLLOW function can also be
used in error recovery
FIRST – if α is the string of grammar symbols then FIRST (α) is
the set of terminals that begin the strings derived from α
(If α *> ε , then ε is also in FIRST (α))
FOLLOW (A) – the set of terminals a that can appear
immediately to the right of A such that there exists a derivation
of the form
S *> α A a β for some α and β
55
FIRST & FOLLOW (Contd.)

The set of terminal symbols ( including ε )that can appear at the


far left of any parse tree derived from a particular nonterminal
is the FIRST set of that nonterminal
The set of terminal symbols ( including ε ) That can follow a
nonterminal in some derivation or the other, is called the
FOLLOW set of that nonterminal
A set of rules are followed to compute the FIRST and FOLLOW
set
These sets will be used in creating Parsing table

56
Rules to Create FIRST
FIRST rules: GRAMMAR:
1. If X is a terminal, FIRST(X) = {X}
2. If X   , then   FIRST(X)
E  TE’
3. If X  aABC , then a  FIRST(X)
E’  +TE’ | 
T  FT’
4. If X →ABCD, then FIRST (A)  FIRST (X).
4a. Further, If A   , in the above production, then
T’  FT’ | 
FIRST (B)  FIRST (X)........ [& so on recursively]
F  ( E ) | id

FIRST(id) = {id} FIRST(E’) = {}{+, }


FIRST() = {} FIRST(T’) = {}{, }
SETS: FIRST(+) = {+} FIRST(F) = {(, id}
FIRST(() = {(} FIRST(T) = FIRST(F) = {(, id}
57 FIRST()) = {)} FIRST(E) = FIRST(T) = {(, id}
FIRST(E’) = {+, }
FIRST(T’) = { , }
FIRST(F) = {(, id} Rules to Create FOLLOW
FOLLOW rules:
FIRST(T) = {(, id}
FIRST(E) = {(, id} 1. If S is the start symbol, then $  FOLLOW(S)

GRAMMAR: 2. If there is a production A  B, then,


E  TE’ EVERYTHING in FIRST() except  is placed
E’  +TE’ |  in FOLLOW(B)
T  FT’
3. If there is a production A  B then,
T’  FT’ | 
EVERYTHING in FOLLOW(A) is in FOLLOW(B)
F  ( E ) | id
SETS:
FOLLOW(E) = {$} { ), $} 3a. If there is a production A  B where
FOLLOW(E’) = { ), $} FIRST() contains  ( i.e.  
*  ) then,

FOLLOW(T) = { ), $} {+, ), $} EVERYTHING in FOLLOW(A) is in FOLLOW(B)


FOLLOW(T’) = { +, ), $}
FOLLOW(F) = {+, ), $} A and B are non-terminals,
58 {+, , ), $}  and  are strings of grammar symbols
Find First and Follow
• CFG
S -> E F
E -> g | ∈
F -> f | ∈

• CFG
S -> E F G
E -> g | o
F -> f | ∈
G -> k | l

59
60
Example of first and follow function

How to find the first and follow functions for the given CFG?
S → aXZh
X → cY
Y → bY / ∈
Z → EF
E→g/∈
F→f/∈

61
First Functions
•First(S) = { a }
•First(X) = { c }
•First(Y) = { b , ∈ }
•First(Z) = { First(E) – ∈ } ∪ First(F) = { g , f , ∈ }
•First(E) = { g , ∈ }
•First(F) = { f , ∈ }
Follow Functions
•Follow(S) = { $ }
•Follow(X) = { First(Z) – ∈ } ∪ First(h) = { g , f , h }
•Follow(Y) = Follow(X) = { g , f , h }
•Follow(Z) = First(h) = { h }
•Follow(E) = { First(F) – ∈ } ∪ Follow(Z) = { f , h }
•Follow(F) = Follow(Z) = { h }

62
Example of first and follow function with Left
Recursion

How to find the first and follow functions for the given
CFG with Left Recursive production rules.?
S→X
X → aY / Xd
Y→b
C→t

63
The given grammar is left recursive. So, we
first remove the left recursion from the given
grammar.
After the elimination of left recursion, we get
the following grammar.
S→X
X→ aYX’
X’ → dX’ / ∈
Y→b
C→t

64
The first and follow functions are described below;
First Functions
•First(S) = First(X) = { a }
•First(X) = { a }
•First(X’) = { d , ∈ }
•First(Y) = { b }
•First(C) = { t }
Follow Functions-
•Follow(S) = { $ }
•Follow(X) = Follow(S) = { $ }
•Follow(X’) = Follow(X) = { $ }
•Follow(Y) = { First(A’) – ∈ } ∪ Follow(A) = { d , $ }
•Follow(C) = NA

65
Example of first and follow function
with Left Recursion
How to find the first and follow functions for
the given CFG with Left Recursive production
rules.?
X→ X + Y / Y
Y→YdF/F
F → (X) / id

66
Solution-
The given grammar is left recursive. So, we
first remove the left recursion from the given
grammar.
After the elimination of left recursion, we get
the following grammar.
X → YX’
X’ → + YX’ / ∈
Y → FY’
Y’ → d FY’ / ∈
F → (X) / id

67
The first and follow functions are described below;
First Functions
•First(X) = First(Y) = First(F) = { ( , id }
•First(X’) = { + , ∈ }
•First(Y) = First(F) = { ( , id }
•First(Y’) = { d , ∈ }
•First(F) = { ( , id }
Follow Functions
•Follow(X) = { $ , ) }
•Follow(X’) = Follow(E) = { $ , ) }
•Follow(Y) = { First(X’) – ∈ } ∪ Follow(X) ∪ Follow(X’)
={+,$,)}
•Follow(Y’) = Follow(Y) = { + , $ , ) }
•Follow(F) = { First(Y’) – ∈ } ∪ Follow(Y) ∪ Follow(Y’)
={d,+,$,)}
68
References

https://www.javatpoint.com/context-free-grammar

https://t4tutorials.com/rules-of-first-and-follow-in-
predictive-parsing/

69
IMP Note to Self
Thank you

71

You might also like