Parsing - 1
Parsing - 1
Parsing - 1
if
== = ;
b 0 a b
• Grammar
list → list + digit
| list – digit
| digit
digit → 0 | 1 | … | 9
terminal
• if A is the label of anode and x1, x2, …xn
are labels of the children of that node
then A x1 x2 … xn is a production in the
grammar 11
Example
Parse tree for 9-5+2
list
list + digit
list - digit 2
digit 5
9
12
Ambiguity
• A Grammar can have more than one
parse tree for a string
• Consider grammar
list list+ list
| list – list
|0|1|…|9
9 5 5 2
14
Ambiguity …
• Ambiguity is problematic because meaning
of the programs can be incorrect
• Ambiguity can be handled in several ways
– Enforce associativity and precedence
– Rewrite the grammar (cleanest way)
• There is no algorithm to convert
automatically any ambiguous grammar to
an unambiguous grammar accepting the
same language
• Worse, there are inherently ambiguous
languages! 15
Ambiguity in Programming Lang.
• Dangling else problem
stmt if expr stmt
| if expr stmt else stmt
• For this grammar, the string
if e1 if e2 then s1 else s2
has two parse trees
16
if e1
stmt
if e2
s1
else s2
if expr stmt else stmt
if e1 e1 if expr stmt s2
if e2
s1
else s2 e2 s1
stmt
if expr stmt
e2 s1 s2 17
Resolving dangling else problem
• General rule: match each else with the closest
previous unmatched if. The grammar can be
rewritten as
stmt matched-stmt
| unmatched-stmt
matched-stmt if expr matched-stmt
else matched-stmt
| others
unmatched-stmt if expr stmt
| if expr matched-stmt
else unmatched-stmt 18
Associativity
• If an operand has operator on both the
sides, the side on which operator takes this
operand is the associativity of that
operator
• In a+b+c b is taken by left +
• +, -, *, / are left associative
• ^, = are right associative
• Grammar to generate strings with right
associative operators
right letter = right | letter
letter a| b |…| z
19
Precedence
• String a+5*2 has two possible
interpretations because of two
different parse trees corresponding to
(a+5)*2 and a+(5*2)
• Precedence determines the correct
interpretation.
• Next, an example of how precedence
rules are encoded in a grammar
20
Precedence/Associativity in the
Grammar for Arithmetic Expressions
Ambiguous • Unambiguous,
with precedence
E E +E and associativity
| E*E rules honored
| (E) E E+T|T
| num | id T T*F|F
F ( E ) | num
3+2+5 | id
3+2*5 21
Parsing
• Process of determination whether a string
can be generated by a grammar
• Parsing falls in two categories:
– Top-down parsing:
Construction of the parse tree starts at the root
(from the start symbol) and proceeds towards
leaves (token or terminals)
– Bottom-up parsing:
Construction of the parse tree starts from the
leaf nodes (tokens or terminals of the grammar)
and proceeds towards root (start symbol)
22
Top down Parsing
• Following grammar generates types of
Pascal
type simple
| id
| array [ simple] of type
simple integer
| char
| num dotdot num
1
Example …
• Construction of a parse tree is done by starting
the root labeled by a start symbol
(Which node?)
2
• Parse
array [ num dotdot num ] of integer
3
array [ num dotdot num ] of integer
Start symbol
look-ahead Expand using the rule
type type array [ simple ] of type
For example :
First(simple) = {integer, char, num}
First(num dotdot num) = {num}
5
Define a procedure for each non terminal
procedure type;
if lookahead in {integer, char, num}
then simple
else if lookahead =
then begin match( );
match(id)
end
else if lookahead = array
then begin match(array);
match([);
simple;
match(]);
match(of);
type
end
else error;
6
procedure simple;
if lookahead = integer
then match(integer)
else if lookahead = char
then match(char)
else if lookahead = num
then begin match(num);
match(dotdot);
match(num)
end
else
error;
procedure match(t:token);
if lookahead = t
then lookahead = next token
else error; 7
Left recursion
• A top down parser with production
A A may loop forever
A R
R R|
8
Parse tree corresponding Parse tree corresponding
to a left recursive grammar to the modified grammar
A A
A R
A R
β α α β α Є
E E+T|T
T T * F |F
F ( E ) | id
E T E’
E’ + T E’ | Є
T F T’
T’ * F T’ | Є
F ( E ) | id
10
Removal of left recursion
In general
transforms to
S Aa | b
A Ac | Sd | Є
S Aa Sda
– Starting from the first rule and replacing all the occurrences of the first
non terminal symbol
12
Removal of left recursion due to
many productions …
• After the first step (substitute S by its rhs in the rules) the
grammar becomes
S Aa | b
A Ac | Aad | bd | Є
S Aa | b
A bdA' | A'
A' cA' | adA' | Є
13
Left factoring
• In top-down parsing when it is not clear which production to choose
for expansion of a symbol
defer the decision till we have seen enough input.
• Therefore A 1 | 2
transforms to
A A’
A’ 1 | 2
14
Dangling else problem again
Dangling else problem can be handled by left factoring
can be transformed to
15
Predictive parsers
• A non recursive top down parsing method
16
Predictive parsing
• Predictive parser can be implemented by
maintaining an external stack
input
Parse table is a
two dimensional array
M*X,a+ where “X” is a
stack
Parse
table
17
Example
• Consider the grammar
E T E’
E' +T E' | Є
T F T'
T' * F T' | Є
F ( E ) | id
18
Parse table for the grammar
id + * ( ) $
E ETE’ ETE’
T TFT’ TFT’
F Fid F(E)
19
Parsing algorithm
• The parser considers 'X' the symbol on top of stack, and 'a' the
current input symbol
• Assume that '$' is a special token that is at the bottom of the stack
and terminates the input string
if X = a = $ then halt
if X is a non terminal
then if M[X,a] = {X UVW}
then begin pop(X); push(W,V,U)
end
else error
20
Example
Stack input action
$E id + id * id $ expand by ETE’
$E’T id + id * id $ expand by TFT’
$E’T’F id + id * id $ expand by Fid
$E’T’id id + id * id $ pop id and ip++
$E’T’ + id * id $ expand by T’Є
$E’ + id * id $ expand by E’+TE’
$E’T+ + id * id $ pop + and ip++
$E’T id * id $ expand by TFT’
21
Example …
Stack input action
$E’T’F id * id $ expand by Fid
$E’T’id id * id $ pop id and ip++
$E’T’ * id $ expand by T’*FT’
$E’T’F* * id $ pop * and ip++
$E’T’F id $ expand by Fid
$E’T’id id $ pop id and ip++
$E’T’ $ expand by T’Є
$E’ $ expand by E’Є
$ $ halt
22
Constructing parse table
• Table can be constructed if for every non terminal, every lookahead
symbol can be handled by at most one production
first follow
23
Compute first sets
• If X is a terminal symbol then First(X) = {X}
• If X is a non terminal
and X YlY2 … Yk is a production
then
if for some i, a is in First(Yi)
and Є is in all of First(Yj) (such that j<i)
then a is in First(X)
24
Example
• For the expression grammar
E T E’
E' +T E' | Є
T F T'
T' * F T' | Є
F ( E ) | id
25
Compute follow sets
1. Place $ in follow(S)
2. If there is a production A → αBβ
then everything in first(β) (except ε) is in follow(B)
3. If there is a production A → αB
then everything in follow(A) is in follow(B)
Since follow sets are defined in terms of follow sets last two steps
have to be repeated until follow sets converge
26
Example
• For the expression grammar
E T E’
E' + T E' | Є
T F T'
T' * F T' | Є
F ( E ) | id
follow(E) = follow(E’) = , $, ) -
follow(T) = follow(T’) = , $, ), + -
follow(F) = { $, ), +, *}
27
Construction of parse table
• for each production A α do
– for each terminal ‘a’ in first(α)
M[A,a] = A α
– If Є is in First(α)
M[A,b] = A α
for each terminal b in follow(A)
28
Practice Assignment
• Construct LL(1) parse table for the expression grammar
bexpr bexpr or bterm | bterm
bterm bterm and bfactor | bfactor
bfactor not bfactor | ( bexpr ) | true | false
• Steps to be followed
– Remove left recursion
– Compute first sets
– Compute follow sets
– Construct the parse table
29
Error handling
• Stop at the first error and print a message
– Compiler writer friendly
– But not user friendly
– Error productions
– Global correction
30
Panic mode
• Simplest and the most popular method
31
Panic mode …
• Consider following code
begin
a = b + c;
x=pr;
h = x < 0;
end;
32
Phrase level recovery
• Make local correction to the input
33
Error productions
• Add erroneous constructs as productions in the grammar
34
Global corrections
• Considering the program as a whole find a correct
“nearby” program
35
Error Recovery in LL(1) parser
• Error occurs when a parse table entry M[A,a] is empty
36
Practice Assignment
• Reading assignment: Read about error
recovery in LL(1) parsers
• Assignment to be submitted:
– introduce synch symbols (using both follow
and first sets) in the parse table created for the
boolean expression grammar in the previous
assignment
– Parse “not (true and or false)” and show how
error recovery works
37