Parsers 1
Bottom-up Parsing
Bottom-up parsing corresponds to the construction of a parse tree for an input string
beginning at the leaves (the bottom nodes) and working up towards the root (the top node). It
involves "reducing" an input string 'w' to the Start Symbol of the grammar. In each reduction
step, a particular substring matching the right side of a production is replaced by the symbol on the
left of that production, and the sequence of reductions traces out a Right most derivation in reverse.
For example, consider the following Grammar:
E -> E+T | T
T -> T*F | F
F -> (E) | id
The bottom-up parsing techniques discussed here are:
1. Shift-Reduce Parsing
2. Operator precedence parsing
3. Table Driven LR Parsing
i. LR (1)
ii. SLR (1)
iii. CLR (1)
iv. LALR (1)
Shift-Reduce Parsing
When parsing the given input string, the parser shifts input symbols on to a stack. If the symbols
on top of the stack match the right side of a production, the parser takes the reduce action;
otherwise it goes for the shift action. It can also be used with ambiguous grammars.
Ex: consider the grammar below, used to accept the input string "id * id" with an S-R parser
E -> E+T | T
T -> T*F | F
F -> (E) | id
As a second example, consider the grammar
S -> aAcBe
A -> Ab | b
B -> d
Let the input string be "abbcde". The series of shifts and reductions to the start symbol is
as follows (each step is one reduction):
abbcde => aAbcde => aAcde => aAcBe => S
Note: in the above example there are two actions possible in the second step (aAbcde), once the
parser has shifted b on top of aA:
1. Reduce by A->Ab, going to the 3rd step
2. Reduce by A->b
If the parser takes the 1st action then it can successfully accept the given input string;
if it takes the 2nd action it reaches aAAcde and cannot accept the given input string. Since both
choices here are reductions, this is a reduce-reduce conflict; when the choice is between shifting
and reducing, it is called a shift-reduce conflict. A plain S-R parser is not able to take the proper
decision in such cases, so it is not recommended for parsing on its own.
The main problems in shift-reduce parsing are therefore:
1. Identification of the correct handles in the reduction steps, such that the given input is
reduced to the starting symbol of the grammar.
2. Identification of which production to use for reducing in the reduction steps, such that we
correctly reduce the given input to the starting symbol of the grammar.
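The choice between competing actions can be made visible with a small experiment. The following is a minimal sketch, not a practical parser: it tries every possible reduction and shift by backtracking, using the grammar S -> aAcBe, A -> Ab | b, B -> d from the example. The names PRODUCTIONS and shift_reduce are illustrative assumptions, not part of the notes.

```python
from typing import List, Optional, Tuple

# Grammar from the example: S -> aAcBe, A -> Ab | b, B -> d
PRODUCTIONS: List[Tuple[str, str]] = [
    ("S", "aAcBe"),
    ("A", "Ab"),
    ("A", "b"),
    ("B", "d"),
]
START = "S"


def shift_reduce(stack: str, rest: str, trace: List[str]) -> Optional[List[str]]:
    """Try every shift/reduce choice; return the list of actions on success."""
    if stack == START and not rest:
        return trace
    # Try each reduction whose right side sits on top of the stack.
    for lhs, rhs in PRODUCTIONS:
        if stack.endswith(rhs):
            reduced = stack[: len(stack) - len(rhs)] + lhs
            result = shift_reduce(reduced, rest, trace + [f"reduce {lhs}->{rhs}"])
            if result is not None:
                return result
    # If no reduction led to acceptance, shift the next input symbol.
    if rest:
        return shift_reduce(stack + rest[0], rest[1:], trace + [f"shift {rest[0]}"])
    return None


if __name__ == "__main__":
    for action in shift_reduce("", "abbcde", []) or ["rejected"]:
        print(action)
```

A real shift-reduce parser cannot afford to backtrack like this, which is why the table driven methods described later decide each action from the current state and the look ahead symbol.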
Operator precedence parser consists of:
1. An input buffer that contains the string to be parsed followed by a $, a symbol used to
indicate the end of the input.
2. A stack containing a sequence of grammar symbols with a $ at the bottom of the stack.
3. An operator precedence relation table O, containing the precedence relations between
pairs of terminals. Three kinds of precedence relations can exist between a pair
of terminals 'a' and 'b', as follows:
1. The relation a<•b implies that the terminal 'a' has lower precedence than terminal
'b'.
2. The relation a•>b implies that the terminal 'a' has higher precedence than terminal
'b'.
3. The relation a=•b implies that the terminal 'a' has the same precedence as terminal
'b'.
4. An operator precedence parsing program, which takes an input string and determines whether it
conforms to the grammar specifications. It uses the operator precedence parse table and the
stack to arrive at the decision.
Fig: Components of an operator precedence parser (input buffer ending in $, stack, operator precedence parsing algorithm, output)
Ex: For the grammar
E -> E+E
E -> E-E
E -> E*E
E -> E/E
E -> E^E
E -> -E
E -> (E)
E -> id
construct the operator precedence table and accept the input string "id+id*id".
      +     -     *     /     ^     id    (     )     $
+     •>    •>    <•    <•    <•    <•    <•    •>    •>
-     •>    •>    <•    <•    <•    <•    <•    •>    •>
*     •>    •>    •>    •>    <•    <•    <•    •>    •>
/     •>    •>    •>    •>    <•    <•    <•    •>    •>
^     •>    •>    •>    •>    <•    <•    <•    •>    •>
id    •>    •>    •>    •>    •>    Err   Err   •>    •>
(     <•    <•    <•    <•    <•    <•    <•    =     Err
)     •>    •>    •>    •>    •>    Err   Err   •>    •>
$     <•    <•    <•    <•    <•    <•    <•    Err   Err
(The row is the terminal on the left, the column is the terminal on the right.)
The intention of the precedence relations is to delimit the handle of the given input String
with <• marking the left end of the Handle and •> marking the right end of the handle.
Parsing Action
Explain the parsing actions for the input string "id * id" using the grammar
E -> E+E
E -> E*E
E -> id
After inserting the precedence relations, the input string is written as
1. $ <• id •> * <• id •> $
The first handle is 'id' and the match for 'id' in the grammar is E -> id.
So, id is replaced with the Non terminal E. The given input string can be
written as
2. $ <• E •> * <• id •> $
The parser will not consider the Non terminal as an input. So, Non terminals are not
considered in the input string. So, the string becomes
3. $ <• * <• id •> $
The next handle is 'id' and the match for 'id' in the grammar is E -> id.
So, id is replaced with the Non terminal E. The given input string can be
written as
4. $ <• * <• E •> $
The parser will not consider the Non terminal as an input. So, Non terminals are not
considered in the input string. So, the string becomes
5. $ <• * •> $
The next handle is the one delimited around '*' and the match in the grammar is E -> E*E.
So, E*E is replaced with the Non terminal E. The given input string can be
written as
6. $ E $
The parser will not consider the Non terminal as an input. So, Non terminals are not
considered in the input string. So, the string becomes
7. $ $
$ on $ means parsing is successful.
The operator precedence parsing program determines the action of the parser
depending on the precedence relation between the topmost terminal on the stack and the
current input symbol.
Ex: the sequence of actions taken by the parser using the stack for the input string “id * id “
        E
      / | \
     E  *  E
     |     |
     id    id
Fig: parse tree for "id * id" using operator precedence parser
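The parsing actions described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: only the terminals id, +, * and $ are covered, the relations are copied from the corresponding entries of the precedence table, and the names PREC and parse are illustrative. As in the walk-through, Non terminals are ignored, so only terminals are kept on the stack.

```python
from typing import Dict, List

LT, GT = "<", ">"

# Precedence relations for id, +, * and $ (row = terminal on the stack,
# column = current input symbol), copied from the table in the notes.
PREC: Dict[str, Dict[str, str]] = {
    "+": {"+": GT, "*": LT, "id": LT, "$": GT},
    "*": {"+": GT, "*": GT, "id": LT, "$": GT},
    "id": {"+": GT, "*": GT, "$": GT},
    "$": {"+": LT, "*": LT, "id": LT},
}


def parse(tokens: List[str]) -> List[str]:
    """Return the handles reduced, in order; raise ValueError on an error entry."""
    stack = ["$"]              # terminals only; Non terminals are ignored
    tokens = tokens + ["$"]
    pos = 0
    handles: List[str] = []
    while True:
        top, nxt = stack[-1], tokens[pos]
        if top == "$" and nxt == "$":
            return handles     # $ on $ means parsing is successful
        rel = PREC[top].get(nxt)
        if rel == LT:
            stack.append(nxt)  # shift
            pos += 1
        elif rel == GT:
            # Reduce: pop until the new top is related by <• to the popped terminal.
            while True:
                popped = stack.pop()
                if PREC[stack[-1]].get(popped) == LT:
                    break
            handles.append(popped)
        else:
            raise ValueError(f"no precedence relation between {top!r} and {nxt!r}")


if __name__ == "__main__":
    print(parse(["id", "+", "id", "*", "id"]))
```

For "id+id*id" the handles come out as id, id, id, *, +, so the multiplication is reduced before the addition. Because Non terminals are ignored, this sketch does not check that every operator actually has its operands.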
Disadvantage of operator precedence parsing:
1. It is difficult to handle an operator like '-', which can be either unary or binary and hence has
different precedences and associativities.
LR Parsing
The most prevalent type of bottom up parsing is LR (k) parsing, where L stands for the left to right scan
of the given input string, R for a Right Most derivation in reverse, and k for the number of input symbols used as
Look ahead.
It is the most general non back tracking shift reduce parsing method.
The class of grammars that can be parsed using the LR methods is a proper superset of
the class of grammars that can be parsed with predictive parsers.
Fig: Components of an LR parser (input buffer, stack, LR parsing algorithm, LR parsing table with ACTION and GOTO parts, output)
The LR parser consists of:
An input buffer that contains the string to be parsed followed by a $ Symbol, used to
indicate the end of input.
A stack containing a sequence of grammar symbols with a $ at the bottom of the stack,
which initially contains the Initial state of the parsing table on top of $.
A parsing table (M), which is a two dimensional array M[state, terminal or Non terminal] and
contains two parts:
1. ACTION Part
The ACTION part of the table is a two dimensional array indexed by state and
input symbol, i.e. action[state][input]. An action table entry can have one of the
following four kinds of values in it. They are:
1. Shift X, where X is a State number.
2. Reduce X, where X is a production number.
3. Accept, signifying the completion of a successful parse.
4. Error entry.
2. GO TO Part
The GO TO part of the table is a two dimensional array indexed by state and
Non terminal, i.e. GO TO[state][Non terminal]. A GO TO entry has a state
number in the table.
The parsing Algorithm uses the current State X and the next input symbol 'a' to consult the
entry at action[X][a]. It takes one of the four following actions:
1. If action[X][a] = shift Y, the parser pushes 'a' and the state Y on to the top of the stack
and advances the input pointer.
2. If action[X][a] = reduce Y (Y is the number of the production being reduced in State X) and
production Y is A->β, then the parser pops 2*|β| symbols from the stack (one grammar symbol and
one state for each symbol of β), pushes A on to the Stack, and then pushes the state
GO TO[X'][A], where X' is the state now exposed on top of the stack.
3. If action[X][a] = accept, then the parsing is successful and the input string is
accepted.
4. If action[X][a] = error, then the parser has discovered an error and calls the error
routine.
LR parsing is classified into
1. LR ( 1 )
2. Simple LR ( 1 )
3. Canonical LR ( 1 )
4. Look ahead LR ( 1 )
LR (1) Parsing
1. Write the Context free Grammar for the given input string
5. Draw DFA
7. Based on the information from the Table, with help of Stack and Parsing algorithm
generate the output.
Augment Grammar
The Augment Grammar G` is G with a new starting symbol S` and an additional production
S` -> S. This helps the parser to identify when to stop the parsing and announce the acceptance of
the input. The input string is accepted if and only if the parser is about to reduce by S` -> S. For
example, let us consider the Grammar below:
E -> E+T | T
T -> T*F | F
F -> (E) | id
The Augment grammar G` is represented by
E` -> E
E -> E+T | T
T -> T*F | F
F -> (E) | id
NOTE: Augmenting a Grammar is simply adding one extra production while preserving the actual
meaning of the given Grammar G.
Canonical collection of LR (0) items
LR (0) items
An LR (0) item of a Grammar G is a production of G with a dot at some position on the right
side of the production. An item indicates how much of the input has been scanned up to a given
point in the process of parsing. For example, if the Production is X -> AB then the LR (0)
items are:
1. X -> •AB, indicates that the parser expects a string derivable from AB.
2. X -> A•B, indicates that the parser has scanned a string derivable from A and
is expecting a string derivable from B.
3. X -> AB•, indicates that the parser has scanned a string derivable from AB.
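As a small illustration, the LR (0) items of a production can be listed by placing the dot at every position of the right side. The helper name lr0_items below is an assumption made for this sketch; symbols are written without separators, as in the example X -> AB.

```python
from typing import List, Sequence


def lr0_items(head: str, body: Sequence[str]) -> List[str]:
    """List every LR (0) item of the production head -> body."""
    items = []
    for dot in range(len(body) + 1):
        before, after = "".join(body[:dot]), "".join(body[dot:])
        items.append(f"{head} -> {before}•{after}")
    return items


if __name__ == "__main__":
    for item in lr0_items("X", ["A", "B"]):
        print(item)
```

A production with n symbols on its right side therefore has n+1 items.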
Canonical collection
This is the process of grouping the LR (0) items together based on the closure and Go to
operations.
Closure operation
If I is an initial State, then the Closure (I) is constructed as follows:
1. Initially, add the Augment Production to the state and check for the • symbol on the Right
hand side of the production; if the • is followed by a Non terminal, then add the productions
which start with that Non Terminal to the State I.
2. If a production X -> α•Aβ is in I, then add the productions which start with A to the
State I. Rule 2 is applied until no more productions can be added to the State I (meaning that
every • that is followed by a Non terminal already has that Non terminal's productions in I).
Ex:
Productions          LR (0) items for the Grammar
0. E` -> E           E` -> •E
1. E -> E+T          E -> •E+T
2. E -> T            E -> •T
3. T -> T*F          T -> •T*F
4. T -> F            T -> •F
5. F -> (E)          F -> •(E)
6. F -> id           F -> •id
Closure (I0) State
Add E` -> •E to the I0 State.
Since the '•' symbol on the Right hand side of the production is followed by the Non
terminal E, add the productions starting with E to the I0 state. So, the state
becomes
E` -> •E
E -> •E+T
E -> •T
The item E -> •T satisfies the 2nd rule, so add the productions which start with T to I0;
the item T -> •F in turn brings in the productions which start with F.
Note: once a production has been added to a state, the same production should
not be added a 2nd time to the same state. So, the state becomes
E` -> •E
E -> •E+T
E -> •T
T -> •T*F
T -> •F
F -> •(E)
F -> •id
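The closure rules can be sketched as a worklist algorithm. The following is a minimal sketch, assuming the augmented expression grammar E' -> E, E -> E+T | T, T -> T*F | F, F -> (E) | id (E' is written for E`); the item representation (head, body, dot position) and the names GRAMMAR, closure and show are illustrative choices, not part of the notes.

```python
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, Tuple[str, ...], int]   # (head, body, dot position)

# Augmented expression grammar (assumed for this sketch).
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items: FrozenSet[Item]) -> FrozenSet[Item]:
    """Add X -> •γ for every Non terminal X that appears right after a dot."""
    result = set(items)
    worklist = list(items)
    while worklist:
        _, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                new_item = (body[dot], production, 0)
                if new_item not in result:     # never add the same item twice
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


def show(item: Item) -> str:
    head, body, dot = item
    return f"{head} -> {' '.join(body[:dot])}•{' '.join(body[dot:])}"


if __name__ == "__main__":
    for item in sorted(closure(frozenset({("E'", ("E",), 0)}))):
        print(show(item))
```

Starting from the single item E' -> •E, the function returns the seven items of the I0 state for this grammar.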
Go to Operation
Go to (I0, X), where I0 is a set of items and X is the grammar Symbol over which we
are moving the '•' symbol. It is like finding the next state of the DFA for a given State I0 and
input symbol X. For example, if I0 contains the productions E` -> •E and E -> •E+T, then moving
the '•' over E gives Go to (I0, E) = { E` -> E•, E -> E•+T }.
Note: Once we complete the Go to operation, we need to compute the closure operation for the
output productions.
I0                       I1
E` -> •E       --E-->    E` -> E•
E -> •E+T                E -> E•+T
T -> •T*F
Construction of LR (1) parsing Table:
Once we have created the canonical collection of LR (0) items, we need to follow the steps
mentioned below. (Because this table is built from LR (0) items and its reduce entries are written
under every terminal, it is usually called the LR (0) parsing table.)
If there is a transition from one state (Ii) to another state (Ij) on a terminal value a, then
we should write the shift entry Sj in the ACTION part as shown below:
         Ii --a--> Ij
States   ACTION        GO TO
         a      $      A
Ii       Sj
If there is a transition from one state (Ii) to another state (Ij) on a Non terminal value A,
then we should write the subscript value of Ij in the GO TO part as shown below:
         Ii --A--> Ij
States   ACTION        GO TO
         a      $      A
Ii                     j
If there is a state (Ii) containing a production with the '•' at the end of its right side
(A->αβ•), so that it has no further transitions, then the production is said to be a reduced
production. Such a production gets a reduce entry, with its production number, under every
terminal in the ACTION part. If the Augment production is the one being reduced, then
write accept in the ACTION part under $.
         Ii contains 1. A->αβ•
States   ACTION        GO TO
         a      $      A
Ii       r1     r1
Ex: Construct the LR (1) parsing Table for the given Grammar (G)
S -> aB
B -> bB | b
Sol: 1. Add the Augment Production and insert the '•' symbol at the first position of every
production in G
0. S` -> •S
1. S -> •aB
2. B -> •bB
3. B -> •b
I0 State:
Add the Augment production to the I0 State and compute the Closure
I0 = Closure (S` -> •S)
Since '•' is followed by the Non terminal S, add all productions starting with
S to the I0 State. So, the I0 State becomes
I0 = S` -> •S
     S -> •aB
Here, the '•' symbol in the S production is followed by a terminal value. So, close the State.
I1 = Go to (I0, S) = S` -> S•
I2 = Go to (I0, a) = closure (S -> a•B)
Here, the '•' symbol is followed by the Non terminal B. So, add
the productions which start with B.
I2 = S -> a•B
     B -> •bB
     B -> •b
Here, the '•' symbol in the B productions is followed by a terminal value. So,
close the State.
I3 = Go to (I2, B) = S -> aB•
I4 = Go to (I2, b) = closure (B -> b•B, B -> b•)
In B -> b•B the '•' symbol is followed by the Non terminal B. So, add the productions
which start with B.
I4 = B -> b•B
     B -> b•
     B -> •bB
     B -> •b
In the added productions the Dot Symbol is followed by a terminal value. So, close the State.
I5 = Go to (I4, B) = B -> bB•
Go to (I4, b) = I4, so no new state is created.
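The whole collection can also be computed mechanically by applying Go to repeatedly until no new state appears. The following is a minimal sketch for the grammar S -> aB, B -> bB | b (S' stands for S`); the names closure, goto and canonical_collection and the item representation are assumptions of the sketch. Grammar symbols are tried in sorted order, which happens to reproduce the state numbers I0 to I5 used here.

```python
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, Tuple[str, ...], int]   # (head, body, dot position)
State = FrozenSet[Item]

GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S'": [("S",)],
    "S": [("a", "B")],
    "B": [("b", "B"), ("b",)],
}


def closure(items: FrozenSet[Item]) -> State:
    result, worklist = set(items), list(items)
    while worklist:
        _, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                new_item = (body[dot], production, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


def goto(state: State, symbol: str) -> State:
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(h, b, d + 1) for (h, b, d) in state if d < len(b) and b[d] == symbol}
    return closure(frozenset(moved))


def canonical_collection() -> Tuple[List[State], Dict[Tuple[int, str], int]]:
    states: List[State] = [closure(frozenset({("S'", ("S",), 0)}))]
    transitions: Dict[Tuple[int, str], int] = {}
    i = 0
    while i < len(states):
        state = states[i]
        for symbol in sorted({b[d] for (_, b, d) in state if d < len(b)}):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            transitions[(i, symbol)] = states.index(target)
        i += 1
    return states, transitions


if __name__ == "__main__":
    states, transitions = canonical_collection()
    print(len(states), "states")
    for (source, symbol), target in sorted(transitions.items()):
        print(f"I{source} --{symbol}--> I{target}")
```

The transitions it prints are the edges of the DFA, and the self loop on b at I4 shows up as I4 --b--> I4.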
Drawing DFA:
I0: S`->•S, S->•aB
I1: S`->S•
I2: S->a•B, B->•bB, B->•b
I3: S->aB•
I4: B->b•B, B->b•, B->•bB, B->•b
I5: B->bB•
Transitions:
I0 --S--> I1
I0 --a--> I2
I2 --B--> I3
I2 --b--> I4
I4 --B--> I5
I4 --b--> I4
LR Table:
States   ACTION                   GOTO
         a       b       $        S     B
I0       S2                       1
I1                       ACCEPT
I2               S4                     3
I3       R1      R1      R1
I4       R3      S4/R3   R3             5
I5       R2      R2      R2
Note: if there are multiple entries in a cell of the LR (1) parsing table, then the grammar will not be
accepted by the LR (1) parser. In the above table the I4 row has two entries (S4/R3) for the single
terminal value 'b'; this is called a Shift-Reduce conflict.
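The conflict can be found mechanically by collecting every entry written into each ACTION cell. The following is a minimal sketch, assuming the DFA and production numbers of the example above; SHIFTS, COMPLETED, build_action and conflicts are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TERMINALS = ["a", "b", "$"]
# Transitions of the DFA on terminals: (state, terminal) -> target state.
SHIFTS: Dict[Tuple[int, str], int] = {(0, "a"): 2, (2, "b"): 4, (4, "b"): 4}
# States holding a reduced production (dot at the end): state -> production number.
COMPLETED: Dict[int, int] = {3: 1, 4: 3, 5: 2}
ACCEPT_STATE = 1            # the state containing S` -> S•


def build_action() -> Dict[Tuple[int, str], List[str]]:
    """Fill the ACTION part, keeping every entry written into a cell."""
    action: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for (state, terminal), target in SHIFTS.items():
        action[(state, terminal)].append(f"S{target}")
    for state, production in COMPLETED.items():
        for terminal in TERMINALS:      # reduce entry under every terminal
            action[(state, terminal)].append(f"R{production}")
    action[(ACCEPT_STATE, "$")].append("ACCEPT")
    return dict(action)


def conflicts(action: Dict[Tuple[int, str], List[str]]) -> Dict[Tuple[int, str], List[str]]:
    """Cells with more than one entry are conflicts."""
    return {cell: entries for cell, entries in action.items() if len(entries) > 1}


if __name__ == "__main__":
    print(conflicts(build_action()))
```

For this grammar the only cell with more than one entry is (I4, b), holding S4 and R3.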
Shift-Reduce Conflict in LR (1) Parsing
Shift Reduce Conflict in the LR (1) parsing occurs when a state has
1. A Reduced item of the form A -> α• and
2. An incomplete item of the form A -> β•aα as shown below (the reduced item is taken to be
production number 2, and Ii --a--> Ij):
States   ACTION
         a        $
Ii       Sj/r2    r2
Reduce-Reduce Conflict in LR (1) Parsing
Reduce-Reduce Conflict in the LR (1) parsing occurs when a state has two or more
reduced items of the form
1. A -> α•
2. B -> β• as shown below:
States   ACTION
         a        $
Ii       r1/r2    r1/r2
SLR (1) Parsing
SLR (1) parsing follows the same steps as LR (1) parsing above, and the shift entries and GO TO
entries of the table are written exactly as before from the canonical collection of LR (0) items:
a transition from one state (Ii) to another state (Ij) on a terminal value a gives the shift entry
Sj in action[Ii][a], and a transition from Ii to Ij on a Non terminal value A (from A->α•Aβ to
A->αA•β) gives the subscript value j in GO TO[Ii][A]. Only the reduce entries are written differently:
1. If there is a state (Ii) containing a production (A->αβ•) which has no transitions
to a next State, then the production is said to be a reduced production. For all
terminals X in FOLLOW (A), write the reduce entry along with the production
number. If the Augment production is the one being reduced, then write accept.
For example, with the productions
1. S -> aAb
2. A -> αβ
Follow (S) = {$}
Follow (A) = {b}
a state Ii containing A->αβ• gets the entry r2 only under b:
States   ACTION                GO TO
         a      b      $       S     A
Ii              r2
Ex: The SLR (1) parsing table for the Grammar
S -> aB
B -> bB | b
with Follow (S) = {$} and Follow (B) = {$} is
States   ACTION                   GOTO
         a       b       $        S     B
I0       S2                       1
I1                       ACCEPT
I2               S4                     3
I3                       R1
I4               S4      R3             5
I5                       R2
Note: When multiple entries occur in a cell of the SLR table, the grammar is not accepted by the
SLR (1) Parser.
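The FOLLOW sets that decide where the reduce entries go can be computed by a fixed point iteration. The following is a minimal sketch for the grammar S -> aB, B -> bB | b, assuming a grammar without ε-productions (which keeps FIRST simple); all function names are illustrative, and S' stands for S`.

```python
from typing import Dict, List, Set, Tuple

PRODUCTIONS: List[Tuple[str, Tuple[str, ...]]] = [
    ("S'", ("S",)),        # 0: Augment production
    ("S", ("a", "B")),     # 1
    ("B", ("b", "B")),     # 2
    ("B", ("b",)),         # 3
]
NONTERMINALS = {head for head, _ in PRODUCTIONS}


def first_sets() -> Dict[str, Set[str]]:
    """FIRST of every Non terminal (assumes no ε-productions)."""
    first: Dict[str, Set[str]] = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in PRODUCTIONS:
            lead = body[0]
            new = first[lead] if lead in NONTERMINALS else {lead}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first


def follow_sets() -> Dict[str, Set[str]]:
    first = first_sets()
    follow: Dict[str, Set[str]] = {n: set() for n in NONTERMINALS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in PRODUCTIONS:
            for i, symbol in enumerate(body):
                if symbol not in NONTERMINALS:
                    continue
                if i + 1 < len(body):          # something follows the symbol
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                          # symbol ends the production
                    new = follow[head]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


def slr_reduce_entries(completed: Dict[int, int]) -> Dict[Tuple[int, str], str]:
    """completed maps a state number to the production reduced in that state."""
    follow = follow_sets()
    table: Dict[Tuple[int, str], str] = {}
    for state, production in completed.items():
        head, _ = PRODUCTIONS[production]
        for terminal in follow[head]:
            table[(state, terminal)] = f"R{production}"
    return table


if __name__ == "__main__":
    print(follow_sets())
    print(slr_reduce_entries({3: 1, 4: 3, 5: 2}))
```

With the reduced productions of I3, I4 and I5, the reduce entries land only in the $ column, so the S4/R3 clash under b from the earlier table does not arise.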
Conflicts in the SLR (1) Parsing
When multiple entries occur in the table, the situation is said to be a Conflict.
Shift-Reduce Conflict in SLR (1) Parsing
Shift Reduce Conflict in the SLR (1) parsing occurs when a state has
1. A Reduced item of the form A -> α• and Follow (A) includes the terminal value
'a'.
2. An incomplete item of the form A -> β•aα as shown below, where Ii contains
1. A -> β•aα (with Ii --a--> Ij) and 2. B -> b•, and Follow (B) includes a:
States   Action           GOTO
         a        $       A     B
Ii       Sj/r2
Reduce-Reduce Conflict in SLR (1) Parsing
Reduce-Reduce Conflict in the SLR (1) parsing occurs when a state has two or more
reduced items of the form
1. A -> α•
2. B -> β• and Follow (A) ∩ Follow (B) ≠ ∅, as shown below:
If the Grammar is
S -> αAaBa
A -> α
B -> β
Follow (S) = {$}
Follow (A) = {a} and Follow (B) = {a}
and a state Ii contains both 1. A -> α• and 2. B -> β•, then
States   Action           GOTO
         a        $       A     B
Ii       r1/r2
Canonical LR (1) Parsing
1. Write the Context free Grammar for the given input string
5. Draw DFA
7. Based on the information from the Table, with help of Stack and Parsing
algorithm generate the output.
LR (1) item
An LR (1) item is defined by a production, the position of the dot and a terminal symbol. The
terminal is called the look ahead symbol.
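Ex: Consider the Grammar
1. S -> CC
2. C -> cC
3. C -> d
I0 State:
Add the Augment production and compute the Closure; the look ahead symbol for the Augment
Production is $.
I0 = Closure (S`->•S, $)
S`->•S, $
S->•CC, $
C->•cC, FIRST(C$)
C->•d, FIRST(C$)
Since FIRST(C$) = {c, d}, the last two items become
C->•cC, c/d
C->•d, c/d
The dot symbol is followed by a terminal value. So, close the I0 State. So, the
productions in I0 are
S`->•S, $
S->•CC, $
C->•cC, c/d
C->•d, c/d
I1 = Go to (I0, S) = S`->S•, $
I2 = Go to (I0, C) = Closure (S->C•C, $). The dot is followed by C with look ahead FIRST($) = $,
which adds
C->•cC, $
C->•d, $
So, the I2 State is
S->C•C, $
C->•cC, $
C->•d, $
I3 = Go to (I0, c) = Closure (C->c•C, c/d)
C->c•C, c/d
C->•cC, c/d
C->•d, c/d
I4 = Go to (I0, d) = C->d•, c/d
I5 = Go to (I2, C) = S->CC•, $
I6 = Go to (I2, c) = Closure (C->c•C, $)
C->c•C, $
C->•cC, $
C->•d, $
I7 = Go to (I2, d) = C->d•, $
I8 = Go to (I3, C) = C->cC•, c/d
I9 = Go to (I6, C) = C->cC•, $
Also Go to (I3, c) = I3, Go to (I3, d) = I4, Go to (I6, c) = I6 and Go to (I6, d) = I7.
Fig: DFA of the LR (1) item sets I0 to I9; its transitions are exactly the Go to results listed above.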
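The closure with look ahead symbols can be sketched the same way as the LR (0) closure, except that each added item carries a terminal from FIRST of what follows the Non terminal. The following is a minimal sketch for the grammar S -> CC, C -> cC | d, assuming no ε-productions; the names first_of, closure, goto and collection are illustrative, and S' stands for S`. Each item holds a single look ahead, so C->•d, c/d appears as two items.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Item = Tuple[str, Tuple[str, ...], int, str]   # head, body, dot, look ahead
State = FrozenSet[Item]

GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}


def first_of(symbols: Tuple[str, ...]) -> Set[str]:
    """FIRST of a non-empty symbol string (assumes no ε-productions)."""
    lead = symbols[0]
    if lead not in GRAMMAR:
        return {lead}
    result: Set[str] = set()
    for body in GRAMMAR[lead]:
        if body[0] != lead:            # skip direct left recursion
            result |= first_of(body)
    return result


def closure(items: FrozenSet[Item]) -> State:
    result, worklist = set(items), list(items)
    while worklist:
        _, body, dot, look = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            # Look aheads of the new items: FIRST(rest of the body + look ahead).
            for terminal in first_of(body[dot + 1:] + (look,)):
                for production in GRAMMAR[body[dot]]:
                    new_item = (body[dot], production, 0, terminal)
                    if new_item not in result:
                        result.add(new_item)
                        worklist.append(new_item)
    return frozenset(result)


def goto(state: State, symbol: str) -> State:
    moved = {(h, b, d + 1, a) for (h, b, d, a) in state if d < len(b) and b[d] == symbol}
    return closure(frozenset(moved))


def collection() -> List[State]:
    states = [closure(frozenset({("S'", ("S",), 0, "$")}))]
    i = 0
    while i < len(states):
        for symbol in sorted({b[d] for (_, b, d, _) in states[i] if d < len(b)}):
            target = goto(states[i], symbol)
            if target not in states:
                states.append(target)
        i += 1
    return states


if __name__ == "__main__":
    print(len(collection()), "LR (1) states")
```

The order in which this sketch discovers the states differs from the I0 to I9 numbering above, so only the contents of the states should be compared, not their numbers.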
Construction of CLR (1) Table
Rule1: if there is an item [A->α•Xβ,b] in Ii and goto(Ii,X) is in Ij then action [Ii][X]= Shift
j, Where X is Terminal.
Rule2: if there is an item [A->α•, b] in Ii and (A≠S`) set action [Ii][b]= reduce along with
the production number.
Rule3: if there is an item [S`->S•, $] in Ii then set action [Ii][$]= Accept.
Rule4: if there is an item [A->α•Xβ,b] in Ii and go to(Ii,X) is in Ij then goto [Ii][X]= j,
Where X is Non Terminal.
States   ACTION                   GOTO
         c       d       $        S     C
I0       S3      S4               1     2
I1                       ACCEPT
I2       S6      S7                     5
I3       S3      S4                     8
I4       R3      R3
I5                       R1
I6       S6      S7                     9
I7                       R3
I8       R2      R2
I9                       R2
LALR (1) Parsing
The CLR Parser avoids the conflicts in the parse table, but it produces a larger number of
States than the SLR parser. Hence more space is occupied by the table in memory.
So LALR parsing can be used instead. Here, the tables obtained are smaller than the CLR parse table,
and for most programming language grammars the parser works as well as the CLR parser. Here, sets of
LR (1) items that have the same productions but different look-aheads are combined to form a
single set of items.
For example, consider the grammar in the previous example. Consider the states I4 and I7
as given below:
I4 = Goto (I0, d) = Closure (C->d•, c/d) = C->d•, c/d
I7 = Goto (I2, d) = Closure (C->d•, $) = C->d•, $
These states differ only in their look-aheads. They have the same productions.
Hence these states are combined to form a single state called I47.
Similarly the states I3 and I6 differ only in their look-aheads, as given below:
I3 = Goto (I0, c) =
C->c•C, c/d
C->•cC, c/d
C->•d, c/d
I6 = Goto (I2, c) =
C->c•C, $
C->•cC, $
C->•d, $
These states differ only in their look-aheads. They have the same productions.
Hence these states are combined to form a single state called I36.
Similarly the States I8 and I9 differ only in their look-aheads. Hence they are combined to form
the state I89.
States   ACTION                   GOTO
         c       d       $        S     C
I0       S36     S47              1     2
I1                       ACCEPT
I2       S36     S47                    5
I36      S36     S47                    89
I47      R3      R3      R3
I5                       R1
I89      R2      R2      R2
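The merging step can be sketched by grouping the LR (1) states on their productions while ignoring the look-aheads. The following is a minimal sketch in which a few of the states of the example are written out by hand as (item, look ahead) pairs; the dictionary LR1_STATES and the function merge_by_core are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Item = Tuple[str, str]      # (production with dot, look ahead)

# Some LR (1) states of the example, one pair per look ahead symbol.
LR1_STATES: Dict[str, Set[Item]] = {
    "I3": {("C->c•C", "c"), ("C->c•C", "d"), ("C->•cC", "c"),
           ("C->•cC", "d"), ("C->•d", "c"), ("C->•d", "d")},
    "I6": {("C->c•C", "$"), ("C->•cC", "$"), ("C->•d", "$")},
    "I4": {("C->d•", "c"), ("C->d•", "d")},
    "I7": {("C->d•", "$")},
    "I8": {("C->cC•", "c"), ("C->cC•", "d")},
    "I9": {("C->cC•", "$")},
    "I5": {("S->CC•", "$")},
}


def merge_by_core(states: Dict[str, Set[Item]]) -> Dict[str, Set[Item]]:
    """Combine states whose items agree once the look-aheads are dropped."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for name, items in states.items():
        core = frozenset(production for production, _ in items)
        groups[core].append(name)
    merged: Dict[str, Set[Item]] = {}
    for names in groups.values():
        label = "I" + "".join(name[1:] for name in sorted(names))
        merged[label] = set().union(*(states[name] for name in names))
    return merged


if __name__ == "__main__":
    for label, items in sorted(merge_by_core(LR1_STATES).items()):
        print(label, sorted(items))
```

The merged state I47 carries the look-aheads c, d and $, which is why its row in the LALR table has R3 in all three columns.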
Conflicts in the CLR (1) Parsing
When multiple entries occur in the table, the situation is said to be a Conflict.
Shift-Reduce Conflict in CLR (1) Parsing
Shift Reduce Conflict in the CLR (1) parsing occurs when a state has
1. A Reduced item of the form A -> α•, a and
2. An incomplete item of the form A -> β•aα as shown below, where Ii contains
1. A -> β•aα, $ (with Ii --a--> Ij) and 2. B -> b•, a:
States   Action           GOTO
         a        $       A     B
Ii       Sj/r2
Reduce - Reduce Conflict in CLR (1) Parsing
Reduce-Reduce Conflict in the CLR (1) parsing occurs when a state has two or more
reduced items of the form
1. A -> α•
2. B -> β•
and the two productions in the state (Ii) are reducing on the same look ahead symbol,
as shown below, where Ii contains 1. A -> α•, a and 2. B -> β•, a:
States   Action           GOTO
         a        $       A     B
Ii       r1/r2
String Acceptance using LR Parsing:
Consider the above example. If the input String is cdd, the parser uses the CLR (1) table
States   ACTION                   GOTO
         c       d       $        S     C
I0       S3      S4               1     2
I1                       ACCEPT
I2       S6      S7                     5
I3       S3      S4                     8
I4       R3      R3
I5                       R1
I6       S6      S7                     9
I7                       R3
I8       R2      R2
I9                       R2
and takes the following sequence of actions:
Stack      Input    Action
$0         cdd$     Shift S3
$0c3       dd$      Shift S4
$0c3d4     d$       Reduce with R3, C->d, pop 2*|β| symbols from the stack
$0c3C      d$       Goto (I3, C) = 8
$0c3C8     d$       Reduce with R2, C->cC, pop 2*|β| symbols from the stack
$0C        d$       Goto (I0, C) = 2
$0C2       d$       Shift S7
$0C2d7     $        Reduce with R3, C->d, pop 2*|β| symbols from the stack
$0C2C      $        Goto (I2, C) = 5
$0C2C5     $        Reduce with R1, S->CC, pop 2*|β| symbols from the stack
$0S        $        Goto (I0, S) = 1
$0S1       $        Accept
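The trace above can be reproduced with a short table driven loop. The following is a minimal sketch of an LR driver, assuming the CLR (1) table for S -> CC, C -> cC | d (productions 1, 2, 3); the dictionaries ACTION, GOTO and PRODUCTIONS and the function lr_parse are illustrative names. To keep it short, the stack holds only states, so a reduction pops |β| entries instead of 2*|β|.

```python
from typing import Dict, List, Tuple

# Production number -> (head, length of the right side).
PRODUCTIONS: Dict[int, Tuple[str, int]] = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}

ACTION: Dict[Tuple[int, str], str] = {
    (0, "c"): "s3", (0, "d"): "s4",
    (1, "$"): "acc",
    (2, "c"): "s6", (2, "d"): "s7",
    (3, "c"): "s3", (3, "d"): "s4",
    (4, "c"): "r3", (4, "d"): "r3",
    (5, "$"): "r1",
    (6, "c"): "s6", (6, "d"): "s7",
    (7, "$"): "r3",
    (8, "c"): "r2", (8, "d"): "r2",
    (9, "$"): "r2",
}
GOTO: Dict[Tuple[int, str], int] = {
    (0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9,
}


def lr_parse(text: str) -> List[int]:
    """Return the production numbers used for reductions; raise ValueError on error."""
    stack = [0]                         # states only; grammar symbols are implied
    symbols = list(text) + ["$"]
    pos = 0
    reductions: List[int] = []
    while True:
        entry = ACTION.get((stack[-1], symbols[pos]))
        if entry is None:               # blank cell = error entry
            raise ValueError(f"error in state {stack[-1]} on {symbols[pos]!r}")
        if entry == "acc":
            return reductions
        number = int(entry[1:])
        if entry[0] == "s":             # shift: push the state, advance the input
            stack.append(number)
            pos += 1
        else:                           # reduce by production `number`
            head, length = PRODUCTIONS[number]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(number)


if __name__ == "__main__":
    print(lr_parse("cdd"))
```

For the input cdd the reductions come out in the order R3, R2, R3, R1, matching the trace, and read backwards they give the Right most derivation of cdd.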
Handling Ambiguous grammar
Ambiguity
A Grammar can have more than one parse tree for a string.
Consider the grammar
string -> string + string
        | string - string
        | 0 | 1 | ... | 9
A grammar is said to be an ambiguous grammar if there is some string that it can generate in
more than one way (i.e., the string has more than one parse tree or more than one leftmost
derivation). A language is inherently ambiguous if it can only be generated by ambiguous
grammars.
In this grammar, the string 9-5+2 has two possible parse trees. The
two trees for 9-5+2 correspond to the two ways of parenthesizing the expression: (9-5)+2 and
9-(5+2). The second parenthesization gives the expression the value 2 instead of 6.
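The effect of the two parse trees can be checked by evaluating the string under every possible tree. The following is a minimal sketch, assuming the input is already split into single digit and operator tokens; the function name all_values is an illustrative choice.

```python
from typing import List, Set


def all_values(tokens: List[str]) -> Set[int]:
    """Values of every parse tree of the tokens under string -> string + string | string - string | digit."""
    if len(tokens) == 1:
        return {int(tokens[0])}
    values: Set[int] = set()
    # Any operator position can be the root of a parse tree.
    for i in range(1, len(tokens), 2):
        for left in all_values(tokens[:i]):
            for right in all_values(tokens[i + 1:]):
                values.add(left + right if tokens[i] == "+" else left - right)
    return values


if __name__ == "__main__":
    print(all_values(["9", "-", "5", "+", "2"]))
```

For 9-5+2 the set contains both 6 and 2, one value per parse tree.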
Ambiguity is problematic because the meaning of the programs can be incorrect.
Ambiguity is harmful to the intent of the program. The input might be deciphered in a way which
was not really the intention of the programmer, as shown above in the 9-5+2 example.
There is no general technique to handle ambiguity, i.e., it is not possible to develop some feature
which automatically identifies and removes ambiguity from any grammar. However, it can be
removed, broadly speaking, in the following possible ways:
2) Implementing precedence and associativity rules in the grammar. This
technique is discussed below.
If an operand has an operator on both sides, the side on which the operator takes this operand is the
associativity of that operator.
Grammar to generate strings with right associative operators:
right -> letter = right | letter
letter -> a | b | ... | z
A binary operation * on a set S that does not satisfy the associative law is called
non-associative. A left-associative operation is a non-associative operation that is conventionally
evaluated from left to right, i.e., the operand is taken by the operator on its left side.
For example, the following is the grammar to generate strings with left associative operators. (Note
that this is left recursive and may go into an infinite loop in a top-down parser. But we will handle
this problem later on by making it right recursive)
letter -> a | b | ... | z
Important Questions:
1. Explain the Operator precedence parsing?
2. Explain the LR (1) Parsing?
3. Write the differences between canonical collection of LR (0) items and LR (1) items?
4. Write the Difference between CLR (1) and LALR(1) parsing?
5. Explain YACC Parser?
Assignment Questions:
Multiple choices:
21. Which of the following is the most general phrase structured grammar
A. Context free
B. Right linear
C. Left linear
D. Context sensitive
A. R B. LR C.LR(1) D.L
25. In LR(1)___________ means the right most derivation is used to construct the parse
tree.
1. Predictive parser
2. Backus-Naur form
3. 6
4. Bottom up parsers
5. Grammar
6. [SAa,b,D]
7. If the left hand side of a production is a single terminal
8. 0
9. Only push
10. Yet Another Compiler Compiler
11. 3
12. If the left hand side of a production is a single terminal
13. [S.Aab]>
14. Canonical LR
15. Pushes a token and also advances the input
16. Derivation
17. Either push or pop
18. L
19. Context free
20. Canonical LR
21. A sequence of tokens
22. compiler
23. [SAa.b,D]
24. Handle
25. R
26. SLR, CLR, LALR