UNIT-4 Parsing Techniques
UNIT-4 Parsing Techniques
UNIT-4 Parsing Techniques
PARSING TECHNIQUES
Bottom-Up Parsing:
• A bottom-up parser creates the parse tree of the given input starting from leaves
towards the root.
• A bottom-up parser tries to find the right-most derivation of the given input in the
reverse order.
S ⇒ ... ⇒ ω (the right-most derivation of ω)
← (the bottom-up parser finds the right-most derivation in the
reverse order)
• Bottom-up parsing is also known as shift-reduce parsing because its two main
actions are shift and reduce.
– At each shift action, the current symbol in the input string is pushed to a
stack.
– At each reduction step, the symbols at the top of the stack (this symbol
sequence is the right side of a production) will replaced by the non-
terminal at the left side of that production.
– There are also two more actions: accept and error.
Shift-Reduce Parsing:
• A shift-reduce parser tries to reduce the given input string into the starting
symbol.
• At each reduction step, a substring of the input matching to the right side of a
production rule is replaced by the non-terminal at the left side of that production
rule.
• If the substring is chosen correctly, the right most derivation of that string is
created in the reverse order.
Rightmost Derivation: S⇒ω
Shift-Reduce Parser finds: ω ⇐ ... ⇐ S
Example:
S → aABb input string: aaabb
A → aA | a aaAbb
B → bB | b aAbb ⇓ reduction
aABb
S
Handle:
• Informally, a handle of a string is a substring that matches the right side of a
production rule.
– But not every substring matches the right side of a production rule is
handle
S ⇒ αAω ⇒ ωβα
Handle Pruning
• A right-most derivation in reverse can be obtained by handle-pruning.
• Start from γn, find a handle Anβ→n in γn, and replace βn in by An to get γn-1.
• Then find a handle An-1β→n-1 in γn-1,and replace βn-1 in by An-1 to get γn-2.
• Repeat this, until we reach S.
A Shift-Reduce Parser:
1. Shift : The next input symbol is shifted onto the top of the stack.
2. Reduce: Replace the handle on the top of the stack by the non-terminal.
3. Accept: Successful completion of parsing.
4. Error: Parser discovers a syntax error, and calls an error recovery routine.
Shift-Reduce Parsers:
• There are two main categories of shift-reduce parsers
1. Operator-Precedence Parser
– simple, but only a small class of grammars.
2. LR-Parsers
– covers wide range of grammars.
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (lookahead LR parser)
– SLR, LR and LALR work same, only their parsing tables are different.
Equation (2) implies that the sub expression B * C is to be computed before either of the
other operations in the expression is performed. In times of the parse tree this means that
the * operation appears at a lower level than does either + or -. Thus a bottom up parses
should recognize B * C by interpreting it in terms of the grammar, before considering the
surrounding terms. The first step in constructing an operator precedence parser is to
determine the precedence relations between the operators of the grammar. Operator is
taken to mean any terminal symbol (i.e., any token). We also have precedence relations
involving tokens such as BEGIN, READ, id and (. For the grammar in fig. 5, the
precedence relation is given in the above fig.
According to the grammar id may be considered as < factor >. (rule 12),
<program > (rule 9) or a < id-list > (rule 6). In operator precedence phase, it is not
necessary to indicate which non-terminal symbol is being recognized. It is interpreted as
non-terminal < N1 >. Hence the new version is shown in fig. 12(b).
An operator-precedence parser generally uses a stack to save token that have been
scanned but not yet parsed, so it can reexamine them in this way. Precedence relations
hold only between terminal symbols, so < N1 > is not involved in this process and a
relationship is determined between (and).
READ (<N1>) corresponds to rule 13 of the grammar. This rule is the only one that could
be applied in recognizing this portion of the program. The sequence is simply interpreted
as a sequence of some interpretation < N2 >. Fig. 12(c) shows this interpretation. The
parser tree is given in fig. 12(d).
Note: (1) The parse tree in fig. 1 and fig. 12 (d) are same except for the name of
the non-terminal symbols involved.
(2) The name of the non-terminals is arbitrarily chosen.
Top-Down Parsing:
• The parse tree is created top to bottom.
• Top-down parser
– Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does not
work, we backtrack to try other alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient
– Predictive Parsing
• no backtracking
• efficient
• needs a special form of grammars (LL(1) grammars).
• Recursive Predictive Parsing is a special form of Recursive
Descent parsing without backtracking.
• Non-Recursive (Table Driven) Predictive Parser is also known as
LL(1) parser.
S → aBc
B → bc | b
input: abc
Predictive Parser:
a grammar è è a grammar suitable for predictive
eliminate left parsing (a LL(1) grammar)
left recursion factor no %100 guarantee.
current token
Example:
stmt → if ...... |
while ...... |
begin ...... |
for .....
• When we are trying to write the non-terminal stmt, if the current token is if we
have to choose first production rule.
• When we are trying to write the non-terminal stmt, we can uniquely choose the
production rule by just looking the current token.
• We eliminate the left recursion in the grammar, and left factor it. But it may not be
suitable for predictive parsing (not LL(1) grammar).
proc A {
- match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
}
A → aBb | bAB
proc A {
case of the current token {
‘a’: - match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
‘b’: - match the current token with b, and move to the next token;
- call ‘A’;
- call ‘B’;
}
}
• If all other productions fail, we should apply an ε-production. For example, if the
current token is not a or b, we may apply the ε-production.
• Most correct choice: We should apply an ε-production for a non-terminal A when
the current token is in the follow set of A (which terminals can follow A in the
sentential forms).
Example:
A → aBe | cBd | C
B → bB | ε
C→f
proc C { match the current token with f,
and move to the next token; }
proc A {
case of the current token {
a: - match the current token with a,
and move to the next token; proc B {
- call B; case of the current token {
- match the current token with e, b: - match the
current token with b,
and move to the next token; and move to the next
token;
c: - match the current token with c, - call B
and move to the next token; e,d: do nothing
- call B; }
- match the current token with d, }
and move to the next token;
f: - call C
}
}
output
– a production rule representing a step of the derivation sequence (left-most
derivation) of the string in the input buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S.
$S ç initial stack
– when the stack is emptied (ie. only $ left in the stack), the parsing is
completed.
parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.
Example 1:
€
Example 2:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
• FOLLOW(A) is the set of the terminals which occur immediately after (follow)
the non-terminal A in the strings derived from the starting symbol.
– a terminal a is in FOLLOW(A) if S ⇒ αAaβ
– $ is in FOLLOW(A) if S ⇒ αA
FIRST Example:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
• If ( A → αB is a production rule ) or
( A → αBβ is a production rule and ε is in FIRST(β) )
è everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any follow set.
FOLLOW Example:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = {+, *, ), $ }
• All other undefined entries of the parsing table are error entries.
Constructing LL(1) Parsing Table – Example:
LL (1) Grammars:
• A grammar whose parsing table has no multiply-defined entries is said to be
LL (1) grammar.
• The parsing table of a grammar may contain more than one production rule. In
this case, we say that it is not a LL(1) grammar.