Unit 2 Notes
There are three general types of parsers for grammars: universal, top-down, and bottom-up.
Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can
parse any grammar. These general methods are, however, too inefficient to use in production
compilers.
The methods commonly used in compilers can be classified as being either top-down or
bottom-up. As implied by their names, top-down methods build parse trees from the top (root) to the
bottom (leaves), while bottom-up methods start from the leaves and work their way up to the root. In
either case, the input to the parser is scanned from left to right, one symbol at a time.
The most efficient top-down and bottom-up methods work only for subclasses of grammars, but
several of these classes, particularly the LL and LR grammars, are expressive enough to describe most
of the syntactic constructs in modern programming languages. Parsers implemented by hand often
use LL grammars; parsers for the larger class of LR grammars are usually constructed using
automated tools.
Representative Grammars
E-> E + T | T
T-> T * F | F (4.1)
F-> (E) | id
Expression grammar (4.1) belongs to the class of LR grammars that are suitable for bottom-up
parsing. This grammar can be adapted to handle additional operators and additional levels of
precedence. However, it cannot be used for top-down parsing because it is left recursive. The
following non-left-recursive variant of the expression grammar (4.1) will be used for top-down
parsing:
E-> T E’
E’-> +T E’ | Ɛ
T-> F T’ (4.2)
T’-> *F T’ | Ɛ
F-> (E) | id
Context-Free Grammars
Using a syntactic variable stmt to denote statements and variable expr to denote expressions, the
production
stmt -> if (expr) stmt else stmt (4.4)
specifies the structure of this form of conditional statement. Other productions then define precisely
what an expr is and what else a stmt can be.
The Formal Definition of a Context-Free Grammar
1. Terminals are the basic symbols from which strings are formed. Terminals are the keywords if
and else and the symbols “(" and “)."
2. Nonterminals are syntactic variables that denote sets of strings. In (4.4), stmt and expr are
nonterminals. The sets of strings denoted by nonterminals help define the language generated by the
grammar. Nonterminals impose a hierarchical structure on the language that is key to syntax analysis
and translation.
3. In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it
denotes is the language generated by the grammar.
4. The productions of a grammar specify the manner in which the terminals and nonterminals can be
combined to form strings. Each production consists of:
(a) A nonterminal called the head or left side of the production; this production defines some of the
strings denoted by the head.
(b) The symbol →. Sometimes ::= has been used in place of the arrow.
(c) A body or right side consisting of zero or more terminals and non- terminals. The components of
the body describe one way in which strings of the nonterminal at the head can be constructed.
The grammar in Fig. 4.2 defines simple arithmetic expressions.
In this grammar, the terminal symbols are
id + - * / ( )
The nonterminal symbols are expression, term and factor, and expression is the start symbol.
expression -> expression + term
expression -> expression – term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id
Figure 4.2: Grammar for simple arithmetic expressions
Notational Conventions
To avoid always having to state that “these are the terminals," “these are the nonterminals," and so
on, the following notational conventions for grammars will be used:
1. These symbols are terminals:
(a) Lowercase letters early in the alphabet, such as a, b, c.
(b) Operator symbols such as +, *, and so on.
(c) Punctuation symbols such as parentheses, comma, and so on.
(d) The digits 0,1,……,9.
(e) Boldface strings such as id or if, each of which represents a single terminal symbol.
2. These symbols are nonterminals:
(a) Uppercase letters early in the alphabet, such as A, B, C.
(b) The letter S, which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
(d) When discussing programming constructs, uppercase letters may be used to represent
nonterminals for the constructs. For example, non-terminals for expressions, terms, and factors are
often represented by E, T, and F, respectively.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols; that is, either
nonterminals or terminals.
4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of
terminals.
5. Lowercase Greek letters, α, β, γ for example, represent (possibly empty) strings of grammar
symbols. Thus, a generic production can be written as A -> α, where A is the head and α the body.
6. A set of productions A -> α1, A -> α2, ..., A -> αk with a common head A (call them A-
productions) may be written A -> α1 | α2 | ... | αk. Call α1, α2, ..., αk the alternatives for A.
7. Unless stated otherwise, the head of the first production is the start symbol.
Using these conventions, the grammar of Example 4.5 can be rewritten concisely as
E-> E + T | E – T | T
T-> T * F | T / F | F
F-> (E) | id
The notational conventions tell us that E, T, and F are nonterminals, with E the start symbol. The
remaining symbols are terminals.
Derivations
The construction of a parse tree can be made precise by taking a derivational view, in which
productions are treated as rewriting rules. Beginning with the start symbol, each rewriting step
replaces a nonterminal by the body of one of its productions. This derivational view corresponds to
the top-down construction of a parse tree.
Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous. An
ambiguous grammar is one that produces more than one leftmost derivation or more than one
rightmost derivation for the same sentence.
The ambiguous arithmetic expression grammar (4.3), with productions including E -> E + E | E * E | id,
permits two distinct leftmost derivations for the sentence id + id * id:
Leftmost derivation 1:
E => E + E => id + E => id + E * E => id + id * E => id + id * id
Leftmost derivation 2:
E => E * E => E + E * E => id + E * E => id + id * E => id + id * id
It is convenient to use carefully chosen ambiguous grammars, together with disambiguating rules
that “throw away" undesirable parse trees, leaving only one tree for each sentence.
Elimination of Left Recursion
A grammar is left recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for
some string α. Immediate left recursion, where some production has the form A -> Aα, can be
eliminated by rewriting the A-productions A -> Aα | β as
A -> βA'
A' -> αA' | Ɛ
The nonterminal A generates the same strings as before but is no longer left recursive. Left recursion
can also be indirect. For example, consider the grammar:
S -> Aa | b
A -> Ac | Sd | Ɛ
The nonterminal S is left recursive because S => Aa => Sda, but it is not immediately left recursive.
Algorithm: Eliminating left recursion.
INPUT: Grammar G with no cycles or Ɛ-productions.
OUTPUT: An equivalent grammar with no left recursion.
METHOD: Apply the algorithm to G.
1) arrange the nonterminals in some order A1, A2, ..., An.
2) for ( each i from 1 to n ) {
3) for ( each j from 1 to i - 1 ) {
4) replace each production of the form Ai -> Ajγ by the productions Ai -> δ1γ
| δ2γ | .. |δkγ , where Aj ->δ1 | δ2 | ....... |δk are all current Aj-productions
5) }
6) eliminate the immediate left recursion among the Ai-productions
7) }
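As an illustration, the following is a minimal Python sketch of step 6 above, the elimination of
immediate left recursion for a single nonterminal. The representation (each alternative as a list of
symbols, with [] standing for Ɛ) and the helper name are assumptions of this sketch.

# Immediate left-recursion elimination for one nonterminal (step 6 above).
# A -> A alpha | beta becomes A -> beta A' and A' -> alpha A' | eps.
def eliminate_immediate(head, bodies):
    recursive = [b[1:] for b in bodies if b and b[0] == head]   # the alpha parts
    others = [b for b in bodies if not b or b[0] != head]       # the beta parts
    if not recursive:
        return {head: bodies}                                   # nothing to do
    new = head + "'"
    return {
        head: [b + [new] for b in others],                      # A  -> beta A'
        new: [a + [new] for a in recursive] + [[]],             # A' -> alpha A' | eps
    }

# Example: E -> E + T | T becomes E -> T E' and E' -> + T E' | eps
print(eliminate_immediate("E", [["E", "+", "T"], ["T"]]))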
PRACTICE PROBLEMS BASED ON LEFT RECURSION ELIMINATION
1. A → ABd / Aa / a
B → Be / b
Solution: The grammar after eliminating left recursion is-
A → aA'
A' → BdA' / aA' / ∈
B → bB'
B' → eB' / ∈
2. E → E + E / E * E / a
Solution:The grammar after eliminating left recursion is-
E → aE'
E' → +EE' / *EE'/ ∈
3. E → E + T / T
T → T * F / F
F → id
Solution:- The grammar after eliminating left recursion is-
E → TE’
E’ → +TE’ / ∈
T → FT’
T’ → *FT’ / ∈
F → id
4. S → (L) / a
L → L,S / S
Solution:-The grammar after eliminating left recursion is-
S → (L) / a
L → SL’
L’ → ,SL’ / ∈
5. S → S0S1S / 01
Solution-:The grammar after eliminating left recursion is-
S → 01S’
S’ → 0S1SS’ / ∈
6. S → A
A → Ad / Ae / aB / ac
B → bBc / f
Solution:-The grammar after eliminating left recursion is-
S→A
A → aBA’ / acA’
A’ → dA’ / eA’ / ∈
B → bBc / f
7. A → AAα / β
Solution-:The grammar after eliminating left recursion is-
A → βA’
A’ → AαA’ / ∈
8. A → Ba / Aa / c
B → Bb / Ab / d
Solution-:This is a case of indirect left recursion.
Step-01:
First let us eliminate left recursion from A → Ba / Aa / c
Eliminating left recursion from here, we get-
A → BaA’ / cA’
A’ → aA’ / ∈
Now, given grammar becomes-
A → BaA’ / cA’
A’ → aA’ / ∈
B → Bb / Ab / d
Step-02:
Substituting the productions of A in B → Ab, we get the following grammar-
A → BaA’ / cA’
A’ → aA’ / ∈
B → Bb / BaA’b / cA’b / d
Step-03:
Now, eliminating left recursion from the productions of B, we get the following grammar-
A → BaA’ / cA’
A’ → aA’ / ∈
B → cA’bB’ / dB’
B’ → bB’ / aA’bB’ / ∈
9. X → XSb / Sa / b
S → Sb / Xa / a
Solution-:This is a case of indirect left recursion.
Step-01:
First let us eliminate left recursion from
X → XSb / Sa / b
Eliminating left recursion from here, we get-
X → SaX’ / bX’
X’ → SbX’ / ∈
Now, given grammar becomes-
X → SaX’ / bX’
X’ → SbX’ / ∈
S → Sb / Xa / a
Step-02:
Substituting the productions of X in S → Xa, we get the following grammar-
X → SaX’ / bX’
X’ → SbX’ / ∈
S → Sb / SaX’a / bX’a / a
Step-03:
Now, eliminating left recursion from the productions of S, we get the following grammar-
X → SaX’ / bX’
X’ → SbX’ / ∈
S → bX’aS’ / aS’
S’ → bS’ / aX’aS’ / ∈
Left Factoring:
Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive, or top-down, parsing. For example, if we have the two productions
stmt -> if expr then stmt else stmt
stmt -> if expr then stmt
then, on seeing the input if, we cannot immediately tell which production to choose to expand stmt. In
general, if A -> αβ1 | αβ2 are two A-productions, and the input begins with a nonempty string
derived from α, we do not know whether to expand A to αβ1 or to αβ2. However, we may defer the
decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β1 or
to β2. That is, left-factored, the original productions become
A -> αA'
A' -> β1 | β2
Algorithm: Left factoring a grammar
INPUT: Grammar G
OUTPUT: An equivalent left-factored grammar
METHOD: For each nonterminal A, find the longest prefix α common to two or more of its
alternatives. If α ≠ Ɛ, i.e., there is a nontrivial common prefix, replace all of the A-productions
A -> αβ1 | αβ2 | ... | αβn | γ, where γ represents all alternatives that do not begin with α, by
A -> αA' | γ
A' -> β1 | β2 | ... | βn
Here A' is a new nonterminal. Repeatedly apply this transformation until no two alternatives for a
nonterminal have a common prefix.
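The following is a minimal Python sketch of one left-factoring step for a single nonterminal; the
list-of-symbols representation (with [] standing for Ɛ) is an assumption of the sketch.

def longest_common_prefix(bodies):
    # longest sequence of symbols shared by all the given alternatives
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor_once(head, bodies):
    # group the alternatives by their first symbol, then factor one group
    groups = {}
    for b in bodies:
        groups.setdefault(b[0] if b else None, []).append(b)
    for first, group in groups.items():
        if first is None or len(group) < 2:
            continue
        alpha = longest_common_prefix(group)              # the common prefix
        rest = [b for b in bodies if b not in group]      # the gamma alternatives
        new = head + "'"
        return {
            head: rest + [alpha + [new]],                 # A  -> gamma | alpha A'
            new: [b[len(alpha):] for b in group],         # A' -> beta1 | ... | betan
        }
    return {head: bodies}                                 # nothing to factor

# Example: S -> i E t S | i E t S e S | a, the grammar discussed just below
print(left_factor_once("S", [["i","E","t","S"], ["i","E","t","S","e","S"], ["a"]]))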
For example, consider the grammar
S -> i E t S | i E t S e S | a
E-> b
Here, i, t, and e stand for if, then, and else; E and S stand for “conditional expression" and
“statement." Left-factored, this grammar becomes
S-> i E t S S1 | a
S1-> e S | Ɛ
E-> b
Left factoring is thus a process by which a grammar with common prefixes is transformed to make it
suitable for top-down parsers. In left factoring, the longest common prefix of the alternatives is
factored out and a new nonterminal is introduced to generate their differing suffixes, so that the
choice among alternatives is deferred until enough of the input has been seen.
PRACTICE PROBLEMS BASED ON LEFT FACTORING
1. A → aAB / aBc / aAc
Solution:
Step-01:
A → aA’
A’ → AB / Bc / Ac
Step-02:
A → aA’
A’ → AD / Bc
D→B/c
This is a left factored grammar.
2. S → bSSaaS / bSSaSb / bSb / a
Solution:
Step-01:
S → bSS’ / a
S’ → SaaS / SaSb / b
Step 2:
S → bSS’ / a
S’ → SaA / b
A → aS / Sb
3. S → aSSbS / aSaSb / abb / b
Solution:
Step-01:
S → aS’ / b
S’ → SSbS / SaSb / bb
Step 2:
S → aS’ / b
S’ → SA / bb
A → SbS / aSb
4. S → a / ab / abc / abcd
Solution:
Step 1:
S → aS’
S’ → b / bc / bcd / ∈
Step-02:
S → aS’
S’ → bA / ∈
A → c / cd / ∈
Step-03:
S → aS’
S’ → bA / ∈
A → cB / ∈
B → d / ∈
5. S → aAd / aB
A → a / ab
B → ccd / ddc
Solution:
S → aS’
S’ → Ad / B
A → aA’
A’ → b / ∈
B → ccd / ddc
Top-Down Parsing
Top-down parsing can be viewed as the problem of constructing a parse tree for the input string,
starting from the root and creating the nodes of the parse tree in preorder. Equivalently, top-down
parsing can be viewed as finding a leftmost derivation for an input string. For example, the input
id+id*id has a top-down parse according to grammar (4.2):
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Recursive-Descent Parsing
void A() {
1) Choose an A-production, A->X1X2……Xk;
2) for ( i=1 to k ) {
3) if ( Xi is a nonterminal)
4) call procedure Xi( );
5) else if (Xi equals the current input symbol a)
6) advance the input to the next symbol;
7) else /* an error has occurred * /;
}
}
A typical procedure for a nonterminal in a top-down parser
A recursive-descent parsing program consists of a set of procedures, one for each nonterminal.
Execution begins with the procedure for the start symbol, which halts and announces success if its
procedure body scans the entire input string. Pseudocode for a typical nonterminal is shown above.
General recursive-descent may require backtracking; that is, it may require repeated scans over the
input. However, backtracking is rarely needed to parse programming language constructs, so
backtracking parsers are not seen frequently.
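As an illustration, here is a minimal recursive-descent parser in Python for grammar (4.2), with one
procedure per nonterminal as in the pseudocode above. Because the grammar is left-factored and
non-left-recursive, each procedure can choose its production from the current input symbol, so no
backtracking is needed. The token spellings ("id", "+", "*", "(", ")", "$") are an assumption of the
sketch.

tokens = []    # the token stream, ending with the endmarker "$"
pos = 0        # index of the current input symbol

def peek():
    return tokens[pos]

def match(t):
    # consume the expected terminal or report an error
    global pos
    if peek() == t:
        pos += 1
    else:
        raise SyntaxError(f"expected {t}, found {peek()}")

def E():                         # E -> T E'
    T(); Eprime()

def Eprime():                    # E' -> + T E' | eps
    if peek() == "+":
        match("+"); T(); Eprime()

def T():                         # T -> F T'
    F(); Tprime()

def Tprime():                    # T' -> * F T' | eps
    if peek() == "*":
        match("*"); F(); Tprime()

def F():                         # F -> ( E ) | id
    if peek() == "(":
        match("("); E(); match(")")
    else:
        match("id")

tokens = ["id", "+", "id", "*", "id", "$"]
E(); match("$")                  # parse id + id * id, then consume $
print("accepted")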
FIRST and FOLLOW
The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and
FOLLOW, associated with a grammar G. During top- down parsing, FIRST and FOLLOW allow us
to choose which production to apply, based on the next input symbol. During panic-mode error
recovery, sets of tokens produced by FOLLOW can be used as synchronizing tokens.
Define FIRST(α), where α is any string of grammar symbols, to be the set of terminals that begin
strings derived from α. If α =>* ε, then ε is also in FIRST(α).
For a preview of how FIRST can be used during predictive parsing, consider two A-productions
A -> α | β, where FIRST(α) and FIRST(β) are disjoint sets. We can then choose between these
A-productions by looking at the next input symbol a, since a can be in at most one of FIRST(α) and
FIRST(β). To compute FIRST(X) for all grammar symbols X, apply the following rules until no
more terminals or Ɛ can be added to any FIRST set.
1. If X is a terminal, then FIRST(X) = { X }
2. If X is a nonterminal and X -> Y1Y2...Yk is a production for some k >= 1, then place a in
FIRST(X) if for some i, a is in FIRST(Yi), and Ɛ is in all of FIRST(Y1), ..., FIRST(Yi-1); that is,
Y1...Yi-1 =>* Ɛ. If Ɛ is in FIRST(Yj) for all j = 1, 2, ..., k, then add Ɛ to FIRST(X). For
example, everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive Ɛ, then we add
nothing more to FIRST(X), but if Y1 =>* Ɛ, then we add FIRST(Y2), and so on.
3. If X -> Ɛ is a production, then add Ɛ to FIRST(X).
Now, we can compute FIRST for any string X1X2 …… Xn as follows. Add to FIRST(X1X2 ……
Xn) all non-Ɛ symbols of FIRST(X1). Also add the non-Ɛ symbols of FIRST(X2), if Ɛ is in
FIRST(X1); the non- Ɛ symbols of FIRST(X3), if Ɛ is in FIRST(X1) and FIRST(X2); and so on.
Finally, add Ɛ to FIRST(X1X2 …… Xn) if, for all i, Ɛ is in FIRST(Xi).
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be
added to any FOLLOW set.
1. Place $ in FOLLOW(S), where S is the start symbol, and $ is the input right endmarker.
2. If there is a production A -> αBβ, then everything in FIRST(β) except Ɛ is in FOLLOW(B).
3. If there is a production A -> αB, or a production A -> αBβ, where FIRST(β) contains Ɛ, then
everything in FOLLOW(A) is in FOLLOW(B).
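The following minimal Python sketch computes FIRST and FOLLOW for grammar (4.2) by iterating
the rules above until the sets stop changing; the representation (bodies as lists of symbols, the string
"eps" standing for Ɛ) is an assumption of the sketch.

EPS = "eps"
grammar = {                         # nonterminal -> list of alternative bodies
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],   # [] is the eps-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
nonterminals = set(grammar)

def first_of(symbols, FIRST):
    # FIRST of a string X1...Xn: add the non-eps symbols of FIRST(Xi) as long
    # as X1...X(i-1) can all derive eps; add eps if every Xi can.
    out = set()
    for X in symbols:
        f = FIRST[X] if X in nonterminals else {X}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

FIRST = {A: set() for A in grammar}
changed = True
while changed:                      # FIRST rules, iterated to a fixed point
    changed = False
    for A, bodies in grammar.items():
        for body in bodies:
            f = first_of(body, FIRST)
            if not f <= FIRST[A]:
                FIRST[A] |= f
                changed = True

FOLLOW = {A: set() for A in grammar}
FOLLOW["E"].add("$")                # rule 1: $ is in FOLLOW of the start symbol
changed = True
while changed:
    changed = False
    for A, bodies in grammar.items():
        for body in bodies:
            for i, B in enumerate(body):
                if B not in nonterminals:
                    continue
                f = first_of(body[i + 1:], FIRST)
                add = f - {EPS}       # rule 2: FIRST(beta) minus eps
                if EPS in f:
                    add |= FOLLOW[A]  # rule 3: beta can derive eps
                if not add <= FOLLOW[B]:
                    FOLLOW[B] |= add
                    changed = True

print(FIRST)    # FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
print(FOLLOW)   # FOLLOW(E) = FOLLOW(E') = {), $}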
Predictive parsers can be constructed for LL(1) grammars since the proper production to apply for a
nonterminal can be selected by looking only at the current input symbol.
The next algorithm collects the information from FIRST and FOLLOW sets into a predictive parsing
table M[A,a], a two-dimensional array, where A is a nonterminal, and a is a terminal or the symbol
$, the input endmarker. The algorithm is based on the following idea: the production A → α is
chosen if the next input symbol a is in FIRST(α). The only complication occurs when α = ε or, more
generally, α =>* ε. In this case, we should again choose A → α if the current input symbol is in
FOLLOW(A), or if the $ on the input has been reached and $ is in FOLLOW(A).
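A minimal Python sketch of this table construction for grammar (4.2), with the FIRST and FOLLOW
sets entered directly (they are the sets computed above):

EPS = "eps"
productions = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F",  ["(", "E", ")"]),  ("F",  ["id"]),
]
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_body(body):
    out = set()
    for X in body:
        f = FIRST.get(X, {X})       # a terminal is its own FIRST set
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

M = {}                              # the predictive parsing table M[A, a]
for A, body in productions:
    f = first_of_body(body)
    for a in f - {EPS}:             # enter A -> body for each a in FIRST(body)
        M[A, a] = body
    if EPS in f:                    # and for each b in FOLLOW(A) if eps in FIRST
        for b in FOLLOW[A]:
            M[A, b] = body

print(M["E", "id"])    # ['T', "E'"]
print(M["E'", ")"])    # []  -- the eps-production E' -> eps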
Nonrecursive Predictive Parsing
A nonrecursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly
via recursive calls. The parser mimics a leftmost derivation. If w is the input that has been matched
so far, then the stack holds a sequence of grammar symbols α such that S =>* wα by a leftmost
derivation.
The table-driven parser in Fig. 4.19 has an input buffer, a stack containing a sequence of grammar
symbols, a parsing table constructed by Algorithm 4.31, and an output stream. The input buffer
contains the string to be parsed, followed by the end marker $. We reuse the symbol $ to mark the
bottom of the stack, which initially contains the start symbol of the grammar on top of $.
The parser is controlled by a program that considers X, the symbol on top of the stack, and a,
the current input symbol. If X is a nonterminal, the parser chooses an X-production by consulting
entry M[X,a] of the parsing table M. Otherwise, it checks for a match between the terminal X and
current input symbol a.
Note that the sentential forms in this derivation correspond to the input that has already been
matched (in column MATCHED) followed by the stack contents.
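The following minimal Python sketch shows this driver loop, with the parsing table for grammar (4.2)
written out directly, run on the input id+id*id:

M = {   # predictive parsing table for grammar (4.2); [] is the eps-production
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
nonterminals = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    stack = ["$", "E"]              # start symbol on top of the bottom marker $
    i = 0
    while stack[-1] != "$":
        X, a = stack.pop(), tokens[i]
        if X in nonterminals:
            body = M.get((X, a))
            if body is None:
                raise SyntaxError(f"error: no entry M[{X}, {a}]")
            stack.extend(reversed(body))   # push the body, leftmost symbol on top
            print("output", X, "->", " ".join(body) or "eps")
        elif X == a:
            i += 1                         # match a terminal, advance the input
        else:
            raise SyntaxError(f"error: expected {X}, found {a}")
    if tokens[i] != "$":
        raise SyntaxError("input remains after the stack emptied")
    print("accepted")

parse(["id", "+", "id", "*", "id", "$"])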
Error Recovery in Predictive Parsing
An error is detected during predictive parsing when the terminal on top of the stack does not match
the next input symbol or when nonterminal A is on top of the stack, a is the next input symbol, and
M[A, a] is error.
Panic Mode
Panic-mode error recovery is based on the idea of skipping over symbols on the input until a token in
a selected set of synchronizing tokens appears. Its effectiveness depends on the choice of
synchronizing set. Some heuristics are as follows:
1. As a starting point, place all symbols in FOLLOW(A) into the synchronizing set for nonterminal
A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely
that parsing can continue.
2. It is not enough to use FOLLOW(A) as the synchronizing set for A. For example, if semicolons
terminate statements, as in C, then keywords that begin statements may not appear in the FOLLOW
set of the nonterminal representing expressions. A missing semicolon after an assignment may
therefore result in the keyword beginning the next statement being skipped.
3. If we add symbols in FIRST(A) to the synchronizing set for nonterminal A, then it may be
possible to resume parsing according to A if a symbol in FIRST(A) appears in the input.
4. If a nonterminal can generate the empty string, then the production deriving ε can be used as a
default. Doing so may postpone some error detection, but cannot cause an error to be missed. This
approach reduces the number of nonterminals that have to be considered during error recovery.
5. If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue a
message saying that the terminal was inserted, and continue parsing. In effect, this approach takes
the synchronizing set of a token to consist of all other tokens.
If the parser looks up entry M[A,a] and finds that it is blank, then the input symbol a is skipped. If
the entry is “synch," then the nonterminal on top of the stack is popped in an attempt to resume
parsing. If a token on top of the stack does not match the input symbol, then we pop the token from
the stack, as mentioned above.
Nonterminal   id        +          *           (         )        $
E             E->TE'                           E->TE'    synch    synch
E'                      E'->+TE'                         E'->Ɛ    E'->Ɛ
T             T->FT'    synch                  T->FT'    synch    synch
T'                      T'->Ɛ      T'->*FT'              T'->Ɛ    T'->Ɛ
F             F->id     synch      synch       F->(E)    synch    synch
Figure 4.22: Synchronizing tokens added to the parsing table for grammar (4.2)
On the erroneous input )id*+id, the parser and error recovery mechanism of Figure 4.22 behave as
in Figure 4.23.
Stack        Input       Remarks
E$           )id*+id$    error, skip )
E$           id*+id$     id is in FIRST(E)
TE'$         id*+id$
FT'E'$       id*+id$
idT'E'$      id*+id$
T'E'$        *+id$
*FT'E'$      *+id$
FT'E'$       +id$        error, M[F,+] = synch
T'E'$        +id$        F has been popped
E'$          +id$
+TE'$        +id$
TE'$         id$
FT'E'$       id$
idT'E'$      id$
T'E'$        $
E'$          $
$            $
Figure 4.23: Parsing and error recovery moves made by a predictive parser
Phrase-level Recovery
Phrase-level error recovery is implemented by filling in the blank entries in the predictive parsing
table with pointers to error routines. These routines may change, insert, or delete symbols on the
input and issue appropriate error messages. They may also pop from the stack. Alteration of stack
symbols or the pushing of new symbols onto the stack is questionable for several reasons. First, the
steps carried out by the parser might then not correspond to the derivation of any word in the
language at all. Second, we must ensure that there is no possibility of an infinite loop. Checking that
any recovery action eventually results in an input symbol being consumed (or the stack being
shortened if the end of the input has been reached) is a good way to protect against such loops.
Bottom-Up Parsing
A bottom-up parse corresponds to the construction of a parse tree for an input string beginning at the
leaves (the bottom) and working up towards the root (the top). It is convenient to describe parsing as
the process of building parse trees, although a front end may in fact carry out a translation directly
without building an explicit tree. The sequence of tree snapshots in Fig. 4.25 illustrates
a bottom-up parse of the token stream id*id, with respect to the expression grammar (4.1).
Reductions
Bottom-up parsing can be viewed as the process of “reducing" a string w to the start symbol of the grammar. At
each reduction step, a specific substring matching the body of a production is replaced by the
nonterminal at the head of that production.
The snapshots in Fig. 4.25 illustrate a sequence of reductions; the grammar is the expression
grammar (4.1). The reductions will be discussed in terms of the sequence of strings
id*id, F*id, T*id, T*F, T, E
The strings in this sequence are formed from the roots of all the subtrees in the snapshots. The
sequence starts with the input string id*id. The first reduction produces F*id by reducing the
leftmost id to F, using the production F→ id. The second reduction produces T*id by reducing F to
T. Now there is a choice between reducing the string T, which is the body of E → T, and the string consisting
of the second id, which is the body of F → id. Rather than reduce T to E, the second id is reduced to
F, resulting in the string T*F. This string then reduces to T. The parse completes with the reduction
of T to the start symbol E.
The goal of bottom-up parsing is therefore to construct a derivation in reverse. The following
corresponds to the parse in Fig. 4.25:
E => T => T*F => T*id => F*id => id*id
This derivation is in fact a rightmost derivation.
Handle Pruning
Bottom-up parsing during a left-to-right scan of the input constructs a right- most derivation in
reverse. Informally, a “handle" is a substring that matches the body of a production, and whose
reduction represents one step along the reverse of a rightmost derivation.
For example, adding subscripts to the tokens id for clarity, the handles during the parse of id1*id2
according to the expression grammar (4.1) are as in Fig. 4.26. Although T is the body of the
production E → T, the symbol T is not a handle in the sentential form T*id2. If T were indeed
replaced by E, we would obtain the string E*id2, which cannot be derived from the start symbol E.
Thus, the leftmost substring that matches the body of some production need not be a handle.
Right Sentential Form    Handle    Production
id*id                    id        F -> id
F*id                     F         T -> F
T*id                     id        F -> id
T*F                      T*F       T -> T*F
T                        T         E -> T
E
Figure 4.26: Handles during a parse of id1*id2
Shift-Reduce Parsing
Shift-reduce parsing is a form of bottom-up parsing in which a stack holds grammar symbols and an
input buffer holds the rest of the string to be parsed. We use $ to mark the bottom of the stack and
also the right end of the input. Initially, the stack is empty, and the string w is on the input, as
follows:
STACK    INPUT
$        w$
During a left-to-right scan of the input string, the parser shifts zero or more input symbols onto the
stack, until it is ready to reduce a string β of grammar symbols on top of the stack. It then reduces β
to the head of the appropriate production. The parser repeats this cycle until it has detected an error
or until the stack contains the start symbol and the input is empty:
STACK INPUT
$S $
Figure 4.28 steps through the actions a shift-reduce parser might take in parsing the input string
id1*id2 according to the expression grammar (4.1).
Stack     Input Buffer    Parsing Action
$         id1*id2$        shift
$id1      *id2$           reduce by F -> id
$F        *id2$           reduce by T -> F
$T        *id2$           shift
$T*       id2$            shift
$T*id2    $               reduce by F -> id
$T*F      $               reduce by T -> T*F
$T        $               reduce by E -> T
$E        $               accept
Figure 4.28: Configurations of a shift-reduce parser on input id1*id2
While the primary operations are shift and reduce, there are actually four possible actions a
shift-reduce parser can make: (1) shift, (2) reduce, (3) accept, and (4) error.
1. Shift. Shift the next input symbol onto the top of the stack.
2. Reduce. The right end of the string to be reduced must be at the top of the stack. Locate the left
end of the string within the stack and decide with what nonterminal to replace the string.
3. Accept. Announce successful completion of parsing.
4. Error. Discover a syntax error and call an error recovery routine.
The use of a stack in shift-reduce parsing is justified by an important fact: the handle will
always eventually appear on top of the stack, never inside. This fact can be shown by considering the
possible forms of two successive steps in any rightmost derivation.
LR(0) Items
An LR(0) item of a grammar G is a production of G with a dot at some position of the body. Thus,
production A → XYZ yields the four items
A→•XYZ
A→X•YZ
A→XY•Z
A→XYZ•
The production A → Ɛ generates only one item, A → •. Intuitively, an item indicates how much of a
production has been seen at a given point in the parse. For example, the item A→•XYZ indicates that
we hope to see a string derivable from XYZ next on the input. Item A→X•YZ indicates that we have
just seen on the input a string derivable from X and that we hope next to see a string derivable
from YZ. Item A→XYZ• indicates that we have seen the body XYZ and that it may be time to reduce
XYZ to A.
One collection of sets of LR(0) items, called the canonical LR(0) collection, provides the basis for
constructing a deterministic finite automaton that is used to make parsing decisions.
Closure of Item Sets
If I is a set of items for a grammar G, then CLOSURE(I) is the set of items constructed from I by the
two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A -> α•Bβ is in CLOSURE(I) and B -> γ is a production, then add the item B -> •γ to
CLOSURE(I), if it is not already there. Apply this rule until no more new items can be added
to CLOSURE(I).
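A minimal Python sketch of CLOSURE, representing an item as a (head, body, dot-position) triple;
the grammar used here is the augmented grammar A' -> A, A -> (A) | a from the worked example
later in this unit.

grammar = {"A'": [["A"]], "A": [["(", "A", ")"], ["a"]]}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            # rule 2: if the dot stands before a nonterminal B,
            # add B -> .gamma for every B-production
            if dot < len(body) and body[dot] in grammar:
                for gamma in grammar[body[dot]]:
                    item = (body[dot], tuple(gamma), 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

# CLOSURE of {A' -> .A} gives the initial state I0 with three items
for head, body, dot in sorted(closure({("A'", ("A",), 0)})):
    print(head, "->", " ".join(body[:dot]), ".", " ".join(body[dot:]))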
The LR-Parsing Algorithm
A schematic of an LR parser is shown in Fig. 4.35. It consists of an input, an output, a stack, a driver
program, and a parsing table that has two parts (ACTION and GOTO). The driver program is the
same for all LR parsers; only the parsing table changes from one parser to another. The parsing
program reads characters from an input buffer one at a time. Where a shift-reduce parser would shift
a symbol, an LR parser shifts a state.
The stack holds a sequence of states, s0, s1, ..., sm, where sm is on top. In the SLR method, the
stack holds states from the LR(0) automaton; the canonical-LR and LALR methods are similar. By
construction, each state has a corresponding grammar symbol. Recall that states correspond to sets of
items, and that there is a transition from state i to state j if GOTO(Ii,X) = Ij . All transitions to state j
must be for the same grammar symbol X.
Structure of the LR Parsing Table
The parsing table consists of two parts: a parsing-action function ACTION and a goto function
GOTO.
1. The ACTION function takes as arguments a state i and a terminal a (or $, the input endmarker).
The value of ACTION[i, a] can have one of four forms:
(a) Shift j, where j is a state. The action taken by the parser effectively shifts input a to the stack, but
uses state j to represent a.
(b) Reduce A ->β. The action of the parser effectively reduces β on the top of the stack to head A.
(c) Accept. The parser accepts the input and finishes parsing.
(d) Error. The parser discovers an error in its input and takes some corrective action.
2. We extend the GOTO function, defined on sets of items, to states: if
GOTO[Ii, A] = Ij, then GOTO also maps a state i and a nonterminal A to state j.
Let us construct the SLR table for the augmented expression grammar. The canonical collection of
sets of LR(0) items for the grammar was shown in Fig. 4.31. First consider the set of items I0:
E' -> •E
E -> •E + T
E -> •T
T -> •T * F
T -> •F
F -> •(E)
F -> •id
The item F -> •(E) gives rise to the entry ACTION[0, (] = shift 4, and the item F -> •id to the entry
ACTION[0, id] = shift 5. Other items in I0 yield no actions. Now consider I1:
E' -> E•
E -> E•+ T
The first item yields ACTION[1, $] = accept, and the second yields ACTION[1, +]
= shift 6. Next consider I2:
E -> T•
T -> T•* F
Since FOLLOW(E) = {$,+, )}, the first item makes
ACTION[2, $] = ACTION[2, +] = ACTION[2, )] = reduce E -> T
The second item makes ACTION[2, *] = shift 7. Continuing in this fashion we obtain the ACTION
and GOTO tables that were shown in Fig. 4.31. In that figure, the numbers of productions in reduce
actions are the same as the order in which they appear in the original grammar (4.1). That is,
E -> E + T is number 1, E -> T is 2, and so on.
Every SLR(1) grammar is unambiguous, but there are many unambiguous grammars that are not
SLR(1). Consider the grammar with productions
S -> L = R | R
L -> *R | id (4.49)
R -> L
Think of L and R as standing for l-value and r-value, respectively, and * as an operator indicating
“contents of."5 The canonical collection of sets of LR(0) items for grammar (4.49) is shown in Fig.
4.39.
1. Consider the grammar:
A->(A) | a
1) Find LR(0) items
2) Construct LR(0) automata
3) Construct SLR parsing table
4) Parse the input string : (((a))).
ANSWER:
Augmented grammar is:
A’-> A
A->(A)
A-> a
LR(0) items:
I0:
A' -> •A
A -> •(A)
A -> •a
I1: GOTO(I0, A)
A' -> A•
I2: GOTO(I0, ()
A -> (•A)
A -> •(A)
A -> •a
I3: GOTO(I0, a)
A -> a•
I4: GOTO(I2, A)
A -> (A•)
I5: GOTO(I4, ))
A -> (A)•
LR(0) automaton (states I0 through I5, with accept on $ in I1):
I0 --A--> I1
I0 --(--> I2
I0 --a--> I3
I2 --(--> I2
I2 --a--> I3
I2 --A--> I4
I4 --)--> I5
FIRST(A') = {(, a}    FOLLOW(A') = {$}
FIRST(A) = {(, a}     FOLLOW(A) = {), $}
The productions are numbered:
1. A -> (A)
2. A -> a
SLR parsing table (sN = shift and go to state N, rN = reduce by production N, acc = accept):
State    (     a     )     $     A
0        s2    s3                1
1                          acc
2        s2    s3                4
3                    r2    r2
4                    s5
5                    r1    r1
Parsing the input (((a))):
Stack          Input       Action
0              (((a)))$    shift 2
0 2            ((a)))$     shift 2
0 2 2          (a)))$      shift 2
0 2 2 2        a)))$       shift 3
0 2 2 2 3      )))$        reduce by A -> a, GOTO(2, A) = 4
0 2 2 2 4      )))$        shift 5
0 2 2 2 4 5    ))$         reduce by A -> (A), GOTO(2, A) = 4
0 2 2 4        ))$         shift 5
0 2 2 4 5      )$          reduce by A -> (A), GOTO(2, A) = 4
0 2 4          )$          shift 5
0 2 4 5        $           reduce by A -> (A), GOTO(0, A) = 1
0 1            $           accept
Operator Precedence Parsing
An operator precedence parser uses three precedence relations between terminals a and b: a < b
means a “yields precedence to" b, a = b means they have equal precedence, and a > b means a
“takes precedence over" b. The relations are defined by the following rules:
Rule-01:
➢ If precedence of b is higher than precedence of a, then we define a < b
➢ If precedence of b is the same as precedence of a, then we define a = b
➢ If precedence of b is lower than precedence of a, then we define a > b
Rule-02:
➢ An identifier is always given higher precedence than any other symbol.
➢ $ symbol is always given the lowest precedence.
Rule-03:
If two operators have the same precedence, then we go by checking their associativity.
Parsing A Given String-
The given input string is parsed using the following steps-
Step-01:
Insert the following-
➢ $ symbol at the beginning and ending of the input string.
➢ A precedence relation between every two symbols of the string, by referring to the operator
precedence table.
Step-02:
➢ Start scanning the string from the LHS in the forward direction until a > symbol is encountered.
➢ Keep a pointer on that location.
Step-03:
➢ Start scanning the string from the RHS in the backward direction until a < symbol is encountered.
➢ Keep a pointer on that location.
Step-04:
➢ Everything that lies in the middle of < and > forms the handle.
➢ Replace the handle with the head of the respective production.
Step-05:
Keep repeating the cycle from Step-02 to Step-04 until the start symbol is reached.
Advantages of operator precedence parsing are
➢ The implementation is very easy and simple.
➢ The parser is quite powerful for expressions in programming languages.
Disadvantages-
➢ The handling of tokens known to have two different precedences (such as the minus sign, which
may be unary or binary) becomes difficult.
➢ Only a small class of grammars can be parsed using this parser.
Important Note-
In practice, the operator precedence table is not stored by operator precedence parsers, because it
occupies too much space. Instead, operator precedence parsers are implemented in a different style:
they use operator precedence functions.
Operator Precedence Functions- Precedence functions map terminal symbols to integers. To decide
the precedence relation between two symbols, a numerical comparison is performed. This reduces the
space requirement considerably.
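As an illustration, here is a minimal Python sketch using one commonly quoted pair of precedence
functions f and g for the terminals id, +, *, and $; the specific integer values are an assumption (one
standard choice, not the only one). Comparing f(a) with g(b) encodes a < b, a = b, or a > b.

f = {"id": 4, "*": 4, "+": 2, "$": 0}    # precedence function for the left symbol
g = {"id": 5, "*": 3, "+": 1, "$": 0}    # precedence function for the right symbol

def relation(a, b):
    # compare numerically instead of consulting a full relation table
    if f[a] < g[b]:
        return "<"      # a yields precedence: keep shifting
    if f[a] > g[b]:
        return ">"      # a takes precedence: a handle ends here
    return "="

# * takes precedence over +, and + yields precedence to *
print(relation("*", "+"), relation("+", "*"))    # prints: > <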