Chương 3. Phân Tích Cú Pháp

Chương 3
Phân tích cú pháp

(Syntax Analysis)
Introduction to Parsing
• Relationship between lexical analysis and parsing.
string of string of parse tree

characters LEXER tokens PARSING
Introduction to Parsing
<key, ‘if’>
<(, ‘(‘>
<id, ‘x’> if - else
if (x==y) <==, ‘==‘>
result = 1; <id, ‘y’>
else <), ‘)’>
result = 2; <id, ‘result’> == = =
LEXER <=, ‘=‘>

<int, ‘1’> PARSING id id id int id int
<pun, ‘;’>
<key, ‘else’>
<id, ‘result’>
<=, ‘=‘>
<int, ‘2’>
<pun, ‘;’>
Context-Free Grammars (CFGs)
• A CFG consist of
• A set of terminals Σ
• A set of nonterminals Δ
• A start symbol S (S Δ)
• A set of productions (rules) R
A  B1B2…..Bn,
AΔ
Bi  Δ  Σ  {}
means A can be replaced by B1B2…..Bn
Key idea
1. Begin with a string consisting of the start symbol “S”
2. Replace any nonterminal A in the string by a the right-hand side of some
production
A → B1 … Bn
3. Repeat (2) until there are no nonterminals in the string
Notational conventions for grammars
1. These symbols are terminals:
(a) Lowercase letters early in the alphabet, such as a, b, c.
(b) Operator symbols such as +, *, and so on.
(c) Punctuation symbols such as parentheses ( ), comma ‘;’, and so on.
(d) The digits 0, 1, ….., 9.
(e) Boldface strings such as id or if, each of which represents a single terminal symbol.
2. These symbols are nonterminals:

(a) Uppercase letters early in the alphabet, such as A, B, C.
(b) The letter S, which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
Notational conventions for grammars
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols; that
is, either non-terminals or terminals.
4. Lowercase letters late in the alphabet, chiefly u, v, …, z, represent (possibly empty)
strings of terminals.
5. Lowercase Greek letters, , ,  for example, represent (possibly empty) strings of
grammar symbols. Thus, a generic production can be written as A  , where A is the
head and  is the body.
6. A set of productions A 1, A 2, ……., A k
with a common head A (call them A-productions), may be written A  1 | 2 | … | k
Call 1, 2, …., k the alternatives for A.
7. Unless stated otherwise, the head of the first production is the start symbol.
• Example of a Context-free Grammar
S  (S)
S
 set of terminals Σ = {(, )}

 set of nonterminals Δ = {S}
 S: S
• Example of a Context-free Grammar
expression  expression + term
expression  expression - term
expression  term
term  term * factor
term  term / factor
term  factor
factor  (expression)
factor  id
 set of terminals Σ = {+, -, *, /,(, ), id}

 set of nonterminals Δ = {expression, term, factor}
 S: expression
Derivations
• Consider the following grammar: E  E+E | E*E | -E | (E) | id
• The production E  -E signies that if E denotes an expression, then -E must also
denote an expression.
• The replacement of a single E by - E will be described by writing E  -E
which is read, “E derives -E"
• The production E  (E) can be applied to replace any instance of E in any string of
grammar symbols by (E), e.g., E*E  (E)*E or E*E  E*(E)
• We can take a single E and repeatedly apply productions in any order to get a sequence
of replacements.
E  -E  - (E)  - (id)
• We call such a sequence of replacements a derivation of -(id) from E. This derivation
provides a proof that the string -(id) is one particular instance of an expression.
Derivations
• Consider the following grammar: E  E+E | E*E | -E | (E) | id
• The string -(id+id) is a sentence of grammar because there is a derivation

E  -E  -(E)  -(E+E)  -(id+E)  -(id+id)
• We write E +-(id+id) to indicate that -(id+id) can be derived from E.
• In leftmost derivations, the leftmost nonterminal in each sentential is always

chosen.
E  -E  -(E)  -(E+E)  -(id+E)  -(id+id)
• In rightmost derivations, the rightmost nonterminal is always chosen.
E  -E  -(E)  -(E+E)  -(E+id)  -(id+id)
Parse Trees and Derivations
• A parse tree is a graphical representation of a derivation that filters out the order in
which productions are applied to replace nonterminals.
• A parse tree has
• Terminals at the leaves
• Nonterminals at the interior nodes
• Each interior node of a parse tree represents the application of a production
• The interior node is labeled with the nonterminal (A) in the head of the production
• The children of the node are labeled, from left to right, by the symbols in the body
of the production by which this A was replaced during the derivation
• An in-order traversal of the leaves is the original input
• The parse tree shows the association of operations, the input string does not.
• A derivation can be drawn as a tree
• Start symbol is the root of the tree
• For a production A  B1B2…..Bn add children B1B2…..Bn to node A
B1 B2 …….. Bn
• Note that leftmost derivations and rightmost derivations have the same
parse tree.
• Grammar
E  E+E | E*E | -E | (E) | id
• String
id*id+id
• leftmost derivations: EE*Eid*Eid*E+Eid*id+Eid*id+id
• rightmost derivations: EE*EE*E+EE*E+idE*id+idid*id+id
• Grammar
E  E+E | E*E | -E | (E) | id
• String E
-(id+id)
• Derivation - E
E
 -E ( E )
 - (E)
 - (E+E) E + E
 - (id+E)
 - (id+id) id id
E
E
 -E - E
 - (E)
 - (E+E) ( E )
 - (id+E)
E + E
 - (id+id)
id id
Ambiguity
• A grammar that produces more than one parse tree for some string is said to be
ambiguous.
• Equivalently, there is more than one leftmost derivations or rightmost
derivations for some string.
• Grammar
E  E+E | E*E | -E | (E) | id
• String
id+id*id
• This string has two parse trees
Ambiguity
• This string (id+id*id) has two parse trees
E E
E + E E  E+ E E  E*E E * E
 id+E  E+E*E
 id+E*E  id+E*E
id E * E E + E id
 id+id*E  id+id*E
 id+id*id  id+id*id
id id id id
Exercises
Exercise 1: Consider the context-free grammar:
S  SS+ | SS* | a
and the string aa+a*
a) Give a derivation for the string
b) Give a leftmost derivation for the string
c) Give a rightmost derivation for the string
d) Give a parse tree for the string
e) Is the grammar ambiguous or unambiguous? Justify your answer
f) Describe the language generated by this grammar
Exercises
Exercise 2: Repeat Exercise 1 for each of the following grammars and
strings:
a) S  0S1 | 01 with string 000111
b) S  +SS | *SS | a with string +*aaa
c) S  S(S)S |  with string (()())
d) S  S+S | SS | (S) | S* | a with string (a+a)*a
e) S  (L) | a
L  L,S | S with string ((a,a),a,(a))
f) S  aSbS | bSaS |  with string aabbab
Exercises
Exercise 3: Design grammars for the following languages:
a) The set of all strings of 0s and 1s that are palindromes; that is, the string
reads the same backward as forward.
b) The set of all strings of 0s and 1s with an equal number of 0s and 1s

Exercises
Exercise 4:
• Let Σ = {void, int, float, double, name, (, ), ‘,’, ;}.
• Examples:
void name(int name, double name);
int name();
int name(double name);
int name(int, int name, float);
void name(void);
Exercises
Here's one possible grammar:
S → ret name (args);
ret → type | void
type → int | double | float
args → ε | void | arglist
arglist → onearg | arglist, onearg
onearg → type | type name
Abstract Syntax Trees (AST)
string of string of parse tree

characters LEXER tokens
PARSING
• A parse tree is a graphical representation of a derivation that filters out the order
in which productions are applied to replace nonterminals.
• Parse tree (Concrete Syntax Tree – CST)
• Abstract syntax trees like parse trees but ignore some details
• In the abstract syntax tree (syntax tree), interior nodes represent
programming constructs
• In the parse tree (concrete syntax trees), interior nodes represent nonteminals
• Abstract Syntax Trees or ASTs are tree representations of code.
Abstract Syntax Trees (AST)
• Consider the grammar: E  int | (E) | E+E | E*E
• And the string: 7 + (4 + 5)
• After lexical analysis: int7 ‘+’ ‘(‘ int4 ‘+’ int5 ‘)’
Example of parse tree
• During parsing we build a parse tree
E + E
Too much information
)
- Parentheses
int7 ( E
- Single-successor nodes
E + E
int4 int5
Example of abstract syntax tree
Parse Tree Abstract Syntax Tree AST is represented as
E + PLUS
E + E int7 + PLUS
int7 ( E ) int4 int5 7 4 5
E + E
int4 int5
Parse Tree vs Syntax Tree
Parse Tree Syntax Tree
A parse tree is a graphical representation of the A syntax tree is a condensed form of parse tree
replacement process in a derivation
In parse trees In syntax trees

- Each interior node represents a grammar rule - Each interior node represents an operator
- Each leaf node represents a terminal - Each leaf node represents an operand
Parse tree represent every detail from the real Syntax trees does not represent every detail from
syntax the real syntax (that’s why they are called
abstract).
For example: no rule nodes, no parentheses, …
Parse Tree vs Syntax Tree
• Consider the following grammar:
E  E+T  T+T  F+T  id+T
E  E+T | T
 id+T*F  id+F*F  id+id*F
T  T*F | F  id+id*id
F  (E) | id E +
• For the string
id+id*id
E + T id *
• Genarate
T T F id
* id
Parse tree Syntax tree
F
F id
id id
Parsing
Parsing
Top Down Parsing Bottom Up Parsing
Recursive Descent Operator

Parsing LR(k)
Precedence
Non Back-tracking Back-tracking

Parsing Parsing LR(0) LR(1)
Predictive parsing
LR(0) SLR(1) LALR CLR
LL parsing LL(1)
Top-Down Parsing
• Parsing is a process of deriving string from a given grammar.
• Top-down parsing is a process of constructing a parse tree for the input string:
• From the top
• From left to right
• Top-down parsing can be viewed as finding a leftmost derivation for an input
string
• Recursive Descent Parsing uses procedures for every terminal and nonterminal
entity.
• Recursive Descent Parsing to make a parse tree, which may or may not require
back-tracking
• A form of recursive descent parsing that does not require any back-tracking is
known as predictive parsing.
Top-Down Parsing
• Consider the grammar:
E  TE’
E’  +TE’ | 
T  FT’
T’  *FT’ | 
F  (E) | id
and the input string: id+id*id
• Give a leftmost derivation for the input string
• Give a parse tree for the input string
Top-Down Parsing
• Give a leftmost derivation for the string:
E  TE’
E  TE’  FT’E’  idT’E’  idE’
E’  +TE’ |   id+TE’  id+FT’E’  id+idT’E’
T  FT’  id+id*FT’E’  id+id*idT’E’ E
 id+id*idE’  id+id*id
T’  *FT’ | 
F  (E) | id T E’
and the string: id+id*id F +
T’ T E’
Give a parse tree for the string:
id  F T’ id
id * F T’
id 
Left-recursion
• A grammar is left-recursive if and only if there exists a nonterminal
symbol that can derive to a sentential form with itself as the leftmost
symbol.
 + 
where + indicates the operation of making one or more substitutions, and 
is any sequence of terminal and nonterminal symbols.
• Recursive descent parsing does not work in such cases.

Elimination of Left-Recursion
• Consider the left-recursive grammar    | 
where  and  are sequences of terminals and nonterminals that do not start with .
• The nonterminal  and its production are said to be left-recursive, because the
production    has  itself as the leftmost symbol on the right hand side.
•  generates all strings starting with a  and followed by a number of .
• For example, in expr  expr+term | term

nonterminal  = expr, string  = +term, and string  = term
exprexpr+termexpr+term+termexpr+term+term+termexpr+term+term+term+term
Removing left-recursion
• Left-recursion often poses problems for parsers
• Therefore, a grammar is often preprocessed to eliminate the left-recursion.
• The left-recursive grammar    | 
can rewrite using right-recursion
  
   | 
• Example:
Consider the left-recursive grammar
S → a | ^(T)
T → T,S | S
Eliminate the left-recursion from the grammar.
Solution: We have immediate left recursion in T-productions.
• Comparing T → T,S | S with A → Aα | β where A = T, α =, S and β = S
• Complete grammar will be
S → a | ^(T)
T → ST′
T′→ ,ST′ | ε
Solution
• Comparing E → E+T | T with A → Aα | β Example:
A = E, α = +T, β = T Consider the left-recursive grammar
1) A → Aα | β is changed to A → βA′ and A′ → αA′ | ε E → E+T | T
2) A → βA′ means E → TE′ T → T*F | F
3) A′ → αA′ | ε means E′ → +TE′ | ε F → (E) | id
• Comparing T → T∗F | F with A → Aα | β
4) A = T, α =∗F, β = F
5) A → βA′ means T → FT′
6) A → αA′ | ε means T′ →*FT′ | ε
• Production F → (E) | id does not have any left-recursion
• Combining productions 1, 2, 3, 4, 5, 6 we get
E → TE′
E′ → +TE′ | ε
T → FT′
T’ → *FT′ | ε
F → (E) | id
Example:
• The left-recursive grammar    | 
can rewrite using right-recursion
  
   | 
or
   | 
Example:
• Consider the left-recursive grammar:
S  SabA | a
Aa|b
• Can rewrite as
S  aS
S  abAS | 
Aa|b
• In general   1 |…| n | 1 |…| m
•  generates all strings starting with one of 1,…,m and followed by

several instances of 1,…,n
• Can rewrite as
  1 | … | m 
  1 | … | n | 
Example:
• Consider the left-recursive grammar:
S  SbA | SaA | b | a
A a | b
• Can rewrite as
S  bS | aS
S  bAS | aAS | 
Aa|b
General Left-Recursion
• The grammar
  ' | 
  
Is it a left-recursive grammar?
A grammar is left-recursive if and only if

• It is also left-recursive because: there exists a nonterminal symbol that can
  '   or  +  derive to a sentential form with itself as
the leftmost symbol.
 + ,
where + indicates the operation of
making one or more substitutions,
and  is any sequence of terminal
and nonterminal symbols.
General Left-Recursion
• Consider the grammar there exists a nonterminal symbol that can
S  Aa | b derive to a sentential form with itself as the
leftmost symbol.
A  Ac | Sd |   + ,
where + indicates the operation of
The nonterminal S is left-recursive? making one or more substitutions, and
 is any sequence of terminal and
nonterminal symbols.
• The nonterminal S is left-recursive because S  Aa  Sda or S + Sda,

but it is not immediately left-recursive.
Left-Recursion
• Consider the grammar there exists a nonterminal symbol that can
S→ a | ^(T) derive to a sentential form with itself as the
T→ ST′ leftmost symbol.
 + ,
T′ →,ST′ | ε where + indicates the operation of
making one or more substitutions, and
Is it a left-recursive grammar?  is any sequence of terminal and
nonterminal symbols.
S  ^(T)  ^(ST')  ^(^(T)T’)  ……  ^

T  ST’  ……  ^ 
T’  ,ST′  ……  ,^ 
Algorithm For Eliminating Left-Recursion
• INPUT: Grammar G with no cycles (derivations of the form A + A) or -productions.
• OUTPUT: An equivalent grammar with no left-recursion.
(Note that the resulting non-left-recursive grammar may have -productions)
• METHOD:
1) arrange the nonterminals in some order 1, 2, …, n.
2) for (each i from 1 to n) {
3) for (each j from 1 to i-1) {
4) replace each production of the form i  j by the productions i 1 | 2 | … | k ,
where j 1 | 2 | … | k are all current i-productions
5) }
6) eliminate the immediate left-recursion among the i-productions
7) }
Example
• Let apply that algorithm (Algorithm For Eliminating Left-Recursion in the previous slide) to the
grammar
  B | 
B  
Solution:
• We order the nonterminals: A, B
• For i = 1: nothing happens (because there is no immediate left-recursion among the A-productions)
• For i = 2:
 For j = 1: replace the production B   by two productions:
B  B (where   B)
and B   (where A   )
A  B | 
B  B | 
 eliminate immediate left-recursion:
A B | 
B  B’
B’ B’ | 
Exercise
1. Eliminate the left-recursion from the grammar:
A  AT | T
T0|1|2|3|4|5|6|7|8|9
A  AB | AC | C
Ca
B0
A  B | 
B   |  | cd
C  db | bc
S  Aa | b
A  Ac | Sd
Exercise
A  AT | T
T0|1|2|3|4|5|6|7|8|9
Solution:
A  TA'
A'  TA' | ɛ
T0|1|2|3|4|5|6|7|8|9

A  AB | AC | C
Ca
B0
Solution:
A  CA’
A’  BA’ | CA’ | ɛ
Ca
B0
Exercise
3. Eliminate the left-recursion from the grammar
A  B | 
B   | A | cd
C  db | bc
- Solution:
• We order the nonterminals: A, B, C.
• For i = 1: nothing happens (because there is no immediate left recursion among the A-productions)
• For i = 2:
 For j = 1: - substitute for A in B   to obtain the following B-productions: B  B | 
- substitute for A in B  A to obtain the following B-productions: B  B | 
- we obtain:
A  B | 
B  B | B |  |  | cd
C  db | bc
- eliminate the immediate left recursion among these B-productions yields the following grammar
A  B | 
B  B’ | B’ | cdB’
B’ B’ | B’ | ɛ
C  db | bc
• For i = 3: nothing happens (because there is no immediate left recursion among the C-productions)
Left factoring
• Consider the grammar
stmt  if expr then stmt else stmt
| if expr then stmt
 on seeing the input if, we cannot immediately tell which production to

choose to expand stmt.
 we need to left-factor the grammar
Left factoring
• Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive or top-down parsing.
• When the choice between two alternative A-productions is not clear, we may be able to
rewrite the productions to defer the decision until enough of the input has been seen that we
can make the right choice.
• Consider -productions:  1  2
and the input begins with a nonempty string derived from 
 we don’t know whether to expand  to 1 or 2
 we may defer the decision by
- expanding  to ’
- then exapanding ’ to 1 or 2
 the original productions become
  ’
’  1 | 2
Left factoring
Example: Consider -productions:  1  2
• Consider the grammar:  we don’t know whether to expand  to 1 or 2
S  iEtS | iEtSeS | a  we may defer the decision by
- expanding  to ’
Eb - then exapanding ’ to 1 or 2
Left-factor this grammar.   ’
’  1 | 2
• Can rewrite as
S  iEtSS’ | a
S  eS| 
Eb
Left factoring
• Consider the grammar • Consider -productions:  1  2  …| n
E  T+E | T and the input begins with a nonempty string derived
from 
T  int | int*T | (E)  we don’t know whether to expand  to 1 or 2
Left-factor this grammar. - expanding  to ’
E  TE'  the original productions become
  ’
E'  +E | ɛ ’  1 | 2 | …| n
T  intT' | (E)
T'  *T | ɛ
Predictive Parsing
• Recursive-descent parsing is a top-down method of syntax analysis
• in which a set of recursive procedures is used to process the input.
• one procedure is associated with each nonterminal of a grammar.
• Predictive parsing is a special case of recursive-descent parsing, but parser
can “predict” which production to use
• By looking at the next few tokens
• No backtracking
• Predictive parsers accept LL(k)
• L means “left-to-right” scan of input
• L means “leftmost derivation”
• k means “predict based on k tokens of lookahead”
• In practice, LL(1) is used
LL(1) grammars
• Predictive parsers
• is recursive-descent parsers needing no backtracking
• can be constructed for a class of grammars called LL(1)
• LL(1)
• First L stands for scanning the input from left to right
• Second L stands for producing a leftmost derivation
• 1 stands for using one input symbol of lookahead at each step to make
parsing action decisions.
• LL(1) grammars are not ambiguous and not left recursive.
LL(1) vs. Recursive Descent
• In recursive-descent,
• At each step, many choices of production to use
• Backtracking used to undo bad choices
• In LL(1),
• At each step, only one choice of production
• That is
• When a nonterminal A is leftmost in a derivation
• And the next input symbol is t
• There is a unique production A   to use (or no production to use (an error state))
• LL(1) is a recursive descent variant without backtracking
FIRST and FOLLOW
• During top-down parsing, FIRST and FOLLOW allow us to choose which
production to apply, based on the next input token.
• FIRST
• Definition: FIRST(X) to be the set of terminals that begin strings derived from X
FIRST(X) = {b | X + b},   FIRST(X) if X + 
• Compute FIRST(X):
• If X is a terminal: FIRST(X) = {X}
• If X is a nonterminal:
•   FIRST(X)
• If X  
• If X  Y1Y2…Yk and   FIRST(Yi) for all 1≤ i ≤ k
• b  FIRST(X)
• If X  Y1Y2…Yk and b  FIRST(Yi) and   FIRST(Yj) for all 1≤ j ≤ i-1
FIRST example
• Production Rules of Grammar
E  TE’
E’  +TE’ | 
T  FT’
T’  *FT’ | 
F  (E) | id
• Compute FIRST
FIRST example
E  TE’
Compute FIRST(X):
E’  +TE’ |  If X is a terminal: FIRST(X) = {X}
T  FT’ If X is a nonterminal:
  FIRST(X):
T’  *FT’ |  FIRST (+) = {+} If X  
If X  Y1Y2…Yk and   FIRST(Yi) for all 1≤ i ≤ k
F  (E) | id FIRST (*) ={*} b  FIRST(X):
FIRST (() = {(} If X  Y1Y2…Yk and b  FIRST(Yi) and   FIRST(Yj) for all 1≤ j ≤ i-1
FIRST ()) ={)}
FIRST (id) ={id}
FIRST(F)={(, id}
FIRST(T)={(, id}
FIRST(E)={(, id}
FIRST(E’)={+, ε}
FIRST(T’)={*, ε }
FIRST example
E  TF
F  +E | 
T  intY | (E)
Y  *T | 
• Compute FIRST
FIRST example
E  TF Compute FIRST(X):
F  +E |  If X is a terminal: FIRST(X) = {X}
If X is a nonterminal:
T  intY | (E)   FIRST(X):
If X  
Y  *T |  FIRST (+) = {+} If X  Y1Y2…Yk and   FIRST(Yi) for all 1≤ i ≤ k
b  FIRST(X):
FIRST (*) ={*} If X  Y1Y2…Yk and b  FIRST(Yi) and   FIRST(Yj) for all 1≤ j ≤ i-1
FIRST (() = {(}
FIRST ()) ={)}
FIRST (int) = {int}
FIRST (T) ={int, (}

FIRST (E) ={int, (}
FIRST (F) ={+, }
FIRST (Y) ={*, }
FOLLOW example
• FOLLOW
• Definition: FOLLOW (X) to be the set of terminals b that can appear immediately to
the right of nonterminal X in some sentential form.
FOLLOW (X) = {b | S + b}, where X is a nonterminal
• Compute FOLLOW(X):
• $  FOLLOW(S), where S is the start symbol, $ is the end symbol of the input string
• For each production   : FIRST()\{}  FOLLOW()
• For each production   , where FIRST(): FOLLOW()  FOLLOW()
• For each production   : FOLLOW()  FOLLOW()
FOLLOW example
E  TE’
E’  +TE’ | 
T  FT’
T’  *FT’ | 
F  (E) | id
• Compute FOLLOW
FOLLOW example
E  TE’ Compute FOLLOW(X):
E’  +TE’ |  $  FOLLOW(S), where S is the start symbol, $ is the end symbol of the input string
T  FT’ For each production   : FIRST()\{}  FOLLOW()
For each production   , where FIRST(): FOLLOW()  FOLLOW()
T’  *FT’ |  For each production   : FOLLOW()  FOLLOW()
F  (E) | id FIRST (+) = {+}
FIRST (*) ={*}
FIRST (() = {(}
FIRST ()) ={)}
FIRST (id) ={id}
FIRST(F)={(, id} FOLLOW(E)={$, )}

FIRST(T)={(, id} FOLLOW(T)={+, $, )}
FIRST(E)={(, id} FOLLOW(E’)={$, )}
FIRST(E’)={+, ε} FOLLOW(F)={*, +, $, )}
FIRST(T’)={*, ε } FOLLOW(T’)={+, $, )}
FOLLOW example
• Recall the grammar
E  TF
F  +E | 
T  intY | (E)
Y  *T | 
• Compute FOLLOW
FOLLOW example
• Recall the grammar Compute FOLLOW(X):
E  TF $  FOLLOW(S), where S is the start symbol, $ is the end symbol of the input string
For each production   : FIRST()\{}  FOLLOW()
F  +E |  For each production   , where FIRST(): FOLLOW()  FOLLOW()
For each production   : FOLLOW()  FOLLOW()
T  intY | (E)
Y  *T |  FIRST (+) = {+}
FIRST (*) ={*}
FIRST (() = {(}
FIRST ()) ={)}
FIRST (int) = {int}
FIRST (T) ={int, (} FOLLOW(E) = {$, )}

FIRST (E) ={int, (} FOLLOW(T) ={+, $, )}
FIRST (F) ={+, } FOLLOW(F) ={$, )}
FIRST (Y) ={*, } FOLLOW(Y) ={+, $, )}
FOLLOW example
stmt  ifst | other
ifst  if (exp) stmt elsepart
elsepart  else stmt | 
exp  0 | 1
• Compute FIRST, FOLLOW
FOLLOW set example
• Consider the grammar Compute FIRST(X):
stmt  ifst | other If X is a terminal: FIRST(X) = {X}
ifst  if (exp) stmt elsepart If X is a nonterminal:
  FIRST(X):
elsepart  else stmt |  If X  
exp  0 | 1 If X  Y1Y2…Yk and   FIRST(Yi) for all 1≤ i ≤ k
b  FIRST(X):
FIRST(other) ={other} If X  Y1Y2…Yk and b  FIRST(Yi) and   FIRST(Yj) for all 1≤ j ≤ i-1
FIRST(if) ={if}
FIRST(else) ={else} Compute FOLLOW(X):
FIRST(() ={(} $  FOLLOW(S), where S is the start symbol, $ is the end symbol of the input string
FIRST()) ={)} For each production   : FIRST()\{}  FOLLOW()
FIRST(0) ={0} For each production   , where FIRST(): FOLLOW()  FOLLOW()
For each production   : FOLLOW()  FOLLOW()
FIRST(1) ={1}
FIRST(stmt) ={other, if} FOLLOW(stmt) = {$, else}

FOLLOW(ifst) ={$, else}
FIRST(ifst) ={if}
FOLLOW(elsepart) ={$, else}
FIRST(exp) = {0,1} FOLLOW(exp) = {)}
FIRST(elsepart) ={, else}
LL(1) Parsing Table
• Construct a parsing table M for context free grammar G.
• M is a two-dimensional array M[A,t], where A is a nonterminal and t is a
terminal or $ (the end symbol of the input string)
NON- INPUT SYMBOL
TERMINAL t $
A  
• For each production   in G do:

• For each terminal tFIRST() do M[A, t] = 
• If FIRST(), for each tFOLLOW(A) do M[A, t] = 
• If FIRST() and $FOLLOW(A) do M[A, $] = 
LL(1) Parsing Table
• Given left-factored grammar • For each production   in G do:
E  TE’ • For each terminarl t  FIRST() do M[A, t] = 
E’  +TE’ |  • If FIRST(), for each tFOLLOW(A) do M[A, t] = 
T  FT’ • If FIRST() and $FOLLOW(A) do M[A, $] = 
T’  *FT’ | 
F  (E) | id NON- INPUT SYMBOL
FOLLOW(E)={$, )} TERMINAL id + * ( ) $
FIRST (+) = {+}
FOLLOW(T)={+, $, )}
FIRST (*) ={*} E TE’ TE’
FOLLOW(E’)={$, )}
FIRST (() = {(}
FIRST ()) ={)}
FOLLOW(F)={*, +, $, )} E’ +TE’  
FOLLOW(T’)={+, $, )}
FIRST (id) ={id} T FT’ FT’
FIRST(TE’)= FIRST(T)={(, id} T’  *FT’  
FIRST(F)={(, id} FIRST(+TE’)= FIRST(+)={+}
FIRST(T)={(, id} FIRST(FT’)= FIRST(F)={(, id} F id (E)
FIRST(E)={(, id} FIRST(*FT’)= FIRST(*)={*}
FIRST(E’)={+, ε} FIRST((E))= FIRST(()={(}
FIRST(T’)={*, ε } FIRST(ε)= {ε}
LL(1) Parsing Table
E  TF • For each terminarl t  FIRST() do M[A, t] = 
F  +E |  • If FIRST(), for each tFOLLOW(A) do M[A, t] = 
T  intY | (E) • If FIRST() and $FOLLOW(A) do M[A, $] = 
Y  *T | 
NON- INPUT SYMBOL
FIRST (+) = {+} FOLLOW(E) = {$, )} TERMINAL + int ( ) * $
FIRST (*) ={*} FOLLOW(T) ={+, $, )}
FIRST (() = {(} E TF TF
FOLLOW(F) ={$, )}
FIRST ()) ={)} FOLLOW(Y) ={+, $, )} F +E  
FIRST (int) = {int} T intY (E)
FIRST(TF)=FIRST(T)={int, (}
FIRST (T) ={int, (} FIRST(+E)=FIRST(+)={+} Y   *T 
FIRST (E) ={int, (} FIRST(int Y)=FIRST(int)={int}
FIRST (F) ={+, } FIRST((E))=FIRST(( )={(}
FIRST (Y) ={*, } FIRST(*T)=FIRST(*)={*}
FIRST(ε)= {ε}
LL(1) Parsing Table
NON- INPUT SYMBOL E  TE’
TERMINAL E’  +TE’ | 
id + * ( ) $
T  FT’
E TE’ TE’ T’  *FT’ | 
E’ +TE’   F  (E) | id
T FT’ FT’
T’  *FT’  
F id (E)
• Consider the [E, id] entry

• “When current nonterminal is E and next input is id, use production E  TE’ “
• This can generate an id in the first position
LL(1) Parsing Table
TERMINAL id + * ( ) $ E’  +TE’ | 
T  FT’
E TE’ TE’ T’  *FT’ | 
E’ +TE’   F  (E) | id
T FT’ FT’
T’  *FT’  
F id (E)
• Consider the [T’, +] entry

• “When current nonterminal is T’ and current token is +, get rid of T’ “
• T’ can be followed by + only of T’  
LL(1) Parsing Table
TERMINAL id + * ( ) $ E’  +TE’ | 
T  FT’
E TE’ TE’ T’  *FT’ | 
E’ +TE’   F  (E) | id
T FT’ FT’
T’  *FT’  
F id (E)
• Consider the [F, *] entry

• “There is no way to derive a string starting with * from nonterminal F “
• Blank entries indicate error situations
Using Parsing Tables
Model of a table-driven predictive parser

Table-driven predictive parsing algorithm
• INPUT: a string w and a parsing table M for grammar G
• OUTPUT:
• If w is in L(G): a leftmost derivation of w,
• Otherwise, an error indication
• METHOD
• the parser is in a cofiguration with w$ in the input buffer
• and the start symbol S of G on top of the stack, above $
• Follow program uses the predictive parsing table M to produce a predictive parse for the input
let a be the first symbol of w;

let X be the top stack symbol;
while (X  $) /* stack is not empty */
{ if (X = a) pop the stack and let a be the next symbol of w;
else if (X is a terminal) error();
else if (M [X, a] is an error entry) error();
else if (M [X, a] = Y1Y2…Yk) { output the production XY1Y2…Yk;
pop the stack;
push Yk,Yk-1,…,Y1 onto the stack, with Y1 on top;
}
let X be the top stack symbol;
}
MATCHED STACK INPUT ACTION
Example E$ id+id*id$
TE’$ id+id*id$ Output E  TE’
• Given left-factored grammar FT’E’$ id+id*id$ Output T  FT’
E  TE’ idT’E’$ id+id*id$ Output F  id
E’  +TE’ | 
T’E’$
T  FT’ id +id*id$ Match id
T’  *FT’ |  id E’$ +id*id$ Output T’  

F  (E) | id id +TE’$ +id*id$ Output E’  +TE’
id+ TE’$ id*id$ Match +
• Input: id+id*id
id+ FT’E’$ id*id$ Output T  FT’
NON- INPUT SYMBOL
TERMINAL id+ idT’E’$ id*id$ Output F  id
id + * ( ) $ id+id T’E’$ *id$ Match id
E TE’ TE’ id+id *FT’E’$ *id$ Output T’  *FT’
E’ +TE’   id+id* FT’E’$ id$ Match *
T FT’ FT’ id+id* idT’E’$ id$ Output F  id

id+id*id T’E’$ $ Match id
T’  *FT’  
id+id*id E’$ $ Output T’  
F id (E) id+id*id $ $ Output E’  
Example
E  TE’ E  TE’  FT’E’  idT’E’  idE’
E’  +TE’ |   id+TE’  id+FT’E’  id+idT’E’
T  FT’  id+id*FT’E’  id+id*idT’E’
E
T’  *FT’ |   id+id*idE’  id+id*id
F  (E) | id E’
T
• And the string: id+id*id
• Give a parse tree for the string T’ + T E’
F
id  F T’ id
id * F T’
id 
Example MATCHED STACK
E$
INPUT
(int)$
ACTION
TF$ (int)$ Output E  TF

• Consider the grammar (E)F$ (int)$ Output T  (E)
E  TF ( E)F$ int)$ Match (
F  +E |  ( TF)F$ int)$ Output E  TF
T  intY | (E) ( intYF)F$ int)$ Output T  intY
Y  *T |  (int YF)F$ )$ Match int

(int F)F$ )$ Output Y  
• Input: (int) (int )F$ )$ Output F  
(int) $ $ Output F  
NON- INPUT SYMBOL

TERMINAL
+ int ( ) * $
E TF TF
F +E  
T intY (E)
Y   *T 
Example
E  TF
F  +E |  E  TF  (E)F  (TF)F  (intYF)F
E
T  intY | (E)  (intF)F  (int)F  (int)
Y  *T |  T F
• Input: (int) 
( )
• Give a parse tree for the string E
T F
Y 
int

MATCHED STACK INPUT ACTION
Example E$ int+int$
TF$ int+int$ Output E  TF
• Consider the grammar intYF$ int+int$ Output T  intY
E  TF int YF$ +int$ Match int
F  +E |  int F$ +int$ Output Y  
T  intY | (E) int +E$ +int$ Output F  +E
Y  *T |  int+ E$ int$ Match +

int+ TF$ int$ Output E  TF
• Input: int+int int+ intYF$ int$ Output T  intY
int+int YF$ $ Match int
NON- INPUT SYMBOL int+int F$ $ Output Y  
TERMINAL
+ int ( ) * $ int+int $ $ Output F  
E TF TF
F +E  
T intY (E)
Y   *T 
Example
E  TF E  TF  intYF  intF
F  +E |   int+E  int+TF E
T  intY | (E)  int+intYF  int+intF
Y  *T |   int+int T F
• Input: int+int
int Y + E
• Give a parse tree for the string

T F
int 
Y

LL(1) Parsing Table
• Consider the grammar: S  Sa | b • For each terminarl t  FIRST() do M[A, t] = 
• The parsing table • If FIRST() and $FOLLOW(A) do M[A, $] = 
a b $
S b
Sa
This grammar is not LL(1)

LL(1) Parsing Table
• If any entry is multiply defined then G is not LL(1)
• Any grammar is not left factored, will not be LL(1)
• Any grammar is left recursive, will not be LL(1)
• Any grammar is ambiguous, will not be LL(1)
• Grammar required more than one token of lookahead, will not be LL(1)
• Other grammars are not LL(1) too
 Mechanical way to check that the grammar is LL(1)

• Build the LL(1) parsing table
• And see if all the entries in the table is unique
Exercise 1
A → AaE | E
E → E*T | b
T → aAb | c
and the input string b*abb
 LL(1) grammar
 Show the parsing table
 Devise predictive parsers (for the input string)
 Give a parse tree (for the input string)
Exercise 2
B → FA
F → a-E | a-b | a-c
E → -Aa
A→b|c| 
and the input string a--cab
 LL(1) grammar
Exercise 3
S → SE | Sb | a
E → +T | +b
T → (S) | c
and the input string a+(ab)
 LL(1) grammar
Exercise 1
A → AaE | E
E → E*T | b
T → aAb | c
 LL(1) grammar
A  EA’
A’  aEA’ | 
E  bE’
E’  *TE’ | 
T  aAb | c
Exercise 1
• LL(1) grammar Compute FIRST(X):
A  EA’
If X is a terminal: FIRST(X) = {X}
A’  aEA’ |    FIRST(X):
If X  
E  bE’ If X  Y1Y2…Yk and   FIRST(Yi) for all 1≤ i ≤ k
b  FIRST(X):
E’  *TE’ |  If X  Y1Y2…Yk and b  FIRST(Yi) and   FIRST(Yj) for all 1≤ j ≤ i-1
T  aAb | c
Compute FOLLOW(X):
FIRST (a) = {a} $  FOLLOW(S), where S is the start symbol, $ is the end symbol of the input string
 FIRST, FOLLOW FIRST (b) = {b} For each production   : FIRST()\{}  FOLLOW()
FIRST (c) ={c} For each production   , where FIRST(): FOLLOW()  FOLLOW()
FIRST (*) ={*} For each production   : FOLLOW()  FOLLOW()
FOLLOW(A)={$, b} FIRST(A)={b}
FOLLOW(A’)={$, b} FIRST(A’)={a, ε}
FOLLOW(E)={a, $, b} FIRST(E)={b}
FOLLOW(E’)={a, $, b} FIRST(E’)={*, ε}
FOLLOW(T)={*, a, $, b} FIRST(T)={a, c}
Exercise 1
• LL(1) grammar
A  EA’
FIRST (a) = {a}
A’  aEA’ |  FIRST (b) = {b} • For each production   in G do:
• For each terminarl t  FIRST() do M[A, t] = 
E  bE’ FIRST (c) ={c}
E’  *TE’ |  FIRST (*) ={*}
• If FIRST() and $FOLLOW(A) do M[A, $] = 
T  aAb | c FIRST(A)={b}
FIRST(A’)={a, ε}
 Show the parsing table FIRST(E)={b} NON- INPUT SYMBOL
FIRST(E’)={*, ε}
TERMINAL a b c * $
FIRST(T)={a, c}
A EA’
FIRST(EA’)=FIRST(E)={b}
FOLLOW(A)={$, b} FIRST(aEA’)=FIRST(a) ={a} A’ aEA’ ε ε
FOLLOW(A’)={$, b} FIRST(bE’)=FIRST(b) ={b} E bE’
FOLLOW(E)={a, $, b} FIRST(*TE’)= FIRST(*) ={*}
FIRST(aAb)=FIRST(a)={a}
E’ ε ε *TE’ ε
FOLLOW(E’)={a, $, b}
FOLLOW(T)={*, a, $, b} FIRST(ε)= {ε} T aAb c
Exercise 1 MATCHED STACK
A$
INPUT
b*abb$
ACTION
• LL(1) grammar EA’$ b*abb$ Output A  EA’
A  EA’ bE’A’$ b*abb$ Output E  bE’

A’  aEA’ |  b E’A’$ *abb$ Match b
E  bE’ b *TE’A’$ *abb$ Output E’  *TE’
E’  *TE’ |  b* TE’A’$ abb$ Match *
T  aAb | c b* aAbE’A’$ abb$ Output T  aAb
b*a AbE’A’$ bb$ Match a
 Devise predictive parsers for the input string b*abb b*a EA’bE’A’$ bb$ Output A  EA’
NON- INPUT SYMBOL b*a bE’A’bE’A’$ bb$ Output E  bE’
TERMINAL a b c * $ b*ab E’A’bE’A’$ b$ Match b
b*ab A’bE’A’$ b$ Output E’  
A EA’
b*ab bE’A’$ b$ Output A’  
A’ aEA’ ε ε
b*abb E’A’$ $ Match b
E bE’ b*abb A’$ $ Output E’  
E’ ε ε *TE’ ε b*abb $ $ Output A’  
T aAb c
Exercise 1
• LL(1) grammar A
A  EA’
E
A’  aEA’ |  A’
E  bE’ b E’
E’  *TE’ |  
T  aAb | c * T E’
 Give a parse tree for the input string b*abb a
A b 
A  EA’  bE’A’  b*TE’A’  b*aAbE’A’
E
 b*aEA’bE’A’  b*abE’A’bE’A’ A’
 b*abA’bE’A’  b*abbE’A’  b*abbA’ b E’
a 
 b*abb

Exercise 2
B → FA Consider -productions:  1  2  …| n
F → a-E | a-b | a-c  we don’t know whether to expand  to 1 or 2
E → -Aa - expanding  to ’
A→b|c| - then exapanding ’ to 1 or 2
 LL(1) grammar   ’
B → FA ’  1 | 2 | …| n
F  a-F’
F’  E | b | c
E → -Aa
A→b|c|
Exercise 2
• LL(1) grammar Compute FIRST(X):
B → FA If X is a terminal: FIRST(X) = {X}
F  a-F’   FIRST(X):
If X  
F’  E | b | c If X  Y1Y2…Yk and   FIRST(Yi) for all 1≤ i ≤ k
E → -Aa b  FIRST(X):
If X  Y1Y2…Yk and b  FIRST(Yi) and   FIRST(Yj) for all 1≤ j ≤ i-1
A→b|c|
Compute FOLLOW(X):
FIRST (a) = {a} $  FOLLOW(S), where S is the start symbol, $ is the end symbol of the input string
 FIRST, FOLLOW FIRST (b) = {b} For each production   : FIRST()\{}  FOLLOW()
FIRST (c) ={c} For each production   , where FIRST(): FOLLOW()  FOLLOW()
FIRST (-) ={-} For each production   : FOLLOW()  FOLLOW()
FOLLOW(B)={$} FIRST(B)={a }
FOLLOW(A)={a, $} FIRST(A)={b, c, ε}
FOLLOW(F)={b, c, $} FIRST(F)={a}
FOLLOW(F’)={b, c, $} FIRST(F’)={-, b, c}
FOLLOW(E)={b, c, $} FIRST(E)={-}
Exercise 2
• LL(1) grammar
B → FA • For each production   in G do:
F  a-F’ • For each terminarl t  FIRST() do M[A, t] = 
F’  E | b | c • If FIRST(), for each tFOLLOW(A) do M[A, t] = 
FIRST (a) = {a} • If FIRST() and $FOLLOW(A) do M[A, $] = 
E → -Aa FIRST (b) = {b}
FIRST (c) ={c}
A→b|c| FIRST (-) ={-}
FIRST(B)={a } NON- INPUT SYMBOL

 Show the parsing table FIRST(A)={b, c, ε} TERMINAL a b c - $
FIRST(F)={a}
FIRST(F’)={-, b, c} B FA
FIRST(E)={-}
A ε b c ε
FOLLOW(B)={$}
FOLLOW(A)={a, $} FIRST(FA)=FIRST(F)={a} F a-F’
FOLLOW(F)={b, c, $} FIRST(a-F’)=FIRST(a) ={a}
F’ b c E
FOLLOW(F’)={b, c, $} FIRST(-Aa)=FIRST(-) ={-}
FOLLOW(E)={b, c, $} FIRST(ε)= {ε} E -Aa
B$
INPUT
a--cab$
ACTION
• LL(1) grammar FA$ a--cab$ Output B  FA
B → FA a-F’A$ a--cab$ Output F  a-F’

F  a-F’ a -F’A$ --cab$ Match a
F’  E | b | c a- F’A$ -cab$ Match -
E → -Aa a- EA$ -cab$ Output F’  E
A→b|c| a- -AaA$ -cab$ Output E  -Aa
 Devise predictive parsers for the input string a--cab a-- AaA$ cab$ Match -
a-- caA$ cab$ Output A  c
NON- INPUT SYMBOL a--c aA$ ab$ Match c
TERMINAL a b c - $ a--ca A$ b$ Match a
B FA a--ca b$ b$ Output A  b
a--cab $ $ Match b
A ε b c ε
F a-F’
F’ b c E
E -Aa
Exercise 2
• LL(1) grammar B
B → FA
F
F  a-F’ A
F’  E | b | c a F’
E → -Aa - b
A→b|c| E
 Give a parse tree for the input string a--cab - a
A
B  FA  a-F’A  a-F’A  a-EA

 a--AaA  a--caA  a--cab c
Exercise 3
S → SE | Sb | a
E → +T | +b
T → (S) | c
 LL(1) grammar
S  aS’ Consider -productions:  1  2  …| n
S’  ES’ | bS’ |   we don’t know whether to expand  to 1 or 2
E  +E’  we may defer the decision by
E’  T | b - expanding  to ’
T  (S) | c  the original productions become
  ’
’  1 | 2 | …| n
Exercise 3
Compute FIRST(X):
• LL(1) grammar If X is a terminal: FIRST(X) = {X}
S  aS’   FIRST(X):
If X  
S’  ES’ | bS’ |  If X  Y1Y2…Yk and   FIRST(Yi) for all 1≤ i ≤ k
E  +E’ b  FIRST(X):
If X  Y1Y2…Yk and b  FIRST(Yi) and   FIRST(Yj) for all 1≤ j ≤ i-1
E’  T | b
T  (S) | c FIRST (a) = {a}
FIRST (b) = {b}
Compute FOLLOW(X):
FIRST (c) = {c} $  FOLLOW(S), where S is the start symbol, $ is the end symbol of the input string
 FIRST, FOLLOW FIRST (+) ={+} For each production   : FIRST()\{}  FOLLOW()
FIRST (() ={(} For each production   , where FIRST(): FOLLOW()  FOLLOW()
FIRST ()) ={)} For each production   : FOLLOW()  FOLLOW()
FOLLOW(S)={$, )} FIRST(S)={a}
FOLLOW(S’)={$, )} FIRST(S’)={+, b, ε}
FOLLOW(E)={+, b, $, )} FIRST(E)={+}
FOLLOW(E’)={+, b, $, )} FIRST(E’)={(, c, b}
FOLLOW(T)={+, b, $, )} FIRST(T)={(, c}
Exercise 3
• LL(1) grammar FIRST (a) = {a}
S  aS’ FIRST (b) = {b} • For each production   in G do:
S’  ES’ | bS’ |  FIRST (c) = {c} • For each terminarl t  FIRST() do M[A, t] = 
E  +E’ FIRST (+) ={+} • If FIRST(), for each tFOLLOW(A) do M[A, t] = 
FIRST (() ={(} • If FIRST() and $FOLLOW(A) do M[A, $] = 
E’  T | b FIRST ()) ={)}
T  (S) | c
FIRST(S)={a}
 Show the parsing table FIRST(S’)={+, b, ε}
FIRST(E)={+} NON- INPUT SYMBOL
FIRST(E’)={(, c, b} TERMINAL a b c + ( ) $
FIRST(T)={(, c}
S aS’
FIRST(aS’)=FIRST(a)={a}
FOLLOW(S)={$, )} FIRST(ES’)=FIRST(E) ={+} S’ bS’ ES’ ε ε
FOLLOW(S’)={$, )} FIRST(bS’)=FIRST(b) ={b} E +E’
FOLLOW(E)={+, b, $, )} FIRST(+E’)= FIRST(+) ={+}
FOLLOW(E’)={+, b, $, )}
E’ b T T
FIRST((S))=FIRST(()={(}
FOLLOW(T)={+, b, $, )} FIRST(ε)= {ε} T c (S)
S$
INPUT
a+(ab)$
ACTION
• LL(1) grammar aS’$ a+(ab)$ Output S  aS’
S  aS’ a S’$ +(ab)$ Match a

Output S’  ES’
S’  ES’ | bS’ |  a ES’$ +(ab)$
+E’S’$ Output E  +E’
E  +E’ a +(ab)$
E’S’$
E’  T | b
a+ (ab)$ Match +
a+ TS’$ (ab)$ Output E’  T
T  (S) | c
a+ (S)S’$ (ab)$ Output T  (S)
 Devise predictive parsers for the input string a+(ab)
a+( S)S’$ ab)$ Match (
NON- INPUT SYMBOL a+( aS’)S’$ ab)$ Output S  aS’
TERMINAL a b c + ( ) $ a+(a S’)S’$ b)$ Match a
a+(a bS’)S’$ b)$ Output S’  bS’
S aS’
a+(ab S’)S’$ )$ Match b
S’ bS’ ES’ ε ε
a+(ab )S’$ )$ Output S’  
E +E’ a+(ab) S’$ $ Match )
E’ b T T a+(ab) $ $ Output S’  
T c (S)
Exercise 3
• LL(1) grammar S
S  aS’  aES’  a+E’S’
S  aS’ S’
 a+TS’  a+(S)S’ a
S’  ES’ | bS’ | 
 a+(aS’)S’  a+(abS’)S’
E  +E’ E S’
 a+(ab)S’  a+(ab)
E’  T | b + E’
T  (S) | c 
 Give a parse tree for the input string a+(ab) T
( S )
a S’
b S’


Chương 3. Phân Tích Cú Pháp

Uploaded by

Copyright:

Available Formats

Chương 3. Phân Tích Cú Pháp

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chương 3. Phân Tích Cú Pháp

Uploaded by

Copyright:

Available Formats

Chương 3

Phân tích cú pháp

string of string of parse tree

LEXER <=, ‘=‘>

2. These symbols are nonterminals:

 set of terminals Σ = {(, )}

 set of terminals Σ = {+, -, *, /,(, ), id}

• The string -(id+id) is a sentence of grammar because there is a derivation

• In leftmost derivations, the leftmost nonterminal in each sentential is always

b) The set of all strings of 0s and 1s with an equal number of 0s and 1s

string of string of parse tree

• And the string: 7 + (4 + 5)

int7 ( E ) int4 int5 7 4 5

In parse trees In syntax trees

Top Down Parsing Bottom Up Parsing

Recursive Descent Operator

Non Back-tracking Back-tracking

• Recursive descent parsing does not work in such cases.

•  generates all strings starting with a  and followed by a number of .

• For example, in expr  expr+term | term

•  generates all strings starting with one of 1,…,m and followed by

A grammar is left-recursive if and only if

• The nonterminal S is left-recursive because S  Aa  Sda or S + Sda,

S  ^(T)  ^(ST')  ^(^(T)T’)  ……  ^

2. Eliminate the left-recursion from the grammar:

 on seeing the input if, we cannot immediately tell which production to

FIRST (T) ={int, (}

FIRST(F)={(, id} FOLLOW(E)={$, )}

FIRST (T) ={int, (} FOLLOW(E) = {$, )}

FIRST(stmt) ={other, if} FOLLOW(stmt) = {$, else}

• For each production   in G do:

• Consider the [E, id] entry

• Consider the [T’, +] entry

• Consider the [F, *] entry

Model of a table-driven predictive parser

let a be the first symbol of w;

T’  *FT’ |  id E’$ +id*id$ Output T’  

T FT’ FT’ id+id* idT’E’$ id$ Output F  id

TF$ (int)$ Output E  TF

T  intY | (E) ( intYF)F$ int)$ Output T  intY

Y  *T |  (int YF)F$ )$ Match int

NON- INPUT SYMBOL

F  +E |  int F$ +int$ Output Y  

T  intY | (E) int +E$ +int$ Output F  +E

Y  *T |  int+ E$ int$ Match +

This grammar is not LL(1)

 Mechanical way to check that the grammar is LL(1)

• LL(1) grammar EA’$ b*abb$ Output A  EA’

A  EA’ bE’A’$ b*abb$ Output E  bE’

FIRST(B)={a } NON- INPUT SYMBOL

• LL(1) grammar FA$ a--cab$ Output B  FA

B → FA a-F’A$ a--cab$ Output F  a-F’

A→b|c| a- -AaA$ -cab$ Output E  -Aa

B  FA  a-F’A  a-F’A  a-EA

• LL(1) grammar aS’$ a+(ab)$ Output S  aS’

S  aS’ a S’$ +(ab)$ Match a

You might also like

T’  FT’ |  id E’$ +idid$ Output T’  