Chapter4 ND - Edu Dthain
Chapter4 ND - Edu Dthain
Chapter4 ND - Edu Dthain
Chapter 4 – Parsing
Grammar G2
1. P→E
2. E→E+E
3. E → ident
4. E → int
In top-down derivation, we begin with the start symbol, and then ap-
ply rules in the CFG to expand non-terminals until reaching the desired
sentence. For example, ident + int + int is a sentence in this language, and
here is one derivation to prove it:
P P
E E
E + E E+E
Grammar G3
1. P→E
2. E→E+T
3. E→T
4. T → ident
5. T → int
Grammar G4
1. P→E
2. E→E+T
3. E→T
4. T→T*F
5. T→F
4. F → ident
5. F → int
Grammar G5
1. P→S
2. S → if E then S
3. S → if E then S else S
4. S → other
Exercise: Write out the two possible parse trees for the sentence if E
then if E then other else other.
4.3 LL Grammars
LL(1) grammars are a subset of CFGs that are easy to parse with simple
algorithms. A grammar is LL(1) if can be parsed by considering only only
one non-terminal and the next token in the input stream.
To ensure that a grammar is LL(1), we must do the following:
Once we have taken those steps, then we can prove that it is LL(1) by
generating the F IRST and F OLLOW sets for the grammar, and using them
to create the LL(1) parse table. If the parse table contains no conflicts, then
we have proven that the grammar is LL(1).
Substitute with:
A → β1 A′ |β2 A′ |...
A’ → α1 A′ |α2 A′ |...|ǫ
Grammar G6
1. P →E
2. E → T E’
3. E’ → + T E’
4. E’ → ǫ
5. T → ident
6. T → int
A → αA′
A′ → β1 |β2 |...
Grammar G7
1. P→E
2. E → id
3. E → id [ E ]
4. E → id ( E )
Grammar G8
1. P →E
2. E → id E’
3. E’ → [ E ]
4. E’ → ( E )
5. E’ → ǫ
Repeat:
For each rule X → Y1 Y2 ...Yk in a grammar G:
Add a to F IRST(X)
if a is in F IRST(Y1 )
or a is in F IRST(Yn ) and Y1 ...Yn−1 → ǫ
If Y1 ...Yk → ǫ then add ǫ to F IRST(X).
until no more changes occur.
Repeat:
If A → αBβ then:
add F IRST(β) to F OLLOW(B).
If A → αB or F IRST(β) contains ǫ then:
add F OLLOW(A) to F OLLOW(B).
until no more changes occur.
Grammar G9
1. P →E
2. E → T E’
3. E’ → + T E’
4. E’ → ǫ
5. T → F T’
6. T’ → * F T’
7. T’ → ǫ
8. F →(E)
9. F → int
int parse_P() {
return parse_E() && expect_token(TOKEN_EOF);
}
int parse_E() {
return parse_T() && parse_E_prime();
}
int parse_E_prime() {
token_t t = scan_token();
if(t==TOKEN_PLUS) {
return parse_T() && parse_E_prime();
} else {
putback_token(t);
return 1;
}
}
int parse_T() {
return parse_F() && parse_T_prime();
}
int parse_T_prime() {
token_t t = scan_token();
if(t==TOKEN_MULTIPLY) {
return parse_F() && parse_T_prime();
} else {
putback_token(t);
}
}
int parse_F() {
token_t = scan_token();
if(t==TOKEN_LPAREN) {
return parse_E() && expect_token(TOKEN_RPAREN);
} else if(t==TOKEN_INT) {
return 1;
} else {
printf("parse error: unexpected token %s\n",
token_string(t));
return 0;
}
}
For example, here is the parse table for Grammar G9 . Notice that the
entries for P , E, T , and F are straightforward: each can only start with
int or (, and so these tokens cause the rules to descend toward F and a
choice between rule 8 (F → int) and rule 9 (F → (E)). The entry for E ′ is
a little more complicated: a + token results in applying E → +T E ′ , while
) or $ indicates E → ǫ.
Now we have all the pieces necessary to operate the parser. Informally,
the idea is to keep a stack which tracks the current state of the parser. In
each step, we consider the top element of the stack and the next token on
the input. If they match, then pop the stack, accept the token and continue.
If not, then consult the parse table for the next rule to apply. If we can
continue until the end-of-file symbol is matched, then the parse succeeds.
Create a stack S.
Push $ and P onto S.
Let c be the first token on the input.