Principles of Programming Languages: Syntax Analysis
Principles of Programming Languages: Syntax Analysis
Principles of Programming Languages: Syntax Analysis
Syntax Analysis
Outline
Grammar
– Context-free grammar
– Derivation and Derivation Tree
Grammar for Arithmetic Expression
– Operation precedence and associativity
Syntax Analysis
Ambiguity in Grammar
Parser Construction
The Big Picture Again
source
code
Scanner Parser Opt1 Opt2 ... Optn
machine
Instruction Register Instruction code
Selection Allocation Scheduling
COMPILER
Syntax and Grammar
A kind of grammar
Not as complex as context-sensitive and
phase-structure grammar
More powerful than regular grammar
Formal Definition of CFG
• G = (VN ,VT,S, P)
• VN: finite set of nonterminal symbols
VT: finite set of tokens (VTVN=)
SVN: start symbol
P: finite set of rules (or productions) of
BNF (Backus – Naur Form) form A
(a)* where A VN, a(VTVN)
Example 1
G = ({exp,op,exp},{+,-,*,/,id})
exp exp op exp
exp id
op +|-|*|/
Derivation
Expr Expr
Op
Number Identifier
*
34 ax
Example 4
exp
exp
Example 4
exp
exp op exp
Example 4
exp
exp op exp
id
Example 4
exp
exp op exp
id +
Example 4
exp op exp
id + id
Classic Expression Grammar
exp
exp op exp
id + exp op exp
id * id
Operation Precedence
exp
exp op exp
exp op exp * id
id + id
Operation Precedence
exp
exp + term
factor factor id
id id
Operation Precedence
term
term * factor
factor id
( exp )
Operator Associativity
exp
exp op exp
exp op exp + id
id + id
Operator Associativity
exp
exp op exp
id + exp op exp
id + id
Operator Associativity
exp
exp + term
factor
exp + term
id
term factor
factor id
id
Precedence and Associativity
Example:
– “for while i == == == 12 + for ( abcd)”
– Lexer will produce a stream of tokens:
<TOKEN_FOR> <TOKEN_WHILE> <TOKEN_IDENT, “i”>
<TOKEN_COMPARE> <TOKEN_COMPARE> <TOKEN_COMPARE>
<TOKEN_NUMBER,”12”> <TOKEN_OP, “+”> <TOKEN_FOR>
<TOKEN_OPAREN> <TOKEN_ID, “abcd”> <TOKEN_CPAREN>
– But clearly we do not have a valid program
– This program is lexically correct, but syntactically
incorrect
A Grammar for Expressions
What we just saw is the process of, starting with the start
symbol and, through a sequence of rule derivation obtain a
string of terminal symbols
– We could generate all correct programs (infinite set though)
Expr Expr
3 * 5 5 + 8
The problem is that the syntax impacts meaning (for the later
stages of the compiler)
For our example string, we’d like to see the left tree because
we most likely want * to have a higher precedence than +
We don’t like ambiguity because it makes the parsers difficult to
design because we don’t know which parse tree will be
discovered when there are multiple possibilities
So we often want to disambiguate grammars
Problems with Ambiguity
Factor Number
Number 3 9
Number 8
5
4
Non-Ambiguous Grammar
Expr Term | Expr + Term | Expr - Term
Term Term * Factor
| Term / Factor Expr
| Factor
Factor Number | Identifier Expr - Term
Factor Number
Number 3 9
Number 8
5
4
In-class Exercise
S
Consider the CFG:
S (L) |
( L )
a
L L,S|S L , S
S a
Draw parse trees for:
(a, a) a
(a, ((a, a), (a, a)))
S
( L )
In-class Exercise
L , S
( L ) L , S
Draw parse trees for: S a
L , S
(a, a)
S a a
(a, ((a, a), (a, a)))
a
In-class Exercise
P () | PP | (P)
In-class Exercise
not A A and A
A and A not A 1
0 1 0
Another Example Grammar