Ch2 Modified
Ch2 Modified
Ch2 Modified
A SIMPLE SYNTAX-
DIRECTED TRANSLATOR
Chapter 2
2
Syntax Definition
• Context-free grammar is a 4-tuple with
– A set of tokens (terminal symbols)
– A set of nonterminals
– A set of productions
– A designated start symbol
4
Example Grammar
with productions P =
list digit
digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5
Derivation
• Given a CF grammar we can determine the
set of all strings (sequences of tokens)
generated by the grammar using derivation
– We begin with the start symbol
– In each step, we replace one nonterminal in the
current sentential form with one of the right-
hand sides of a production for that nonterminal
6
list
list + digit
list - digit + digit
digit - digit + digit
9 - digit + digit
9 - 5 + digit
9-5+2
Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal
(=token) or
• Each interior node is labeled by a nonterminal
• If A X1 X2 … Xn is a production, then node A has
immediate children X1, X2, …, Xn where Xi is a
(non)terminal or ( denotes the empty string)
9
list
list digit
list digit
digit
The sequence of
9 - 5 + 2 leafs is called the
yield of the parse tree
The Two Derivations for x – 2 * y
Rule Sentential Form Rule Sentential Form
— Expr — Expr
1 Expr Op Expr 1 Expr Op Expr
3 <id,x> Op Expr 3 Expr Op <id,y>
5 <id,x> – Expr 6 Expr * <id,y>
1 <id,x> – Expr Op Expr 1 Expr Op Expr * <id,y>
2 <id,x> – <num,2> Op Expr 2 Expr Op <num,2> * <id,y>
6 <id,x> – <num,2> * Expr 5 Expr – <num,2> * <id,y>
3 <id,x> – <num,2> * <id,y> 3 <id,x> – <num,2> * <id,y>
This evaluates as x – ( 2 * y ) 2 y
*
Derivations and Parse Trees
G
Rightmost derivation
Rule Sentential Form
— Expr E
1 Expr Op Expr
3 Expr Op <id,y>
6 Expr * <id,y> E Op E
1 Expr Op Expr * <id,y>
2 Expr Op <num,2> * <id,y>
5 Expr – <num,2> * <id,y> E Op E * y
3 <id,x> – <num,2> * <id,y>
x – 2
This evaluates as ( x – 2 ) * y
13
Ambiguity
with production P =
Ambiguity (cont’d)
string string
9 - 5 + 2 9 - 5 + 2
15
Associativity of Operators
Left-associative operators have left-recursive productions
left left + term | term
String a+b+c has the same meaning as (a+b)+c
Right-associative operators have right-recursive productions
right term = right | term
String a=b=c has the same meaning as a=(b=c)
Operators on the same line have the same associativity and
precedence:
left-associative: + -
left-associative: */
16
Precedence of Operators
Operators with higher precedence “bind more tightly”
expr expr + term | expr – term | term
term term * factor | term / factor | factor
factor number | ( expr )
number 0|1|2|…….|9
String 2+3*5 has the same meaning as 2+(3*5)
expr
expr term
term term factor
factor factor number
number number
2 + 3 * 5
17
Syntax of Statements
stmt id := expr
| if expr then stmt
| if expr then stmt else stmt
| while expr do stmt
| begin opt_stmts end
opt_stmts stmt ; opt_stmts
|
18
Syntax-Directed Translation
• Uses a CF grammar to specify the syntactic
structure of the language
• AND associates a set of attributes with the
terminals and nonterminals of the grammar
• AND associates with each production a set of
semantic rules to compute values of attributes
• A parse tree is traversed and semantic rules
applied: after the tree traversal(s) are completed,
the attribute values on the nonterminals contain
the translated form of the input
19
expr.t = “95-2+”
term.t = “9”
9 - 5 + 2
22
Depth-First Traversals
procedure visit(n : node);
begin
for each child m of n, from left to right do
visit(m);
evaluate semantic rules at node n
end
23
expr.t = “95-2+”
term.t = “9”
Translation Schemes
• A translation scheme is a CF grammar embedded
with semantic actions
• When drawing a parse tree for a translation scheme,
we indicate an action by constructing an extra child
for it, connected by a dashed line to the node that
corresponds to the head of the production.
expr
{ print(“+”) }
expr + term
{ print(“2”) }
{ print(“-”) }
- term 2
expr
{ print(“5”) }
term 5
{ print(“9”) }
9
Translates 9-5+2 into postfix 95-2+
27
Parsing
• Parsing = process of determining if a string of
tokens can be generated by a grammar
• For any CF grammar there is a parser that takes at
most O(n3) time to parse a string of n tokens
• Top-down parsing “constructs” a parse tree from
root to leaves
• Bottom-up parsing “constructs” a parse tree from
leaves to root
28
Predictive Parsing
• Recursive descent parsing is a top-down parsing
method
– Each nonterminal has one (recursive) procedure that is
responsible for parsing the nonterminal’s syntactic
category of input tokens
– When a nonterminal has multiple productions, each
production is implemented in a branch of a selection
statement based on input look-ahead information
• Predictive parsing is a special form of recursive
descent parsing where we use one lookahead
token to unambiguously determine the parse
operations
29
type simple
| ^ id
| array [ simple ] of type
simple integer
| char
| num dotdot num
30
Check lookahead
type()
and call match
match(‘array’)
lookahead
31
match(‘array’) match(‘[’)
lookahead
32
match(‘num’)
lookahead
33
match(‘num’) match(‘dotdot’)
lookahead
34
lookahead
35
lookahead
36
lookahead
37
match(‘integer’)
Input: array [ num dotdot num ] of integer
lookahead
38
Lexical analyzer
y := 31 + 28*x
lexan()
<id, “y”> <assign, > <num, 31> <‘+’, > <num, 28> <‘*’, > <id, “x”>
token
(lookahead)
tokenval Parser
(token attribute) parse()