Ch2 Modified

Download as pdf or txt
Download as pdf or txt
You are on page 1of 39

1

A SIMPLE SYNTAX-
DIRECTED TRANSLATOR

Chapter 2
2

Building a Simple Compiler


• Building our compiler involves:
– Defining the syntax of a programming language
– Develop a source code parser: for our compiler
we will use predictive parsing
– Implementing syntax directed translation to
generate intermediate code
3

Syntax Definition
• Context-free grammar is a 4-tuple with
– A set of tokens (terminal symbols)
– A set of nonterminals
– A set of productions
– A designated start symbol
4

Example Grammar

Context-free grammar for simple expressions:

G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list>

with productions P =

list  list + digit

list  list - digit

list  digit

digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5

Derivation
• Given a CF grammar we can determine the
set of all strings (sequences of tokens)
generated by the grammar using derivation
– We begin with the start symbol
– In each step, we replace one nonterminal in the
current sentential form with one of the right-
hand sides of a production for that nonterminal
6

Derivation for the Example


Grammar

list
 list + digit
 list - digit + digit
 digit - digit + digit
 9 - digit + digit
 9 - 5 + digit
9-5+2

This is an example leftmost derivation, because we replaced


the leftmost nonterminal (underlined) in each step.
7

Derivation for the Example


Rightmost Grammar
Likewise, a rightmost derivation replaces the rightmost
nonterminal in each step
list
P=  digit - list
list  digit + list  digit - digit + list
 digit - digit + digit
list  digit - list
 digit - digit + 2
list  digit  digit - 5 + 2
digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 9-5+2
8

Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal
(=token) or 
• Each interior node is labeled by a nonterminal
• If A  X1 X2 … Xn is a production, then node A has
immediate children X1, X2, …, Xn where Xi is a
(non)terminal or  ( denotes the empty string)
9

Parse Tree for the Example


Grammar
Parse tree of the string 9-5+2 using grammar G

list

list digit

list digit

digit
The sequence of
9 - 5 + 2 leafs is called the
yield of the parse tree
The Two Derivations for x – 2 * y
Rule Sentential Form Rule Sentential Form
— Expr — Expr
1 Expr Op Expr 1 Expr Op Expr
3 <id,x> Op Expr 3 Expr Op <id,y>
5 <id,x> – Expr 6 Expr * <id,y>
1 <id,x> – Expr Op Expr 1 Expr Op Expr * <id,y>
2 <id,x> – <num,2> Op Expr 2 Expr Op <num,2> * <id,y>
6 <id,x> – <num,2> * Expr 5 Expr – <num,2> * <id,y>
3 <id,x> – <num,2> * <id,y> 3 <id,x> – <num,2> * <id,y>

Leftmost derivation Rightmost derivation

In both cases, Expr * id – num * id


• The two derivations produce different parse trees
• The parse trees imply different evaluation orders!
Derivations and Parse Trees
G
Leftmost derivation
Rule Sentential Form
— Expr E
1 Expr Op Expr
3 <id,x> Op Expr
5 <id,x> – Expr E Op E
1 <id,x> – Expr Op Expr
2 <id,x> – <num,2> Op Expr
6 <id,x> – <num,2> * Expr x – Op E
E
3 <id,x> – <num,2> * <id,y>

This evaluates as x – ( 2 * y ) 2 y
*
Derivations and Parse Trees
G
Rightmost derivation
Rule Sentential Form
— Expr E
1 Expr Op Expr
3 Expr Op <id,y>
6 Expr * <id,y> E Op E
1 Expr Op Expr * <id,y>
2 Expr Op <num,2> * <id,y>
5 Expr – <num,2> * <id,y> E Op E * y
3 <id,x> – <num,2> * <id,y>

x – 2
This evaluates as ( x – 2 ) * y
13

Ambiguity

Consider the following context-free grammar:

G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>

with production P =

string  string + string | string - string | 0 | 1 | … | 9


This grammar is ambiguous, because more than one parse tree
represents the string 9-5+2
14

Ambiguity (cont’d)

string string

string string string

string string string string string

9 - 5 + 2 9 - 5 + 2
15

Associativity of Operators
Left-associative operators have left-recursive productions
left  left + term | term
String a+b+c has the same meaning as (a+b)+c
Right-associative operators have right-recursive productions
right  term = right | term
String a=b=c has the same meaning as a=(b=c)
Operators on the same line have the same associativity and
precedence:
left-associative: + -

left-associative: */
16

Precedence of Operators
Operators with higher precedence “bind more tightly”
expr  expr + term | expr – term | term
term  term * factor | term / factor | factor
factor  number | ( expr )
number  0|1|2|…….|9
String 2+3*5 has the same meaning as 2+(3*5)
expr
expr term
term term factor
factor factor number
number number
2 + 3 * 5
17

Syntax of Statements

stmt  id := expr
| if expr then stmt
| if expr then stmt else stmt
| while expr do stmt
| begin opt_stmts end
opt_stmts  stmt ; opt_stmts
|
18

Syntax-Directed Translation
• Uses a CF grammar to specify the syntactic
structure of the language
• AND associates a set of attributes with the
terminals and nonterminals of the grammar
• AND associates with each production a set of
semantic rules to compute values of attributes
• A parse tree is traversed and semantic rules
applied: after the tree traversal(s) are completed,
the attribute values on the nonterminals contain
the translated form of the input
19

Synthesized and Inherited


Attributes
• An attribute is said to be …
– synthesized if its value at a parse-tree node is
determined from the attribute values at the children of
the node
– Suppose a node N in a parse tree is labeled by the
grammar symbol X . We write X.a to denote the value
of attribute a of X at that node.
– inherited if its value at a parse-tree node is determined
by the parent (by enforcing the parent’s semantic rules)
20

Example Attribute Grammar


Syntax-directed definition for infix to postfix translation

String concat operator


Production Semantic Rule
expr  expr1 + term expr.t := expr1.t // term.t // “+”
expr  expr1 - term expr.t := expr1.t // term.t // “-”
expr  term expr.t := term.t
term  0 term.t := “0”
term  1 term.t := “1”
… …
term  9 term.t := “9”
21

Example Annotated Parse Tree

expr.t = “95-2+”

expr.t = “95-” term.t = “2”

expr.t = “9” term.t = “5”

term.t = “9”

9 - 5 + 2
22

Depth-First Traversals
procedure visit(n : node);
begin
for each child m of n, from left to right do
visit(m);
evaluate semantic rules at node n
end
23

Depth-First Traversals (Example)

expr.t = “95-2+”

expr.t = “95-” term.t = “2”

expr.t = “9” term.t = “5”

term.t = “9”

9 - 5 + 2 Note: all attributes are


of the synthesized type
24

Translation Schemes
• A translation scheme is a CF grammar embedded
with semantic actions
• When drawing a parse tree for a translation scheme,
we indicate an action by constructing an extra child
for it, connected by a dashed line to the node that
corresponds to the head of the production.

rest  + term { print(“+”) } rest


rest
Embedded
semantic action
+ term { print(“+”) } rest
25

Example Translation Scheme

expr  expr + term { print(“+”) }


expr  expr - term { print(“-”) }
expr  term
term  0 { print(“0”) }
term  1 { print(“1”) }
… …
term  9 { print(“9”) }
26

Example Translation Scheme


(cont’d)

expr
{ print(“+”) }
expr + term
{ print(“2”) }
{ print(“-”) }
- term 2
expr
{ print(“5”) }
term 5
{ print(“9”) }
9
Translates 9-5+2 into postfix 95-2+
27

Parsing
• Parsing = process of determining if a string of
tokens can be generated by a grammar
• For any CF grammar there is a parser that takes at
most O(n3) time to parse a string of n tokens
• Top-down parsing “constructs” a parse tree from
root to leaves
• Bottom-up parsing “constructs” a parse tree from
leaves to root
28

Predictive Parsing
• Recursive descent parsing is a top-down parsing
method
– Each nonterminal has one (recursive) procedure that is
responsible for parsing the nonterminal’s syntactic
category of input tokens
– When a nonterminal has multiple productions, each
production is implemented in a branch of a selection
statement based on input look-ahead information
• Predictive parsing is a special form of recursive
descent parsing where we use one lookahead
token to unambiguously determine the parse
operations
29

Example Predictive Parser


(Grammar)

type  simple
| ^ id
| array [ simple ] of type
simple  integer
| char
| num dotdot num
30

Example Predictive Parser


(Execution Step 1)

Check lookahead
type()
and call match

match(‘array’)

Input: array [ num dotdot num ] of integer

lookahead
31

Example Predictive Parser


(Execution Step 2)
type()

match(‘array’) match(‘[’)

Input: array [ num dotdot num ] of integer

lookahead
32

Example Predictive Parser


(Execution Step 3)
type()

match(‘array’) match(‘[’) simple()

match(‘num’)

Input: array [ num dotdot num ] of integer

lookahead
33

Example Predictive Parser


(Execution Step 4)
type()

match(‘array’) match(‘[’) simple()

match(‘num’) match(‘dotdot’)

Input: array [ num dotdot num ] of integer

lookahead
34

Example Predictive Parser


(Execution Step 5)
type()

match(‘array’) match(‘[’) simple()

match(‘num’) match(‘dotdot’) match(‘num’)

Input: array [ num dotdot num ] of integer

lookahead
35

Example Predictive Parser


(Execution Step 6)
type()

match(‘array’) match(‘[’) simple() match(‘]’)

match(‘num’) match(‘dotdot’) match(‘num’)

Input: array [ num dotdot num ] of integer

lookahead
36

Example Predictive Parser


(Execution Step 7)
type()

match(‘array’) match(‘[’) simple() match(‘]’) match(‘of’)

match(‘num’) match(‘dotdot’) match(‘num’)

Input: array [ num dotdot num ] of integer

lookahead
37

Example Predictive Parser


(Execution Step 8)
type()

match(‘array’) match(‘[’) simple() match(‘]’) match(‘of’) type()

match(‘num’) match(‘dotdot’) match(‘num’) simple()

match(‘integer’)
Input: array [ num dotdot num ] of integer

lookahead
38

Adding a Lexical Analyzer


• Typical tasks of the lexical analyzer:
– Remove white space and comments
– Encode constants as tokens
– Recognize keywords
– Recognize identifiers and store identifier names
in a global symbol table
39

The Lexical Analyzer “lexer”

Lexical analyzer
y := 31 + 28*x
lexan()

<id, “y”> <assign, > <num, 31> <‘+’, > <num, 28> <‘*’, > <id, “x”>

token
(lookahead)
tokenval Parser
(token attribute) parse()

You might also like