CD Unit-Iii

Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

UNIT - III

Syntax-Directed Translation: Syntax-Directed Definitions, Evaluation Orders for SDD's,


Applications of Syntax-Directed Translation, Syntax-Directed Translation Schemes,
Implementing L-Attributed SDD's.
Intermediate-Code Generation: Variants of Syntax Trees, Three-Address Code, Types and
Declarations, Type Checking, Control Flow, Switch-Statements, Intermediate Code for
Procedures.

3.1 SYNTAX-DIRECTED DEFINITIONS


Syntax Directed Definition (SDD) is a kind of abstract specification. It is generalization of
context free grammar in which each grammar production X –> a is associated with it a set of
production rules of the form s = f(b1, b2, ……bk) where s is the attribute obtained from function f.
The attribute can be a string, number, type or a memory location. Semantic rules are fragments
of code which are embedded usually at the end of production and enclosed in curly braces ({ }).

• In syntax directed translation, every non-terminal can get one or more than one attribute
or sometimes 0 attribute depending on the type of the attribute. The value of these
attributes is evaluated by the semantic rules associated with the production rule.
• In the semantic rule, attribute is VAL and an attribute may hold anything like a string, a
number, a memory location and a complex record
• In Syntax directed translation, whenever a construct encounters in the programming
language then it is translated according to the semantic rules define in that particular
programming language.
Example:

Production Semantic Rules

E→E+T E.val := E.val + T.val

E→T E.val := T.val

Page 1 of 20
T→T*F T.val := T.val + F.val

T→F T.val := F.val

F → (F) F.val := F.val

F → num F.val := num.lexval

E.val is one of the attributes of E.

Types of attributes – There are two types of attributes:

1. Synthesized Attributes – These are those attributes which derive their values from their
children nodes i.e. value of synthesized attribute at node is computed from the values of
attributes at children nodes in parse tree.

Example:
E --> E1 + T { E.val = E1.val + T.val}
In this, E.val derive its values from E1.val and T.val

• Write the SDD using appropriate semantic rules for each production in given grammar.
• The annotated parse tree is generated and attribute values are computed in bottom up
manner.
• The value obtained at root node is the final output.
Example: Consider the following grammar
S --> E
E --> E1 + T
E --> T
T --> T1 * F
T --> F
F --> digit
The SDD for the above grammar can be written as follow

Page 2 of 20
Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. The annotated parse
tree for the input string is

For computation of attributes we start from leftmost bottom node. The rule F –> digit is used to
reduce digit to F and the value of digit is obtained from lexical analyzer which becomes value of

Page 3 of 20
F i.e. from semantic action F.val = digit.lexval. Hence, F.val = 4 and since T is parent node of F
so, we get T.val = 4 from semantic action T.val = F.val. Then, for T –> T1 * F production, the
corresponding semantic action is T.val = T1.val * F.val . Hence, T.val = 4 * 5 = 20

Similarly, combination of E1.val + T.val becomes E.val i.e. E.val = E1.val + T.val = 26. Then, the
production S –> E is applied to reduce E.val = 26 and semantic action associated with it prints
the result E.val . Hence, the output will be 26.

2. Inherited Attributes – These are the attributes which derive their values from their parent or
sibling nodes i.e. value of inherited attributes are computed by value of parent or sibling nodes.
Example:
A --> BCD { C.in = A.in, C.type = B.type }

Computation of Inherited Attributes –


• Construct the SDD using semantic actions.
• The annotated parse tree is generated and attribute values are computed in top down
manner.

Example: Consider the following grammar


S --> T L
T --> int
T --> float
T --> double
L --> L1, id
L --> id

The SDD for the above grammar can be written as follow

Page 4 of 20
Let us assume an input string int a, c for computing inherited attributes. The annotated parse tree
for the input string is

The value of L nodes is obtained from T.type (sibling) which is basically lexical value obtained
as int, float or double. Then L node gives type of identifiers a and c. The computation of type is
done in top down manner or preorder traversal. Using function Enter_type the type of identifiers
a and c is inserted in symbol table at corresponding id.entry.

Page 5 of 20
3.2 SYNTAX DIRECTED TRANSLATION SCHEME
• The Syntax directed translation scheme is a context -free grammar.
• The syntax directed translation scheme is used to evaluate the order of semantic rules.
• In translation scheme, the semantic rules are embedded within the right side of the
productions.
• The position at which an action is to be executed is shown by enclosed between braces. It
is written within the right side of the production.
Example

Production Semantic Rules

S→E$ { printE.VAL }

E→E+E {E.VAL := E.VAL + E.VAL }

E→E*E {E.VAL := E.VAL * E.VAL }

E → (E) {E.VAL := E.VAL }

E→I {E.VAL := I.VAL }

I → I digit {I.VAL := 10 * I.VAL + LEXVAL }

I → digit { I.VAL:= LEXVAL}

3.3 SYNTAX-DIRECTED TRANSLATION IMPLEMENTATION


Syntax direct translation is implemented by constructing a parse tree and performing the actions
in a left to right depth first order. SDT is implementing by parse the input and produce a parse
tree as a result.
Example

Production Semantic Rules

Page 6 of 20
S→E$ { printE.VAL }

E→E+E {E.VAL := E.VAL + E.VAL }

E→E*E {E.VAL := E.VAL * E.VAL }

E → (E) {E.VAL := E.VAL }

E→I {E.VAL := I.VAL }

I → I digit {I.VAL := 10 * I.VAL + LEXVAL }

I → digit { I.VAL:= LEXVAL}

Parse tree for SDT:

3.4 INTERMEDIATE-CODE GENERATION


In Intermediate code generation we use syntax directed methods to translate the source program
into an intermediate form programming language constructs such as declarations, assignments
and flow-of-control statements.

Page 7 of 20
Figure: Intermediate Code Generator
• If the compiler directly translates source code into the machine code without generating
intermediate code then a full native compiler is required for each new machine.
• The intermediate code keeps the analysis portion same for all the compilers that's why it
doesn't need a full compiler for every unique machine.
• Intermediate code generator receives input from its predecessor phase and semantic
analyzer phase. It takes input in the form of an annotated syntax tree.
• Using the intermediate code, the second phase of the compiler synthesis phase is changed
according to the target machine.

Advantages of an intermediate language


1. Retargeting is facilitated - Build a compiler for a new machine by attaching a new code
generator to an existing front-end.
2. Optimization - reuse intermediate code optimizers in compilers for different languages
and different machines.

Note: the terms ―intermediate code‖, ―intermediate language‖, and ―intermediate


representation‖ are all used interchangeably.

Types of Intermediate representations / forms: There are three types of intermediate


representation:-
1. Syntax Trees
2. Postfix notation
3. Three Address Code

Syntax tree

Page 8 of 20
When you create a parse tree then it contains more details than actually needed. So, it is very
difficult to compiler to parse the parse tree. Take the following parse tree as an example:

• In the parse tree, most of the leaf nodes are single child to their parent nodes.
• In the syntax tree, we can eliminate this extra information.
• Syntax tree is a variant of parse tree. In the syntax tree, interior nodes are operators and
leaves are operands.
• Syntax tree is usually used when represent a program in a tree structure.
A sentence id + id * id would have the following syntax tree:

Page 9 of 20
Abstract syntax trees are important data structures in a compiler. It contains the least unnecessary
information.
Abstract syntax trees are more compact than a parse tree and can be easily used by a compiler.

Postfix Notation
• Postfix notation is the useful form of intermediate code if the given language is
expressions.
• Postfix notation is also called as 'suffix notation' and 'reverse polish'.
• Postfix notation is a linear representation of a syntax tree.
• In the postfix notation, any expression can be written unambiguously without
parentheses.
• The ordinary (infix) way of writing the sum of x and y is with operator in the middle: x *
y. But in the postfix notation, we place the operator at the right end as xy *.
• In postfix notation, the operator follows the operand.

Example
Production
1. E → E1 op E2
2. E → (E1)
3. E → id

Semantic Rule Program fragment

E.code = E1.code || E2.code || op print op

Page 10 of 20
E.code = E1.code

E.code = id print id

Three address code


• Three-address code is an intermediate code. It is used by the optimizing compilers.
• In three-address code, the given expression is broken down into several separate
instructions. These instructions can easily translate into assembly language.
• Each Three address code instruction has at most three operands. It is a combination of
assignment and a binary operator.
Example
a := (-c * b) + (-c * d)
Three-address code is as follows:
t1 := -c
t2 := b*t1
t3 := -c
t4 := d * t3
t5 := t2 + t4
a := t5
t is used as registers in the target program. The three address code can be represented in two
forms: quadruples and triples.

Quadruples
The quadruples have four fields to implement the three address code. The field of quadruples
contains the name of the operator, the first source operand, the second source operand and the
result respectively.

Example
1. a := -b * c + d
Three-address code is as follows:
t1 := -b

Page 11 of 20
t2 := c + d
t3 := t1 * t2
a := t3

These statements are represented by quadruples as follows:

Operator Source 1 Source 2 Destination

(0) uminus B - t1

(1) + c d t2

(2) * t1 t2 t3

(3) := t3 - a

Triples
The triples have three fields to implement the three address code. The field of triples contains the
name of the operator, the first source operand and the second source operand.
Example:
a := -b * c + d
Three address code is as follows:
t1 := -b
t2 := c + d
t3 := t1 * t2
a := t3
These statements are represented by triples as follows:

Operator Source 1 Source 2

Page 12 of 20
(0) uminus b -

(1) + c d

(2) * (0) (1)

(3) := (2) -

3.5 TYPES AND DECLARATIONS


As the sequence of declarations in a procedure or block is examined, we can lay out storage for
names local to the procedure. For each local name, we create a symbol-table entry with
information like the type and the relative address of the storage for the name. The relative
address consists of an offset from the base of the static data area or the field for local data in an
activation record.

Declarations in a Procedure:
The syntax of languages such as C, Pascal and Fortran, allows all the declarations in a
single procedure to be processed as a group. In this case, a global variable, say offset, can keep
track of the next available relative address.
In the translation scheme shown below:
• Non-terminal P generates a sequence of declarations of the form id : T.
• Before the first declaration is considered, offset is set to 0. As each new name is seen ,
that name is entered in the symbol table with offset equal to the current value of offset,
and offset is incremented by the width of the data object denoted by that name.
• The procedure enter( name, type, offset ) creates a symbol-table entry for name, gives its
type type and relative address offset in its data area.
• Attribute type represents a type expression constructed from the basic types integer and
real by applying the type constructors pointer and array. If type expressions are
represented by graphs, then attribute type might be a pointer to the node representing a
type expression.

Page 13 of 20
• The width of an array is obtained by multiplying the width of each element by the
number of elements in the array. The width of each pointer is assumed to be 4.

Computing the types and relative addresses of declared names

3.6 FLOW-OF-CONTROL STATEMENTS


We now consider the translation of boolean expressions into three-address code in the context of
if-then, if-then-else, and while-do statements such as those generated by the following grammar:

In each of these productions, E is the Boolean expression to be translated. In the translation, we


assume that a three-address statement can be symbolically labeled, and that the function
newlabel returns a new symbolic label each time it is called.

Page 14 of 20
➢ E.true is the label to which control flows if E is true, and E.false is the label to which
control flows if E is false.
➢ The semantic rules for translating a flow-of-control statement S allow control to flow
from the translation S.code to the three-address instruction immediately following
S.code.
➢ S.next is a label that is attached to the first three-address instruction to be executed after
the code for S.
Code for if-then , if-then-else, and while-do statements

Syntax-directed definition for flow-of-control statements

Page 15 of 20
Control-Flow Translation of Boolean Expressions:
Syntax-directed definition to produce three-address code for Booleans

Page 16 of 20
3.7 CASE STATEMENTS
The “switch” or “case” statement is available in a variety of languages. The switch-statement
syntax is as shown below :

Syntax-Directed Translation of Case Statements:

Page 17 of 20
Translation of a case statement

3.8 INTERMEDIATE CODE FOR PROCEDURES


The procedure is such an important and frequently used programming construct that it is
imperative for a compiler to generate good code for procedure calls and returns. The run-time
routines that handle procedure argument passing, calls and returns are part of the run-time
support package.
Let us consider a grammar for a simple procedure call statement

Page 18 of 20
Calling Sequences: The translation for a call includes a calling sequence, a sequence of actions
taken on entry to and exit from each procedure. The falling are the actions that take place in a
calling sequence :

1. When a procedure call occurs, space must be allocated for the activation record of the
called procedure.
2. The arguments of the called procedure must be evaluated and made available to the called
procedure in a known place.
3. Environment pointers must be established to enable the called procedure to access data in
enclosing blocks.
4. The state of the calling procedure must be saved so it can resume execution after the call.
5. Also saved in a known place is the return address, the location to which the called,
routine must transfer after it is finished.
6. Finally a jump to the beginning of the code for the called procedure must be generated.

For example, consider the following syntax-directed translation

Page 19 of 20
➢ Here, the code for S is the code for Elist, which evaluates the arguments, followed by a
param p statement for each argument, followed by a call statement.
➢ queue is emptied and then gets a single pointer to the symbol table location for the name
that denotes the value of E.

Page 20 of 20

You might also like