Compiler Key1
Compiler Key1
Compiler Key1
6. List out the three functions that are used to manipulate the list of labels in back
patching.
· mklist(i) creates the new list. The index i is passed as an argument to this
function where I is an index to the array of quadruple.
· merge_list(p1,p2) this function concatenates two lists pointed by p1 and p2. It
returns the pointer to the concatenated list.
· backpatch(p,i) inserts i as target label for the statement pointed by pointer p.
PART – B
11. a. i. Discuss about the input buffering scheme in lexical analyzer. (6)
Determining the next lexeme requires reading the input beyond the end of the
lexeme.
Buffer Pairs: (2)
Concerns with efficiency issues
Used with a look ahead on the input
-2-
NOV/DEC-'08/CS1352-Answer Key
ii. Construct a NFA using Thompson’s construction algorithm for the regular
expression (a|b)*abb(a|b)* and convert it into DFA. (10)
NFA:
-3-
NOV/DEC-'08/CS1352-Answer Key
-4-
NOV/DEC-'08/CS1352-Answer Key
DTrans [H, b] = I
å-closure (move(I, a)) =å-closure (3, 8, 13) = F
DTrans [I, a] = F
å-closure (move(I, b)) =å-closure (5, 15) = G
DTrans [I, b] = G
Input symbol
State
a b
A B C
B B D
C B C
D B E
E * F G
F * F H
G * F G
H * F I
I * F G
(OR)
-5-
NOV/DEC-'08/CS1352-Answer Key
id1 +
id1 +
id2 *
id2 *
id3 inttoreal
id3 50 50
Syntax analysis Semantic analysis
– Intermediate code generation, intermediate code is something that is both close
to the final machine code and easy to manipulate (for optimization). One example
is the three-address code:
dst = op1 op op2
• The three-address code for the assignment statement:
temp1 = inttoreal(50);
temp2 = id3 * temp1;
temp3 = id2 + temp2;
id1 = temp3
– Code optimization: produces better/semantically equivalent code.
temp1 = id3 * 50.0
id1 = id2 + temp1
– Code generation: generates assembly
MOVF id3, R2
MULF #50.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
-6-
NOV/DEC-'08/CS1352-Answer Key
1
Lexical Analyzer
2
Syntax Analyzer
3
Semantic Analyzer
5
Code Optimizer
6
Code Generator
Target Program
(2)
ii. Construct a DFA directly from the regular expression (a|b)*abb without
constructing NFA. (8)
Syntax tree for (a|b)*abb#:
-7-
NOV/DEC-'08/CS1352-Answer Key
Calculation of followpos:
Node followpos
1 {1, 2, 3}
2 {1, 2, 3}
3 {4}
4 {5}
5 {6}
6 -
-8-
NOV/DEC-'08/CS1352-Answer Key
The position associated with the end marker #, 6 is in D. So, D is the final state.
DFA
a a
b
a b b
A B C D
a
-9-
NOV/DEC-'08/CS1352-Answer Key
12. a. i. Give the definitions for FIST(X) and FOLLOW (A) procedures used in
construction of predictive parser. (4)
First(X): (Apply the following rules until no more terminals or å can be added to any
first set). (2)
If X is a terminal, then first(X) is {X}
If X-> å is a production, then add å to First(X)
If X is a nonterminal and X->Y1 Y2 .. Yk is a production, then place a in
First(X) if for some I, a is in First(Yi) and å is in all of First(Y1)…First(Yi-1)
Follow(X): (Apply the following rules until nothing can be added to any Follow set). (2)
Place $ in Follow(S)
If there is a production A->áBâ, then everything in First(â) except for å is place in
Follow(B)
If there is a production A->áB, or a production A->áBâ where First(â) contains å,
then everything in Follow(A) is in Follow(B)
ii. What is an operator grammar? Draw the precedence graph for the following
table: (12)
a ( ) , $
a > > >
( < < = <
) > > >
, < > > >
$ < <
Operator grammar:
A grammar with the property that no production has å on the right side or has two
adjacent non terminals is called operator grammar.
- 10 -
NOV/DEC-'08/CS1352-Answer Key
ii. Write the LR parsing algorithm. Check whether the grammar is SLR (1) or not.
Justify the answer with reasons. (12)
S->L=R | R; L->*R | id; R->L
LR parsing algorithm:
set ip to point to the first symbol of w$
repeat
let s be the state on the top of the stack and a the symbol pointed to by ip
if action[s, a]=shift s’ then
push a then s’ on top of the stack
advance ip to the next input symbol
else if action[s, a]=reduce A-> â then
pop 2*| â| symbols off the stack
let s’ be the state now on top of the stack
push A then goto[s’, A] on top of the stack
output the production A-> â
else if action[s, a]=accept then
return
else
error()
end
- 11 -
NOV/DEC-'08/CS1352-Answer Key
Consider the set of Items I2. The first item in this set makes action [2, =] be shift 6.
Since Follow(R) has =, the second item sets action [2, =] to reduce R->L. Thus the entry
action [2, =] is multiply defined. So it is not SLR (1)
13. a. i. What are the various data structure used for symbol table construction and
explain any one in detail. (8)
A compiler uses symbol table to keep track of scope and binding information about
names. It is searched every time a name is encountered in the source text. Changes to the
table occur if a new name or new information about existing name is discovered. (2)
Data Structures: (2)
Linear lists and hash tables
Hash tables: (4)
Open hashing is used. Open refers to the property that there need be no limit on
the number of entries that can be made in the table. There are two parts to the data
structure:
A hash table consisting of a fixed array of m pointers to table entries
Table entries organized into m separate linked lists, called buckets.
To determine whether there is an entry for string s in the symbol table, apply a hash
function h to s, such that h(s) returns an integer between 0 and m-1. If s is in the symbol
table, then it is on the list numbered h(s). If s is not yet in the symbol table, it is entered
by creating a record for s that is linked at the front of list numbered h(s).
ii. Let A be a 10 * 20 array with low1 = low2 =1. Let w=4. Draw an annotated parse
tree for the assignment statement X:=A[y,z]. Give the sequence of three address
statement generated. (8)
In case of a two-dimensional array in row major form (row-by-row), the relative
address of A[i1,i2] can be calculated by the formula
Base +((i1-low1)*n2+i2-low2)*ù
- 12 -
NOV/DEC-'08/CS1352-Answer Key
Where n2= high2-low2+1, low1 and low2 are the lower bounds on the values of i1 and
i2, n2 is the number of values that i1 can take.
Assuming that i1 and i2 are the only values that are not known at compile time,
we can rewrite the above expression as
((i1*n2)+i2)* ù+ (base-((low1*n2)+low2)* ù)
=> ((i1*n2)+i2)* ù + constant, c
Where c can be determined at compile time
Here, n1=10, n2=20, i1=y, i2=z, ù=4, low1=low2=1
c=base-((low1*n2)+low2)* ù
=base-((1*20)+1)*4 =base-84
Expression: (y*20+z)*4
3AC to access A[y,z] is
t1:=y*20
t1:=t1+z
t2:=4*t1
t3:=c //c=base – 84
t4:=t3[t2]
x:=t4
(OR)
b. How would you generate the intermediate code for the flow of control statements?
Explain with examples.
Flow of control statements:
S-> if E then S1 | if E then S1 else S2 | while E do S1
Semantic rules for if E then S1:
E.true:= newlabel;
E.false:=S.next;
S1.next:=S.next;
S.code:=E.code || gen(E.true “:”) || S1.code
Semantic rules for if E then S1 else S2:
E.true:= newlabel;
E.false:=newlabel;
S1.next:=S.next;
S2.next:=S.next;
S.code:=E.code || gen(E.true “:”) || S1.code || gen(‘goto’ S.next) ||
gen(E.false “:”) || S2.code
Example:
Statement: a and b and c
if a were false, then we need not evaluate the rest of the expressions. So, we insert
labels E.true and E.false in the appropriate places.
if a goto E.true
goto E.false
E.true: if b goto E1.true
goto E.false
E1.true: if c goto E2.true
goto E.false
- 13 -
NOV/DEC-'08/CS1352-Answer Key
E2.true : exp =1
E.false: exp =0
Semantic rules for while E do S1:
S.begin:=newlabel
E.true:= newlabel;
E.false:=S.next;
S1.next:=S.begin;
S.code:=gen(S.begin’:’) || E.code || gen(E.true “:”) || S1.code || gen(‘goto’ S.begin)
Example:
while a<b do
if c<d then
x=y+z
else
x=y-z
3AC generated is
L1: if a<b goto L2
goto Lnext
L2: if c<d goto L3
goto L4
L3: t1:=y+z
x:=t1
goto L1
L4: t4:=y-z
x:=t2
goto L1
Lnext:
- 14 -
NOV/DEC-'08/CS1352-Answer Key
goto L2
x := x + 1 unneeded
• Flow of control optimizations
Unnecessary jumps are eliminated.
goto L1
…
L1: goto L2 becomes goto L2
• Algebraic Simplification
x := x + 0 unneeded
• Dead code elimination
x := 32 where x not used after statement
y := x + y y := y + 32
• Reduction in strength
Replace expensive operations by equivalent cheaper ones
x := x * 2 x := x + x
ii. Explain the DAG representation of the basic block with an example. (8)
A DAG for a basic block is a directed acyclic graph in which (2)
leaves are labeled by unique ids, either variable names or constants
Interior nodes are operators
Nodes are also given a sequence of ids for labels to store the computed values.
It is useful for implementing transformations on basic blocks and shows how values
computed by a statement are used in subsequent statements.
e.g t1:=4*i
t2:=a[t1]
Dag is
[] t2
t1
*
a
4 i
Algorithm for the construction of DAG: (4)
Input: A basic block
Output: DAG for that basic block, having
Label for each node where leaves are identifiers, interior nodes are operator
symbol.
for each node, a list of identifiers to hold computed values
1) x = y op z 2) x = op y 3) x = y
Step 1: If node(y) is undefined, create a leaf labeled y and let node(y) be this node. In 1),
if node(z) is undefined, create a leaf labeled z and let that leaf be node(z)
Step 2: For 1), create node op with left child y and right child z, after checking for
common sub expression
For 2), check for a node op with a child y. If not create such node
For 3), let n be node y.
- 15 -
NOV/DEC-'08/CS1352-Answer Key
Step 3: Delete x from the list of identifiers for node x. Append x to the list of attached
identifiers for node n found in step 2 and set node x to n
Applications of DAG: (2)
· Determining the common sub-expressions.
· Determining which identifiers have their values used in the block
· Determining which statements compute values that could be used outside the block
· Simplifying the list of quadruples by eliminating the common sub-expressions and not
performing the assignment of the form x: = y unless and until it is a must.
(OR)
b. i. What is a three address code? What are its types? How is it implemented? (12)
It is one of the intermediate representations. It is a sequence of statements of the
form x:= y op z, where x, y, and z are names, constants or compiler-generated
temporaries and op is an operator which can be arithmetic or a logical operator. E.g.
x+y*z is translated as t1=y*z and t2=x+t1. (2)
Reason for the term three-address code is that each statement usually contains
three addresses, two for the operands and one for the result. (2)
- 16 -
NOV/DEC-'08/CS1352-Answer Key
Triples
Op arg1 arg2
(0) uminus c
(1) * b (0)
(2) uminus c
(3) * b (2)
(4) + (1) (3)
(5) assign a (4)
Indirect Triples
Op arg1 arg2 Statement
(14) uminus c (0) (14)
(15) * b (14) (1) (15)
(16) uminus c (2) (16)
(17) * b (16) (3) (17)
(18) + (15) (17) (4) (18)
(19) assign a (18) (5) (19)
- 17 -
NOV/DEC-'08/CS1352-Answer Key
- 18 -
NOV/DEC-'08/CS1352-Answer Key
- 19 -