Lexical Analysis and Lexical Analyzer Generators: COP5621 Compiler Construction
Lexical Analysis and Lexical Analyzer Generators: COP5621 Compiler Construction
Lexical Analysis and Lexical Analyzer Generators: COP5621 Compiler Construction
Symbol Table
4
Attributes of Tokens
<id, “y”> <assign, > <num, 31> <+, > <num, 28> <*, > <id, “x”>
token
tokenval
(token attribute) Parser
5
s0 =
si = si-1s for i > 0
note that s = s = s
8
letter AB…Zab…z
digit 01…9
id letter ( letterdigit )*
r+ = rr*
r? = r
[a-z] = abc…z
• Examples:
digit [0-9]
num digit+ (. digit+)? ( E (+-)? digit+ )?
13
lex.yy.c
C a.out
compiler
input sequence
stream a.out of tokens
18
Lex Specification
• A lex specification consists of three parts:
regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are of the form:
p1 { action1 }
p2 { action2 }
…
pn { actionn }
19
lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l
21
Optional
regular
NFA DFA
expressions
Nondeterministic Finite
Automata
• An NFA is a 5-tuple (S, , , s0, F) where
Transition Graph
• An NFA can be diagrammatically
represented by a labeled directed graph
called a transition graph
a
S = {0,1,2,3}
start
0
a
1
b
2
b
3
= {a,b}
s0 = 0
b F = {3}
27
Transition Table
• The mapping of an NFA can be
represented in a transition table
Input Input
State
(0,a) = {0,1} a b
(0,b) = {0} 0 {0, 1} {0}
(1,b) = {2} 1 {2}
(2,b) = {3} 2 {3}
28
Subset construction
DFA
30
a start a
i f
start N(r1)
r1r2 i f
N(r2)
start
r1r2 i N(r1) N(r2) f
r* start
i N(r) f
31
a { action1 }
start a b b
abb { action2 } 3 4 5 6
a b
a*b+ { action3 }
start
7 b 8
a
1 2
start
0 3
a
4
b
5
b
6
a b
7 b 8
32
a a b a
none
0 2 7 8 action3
1 4
3 7 Must find the longest match:
7 Continue until no further moves are possible
When last state is accepting: execute action
33
a b b a
none
0 2 5 6 action2
1 4 8 8
action3
3 7
7 When two or more accepting states are reached, the
first action given in the Lex specification is executed
34
Example DFA
b
b
a
start a b b
0 1 2 3
a a
36
a3 Dstates
C A = {0,1,3,7}
b a
b b B = {2,4,7}
start C = {8}
A D
D = {7}
a
a
b b E = {5,8}
B E F
F = {6,8}
a1 a3 a2 a3
42
C
b a
b a
start a b b start a b b
A B D E A B D E
a a
a
a b a
43
* a
3
alternation
| position
number
a b (for leafs )
1 2
46
Leaf true
{3} a {3}
firstpos lastpos
{1, 2} {1, 2}
* 3
{1, 2} | {1, 2}
Directly: Algorithm
s0 := firstpos(root) where root is the root of the syntax tree
Dstates := {s0} and is unmarked
while there is an unmarked state T in Dstates do
mark T
for each input symbol a do
let U be the set of positions that are in followpos(p)
for some position p in T,
such that the symbol at position p is a
if U is not empty and not in Dstates then
add U as an unmarked state to Dstates
end if
Dtran[T,a] := U
end do
end do
51
b b
a
start a 1,2, b 1,2, b 1,2,
1,2,3
3,4 3,5 3,6
a
a
52
Time-Space Tradeoffs
Space Time
Automaton
(worst case) (worst case)
NFA O(r) O(rx)
DFA O(2|r|) O(x)