Symbol Tables: ASU Textbook Chapter 7.6, 6.5 and 6.3
Symbol Tables: ASU Textbook Chapter 7.6, 6.5 and 6.3
Symbol Tables: ASU Textbook Chapter 7.6, 6.5 and 6.3
Tsan-sheng Hsu
tshsu@iis.sinica.edu.tw
http://www.iis.sinica.edu.tw/~tshsu
1
Denitions
Symbol table: A data structure used by a compiler to keep track of semantics of variables.
Data type. When is used:
scope.
Possible implementations:
Unordered list: for a very small set of variables. Ordered linear list: insertion is expensive, but implementation is
relatively easy. Binary search tree: O (log n) time per operation for n variables. Hash table: most commonly used, and very ecient provided the memory space is adequately larger than the number of variables.
Hash table
Hash function h(n): returns a value from 0, . . . , m 1, where n is the input name and m is the hash table size.
Uniformly and randomly.
Add up the integer values of characters in a name and then take the
remainder of it divided by m. Add up a linear combination of integer values of characters in a name, and then take the remainder of it divided by m.
Resolving collisions:
try (h(n) + 1) mod m, where m is a large prime number, and then (h(n) + 2) mod m, . . ., (h(n) + i) mod m. Chaining: most popular.
Keep a chain on the items with the same hash value. Open hashing.
Linear resolution:
Quadratic-rehashing: try (h(n) + 12) mod m, and then try (h(n) + 22) mod m, . . ., try (h(n) + i2) mod m.
Uniformly distributed.
Data type. Scope information: where and when it can be used. Storage allocation, size, . . .
Too little: names must be short. Too much: waste a lot of spaces.
NAME s a r i o e 2 r a t d a r r a y
ATTRIBUTES
Variable-length name:
A string of space is used to store all names. For each name, store the length and starting index of each name.
NAME index length 0 5 5 2 7 10 17 3 ATTRIBUTES
0 s
1 o
2 r
3 t
4 $
5 a
6 $
7 r
8 e
9 a
10 d
11 a
12 r
13 r
14 a
15 y
16 $
17 i
18 2
19 $
Nested blocks mean nested scopes. Two major ways for implementation:
Approach 1: multiple symbol tables in a STACK. Approach 2: one symbol table with chaining.
Use a stack to maintain the current scope. Search top of stack rst. If not found, search the next one in the stack. Use the rst one matched. Note: a popped scope can be destroyed in a one-pass compiler, but it must be saved in a multi-pass compiler.
main() { /* open a new scope */ int H,A,L; /* parse point A */ ... { /* open another new scope */ float x,y,H; /* parse point B */ ... /* x and y can only be used here */ /* H used here is float */ ... } /* close an old scope */ ... /* H used here is integer */ ... { char A,C,M; /* parse point C */ ... } }
searching direction
parse point A
parse point B
parse point C
Disadvantage:
Waste lots of spaces. Need to allocate adequate amount of entries for each symbol table if
it is a hash table.
A block within a procedure does not usually have many local variables. There may have many global variables and many local variables when a procedure is entered.
H(2) A(1)
parse point B
parse point C
11
12
Another example is the using namespace directive in C++. Example (PASCAL code):
A, R: record A: integer X: record A: real; C: boolean; end end ... R.A := 3; /* means R.A := 3; */ with R do A := 4; /* means R.A := 4; */
13
boolean
Associate a record number within the eld names. Assign record number #0 to names that are not in records. A bit time consuming in searching the symbol table. Similar to the scope numbering technique.
Compiler notes #5, Tsan-sheng Hsu, IIS 14
number and continue to search. If everything fails, search the normal main symbol table.
15
Overloading (1/3)
A symbol may, depending on context, have more than one semantics. Examples.
operators: I := I + 3; X := Y + 1.2; function call return value and recursive function call: f := f + 1;
16
Overloading (2/3)
Implementation:
Link together all possible denitions of an overloading name.
Call this an
17
Overloading (3/3)
Example: PASCAL function name and return variable.
Within the function body, the two denitions are chained. i.e., function call and return variable. When the function body is closed, the return variable denition disap-
pears. [PASCAL] function f: integer; begin if global > 1 then f := f +1; return end
18
Forward reference
Denition:
A name that is used before its denition is given. To allow mutually referenced and linked data types, names can some-
Possible usages:
GOTO labels. Recursively dened pointer types. Mutually or recursively called procedures.
19
GOTO labels
If labels must be dened before its usage, then one-pass compiler suces. Otherwise, we need either multi-pass compiler or one with back-patching.
Avoid resolving a symbol until all its possible denitions have been
seen. In C, ADA and languages commonly used today, the scope of a declaration extends only from the point of declaration to the end of the containing scope.
20
21
Structural equivalence.
Express a type denition via a directed graph where nodes are the elements and edges are the containing information. Two types are equivalent if and only if their structures (graphs) are the same. A dicult job for compilers.
entry = record info : real; coordinates : record x : integer; y : integer; end end [entry] +-----> [info] <real> +-----> [coordinates] +----> [x] <integer> +----> [y] <integer>
Name equivalence.
Two types are equivalent if and only if their names are the same. An easy job for compilers, but the coding takes more time.
Symbol table is needed during compilation, might also be needed during debugging.
Compiler notes #5, Tsan-sheng Hsu, IIS 23
check whether a name within a particular scope is currently in the symbol table or not.
Insert into symbol table(name,scope) Return the newly created entry. Delete from symbol table(name,scope)
For interpreters:
Use the attributes associated with the symbols to hold temporary
values. Use a structure to record all attributes. struct YYTYPE { char type; /* data type of a variable */ int value; int addr; char * namelist; /* list of names */ }
24
| id
{the variable name is in yytext; create a list of one name, i.e., yytext, $$.namelist}
25
| Expression Expression
{$$.value = $1.value - $3.value}
| id
{ use Find in symbol table to check whether yytext is already declared; $$.value = the value of the variable yytext}
Compiler notes #5, Tsan-sheng Hsu, IIS 26