Introduction To Compiler Design
Introduction To Compiler Design
Introduction To Compiler Design
Syllabus:
Language Processor
The Structure of a Compiler
The evolution of Programming Languages
Applications of a Compiler Technology
Programming Language Basics
Language Processor:
Q. Explain a language processing system with a block diagram. -8M, Dec09/Jan10.
Compiler: A compiler is a language processor that can read a program in one language
(source language) and translates it into an equivalent program in another language (target
language), as shown below,
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 1
Unit-01
The machine-language target program produced by a compiler is usually much faster than
an interpreter at mapping inputs to outputs.
An interpreter can give a better error diagnostic than a compiler, because it executes the
source program statement by statement.
In addition to a compiler, several other programs may be required to create an executable
target program, as shown below,
Source Program
Preprocessor
Modified source program
Compiler
Target assembly program
Assembler
Relocatable machine code
Linker/Loader
Library files
Relocatable object files
The source program may be divided into modules, stored in separate files. The task of
collecting the source program is sometimes entrusted to a separate program, called a
preprocessor.
The modified source program is then fed to a compiler. The compiler may produce an
assembly-language program as its output.
The assembly language is then processed by a program called an assembler that produces
relocatable machine code as its output.
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 2
Unit-01
Large programs are often compiled in pieces, so the relocatable machine code may have
to be linked together, with relocatable object files and library files into the code that
actually runs on the machine.
The linker resolves external memory addresses, where the code in one file may refer to a
location in another file.
The loader then puts together all the executable object files into memory for execution.
We known that, a compiler maps a source program into a semantically equivalent target
program. There are two parts of mapping:
Analysis Part, and
Synthesis Part.
The analysis part breaks up the source program into constituent pieces and imposes a
grammatical structure on them.
It then uses this structure to create an intermediate representation of the source program.
If the analysis part detects that the source program is either syntactically ill formed or
semantically unsound, then it must provide informative message, so that the user can take
corrective action.
The analysis part also collects information about the source program and stores it in a
data structure called a symbol table.
The synthesis part constructs the desired target program from the intermediate
representation and from the information in the symbol table.
The analysis part is often called the front end of the compiler and the synthesis part is
often called as back end of the compiler.
The compilation process operates as a sequence of phases, each of which transforms one
representation of the source program to another. A typical decomposition of a compiler
into phases is as shown below,
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 3
Unit-01
The symbol table, which stores information about the entire source program, is used by
all phases of the compiler.
Some machines have a machine independent optimization phase between the front end
and the back end. The purpose of this optimization phase is to perform transformations
on the intermediate representation, so that the back end can produce better target
program.
But, optimization phases shown in the above figure are optional.
For example, the translation for an assignment statement,
position = initial+rate*60
the phases of a compiler produces an output as shown below,
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 4
Unit-01
Lexical Analysis:
Q. Write a short note on Lexical Analysis or Scanning or Lexical Analyzer or Scanner.
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 5
Unit-01
The second component, attribute-value points to an entry in the symbol table for this
token.
For example, the translation for an assignment statement,
position = initial+rate*60
the lexical analyzer of a compiler produces an output as shown below,
position = initial + rate * 60
Lexical Analyzer
Syntax Analysis:
Q. Write a short note on Syntax Analysis or Parsing or Syntax Analyzer or Parser.
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 6
Unit-01
*
<id,3>
60
The subsequent phases of the compiler use the grammatical structure to help analyze the
source program and generate the target program.
Semantic Analysis:
Q. Write a short note on Semantic Analysis or Semantic Analyzer.
The semantic analyzer uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language definition.
It also gathers type information and saves it in either the syntax tree or the symbol table,
for the subsequent use during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler checks that
each operator has matching operands.
The language specification may permit some type conversions called coercions.
Suppose that position, initial and rate have been declared to be floating-point numbers
and the lexeme 60 by itself forms a integer in the following statement,
position = initial + rate * 60
The type checker in the semantic analyzer discovers that the operator * is applied to a
floating-point number rate and an integer 60.
In this case, the integer may be converted into a floating-point number, as shown below,
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 7
Unit-01
*
<id,3>
60
Semantic Analyzer
=
<id,1>
+
<id,2>
*
<id,3>
inttofloat
60
Notice that the output of the semantic analyzer has an extra node for the operator
inttofloat, which explicitly converts its integer argument into a floating-point number.
After syntax and semantic analysis of the source program, many compilers generate an
explicit low-level or machine-like intermediate representation.
This intermediate representation should have two important properties:
1. It should be easy to produce, and
2. It should be easy to translate into the target program.
Intermediate code generation considers a form called as three-address code, which
consists of a sequence of assembly-like instructions with three operands per instruction.
Each operand can act like a register.
An intermediate code generation for the syntax tree produced by a semantic analyzer for,
position = initial + rate * 60
is as shown below,
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 8
Unit-01
*
<id,3>
inttofloat
60
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
There are several points worth noting about three-address instructions,
First, each three-address assignment instructions has at most one operator on the
right side.
Second, the compiler must generate a temporary name to hold the value computed
by a three-address instruction.
Third, some three-address instructions have fewer than three operands.
Code Optimization:
Q. Write a short note on Code Optimization.
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 9
Unit-01
t1 = id3 * 60.0
id1 = id2 + t1
The optimizer can deduce that the conversion of 60 from integer to floating point can be
done once for all at compile time, so the inttofloat operation can be eliminated by
replacing the integer 60 by the floating-point number 60.0.
Moreover, t3 is used only once to transmit its value to id1, so the optimizer can transform
that by a shorter sequence.
Code Generation:
Q. Write a short note on Code Generation.
The code generator takes as input an intermediate representation of the source program
and maps it into the target language.
If the target language is machine code, registers or memory locations are selected for
each of the variables used by the program. Then, the intermediate instructions are
translated into sequences of machine instructions that perform the same task.
Using registers R1 and R2 the intermediate code produce by a code optimizer for,
position = initial + rate * 60
is translated into machine instructions as shown below,
t1 = id3 * 60.0
id1 = id2 + t1
Code Generator
LDF
MULF
LDF
ADDF
STF
R2, id3
R2, R2, #60.0
R1, id2
R1, R1, R2
id1, R1
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 10
Unit-01
Symbol-Table Management:
Q. Write a short note on Symbol-Table or Symbol-Table Management.
An essential function of a compiler is to record the variable names used in the source
program and collect information about various attributes of each name.
These attributes may provide information about the storage allocated for a name, its type,
its scope and in case of procedure names, such things as the number and types of its
arguments, the method of passing each argument and the type returned.
The symbol table is a data structure containing a record for each variable name, with
fields for the attributes of the name.
The data structure should be designed to allow the compiler to find the record for each
name quickly and to store or retrieve data from that record quickly.
Compiler-Construction Tools:
Q. Mention and explain briefly the existing compiler-construction tools.
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 11
Unit-01
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 12
Unit-01
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 13
Unit-01
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 14
Unit-01
Static Scoping: If a language uses a policy that allows the compiler to decide an issue,
then the language is said to use static policy. A language uses static scope or lexical
scope if it is possible to determine the scope of a declaration by looking only at the
program.
Dynamic Scoping: If a language only allows a decision to be made when we execute the
program, then the language is said to use dynamic policy. A language uses dynamic
scope or run-time scope if it is not possible to determine the scope of a declaration by
looking only at the program.
For example, consider the following program segment,
Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet
Page 15