CD ch1
CD ch1
CD ch1
Chapter One
1. Introduction to compiler and phases of compilers
Programming languages are notations for describing computations to people and to machines. The world as we know it depends
on programming languages, because all the software running on all the computers was written in some programming language.
But, before a program can be run, it first must be translated into a form in which it can be executed by a computer. The software
systems that do this translation are called compilers.
Compiler: is a program that reads a program written in one language (the source language) and translates it into an equivalent
program in another language (the target language).
1.1.1. Preprocessor
A preprocessor, generally considered as a part of compiler, is a tool that produces input for compilers. It deals with macro-
processing, augmentation; file inclusion, language extension, etc.
1. Macro processing: A preprocessor may allow a user to define macros that are shorthand’s for longer constructs.
For example, in C, #define foo(x,y) (3*x+y*(2+x)) defines a macro foo, that when used in later in the program, is expanded by
the preprocessor. For example, a = foo (a,b) becomes a = (3*a+b*(2+a)). Each time the preprocessor sees the word foo(a,b), it
Obj pgm
input Obj pgm opj pgm output
Ultimately, programs written in a high-level language must be translated into machine language by a compiler. The target
machine language is efficient for hardware but lacks readability for human. An important role of the compiler is to recognize
legal (and illegal) programs, to report any errors in the source program that it detects during the translation process and manage
Front end
Recognize legal code
Report errors
Produce IR
Preliminary storage map
Shape code for the back end
b. Synthesis Phase(back end)
The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol
table.
It includes Code optimizer , code generator
Back end
Translate IR into target machine code
Choose instructions for each IR operation
Each phase:
Transforms the source program from one representation into another representation and
They communicate with error handlers and the symbol table
1. Lexical Analysis
The first phase of a compiler is called lexical analysis or scanning. The tool used is called lexical analyzer (LA) or scanner. The
lexical analyzer reads the stream of characters making up the source program is read from left to right and is grouped into
tokens.
A token is a sequence of characters having a collective meaning. It describes a pattern of characters having same meaning in the
The considered intermediate form called three-address code, which consists of a sequence of assembly-like instructions with
three operands per instruction. Each operand can act like a register.
The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time,
so the int to float operation can be eliminated by replacing the integer 60 by the floating-point number 60.0. Moreover, t3 is
used only once
There are simple optimizations that significantly improve the running time of the target program without slowing down
compilation too much.
Ex. Unnecessary lines of code in loops (i.e. code that could be executed outside of the loop) are moved out of the loop.
for(i=1; i<10; i++){
x = y+1;
z = x+i; }
x = y+1;
for(i=1; i<10, i++)
z = x+i;
6. Code Generation: takes as input an intermediate representation of the source program and maps it into the target
language. If the target language is machine code; registers or memory locations are selected for each of the variables used
by the program. Then, the intermediate instructions are translated into sequences of machine instructions that perform
the same task. A crucial aspect of code generation is the judicious assignment of registers to hold variables.
Symbol-Table Management
The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.
Identifier, numbers, variables, functions, parameters, types, fields, etc
Tokens are entered by the scanner and parser
Semantic analyzer adds type information and other attributes
Code generation and optimization phases use the information in the symbol table
The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data
b. Two pass Compiler: it is divided into two sections front end and back end.
c. Multi Pass Compiler: A multi pass compiler makes several passes over the program. The output of a preceding phase
is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler: