Compiler Design Final Notes
A typical four-pass compiler has the following structure :
• The first pass is the preprocessor, which produces input for the compiler and deals with macro processing,
augmentation, language extension, etc.
• The second pass is the heart of the compiler. It is made up of a lexical analyzer, parser, and code
generator, and it translates the source code into intermediate code.
• The third pass is the optimizer, which improves the quality of intermediate code.
• The fourth pass translates the optimized code to real assembly language or some form of binary
executable code.
COMPILER DESIGN (MODULE 2)
2. TYPES OF COMPILER :
2.3 ONE PASS COMPILER VS MULTI PASS COMPILER.
[ Note : The compilation process is divided into phases (or passes); this division gives rise to the terms front end,
middle end (rarely heard today), and back end. ]
4. PHASES OF A COMPILER :
1. Source Code/Program :
Source code is written by a programmer in some human-readable computer programming language. It is high-level
code, written in high-level languages like C, C++, Java, Python, etc.
2. Lexical Analysis :
Lexical analysis is the first phase of the compilation process. It takes source code as input, reads the source program
one character at a time, converts it into meaningful lexemes, and represents these lexemes in the form of tokens.
3. Syntax Analysis :
Syntax analysis is the second phase of the compilation process. It takes tokens as input and generates a parse tree as
output. The main aim of this phase is to check whether the source code written by the programmer is syntactically
correct.
4. Semantic Analysis :
Semantic analysis is the third phase of the compilation process. It checks whether the parse tree follows
the rules of the language.
5. Intermediate Code Generation :
In intermediate code generation, the compiler translates the source code into intermediate code.
6. Code Optimization :
It is used to improve the intermediate code so that the output of the program runs faster and takes
less space.
7. Code Generation :
Code generation is the final stage of the compilation process and it translates the intermediate code
into the machine code of the specified computer.
8. Target Program :
The target program is the output of the code generator. The output can be: (a) Assembly Language (b)
Relocatable Machine language
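As a concrete illustration of how these phases fit together (a standard textbook-style example, not taken from
these notes; the registers and instructions shown are hypothetical), consider the statement position = initial + rate * 60 :
• Lexical analysis : id(position), =, id(initial), +, id(rate), *, num(60)
• Syntax analysis : a parse tree with = at the root, id(position) as its left child, and the subtree for
initial + rate * 60 as its right child
• Semantic analysis : checks that the operands of + and * have compatible types, inserting a conversion such as
inttofloat(60) if rate is a float
• Intermediate code generation : t1 = rate * 60 ; t2 = initial + t1 ; position = t2
• Code optimization : t1 = rate * 60 ; position = initial + t1
• Code generation : LD R1, rate ; MUL R1, R1, #60 ; LD R2, initial ; ADD R1, R1, R2 ; ST position, R1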
5. COUSINS / RELATIVES OF A COMPILER :
The Cousins / Relatives of a compiler are as follows :
(a) Preprocessor (b) Assembler (c) Loader and Link-editor.
5.1 PREPROCESSOR
A preprocessor is a program that produces input for the compiler. It deals with macro processing,
augmentation, language extension, etc.
(a) Lexical Preprocessor - Lexical preprocessors are the lowest-level of preprocessors, as they only require
lexical analysis.
(b) Syntactic preprocessor - Syntactic preprocessors are used to customize the syntax of a language or to extend a
language inside a general-purpose language.
5.2 ASSEMBLER
• An assembler converts assembly level language code into machine language code.
• It converts the code line by line.
• It takes assembly-level code as input.
5.3 LOADER AND LINKER
Loader : It loads the executable module into main memory. It takes the executable module generated by a
linker.
Linker : It generates the executable module of a source program. It takes as input the object code generated
by a compiler / assembler.
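For example, with a typical Unix C toolchain (a sketch; exact commands vary by system) : cc -S main.c translates
main.c into assembly (main.s), as main.s -o main.o assembles it into a relocatable object file, and cc main.o -o main
invokes the linker to produce the executable module that the loader later brings into main memory.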
6. COMPILER vs INTERPRETER :
• A compiler is a complex computer program (or set of programs) that translates text written in a computer
language (the source language) into another computer language (the target language).
• It converts the source code into object code.
• It does not require source code for later execution.
• Example : C, C++
• An interpreter is a translating program that converts a high-level language into machine code.
• It does not convert the source code into object code; instead, it scans and executes it line by line.
• It requires the source code for later execution.
• Example : Python, MATLAB, etc.
COMPILER DESIGN
(MODULE 3- INTRODUCTION TO SOURCE CODE & LEXICAL
ANALYSIS)
1. SOURCE CODE :
• Source code is written by a programmer in some human-readable computer programming language. It is
high-level code, written in high-level languages like C, C++, Java, Python, etc.
1.1 PURPOSES
• Source code is the input to a compiler or any other translator, and it can be changed over time.
1.2 QUALITY
The quality of source code is judged by general syntactic criteria and by the syntactic elements the language provides.
(A) GENERAL SYNTACTIC CRITERIA
The primary purpose of syntax is to provide a notation for communication between the programmer and the
programming-language translator. The general criteria include :
(i) Readability : If the underlying structure of the source code, and of the data it represents, is apparent from
an inspection of the source text, the source code is said to be readable.
(ii) Writability : The syntactic features that make a program easy to write are often in conflict with those
features that make it easy to read.
(iii) Ambiguity : Ambiguity is a central problem in every language design; ideally, a language provides a
unique meaning for every syntactic construct.
(iv) Verifiability : Verifiability is also related to source code. Understanding programming statements is easy,
but producing correct programs is hard.
(B) SYNTACTIC ELEMENTS OF SOURCE CODE
Using syntactic elements in source code enhances readability, writability and verifiability.
1.3 THE STRUCTURE OF A PROGRAM
• We can analyze the structure of a program at four levels :
1. Lexical structure level
2. Syntactic structure level
3. Contextual structure level
4. Semantic structure level
(A) Lexical structure level
The lexical structure level is the lowest level, at which computer programs are viewed as a simple sequence of
lexical items called tokens. A token is a sequence of characters that represents a unit of information in
the source program. We can classify tokens as :
(i) Identifiers (ii) Keywords (iii) Operators (iv) Separators (v) Literals
(B) Syntactic structure level
The syntactic level describes the way program statements are constructed from tokens, and it is defined in
terms of a context-free grammar. The best-known notations are BNF (Backus Naur Form) and EBNF (Extended
Backus Naur Form).
Let us see an EBNF example:
<Sum> = <Operand> <Operator> <Operand>
<Operand> = <number> / ( <Sum> )
<Operator> = + / -
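For instance, the string 7 + ( 5 - 2 ) can be derived from this grammar step by step :
<Sum> => <Operand> <Operator> <Operand>
=> 7 <Operator> <Operand>
=> 7 + <Operand>
=> 7 + ( <Sum> )
=> 7 + ( <Operand> <Operator> <Operand> )
=> 7 + ( 5 - 2 )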
(C) Contextual structure level
The contextual structure level is concerned with the 'context' of program statements : statements usually contain
identifiers whose values are dictated by earlier statements, such as declarations.
int c = a + b; // not valid : a and b have not been declared
int a; // valid
int b;
int c;
c = a + b; // valid because a, b and c are declared above
2. LEXICAL ANALYSER :
2.1 Lexical Analyzer
The lexical analyzer implements the first phase of the compilation process. It takes source code as input, reads the
source program one character at a time, converts it into meaningful lexemes, and represents these lexemes as tokens.
Lexical analysis can be implemented with a deterministic finite automaton (DFA).
e.g. (i) if (5 > 4) then goto
Tokens : if, (, 5, >, 4, ), then, goto
(ii) for (i = 0; i < 10; i++)
Tokens : for, (, i, =, 0, ;, i, <, 10, ;, i, ++, )
2.2 Implementation & Role of Lexical Analyzer
A lexical analyzer is implemented in two stages : (a) Scanner (b) Evaluator
(A) Scanner : The first stage is the scanner, which is based on a finite state machine. It has information embedded in it
about the various character sequences that can be found in any of the tokens it handles.
(B) Evaluator : The second stage is the evaluator, which constructs a token by going over the characters of the
lexeme and computing a value. The type of the lexeme, combined with its value, forms a token that may be passed
to the parser.
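A minimal sketch of these two stages in C (an illustration only; the token kinds and the 32-character lexeme limit
are arbitrary choices, not from these notes) : the scanner recognizes the shape of a lexeme with simple finite-state
checks, and the evaluator attaches a value to it.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token kinds for illustration. */
typedef enum { TOK_NUMBER, TOK_IDENT, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    long      value;      /* set by the evaluator for TOK_NUMBER */
    char      text[32];   /* the matched lexeme                  */
} Token;

/* Scanner + evaluator : reads one token from the string at *pp. */
Token next_token(const char **pp)
{
    Token t = { TOK_EOF, 0, "" };
    const char *p = *pp;
    while (isspace((unsigned char)*p)) p++;            /* skip white space */

    int n = 0;
    if (isdigit((unsigned char)*p)) {                  /* scanner : digit+ */
        while (isdigit((unsigned char)*p) && n < 31) t.text[n++] = *p++;
        t.text[n] = '\0';
        t.kind = TOK_NUMBER;
        t.value = strtol(t.text, NULL, 10);            /* evaluator : compute the value */
    } else if (isalpha((unsigned char)*p)) {           /* scanner : letter (letter|digit)* */
        while (isalnum((unsigned char)*p) && n < 31) t.text[n++] = *p++;
        t.text[n] = '\0';
        t.kind = TOK_IDENT;                            /* identifiers carry no value here */
    }
    *pp = p;
    return t;
}

int main(void)
{
    const char *src = "rate 60 initial";
    Token t;
    while ((t = next_token(&src)).kind != TOK_EOF)
        printf("%s -> kind %d, value %ld\n", t.text, t.kind, t.value);
    return 0;
}

In a real compiler the scanner is usually generated from regular expressions (e.g., by Lex/Flex, see Section 5)
rather than written by hand.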
The main work / role of a lexical analyzer is :
• It identifies tokens and enters them into the symbol table
• It removes white space and comments from the source program
• It correlates error messages with the source program
• It reads input characters from the source program
• It keeps track of line numbers.
3. BASIC TERMINOLOGIES :
3.1 Tokens :
A token is a sequence of characters that represents a unit of information in the source program.
At the lexical structure level, the lowest level, computer programs are viewed as a simple sequence of
lexical items, also termed tokens. A token has only a single attribute, a pointer to the symbol
table.
Examples of tokens : Keywords : for, while, if, etc. Identifiers : variable names, function names, etc.
Operators : '+', '++', '-', etc. Separators : ',', ';', etc.
3.2 Lexemes : A lexeme is a sequence of characters in the source program that is matched by the pattern for
a token. It is identified by the lexical analyzer as an instance of that token. For example, the English words 'run,
runs, ran and running' are forms of the same lexeme.
3.3 Patterns :
A pattern is a rule describing the set of lexemes that represent a particular token. In the case of a keyword
used as a token, the pattern is just the sequence of characters that form the keyword.
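For example, in the statement count = count + 1, the character sequences count, =, + and 1 are lexemes : both
occurrences of count match the pattern for the token identifier (a letter followed by letters or digits), 1 matches
the pattern for the token number, and = and + each match a single-character operator pattern.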
4. TRANSITION DIAGRAM :
In a transition diagram, the boxes of a flowchart are drawn as circles called states. States are
connected by arrows called edges. As an intermediate step in the construction of a lexical analyzer, we
draw a pictorial representation of tokens showing all the actions taken by a lexical analyzer or parser; this is
known as a transition diagram.
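As a small sketch (assuming the usual identifier pattern letter (letter | digit)*, not taken from these notes), the
states and edges of a transition diagram map directly onto a switch over a state variable in C :

#include <ctype.h>
#include <stdio.h>

/* Returns 1 if s is a valid identifier : letter (letter | digit)*.
   State 0 : start state, state 1 : inside identifier (accepting). */
int is_identifier(const char *s)
{
    int state = 0;
    for (; *s != '\0'; s++) {
        switch (state) {
        case 0:                                    /* start state */
            if (isalpha((unsigned char)*s)) state = 1;
            else return 0;                         /* no matching edge : reject */
            break;
        case 1:                                    /* accepting state */
            if (!isalnum((unsigned char)*s)) return 0;
            break;
        }
    }
    return state == 1;                             /* accept iff we ended in state 1 */
}

int main(void)
{
    printf("%d %d %d\n", is_identifier("rate"), is_identifier("r2d2"), is_identifier("2fast"));
    return 0;                                      /* prints : 1 1 0 */
}

Each case corresponds to a circle (state) of the diagram, and each if-test to one of its outgoing edges.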
5. FLEX
FLEX generates a lexical analyzer in C or C++ from an input specification. Flex (fast lexical analyzer generator)
is an alternative to Lex. It is frequently used with the free Bison parser generator.
COMPILER DESIGN
(MODULE 4- INTRODUCTION TO SYNTAX ANALYSIS)
4.1 : PARSING & ITS TYPES
Parsing takes tokens as input and generates a parse tree as output. The main aim of parsing is to check
whether the source code written by the programmer is syntactically correct.
There are two types of parsing
(1) Top down parsing (2) Bottom up parsing
TOP DOWN PARSING vs BOTTOM UP PARSING
1. Top-down parsing starts evaluating the parse tree from the top (the root) and moves downwards to parse the
other nodes, whereas bottom-up parsing starts evaluating the parse tree from the lowest level (the leaves) and
moves upwards.
2. Top-down parsing searches for a production rule to be used to construct a string, whereas bottom-up parsing
searches for a production rule to be used to reduce a string.
3. Top-down parsing uses leftmost derivation, whereas bottom-up parsing uses rightmost derivation (in reverse).
4.2 : PARSING & ITS TYPES
(1) Top down parsing
(1.1) Recursive descent parsing/parser : It is a common form of top-down parsing. It is called recursive
as it uses recursive procedures to process the input. Recursive descent parsing suffers from backtracking.
Backtracking means that, if one derivation of a production fails, the syntax analyzer restarts the process using
different rules of the same production.
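A minimal recursive-descent sketch in C for the <Sum> grammar from Module 3 (an illustration only; it accepts
the input and reports the first error, with no semantic actions) : each nonterminal of the grammar becomes one
procedure that consumes input and calls the procedures for the nonterminals on its right-hand side.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *p;                    /* current input position */

static void error(void) { printf("syntax error at '%c'\n", *p); exit(1); }

static void sum(void);                   /* forward declaration */

/* <Operand> = <number> / ( <Sum> ) */
static void operand(void)
{
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;   /* consume the number */
    } else if (*p == '(') {
        p++;                                      /* consume '(' */
        sum();                                    /* recurse into <Sum> */
        if (*p != ')') error();
        p++;                                      /* consume ')' */
    } else {
        error();
    }
}

/* <Operator> = + / - */
static void operator_(void)
{
    if (*p == '+' || *p == '-') p++;
    else error();
}

/* <Sum> = <Operand> <Operator> <Operand> */
static void sum(void)
{
    operand();
    operator_();
    operand();
}

int main(void)
{
    p = "7+(5-2)";
    sum();
    if (*p == '\0') printf("accepted\n");         /* prints : accepted */
    else error();
    return 0;
}

Note that this particular grammar needs no backtracking : the next input character alone decides which alternative
of <Operand> to take, so the sketch is purely predictive.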