CH1-1 and 1-2

Download as pdf or txt
Download as pdf or txt
You are on page 1of 34

Principle of Compiler Design

 Introduction to Compiling

 A simple one-pass compiler

 Syntax-Directed Translation and Type checking

 Run-Time Environment

 Code Optimization

 Code Generation

1
2
Session objectives
 Introduction to Compiling

 Analysis of the source program.

 Phases of the compiler

 Grouping

 Compiler construction tools

3
Introduction
 Compiler design is the field of computer science that deals with the
creation of compilers.

 The study of compiler writing touches upon programming languages,


machine architecture, language theory, algorithms, and software
engineering.

 All software which runs the computer system is written in some


particular programming language (HLL).

 High-level languages are easier for humans to read and write, but
machines don't understand them directly.

4
Introduction
 The computer can understand only the Low Level
Language i.e. machine language

 Machine code, is specific to a particular computer


architecture and is difficult for humans to work with.

 The software systems that do this compilation or


translation is nothing but called as the process of
compilation.
5
Introduction
 Compiler is a program which translates/converts/compile a
program written in one language (the source language) to an
equivalent program in other language (the target language).

 Usually the source language is a high level language like Java, C,


C++, Fortran etc. whereas the target language is machine code or
"code" that a computer's processor understands.

 The source language is optimized for humans. It is more user-


friendly, to some extent platform-independent. They are easier to
read, write, and maintain and hence it is easy to avoid errors.
6
Introduction
 Low level language a language that is close to the machine(computer)
upon which the language will run(execute).

 Object Language (sometimes called machine language) the language


of some computer, usually is not human readable(and is expressed in
bits or hex).

 Intermediate language: It is a temporary step in translation process.


Its neither particularly high nor low and is the output of translation.

 Assembly language: a language that translate almost one-to-one to


machine language, but it is in human readable form.

7
Introduction
 Ultimately, programs written in a high-level language must be
translated into machine language by a compiler.

 The target machine language is efficient for hardware but lacks


readability for human.

 An important role of the compiler is to report any errors in the


source program that it detects during the translation process.

 A compiler converts the source code into binary instructions.

8
Introduction
 Benefits of Compilers
 Abstraction: Compilers allow programmers to write code in
languages that are more intuitive and productive compared to
writing directly in machine code.
 Portability: Since the compiler translates the code for the specific
machine, the same high-level code can often be compiled to run on
different computer architectures with different machine code.
(Though some modifications might be needed)
 Error Detection: Compilers help catch errors early in the
development process, saving programmers time and effort.

9
What Is Compiler

 It is likely to perform many operations such as pre-processing,


lexical analysis, parsing, semantic analysis (syntax-directed
translation), conversion of input programs to an intermediate
representation, code optimization and code generation.

 The main purpose of compiler is to change the code written in


one language without changing the meaning of the program.

 When you execute a program which is written in HLL


programming language then it executes into two parts.
10
What Is Compiler
 Besides compilers, the principles and techniques for compiler design
are applicable to so many other domains that they are likely to be
reused many times in the career of a computer scientist.

 Today, Compilers are written using different high-level


languages(java, c, c++…).

 But the earliest compilers were written using assembly language


(FORTRAN and COBOL).

 Some times a compiler is written in the same language for which one
is writing a compiler. That is done through Bootstrapping.

11
Terminologies (compiler & interpreter)
Compiler:
 A program that translates an executable program in one
language into an executable program in another language

 we expect the program produced by the compiler to be better, in


some way, than the original.

12
Compiler

➢ Note: the target code is called by the user to


process inputs & produce outputs Input

Outp
ut

Role of compiler - recognize legal (and illegal)


programs, manage storage of all variables and code, report
any errors in the source program.
13
Interpreter

➢ Interpreter:
➢ It convert line by line from the total source code.
➢ Error will be displayed at the time of conversion.

14
Interpreter
 A program that reads an executable program and produces the
results of running that program

 Usually, this involves executing the source program in some


fashion.

 An interpreter is another common kind of language processor.


Instead of producing a target program as a translation, an
interpreter appears to directly execute the operations specified in
the source program on inputs supplied by the user.

15
Compiler Interpreter

• Compiler scans the • Interpreter translates just


entire program and one statement of the
translates the whole of it program at a time into
into machine code at machine code.
once.
• A compiler takes a lot of • An interpreter takes
time to analyze the very less time to analyze
source code. the source code.
• The overall time taken • The overall time to
to execute the process is execute the process is
much faster. much slower.

16
Compiler Interpreter
• A compiler always generates • An interpreter does not
an intermediary object code. It generate an intermediary code.
will need further linking. Hence, an interpreter is highly
Hence more memory is efficient in terms of its
needed. memory.
• A compiler generates the error • Keeps translating the program
message only after it scans the continuously till the first error
complete program and hence is confronted. If any error is
debugging is relatively harder spotted, it stops working and
while working with a hence debugging becomes
compiler. easy.
• Compliers are used by • Interpreters are used by
programming languages like C programming languages like
and C++ for example Ruby and Python for example
17
Structure of Compiler
 Compiler that maps a source program into a
semantically equivalent target program.

 There are two parts to this mapping:

 Analysis/front end
 Breaks up the source program into constituent pieces & creates an
intermediate representation of source program.

 Synthesis/ back end.


 Constructs desired target program from the intermediate representation
18
Analysis / Front end/
 Front end breaks down the source code into smaller meaningful
units called tokens.

 During analysis, the operations implied by the source program are


determined and recorded in a hierarchical structure called a tree.

 Often a special kind of tree called a syntax tree is used, in which


each node represents an operation and the children of a node
represent the arguments of the operation.

 It then uses this structure to create an intermediate representation of


the source program.

19
Analysis / Front end/
 It includes:- Lexical analysis, Syntax analysis , and Semantic
analysis

 If the analysis part detects that the source program is either


syntactically ill formed or semantically unsound, then it must
provide informative messages, so the user can take corrective
action. (structurally failed and meaning less)

 The analysis part also collects information about the source


program and stores it in a data structure called a symbol table,
which is passed along with the intermediate representation (IR) to
the synthesis part.
20
Analysis / Front end/

Semantic Analyzer
IR

 Front end
 Recognize legal code
 Report errors
 Shape code for the back end
21
Synthesis / Back end
 The synthesis part constructs the desired target program from the
intermediate representation and the information in the symbol table.

 It includes Code optimizer , code generator

Back end
 Translate IR into target machine code

 Choose instructions for each IR operation

 Decide what to keep in registers at each point

 Ensure conformance with system interfaces


In back-end, automation has been less successful here
22
The Grouping of Phases into Passes
 Phases of compiler deals with the logical organization of a compiler.

 In an implementation, activities from several phases may be grouped


together into a pass that reads an input file and writes an output file.

 For example, the front-end phases of lexical analysis, syntax analysis


and semantic analysis, might be grouped together into one pass.

 Code optimization might be an optional pass. Then there could be a


back-end pass consisting of code generation for a particular target
machine.

23
The Grouping of Phases into Passes
 A “pass” is a complete traversal of the source program, or a complete traversal
of some internal representation of the source program (such as an AST(abstract
syntax tree)).

 A pass can correspond to a “phase” but it does not have to!

 Sometimes a single pass corresponds to several phases that are interleaved in


time.

 What and how many passes a compiler does over the source program is an
important design decision.

 Generally, there are three type of compiler pass


 Single pass
 Two pass
 Multi pass
24
Single Pass Compiler
 A one pass/single pass compiler is that type of compiler that passes through
the part of each compilation unit exactly once.

 A single pass compiler makes a single pass over the source text, parsing,
analyzing, and generating code all at once.

 All 6 phases of compiler are grouped in a single module.

 Single pass compiler is faster and smaller.

 Less efficient in comparison with multi-pass compiler.

 When we merge all the phases of compiler design in a single module, then
it is called a single pass compiler. In the case of a single pass compiler, the
source code converts into machine code.
25
Single Pass Compiler

26
Single Pass Compiler
 Dependency diagram of a typical Single Pass Compiler:

27
Single Pass Compiler
 Advantages:
 Speed: One-pass compilers can potentially be faster than their multi-pass counterparts
because they avoid the overhead of loading and unloading the code from memory for each
pass.
 Simplicity: The design of a one-pass compiler can be simpler as it doesn't need to manage
multiple stages and intermediate code representations.

 Disadvantages:
 Limited Optimization: Due to the single-pass nature, one-pass compilers often struggle
with complex optimizations.
 Less Error Detection: Since all the analysis happens simultaneously, one-pass compilers
might have limitations in catching certain errors that require understanding the context of
the entire program.
28
Two Pass Compiler
 The Processor that runs through the program to be translated twice is
considered a two-pass compiler

IR

29
Two Pass Compiler
 Advantages:
 Better Error Detection: By separating analysis and generation, two-pass compilers can catch
errors that depend on the context of the entire program, which might be missed in a single pass.
 Improved Optimization: The second pass can analyze the entire program structure and use this
information for more effective code optimization techniques.
 Flexibility: The use of an intermediate representation allows for some decoupling between the
front-end (analysis) and back-end (code generation) stages. This can be useful for targeting
different machine architectures or implementing optimizations specific to the target machine.

 Disadvantage:
 Slower Compared to One-Pass: Two scans can be slightly slower than a single pass, especially
for smaller programs. This slowdown is often negligible compared to the benefits.
 Increased Complexity: The design of a two-pass compiler can be more complex due to
managing separate stages and potentially intermediate code.

30
Multi Pass Compiler
 A program’s source code or syntax tree is processed many times by the
multi pass compiler.

 It breaks down a huge program into numerous smaller programs and runs
them all at the same time.

 It creates a number of intermediate codes, All of these multi passes use the
previous phase’s output as an input.

 As a result, it necessitates less memory. ‘Wide Compiler’ is another name


for it.

31
Multi Pass Compiler

32
Multi Pass Compiler
 A multi pass compiler makes several passes over the program. The output
of a preceding phase is stored in a data structure and used by subsequent
phases. Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver

calls calls
calls

Syntactic Analyzer Contextual Analyzer Code Generator


input input
input output output output

Source Text AST Decorated AST Object Code

33
33
Multi Pass Compiler
 Advantages:
 Improved Optimization: By having separate passes, compilers can perform more sophisticated optimizations
across the entire program. Each pass can take advantage of information gathered in previous passes.
 Better Error Detection: Separating analysis from code generation allows for more comprehensive error
checking. Errors that rely on understanding the whole program structure can be effectively identified.
 Modular Design: The division into passes promotes a modular design, making the compiler easier to develop,
maintain, and potentially extend with new functionalities in future passes.
 Flexibility for Different Targets: The use of intermediate code allows for targeting various machine
architectures by implementing machine-specific optimizations in later passes.

 Disadvantage:
 Slower Execution: Compared to one-pass compilers, multi-pass compilers can be slower due to the overhead of
loading and unloading the code for each pass. However, modern compilers optimize this process significantly.
 Increased Complexity: Designing a multi-pass compiler can be more complex due to managing multiple
stages, intermediate code representations, and ensuring proper information flow between passes.

34
34

You might also like