Introduction To Compiler Design

Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

Unit-01

Introduction to Compiler Design

Syllabus:

Language Processor
The Structure of a Compiler
The evolution of Programming Languages
Applications of a Compiler Technology
Programming Language Basics

Language Processor:
Q. Explain a language processing system with a block diagram. -8M, Dec09/Jan10.

The prominent language processing systems are,


1. Interpreter &
2. Compiler.
Interpreter: An interpreter is a language processor, which read a program in one language
(source language) and translates it into an equivalent program in another language (target
language) line by line (Instruction by Instruction). Interpreter appears to directly execute
the operations specified in the source program on inputs supplied by the user, as shown
below,

Compiler: A compiler is a language processor that can read a program in one language
(source language) and translates it into an equivalent program in another language (target
language), as shown below,

If the target program is an executable machine-language program, it can then be called by


the user to process input and produce outputs, as shown below,

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 1

Unit-01

Introduction to Compiler Design

The machine-language target program produced by a compiler is usually much faster than
an interpreter at mapping inputs to outputs.
An interpreter can give a better error diagnostic than a compiler, because it executes the
source program statement by statement.
In addition to a compiler, several other programs may be required to create an executable
target program, as shown below,
Source Program
Preprocessor
Modified source program
Compiler
Target assembly program
Assembler
Relocatable machine code
Linker/Loader

Library files
Relocatable object files

Target Machine code

The source program may be divided into modules, stored in separate files. The task of
collecting the source program is sometimes entrusted to a separate program, called a
preprocessor.
The modified source program is then fed to a compiler. The compiler may produce an
assembly-language program as its output.
The assembly language is then processed by a program called an assembler that produces
relocatable machine code as its output.

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 2

Unit-01

Introduction to Compiler Design

Large programs are often compiled in pieces, so the relocatable machine code may have
to be linked together, with relocatable object files and library files into the code that
actually runs on the machine.
The linker resolves external memory addresses, where the code in one file may refer to a
location in another file.
The loader then puts together all the executable object files into memory for execution.

The Structure of a Compiler:


Q. Give the general structure of a compiler. Show the working of different phases of a compiler
taking an example OR Explain the various phases of a compiler. Show the translation for an
assignment statement position = initial+rate*60; clearly indicate the output of each phase OR
With a neat diagram, explain the different phases of compilation. -8M, 10M, 12M -June 2012,
Dec 2011, June/July 2011, May/June 2010, June/July 2009.
Q. What are the different phases of a compiler? Explain each of the phases with example. -12M,
Dec 09/Jan 10.

We known that, a compiler maps a source program into a semantically equivalent target
program. There are two parts of mapping:
Analysis Part, and
Synthesis Part.
The analysis part breaks up the source program into constituent pieces and imposes a
grammatical structure on them.
It then uses this structure to create an intermediate representation of the source program.
If the analysis part detects that the source program is either syntactically ill formed or
semantically unsound, then it must provide informative message, so that the user can take
corrective action.
The analysis part also collects information about the source program and stores it in a
data structure called a symbol table.
The synthesis part constructs the desired target program from the intermediate
representation and from the information in the symbol table.
The analysis part is often called the front end of the compiler and the synthesis part is
often called as back end of the compiler.
The compilation process operates as a sequence of phases, each of which transforms one
representation of the source program to another. A typical decomposition of a compiler
into phases is as shown below,

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 3

Unit-01

Introduction to Compiler Design

The symbol table, which stores information about the entire source program, is used by
all phases of the compiler.
Some machines have a machine independent optimization phase between the front end
and the back end. The purpose of this optimization phase is to perform transformations
on the intermediate representation, so that the back end can produce better target
program.
But, optimization phases shown in the above figure are optional.
For example, the translation for an assignment statement,
position = initial+rate*60
the phases of a compiler produces an output as shown below,

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 4

Unit-01

Introduction to Compiler Design

Lexical Analysis:
Q. Write a short note on Lexical Analysis or Scanning or Lexical Analyzer or Scanner.

The first phase of a compiler is called lexical analysis or scanning.


The lexical analysis reads the streams of characters making up the source program and
groups the characters into meaningful sequences called lexemes.
For each lexemes, the lexical analyzer produces as output a token of the form,
<token-name, attribute-value>
In the token, the first component token-name is an abstract symbol that is used during
syntax analysis.

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 5

Unit-01

Introduction to Compiler Design

The second component, attribute-value points to an entry in the symbol table for this
token.
For example, the translation for an assignment statement,
position = initial+rate*60
the lexical analyzer of a compiler produces an output as shown below,
position = initial + rate * 60
Lexical Analyzer

<id,1> <=> <id,2> <+> <id,3> <*> <60>


Where,

id is an abstract symbol standing for identifier,


1,2,3 is an symbol table entry for an identifiers position, initial and rate
respectively,
=, +, * are the lexemes mapped to the tokens <=>, <+> and <*> respectively,
60 is a lexeme that is mapped to a token <60>.
The symbol-table entry for an identifier holds information about the identifier, such as its
name and type.
1 position .
.
2 initial
.
3 Rate

Syntax Analysis:
Q. Write a short note on Syntax Analysis or Parsing or Syntax Analyzer or Parser.

The second phase of the compiler is syntax analysis or parsing.


The parser uses the first components of the tokens produced by the lexical analyzer to
create a tree-like intermediate representation that depicts the grammatical structure of the
token stream.
A typical representation of a syntax-tree in which each interior node represents an
operation and the children of the node represent the arguments of the operation.
A syntax tree for the token stream produced by a lexical analyzer for,
position = initial + rate * 60
is as shown below,

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 6

Unit-01

Introduction to Compiler Design


<id,1> <=> <id,2> <+> <id,3> <*> <60>
Syntax Analyzer
=
<id,1>
+
<id,2>

*
<id,3>

60

The subsequent phases of the compiler use the grammatical structure to help analyze the
source program and generate the target program.

Semantic Analysis:
Q. Write a short note on Semantic Analysis or Semantic Analyzer.

The semantic analyzer uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language definition.
It also gathers type information and saves it in either the syntax tree or the symbol table,
for the subsequent use during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler checks that
each operator has matching operands.
The language specification may permit some type conversions called coercions.
Suppose that position, initial and rate have been declared to be floating-point numbers
and the lexeme 60 by itself forms a integer in the following statement,
position = initial + rate * 60
The type checker in the semantic analyzer discovers that the operator * is applied to a
floating-point number rate and an integer 60.
In this case, the integer may be converted into a floating-point number, as shown below,

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 7

Unit-01

Introduction to Compiler Design


=
<id,1>
+
<id,2>

*
<id,3>

60

Semantic Analyzer
=
<id,1>
+
<id,2>

*
<id,3>

inttofloat
60

Notice that the output of the semantic analyzer has an extra node for the operator
inttofloat, which explicitly converts its integer argument into a floating-point number.

Intermediate Code Generation:


Q. Write a short note on Intermediate Code Generation.

After syntax and semantic analysis of the source program, many compilers generate an
explicit low-level or machine-like intermediate representation.
This intermediate representation should have two important properties:
1. It should be easy to produce, and
2. It should be easy to translate into the target program.
Intermediate code generation considers a form called as three-address code, which
consists of a sequence of assembly-like instructions with three operands per instruction.
Each operand can act like a register.
An intermediate code generation for the syntax tree produced by a semantic analyzer for,
position = initial + rate * 60
is as shown below,

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 8

Unit-01

Introduction to Compiler Design


=
<id,1>
+
<id,2>

*
<id,3>

inttofloat
60

Intermediate Code Generator

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
There are several points worth noting about three-address instructions,
First, each three-address assignment instructions has at most one operator on the
right side.
Second, the compiler must generate a temporary name to hold the value computed
by a three-address instruction.
Third, some three-address instructions have fewer than three operands.

Code Optimization:
Q. Write a short note on Code Optimization.

The machine-independent code-optimization phase attempts to improve the intermediate


code so that better target code will result.
Usually, better means faster, but other objectives may be desired, such as shorter code or
target code that consumes less power.
An optimized code for the three-address form produced by a intermediate code generator
for,
position = initial + rate * 60
is as shown below,

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 9

Unit-01

Introduction to Compiler Design


t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Code Optimizer

t1 = id3 * 60.0
id1 = id2 + t1
The optimizer can deduce that the conversion of 60 from integer to floating point can be
done once for all at compile time, so the inttofloat operation can be eliminated by
replacing the integer 60 by the floating-point number 60.0.
Moreover, t3 is used only once to transmit its value to id1, so the optimizer can transform
that by a shorter sequence.

Code Generation:
Q. Write a short note on Code Generation.

The code generator takes as input an intermediate representation of the source program
and maps it into the target language.
If the target language is machine code, registers or memory locations are selected for
each of the variables used by the program. Then, the intermediate instructions are
translated into sequences of machine instructions that perform the same task.
Using registers R1 and R2 the intermediate code produce by a code optimizer for,
position = initial + rate * 60
is translated into machine instructions as shown below,
t1 = id3 * 60.0
id1 = id2 + t1
Code Generator

LDF
MULF
LDF
ADDF
STF

R2, id3
R2, R2, #60.0
R1, id2
R1, R1, R2
id1, R1

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 10

Unit-01

Introduction to Compiler Design

The first operand of each instruction specifies a destination.


The F in each instruction tells that it deals with floating-point numbers.
The # signifies that 60.0 is to be treated as the immediate constant.

Symbol-Table Management:
Q. Write a short note on Symbol-Table or Symbol-Table Management.

An essential function of a compiler is to record the variable names used in the source
program and collect information about various attributes of each name.
These attributes may provide information about the storage allocated for a name, its type,
its scope and in case of procedure names, such things as the number and types of its
arguments, the method of passing each argument and the type returned.
The symbol table is a data structure containing a record for each variable name, with
fields for the attributes of the name.
The data structure should be designed to allow the compiler to find the record for each
name quickly and to store or retrieve data from that record quickly.

The Grouping of Phases into Passes:


Q. Distinguish between Phases and Passes OR Briefly explain a strategy to reduce the number
of passes. -4M, June/July 08, July/Aug 05, Dec 06/Jan 07.

The discussion of phases deals with the logical organization of a compiler.


In an implementation, activities from several phases may be grouped together into a pass
that reads an input file and writes an output file.
For example, front-end phases of lexical analysis, syntax analysis, semantic analysis and
intermediate code generation might be grouped into one pass. Code optimization may be
an optional pass.
There could be a back-end pass consisting of code generation for a particular target
machine.

Compiler-Construction Tools:
Q. Mention and explain briefly the existing compiler-construction tools.

Some commonly used compiler-construction tools include:


1. Parser Generators that automatically produce a grammatical description of a
programming languages.
2. Scanner Generator that produce a regular-expression description of the tokens of
a language.

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 11

Unit-01

Introduction to Compiler Design


3. Syntax-directed Translation that produce collection of routines for walking a
parse tree and generating intermediate code.
4. Code-generator Generators that produce a code generator from a collection of
rules for translating each operation of the intermediate language into the machine
language for a target machine.
5. Data-flow analysis engine that facilitate the gathering of information about how
values are transmitted from one part of a program to each other part. Data-flow
analysis is a key part of code optimization.
6. Compiler-construction toolkits that provide an integrated set of routines for
constructing various phases of a compiler.

The Evolution of Programming Languages:


Q. Explain in brief the evolution of programming languages.

The first electronic computers appeared in the 1940s.


Thus, based on the generation, programming languages evolution can be classified as
follows:
1. First Generation Languages - Machine languages (1940s): These languages were
programmed by sequences of 0s and 1s that explicitly told the computer what
operations to execute and in what order. This kind of programming was slow,
tedious, and error prone. And once written, the programs were hard to understand
and modify.
2. Second Generation Languages Assembly Languages (early 1950s): Instructions
in assembly languages were just mnemonic representation of machine
instructions. Later macro instructions were added, so that a programmer could
define parameterized shorthands for frequently used sequences of machine
instructions.
3. Third Generation Languages High Level Languages (late 1950s): High-level
programming languages, such as Fortran for scientific computation, Cobol for
business data processing and Lips for symbolic computation were developed to
create high-level notations with which programmers could more easily write
numerical computations, business applications and symbolic programs.
4. Fourth Generation Languages Designed for Specific Applications: These
languages were designed for specific applications like NOMAD for report
generation, SQL for database queries.
5. Fifth Generation Languages: These languages have been applied to logic and
constraint-based languages like Prolog and OPS5.
Another classification of programming languages uses following terms:
1. Imperative Languages: These languages are one in which a program specifies how
a computation is to be done. E.g., C, C++, Java, C# etc.

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 12

Unit-01

Introduction to Compiler Design


2. Declarative Languages: These languages are one in which a program specifies
what computation is to be done. E.g., Prolog, ML etc.

Applications of Compiler Technology:


Q. Explain about applications of compiler technology.

The major applications of compiler technology are as follows:


1. Implementation of High-level Programming Languages: A high-level
programming language defines a programming abstraction, the programmer
expresses an algorithm using the languages and the compiler must translate that
program to the target language.
2. Optimization of Computer Architecture: The rapid evolution of computer
architecture has also led to an insatiable demand for new compiler technology.
Almost all high-performance systems take advantage of the same two basic
techniques: parallelism and memory hierarchies.
3. Design of New Computer Architectures: In the early days of computer
architecture design, compilers were developed after the machines were built. One
of the best known examples of how compilers influenced the design of computer
architecture was the invention of the RISC (Reduced Instruction-Set Computer).
4. Program Translations: The following are some of the important applications of
program-translation techniques,
a. Binary Translation: Compiler technology can be used to translate the binary
code for one machine to that of another, allowing a machine to run
programs originally compiled for another instruction set.
b. Hardware Synthesis: Not only is most software written in high-level languages,
even hardware designs are mostly designed in
hardware
description
languages like Verilog and VHDL.
c. Database Query Interpreters: Besides specifying software and hardware
languages compilers are useful in many other applications. E.g., Query
languages especially SQL are used to search databases.
d. Compiled Simulation: Simulation is general technique used in many scientific
and engineering disciplines to understand a phenomenon or to validate a
design. Compiled simulation can run orders of magnitude faster than an
interpreter-based approach. Compiled simulation is used in many state-ofthe-art tools that simulate designs written in Verilog or VHDL.

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 13

Unit-01

Introduction to Compiler Design


5. Software Productivity Tools:
Q. Discuss three types of software productivity tools. -6M, Dec 2012.
The three types of software productivity tools are,
i.
Type-Checking: is an effective and well-established technique to
catch inconsistencies in programs. It can be used to check errors.
For example, whether an operation is applied to the wrong type of
object or if parameters passed to a procedure do not match the
signature of the procedure.
ii.
Bounds Checking: It is easier to make mistakes when
programming in a lower-level language than a higher-level one.
For example, many security breaches in systems are caused by
buffer overflows in programs written in C. Because C does not
have array bound check, it up to the user to ensure that the arrays
are not accessed out of bounds.
iii. Memory-Management Tools: Garbage collection is another
excellent example of the tradeoff between efficiency and a
combination of ease of programming and software reliability.
Automatic memory management obliterates all memory
management errors.

Programming Language Basics:


The Static/Dynamic Distinction:
Q. Define static and dynamic scoping. Explain the working and output of the following
programming segment if scoping used is static and dynamic:

-4M, Dec 2010.

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 14

Unit-01

Introduction to Compiler Design

Static Scoping: If a language uses a policy that allows the compiler to decide an issue,
then the language is said to use static policy. A language uses static scope or lexical
scope if it is possible to determine the scope of a declaration by looking only at the
program.
Dynamic Scoping: If a language only allows a decision to be made when we execute the
program, then the language is said to use dynamic policy. A language uses dynamic
scope or run-time scope if it is not possible to determine the scope of a declaration by
looking only at the program.
For example, consider the following program segment,

The output of the programming segment is as follows,


The first declaration of b is a global and hence when the procedure P is called it prints
the value of b as true. But the second declaration of b as false is a local declaration
and the output of procedure call P is decided at runtime by the programming construct.

Prepared By: Navile Nageshwara Naveen, Assistant Professor, Dept of CSE, CIT-Ponnampet

Page 15

You might also like