Unit 1 PDF
Unit 1 PDF
ENGINEERING COLLEGE
Compiler Construction
COURSE OBJECTIVES:
➢ To introduce the steps in language translation pipeline and
runtime data structures used in translation
➢ To learn about Scanning (lexical analysis) process using
regular expressions and use of LEX to generate scanner
➢ To introduce different Parsing strategies including top-
down (e.g., recursive descent, Early parsing, or LL) and
bottom-up (e.g., backtracking or LR) techniques
➢ Describe semantic analyses using an attribute grammar
➢ To learn how to build symbol tables and generate
intermediate code.
➢ To introduce techniques of program analysis and code
optimization
9/25/2020 Compiler Construction 2
MATRUSRI
ENGINEERING COLLEGE
Compiler Construction
COURSE OUTCOMES:
After completing this course, the student will be able to
1. Create lexical rules and grammars for a given language
2. Generate scanners and parsers from declarative
specifications.
3. Describe an abstract syntax tree for a small language.
4. Use program analysis techniques for code optimization
5. Develop the compiler for a subset of a given language
9/25/2020 Compiler Construction 3
MATRUSRI
ENGINEERING COLLEGE
UNIT-I
Introduction: Compilers
The translation process,
Data structures and issues in compiler structure,
Bootstrapping and Porting.
Scanning: The scanning process
Regular expressions,
Finite Automata,
Regular expressions to DFA‘s,
use of LEX to generate scanner.
INTRODUCTION:
Compilers
The Translation Process
OUTCOMES:
1. Identify the basic concepts needed for the development of a compiler
2. Analyze the various phases and Tools of a Compiler
MODULE-I
CONTENTS:
Compilers
The Translation Process
OUTCOMES:
Identify the basic concepts needed for the development of a compiler
Analyze the various phases and Tools of a Compiler
Introduction
What is a compiler?
Assemblers
• A translator for the assembly language of a
particular computer
• Assembly language is a symbolic form of
one machine language
• A compiler may generate assembly
language as its target language and an
assembler finished the translation into
object code
Linkers
• Collect separate object files into a directly
executable file
• Connect an object program to the code for
standard library functions and to resource
supplied by OS
• Becoming one of the principle activities of a
compiler, depends on OS and processor
Loaders
• Resolve all re-locatable address relative to a
given base
• Make executable code more flexible
• Often as part of the operating environment,
rarely as an actual separate program
Preprocessors
• Delete comments, include other files, and
perform macro substitutions
• Required by a language (as in C) or can be
later add-ons that provide additional
facilities
Editors
• Compiler have been bundled together with
editor and other programs into an
interactive development environment (IDE)
• Oriented toward the format or structure of
the programming language, called structure-
based
• May include some operations of a compiler,
informing some errors
Debuggers
• Used to determine execution error in a
compiled program
• Keep tracks of most or all of the source
code information
• Halt execution at pre-specified locations
called breakpoints
• Must be supplied with appropriate symbolic
information by the compiler
9/25/2020 Compiler Construction 18
MATRUSRI
ENGINEERING COLLEGE
5.Code Optimization
6. Code Generation Synthesis
Intermediate Code Ge
Intermediate code
Error
Code Optimizer Handler
m/c level code
Code Generator
9/25/2020 Compiler Construction 21
Target code
MATRUSRI
ENGINEERING COLLEGE
The Scanner
• Lexical analysis: it collects sequences of characters into
meaningful units called tokens
• An example: a[index]=4+2
• a identifier
• [ left bracket
• index identifier
• ] right bracket
• = assignment
• 4 number
• + plus sign
• 2 number
The Parser
• Syntax analysis: it determines the structure
of the program
• The results of syntax analysis are a parse
tree or a syntax tree
• An example: a[index]=4+2
– Parse tree
– Syntax tree ( abstract syntax tree)
Assign-expression
expression = expression
subscript-expression additive-expression
Assign-expression
subscript-expression additive-expression
subscript-expression additive-expression
integer integer
subscript-expression additive-expression
integer integer
subscript-expression
integer
t = 4 + 2
a[index]=t
t= 6
a[index]=t
a[index]=6
MODULE-II
CONTENTS
Data structures and issues in compiler structure
Bootstrapping and Porting
OUTCOMES
Understand the Bootstrapping and Porting
Passes
• The repetitions to process the entire source program before
generating code are referred as passes.
• Passes may or may not correspond to phases
– A pass often consists of several phases
– A compiler can be one pass, which results in efficient compilation
but less efficient target code
– Most compilers with optimization use more than one pass
• One Pass for scanning and parsing
• One Pass for semantic analysis and source-level optimization
• The third Pass for code generation and target-level optimization
Error Handling
• Static (or compile-time) errors must be reported
by a compiler
– Generate meaningful error messages and resume
compilation after each error
– Each phase of a compiler needs different kind of error
handing
• Exception handling
– Generate extra code to perform suitable runtime tests to
guarantee all such errors to cause an appropriate event
during execution.
• Machine language
Third – compiler to execute immediately;
• Another language with existed compiler on the same target
Language for machine : (First Scenario)
– Compile the new compiler with existing compiler
Compiler • Another language with existed compiler on different
machine : (Second Scenario)
Construction – Compilation produce a cross compiler
Process of Bootstrapping
• Write a compiler in the same language
S T
S
• No compiler for source language yet
• Porting to a new host machine
• Original compiler
• Compiler source code retargeted to K
• Result in Cross Compiler
• Cross compiler
• Compiler source code retargeted to K
• Result in Retargeted Compiler
MODULE-III
CONTENTS
Scanning
The scanning process
Regular expressions
OUTCOMES
Understand the relation between types of languages and types of finite
automata
Understanding the Context free languages and grammars
source tokens
Scanner Parser
Tokens
• Idea: we want a distinct token kind (lexical
class) for each distinct terminal symbol in
the programming language
– Examine the grammar to find these
• Some tokens may have attributes
– Examples: integer constant token will have the
actual integer (17, 42, ?) as an attribute;
identifiers will have a string with the actual id
Regular expressions
• Ɛ is a regular expression, L(Ɛ) = {Ɛ}
• If a is a symbol in ∑then a is a regular expression,
L(a) = {a}
• (r) | (s) is a regular expression denoting the
language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the
language L(r)L(s)
• (r)* is a regular expression denoting (L9r))*
• (r) is a regular expression denting L(r)
9/25/2020 Compiler Construction 67
MATRUSRI
ENGINEERING COLLEGE
Regular definitions
letter_ -> A | B | … | Z | a | b | … | Z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Extensions
• One or more instances: (r)+
• Zero of one instances: r?
• Character classes: [abc]
• Example:
– letter_ -> [A-Za-z_]
– digit -> [0-9]
– id
9/25/2020
-> letter_(letter|digit)*
Compiler Construction 69
MATRUSRI
ENGINEERING COLLEGE
Recognition of tokens
• Starting point is the language grammar to
understand the tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt
|Ɛ
expr -> term relop term
| term
term -> id
9/25/2020 | number Compiler Construction 70
MATRUSRI
ENGINEERING COLLEGE
Transition diagrams
• Transition diagram for relop
lex.yy.c
C a.out
compiler
declarations
%%
translation rules Pattern {Action}
%%
auxiliary functions
MODULE-IV
CONTENTS
Finite Automata
Regular expressions to DFA‘s
Use of LEX to generate scanner
OUTCOMES
Understand basic properties of formal languages and formal grammars.
And basic properties of deterministic and nondeterministic finite automata
Finite Automata
• Regular expressions = specification
• Finite automata = implementation
• A finite automaton consists of
– An input alphabet
– A set of states S
– A start state n
– A set of accepting states F S
– A set of transitions state →input state
Finite Automata
• Transition
s1 →a s2
• Is read
In state s1 on input “a” go to state s2
• If end of input
– If in accepting state => accept, othewise =>
reject
• If no transition possible
9/25/2020 => reject
Compiler Construction 77
MATRUSRI
ENGINEERING COLLEGE
• An accepting state
a
• A transition
A Simple Example
• A finite automaton that accepts only “1”
1
0 0
1
1
Epsilon Moves
• Another kind of transition: -moves
A B
Deterministic and
Nondeterministic Automata
• Deterministic Finite Automata (DFA)
– One transition per input per state
– No -moves
• Nondeterministic Finite Automata (NFA)
– Can have multiple transitions for one input in a
given state
– Can have -moves
• Finite automata have finite memory
9/25/2020 Compiler Construction 84
– Need only to encode the current state
MATRUSRI
ENGINEERING COLLEGE
Acceptance of NFAs
• An NFA can get into multiple states
1
0 1
• Input: 1 0 1
1 0
0 0
DFA
1
1
Regular
expressions DFA
Lexical Table-driven
Specification Implementation of DFA
• For
• For input a
a
• For A | B
B
A
A
C
1
E
1
A B
D
0 F
G H I J
Next
NFA
Regular
expressions DFA
Lexical Table-driven
Specification Implementation of DFA
C 1 E
1
0
A B G H I J
D F
0
0 FGABCDHI
0 1
ABCDHI
1
1 EJGABCDHI
Implementation
• A DFA can be implemented by a 2D table T
– One dimension is “states”
– Other dimension is “input symbols”
– For every transition Si →a Sk define T[i,a] = k
• DFA “execution”
– If in state Si and input a, read T[i,a] = k and
skip to state Sk
– Very efficient
9/25/2020 Compiler Construction 98
MATRUSRI
ENGINEERING COLLEGE
0 1
S T U
T T U
U T U
Implementation (Cont.)
• NFA -> DFA conversion is at the heart of
tools such as flex or jflex