0% found this document useful (0 votes)
190 views100 pages

Unit 1 PDF

The document discusses the objectives and outcomes of a Compiler Construction course. It aims to introduce students to the different phases of compiler development including scanning, parsing, semantic analysis, code generation and optimization. The course will teach students how to build scanners and parsers using tools like Lex and Yacc. By the end of the course, students will be able to develop a compiler for a small programming language.

Uploaded by

Sai Krishna
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
190 views100 pages

Unit 1 PDF

The document discusses the objectives and outcomes of a Compiler Construction course. It aims to introduce students to the different phases of compiler development including scanning, parsing, semantic analysis, code generation and optimization. The course will teach students how to build scanners and parsers using tools like Lex and Yacc. By the end of the course, students will be able to develop a compiler for a small programming language.

Uploaded by

Sai Krishna
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 100

MATRUSRI

ENGINEERING COLLEGE

MATRUSRI ENGINEERING COLLEGE


DEPARTMENT OF COMPUTER SCIENCE AND
ENGINEERING

SUBJECT NAME: Compiler Construction

FACULTY NAME: L. Raghavendar Raju, Assistant Professor

9/25/2020 Compiler Construction 1


MATRUSRI
ENGINEERING COLLEGE

Compiler Construction

COURSE OBJECTIVES:
➢ To introduce the steps in language translation pipeline and
runtime data structures used in translation
➢ To learn about Scanning (lexical analysis) process using
regular expressions and use of LEX to generate scanner
➢ To introduce different Parsing strategies including top-
down (e.g., recursive descent, Early parsing, or LL) and
bottom-up (e.g., backtracking or LR) techniques
➢ Describe semantic analyses using an attribute grammar
➢ To learn how to build symbol tables and generate
intermediate code.
➢ To introduce techniques of program analysis and code
optimization
9/25/2020 Compiler Construction 2
MATRUSRI
ENGINEERING COLLEGE

Compiler Construction

COURSE OUTCOMES:
After completing this course, the student will be able to
1. Create lexical rules and grammars for a given language
2. Generate scanners and parsers from declarative
specifications.
3. Describe an abstract syntax tree for a small language.
4. Use program analysis techniques for code optimization
5. Develop the compiler for a subset of a given language
9/25/2020 Compiler Construction 3
MATRUSRI
ENGINEERING COLLEGE

UNIT-I
Introduction: Compilers
The translation process,
Data structures and issues in compiler structure,
Bootstrapping and Porting.
Scanning: The scanning process
Regular expressions,
Finite Automata,
Regular expressions to DFA‘s,
use of LEX to generate scanner.

INTRODUCTION:
Compilers
The Translation Process

OUTCOMES:
1. Identify the basic concepts needed for the development of a compiler
2. Analyze the various phases and Tools of a Compiler

9/25/2020 Compiler Construction 4


MATRUSRI
ENGINEERING COLLEGE

MODULE-I

CONTENTS:
Compilers
The Translation Process

OUTCOMES:
Identify the basic concepts needed for the development of a compiler
Analyze the various phases and Tools of a Compiler

9/25/2020 Compiler Construction 5


MATRUSRI
ENGINEERING COLLEGE

Introduction
What is a compiler?

• A computer program translates one language to


another
Source Compiler Target
Program Program

• A compiler is a complex program


• From 10,000 to 1,000,000 lines of codes
• Compilers are used in many forms of computing
• Command interpreters, interface programs

9/25/2020 Compiler Construction 6


MATRUSRI
ENGINEERING COLLEGE

Brief History of Compiler


• The first compiler was developed between 1954
and 1957
– The FORTRAN language and its compiler by a team at
IBM led by John Backus
– The structure of natural language was studied at about
the same time by Noam Chomsky

9/25/2020 Compiler Construction 7


MATRUSRI
ENGINEERING COLLEGE

Brief History of Compiler


• The related theories and algorithms in the 1960s and 1970s
– The classification of language: Chomsky hierarchy
– The parsing problem was pursued:
• Context-free language, parsing algorithms
– The symbolic methods for expressing the structure of
the words of a programming language:
• Finite automata, Regular expressions
– Methods have been developed for generating efficient
object code:
• Optimization techniques or code, improvement techniques

9/25/2020 Compiler Construction 8


MATRUSRI
ENGINEERING COLLEGE

Brief History of Compiler


• Programs were developed to automate the
complier development for parsing
– Parser generators,
• such as Yacc by Steve Johnson in 1975 for the Unix
system
– Scanner generators,
• such as Lex by Mike Lesk for Unix system about
same time

9/25/2020 Compiler Construction 9


MATRUSRI
ENGINEERING COLLEGE

Brief History of Compiler


• Projects focused on automating the
generation of other parts of a compiler
– Code generation was undertaken during the late
1970s and early 1980s
– Less success due to our less than perfect
understanding of them

9/25/2020 Compiler Construction 10


MATRUSRI
ENGINEERING COLLEGE

Brief History of Compiler


• Recent advances in compiler design
– More sophisticated algorithms for inferring and/or
simplifying the information contained in program,
• such as the unification algorithm of Hindley-Milner type
checking
– Window-based Interactive Development Environment,
• IDE, that includes editors, linkers, debuggers, and project
managers.
– However, the basic of compiler design have not
changed much in the last 20 years.

9/25/2020 Compiler Construction 11


MATRUSRI
ENGINEERING COLLEGE

Programs related to Compiler


Interpreters
• Execute the source program immediately rather
than generating object code
• Examples: BASIC, LISP, used often in educational
or development situations
• Speed of execution is slower than compiled code
by a factor of 10 or more
• Share many of their operations with compilers

9/25/2020 Compiler Construction 12


MATRUSRI
ENGINEERING COLLEGE

Assemblers
• A translator for the assembly language of a
particular computer
• Assembly language is a symbolic form of
one machine language
• A compiler may generate assembly
language as its target language and an
assembler finished the translation into
object code

9/25/2020 Compiler Construction 13


MATRUSRI
ENGINEERING COLLEGE

Linkers
• Collect separate object files into a directly
executable file
• Connect an object program to the code for
standard library functions and to resource
supplied by OS
• Becoming one of the principle activities of a
compiler, depends on OS and processor

9/25/2020 Compiler Construction 14


MATRUSRI
ENGINEERING COLLEGE

Loaders
• Resolve all re-locatable address relative to a
given base
• Make executable code more flexible
• Often as part of the operating environment,
rarely as an actual separate program

9/25/2020 Compiler Construction 15


MATRUSRI
ENGINEERING COLLEGE

Preprocessors
• Delete comments, include other files, and
perform macro substitutions
• Required by a language (as in C) or can be
later add-ons that provide additional
facilities

9/25/2020 Compiler Construction 16


MATRUSRI
ENGINEERING COLLEGE

Editors
• Compiler have been bundled together with
editor and other programs into an
interactive development environment (IDE)
• Oriented toward the format or structure of
the programming language, called structure-
based
• May include some operations of a compiler,
informing some errors

9/25/2020 Compiler Construction 17


MATRUSRI
ENGINEERING COLLEGE

Debuggers
• Used to determine execution error in a
compiled program
• Keep tracks of most or all of the source
code information
• Halt execution at pre-specified locations
called breakpoints
• Must be supplied with appropriate symbolic
information by the compiler
9/25/2020 Compiler Construction 18
MATRUSRI
ENGINEERING COLLEGE

The Translation Process

9/25/2020 Compiler Construction 19


MATRUSRI
ENGINEERING COLLEGE

The phases of a compiler


Compiling is mainly done in Two phases and further
into Six phases
1. Lexical Analysis
Front
End
2. Syntax Analysis
3. Semantic Analysis Analysis

4.Intermediate Code Generator


Back
End

5.Code Optimization
6. Code Generation Synthesis

1. Symbol table Auxiliary


2. Error Handler components
9/25/2020 Compiler Construction 20
MATRUSRI
ENGINEERING COLLEGE

The Phases of a Compiler


Source code
Lexical Analyzer
Tokens
Syntax Analyzer
Syntax Tree Symbol
Semantics Analyzer
Table
Annotated Tree

Intermediate Code Ge
Intermediate code
Error
Code Optimizer Handler
m/c level code

Code Generator
9/25/2020 Compiler Construction 21
Target code
MATRUSRI
ENGINEERING COLLEGE

The Scanner
• Lexical analysis: it collects sequences of characters into
meaningful units called tokens
• An example: a[index]=4+2
• a identifier
• [ left bracket
• index identifier
• ] right bracket
• = assignment
• 4 number
• + plus sign
• 2 number

• Other operations: it may enter literals into the literal table


9/25/2020 Compiler Construction 22
MATRUSRI
ENGINEERING COLLEGE

The Parser
• Syntax analysis: it determines the structure
of the program
• The results of syntax analysis are a parse
tree or a syntax tree
• An example: a[index]=4+2
– Parse tree
– Syntax tree ( abstract syntax tree)

9/25/2020 Compiler Construction 23


MATRUSRI
ENGINEERING COLLEGE

The Parse Tree


expression

Assign-expression

expression = expression

subscript-expression additive-expression

Expression [ expression ] expression + expression

identifier identifier number number


a index 4 2

9/25/2020 Compiler Construction 24


MATRUSRI
ENGINEERING COLLEGE

The Syntax Tree

Assign-expression

subscript-expression additive-expression

identifier identifier number number


a index 4 2

9/25/2020 Compiler Construction 25


MATRUSRI
ENGINEERING COLLEGE

The Semantic Analyzer


• The semantics of a program are its “meaning”, as
opposed to its syntax, or structure, that
– determines some of its running time behaviors prior to
execution.
• Static semantics: declarations and type checking
• Attributes: The extra pieces of information
computed by semantic analyzer
• An example: a[index]=4=2
– The syntax tree annotated with attributes

9/25/2020 Compiler Construction 26


MATRUSRI
ENGINEERING COLLEGE

The Annotated Syntax Tree


Assign-expression

subscript-expression additive-expression
integer integer

identifier identifier number number


a index 4 2
array of integer integer integer integer

9/25/2020 Compiler Construction 27


MATRUSRI
ENGINEERING COLLEGE

The Source Code Optimizer


• The earliest point of most optimization steps is
just after semantic analysis
• The code improvement depends only on the
source code, and as a separate phase
• Individual compilers exhibit a wide variation in
optimization kinds as well as placement
• An example: a[index]=4+2
– Constant folding performed directly on annotated tree
– Using intermediate code: three-address code, p-code

9/25/2020 Compiler Construction 28


MATRUSRI
ENGINEERING COLLEGE

Optimizations on Annotated Tree


Assign-expression

subscript-expression additive-expression
integer integer

identifier identifier number number


a index 4 2
array of integer integer integer integer

9/25/2020 Compiler Construction 29


MATRUSRI
ENGINEERING COLLEGE

Optimizations on Annotated Tree


Assign-expression

subscript-expression
integer

identifier identifier number


a index 6
array of integer integer integer

9/25/2020 Compiler Construction 30


MATRUSRI
ENGINEERING COLLEGE

Optimization on Intermediate Code

t = 4 + 2
a[index]=t

t= 6
a[index]=t

a[index]=6

9/25/2020 Compiler Construction 31


MATRUSRI
ENGINEERING COLLEGE

The Code Generate


• It takes the intermediate code or IR and generates
code for target machine
• The properties of the target machine become the
major factor:
– Using instructions and representation of data
• An example: a[index]=4+2
– Code sequence in a hypothetical assembly
language
9/25/2020 Compiler Construction 32
MATRUSRI
ENGINEERING COLLEGE

A possible code sequence

MOV R0, index


MUL R0,2
a[index]=6 MOV R1,&a
ADD R1,R0
MOV *R1,6

9/25/2020 Compiler Construction 33


MATRUSRI
ENGINEERING COLLEGE

The Target Code Optimizer


• It improves the target code generated by the
code generator:
– Address modes choosing
– Instructions replacing
– As well as redundant eliminating
MOV R0, index
MUL R0,2 MOV R0, index
MOV R1,&a SHL R0
ADD R1,R0 MOV &a[R1],6
MOV *R1,6
9/25/2020 Compiler Construction 34
MATRUSRI
ENGINEERING COLLEGE

MODULE-II

CONTENTS
Data structures and issues in compiler structure
Bootstrapping and Porting

OUTCOMES
Understand the Bootstrapping and Porting

9/25/2020 Compiler Construction 35


MATRUSRI
ENGINEERING COLLEGE

Principle Data Structure for Communication


among Phases

TOKENS THE SYNTAX TREE


A scanner collects characters into a token, as A standard pointer-based structure generated
a value of an enumerated data type for by parser
tokens Each node represents information collect by
May also preserve the string of characters or parser or later, which maybe dynamically
other derived information, such as name of allocated or stored in symbol table
identifier, value of a number token The node requires different attributes
A single global variable or an array of tokens depending on kind of language structure,
which may be represented as variable record.
9/25/2020 Compiler Construction 36
MATRUSRI
ENGINEERING COLLEGE

• THE SYMBOL TABLE


– Keeps information associated with
identifiers: function, variable,
Principle Data constants, and data types
Structure for – Interacts with almost every phase of
compiler.
Communication – Access operation need to be constant-
among Phases time
– One or several hash tables are often
used,
• THE LITERAL TABLE
– Stores constants and strings, reducing
size of program
– Quick insertion and lookup are
essential

9/25/2020 Compiler Construction 37


MATRUSRI
ENGINEERING COLLEGE

Principle Data Structure for Communication among


Phases

INTERMEDIATE CODE TEMPORARY FILES


Kept as an array of text string, a Holds the product of intermediate steps
temporary text, or a linked list of during compiling
structures, depending on kind of Solve the problem of memory constraints
intermediate code (e.g. three-address or back-patch addressed during code
code and p-code) generation
Should be easy for reorganization

9/25/2020 Compiler Construction 38


MATRUSRI
ENGINEERING COLLEGE

Issues in Compiler Structure

9/25/2020 Compiler Construction 39


MATRUSRI
ENGINEERING COLLEGE

The Structure of Compiler

Multiple views from different angles A major impact of the structure

Logical Structure Reliability, efficiency


Physical Structure Usefulness, maintainability
Sequencing of the operations

9/25/2020 Compiler Construction 40


MATRUSRI
ENGINEERING COLLEGE

Analysis and Synthesis

The analysis part of the compiler analyzes the


source program to compute its properties
• Lexical analysis, syntax analysis and semantics analysis, as
well as optimization
• More mathematical and better understood

The synthesis part of the compiler produces


the translated codes
• Code generation, as well as optimization
• More specialized

The two parts can be changed independently


of the other

9/25/2020 Compiler Construction 41


MATRUSRI
ENGINEERING COLLEGE

Front End and Back End


• The operations of the front end depend on the
source language
– The scanner, parser, and semantic analyzer, as well as
intermediate code synthesis
• The operations of the back end depend on the
target language
– Code generation, as well as some optimization analysis
• The intermediate representation is the medium of
communication between them
• This structure is important for compiler portability

9/25/2020 Compiler Construction 42


MATRUSRI
ENGINEERING COLLEGE

Passes
• The repetitions to process the entire source program before
generating code are referred as passes.
• Passes may or may not correspond to phases
– A pass often consists of several phases
– A compiler can be one pass, which results in efficient compilation
but less efficient target code
– Most compilers with optimization use more than one pass
• One Pass for scanning and parsing
• One Pass for semantic analysis and source-level optimization
• The third Pass for code generation and target-level optimization

9/25/2020 Compiler Construction 43


MATRUSRI
ENGINEERING COLLEGE

Language Definition and compilers


• The lexical and syntactic structure of a
programming language
– regular expressions
– context-free grammar
• The semantics of a programming language
in English descriptions
– language reference manual, or language
definition.
9/25/2020 Compiler Construction 44
MATRUSRI
ENGINEERING COLLEGE

Language Definition and compilers


• A language definition and a compiler are often
developed simultaneously
– The techniques have a major impact on definition
– The definition has a major impact on the techniques

• The language to be implemented is well known


and has an existing definition
– This is not an easy task

9/25/2020 Compiler Construction 45


MATRUSRI
ENGINEERING COLLEGE

Language Definition and compilers


• A language occasionally has it semantics given by
a formal definition in mathematical term
– So-called denotational semantics in function
programming community
– Given a mathematical proof that a compiler conforms to
the definition
• The structure and behavior of the runtime
environment affect the compiler construction
– Static runtime environment
– Semi-dynamic or stack-based environment
– Fully-dynamic or heap-based environment

9/25/2020 Compiler Construction 46


MATRUSRI
ENGINEERING COLLEGE

Compiler options and interfaces


• Mechanisms for interfacing with the
operation system
– Input and output facilities
– Access to the file system of the target machine
• Options to the user for various purposes
– Specification of listing characteristic
– Code optimization options

9/25/2020 Compiler Construction 47


MATRUSRI
ENGINEERING COLLEGE

Error Handling
• Static (or compile-time) errors must be reported
by a compiler
– Generate meaningful error messages and resume
compilation after each error
– Each phase of a compiler needs different kind of error
handing
• Exception handling
– Generate extra code to perform suitable runtime tests to
guarantee all such errors to cause an appropriate event
during execution.

9/25/2020 Compiler Construction 48


MATRUSRI
ENGINEERING COLLEGE

Bootstrapping and Porting

9/25/2020 Compiler Construction 49


MATRUSRI
ENGINEERING COLLEGE

• Machine language
Third – compiler to execute immediately;
• Another language with existed compiler on the same target
Language for machine : (First Scenario)
– Compile the new compiler with existing compiler
Compiler • Another language with existed compiler on different
machine : (Second Scenario)
Construction – Compilation produce a cross compiler

9/25/2020 Compiler Construction 50


MATRUSRI
ENGINEERING COLLEGE

T-Diagram Describing Complex


Situation
• A compiler written in language H that
translates language S into language T.
S T
H
• T-Diagram can be combined in two basic
ways.

9/25/2020 Compiler Construction 51


MATRUSRI
ENGINEERING COLLEGE

The First T-diagram Combination


A B B C A C
H H H

• Two compilers run on the same machine H


– First from A to B
– Second from B to C
– Result from A to C on H

9/25/2020 Compiler Construction 52


MATRUSRI
ENGINEERING COLLEGE

The Second T-diagram Combination


A B A B
H H K K
M

• Translate implementation language of a


compiler from H to K
• Use another compiler from H to K

9/25/2020 Compiler Construction 53


MATRUSRI
ENGINEERING COLLEGE

The First Scenario


A H A H
B B H H
H

• Translate a compiler from A to H written in


B
– Use an existing compiler for language B on
machine H

9/25/2020 Compiler Construction 54


MATRUSRI
ENGINEERING COLLEGE

The Second Scenario


A H A H
B B K K
K

• Use an existing compiler for language B on


different machine K
– Result in a cross compiler

9/25/2020 Compiler Construction 55


MATRUSRI
ENGINEERING COLLEGE

Process of Bootstrapping
• Write a compiler in the same language
S T
S
• No compiler for source language yet
• Porting to a new host machine

9/25/2020 Compiler Construction 56


MATRUSRI
ENGINEERING COLLEGE

The First step in bootstrap


A H A H
A A H H
H

• “quick and dirty” compiler written in


machine language H
• Compiler written in its own language A
• Result in running but inefficient compiler

9/25/2020 Compiler Construction 57


MATRUSRI
ENGINEERING COLLEGE

The Second step in bootstrap


A H A H
A A H H
H

• Running but inefficient compiler


• Compiler written in its own language A
• Result in final version of the compiler

9/25/2020 Compiler Construction 58


MATRUSRI
ENGINEERING COLLEGE

The step 1 in porting


A K A K
A A H H
H

• Original compiler
• Compiler source code retargeted to K
• Result in Cross Compiler

9/25/2020 Compiler Construction 59


MATRUSRI
ENGINEERING COLLEGE

The step 2 in porting


A K A K
A A K K
H

• Cross compiler
• Compiler source code retargeted to K
• Result in Retargeted Compiler

9/25/2020 Compiler Construction 60


MATRUSRI
ENGINEERING COLLEGE

MODULE-III

CONTENTS
Scanning
The scanning process
Regular expressions

OUTCOMES
Understand the relation between types of languages and types of finite
automata
Understanding the Context free languages and grammars

9/25/2020 Compiler Construction 61


MATRUSRI
ENGINEERING COLLEGE

Parsing & Scanning


• In real compilers the recognizer is split into two
phases
– Scanner: translate input characters to tokens
• Also, report lexical errors like illegal characters and illegal
symbols
– Parser: read token stream and reconstruct the
derivation

source tokens
Scanner Parser

9/25/2020 Compiler Construction 62


MATRUSRI
ENGINEERING COLLEGE

Characters vs Tokens (review)


• Input text
// this statement does very little
if (x >= y) y = 42;
• Token Stream
IF LPAREN ID(x) GEQ ID(y)
RPAREN ID(y) BECOMES INT(42) SCOLON

9/25/2020 Compiler Construction 63


MATRUSRI
ENGINEERING COLLEGE

Why Separate the Scanner and


Parser?
• Simplicity & Separation of Concerns
– Scanner hides details from parser (comments,
whitespace, input files, etc.)
– Parser is easier to build; has simpler input
stream
• Efficiency
– Scanner can use simpler, faster design
• (But still often consumes a surprising amount of
the compiler’s total execution time)

9/25/2020 Compiler Construction 64


MATRUSRI
ENGINEERING COLLEGE

Tokens
• Idea: we want a distinct token kind (lexical
class) for each distinct terminal symbol in
the programming language
– Examine the grammar to find these
• Some tokens may have attributes
– Examples: integer constant token will have the
actual integer (17, 42, ?) as an attribute;
identifiers will have a string with the actual id

9/25/2020 Compiler Construction 65


MATRUSRI
ENGINEERING COLLEGE

Typical Tokens in Programming


Languages
• Operators & Punctuation
– + - * / ( ) { } [ ] ; : :: < <= == = != ! …
– Each of these is a distinct lexical class
• Keywords
– if while for goto return switch void …
– Each of these is also a distinct lexical class (not a string)
• Identifiers
– A single ID lexical class, but parameterized by actual id
• Integer constants
– A single INT lexical class, but parameterized by int value
• Other constants, etc.
9/25/2020 Compiler Construction 66
MATRUSRI
ENGINEERING COLLEGE

Regular expressions
• Ɛ is a regular expression, L(Ɛ) = {Ɛ}
• If a is a symbol in ∑then a is a regular expression,
L(a) = {a}
• (r) | (s) is a regular expression denoting the
language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the
language L(r)L(s)
• (r)* is a regular expression denoting (L9r))*
• (r) is a regular expression denting L(r)
9/25/2020 Compiler Construction 67
MATRUSRI
ENGINEERING COLLEGE

Regular definitions

letter_ -> A | B | … | Z | a | b | … | Z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*

9/25/2020 Compiler Construction 68


MATRUSRI
ENGINEERING COLLEGE

Extensions
• One or more instances: (r)+
• Zero of one instances: r?
• Character classes: [abc]

• Example:
– letter_ -> [A-Za-z_]
– digit -> [0-9]
– id
9/25/2020
-> letter_(letter|digit)*
Compiler Construction 69
MATRUSRI
ENGINEERING COLLEGE

Recognition of tokens
• Starting point is the language grammar to
understand the tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt

expr -> term relop term
| term
term -> id
9/25/2020 | number Compiler Construction 70
MATRUSRI
ENGINEERING COLLEGE

Recognition of tokens (cont.)


• The next step is to formalize the patterns:
digit -> [0-9]
Digits -> digit+
number -> digit(.digits)? (E[+-]? Digit)?
letter -> [A-Za-z_]
id -> letter (letter|digit)*
If -> if
Then -> then
Else -> else
Relop -> < | > | <= | >= | = | <>
• We also need to handle whitespaces:
ws -> (blank | tab | newline)+
9/25/2020 Compiler Construction 71
MATRUSRI
ENGINEERING COLLEGE

Transition diagrams
• Transition diagram for relop

9/25/2020 Compiler Construction 72


MATRUSRI
ENGINEERING COLLEGE

Lexical Analyzer Generator - Lex

Lex Source program Lexical lex.yy.c


lex.l Compiler

lex.yy.c
C a.out
compiler

Input stream Sequence


a.out
of tokens

9/25/2020 Compiler Construction 73


MATRUSRI
ENGINEERING COLLEGE

Structure of Lex programs

declarations
%%
translation rules Pattern {Action}
%%
auxiliary functions

9/25/2020 Compiler Construction 74


MATRUSRI
ENGINEERING COLLEGE

MODULE-IV

CONTENTS
Finite Automata
Regular expressions to DFA‘s
Use of LEX to generate scanner

OUTCOMES
Understand basic properties of formal languages and formal grammars.
And basic properties of deterministic and nondeterministic finite automata

9/25/2020 Compiler Construction 75


MATRUSRI
ENGINEERING COLLEGE

Finite Automata
• Regular expressions = specification
• Finite automata = implementation
• A finite automaton consists of
– An input alphabet 
– A set of states S
– A start state n
– A set of accepting states F  S
– A set of transitions state →input state

9/25/2020 Compiler Construction 76


MATRUSRI
ENGINEERING COLLEGE

Finite Automata
• Transition
s1 →a s2
• Is read
In state s1 on input “a” go to state s2

• If end of input
– If in accepting state => accept, othewise =>
reject
• If no transition possible
9/25/2020 => reject
Compiler Construction 77
MATRUSRI
ENGINEERING COLLEGE

Finite Automata State Graphs


• A state

• The start state

• An accepting state

a
• A transition

9/25/2020 Compiler Construction 78


MATRUSRI
ENGINEERING COLLEGE

A Simple Example
• A finite automaton that accepts only “1”
1

• A finite automaton accepts a string if we can


follow transitions labeled with the characters in
the string from the start to some accepting state
9/25/2020 Compiler Construction 79
MATRUSRI
ENGINEERING COLLEGE

Another Simple Example


• A finite automaton accepting any number of 1’s
followed by a single 0
• Alphabet: {0,1}
1

• Check that “1110” is accepted but “110…” is


not
9/25/2020 Compiler Construction 80
MATRUSRI
ENGINEERING COLLEGE

And Another Example


• Alphabet {0,1}
• What language does this recognize?
1 0

0 0

1
1

9/25/2020 Compiler Construction 81


MATRUSRI
ENGINEERING COLLEGE

And Another Example


• Alphabet still { 0, 1 }
1

• The operation of the automaton is not


completely defined by the input
– On input “11” the automaton could be in either
state
9/25/2020 Compiler Construction 82
MATRUSRI
ENGINEERING COLLEGE

Epsilon Moves
• Another kind of transition: -moves

A B

• Machine can move from state A to state B


without reading input

9/25/2020 Compiler Construction 83


MATRUSRI
ENGINEERING COLLEGE

Deterministic and
Nondeterministic Automata
• Deterministic Finite Automata (DFA)
– One transition per input per state
– No -moves
• Nondeterministic Finite Automata (NFA)
– Can have multiple transitions for one input in a
given state
– Can have -moves
• Finite automata have finite memory
9/25/2020 Compiler Construction 84
– Need only to encode the current state
MATRUSRI
ENGINEERING COLLEGE

Execution of Finite Automata


• A DFA can take only one path through the
state graph
– Completely determined by input

• NFAs can choose


– Whether to make -moves
– Which of multiple transitions for a single input
to take
9/25/2020 Compiler Construction 85
MATRUSRI
ENGINEERING COLLEGE

Acceptance of NFAs
• An NFA can get into multiple states
1

0 1

• Input: 1 0 1

• Rule: NFA accepts if it can get in a final state

9/25/2020 Compiler Construction 86


MATRUSRI
ENGINEERING COLLEGE

NFA vs. DFA (1)


• NFAs and DFAs recognize the same set of
languages (regular languages)

• DFAs are easier to implement


– There are no choices to consider

9/25/2020 Compiler Construction 87


MATRUSRI
ENGINEERING COLLEGE

NFA vs. DFA (2)


• For a given language the NFA can be simpler
than the DFA 1
0 0
NFA
0

1 0
0 0
DFA
1
1

• DFA can be exponentially larger than NFA


9/25/2020 Compiler Construction 88
MATRUSRI
ENGINEERING COLLEGE

Regular Expressions to Finite


Automata
• High-level sketch
NFA

Regular
expressions DFA

Lexical Table-driven
Specification Implementation of DFA

9/25/2020 Compiler Construction 89


MATRUSRI
ENGINEERING COLLEGE

Regular Expressions to NFA (1)


• For each kind of rexp, define an NFA
– Notation: NFA for rexp A
A

• For 

• For input a
a

9/25/2020 Compiler Construction 90


MATRUSRI
ENGINEERING COLLEGE

Regular Expressions to NFA (2)


• For AB
A 
B

• For A | B


B 

 A

9/25/2020 Compiler Construction 91


MATRUSRI
ENGINEERING COLLEGE

Regular Expressions to NFA (3)


• For A*

A

9/25/2020 Compiler Construction 92


MATRUSRI
ENGINEERING COLLEGE

Example of RegExp -> NFA


conversion
• Consider the regular expression
(1 | 0)*1
• The NFA is

 C
1 
E
1
A  B
 D
0 F 
G H  I J

9/25/2020 Compiler Construction 93


MATRUSRI
ENGINEERING COLLEGE

Next

NFA

Regular
expressions DFA

Lexical Table-driven
Specification Implementation of DFA

9/25/2020 Compiler Construction 94


MATRUSRI
ENGINEERING COLLEGE

NFA to DFA. The Trick


• Simulate the NFA
• Each state of resulting DFA
= a non-empty subset of states of the NFA
• Start state
= the set of NFA states reachable through -moves from
NFA start state
• Add a transition S →a S’ to DFA iff
– S’ is the set of NFA states reachable from the states in S
after seeing the input a
9/25/2020
• considering -moves as well
Compiler Construction 95
MATRUSRI
ENGINEERING COLLEGE

NFA -> DFA Example


 C 1 E 
1
 0
A B G H  I J
 D F 

0
0 FGABCDHI
0 1
ABCDHI
1
1 EJGABCDHI

9/25/2020 Compiler Construction 96


MATRUSRI
ENGINEERING COLLEGE

NFA to DFA. Remark


• An NFA may be in many states at any time

• How many different states ?

• If there are N states, the NFA must be in some


subset of those N states

• How many non-empty subsets are there?


– 2N - 1 = finitely many, but exponentially many
9/25/2020 Compiler Construction 97
MATRUSRI
ENGINEERING COLLEGE

Implementation
• A DFA can be implemented by a 2D table T
– One dimension is “states”
– Other dimension is “input symbols”
– For every transition Si →a Sk define T[i,a] = k
• DFA “execution”
– If in state Si and input a, read T[i,a] = k and
skip to state Sk
– Very efficient
9/25/2020 Compiler Construction 98
MATRUSRI
ENGINEERING COLLEGE

Table Implementation of a DFA


0
0 T
0 1
S
1
1 U

0 1
S T U
T T U
U T U

9/25/2020 Compiler Construction 99


MATRUSRI
ENGINEERING COLLEGE

Implementation (Cont.)
• NFA -> DFA conversion is at the heart of
tools such as flex or jflex

• But, DFAs can be huge

• In practice, flex-like tools trade off speed


for space in the choice of NFA and DFA
representations
9/25/2020 Compiler Construction 100

You might also like