0% found this document useful (0 votes)
39 views24 pages

1 - Introduction

This document provides an introduction to compiler construction. It outlines the content, schedule and exams for the course. It also discusses the phases of a compiler including lexical analysis, syntactic analysis, semantic analysis and code generation.

Uploaded by

Aya Saafan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
39 views24 pages

1 - Introduction

This document provides an introduction to compiler construction. It outlines the content, schedule and exams for the course. It also discusses the phases of a compiler including lexical analysis, syntactic analysis, semantic analysis and code generation.

Uploaded by

Aya Saafan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

Prof.

Radu Prodan

INTRODUCTION TO COMPILER
CONSTRUCTION
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 1
Overview
▪ Content: introduction to compiler construction

▪ Lecture (621.400)
– Wednesday, 13:30 − 15:00, N.1.44
– https://campus.aau.at/studium/course/113655
– Slides and handouts available in Moodle before lecture

▪ Exercises (621.401)
– Wednesday, 15:15 – 16:45, S.2.69, Mario Taschwer
– https://campus.aau.at/studium/course/113656

▪ Exams
– Friday, July 12, 13:30 – 15:10, HS A
– Tuesday, September 24, 13:30 – 15:10, HS A
– Friday, November 15, 13:30 – 15:10, HS A
– Friday, January 31, 13:30 – 15:10, HS A
– Written, pen and paper, open book

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 2


Literature
▪ Alfred V. Aho, Monica S. Lam, Ravi Sethi,
Jeffrey D. Ullman, Compilers: Principles,
Techniques, and Tools, 2nd edition,
Addison Wesley, August 31, 2006, ISBN-
13: 978-0321486813

▪ Kenneth C. Louden, Compiler


Construction: Principles and Practice, 1st
edition, PWS Publishing Company,
January 24, 1997, ISBN-13: 978-
0534939724
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 3
Overview
▪ Goals
– Theory behind modern compiler design
– Design and implementation of a complete mini-compiler

▪ Contents: compiler construction phases


– Lexical analysis
– Syntactic analysis Analysis
– Semantic analysis
– Code optimisation
– Runtime environments Synthesis
– Code generation
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 4
What is a Compiler ?
▪ Source language
Source
– Usually a high-level language (e.g. C, Java) Program

▪ Target language
– Object code (machine code) Compiler

Source (Input) Target (Output) Compiler


C Machine code gcc
FORTRAN Machine code g77 Target
Program
Java Byte code javac
PDF Postscript pdf2ps
Assembler Machine code as

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 5


History (I)
▪ Machine language
– Follow up to John von Neumann’s architecture in late 1940s
– C7 06 0000 0002 (move number 2 to location 0000 on Intel 8x86 processors)

▪ Assembly language
– MOV AL, 010B
– Translates symbolic code into memory locations

▪ Mathematical notation or natural language


– X = 2

▪ FORTRAN language (1954 – 1957)


– John Backus at IBM

▪ Natural language study by Noam Chomsky


– Language classification according to complexity of their grammars (Chomsky hierarchy)
– Types 0, 1, 2, 3 (specialisation relationship)
– Type 2: context-free grammars most useful for programming languages
– Type 3: finite automata and regular expressions

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 6


History (II)
▪ Parsing problem (1960s and 1970s)
– Recognition of context-free languages

▪ Optimisation or code improvement techniques

▪ Parser (syntax analyser) generators


– Yet another compiler-compiler (Yacc) − 1975

▪ Scanner (lexical analyser) generators


– Lex

▪ Automation of other parts less successful (1970s and 1980s)


– Code generation and optimisation

▪ Interactive development environments (IDE)


– Eclipse
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 7
Compiler-related Programs
▪ Interpreters ▪ Preprocessors
– BASIC, LISP – Perform macro substitutions
– Java – Required by C
– RATional FORtran (Ratfor)
preprocessor for FORTRAN
▪ Assemblers
– Translate assembly language into ▪ Editors
object code – Structure-based
– IDEs (e.g. Eclipse)
▪ Linkers
– ld ▪ Debuggers
– “Compile” object files into – gdb , dbx
executable code
▪ Profilers
▪ Loaders – prof, gprof
– Shared libraries (relocatable code)
– gcc –shared –fPIC ▪ Project managers
– CVS, RCS, SVN, GIT

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 8


Language Description
▪ A language description is usually defined at three levels
– Lexical, syntactic, semantic

▪ Lexical level
– Defined by a dictionary
– C++: list of keywords, number formats, variable names, etc.
– Regular expressions

▪ Syntactical level
– Defined by a grammar
– C++: syntax rules for loops

▪ Semantic level
– Meaning of well-defined sentences (e.g. true, false)
– Defined in prose as part of language specification

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 9


Examples (I)
▪ Kompiller
– Lexically wrong

▪ Construction the compiler of


– Lexically correct but syntactically wrong

▪ while; if(;;) for @fred


– Lexically wrong in C++

▪ while; if(;;) for fred


– Lexically correct but syntactically wrong in C++

▪ A compiler is a POSIX library


– Syntactically (and lexically) correct but semantically wrong

▪ float i; i = “Hello”; i++;


– Syntactically correct but semantically wrong

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 10


Examples (II)
▪ A compiler is a computer program that
translates text written in a source language
into another target language
– Semantically correct

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 11


Phases of a Compiler
Front-end Back-end

Source Syntax Annotated Intermediate Target Target


Tokens
Code Tree Tree Representation Code Code

Source Code

Target Code
Generator

Optimiser
Optimiser
Semantic
Syntactic
Analyser

Analyser

Analyser
Lexical

Code
Literal Symbol Error
Table Table Handler

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 12


Lexical Analysis (Scanning)
▪ Input
– Stream of characters (source file)
Token Token Type
▪ Output a Identifier
– Stream of tokens [ Left bracket
– Token: word in a natural language
index Identifier

▪ Example: a[index] = 4+2 ] Right bracket


– 12 non-blank characters = Assignment
– 8 tokens 4 Number
+ Plus sign
▪ Optional operations 2 Number
– Enter identifiers in symbol table
– Enter literals into literal table
• Constants: 3.1415926535
• Quoted strings: “Hello, world!”

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 13


Syntactic Analysis (Parsing)
▪ Input expression
– Stream of tokens
assign-expression
– A grammar
expression = expression

▪ Output
subscript-expression additive-expression
– Parse tree or
syntax tree expression [ expression ] expression + expression
– Syntax errors
identifier identifier number number
a index 4 2
▪ a[index] = 4+2

▪ Abstract syntax tree

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 14


Semantic Analysis
▪ Meaning of a program (as opposed to syntax)
– Static semantics

▪ Attributes
– Annotate or decorate abstract syntax tree
– E.g. declarations and data type checking
• Is variable a scalar or array ?
• Is an expression type-consistent ?
• Does dimension of an array match declaration ?
• Is an array reference in bounds ?

▪ a[index] = 4+2 assign-expression


subscript-expression additive-expression
integer integer

identifier identifier number number


a index 4 2
array of integer integer integer integer
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 15
Intermediate Code Generator
Source
▪ Input Language
– Syntax tree
– Symbol table
Front-end
▪ Output: intermediate representation
– Language and machine-independent representation of a program
– Easy to generate and transform into machine code Intermediate
– Improves portability and extensibility Representation
– Supports code optimisation through multiple passes

▪ Three-address code for a[index] = 4 + 2 Back-end

– Contains addresses of up to three locations in memory

t = 4 + 2 Machine
a[index] = t Code

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 16


Multi-Language and Multi-Platform Compilers
Code for
C++ Front-end Back-end
Platform A

Intermediate Code for


FORTRAN Front-end Back-end
Representation Platform B

Code for
Pascal Front-end Back-end
Platform C

▪ M languages  M front-ends
▪ N platforms  N back-ends

▪ Integrate new languages and platforms O(1) complexity

▪ Reduce compiler implementation effort of M languages to N platforms from M ·


N to M + N
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 17
Source Code Optimiser
▪ Code improvements that depend only on source code
– Works on intermediate representation

▪ An optimisation is a transformation that


– Improves runtime execution of a program
– Decreases size of a program
– Improves resource consumption
– ...

▪ Examples
– Dead code elimination
– Constant folding, constant propagation
– Loop fusion, loop fission
– Code motion (move invariant code out of loops)
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 18
Source Code Optimiser: Example
t = 4 + 2
a[index] = t
assign-expression
subscript-expression number
▪ Constant folding integer 6
integer
t = 6
identifier identifier
a[index] = t a index
array of integer integer

▪ Constant propagation
a[index] = 6
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 19
Code Generator
▪ Input MOV R0, index
– Intermediate ;; store value of index in R0
representation MUL R0, 2
;; double value in R0
▪ Output MOV R1, &a
– Assembly code ;; store address of a in R1
– Machine code ADD R1, R0
;; add R0 to R1
▪ Challenges MOV *R1, 6
– Memory management ;; store 6 at address in R1
– Register allocation
– Instruction scheduling

▪ a[2*index] = 6
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 20
Target Code Optimiser
▪ Examples
– Replace slow instructions with faster ones
• Replace multiplication with shift instruction
• Use indexed addressing mode to perform array store
– Data pre-fetching
– Code parallelisation

▪ a[2*index] = 6

MOV R0, index ;; store value of index in R0


SHL R0 ;; double value in R0
MOV &a[R0], 6 ;; store constant 6 at address a + R0
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 21
Major Compiler Data Structures
▪ Tokens

▪ Syntax Tree

▪ Symbol Table

▪ Literal Table

▪ Intermediate Code

▪ Temporary Files

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 22


Assignment Statement Compilation
position = initial + rate * 60 Intermediate Code Generator

Lexical Analyser t1 = inttofloat(60)


t2 = id3 * t1
<id, 1> <=> <id, 2> <+> <id, 3> <*> <60> t3 = id2 + t2
id1 = t3
Syntax Analyser
Code Optimiser
=
<id, 1> +
t1 = id3 * 60.0
<id, 2> *
<id, 3> 60 id1 = id2 + t1

Semantic Analyser
Code Generator

= LDF R2, id3 Symbol table


<id, 1> + MULF R2, R2, #60.0 1 position . . .
<id, 2> * LDF R1, id2
<id, 3> inttofloat 2 initial . . .
ADDF R1, R1, R2
60 STF id1, R1 3 rate ...
05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 23
Conclusions
▪ Overview of compiler design

▪ Phases
– Lexical analysis
– Syntactic analysis
– Semantic analysis
– Intermediate code generation
– Source code optimisation
– Code generation
– Target code optimisation

05/03/2024 R. Prodan, Compiler Construction, Summer Semester 2024 24

You might also like