CS606 Midterm
Lecture 01
Q: What are Compilers?
ANS:
Compilers translate information from one representation to another.
Usually, the information is a program.
Typical Compilers:
VC, VC++, GCC, javac
FORTRAN, Pascal, VB(?)
Translators
Word to PDF
PDF to Postscript
Typical Compilation:
High-level language to low-level (machine) code.
int expr( int n )
{
int d;
d = 4*n*n*(n+1)*(n+1);
return d;
}
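As a sketch of what "low level" means here, a compiler might lower the body of expr into three-address code like the following (the temporary names t1..t5 are illustrative, not from the notes):

```
t1 = 4 * n
t2 = t1 * n
t3 = n + 1
t4 = t2 * t3
t5 = t4 * t3
d  = t5
return d
```

Each line does at most one operation, which is why the source expression and the generated code look so different.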
Translation is a ….. process?
a. Complex b. easy
The source language and generated code are very different.
Typical compilation means programs written in high-level languages are
translated to low-level ___________.
Object code (machine code) Byte code Unicode Both object code and
byte code (Page 2)
Lecture 02
Two-pass Compiler:
The back end maps intermediate representation (IR) into target ___________.
Object code Machine code Source code Linker
The front end maps legal source code into an ___________.
Object code Machine code Source code Intermediate
representation (IR)
Back End of two-pass compiler uses _____________ algorithm.
O(n) O(n log n) NP-Complete O(n²)
The front end has polynomial time complexity.
Front End:
Scanner:
The scanner takes a program as input and maps the character stream into
“words” that are the basic unit of syntax.
Word is the basic unit of syntax.
It produces pairs – a word and its part of speech.
For example, the input
x = x + y
becomes the token stream <id,x> <assign> <id,x> <op,+> <id,y>.
Typical tokens are: number, identifier, +, -, new, while, if.
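A minimal sketch of such a scanner, assuming just five token categories; the category names (NUM, ID, ASSIGN, OP, SKIP) and the regex-based approach are illustrative assumptions, not the course's implementation:

```python
import re

# Each pair is (part of speech, pattern); order matters, NUM before ID.
# These categories are an assumption for the sketch.
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("OP",     r"[+\-]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def scan(text):
    """Map a character stream into (part-of-speech, word) pairs."""
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":       # drop whitespace
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(scan("x = x + y"))
# [('ID', 'x'), ('ASSIGN', '='), ('ID', 'x'), ('OP', '+'), ('ID', 'y')]
```

Each pair is exactly the (word, part-of-speech) output the notes describe.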
Parser
The parser takes in the stream of tokens, recognizes context-free syntax and
reports errors.
It guides context-sensitive (“semantic”) analysis for tasks like type checking.
The _______________ builds intermediate Representation (IR) for source
program.
Scanner Parser Compiler Loader
CFG: Context-Free Grammars
Context-free syntax is specified
with a grammar G = (S, N, T, P)
S is the start symbol
N is a set of non-terminal symbols
T is a set of terminal symbols or words
P is a set of productions or rewrite rules
Lecture 03
A parse can be represented by a tree: a parse tree or syntax tree.
For example, here is the parse tree for the expression x+2-y
The derivation can be extracted by starting at the root of the tree and working
towards the leaf nodes.
AST: Abstract Syntax Trees
The ____________ contains a lot of unneeded information.
Parse tree Abstract syntax tree Complete tree
Compilers often use an abstract syntax tree (AST).
AST summarizes grammatical structure without the details of derivation.
ASTs are one kind of intermediate representation (IR).
Memory access is far slower than register access.
The back end of a compiler consists of _________.
Instruction selection Register allocation Instruction scheduling All of
the given
Modern processors have a rich instruction set.
Responsibility of ______________ is to produce fast and compact code.
Instruction selection (Page 9) Register allocation
Instruction scheduling None of given
Instruction selection in compilers was spurred by the advent of the VAX-11.
CISC: Complex Instruction Set Computer
The VAX-11 had a large instruction set.
Lecture 04
Register Allocation:
CISC architecture provided a rich set of instructions.
Memory access is an order of magnitude slower than register access.
Optimal register allocation is NP-Complete.
In Back End module of compiler, optimal register allocation uses
___________.
O(log n) O(n log n) NP-Complete None of the given
Instruction Scheduling:
_____________ avoids hardware stalls and interlocks.
Register allocation Instruction scheduling Instruction selection
None of given
Optimal scheduling is NP-Complete in nearly all cases.
Three-pass Compiler:
The middle end analyzes IR and rewrites (or transforms) IR.
The middle end is generally termed the “Optimizer”
Modern optimizers are structured as a series of passes
In Three-pass compiler _____________ is used for code improvement or
optimization.
Front End Middle End (Page 10) Back End Both Front end and
Back end
Lecture 05
Lexical Analysis
Lexical Analysis:
Component of the front-end of a compiler:
1. Scanner
2. Parser
___________ reads the input characters and produces a sequence of tokens as
output.
Lexical analyzer Parser Symbol table None of the given
Word:
The lexical analyzer partitions the input string into substrings, called words.
Tokens:
A token is a syntactic category in a sentence of a language.
The input string is partitioned by reading left to right, recognizing one token
at a time.
Lexer and scanner are _________ phase of compiler.
Same Different Backend Middle end
Lecture 6
Languages:
Let Σ be a set of characters; Σ is called the alphabet.
A language over Σ is a set of strings of characters drawn from Σ.
Examples:
Alphabet = English characters
Language = English sentences
Languages are sets of strings (finite sequence of characters).
Regular languages can be described using regular expressions.
If A is a regular expression, we write L(A) to refer to the language denoted by A.
Regular Expression:
(Pages 15, 16)
Finite Automaton:
A ______________ accepts a string if we can follow transitions labeled with the
characters in the string from the start state to some accepting state.
Finite automata Regular expression Regular language Acceptor
Acceptor:
Mechanism to determine if an input string w belongs to L(R), the language
denoted by regular expression R. Such a mechanism is called an acceptor.
The acceptor is based on Finite Automata (FA).
A Finite Automaton consists of
• An input alphabet Σ
• A set of states
• A start (initial) state
• A set of transitions
• A set of accepting (final) states
Which of the statements is true about Regular Languages?
Regular Languages are the most popular for specifying tokens.
Regular Languages are based on simple and useful theory.
Regular Languages are easy to understand.
All
Lecture 7
Transition table: An FA can be encoded as a table. This is called a transition
table.
The rows correspond to _____? 
States
The characters of the alphabet appear in the columns.
In a transition table, the cells contain the ________ state.
Reject State Next State Previous State Same State
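As a sketch, a transition table can be stored as a nested dictionary and driven directly. The DFA below, which accepts strings over {a, b} ending in "ab", and its state names s0..s2 are illustrative assumptions:

```python
# Rows are states, columns are input characters, cells hold the next state.
# This particular DFA (strings ending in "ab") is an assumed example.
TABLE = {
    "s0": {"a": "s1", "b": "s0"},
    "s1": {"a": "s1", "b": "s2"},
    "s2": {"a": "s1", "b": "s0"},
}
START, ACCEPT = "s0", {"s2"}

def accepts(w):
    """Follow transitions for each character; accept if we end in a final state."""
    state = START
    for ch in w:
        state = TABLE[state][ch]   # cell lookup: current state x character -> next state
    return state in ACCEPT

print(accepts("aab"))   # True
print(accepts("aba"))   # False
```

Because each cell holds exactly one next state, this table encodes a DFA; an NFA's cells would hold sets of states instead.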
NFA: Nondeterministic Finite Automaton
The algorithm first builds a Nondeterministic Finite Automaton
(NFA) from the RE.
Each RE is converted into an NFA.
NFAs are joined together with _____-moves?
ε @ 2
DFA: Deterministic Finite Automaton
NFA into DFA
The eventual NFA is then converted into a Deterministic Finite Automaton
(DFA) which can be encoded as a transition table.
An NFA can have _____ transitions for one input?
Multiple Single
NFAs and DFAs recognize the same set of languages (regular languages).
A DFA can be exponentially _____ than an NFA?
Larger Smaller
Thompson’s Construction:
The algorithm for RE to DFA conversion is called Thompson’s Construction.
The algorithm appeared in CACM 1968.
The algorithm for _____________________ conversion is called Thompson’s
Construction.
DFA to RE RE to DFA RE to NFA CFG to NFA
The number of states in the resulting DFA is minimized using Hopcroft's
algorithm.
_________algorithm is used in DFA minimization.
James's Robert’s Hopcroft’s
Lecture 8
The _____ conversion method is known as the Subset Construction Method.
NFA TO DFA (Page 19) DFA DFA maximization None of
given
In the transition table of an NFA, each entry is a set of states.
In DFA, each entry is a single state.
ε-closure(T):
the set of NFA states reachable from some NFA state s in T on ε-transitions
alone.
move(T,a):
the set of NFA states to which there is a transition on input a from some
NFA state s in the set of states T.
Lecture 9
The process of constructing a lexical analyzer can be automated.
The ------------ returns a sequence of matching tokens as output (or an error),
and it always returns the longest matching token.
Scanner Lexer Lexical analyzer All of the given
Two popular lexical analyzer generators are:
Flex
generates lexical analyzer in C or C++
Jlex
written in Java. Generates lexical analyzer in Java
______ (Lexical Analyzer generator) is written in Java.
Flex Jlex (Page 31) Complex None of given
In Flex specification file different sections are separated by ______________.
%% && ## \\
The input specification file consists of three sections:
C or C++ and flex definitions
%%
token definitions and actions
%%
user code
The symbols “%%” mark each section.
“.l” extension for Flex input files.
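A minimal sketch of such a “.l” file with the three sections; the token patterns, printed names, and actions are assumptions for illustration, not from the notes:

```
%{
/* C definitions section: copied verbatim into the generated scanner */
#include <stdio.h>
%}

%%
[0-9]+      { printf("NUMBER(%s)\n", yytext); }
[a-zA-Z_]+  { printf("IDENTIFIER(%s)\n", yytext); }
[+\-=]      { printf("OPERATOR(%s)\n", yytext); }
[ \t\n]     { /* skip whitespace */ }
%%

/* user code section */
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }
```

Flex turns each pattern into part of one DFA and runs the action of the longest matching rule.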
Lecture 10
The parser checks the stream of words (tokens) and their parts of speech for
grammatical correctness.
Lecture 11
In _________, certain checks are performed to ensure that components of a
program fit together meaningfully.
Linear analysis Hierarchical analysis Semantic analysis None of
given
Parser does not distinguish between valid and invalid sequences of tokens.
True False
The acceptor mechanism determines if input token stream satisfies the syntax
of a programming language.
______________ is the process of discovering a derivation for some sentence of
a language.
Compiling Scanning Parsing Interpretation
The mathematical model of syntax is represented by a grammar G
The language generated by the grammar is indicated by L(G).
Syntax of most programming languages can be represented by Context Free
Grammars (CFG).
A CFG is a four-tuple G = (S, N, T, P)
1. S is the start symbol
2. N is a set of non-terminals
3. T is a set of terminals
4. P is a set of productions
The syntax of C, C++ and Java is derived heavily from Algol-60.
The notation was developed by John Backus and adapted by Peter Naur for
the Algol-60 language report; thus the term Backus-Naur Form (BNF)
Such a process of rewrites is called a derivation;
the process of discovering a derivation is called parsing.
At each step, we choose a non-terminal to replace.
Different choices can lead to different derivations.
Two derivations are of interest
1. Leftmost: replace leftmost non-terminal (NT) at each step
2. Rightmost: replace rightmost NT at each step
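A sketch of leftmost derivation with a toy expression grammar; the grammar G = (S, N, T, P) below and the chosen rule sequence are illustrative assumptions (they derive the x+2-y example used earlier):

```python
# Toy grammar as a four-tuple: start symbol, non-terminals, terminals, productions.
S = "E"
N = {"E"}
T = {"x", "y", "2", "+", "-"}
P = {"E": [["E", "+", "E"], ["E", "-", "E"], ["x"], ["y"], ["2"]]}

def leftmost_step(form, rule_index):
    """Rewrite the leftmost non-terminal in the sentential form
    using production P[nt][rule_index]."""
    for i, sym in enumerate(form):
        if sym in N:
            return form[:i] + P[sym][rule_index] + form[i + 1:]
    return form  # no non-terminal left: the form is a sentence of L(G)

# Leftmost derivation of x + 2 - y, one rewrite per step:
form = [S]                      # E
form = leftmost_step(form, 1)   # E - E
form = leftmost_step(form, 0)   # E + E - E
form = leftmost_step(form, 2)   # x + E - E
form = leftmost_step(form, 4)   # x + 2 - E
form = leftmost_step(form, 3)   # x + 2 - y
print(" ".join(form))           # x + 2 - y
```

Choosing different rule indices at each step yields different derivations of the same sentence, which is exactly why leftmost and rightmost derivations are distinguished.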
Lecture 12
Parse Trees