Compiler Unit5notes
Compiler Unit5notes
Compiler Unit5notes
Fig1.1 A compiler
Major function done by compiler:Error report
Compiler is used to convert one form of program to another.
A compiler should convert the source program to a target machine code in such a way that
Compiler should preserve the meaning of source code.
Compiler should report errors that occur during compilation process.
The compilation must be done efficiently.
Semantic analysis: -
c
2
Semantics of a language provide meaning toits constructs, like tokensand syntax structure. Semantics help interpret symbols, their
types,andtheirrelationswitheachother.Semanticanalysisjudgeswhetherthesyntaxstructureconstructedinthesourceprogramderives
anymeaningornot.
Semantic analyzer determines the meaning of a source string.
For example matching of parenthesis in the expression, or matching of if..else statement or performing
arithmetic operation that are type compatible, or checking the scope of operation.
2
Forexample:
inta=“ value” ;
Int to float
should not issue an errorin lexical and syntax analysis phase, as it islexically and structurally correct, but it should generate a semantic
error as the type of the assignment differs. These rules are set by the grammarof the language and evaluated in semantic analysis. The
followingtasksshouldbeperformedinsemanticanalysis:Scoperesolution,Typechecking,Array-boundchecking
SemanticErrors
someofthesemanticserrorsthatthesemanticanalyzerisexpectedtorecognize:
Typemismatch,Undeclaredvariable,Reservedidentifiermisuse,Multipledeclarationof variableinascope.
Code generation:
In code generation phase the target code gets generated. The intermediate code instruction are translated into sequence of
machine instruction.
MOV id3, R1
MUL #2.0, R1
MOV id2, R2
MUL R2, R1
MOV id1, R2
ADD R2, R1
MOV R1, id1
Symbol Table
A symbol table is a data structure used by a language translator such as a compiler or interpreter.
It is used to store names encountered in the source program, along with the relevant attributes for those names.
Information about following entities is stored in the symbol table.
Variable/Identifier
Procedure/function
Keyword
Constant
Class name
Label name
Source program
Lexical analysis
Syntax Analysis
Semantic Analysis
Symbol Table Error
detection an
Intermediate Code
Code Optimization
Code Generation
Target Program
QThe context of a compiler.ORWhat does the linker do? What does the loader do? What does the Processors or do? Explain
their role(s) in compilation process.
In addition to a compiler, several other programs may be required to create an executable target program.
Preprocessor -Preprocessor produces input to compiler. They may perform the following functions,
1. Macro processing: A preprocessor may allow user to define macros that are shorthand for longer constructs.
2. File inclusion: A preprocessor may include the header file into the programtext.
3. Rational preprocessor: Such a preprocessor provides the user with built-in macro for construct like
while statement or if statement.
4. Language extensions: this processors attempt to add capabilities to the language by what amount to built-in
macros. Ex: the language equal is a database query language embedded in C. statement beginning with ## are taken
by preprocessor to be database access statement unrelated to C and translated into procedure call on routines that
perform the databaseaccess.
Skeletal source
Preprocessor
Source program
Compiler
Target assembly
Assembler
What is the pass of compiler? Explain how the single and multi- pass compilers work?
What is the effect of reducing the number of passes?
One complete scan of a source program is called pass.Pass include reading an input file and writing to the output file.In
a single pass compiler analysis of source statement is immediately followed by synthesis of equivalent target
statement.It is difficult to compile the source program into single pass dueto:
Forward reference: a forward reference of a program entity is a reference to the entity which precedes its
definition in the program.
This problem can be solved by postponing the generation of target code until more information concerning the
entity becomes available.
It leads to multi pass model of compilation.
In Pass I: Perform analysis of the source program and note relevantinformation.
In Pass II: Generate target code using information noted in pass I.
Token
Lexical Parser
Source program
Get next token
If If If
Relation <,<=,= ,< >,>=,> < or <= or = or < > or >= or >
Id Pi, count, n, I letter followed by letters and
digits.
Number 3.14159, 0, 6.02e23 Any numeric constant
Literal "Darshan Institute" Any character between “ and
“ except “
Example:
total = sum + 12.5
Tokens are: total (id),
=relation)
Sum (id)
+ (operator)
12.5 (num)
Lexemes are: total, =, sum, +, 12.5
Specification of token.
Strings and languages Terms for a Part of string
Term Definition
Prefix of S A string obtained by removing zero or more trailing symbol of string S.
e.g., ban is prefix of banana.
Sub string of S A string obtained by removing prefix and suffix from S. Table
e.g., nan is substring of banana. 2.2. Terms
Proper prefix, suffix and Any nonempty string x that is respectively prefix, suffix or for a part of
substring of S substring of S, such that s≠x astring
Transition Diagram.
A stylized flowchart is called transition diagram.
Positions in a transition diagram are drawn as a circle and are calledstates.
States are connected by arrows called edges.
Edge leaving state have label indicating the input character.
The transition diagram for unsigned number is given in Fig.2.4.
d + d Oth
d
Asymboltablemayservethefollowingpurposesdependinguponthelanguageinhand:
Tostorethenamesofallentitiesinastructuredformatoneplace.
Toverifyifavariablehasbeendeclared.
Toimplementtypechecking,byverifyingassignmentsandexpressionsinthesourcecodearesemanticallycorrect.
Todeterminethescopeofaname(scoperesolution).
Asymboltableissimplyatablewhichcanbeeitherlinearorahashtable.Itmaintainsanentryforeachnameinthefollowing format:
Forexample,ifasymboltablehastostoreinformationaboutthefollowingvariabledeclaration:
staticintinterest;
thenitshouldstoretheentrysuchas:
<interest,int,static>
Theattributeclausecontainstheentriesrelatedtothename.
Implementation
Ifacompileristohandleasmallamountofdata,thenthesymboltablecanbeimplementedasanunorderedlist,whichiseasytocode,butit
isonlysuitableforsmalltablesonly.Asymboltablecanbeimplementedinoneofthefollowingways:
Linear(sorted orunsorted)list
BinarySearchTree
Hashtable
Among all, symbol tables are mostly implemented as hash tables, where the source code symbol itself is treated as a key for the hash
functionandthereturnvalueistheinformationaboutthesymbol.
Symboltablesareusedorreferredbyboththeanalysisandsynthesisphasesofcompiler(i.eallphasesofcompiler).Itusesittoverifythatusedidentifiers
aredefined,toverifythatexpressionsandassignmentaresemanticallycorrect,togenerateintermediateandtargetcode.
Ans:-CODEOPTOMIZATION
Optimization is a program transformation technique, which tries to improve the code by making it consume less resources (i.e. CPU,
Memory)anddeliverhighspeed.
In optimization, high-level general programming constructs are replaced by very efficient low-level programming codes. A code
optimizingprocessmustfollowthethreerulesgivenbelow:
Theoutputcodemustnot,inanyway,changethemeaningoftheprogram.
Optimizationshouldincreasethespeedoftheprogramandifpossible,theprogramshoulddemandlessnumberofresources.
Optimizationshoulditselfbefastandshouldnotdelaytheoverallcompilingprocess.
Effortsforanoptimizedcodecanbemadeatvariouslevelsofcompilingtheprocess.
Atthebeginning,userscanchange/rearrangethecodeorusebetteralgorithmstowritethecode.
Aftergeneratingintermediatecode,thecompilercanmodifytheintermediatecodebyaddresscalculationsandimprovingloops.
Whileproducingthetargetmachinecode,thecompilercanmakeuseofmemoryhierarchyandCPUregisters.
Optimizationcanbecategorizedbroadlyintotwotypes:machineindependentandmachinedependent.
Machine-independentOptimization
Inthisoptimization,thecompilertakesintheintermediate codeand transformsapartofthecodethatdoesnotinvolve anyCPUregisters
and/orabsolutememorylocations.Forexample:
do{
item=10;
value=value+item;}while(value<100);
Thiscodeinvolvesrepeatedassignmentoftheidentifieritem,whichifweputthisway:
Item=10;do{
value=value+item;}while(value<100);
shouldnotonlysavetheCPUcycles,butcanbeusedonanyprocessor.
Machine-dependentOptimization
Machine-dependent optimization is done after the target code has been generated and when the code is transformed according to the
target machine architecture. It involves CPU registers and may have absolute memory references rather than relative references.
Machine-dependentoptimizersputeffortstotakemaximumadvantageofmemoryhierarchy.
Q-ExplainParser.
1. Role of Parser.
In our compiler model, the parser obtains a string of tokens from lexical analyzer, as shown in fig.
We expect the parser to report any syntax error. It should commonly occurring errors.
The methods commonly used for parsing are classified as a top down or bottom up parsing.
In top down parsing parser, build parse tree from top to bottom, while bottom up parser starts from leaves and
work up to theroot.
In both the cases, the input to the parser is scanned from left to right, one symbol at a time.
We assume the output of parser is some representation of the parse tree for the stream of tokens produced by the
lexical analyzer.
Difference between syntax tree and Parse tree.
Syntax tree v/s Parse tree
Intermediate code
Static type checker Code generator
Parse generator
There are certain advantages of generating machine independent intermediate code, new machine to an existing front
end.
A machine independent code optimizer can be applied to intermediate code in order to optimize the code generation.
Intermediate forms
There are three types of intermediate representation,
1. Abstract syntax tree
2. Postfix notation
3. Three address code
Abstract Syntax tree
A syntax tree depicts the natural hierarchical structure of a source program.
A DAG gives the same informant but in a more compact way because common sub- expressions are identified.
A syntax tree and DAG for the assignment statement a=b*-c+b*-c is given in Fig.
Assign Assign
* +
* *
b Uminus b Uminus
Syntax tree & DAG for a=b*-c+b*-c Uminus
Postfix notation:
Postfix notation is a internalization of a syntax tree.
In post fix notation the operands occurs first and then operators are arranged. c
The postfix notation for the syntax tree in fig.
A b c uminus*b c uminus* + assign.
Three address code
Three address code is a sequence of statement or constants. And op stands for any operator.
Where a,b or c are the operands that can be names or constants . And op stands for any operator.
For the expression like. A=b+c+d might be translated into a sequence,
t1=b+c
t2=t1+d
a=t2
Here t1 and t2 are the temporary names generated by the compiler.
There are at most three addresses allowed (two for operands and one for result). Hence, this reorientation is called
three-address code.
Implementations of three address code
There are three types of representation used for three address code,
1. Quadruples
2. Triples
3. Indirect triples
Consider the input statement X = -a*b+ -a*b.
Three address code for above statement given in table
t1=-a
t2;=t1*b
t3=-a
t4:=t3*b
t5:=t2+t4
X=t5
Table three address code
Quadruple representation
The quadruple is a structure with at the most four field such as op, arg1,arg2.
The op field is used to represent the internal code for operator.
The arg1 and arg2 represent the two operands.
And result field is used to store the result of an expression.
Statement with unary operators like x=y do not use agr2.
Conditional and unconditional jumps put the target label in result.
Triples:
To avoid entering temporary names into the symbol table, we might refer a temporary value by the position of the statement
that computes it.
If we do so, three address statements can be represented by record with only three fields;
3. Variable propagation
Variable propagation means of one variable instead of another.
Example:
x=pi; area=x*r*r;
The optimization using variable propagation can be done as follows . area=pi*r*r;
Here the variable x is eliminated. Here the necessary condition is that variable must be assigned to another variable or
some constant.
4. Code movement
There are two basic goals of code movement;
I. To reduce the size of the code.
Ii. To reduce the frequency of execution of code.
Example:
for(i=0;j<=10;i++) x =y*5
{ for(i=0;j<=10;++)
X = y*5; {
K = (y*5)+50 K= x + 50;
}
5. Strength reduction
Here we get the count values as 7,14,21 and so on up to less than 50.
This code can be replaced by using strength reduction as follows
temp=7;
for(i=1;i<=50;i++)
{
Count=temp;
Temp=temp+7;
}
A variable is said to be live in a program if the value contained into it subsequently used.
On the other hand, the variable is said to be dead at a point in a program if the value contained into it is never been
used. The code containing such a variable supposed to be a dead code. And an optimization can be performed by
eliminating such a dead code.
Example;
I=o;
If(i==1)
{
A=x+5;
}
If statement is a dead code as this condition will never get satisfied hence, statement can be eliminated and optimization can
be done.
Q-Role of Lexical analyzer OR How tokens are recognized by lexical phase? Explain in detail.
Lexicalanalysisisthefirstphaseofacompiler.Ittakesthemodifiedsourcecodefromlanguagepreprocessorsthatarewrittenintheformof
sentences.Thelexicalanalyzerbreaksthesesyntaxesintoaseriesoftokens,byremovinganywhitespaceorcommentsinthesourcecode.
If the lexical analyzer finds a token invalid, it generates an error. The lexical analyzer works closely with the syntax analyzer. It reads
characterstreamsfromthesourcecode,checksforlegaltokens,andpassesthedatatothesyntaxanalyzerwhenitdemands.
Tokens
Lexemes are said to be a sequence of characters (alphanumeric) in a token. There are some predefined rules for every lexeme to be
identifiedasavalid token.Theserulesaredefinedbygrammarrules,bymeansofapattern.Apattern explainswhat canbe atoken,and
thesepatternsaredefinedbymeansofregularexpressions.
Forexample,inClanguage,thevariabledeclarationline
intvalue=100;
containsthetokens:
int(keyword),value(identifier),=(operator),100(constant) and;(symbol).
SpecificationsofTokens
Letusunderstandhowthelanguagetheoryundertakesthefollowingterms:
Alphabets
Anyfinitesetofsymbols{0,1}isasetofbinaryalphabets,{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}isasetofHexadecimalalphabets,{a-z,A-Z}isa
setofEnglishlanguagealphabets.
Strings
Anyfinitesequenceofalphabetsiscalledastring.Lengthofthestringisthetotalnumberofoccurrenceofalphabets,e.g.,thelengthofthe
string tutorialspointis14 and isdenoted by|tutorialspoint|=14.Astring havingno alphabets,i.e.a stringofzerolengthisknown asan
emptystringandisdenotedbyε(epsilon).
SpecialSymbols
Atypicalhigh-levellanguagecontainsthefollowingsymbols:-
Assignment =
Comparison ==,!=,<,<=,>,>=
Preprocessor #
LocationSpecifier &
ShiftOperator >>,>>>,<<,<<<
Language
Alanguage is considered asa finite set ofstrings oversome finite set of alphabets. Computerlanguages areconsidered as finite sets, and
mathematicallysetoperationscanbeperformedonthem.Finitelanguagescanbedescribedbymeansofregularexpressions.
LongestMatchRule
Whenthelexicalanalyzerreadthesource-code,itscansthecodeletterbyletter;andwhenitencountersawhitespace,operatorsymbol,or
specialsymbols,itdecidesthatawordiscompleted.
Forexample:
intintvalue;
While scanningbothlexemes till ‘ int’ ,thelexicalanalyzercannot determine whetherit isa keyword orthe initialsof identifierint
value.
TheLongestMatchRulestatesthatthelexemescannedshouldbedeterminedbasedonthelongestmatchamongallthetokensavailable.