Compiler Unit5notes

Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

UNIT6 COMPILER SYSTEM PROGRAMMING

Explain overview of translation process


A translator is a kind of program that taken one form of program as input and converts it into another from.
The input is called source program and output is called target program.
The source language can be assembly language or higher level language like C, C++, FORTRAN, etc.......
There are three type of translators,
1. Compiler 2.Interpreter 3.Assembler

2. What is compiler? List major function done by compiler.


A compiler is a program that reads a program written in one language and translates into an equivalent program in
another language.

Source Program Compiler Target Program

Fig1.1 A compiler
Major function done by compiler:Error report
Compiler is used to convert one form of program to another.
A compiler should convert the source program to a target machine code in such a way that
Compiler should preserve the meaning of source code.
Compiler should report errors that occur during compilation process.
The compilation must be done efficiently.

Write the difference between compiler, interpreter and assembler.


1. Compiler v/s Interpreter

No. Compiler Interpreter


1 Compiler takes entire program as an input. Interpreter takes single instruction as an input.
2 Intermediate code is generated. No Intermediate code is generated.
3 Memory requirement is more Memory requirement is less.
4 Error is displayed after entire program is Error is displayed for every instruction interpreted.
checked.
5 Example: C compiler Example: BASIC

2. Compiler v/s Assembler


NO. Compiler Assembler
1 It translates higher level language to machine It translates mnemonic operation code to machine code.
code.
2 Types of compiler, Types of assembler,
Single pass compiler Single pass assembler
Multi pass compiler Two pass assembler

3 Example: C compiler Example: 8085, 8086 instruction set


Table 1.2 Difference between compiler & Assembler

Q-Explain compiler and various phases of compiler in detail.


Explain Phases of compiler OR
Write output of phases of a compiler for a=a+b*c*2 ; type of a,b,c are float
There are mainly two parts of compilation process.
1. Analysis Phase: The main objective of the analysis phase is to break the source code into parts and then arranges these
pieces into a meaningful structure.
2. Synthesis Phase: Synthesis phase is concerned with generation of target language statement which has the same meaning
as the source statement.
Analysis Phase : Analysis part is divided into three sub parts,
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
Lexical analysis:
Lexical analysis is also called linear analysis or scanning.Lexical analyzer known as scanner.
Lexical analyzer reads the source program and them it is broken into stream of units, such units are called token.
Then it classifies the unit into different lexical classes. E.g id’s constants , keyword etc... And enters them into different
tables.
For example, in lexical analysis the assignment statement a:= a+b*c*2 would be grouped into the following tokens:
a Identifier 1
= Assignment sign
a Identifier 1
+ The plus sign
b Identifier 2
* Multiplication
sign
c Identifier 3
Syntax Analysis: * Multiplication
Syntax analysis is sign also called hierarchical analysis or parsing.Syntax analyzer
known as parser. 2 Number 2
The syntax analyzer checks each line of the code and spots every tiny mistake that the
programmer has committed while typing the code.
If code is error free then syntax analyzer
= generates the tree.

Semantic analysis: -
c
2
Semantics of a language provide meaning toits constructs, like tokensand syntax structure. Semantics help interpret symbols, their
types,andtheirrelationswitheachother.Semanticanalysisjudgeswhetherthesyntaxstructureconstructedinthesourceprogramderives
anymeaningornot.
Semantic analyzer determines the meaning of a source string.
For example matching of parenthesis in the expression, or matching of if..else statement or performing
arithmetic operation that are type compatible, or checking the scope of operation.

2
Forexample:

inta=“ value” ;
Int to float
should not issue an errorin lexical and syntax analysis phase, as it islexically and structurally correct, but it should generate a semantic
error as the type of the assignment differs. These rules are set by the grammarof the language and evaluated in semantic analysis. The
followingtasksshouldbeperformedinsemanticanalysis:Scoperesolution,Typechecking,Array-boundchecking

SemanticErrors
someofthesemanticserrorsthatthesemanticanalyzerisexpectedtorecognize:

Typemismatch,Undeclaredvariable,Reservedidentifiermisuse,Multipledeclarationof variableinascope.

Accessinganoutofscopevariable, Actualandformal parametermismatch.

Synthesis phase: synthesis part is divided into three sub parts,


I. Intermediate code generation
II. Code optimization
III. Code generation
Intermediate code generation:
The intermediate representation should have two important properties, i t s ho u l d b e easy to produce and easy to
translate into target program.
We consider intermediate form called “three address code”.
Three address code consist of a sequence of instruction, each of which has at most three operands.The source program
might appear in three address code as,
t1= int real(2)
t2=id3 * t1
t3= t2 * id2
t4= t3 + id1
id1= t4
Code optimization:
The code optimization phase attempt to improve the intermediate code.
This is necessary to have a faster executing code or less consumption ofmemory.
Thus by optimizing the code the overall running time of a target program can be improved.
t1=id3*2.0
t2=id2*t1
Id1=id1+t2

Code generation:
In code generation phase the target code gets generated. The intermediate code instruction are translated into sequence of
machine instruction.
MOV id3, R1
MUL #2.0, R1
MOV id2, R2
MUL R2, R1
MOV id1, R2
ADD R2, R1
MOV R1, id1

Symbol Table
A symbol table is a data structure used by a language translator such as a compiler or interpreter.
It is used to store names encountered in the source program, along with the relevant attributes for those names.
Information about following entities is stored in the symbol table.

Variable/Identifier
Procedure/function
Keyword
Constant
Class name
Label name

Source program

Lexical analysis
Syntax Analysis

Semantic Analysis
Symbol Table Error
detection an
Intermediate Code

Code Optimization

Code Generation

Target Program

QThe context of a compiler.ORWhat does the linker do? What does the loader do? What does the Processors or do? Explain
their role(s) in compilation process.
In addition to a compiler, several other programs may be required to create an executable target program.
Preprocessor -Preprocessor produces input to compiler. They may perform the following functions,
1. Macro processing: A preprocessor may allow user to define macros that are shorthand for longer constructs.
2. File inclusion: A preprocessor may include the header file into the programtext.
3. Rational preprocessor: Such a preprocessor provides the user with built-in macro for construct like
while statement or if statement.
4. Language extensions: this processors attempt to add capabilities to the language by what amount to built-in
macros. Ex: the language equal is a database query language embedded in C. statement beginning with ## are taken
by preprocessor to be database access statement unrelated to C and translated into procedure call on routines that
perform the databaseaccess.
Skeletal source

Preprocessor

Source program

Compiler

Target assembly

Assembler

Relocatable M/C code


Assembler
Assembler is a translator which takes the assembly program as an input and generates the machine code as
a output. An assembly is a mnemonic version of machine code, in which names are used instead of binary
codes for operations.
Linker
Linker allows us to make a single program from a several files of relocatable machine code. These file may
have been the result of several different compilation, and one or more may be library files of routine
provided by a system.
Loader
The process of loading consists of taking relocatable machine code, altering the relocatable address and
placing the altered instructions and data in memory at the proper location.

Explain front end and back end in brief. (Grouping ofphases)


The phases are collected into a front end and backend.
Front end
The front end consist of those phases, that depends primarily on source language and largely independent of the
target machine.
Front end includes lexical analysis, syntax analysis, semantic analysis, intermediate code generation and
creation of symbol table.
Certain amount of code optimization can be done by front end.
Back end
The back end consists of those phases, that depends on target machine and do not depend on source program.

What is the pass of compiler? Explain how the single and multi- pass compilers work?
What is the effect of reducing the number of passes?
One complete scan of a source program is called pass.Pass include reading an input file and writing to the output file.In
a single pass compiler analysis of source statement is immediately followed by synthesis of equivalent target
statement.It is difficult to compile the source program into single pass dueto:
Forward reference: a forward reference of a program entity is a reference to the entity which precedes its
definition in the program.
This problem can be solved by postponing the generation of target code until more information concerning the
entity becomes available.
It leads to multi pass model of compilation.
In Pass I: Perform analysis of the source program and note relevantinformation.
In Pass II: Generate target code using information noted in pass I.

Effect of reducing the number of passes


It is desirable to have a few passes, because it takes time to read and write intermediate file.
On the other hand if we group several phases into one pass we may be forced to keep the entire program in the
memory. Therefore memory requirement may belarge.
Write difference between single pass and multi pass compiler.
Single pass compiler v/s Multi pass Compiler
No. Single pass compiler Multi pass compiler
1 A one-pass compiler is a compiler that passes A multi-pass compiler is a type of compiler that
through the source code of each compilation processes the source code or abstract syntax tree
unit only once. of a program several times.
2 A one-pass compiler is faster than A multi-pass compiler is slower than single pass
multi-pass compiler. compiler.
3 One-pass compiler are sometimes Multi-pass compilers are sometimes called wide
called narrow compiler. compiler.
4 Language like Pascal can be Languages like Java require a multi-pass compiler.
implemented with a single Pass
compiler.
Table 1.3 Difference between Single Pass Compiler & Multi Pass Compiler

Write the difference between phase and pass.


Phase v/s Pass
No. Phase Pass
1 The process of compilation is carried Various phases are logically grouped
out in various step is called phase. together to form a pass.
2 The phases of compilation are lexical The process of compilation can be carried out in a
analysis, syntax analysis, semantic single pass or in multiple passes.
analysis,intermediate code
generation, code optimization and code
generation.
Table 1.3 Difference between Phase & Pass

Role of lexical analysis andits issues. OR


How do the parser and scanner communicate? Explain with the block diagram
communication between them.
Ans:-The lexical analyzer is the first phase of compiler. Its main task is to read the input characters and produce as
output a sequence of tokens that the parser uses for syntax analysis.
This interaction is given in figure

Token
Lexical Parser
Source program
Get next token

Fig. 2.1 Communication


Symbol between Scanner & Parser
table
It is implemented by making lexical analyzer be asubroutine.Upon receiving a “get next token” command from parser,
the lexical analyzer reads the input character until it can identify the next token.
It may also perform secondary task at userinterface.
One such task is stripping out from the source program comments and white space in the form of blanks, tabs, and
newlinecharacters.
Some lexical analyzer are divided into cascade of two phases, the first called scanning and second is “lexical
analysis”.
The scanner is responsible for doing simple task while lexical analysis does the more complex task.

Issues in Lexical Analysis:


There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing:
Simpler design is perhaps the most important consideration. The separation of lexical analysis often allows us to
simplify one or other of these phases.
Compiler efficiency is improved.
Compiler portability is enhanced.

Explain token, pattern and lexemes.


Token: Sequence of character having a collective meaning is known as .
Typical tokens are,
1) Identifiers 2) keywords 3) operators 4) special symbols 5) constants
Pattern: The set of rules called associated with a token.
Lexeme: The sequence of character in a source program matched with a pattern for a token is called lexeme.
Token Lexeme Pattern
Const Const Const

If If If
Relation <,<=,= ,< >,>=,> < or <= or = or < > or >= or >
Id Pi, count, n, I letter followed by letters and
digits.
Number 3.14159, 0, 6.02e23 Any numeric constant
Literal "Darshan Institute" Any character between “ and
“ except “

Table 2.1. Examples of Tokens

Example:
total = sum + 12.5
Tokens are: total (id),
=relation)
Sum (id)
+ (operator)
12.5 (num)
Lexemes are: total, =, sum, +, 12.5

Specification of token.
Strings and languages Terms for a Part of string
Term Definition
Prefix of S A string obtained by removing zero or more trailing symbol of string S.
e.g., ban is prefix of banana.

Suffix of S A string obtained by removing zero or more leading symbol of string S.


e.g., nana is suffix of banana.

Sub string of S A string obtained by removing prefix and suffix from S. Table
e.g., nan is substring of banana. 2.2. Terms
Proper prefix, suffix and Any nonempty string x that is respectively prefix, suffix or for a part of
substring of S substring of S, such that s≠x astring

Subsequence of S A string obtained by removing zero or more not necessarily contiguous


symbol from S.
e.g., baaa is subsequence of banana.
Operation on languages
Definition of operation on language
Operation Definition
Union of L and M L U M = {s | s is in L or s is in M }
Written L U M
concatenation of L and M LM = {st | s is in L and t is in M }
Written LM
Kleene closure of L written L* L* denotes “zero or more concatenation of”
L.
Positive closure of L written L+ L+ denotes “one or more concatenation of” Table 2.3.
Definitions L. of operations on
languages

Regular Expression & Regular Definition.


Regular Definition
A regular definition gives names to certain regular expressions and uses those names in other regular expressions.
Here is a regular definition for the set of Pascal identifiers that is define as the set of strings of letters and digits
beginning with a letters.
letter → A | B | . . . | Z | a | b | . . . | z
digit → 0 | 1 | 2 | . . . | 9
id → letter (letter | digit)*
The regular expression id is the pattern for the identifier token and
defines letter and digit. Where letter is a regular expression for the set of all upper-case and lower case letters in the
alphabet and digit is the regular expression for the set of all decimal digits.
EXAMPLES:-
Regular expression
over ∑={a,b,c} that represent all string of length 3. (a+b+c)(a+b+c)(a+b+c)
String having zero or more a. a*
String having one or more a. a+
Reorganization of Token-using regular expression,regular definition and grammar
defining a language tokens are recognized by lexical analyzer.
Here we address how to recognizetoken.
We use the language generated by following grammar,
stmt → if expr then stmt
| if expr then stmt else stmt
|∈

expr → term relop term


| term term →
id | num
Where the terminals if, then, else, relop, id and num generates the set of strings given by the following
regulardefinitions,
if → if
then → then
relop → < | <= | = | <> | > | >=
letter → A | B | . . . | Z | a | b | . . . | z digit → 0 |
1| 2| ...|9
id → letter (letter | digit)*
num → digit+ ()?(E(+/-)?digit+ )?
For this language the lexical analyzer will recognize the keyword if, then, else, as well as the lexeme denoted by
relop, id and num.
num represents the unsigned integer and real numbers of pascal.
Lexical analyzer will isolate the next token in the input buffer and produces token and attribute value as an
output.

Transition Diagram.
A stylized flowchart is called transition diagram.
Positions in a transition diagram are drawn as a circle and are calledstates.
States are connected by arrows called edges.
Edge leaving state have label indicating the input character.
The transition diagram for unsigned number is given in Fig.2.4.

d + d Oth
d

Fig. 2.4. Transition diagram for unsigned number


Q-What is the significance of a 'Symbol Table' in compiler? Which phases of compiler refer/manipulate the symbol table?
Symbol table is an important data structure created and maintained by compilers in orderto store information about the occurrence of
variousentitiessuchasvariablenames,function names,objects,classes,interfaces,etc.Symboltable isusedby boththeanalysisandthe
synthesispartsofacompiler.

Asymboltablemayservethefollowingpurposesdependinguponthelanguageinhand:

Tostorethenamesofallentitiesinastructuredformatoneplace.

Toverifyifavariablehasbeendeclared.

Toimplementtypechecking,byverifyingassignmentsandexpressionsinthesourcecodearesemanticallycorrect.

Todeterminethescopeofaname(scoperesolution).

Asymboltableissimplyatablewhichcanbeeitherlinearorahashtable.Itmaintainsanentryforeachnameinthefollowing format:

<symbolname, type, attribute>

Forexample,ifasymboltablehastostoreinformationaboutthefollowingvariabledeclaration:

staticintinterest;

thenitshouldstoretheentrysuchas:

<interest,int,static>

Theattributeclausecontainstheentriesrelatedtothename.

Implementation
Ifacompileristohandleasmallamountofdata,thenthesymboltablecanbeimplementedasanunorderedlist,whichiseasytocode,butit
isonlysuitableforsmalltablesonly.Asymboltablecanbeimplementedinoneofthefollowingways:

Linear(sorted orunsorted)list

BinarySearchTree

Hashtable

Among all, symbol tables are mostly implemented as hash tables, where the source code symbol itself is treated as a key for the hash
functionandthereturnvalueistheinformationaboutthesymbol.

Symboltablesareusedorreferredbyboththeanalysisandsynthesisphasesofcompiler(i.eallphasesofcompiler).Itusesittoverifythatusedidentifiers
aredefined,toverifythatexpressionsandassignmentaresemanticallycorrect,togenerateintermediateandtargetcode.

Q-Explain CODEOPTOMIZATIONof compiler.

Ans:-CODEOPTOMIZATION

Optimization is a program transformation technique, which tries to improve the code by making it consume less resources (i.e. CPU,
Memory)anddeliverhighspeed.
In optimization, high-level general programming constructs are replaced by very efficient low-level programming codes. A code
optimizingprocessmustfollowthethreerulesgivenbelow:

Theoutputcodemustnot,inanyway,changethemeaningoftheprogram.

Optimizationshouldincreasethespeedoftheprogramandifpossible,theprogramshoulddemandlessnumberofresources.

Optimizationshoulditselfbefastandshouldnotdelaytheoverallcompilingprocess.

Effortsforanoptimizedcodecanbemadeatvariouslevelsofcompilingtheprocess.

Atthebeginning,userscanchange/rearrangethecodeorusebetteralgorithmstowritethecode.

Aftergeneratingintermediatecode,thecompilercanmodifytheintermediatecodebyaddresscalculationsandimprovingloops.

Whileproducingthetargetmachinecode,thecompilercanmakeuseofmemoryhierarchyandCPUregisters.

Optimizationcanbecategorizedbroadlyintotwotypes:machineindependentandmachinedependent.

Machine-independentOptimization
Inthisoptimization,thecompilertakesintheintermediate codeand transformsapartofthecodethatdoesnotinvolve anyCPUregisters
and/orabsolutememorylocations.Forexample:

do{
item=10;
value=value+item;}while(value<100);

Thiscodeinvolvesrepeatedassignmentoftheidentifieritem,whichifweputthisway:

Item=10;do{
value=value+item;}while(value<100);

shouldnotonlysavetheCPUcycles,butcanbeusedonanyprocessor.

Machine-dependentOptimization
Machine-dependent optimization is done after the target code has been generated and when the code is transformed according to the
target machine architecture. It involves CPU registers and may have absolute memory references rather than relative references.
Machine-dependentoptimizersputeffortstotakemaximumadvantageofmemoryhierarchy.

Q-ExplainParser.

1. Role of Parser.
In our compiler model, the parser obtains a string of tokens from lexical analyzer, as shown in fig.

Source Lexical Parser Rest of Compilation


progrm
Get
nex
t
toke Symbol
Position of parser in compiler model

We expect the parser to report any syntax error. It should commonly occurring errors.
The methods commonly used for parsing are classified as a top down or bottom up parsing.
In top down parsing parser, build parse tree from top to bottom, while bottom up parser starts from leaves and
work up to theroot.
In both the cases, the input to the parser is scanned from left to right, one symbol at a time.
We assume the output of parser is some representation of the parse tree for the stream of tokens produced by the
lexical analyzer.
Difference between syntax tree and Parse tree.
Syntax tree v/s Parse tree

No. Parse Tree Syntax Tree


1 Interior nodes are non-terminals, Interior nodes are “operators”, leaves
leaves are terminals. are operands.
2 Rarely constructed as a data structure. When representing a program in a tree structure
usually use a syntax
tree.
3 Represents the concrete syntax of a Represents the abstract syntax of a
program. program (the semantics).
Difference between syntax tree & Parse tree

Example: Consider grammar following grammar,


E E+E
E E* E
E Id
Shows the syntax tree and parse tree for string id +id*id.

Parse tree and Syntax tree


Q-Write note on Role of intermediate code generation
In the analysis-synthesis model of a compiler, that front end translates a source program into an intermediate
representation from which backend generates target code.
The generation of an intermediate language leads to efficient code generation.

Intermediate code
Static type checker Code generator
Parse generator
There are certain advantages of generating machine independent intermediate code, new machine to an existing front
end.
A machine independent code optimizer can be applied to intermediate code in order to optimize the code generation.

Intermediate forms
There are three types of intermediate representation,
1. Abstract syntax tree
2. Postfix notation
3. Three address code
Abstract Syntax tree
A syntax tree depicts the natural hierarchical structure of a source program.
A DAG gives the same informant but in a more compact way because common sub- expressions are identified.
A syntax tree and DAG for the assignment statement a=b*-c+b*-c is given in Fig.

Assign Assign

* +

* *

b Uminus b Uminus
Syntax tree & DAG for a=b*-c+b*-c Uminus

Postfix notation:
Postfix notation is a internalization of a syntax tree.
In post fix notation the operands occurs first and then operators are arranged. c
The postfix notation for the syntax tree in fig.
A b c uminus*b c uminus* + assign.
Three address code
Three address code is a sequence of statement or constants. And op stands for any operator.
Where a,b or c are the operands that can be names or constants . And op stands for any operator.
For the expression like. A=b+c+d might be translated into a sequence,
t1=b+c
t2=t1+d
a=t2
Here t1 and t2 are the temporary names generated by the compiler.
There are at most three addresses allowed (two for operands and one for result). Hence, this reorientation is called
three-address code.
Implementations of three address code
There are three types of representation used for three address code,
1. Quadruples
2. Triples
3. Indirect triples
Consider the input statement X = -a*b+ -a*b.
Three address code for above statement given in table
t1=-a
t2;=t1*b
t3=-a
t4:=t3*b
t5:=t2+t4
X=t5
Table three address code
Quadruple representation
The quadruple is a structure with at the most four field such as op, arg1,arg2.
The op field is used to represent the internal code for operator.
The arg1 and arg2 represent the two operands.
And result field is used to store the result of an expression.
Statement with unary operators like x=y do not use agr2.
Conditional and unconditional jumps put the target label in result.

Number Op Arg1 Agr2 Result


(0) uminus a t1
(1) * t1 b t2
(2) uminus a t3
(3) * t3 b t4
(4) + t2 t4 t5
(5) ;= t5 x

Triples:
To avoid entering temporary names into the symbol table, we might refer a temporary value by the position of the statement
that computes it.
If we do so, three address statements can be represented by record with only three fields;

Number Op Arg1 Agr2


(0) uminus a
(1) * (0) b
(2) uminus a
(3) * (2) b
(4) + (1) 3
(5) ;= x 4
Indirect Triples
In the indirect triple representation the listing of triples has been done. And listing pointer are used instead of using
statement.
This implementation is called indirect triples.
statement
Number OP Arg1 Arg2
(0) (11)
(0) uminus a
(1) (12)
(1) * (11) b
(2) (13)
(2) uminus a
(3) (14)

Explain code Optimization Technique OR


Q-Explain the machine independent code optimization techniques of compiler.
1.Folding
Compiler time evaluation means shifting of computations from run time to compile time.
In the folding technique the computation of constant is done at compiler time instead of run time.
Example: length=(22/7)*d
Here folding is implied by performing the computation of 22/7 at compile time.
Here at the compilation time the value of pi is replaced by 3.14 and r by 5 then computation of 3.14*5*5 is done
during compilation.
2. Common sub expression elimination
Example:
t1:=4*i t1=4*i
t2:=a[t1] t2=a[t1]
t3:=4*j t3=4*j
t4:=4*i t5=n
t5:=n t6=b[t1]+t5
t6:=b[t4]+t5
The common sub expression t4:=4*i is eliminated as its computation is already in t1 and value of i is not been changed
from definition to use.

3. Variable propagation
Variable propagation means of one variable instead of another.
Example:
x=pi; area=x*r*r;
The optimization using variable propagation can be done as follows . area=pi*r*r;
Here the variable x is eliminated. Here the necessary condition is that variable must be assigned to another variable or
some constant.

4. Code movement
There are two basic goals of code movement;
I. To reduce the size of the code.
Ii. To reduce the frequency of execution of code.

Example:
for(i=0;j<=10;i++) x =y*5
{ for(i=0;j<=10;++)
X = y*5; {
K = (y*5)+50 K= x + 50;
}

Loop invariant computation


Loop invariant optimization can be obtained by moving some amount of code outside the loop and placing it just
before entering in the loop.
This method is also called motion.
Example:
While (i<=max-1) N=max -1
{ While(i<=N)
Sum=sum+a[i]; {
} sum=sum+a[i]
}

5. Strength reduction

Strength of certain operators is higher than others.


For instance strength of * is higher than +
In this technique the higher strength operators can be replaced by lower strength operators.
Example;
for(I=1;i<=50;i++)
{
Count=i*7;
}

Here we get the count values as 7,14,21 and so on up to less than 50.
This code can be replaced by using strength reduction as follows
temp=7;
for(i=1;i<=50;i++)
{
Count=temp;
Temp=temp+7;
}

6. Dead code elimination

A variable is said to be live in a program if the value contained into it subsequently used.
On the other hand, the variable is said to be dead at a point in a program if the value contained into it is never been
used. The code containing such a variable supposed to be a dead code. And an optimization can be performed by
eliminating such a dead code.
Example;
I=o;
If(i==1)
{
A=x+5;
}
If statement is a dead code as this condition will never get satisfied hence, statement can be eliminated and optimization can
be done.
Q-Role of Lexical analyzer OR How tokens are recognized by lexical phase? Explain in detail.
Lexicalanalysisisthefirstphaseofacompiler.Ittakesthemodifiedsourcecodefromlanguagepreprocessorsthatarewrittenintheformof
sentences.Thelexicalanalyzerbreaksthesesyntaxesintoaseriesoftokens,byremovinganywhitespaceorcommentsinthesourcecode.

If the lexical analyzer finds a token invalid, it generates an error. The lexical analyzer works closely with the syntax analyzer. It reads
characterstreamsfromthesourcecode,checksforlegaltokens,andpassesthedatatothesyntaxanalyzerwhenitdemands.

Tokens
Lexemes are said to be a sequence of characters (alphanumeric) in a token. There are some predefined rules for every lexeme to be
identifiedasavalid token.Theserulesaredefinedbygrammarrules,bymeansofapattern.Apattern explainswhat canbe atoken,and
thesepatternsaredefinedbymeansofregularexpressions.

Inprogramminglanguage,keywords,constants,identifiers,strings, numbers,operatorsandpunctuationssymbolscanbe consideredas


tokens.

Forexample,inClanguage,thevariabledeclarationline

intvalue=100;

containsthetokens:

int(keyword),value(identifier),=(operator),100(constant) and;(symbol).

SpecificationsofTokens
Letusunderstandhowthelanguagetheoryundertakesthefollowingterms:

Alphabets
Anyfinitesetofsymbols{0,1}isasetofbinaryalphabets,{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}isasetofHexadecimalalphabets,{a-z,A-Z}isa
setofEnglishlanguagealphabets.

Strings
Anyfinitesequenceofalphabetsiscalledastring.Lengthofthestringisthetotalnumberofoccurrenceofalphabets,e.g.,thelengthofthe
string tutorialspointis14 and isdenoted by|tutorialspoint|=14.Astring havingno alphabets,i.e.a stringofzerolengthisknown asan
emptystringandisdenotedbyε(epsilon).

SpecialSymbols
Atypicalhigh-levellanguagecontainsthefollowingsymbols:-

ArithmeticSymbols Addition(+),Subtraction(-), Modulo(%), Multiplication(*),Division(/)

Punctuation Comma(,),Semicolon(;), Dot(.),Arrow(->)

Assignment =

Special Assignment +=,/=,*=, -=

Comparison ==,!=,<,<=,>,>=

Preprocessor #

LocationSpecifier &

Logical &,&&,|, ||,!

ShiftOperator >>,>>>,<<,<<<

Language
Alanguage is considered asa finite set ofstrings oversome finite set of alphabets. Computerlanguages areconsidered as finite sets, and
mathematicallysetoperationscanbeperformedonthem.Finitelanguagescanbedescribedbymeansofregularexpressions.

LongestMatchRule
Whenthelexicalanalyzerreadthesource-code,itscansthecodeletterbyletter;andwhenitencountersawhitespace,operatorsymbol,or
specialsymbols,itdecidesthatawordiscompleted.
Forexample:

intintvalue;

While scanningbothlexemes till ‘ int’ ,thelexicalanalyzercannot determine whetherit isa keyword orthe initialsof identifierint
value.

TheLongestMatchRulestatesthatthelexemescannedshouldbedeterminedbasedonthelongestmatchamongallthetokensavailable.

Thelexicalanalyzeralsofollows rulepriority whereareservedword,e.g.,akeyword,ofalanguageisgivenpriorityoveruserinput.Thatis,if


thelexicalanalyzerfindsalexemethatmatcheswithanyexistingreservedword,itshouldgenerateanerror.

You might also like