Document 1
Different types of translators
A program written in a high-level language is called source code. To convert source code
into machine code, translators are needed.
A translator takes a program written in a source language as input and converts it into a program
in a target language as output.
It also detects and reports errors during translation.
The roles of a translator are:
• Translating the high-level language program into an equivalent machine language
program.
• Providing diagnostic messages wherever the programmer violates the specification of the
high-level language.
Compiler
A compiler is a translator that converts programs written in a high-level language into a low-level
language. It translates the entire program at once and reports any errors in the source program
encountered during translation.
Interpreter
An interpreter is a translator that converts programs written in a high-level language into a low-level
language. An interpreter translates the program line by line and reports an error as soon as it is
encountered during the translation process.
It directly executes the operations specified in the source program when the input is given by
the user.
It gives better error diagnostics than a compiler.
With a compiler, debugging is hard because error messages are generated only after the entire
program has been scanned. An interpreter stops translation when the first error is met, so
debugging is easy.
Assembler
An assembler is a translator that translates assembly language code into machine
language code.
Introduction to Compiler
o A compiler is a translator that converts a high-level language into machine language.
o The high-level language is written by a developer, and the machine language can be understood
by the processor.
o A compiler also reports errors to the programmer.
o The main purpose of a compiler is to translate code written in one language into another
without changing the meaning of the program.
o When you execute a program written in an HLL programming language, execution
proceeds in two parts.
o In the first part, the source program is compiled and translated into an object program (low-
level language).
o In the second part, the object program is translated into the target program through the
assembler.
Fig: Execution process of source program in Compiler
Compiler Phases
The compilation process consists of a sequence of phases. Each phase takes the source
program in one representation and produces output in another representation. Each phase takes
its input from the previous phase.
Lexical Analysis:
The lexical analyzer phase is the first phase of the compilation process. It takes source code as input,
reads the source program one character at a time, and groups the characters into meaningful lexemes.
The lexical analyzer represents these lexemes in the form of tokens.
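As a minimal sketch of what a lexical analyzer does, the following Python tokenizer splits a small assignment statement into (token-type, lexeme) pairs. The token names and regular expressions here are illustrative assumptions, not part of any particular compiler.

```python
import re

# Illustrative token specification: each pair is (TOKEN_NAME, regex).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("STAR",   r"\*"),
    ("SKIP",   r"\s+"),   # whitespace is discarded, not turned into a token
]
MASTER_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Read the source one match at a time and emit (type, lexeme) tokens."""
    tokens = []
    for m in MASTER_RE.finditer(source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("position = initial + rate * 60"))
# [('ID', 'position'), ('ASSIGN', '='), ('ID', 'initial'),
#  ('PLUS', '+'), ('ID', 'rate'), ('STAR', '*'), ('NUMBER', '60')]
```

Tools such as Lex (described later in these notes) generate this kind of scanner automatically from a table of patterns.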
Syntax Analysis
Syntax analysis is the second phase of the compilation process. It takes tokens as input and generates
a parse tree as output. In the syntax analysis phase, the parser checks whether the expression made by the
tokens is syntactically correct.
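One common way to implement this phase is a recursive-descent parser; the sketch below (an assumption, not the only technique compilers use) parses a token list for a small expression grammar and returns a nested-tuple parse tree, raising an error when the tokens are not syntactically correct.

```python
def parse_expr(tokens):
    """Recursive-descent parser for the grammar
       expr -> term ('+' term)*,  term -> factor ('*' factor)*,
       factor -> NUMBER | ID.
    Returns a nested-tuple tree; raises SyntaxError on bad input."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():
        nonlocal pos
        node = term()
        while peek() == "+":          # left-associative chain of '+'
            pos += 1
            node = ("+", node, term())
        return node

    def term():
        nonlocal pos
        node = factor()
        while peek() == "*":          # '*' binds tighter than '+'
            pos += 1
            node = ("*", node, factor())
        return node

    def factor():
        nonlocal pos
        tok = peek()
        if tok is None or tok in "+*":
            raise SyntaxError(f"unexpected token: {tok!r}")
        pos += 1
        return tok

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

print(parse_expr(["a", "+", "b", "*", "2"]))   # ('+', 'a', ('*', 'b', '2'))
```

Note how operator precedence is encoded in the grammar itself: `term` handles `*` below `expr`, so `b * 2` groups before the addition.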
Semantic Analysis
Semantic analysis is the third phase of the compilation process. It checks whether the parse tree
follows the rules of the language. The semantic analyzer keeps track of identifiers, their types and
expressions. The output of the semantic analysis phase is the annotated syntax tree.
Intermediate Code Generation
In the intermediate code generation phase, the compiler translates the source code into
intermediate code. Intermediate code lies between the high-level language and the machine language.
The intermediate code should be generated in such a way that it can easily be translated into the
target machine code.
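A widely used intermediate form is three-address code, where each instruction has at most one operator. The Python sketch below (illustrative; temporary names like t1 are an assumed convention) flattens a nested-tuple expression tree into such instructions.

```python
def gen_tac(node, code):
    """Flatten a nested-tuple expression tree into three-address code.
    Each emitted instruction has the form  t_i = x op y.
    Returns the name holding the value of this subtree."""
    if isinstance(node, str):          # a leaf: variable or literal
        return node
    op, left, right = node
    l = gen_tac(left, code)            # compute operands first
    r = gen_tac(right, code)
    t = f"t{len(code) + 1}"            # fresh temporary name
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
gen_tac(("+", "a", ("*", "b", "2")), code)
for line in code:
    print(line)
# t1 = b * 2
# t2 = a + t1
```

Because every instruction is this simple, the later phases can translate each line into a short fixed sequence of machine instructions.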
Code Optimization
Code optimization is an optional phase. It is used to improve the intermediate code so that the
program runs faster and takes less space. It removes unnecessary lines of code and rearranges
the sequence of statements in order to speed up program execution.
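One simple optimization is constant folding: evaluating operations on literals at compile time. This hedged sketch folds constants in a nested-tuple expression tree (the tree shape and operator set are assumptions chosen for illustration).

```python
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def fold_constants(node):
    """Constant folding: when both operands of an operator are numeric
    literals, replace the whole subtree with the computed value."""
    if isinstance(node, str):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, str) and isinstance(right, str) \
            and left.isdigit() and right.isdigit():
        return str(OPS[op](int(left), int(right)))   # evaluate now
    return (op, left, right)

print(fold_constants(("+", "a", ("*", "2", "30"))))   # ('+', 'a', '60')
```

The multiplication 2 * 30 disappears from the generated code entirely; only the addition remains to be executed at run time.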
Code Generation
Code generation is the final stage of the compilation process. It takes the optimized intermediate
code as input and maps it to the target machine language. Code generator translates the
intermediate code into the machine code of the specified computer.
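To make the mapping concrete, here is a hypothetical sketch that translates three-address statements of the form t = x op y into a toy load/op/store sequence. The instruction names (LOAD, ADD, MUL, STORE) and the register R1 are invented for illustration and do not correspond to a real machine.

```python
def tac_to_asm(tac):
    """Translate three-address statements  dest = x op y  into a toy
    register-machine sequence: load x, apply op with y, store dest."""
    OPS = {"+": "ADD", "*": "MUL"}
    asm = []
    for stmt in tac:
        dest, _, x, op, y = stmt.split()
        asm += [f"LOAD R1, {x}",
                f"{OPS[op]} R1, {y}",
                f"STORE R1, {dest}"]
    return asm

for line in tac_to_asm(["t1 = b * 2", "t2 = a + t1"]):
    print(line)
# LOAD R1, b
# MUL R1, 2
# STORE R1, t1
# LOAD R1, a
# ADD R1, t1
# STORE R1, t2
```

A real code generator would also assign registers to temporaries instead of storing each one to memory, but the phase boundary is the same.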
Formal grammar
o A formal grammar is a set of rules. It is used to identify correct or incorrect strings of tokens
in a language. A formal grammar is represented as G.
o A formal grammar is used to generate all possible strings over the alphabet that are
syntactically correct in the language.
o Formal grammar is used mostly in the syntax analysis phase (parsing), particularly during
compilation.
1. G = <N, T, P, S>
Where:
N is the set of non-terminal symbols, T is the set of terminal symbols, P is the set of
production rules, and S is the start symbol.
Example:
Production rules:
1. S → bR
2. R → aR
3. R → aB
4. B → b
With these productions we can produce strings like bab, baab, baaab, etc.
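The derivations above can be replayed mechanically. This small Python sketch rewrites the leftmost non-terminal at each step using a chosen rule alternative; the rule table mirrors the productions just listed.

```python
# Productions from the example: S -> bR, R -> aR | aB, B -> b.
RULES = {"S": ["bR"], "R": ["aR", "aB"], "B": ["b"]}

def derive(choices):
    """Perform a leftmost derivation from S, picking the alternative
    index given in `choices` at each step."""
    s = "S"
    for c in choices:
        for i, ch in enumerate(s):
            if ch.isupper():                      # leftmost non-terminal
                s = s[:i] + RULES[ch][c] + s[i + 1:]
                break
    return s

print(derive([0, 1, 0]))     # S -> bR -> baB -> bab        => 'bab'
print(derive([0, 0, 1, 0]))  # S -> bR -> baR -> baaB -> baab => 'baab'
```

Choosing R → aR more times before R → aB yields baaab, baaaab, and so on, matching the strings listed above.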
DFA
DFA stands for Deterministic Finite Automaton. Deterministic refers to the uniqueness of the
computation: in a DFA, each input character takes the machine to exactly one next state. A DFA
does not accept the null move, meaning it cannot change state without reading an input character.
Its transition function is:
δ: Q × Σ → Q
Example
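As a concrete sketch, the following Python DFA (state names q0–q3 are invented for illustration) accepts strings of the form b a⁺ b, such as bab and baab from the grammar example earlier. The transition table maps each (state, symbol) pair to exactly one next state, which is the deterministic property.

```python
# DFA for the language  b a+ b  over {a, b}, as a transition table.
# q0 is the start state, q3 the only accepting state; "dead" traps
# every string that can no longer be accepted.
DELTA = {
    ("q0", "b"): "q1",   ("q0", "a"): "dead",
    ("q1", "a"): "q2",   ("q1", "b"): "dead",
    ("q2", "a"): "q2",   ("q2", "b"): "q3",
    ("q3", "a"): "dead", ("q3", "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}

def dfa_accepts(s):
    """Run the DFA: exactly one next state per (state, input) pair."""
    state = "q0"
    for ch in s:
        state = DELTA[(state, ch)]
    return state == "q3"

print(dfa_accepts("bab"))    # True
print(dfa_accepts("bb"))     # False
```

Because the table is total over the alphabet, the machine never has to choose between moves or guess, in contrast to the NDFA described next.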
NDFA
NDFA refer to the Non Deterministic Finite Automata. It is used to transit the any number of states
for a particular input. NDFA accepts the NULL move that means it can change state without
reading the symbols.
NDFA also has five states same as DFA. But NDFA has different transition function.
δ: Q x ∑ →2Q
Example
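The subset-of-states idea in δ: Q × Σ → 2^Q can be simulated directly: track the set of all states the machine could currently be in. The sketch below (states and language chosen for illustration; ε-moves are omitted for brevity) accepts strings over {a, b} that end in "ab".

```python
# NDFA transitions map (state, symbol) to a SET of possible next states.
# From q0 on 'a' the machine may stay in q0 or guess that the final
# "ab" has started and move to q1 -- that choice is the non-determinism.
NFA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
START, ACCEPT = "q0", {"q2"}

def nfa_accepts(s):
    """Track every state the NDFA could be in after each symbol."""
    states = {START}
    for ch in s:
        states = set().union(*(NFA.get((q, ch), set()) for q in states))
    return bool(states & ACCEPT)

print(nfa_accepts("aab"))   # True
print(nfa_accepts("aba"))   # False
```

This set-tracking simulation is exactly the idea behind the subset construction that converts an NDFA into an equivalent DFA.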
A compiler is characterized by three languages:
1. Source Language
2. Target Language
3. Implementation Language
1. Create a compiler SCAA for a subset S of the desired language L, written in language A, so
that the compiler runs on machine A.
LEX
o Lex is a program that generates a lexical analyzer. It is used with the YACC parser generator.
o The lexical analyzer is a program that transforms an input stream into a sequence of
tokens.
o Lex produces the lexical analyzer as C source code.
o First, the lexical analyzer is specified as a program lex.l in the Lex language. The Lex
compiler then translates lex.l into a C program lex.yy.c.
o Finally, the C compiler compiles lex.yy.c and produces an object program a.out.
o a.out is the lexical analyzer that transforms an input stream into a sequence of tokens.
Lex file format
A Lex program is separated into three sections by %% delimiters. The format of a Lex source file
is as follows:
1. { definitions }
2. %%
3. { rules }
4. %%
5. { user subroutines }
Each rule has the form pi { actioni }, where pi is a regular expression and actioni describes what
action the lexical analyzer should take when pattern pi matches a lexeme.
User subroutines are auxiliary procedures needed by the actions. The subroutines can be compiled
separately and loaded with the lexical analyzer.
YACC
YACC (Yet Another Compiler-Compiler) is a parser generator that is used together with Lex. It
takes a grammar specification as input and produces a parser as a C program; the C compiler then
compiles this program, together with the Lex-generated scanner, into an executable.
Ambiguity
A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than
one rightmost derivation, or more than one parse tree for a given input string. If the grammar is
not ambiguous, it is called unambiguous.
Example:
1. S → aSb | SS
2. S → ε
For the string aabb, the above grammar generates two parse trees:
If a grammar has ambiguity, it is not suitable for compiler construction. No method can
automatically detect and remove ambiguity in general, but you can often remove ambiguity by
rewriting the whole grammar without ambiguity.
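The two parse trees for aabb can be exhibited as two different leftmost derivations. This small sketch replays a chosen sequence of rule alternatives (0 = aSb, 1 = SS, 2 = ε) on the leftmost S and records each sentential form.

```python
RULES = {"S": ["aSb", "SS", ""]}   # alternative index 2 is the ε-production

def leftmost(choices):
    """Apply the chosen alternative of S at the leftmost S each step,
    returning the full list of sentential forms."""
    s = "S"
    steps = [s]
    for c in choices:
        i = s.index("S")
        s = s[:i] + RULES["S"][c] + s[i + 1:]
        steps.append(s)
    return steps

# Two different derivations of the SAME string "aabb":
print(" -> ".join(leftmost([0, 0, 2])))
# S -> aSb -> aaSbb -> aabb
print(" -> ".join(leftmost([1, 2, 0, 0, 2])))
# S -> SS -> S -> aSb -> aaSbb -> aabb
```

The first derivation never uses S → SS, while the second starts with it; since their parse trees have different roots, the grammar is ambiguous.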