
Compiler Design Chapter 03 : Syntax analysis

CHAPTER 03:
SYNTAX ANALYSIS

The second phase of a compiler is called syntax analysis. The input to this phase is the stream of tokens produced by the lexical analysis phase. These tokens are checked for proper syntax, i.e., the compiler checks that statements and expressions are correctly formed. Some examples of syntax errors in Java are:

x = (2+3) * 9); // mismatched parentheses

if x>y x = 2; // missing parentheses

while (x==3) do f1(); // invalid keyword do

When the compiler encounters such an error, it should emit an informative message for the user. At this point, it is not necessary for the compiler to generate an object program. A compiler is not expected to guess the intended purpose of a program with syntax errors. A good compiler, however, will continue scanning the input for additional syntax errors.

1. Context-Free Grammars (CFG)


A CFG is used to specify the syntax of a language. A grammar naturally describes the hierarchical
structure of most programming language constructs.
1.1. Formal Definition of Grammars
A context-free grammar has four components:
1. A set of terminal symbols, sometimes referred to as "tokens." The terminals are the
elementary symbols of the language defined by the grammar.
2. A set of nonterminals, sometimes called "syntactic variables." Each non-terminal represents
a set of strings of terminals, in a manner we shall describe.


3. A set of productions, where each production consists of a nonterminal, called the head or left
side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the
body or right side of the production. The intuitive intent of a production is to specify one of the
written forms of a construct; if the head nonterminal represents a construct, then the body
represents a written form of that construct.
4. A designation of one of the nonterminals as the start symbol. A production is for a nonterminal
if the nonterminal is the head of the production. A string of terminals is a sequence of zero or
more terminals. The string of zero terminals, written as ε, is called the empty string.
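The four components can be sketched as a small Python data structure; the expression grammar below is a hypothetical example chosen for illustration, not one defined in this chapter:

```python
# A minimal sketch of the four CFG components. The grammar shown
# (terminals, nonterminals, productions, start symbol) is an assumed
# arithmetic-expression grammar used only for illustration.

grammar = {
    "terminals":    {"+", "*", "(", ")", "id"},
    "nonterminals": {"E"},
    # Each production maps a head nonterminal to a list of bodies;
    # a body is a tuple of terminals and/or nonterminals.
    "productions": {
        "E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)],
    },
    "start": "E",
}

# Sanity check: every head is a nonterminal, every body symbol is known.
for head, bodies in grammar["productions"].items():
    assert head in grammar["nonterminals"]
    for body in bodies:
        for sym in body:
            assert sym in grammar["terminals"] | grammar["nonterminals"]
print("grammar is well-formed")
```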
1.2. Derivations
A grammar derives strings by beginning with the start symbol and repeatedly replacing a
nonterminal by the body of a production for that nonterminal. The terminal strings that can be
derived from the start symbol form the language defined by the grammar.

Leftmost and Rightmost Derivation of a String

• Leftmost derivation − a leftmost derivation is obtained by applying a production to the
leftmost nonterminal in each step.

• Rightmost derivation − a rightmost derivation is obtained by applying a production to the
rightmost nonterminal in each step.
Example: let the production rules of a CFG be
X → X+X | X*X | X | a over the alphabet {a}.
The leftmost derivation for the string "a+a*a" is:
X → X+X → a+X → a+X*X → a+a*X → a+a*a
The rightmost derivation for the same string "a+a*a" is:
X → X*X → X*a → X+X*a → X+a*a → a+a*a
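The leftmost derivation above can be traced mechanically; this short Python sketch (with a helper function of our own naming) replaces the leftmost X at each step:

```python
# Replay the leftmost derivation of "a+a*a" from the text,
# using the grammar X -> X+X | X*X | a.
productions = {"X": ["X+X", "X*X", "a"]}   # grammar from the text

def apply_leftmost(sentential, body):
    """Replace the leftmost occurrence of X with a production body."""
    i = sentential.index("X")
    return sentential[:i] + body + sentential[i + 1:]

# X => X+X => a+X => a+X*X => a+a*X => a+a*a
steps = ["X"]
for body in ["X+X", "a", "X*X", "a", "a"]:
    steps.append(apply_leftmost(steps[-1], body))
print(" => ".join(steps))   # prints X => X+X => a+X => a+X*X => a+a*X => a+a*a
```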


1.3. Derivation or Yield of a Tree


The derivation or the yield of a parse tree is the final string obtained by concatenating the labels
of the leaves of the tree from left to right, ignoring nulls. If all the leaves are null, the
yield is null. A parse tree pictorially shows how the start symbol of a grammar derives a
string in the language. If nonterminal A has a production A → XYZ, then a parse tree may have
an interior node labeled A with three children labeled X, Y, and Z, from left to right:

Given a context-free grammar, a parse tree according to the grammar is a tree with the following
properties:
1. The root is labeled by the start symbol.
2. Each leaf is labeled by a terminal or by ε.
3. Each interior node is labeled by a nonterminal.
If A is the nonterminal labeling some interior node and X1, X2, ..., Xn are the labels of the
children of that node from left to right, then there must be a production A → X1X2...Xn. Here,
X1, X2, ..., Xn each stand for a symbol that is either a terminal or a nonterminal. As a special
case, if A → ε is a production, then a node labeled A may have a single child labeled ε.
2. Parse Tree
A parse tree is a graphical representation of a derivation that filters out the order in which
productions are applied to replace nonterminals.
- Each interior node of a parse tree represents the application of a production.
- All interior nodes are nonterminals and all leaf nodes are terminals.
- The leaf nodes, read from left to right, form the yield of the parse tree.
If a node n is labeled A and has children n1, n2, ..., nk with labels X1, X2, ..., Xk respectively,
then there must be a production A → X1X2...Xk in the grammar.
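These properties can be illustrated with a minimal parse-tree sketch; the Node class and the example tree for (id + id), built under an assumed grammar E → (E) | E+E | id, are our own illustrations:

```python
# A minimal parse-tree sketch: interior nodes carry nonterminal labels,
# leaves carry terminals (or "" for an epsilon leaf). The yield is the
# left-to-right concatenation of the leaf labels, as described above.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []   # empty list => leaf

def tree_yield(node):
    """Concatenate leaf labels left to right; epsilon leaves carry ""."""
    if not node.children:
        return node.label
    return "".join(tree_yield(c) for c in node.children)

# Parse tree for "(id+id)" under the assumed grammar E -> (E) | E+E | id:
tree = Node("E", [
    Node("("),
    Node("E", [Node("E", [Node("id")]), Node("+"), Node("E", [Node("id")])]),
    Node(")"),
])
print(tree_yield(tree))   # prints (id+id)
```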
Example 1: the parse tree for the input string (id + id) using the above context-free grammar is:


The following figure shows the step-by-step construction of the parse tree for the input
string (id + id) using the CFG.

Example 2: the parse tree for the input string id+id*id using the above context-free grammar is:

3. Ambiguity
A grammar can have more than one parse tree generating a given string of terminals. Such a
grammar is said to be ambiguous. To show that a grammar is ambiguous, all we need to do is
find a terminal string that is the yield of more than one parse tree. Since a string with more than
one parse tree usually has more than one meaning, we need either to design unambiguous grammars
for compiling applications, or to use ambiguous grammars with additional rules to resolve the
ambiguities.
Example: the string W = id+id+id has two distinct parse trees under such a grammar, so the
grammar is ambiguous.
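One way to see the ambiguity concretely is to count the parse trees a string admits. The sketch below does this for the assumed grammar E → E+E | id by brute force; the function name is ours:

```python
# Count distinct parse trees for a token list under the assumed ambiguous
# grammar E -> E+E | id. For id+id+id there are two trees, corresponding
# to the groupings (id+id)+id and id+(id+id).

def count_trees(tokens):
    """Count parse trees for the tokens under E -> E+E | id."""
    if tokens == ["id"]:
        return 1
    total = 0
    # Try every '+' as the top-level operator of an E -> E+E production.
    for i, tok in enumerate(tokens):
        if tok == "+":
            total += count_trees(tokens[:i]) * count_trees(tokens[i + 1:])
    return total

print(count_trees(["id", "+", "id", "+", "id"]))  # prints 2
```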

4. Eliminating ambiguous grammars

A grammar whose ambiguity produces more than one parse tree (equivalently, more than one
leftmost or rightmost derivation) for some string can often be made unambiguous by rewriting
the grammar.

4.1. Eliminating left-recursion

Because we try to generate a leftmost derivation by scanning the input from left to right,
grammars of the form A → Aα may cause endless recursion. Such grammars are called left-
recursive, and they must be transformed if we want to use a top-down parser.

• A grammar is left-recursive if, for some nonterminal A, there is a derivation A ⇒+ Aα.


• To eliminate direct left recursion, replace

1) A → Aα | β with A → βA', A' → αA' | ε

2) A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn with A → β1B | β2B | ... | βnB

B → α1B | α2B | ... | αmB | ε
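The replacement rules above can be sketched as a small Python function; representing production bodies as tuples of symbols, and the function name itself, are our own choices:

```python
def eliminate_left_recursion(head, bodies):
    """Rewrite A -> Aα1|...|Aαm | β1|...|βn  as
       A -> β1A'|...|βnA'  and  A' -> α1A'|...|αmA' | ε."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # A -> A alpha
    betas  = [b     for b in bodies if not b or b[0] != head]  # A -> beta
    if not alphas:
        return {head: bodies}           # no direct left recursion
    new = head + "'"
    return {
        head: [beta + (new,) for beta in betas],
        new:  [alpha + (new,) for alpha in alphas] + [()],   # () denotes ε
    }

# Classic example: E -> E+T | T  becomes  E -> T E',  E' -> + T E' | ε
print(eliminate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```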

4.2. Left-factoring

Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing. When it is not clear which of two alternative productions to use to expand a
non-terminal A, we can rewrite the A-productions to defer the decision until we have seen
enough of the input to make the right choice.
• Consider S → if E then S else S | if E then S


• Which of the two productions should we use to expand nonterminal S when the next
token is if?
We can solve this problem by factoring out the common part of these rules.
A → αβ1 | αβ2 | ... | αβn | γ becomes A → αB | γ, where B → β1 | β2 | ... | βn
Consider the grammar G:
S → iEtS | iEtSeS | a
E → b
After left factoring, this grammar becomes
S → iEtSS' | a
S' → eS | ε
E → b
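The transformation can be sketched as a one-step left-factoring function in Python; the tuple representation and helper name are assumptions of ours, and the dangling-else grammar above is used as the test case:

```python
from collections import defaultdict

def left_factor(head, bodies):
    """One step of left factoring: pull out the longest common prefix
    of alternatives that begin with the same symbol."""
    groups = defaultdict(list)
    for body in bodies:
        groups[body[:1]].append(body)            # group by first symbol
    new_rules = {head: []}
    for group in groups.values():
        if len(group) == 1:
            new_rules[head].append(group[0])     # no factoring needed
            continue
        # Longest common prefix of the group.
        k = 0
        while all(len(b) > k for b in group) and \
              all(b[k] == group[0][k] for b in group):
            k += 1
        new = head + "'"
        new_rules[head].append(group[0][:k] + (new,))
        new_rules[new] = [b[k:] for b in group]  # () plays the role of ε
    return new_rules

# S -> iEtS | iEtSeS | a   (i = if, t = then, e = else, as in the text)
rules = left_factor("S", [("i", "E", "t", "S"),
                          ("i", "E", "t", "S", "e", "S"),
                          ("a",)])
print(rules)   # S -> iEtSS' | a,  S' -> ε | eS
```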
