Compiler Design - Syntax Analysis
CHAPTER 03:
SYNTAX ANALYSIS
The second phase of a compiler is called syntax analysis. The input to this phase is the stream
of tokens put out by the lexical analysis phase. These tokens are then checked for proper
syntax, i.e. the compiler checks that the statements and expressions are correctly formed.
Some examples of syntax errors in Java are:
• a missing semicolon at the end of a statement;
• unbalanced parentheses or braces;
• a misspelled keyword, e.g. wihle instead of while;
• a malformed expression, e.g. x = y + ;.
When the compiler encounters such an error, it should put out an informative message for the
user. At this point, it is not necessary for the compiler to generate an object program. A
compiler is not expected to guess the intended purpose of a program with syntax errors. A good
compiler, however, will continue scanning the input for additional syntax errors.
3. A set of productions, where each production consists of a nonterminal, called the head or left
side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the
body or right side of the production. The intuitive intent of a production is to specify one of the
written forms of a construct; if the head nonterminal represents a construct, then the body
represents a written form of the construct.
4. A designation of one of the nonterminals as the start symbol. A production is for a nonterminal
if the nonterminal is the head of the production. A string of terminals is a sequence of zero or
more terminals. The string of zero terminals, written as ε, is called the empty string.
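As a running example, the expression grammar below is consistent with the parse-tree examples
later in this chapter (its exact production set is an assumption, since the original figure is
not reproduced here):

E → E + E | E * E | - E | ( E ) | id

Here E is the only nonterminal and the start symbol, the terminals are id, +, *, -, ( and ),
and each alternative separated by | is a production for E.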
1.2. Derivations
A grammar derives strings by beginning with the start symbol and repeatedly replacing a
nonterminal by the body of a production for that nonterminal. The terminal strings that can be
derived from the start symbol form the language defined by the grammar.
Leftmost and Rightmost Derivation of a String
• Leftmost derivation − A leftmost derivation is obtained by applying a production to the
leftmost variable (nonterminal) in each step.
• Rightmost derivation − A rightmost derivation is obtained by applying a production to the
rightmost variable (nonterminal) in each step.
Two leftmost derivations for a given input string are shown below.
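As an illustration, assuming the expression grammar sketched above, the string id + id * id
has two distinct leftmost derivations:

Derivation 1: E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
Derivation 2: E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id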
2. Parse Tree
A parse tree is a graphical representation of a derivation that filters out the order in which
productions are applied to replace non-terminals.
- Each interior node of a parse tree represents the application of a production.
- All the interior nodes are non-terminals and all the leaf nodes are terminals.
- Reading the leaf nodes from left to right gives the string derived by the tree (its yield).
Given a context-free grammar, a parse tree according to the grammar is a tree with the following
properties:
1. The root is labeled by the start symbol.
2. Each leaf is labeled by a terminal or by ε.
3. Each interior node is labeled by a nonterminal.
If A is the nonterminal labeling some interior node and X1, X2, ..., Xn are the labels of the
children of that node from left to right, then there must be a production A → X1X2...Xn in the
grammar. Here, X1, X2, ..., Xn each stand for a symbol that is either a terminal or a
nonterminal. As a special case, if A → ε is a production, then a node labeled A may have a
single child labeled ε.
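These properties translate directly into a tree data structure. The class below is a minimal
sketch in Java (the class and method names are illustrative, not from the text): each node
stores a label and an ordered list of children, and reading the leaves from left to right
reproduces the derived string.

import java.util.ArrayList;
import java.util.List;

class ParseNode {
    final String label;                        // nonterminal, terminal, or "ε"
    final List<ParseNode> children = new ArrayList<>();

    ParseNode(String label) { this.label = label; }

    // Append the next child, left to right, as the production body dictates.
    ParseNode add(ParseNode child) {
        children.add(child);
        return this;
    }

    // The yield: concatenation of the leaf labels, read left to right.
    String yield() {
        if (children.isEmpty()) return label.equals("ε") ? "" : label;
        StringBuilder sb = new StringBuilder();
        for (ParseNode c : children) sb.append(c.yield());
        return sb.toString();
    }
}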
Example 1: Parse tree for the input string -(id + id) using the above context-free grammar is:
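Since the figure itself is not reproduced, the sketch below shows the tree this example
describes, assuming the expression grammar above:

E
├── -
└── E
    ├── (
    ├── E
    │   ├── E
    │   │   └── id
    │   ├── +
    │   └── E
    │       └── id
    └── )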
Figure: step-by-step construction of the parse tree for the input string -(id + id).
Example 2: Parse tree for the input string id + id * id using the above context-free grammar is:
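A sketch of the two trees this example yields (assuming the expression grammar above):

Tree 1, grouping as id + (id * id):
E
├── E
│   └── id
├── +
└── E
    ├── E
    │   └── id
    ├── *
    └── E
        └── id

Tree 2, grouping as (id + id) * id:
E
├── E
│   ├── E
│   │   └── id
│   ├── +
│   └── E
│       └── id
├── *
└── E
    └── id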
3. Ambiguity
A grammar can have more than one parse tree generating a given string of terminals. Such a
grammar is said to be ambiguous. To show that a grammar is ambiguous, all we need to do is
find a terminal string that is the yield of more than one parse tree. Since a string with more than
one parse tree usually has more than one meaning, we need to design unambiguous grammars
for compiling applications, or to use ambiguous grammars with additional rules to resolve the
ambiguities.
Example: with the grammar above, the input string W = id + id + id is ambiguous, because it
can be parsed as (id + id) + id or as id + (id + id).
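The two readings correspond to two distinct leftmost derivations (assuming the expression
grammar above):

(id + id) + id: E ⇒ E + E ⇒ E + E + E ⇒ id + E + E ⇒ id + id + E ⇒ id + id + id
id + (id + id): E ⇒ E + E ⇒ id + E ⇒ id + E + E ⇒ id + id + E ⇒ id + id + id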
A grammar that produces more than one parse tree (equivalently, more than one leftmost or
rightmost derivation) for some string is ambiguous; the ambiguity can often be eliminated by
re-writing the grammar.
4.1. Left-recursion
Because we try to generate a leftmost derivation by scanning the input from left to right,
grammars of the form A → Aα may cause endless recursion. Such grammars are called left-
recursive and they must be transformed if we want to use a top-down parser.
1) A → Aα | β becomes A → βA' with A' → αA' | ε
2) A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn becomes
   A → β1A' | β2A' | ... | βnA' with A' → α1A' | α2A' | ... | αmA' | ε
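For example, applying rule 1 to the standard left-recursive productions E → E + T | T (here
α is + T and β is T) gives the equivalent non-left-recursive grammar:

E → T E'
E' → + T E' | ε

The new grammar derives the same strings but can be parsed top-down without endless recursion.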
4.2. Left-factoring
Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing. When it is not clear which of two alternative productions to use to expand a
non-terminal A, we can rewrite the A-productions to defer the decision until we have seen
enough of the input to make the right choice.
Consider S → if E then S else S | if E then S
Which of the two productions should we use to expand non-terminal S when the next
token is if?
We can solve this problem by factoring out the common part in these rules.
In general, A → αβ1 | αβ2 | ... | αβn | γ becomes A → αB | γ with B → β1 | β2 | ... | βn
Consider the grammar G:
S → iEtS | iEtSeS | a
E→b
Left factored, this grammar becomes
S → iEtSS’ | a
S’ → eS |ε
E→b
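Left factoring matters because a predictive parser can now pick a production by looking only at
the next input token. The sketch below is a minimal recursive-descent parser in Java for the
left-factored grammar, under the assumption of one-character tokens; the class name and test
string are illustrative, not from the text. Each nonterminal becomes one method, and S' takes
its eS alternative exactly when the lookahead is e, which matches each else with the nearest
unmatched then.

public class DanglingElseParser {
    private final String input;
    private int pos = 0;

    DanglingElseParser(String input) { this.input = input; }

    // Lookahead: the next unread character, or '$' at end of input.
    private char peek() { return pos < input.length() ? input.charAt(pos) : '$'; }

    private void expect(char c) {
        if (peek() != c) throw new RuntimeException("expected '" + c + "' at position " + pos);
        pos++;
    }

    // S → iEtSS' | a : the lookahead ('i' or 'a') selects the production.
    void S() {
        if (peek() == 'i') { expect('i'); E(); expect('t'); S(); Sprime(); }
        else expect('a');
    }

    // S' → eS | ε : take eS when the lookahead is 'e'; otherwise derive ε.
    void Sprime() {
        if (peek() == 'e') { expect('e'); S(); }
    }

    // E → b
    void E() { expect('b'); }

    public static void main(String[] args) {
        // ibtibtaea ~ "if b then (if b then a else a)": the else binds to
        // the inner if because the inner call to Sprime consumes it first.
        DanglingElseParser p = new DanglingElseParser("ibtibtaea");
        p.S();
        if (p.pos != p.input.length()) throw new RuntimeException("unconsumed input");
        System.out.println("accepted");
    }
}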