
Principles of Compiler Design – CENG 2042

Chapter III – Syntax Analysis


Software Engineering Department, School of Computing
Ethiopian Institute of Technology – Mekelle (EiT-M), Mekelle University

3. Syntax Analysis

Syntax analysis or parsing is the second phase of a compiler. In this chapter, we shall learn the basic
concepts used in the construction of a parser.

We have seen that a lexical analyzer can identify tokens with the help of regular expressions and pattern
rules. But a lexical analyzer cannot check the syntax of a given sentence due to the limitations of the
regular expressions. Regular expressions cannot match balanced constructs, such as nested parentheses. Therefore,
this phase uses context-free grammar (CFG), which is recognized by push-down automata.

CFG, on the other hand, is a superset of Regular Grammar: every Regular Grammar is also context-free,
but there exist some problems that are beyond the scope of Regular Grammar. CFG is a helpful tool in
describing the syntax of programming languages.

Syntax Analyzers
A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. The parser
analyzes the source code (token stream) against the production rules to detect any errors in the code. The
output of this phase is a parse tree.

Ins: Fkrezgy Yohannes Compiler Design 1|Page


This way, the parser accomplishes two tasks: parsing the code while looking for errors, and generating a
parse tree as the output of the phase.

Parsers are expected to parse the whole code even if some errors exist in the program. Parsers use error-recovery
strategies, which we will learn later in this chapter.

3.1. Context-Free Grammar

In this section, we will first see the definition of context-free grammar and introduce terminologies used in
parsing technology.
A context-free grammar has four components:
· A set of non-terminals (V). Non-terminals are syntactic variables that denote sets of strings. The
non-terminals define sets of strings that help define the language generated by the grammar.
· A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which
strings are formed.
· A set of productions (P). The productions of a grammar specify the manner in which the terminals
and non-terminals can be combined to form strings. Each production consists of a non-terminal
called the left side of the production, an arrow, and a sequence of tokens and/or non-terminals,
called the right side of the production.
· One of the non-terminals is designated as the start symbol (S); from where the production begins.

The strings are derived from the start symbol by repeatedly replacing a non-terminal (initially the start
symbol) by the right side of a production, for that non-terminal.

Example
We take the problem of palindrome language, which cannot be described by means of Regular Expression.
That is, L = {w | w = w^R} is not a regular language. But it can be described by means of CFG, as illustrated
below:

G = (V, Σ, P, S)

Where:
V = {Q, Z, N}
Σ = {0, 1}
P = {Q → Z | Q → N | Q → 0 | Q → 1 | Q → ε | Z → 0Q0 | N → 1Q1}
S = {Q}



This grammar describes palindrome language, such as: 1001, 11100111, 00100, 1010101, 11111, etc.
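As a quick sanity check, the grammar can be exercised by exhaustively expanding the leftmost non-terminal. The sketch below is a minimal Python illustration (names are hypothetical); it assumes the single-symbol productions Q → 0 | 1 are present so that odd-length palindromes such as 00100 are covered:

```python
# Enumerate strings derivable from the palindrome grammar and check them.
# Assumption: Q → 0 | 1 are included (without them only even-length
# palindromes are generated).
PRODS = {
    "Q": [["Z"], ["N"], ["0"], ["1"], []],   # [] is the ε-production
    "Z": [["0", "Q", "0"]],
    "N": [["1", "Q", "1"]],
}

def derive(limit=5):
    """Return all terminal strings of length <= limit derivable from Q."""
    results = set()

    def expand(sentential):
        if len([s for s in sentential if s in "01"]) > limit:
            return                               # prune: strings only grow
        for i, sym in enumerate(sentential):
            if sym in PRODS:                     # expand leftmost non-terminal
                for rhs in PRODS[sym]:
                    expand(sentential[:i] + rhs + sentential[i + 1:])
                return
        results.add("".join(sentential))         # all terminals: a sentence

    expand(["Q"])
    return results

words = derive(5)
```

Every string the grammar produces reads the same forwards and backwards, while non-palindromes such as 10 are never generated.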

Derivation
A derivation is a sequence of production-rule applications that obtains the input string from the start
symbol. During parsing, we make two decisions for some sentential form of the input:
· Deciding the non-terminal which is to be replaced.
· Deciding the production rule, by which, the non-terminal will be replaced.

At each point in this derivation process, the string is a collection of terminal or nonterminal symbols. Such
a string is called a sentential form if it occurs in some step of a valid derivation. Any sentential form can
be derived from the start symbol in zero or more steps. Similarly, from any sentential form we can derive a
valid sentence in zero or more steps.

To decide which non-terminal to replace with a production rule, we have two options.

Left-most Derivation
If the sentential form of an input is scanned and replaced from left to right, it is called left-most derivation.
The sentential form derived by the left-most derivation is called the left-sentential form.

Right-most Derivation
If we scan and replace the input with production rules, from right to left, it is known as right-most
derivation. The sentential form derived from the right-most derivation is called the right-sentential form.

Example
Production rules:
E → E + E
E → E * E
E → id

Input string: id + id * id
The left-most derivation is:
E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id

Notice that the left-most non-terminal is always processed first.


The right-most derivation is:
E ⇒ E + E
⇒ E + E * E
⇒ E + E * id
⇒ E + id * id
⇒ id + id * id

Parse Tree
A parse tree is a graphical depiction of a derivation. It is convenient to see how strings are derived from the
start symbol. The start symbol of the derivation becomes the root of the parse tree. Let us see this by an
example from the last topic.

We take the left-most derivation of id + id * id


The left-most derivation is:



E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id

(The five construction steps, shown as figures in the original, grow the parse tree top-down: first the root
E with children E * E, then the left E expanded to E + E, and finally each remaining E replaced by a leaf id.)
In a parse tree:
· All leaf nodes are terminals.
· All interior nodes are non-terminals.
· Reading the leaf nodes from left to right gives the original input string.

A parse tree depicts the associativity and precedence of operators. The deepest sub-tree is evaluated first;
therefore the operator in that sub-tree takes precedence over the operators in its parent nodes.

Ambiguity
A grammar G is said to be ambiguous if it has more than one parse tree (left or right derivation) for at least
one string.

Example
E → E + E



E → E – E
E → id
For the string id + id – id, the above grammar generates two parse trees:

A language is said to be inherently ambiguous if every grammar that generates it is ambiguous; note that an
ambiguous grammar does not by itself make its language inherently ambiguous. Ambiguity in a
grammar is undesirable for compiler construction. No general method can detect and remove ambiguity
automatically, but it can be removed by either re-writing the whole grammar without ambiguity, or by
setting and following associativity and precedence constraints.

Associativity
If an operand has operators on both sides, the operator that takes the operand is decided by the
associativity of those operators. If the operation is left-associative, the operand is taken by the left
operator; if the operation is right-associative, the right operator takes it.

Example
Operations such as Addition, Multiplication, Subtraction, and Division are left associative. If the
expression contains:

id op id op id

it will be evaluated as:

(id op id) op id

For example, (id + id) + id

Operations like Exponentiation are right associative, i.e., the order of evaluation in the same expression
will be:

id op (id op id)

For example, id ^ (id ^ id)

Precedence



If two different operators share a common operand, the precedence of operators decides which will take the
operand. That is, 2+3*4 can have two different parse trees, one corresponding to (2+3)*4 and another
corresponding to 2+(3*4). By setting precedence among operators, this problem can be easily removed. As
in the previous example, mathematically * (multiplication) has precedence over + (addition), so the
expression 2+3*4 will always be interpreted as:

2 + (3 * 4)

These methods decrease the chances of ambiguity in a language or its grammar.

Left Recursion
A grammar becomes left-recursive if it has any non-terminal ‘A’ whose derivation contains ‘A’ itself as the
left-most symbol. Left-recursive grammar is considered to be a problematic situation for top-down parsers.
Top-down parsers start parsing from the Start symbol, which in itself is non-terminal. So, when the parser
encounters the same non-terminal in its derivation, it becomes hard for it to judge when to stop parsing the
left non-terminal and it goes into an infinite loop.

Example:

(1) A => Aα | β
(2) S => Aα | β
A => Sd

(1) is an example of immediate left recursion, where A is a non-terminal symbol and α, β represent strings
of terminals and/or non-terminals (with β not beginning with A).
(2) is an example of indirect left recursion.

A top-down parser will first parse A, which in-turn will yield a string consisting of A itself and the parser
may go into a loop forever.

Removal of Left Recursion


One way to remove left recursion is to use the following technique:
The production
A => Aα | β
is converted into following productions
A => βA'
A' => αA' | ε



This does not impact the strings derived from the grammar, but it removes immediate left recursion.
Second method is to use the following algorithm, which should eliminate all direct and indirect left
recursions.
Algorithm

START
Arrange the non-terminals in some order A1, A2, A3,…, An
for each i from 1 to n
{
for each j from 1 to i-1
{
replace each production of the form Ai ⟹ Ajγ
with Ai ⟹ δ1γ | δ2γ | … | δkγ,
where Aj ⟹ δ1 | δ2 | … | δk are the current Aj-productions
}
eliminate immediate left-recursion among the Ai-productions
}
END

Example
The production set
S => Aα | β
A => Sd

after applying the above algorithm, should become

S => Aα | β
A => Aαd | βd

and then, remove immediate left recursion using the first technique.

A => βdA'
A' => αdA' | ε

Now none of the productions has either direct or indirect left recursion.
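The first technique above can be sketched in a few lines of Python (a hypothetical helper; productions are encoded as lists of symbols, with [] standing for ε):

```python
def remove_immediate_left_recursion(A, productions):
    """Transform A → Aα1 | … | Aαm | β1 | … | βn into
       A → β1A' | … | βnA'  and  A' → α1A' | … | αmA' | ε.
       Productions are lists of symbols; [] stands for ε."""
    alphas = [p[1:] for p in productions if p and p[0] == A]   # Aα alternatives
    betas = [p for p in productions if not p or p[0] != A]     # β alternatives
    if not alphas:                        # no immediate left recursion
        return {A: productions}
    A2 = A + "'"
    return {
        A:  [beta + [A2] for beta in betas],
        A2: [alpha + [A2] for alpha in alphas] + [[]],         # ε alternative
    }

# The chapter's example: A → Aαd | βd
result = remove_immediate_left_recursion("A", [["A", "α", "d"], ["β", "d"]])
```

Applied to A → Aαd | βd, this yields A → βdA' and A' → αdA' | ε, matching the result derived by hand above.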

Left Factoring
If two or more production rules of a grammar share a common prefix string, then a top-down parser cannot
decide which of the productions it should take to parse the string in hand.

Example
If a top-down parser encounters a production like
A ⟹ αβ | α | …

Then it cannot determine which production to follow to parse the string, as both alternatives begin with
the same string α. To remove this confusion, we use a technique called left factoring.

Left factoring transforms the grammar to make it useful for top-down parsers. In this technique, we make
one production for each common prefix, and the rest of the derivation is moved into new productions.



Example
The above productions can be written as

A => αA'
A' => β | ε | …

Now the parser has only one production per prefix which makes it easier to take decisions.
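A single left-factoring pass can be sketched as follows (an illustrative helper, not a full algorithm: it factors only the longest prefix common to all alternatives, whereas a complete algorithm would group alternatives by shared prefix):

```python
def left_factor(A, productions):
    """One left-factoring pass: pull out the longest prefix common to ALL
       alternatives, so A → αβ | αγ becomes A → αA' and A' → β | γ.
       Productions are lists of symbols; [] stands for ε."""
    prefix = []
    for column in zip(*productions):      # walk the alternatives in lock-step
        if len(set(column)) == 1:
            prefix.append(column[0])
        else:
            break
    if not prefix:
        return {A: productions}           # no common prefix to factor
    A2 = A + "'"
    return {
        A:  [prefix + [A2]],
        A2: [p[len(prefix):] for p in productions],   # [] stands for ε
    }

# The chapter's example: A → αβ | α
result = left_factor("A", [["α", "β"], ["α"]])
```

For A → αβ | α this produces A → αA' and A' → β | ε, as in the hand-derived result above.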

Limitations of Syntax Analyzers

Syntax analyzers receive their inputs, in the form of tokens, from lexical analyzers. Lexical analyzers are
responsible for the validity of the tokens supplied to the syntax analyzer. Syntax analyzers have the following
drawbacks:
· They cannot determine whether a token is valid,
· They cannot determine whether a token is declared before it is used,
· They cannot determine whether a token is initialized before it is used,
· They cannot determine whether an operation performed on a token type is valid or not.

These tasks are accomplished by the semantic analyzer, which we shall study in Semantic Analysis.

Types of Parsing
Syntax analyzers follow production rules defined by means of context-free grammar. The way the
production rules are implemented (derivation) divides parsing into two types: top-down parsing and
bottom-up parsing.

Top-down Parsing
When the parser starts constructing the parse tree from the start symbol and then tries to transform the start
symbol to the input, it is called top-down parsing.
· Recursive descent parsing: It is a common form of top-down parsing. It is called recursive, as it
uses recursive procedures to process the input. Recursive descent parsing suffers from backtracking.
· Backtracking: It means, if one derivation of a production fails, the syntax analyzer restarts the
process using different rules of same production. This technique may process the input string more
than once to determine the right production.

Bottom-up Parsing
As the name suggests, bottom-up parsing starts with the input symbols and tries to construct the parse tree
up to the start symbol.

Example:
Input string: a + b * c
Production rules:



S → E
E → E + T
E → E * T
E → T
T → id

Let us start bottom-up parsing.


a + b * c

Read the input and check if any production matches with the input:

a + b * c
T + b * c
E + b * c
E + T * c
E * c
E * T
E
S

Top-Down parsing

We have learnt that the top-down parsing technique parses the input and starts constructing a parse tree
from the root node, gradually moving down to the leaf nodes. The main types of top-down parsing are
recursive descent parsing (with or without backtracking) and predictive parsing.

Recursive Descent Parsing


Recursive descent is a top-down parsing technique that constructs the parse tree from the top and the input
is read from left to right. It uses procedures for every terminal and non-terminal entity. This parsing
technique recursively parses the input to make a parse tree, which may or may not require back-tracking.
But the grammar associated with it (if not left factored) cannot avoid back-tracking. A form of
recursive-descent parsing that does not require any back-tracking is known as predictive parsing.

This parsing technique is regarded as recursive, as it uses a context-free grammar, which is recursive in nature.

Back-tracking
Top- down parsers start from the root node (start symbol) and match the input string against the production
rules to replace them (if matched). To understand this, take the following example of CFG:



S → rXd | rZd
X → oa | ea
Z → ai

For an input string: read, a top-down parser, will behave like this:

It will start with S from the production rules and will match its yield to the left-most letter of the input, i.e.
'r'. The first production of S (S → rXd) matches it, so the top-down parser advances to the next input
letter (i.e. 'e'). The parser then tries to expand the non-terminal 'X' and checks its first production from
the left (X → oa). It does not match the next input symbol, so the top-down parser backtracks to obtain
the next production rule of X, (X → ea).

Now the parser matches all the input letters in an ordered manner. The string is accepted.
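The backtracking behaviour just described can be sketched with a small recursive recognizer (a Python sketch with hypothetical names; generators make the "try the next alternative on failure" step explicit):

```python
# Recursive-descent recognizer with backtracking for the grammar:
#   S → rXd | rZd,  X → oa | ea,  Z → ai
GRAMMAR = {
    "S": [["r", "X", "d"], ["r", "Z", "d"]],
    "X": [["o", "a"], ["e", "a"]],
    "Z": [["a", "i"]],
}

def match(sequence, text, pos):
    """Yield every input position reachable after matching `sequence` at `pos`."""
    if not sequence:
        yield pos
        return
    head, rest = sequence[0], sequence[1:]
    if head in GRAMMAR:                        # non-terminal: try each production
        for production in GRAMMAR[head]:       # backtracking happens here
            for mid in match(production, text, pos):
                yield from match(rest, text, mid)
    elif pos < len(text) and text[pos] == head:    # terminal: must match input
        yield from match(rest, text, pos + 1)

def accepts(text):
    return any(end == len(text) for end in match(["S"], text, 0))
```

On the input "read", the first alternative for X (X → oa) fails at 'e', so the recognizer falls through to X → ea and succeeds, exactly as in the walkthrough above.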

Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict which production is to be
used to replace the input string. The predictive parser does not suffer from backtracking.

To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the next input
symbols. To make the parser back-tracking free, the predictive parser puts some constraints on the
grammar and accepts only a class of grammar known as LL(k) grammar.



Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree. Both the
stack and the input contains an end symbol $ to denote that the stack is empty and the input is consumed.
The parser refers to the parsing table to take any decision on the input and stack element combination.

· Input buffer: our string to be parsed. We will assume that its end is marked with a special symbol $.
· Output: a production rule representing a step of derivation sequence (left-most derivation) of the string
in the input buffer.
· Stack: contains the grammar symbols
o At the bottom of the stack, there is a special end mark symbol $.
o Initially the stack contains only the symbol $ and the starting symbol S ($S ← initial stack)
o When the stack is emptied (i.e. only $ left in the stack), the parsing is completed.
· Parsing Table:
o A two-dimensional array M[A, a]
o Each row is a non-terminal symbol.
o Each column is a terminal symbol or the special symbol $.
o Each entry holds a production rule.

In recursive descent parsing, the parser may have more than one production to choose from for a single
instance of input; whereas in a predictive parser, each step has at most one production to choose. There
might be instances where no production matches the input string, causing the parsing procedure to fail.

LL Parser
An LL Parser accepts LL grammar. LL grammar is a subset of context-free grammar but with some
restrictions to get a simplified version, in order to achieve an easy implementation. LL grammars can be
implemented by means of either algorithm: recursive-descent or table-driven.

An LL parser is denoted as LL(k). The first L in LL(k) stands for scanning the input from left to right, the
second L stands for left-most derivation, and k represents the number of look-ahead symbols. Generally
k = 1, so LL(k) may also be written as LL(1).



LL Parsing Algorithm
We may stick to deterministic LL(1) for the parser explanation, as the size of the table grows exponentially
with the value of k. Secondly, if a given grammar is not LL(1), then usually it is not LL(k) for any other k either.

Implementation of predictive parser:


1. Elimination of left recursion, left factoring and ambiguous grammar.
2. Construct FIRST() and FOLLOW() for all non-terminals.
3. Construct predictive parsing table.
4. Parse the given input string using stack and parsing table.

FIRST and FOLLOW


The construction of a predictive parser is aided by two functions associated with grammar G. These
functions, FIRST and FOLLOW, allow us to fill in the entries of a predictive parsing table for G, whenever
possible. Sets of tokens yielded by the FOLLOW function can also be used as synchronizing tokens during
panic-mode error recovery.

FIRST(α) is the set of terminals that can begin strings derived from α.
FOLLOW(X) is the set of terminals that can immediately follow X. That is, t ∈ FOLLOW(X) if there is
any derivation containing Xt. This can also occur if the derivation contains XYZt, where Y and Z both derive ε.

Rules to computing the Function FIRST


To compute FIRST (X) for all grammar symbols X apply the following rules until no more terminals or ε
can be added to any FIRST set:

1. If X is a terminal, then FIRST(X) is {X}.

2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → aA is a production, then add a to FIRST(X).
4. If X is a non-terminal and X → Y1Y2…Yk is a production, then place a in FIRST(X) if, for some i, a
is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1); that is, Y1Y2…Yi-1 ⇒* ε. If ε is in
FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).

Rules to computing the Function FOLLOW


To compute FOLLOW (X) for all non-terminals X apply the following rules until nothing can be added to
any FOLLOW set:
1. If S is the start symbol, then FOLLOW(S) contains $.
2. If there is a production A → αXβ, then everything in FIRST(β) except ε is placed in FOLLOW(X).
3. If there is a production A → αX, or a production A → αXβ where FIRST(β) contains ε, then
everything in FOLLOW(A) is in FOLLOW(X).
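The rules above amount to a small fixed-point computation. The sketch below (helper names are illustrative) computes FIRST and FOLLOW for the left-recursion-free expression grammar used later in this chapter:

```python
EPS = "ε"
# E → TE' ; E' → +TE' | ε ; T → FT' ; T' → *FT' | ε ; F → (E) | id
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(sequence, FIRST):
    """FIRST of a string of grammar symbols (rule 4 generalized)."""
    out = set()
    for sym in sequence:
        if sym not in GRAMMAR:            # terminal (or ε itself)
            out.add(sym)
            return out
        out |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return out
    out.add(EPS)                          # every symbol can derive ε
    return out

def compute_first():
    FIRST = {A: set() for A in GRAMMAR}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for A, productions in GRAMMAR.items():
            for rhs in productions:
                new = first_of(rhs, FIRST) - FIRST[A]
                if new:
                    FIRST[A] |= new
                    changed = True
    return FIRST

def compute_follow(FIRST, start="E"):
    FOLLOW = {A: set() for A in GRAMMAR}
    FOLLOW[start].add("$")                # rule 1
    changed = True
    while changed:
        changed = False
        for A, productions in GRAMMAR.items():
            for rhs in productions:
                for i, X in enumerate(rhs):
                    if X not in GRAMMAR:
                        continue
                    f = first_of(rhs[i + 1:], FIRST)       # FIRST(β)
                    new = (f - {EPS}) | (FOLLOW[A] if EPS in f else set())
                    new -= FOLLOW[X]
                    if new:               # rules 2 and 3
                        FOLLOW[X] |= new
                        changed = True
    return FOLLOW

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

The computed sets agree with the FIRST/FOLLOW table shown in the worked example below, e.g. FOLLOW(F) = {+, *, $, )}.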

Algorithm for construction of predictive parsing table:


Input : Grammar G



Output : Parsing table M
Method :

1. For each production A → α of the grammar, do steps 2 and 3.


2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in
FIRST(α) and $ is in FOLLOW(A) , add A → α to M[A, $].
4. Make each undefined entry of M be error.

Algorithm for parsing a given input string using LL(1)

Parser Actions: The symbol at the top of the stack (say X) and the current symbol in the input string (say
a) determine the parser action.
• There are four possible parser actions:
1. If X and a are both $ → the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $) → the parser pops X from the stack and
moves to the next symbol in the input buffer.
3. If X is a non-terminal → the parser consults the parsing table entry M[X, a]. If M[X, a] holds a
production rule X → Y1Y2…Yk, the parser pops X from the stack and pushes Yk, Yk-1,…, Y1 onto the
stack (so that Y1 is on top). The parser also outputs the production rule X → Y1Y2…Yk to represent a
step of the derivation.
4. None of the above → error.
• All empty entries in the parsing table are errors.
• If X is a terminal symbol different from a, this is also an error case.

Example:

Consider the following grammar:


E → E+T | T
T → T*F | F
F → (E) | id

After eliminating left-recursion the grammar is

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

FIRST and FOLLOW

        FIRST         FOLLOW
E       { (, id }     { $, ) }
E'      { +, ε }      { $, ) }
T       { (, id }     { +, $, ) }
T'      { *, ε }      { +, $, ) }
F       { (, id }     { +, *, $, ) }

Parsing Table



Parsing a given input id+id*id$
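Since the parsing-table figure is not reproduced here, the sketch below encodes the standard LL(1) table for this grammar as a Python dictionary and runs the stack algorithm on id + id * id (helper names are illustrative):

```python
EPS = "ε"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
# M[A, a] for the left-recursion-free expression grammar; empty list = ε-rule.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens):
    """Return the left-most derivation (as production strings) of `tokens`."""
    stack = ["$", "E"]                    # $ at the bottom, start symbol on top
    tokens = tokens + ["$"]
    i, output = 0, []
    while stack:
        X, a = stack.pop(), tokens[i]
        if X == "$":                      # action 1: halt
            if a == "$":
                return output
            raise SyntaxError("input left over after stack emptied")
        if X not in NONTERMINALS:         # action 2: match terminal
            if X != a:
                raise SyntaxError(f"expected {X}, found {a}")
            i += 1
        elif (X, a) in TABLE:             # action 3: expand by M[X, a]
            rhs = TABLE[(X, a)]
            output.append(f"{X} → {' '.join(rhs) if rhs else EPS}")
            stack.extend(reversed(rhs))   # push Yk … Y1, so Y1 ends on top
        else:                             # action 4: error (empty entry)
            raise SyntaxError(f"no entry M[{X}, {a}]")

steps = ll1_parse(["id", "+", "id", "*", "id"])
```

The output lists the productions in left-most derivation order, beginning with E → T E' and ending with the two ε-rules that empty the stack.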

LL(1) grammar:
The parsing table entries are single entries, i.e., each location holds at most one production. This type of
grammar is called an LL(1) grammar.

Consider this following grammar:


S → iEtS | iEtSeS | a
E→b



After left factoring, we have
S → iEtSS’ | a
S’→ eS | ε
E→b

To construct a parsing table, we need FIRST() and FOLLOW() for all the non-terminals.

FIRST(S) = { i, a }
FIRST(S’) = {e, ε }
FIRST(E) = { b}
FOLLOW(S) = { $ ,e }
FOLLOW(S’) = { $ ,e }
FOLLOW(E) = {t}

Parsing Table:

Since the entry M[S', e] contains more than one production (both S' → eS and S' → ε), the grammar is not
an LL(1) grammar.



Bottom-Up Parsing

Bottom-up parsing starts from the leaf nodes of a tree and works in upward direction till it reaches the root
node. Here, we start from a sentence and then apply production rules in reverse manner in order to reach
the start symbol. The main bottom-up parsers are operator-precedence parsers and the LR family (SLR,
canonical LR, and LALR).

Shift-Reduce Parsing
Shift-reduce parsing uses two unique steps for bottom-up parsing, known as the shift-step and the
reduce-step.
· Shift step: The shift step refers to the advancement of the input pointer to the next input symbol,
which is called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is
treated as a single node of the parse tree.
· Reduce step: When the parser finds a complete grammar-rule right-hand side (RHS) on top of the
stack and replaces it with the corresponding left-hand side (LHS), it is known as the reduce-step. This
occurs when the top of the stack contains a handle. To reduce, the handle is popped off the stack and
replaced with the LHS non-terminal symbol.

Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree for an input string
beginning at the leaves (the bottom) and working up towards the root (the top).

Example:
Consider the grammar:
S → aABe
A → Abc | b
B→d
The sentence to be recognized is abbcde.

Reduction (leftmost)          Rightmost derivation

abbcde    (A → b)             S ⇒ aABe
aAbcde    (A → Abc)             ⇒ aAde
aAde      (B → d)               ⇒ aAbcde
aABe      (S → aABe)            ⇒ abbcde
S

The reductions trace out the right-most derivation in reverse.
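The shift/reduce choices above can be rediscovered mechanically. The sketch below finds an accepting action sequence by brute-force search with backtracking (an illustration only; a real LR parser decides each step deterministically from a table):

```python
# Grammar: S → aABe, A → Abc | b, B → d
RULES = [("S", ("a", "A", "B", "e")),
         ("A", ("A", "b", "c")),
         ("A", ("b",)),
         ("B", ("d",))]

def shift_reduce(tokens, goal="S"):
    """Search for an accepting sequence of shift/reduce actions, trying
       reductions first and backtracking on failure (a brute-force sketch)."""
    def search(stack, i, actions):
        if stack == (goal,) and i == len(tokens):
            return actions + ["accept"]
        for lhs, rhs in RULES:                        # try every reduction
            if stack[-len(rhs):] == rhs:
                found = search(stack[:-len(rhs)] + (lhs,), i,
                               actions + [f"reduce {lhs} → {''.join(rhs)}"])
                if found:
                    return found
        if i < len(tokens):                           # otherwise try a shift
            return search(stack + (tokens[i],), i + 1,
                          actions + [f"shift {tokens[i]}"])
        return None
    return search((), 0, [])

actions = shift_reduce(list("abbcde"))
```

For abbcde the search recovers the same four reductions as the table above, including backing out of the premature second A → b reduction in favour of shifting c and reducing A → Abc.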



Handles: A handle of a string is a substring that matches the right side of production, and whose reduction
to the non-terminal on the left side of the production represents one step along the reverse of a rightmost
derivation.

Example: Consider the grammar:

E→E+E
E→E*E
E → (E)
E → id

And the input string id1 + id2 * id3


E ⇒ E + E
⇒ E + E * E
⇒ E + E * id3
⇒ E + id2 * id3
⇒ id1 + id2 * id3

In the above derivation, the substrings underlined in the original are the handles: reading the reductions
in reverse order, they are id1, id2, id3, then E * E, and finally E + E.

Handle Pruning: starting from a string of terminals, we repeatedly locate the handle of the current
right-sentential form and reduce it to the non-terminal on the left side of the matching production, until
only the start symbol remains. This process, which traces a rightmost derivation in reverse, is called
handle pruning.

Stack Implementation of Shift Reduce parsing


There are four possible actions of a shift-reduce parse actions:

· Shift: The next input symbol is shifted onto the top of the stack.
· Reduce: Replace the handle on the top of the stack by the non-terminal.
· Accept: Successful completion of parsing.
· Error: Parser discovers a syntax error, and calls an error recovery routine.

The initial stack contains only the end-marker $. The end of the input string is also marked by the
end-marker $.

Example: consider the following grammar:

E→E+E
E→E*E
E → (E)
E → id

And input string: id1 + id2 * id3$



Conflicts in Shift-Reduce parsing:
There are two conflicts that occur in shift-reduce parsing:
1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.
2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.

Types of Shift Reduce Parsing: There are two main categories of shift-reduce parsers.
1. Operator-Precedence Parser: Simple, but only a small class of grammar.
2. LR-Parsers: Covers wide range of grammars.
a. SLR – Simple LR parser
b. CLR – Most general LR parser (Canonical LR)
c. LALR – Intermediate LR parser (Look-Ahead LR)
SLR, CLR and LALR work the same way; only their parsing tables differ.



Operator precedence Parser
Operator grammars have the property that no production right side is empty or has two adjacent
non-terminals. This property enables the implementation of efficient operator-precedence parsers.

Examples:
E →AB E →EOE E →E+E
A →a E →id E →E*E
B →b O →+|*|/ E →E/E | id
Not operator grammar Not operator grammar Operator grammar

Precedence Relations
These parsers rely on the following three precedence relations between terminals a and b:
a <• b   a yields precedence to b
a =• b   a has the same precedence as b
a •> b   a takes precedence over b

The determination of correct precedence relations between terminals is based on the traditional notions of
associativity and precedence of operators (unary minus causes a problem). The intention of the precedence
relations is to find the handle of a right-sentential form, with
<• marking the left end,
=• appearing in the interior of the handle, and
•> marking the right end.

Precedence Relations: From Associativity and Precedence


1. If operator θ1 has higher precedence than operator θ2, make θ1•> θ2 and θ2 <• θ1.
For Example, if * has higher precedence than +, make * •> + and + <• *. These relations ensure that,
in an expression of the form E + E * E + E, the central E*E is the handle that will be reduced first.

2. If θ1 and θ2 are operators of equal precedence (they may in fact be the same operator), then
make θ1 •> θ2 and θ2 •> θ1 if the operators are left-associative, or make θ1 <• θ2 and θ2 <• θ1 if they
are right-associative.
For example, if + and – are left-associative, then make + •> +, + •> –, – •> –, – •> +.

3. For all operators θ: θ <• id, id •> θ, θ <• (, ( <• θ, ) •> θ, θ •> ), θ •> $, and $ <• θ.
4. Also, let
( =• )    $ <• (    $ <• id
( <• (    id •> $   ) •> $
( <• id   id •> )   ) •> )
• NB: id has the highest precedence of any symbol;
$ has the lowest precedence.

Using Precedence relations to find Handles


In our input string $a1a2…an$, we insert the precedence relation between the pairs of terminals (the
precedence relation holds between the terminals in that pair)

Having precedence relations allows to identify handles as follows:



· Scan the string from left to right until the first •> is seen.
· Then scan the string backwards (right to left) until a <• is seen.
· Everything between the two relations <• and •> forms the handle.

Operator-Precedence Parsing Algorithm

The input string is w$, the initial stack is $ and a table holds precedence relations between certain
terminals.

Set p to point to the first symbol of w$

Repeat forever
if ($ is on top of the stack and p points to $) then return
else
{
let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p;
if (a <• b or a =• b) then
{ /* SHIFT */
push b onto the stack;
advance p to the next input symbol;
}
else if (a •> b ) then /* REDUCE*/
repeat pop stack until ( the top of stack terminal is related by <• to the terminal most
recently popped);
else
error();
}

Example: Consider the following grammar


E → E*E | E+E | id
• Construct the operator-precedence relation and parse the string id + id * id

Solution: let’s first build precedence relation

       id    +     *     $
id           •>    •>    •>
+      <•    •>    <•    •>
*      <•    •>    •>    •>
$      <•    <•    <•

Parsing the input: $<•id1•>+<•id2•>*<•id3•>$


Stack Input Action
$ id+id*id$ $ <• id Shift
$id +id*id$ id •> + Reduce E → id
$ +id*id$ shift
$+ id*id$ Shift
$+id *id$ id •> * Reduce E → id
$+ *id$ shift
$+* id$ shift



$+*id $ id •> $ Reduce E → id
$+* $ * •> $ Reduce E → E * E
$+ $ + •> $ Reduce E → E + E
$ $ Accept
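The trace above can be reproduced by a direct implementation of the operator-precedence algorithm (a Python sketch with illustrative names; the stack keeps terminals only, since non-terminals do not participate in the precedence relations):

```python
LT, GT = "<•", "•>"
# The precedence relations from the table above.
REL = {
    ("id", "+"): GT, ("id", "*"): GT, ("id", "$"): GT,
    ("+", "id"): LT, ("+", "+"): GT, ("+", "*"): LT, ("+", "$"): GT,
    ("*", "id"): LT, ("*", "+"): GT, ("*", "*"): GT, ("*", "$"): GT,
    ("$", "id"): LT, ("$", "+"): LT, ("$", "*"): LT,
}

def op_precedence_parse(tokens):
    """Shift/reduce driven purely by the precedence relations."""
    stack, tokens = ["$"], tokens + ["$"]
    i, actions = 0, []
    while True:
        a, b = stack[-1], tokens[i]
        if a == "$" and b == "$":
            actions.append("accept")
            return actions
        relation = REL.get((a, b))
        if relation == LT:                    # a <• b: shift (=• would too)
            stack.append(b)
            i += 1
            actions.append(f"shift {b}")
        elif relation == GT:                  # a •> b: reduce
            popped = stack.pop()
            while REL.get((stack[-1], popped)) != LT:
                popped = stack.pop()          # pop until top <• last popped
            actions.append(f"reduce handle ending in {popped}")
        else:
            raise SyntaxError(f"no precedence relation between {a} and {b}")

actions = op_precedence_parse(["id", "+", "id", "*", "id"])
```

On id + id * id this performs five shifts and five reductions before accepting, matching the stack trace shown above.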

Advantages of Operator Precedence Parsing:

1. It is easy to implement.
2. Once the operator precedence relations are made between all pairs of terminals of a grammar, the
grammar can be ignored; it is not referred to anymore during parsing.

Disadvantages of Operator Precedence Parsing:

1. It is hard to handle tokens like the minus sign (-), which has two different precedences (unary and binary).
2. Only a small class of grammars can be parsed using an operator precedence parser.

LR Parser
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It can handle a wide class of context-free
grammars, which makes it one of the most widely used syntax analysis techniques. LR parsers are also known
as LR(k) parsers, where L stands for left-to-right scanning of the input stream, R stands for the construction
of a right-most derivation in reverse, and k denotes the number of lookahead symbols used to make decisions.

Advantages of LR parsing:

· It recognizes virtually all programming language constructs for which a CFG can be written.
· It is an efficient, non-backtracking shift-reduce parsing method.
· The class of grammars that can be parsed using LR methods is a proper superset of the class of
grammars that can be parsed with predictive parsers.

Drawbacks of LR parsing:

· It is too much work to construct an LR parser by hand for a programming language grammar. A
specialized tool, called an LR parser generator, is needed. Examples: YACC/Bison.

There are three widely used algorithms available for constructing an LR parser:

· SLR(1) – Simple LR Parser:


o Works on smallest class of grammar
o Few number of states, hence very small table
o Simple and fast construction
· LR(1) – LR Parser:
o Works on complete set of LR(1) Grammar
o Generates large table and large number of states
o Slow construction
· LALR(1) – Look-Ahead LR Parser:
o Works on intermediate size of grammar
o Number of states are same as in SLR(1)



LR parsing configuration

It consists of: an input, an output, a stack, a driver program, and a parsing table that has two parts (action
and goto).
· The driver program is the same for all LR parsers.
· The parsing program reads characters from an input buffer one at a time.
· The program uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on top. Each
Xi is a grammar symbol and each si is a state.
· The parsing table consists of two parts: action and goto functions.

Items and the LR(0) Automaton


How does a shift-reduce parser know when to shift and when to reduce? For example, with stack contents $
T and next input symbol *, how does the parser know that T on the top of the stack is not a handle, so
that the appropriate action is to shift and not to reduce by E → T?

An LR parser makes shift-reduce decisions by maintaining states to keep track of where we are in a parse.
States represent sets of "items." An LR(0) item (item for short) of a grammar G is a production of G with a
dot at some position of the body. Thus, production A → XYZ yields the four items:

A → •XYZ means the parser is ready to scan XYZ.
A → X•YZ means the parser has scanned X and is ready to scan YZ.
A → XY•Z means the parser has scanned X and Y and is ready to scan Z.
A → XYZ• means the parser is ready to detect the handle XYZ.

One collection of sets of LR(0) items, called the canonical LR(0) collection, provides the basis for
constructing a deterministic finite automaton that is used to make parsing decisions. Such an automaton is
called an LR(0) automaton. In particular, each state of the LR(0) automaton represents a set of items in
the canonical LR(0) collection.



To construct the canonical LR(0) collection for a grammar, we define an augmented grammar and two
functions, CLOSURE and GOTO. If G is a grammar with start symbol S, then G', the augmented grammar
for G, is G with a new start symbol S' and the production S' → S. The purpose of this new starting production
is to indicate to the parser when it should stop parsing and announce acceptance of the input. That is,
acceptance occurs when and only when the parser is about to reduce by S' → S.

Closure of Item Sets


If I is a set of items for a grammar G, then CLOSURE(I) is the set of items constructed from I by the
following rules:
1. Initially, add every item in I to CLOSURE(I).
2. For any item A → α•Bβ in CLOSURE(I), where the symbol B after the dot is a non-terminal, add
every production of B as an item with the dot before the first symbol of its body.
3. Repeat (2) until no new items can be added.

Example: G: S → E + S | E
E → num

Closure(S' → •S) = { S' → •S, S→•E + S, S→•E, E→•num}
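As a sketch, the closure rules above can be coded directly for this example grammar. The (head, body, dot) item encoding and the dictionary grammar representation below are my own illustrative choices, not notation from the notes:

```python
from collections import deque

# Example grammar: S' -> S, S -> E + S | E, E -> num
GRAMMAR = {
    "S'": [["S"]],
    "S": [["E", "+", "S"], ["E"]],
    "E": [["num"]],
}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    """An LR(0) item is (head, body, dot): dot is the index of the • in body."""
    result, work = set(items), deque(items)
    while work:
        head, body, dot = work.popleft()
        # Rule 2: if the symbol after the dot is a non-terminal, add all of
        # its productions with the dot at the start of the body.
        if dot < len(body) and body[dot] in NONTERMINALS:
            for prod in GRAMMAR[body[dot]]:
                item = (body[dot], tuple(prod), 0)
                if item not in result:   # Rule 3: stop when nothing new appears
                    result.add(item)
                    work.append(item)
    return result

items = closure({("S'", ("S",), 0)})
for head, body, dot in sorted(items):
    print(head, "->", " ".join(body[:dot]) + "•" + " ".join(body[dot:]))
```

Running this reproduces the four items of the worked example: S' → •S, S → •E + S, S → •E, and E → •num.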

The Function GOTO


The second useful function is GOTO (I, X) where I is a set of items and X is a grammar symbol.
Intuitively, the GOTO function is used to define the transitions in the LR(0) automaton for a grammar. The
states of the automaton correspond to sets of items, and GOTO(I, X) specifies the transition from the state
for I under input X.
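A minimal sketch of GOTO on top of the same illustrative (head, body, dot) item encoding used earlier (the grammar dictionary and helper names are assumptions of this sketch, repeated here so it runs on its own):

```python
from collections import deque

# Example grammar: S' -> S, S -> E + S | E, E -> num
GRAMMAR = {"S'": [["S"]], "S": [["E", "+", "S"], ["E"]], "E": [["num"]]}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    result, work = set(items), deque(items)
    while work:
        _, body, dot = work.popleft()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for prod in GRAMMAR[body[dot]]:
                item = (body[dot], tuple(prod), 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result

def goto(items, symbol):
    """GOTO(I, X): advance the dot over X in every item of I that expects X,
    then close the resulting set."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({("S'", ("S",), 0)})
# GOTO(I0, E) is the LR(0) automaton's transition from the start state on E;
# it contains S -> E•+S and S -> E•.
print(sorted(goto(I0, "E")))
```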

LR(0) parsing: Building the DFA


a. Augment the grammar with a new start production (and initial state I0):
S' → S $
b. Build the node for state I0 :
i. Create an item from the rule introduced in (a), with the dot before the first symbol.
ii. Build the closure of S' → •S $
c. For each unique symbol that follows a dot in closure(I0):
i. Create a new state representing the recognition of that symbol.
ii. Link the original state to the new state with an arc labeled with the recognized symbol.
iii. Add a DFA node for this state.
iv. The closure of the new state is calculated as follows:
a) It contains only those items from the previous state where the recognized symbol was the symbol
after the dot.
b) For each of these items, move the dot AFTER the recognized symbol.
c) Where the new next symbol is a non-terminal, add the closure of these items to the closure.
v. Where the closure includes an item with the dot at the end, place a double circle around the state (some
actions from this state cause a REDUCE).
d. Repeat (c) for each of the new states, except where a new state has the same closure as an
existing one; in that case, link to the existing state instead.
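Steps (a)–(d) above amount to a worklist construction of the DFA. The sketch below builds the full LR(0) automaton for the earlier example grammar S' → S, S → E + S | E, E → num (the encoding and helper names are illustrative, not from the notes):

```python
from collections import deque

GRAMMAR = {"S'": [["S"]], "S": [["E", "+", "S"], ["E"]], "E": [["num"]]}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    result, work = set(items), deque(items)
    while work:
        _, body, dot = work.popleft()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for prod in GRAMMAR[body[dot]]:
                item = (body[dot], tuple(prod), 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)   # frozenset so a state can be a dict key

def goto(items, symbol):
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})

def build_dfa():
    """Worklist version of steps (b)-(d): states are item sets, edges are GOTO moves."""
    start = closure({("S'", ("S",), 0)})
    states, edges, work = {start: 0}, {}, deque([start])
    while work:
        I = work.popleft()
        # every symbol that appears right after a dot in some item of I
        for X in {b[d] for _, b, d in I if d < len(b)}:
            J = goto(I, X)
            if J not in states:        # step (d): reuse a state with the same closure
                states[J] = len(states)
                work.append(J)
            edges[(states[I], X)] = states[J]
    return states, edges

states, edges = build_dfa()
print(len(states), "states,", len(edges), "transitions")   # 6 states, 7 transitions
```

For this grammar the construction yields six states and seven transitions; the states whose item sets contain a dot-at-end item (for example E → num•) are the reduce states that would be double-circled in the drawing.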

LR(0) Parser: Parsing Input String


Initially we have:
1. The input vector: a vector of the input tokens, terminated by the token $.
2. Next-token pointer: a pointer to the next input token to be processed, initially pointing at the first token.
3. The stack: initially contains only the start state I0.
Recursively:



1. Apply action: apply the action given in the action column for the current state (the topmost state on the
stack).
a. If shift S, push the next token and the state S onto the stack, then move the pointer to the next token.
b. If reduce, look up the rule given in the action and remove n*2 items from the stack (where n is the
number of symbols on the RHS of the rule). Then place the LHS of the production on the stack.
c. If accept, then parsing is finished; we have a complete analysis.

2. Goto state: the top of the stack now contains a state and a symbol (terminal or non-terminal). In the
Goto table, look up the row for the state and the column for the symbol.
a. If there is a number there, push it onto the stack (it is the new state number).
b. If there is no Goto entry, return ERROR (the input cannot be parsed by the grammar).
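The loop above can be sketched as a tiny driver. To keep it self-contained I hand-wired the action/goto tables for the trivial grammar S' → S, S → x; note that this sketch pushes only states (each state implies the symbol that reached it), so a reduce pops n entries rather than the n*2 of the symbol-and-state formulation in the notes:

```python
# Hand-built tables for S' -> S, S -> x (illustrative, not generated).
ACTION = {
    (0, "x"): ("shift", 2),
    (1, "$"): ("accept", None),
    (2, "$"): ("reduce", ("S", 1)),   # reduce by S -> x (RHS length 1)
}
GOTO = {(0, "S"): 1}

def parse(tokens):
    stack = [0]                       # state stack; state 0 is the start state
    tokens = tokens + ["$"]           # input vector terminated by $
    i, output = 0, []
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return None               # no table entry: syntax error
        kind, arg = act
        if kind == "shift":
            stack.append(arg)         # push the shifted-to state; advance input
            i += 1
        elif kind == "reduce":
            lhs, n = arg
            del stack[-n:]            # pop one state per RHS symbol
            output.append(lhs)        # emit the production's LHS
            stack.append(GOTO[(stack[-1], lhs)])   # goto on the LHS
        else:                         # accept
            return output

print(parse(["x"]))   # -> ['S']
```

The sequence of LHS symbols that `parse` emits is the reverse rightmost derivation, exactly like the Output column of the parse trace later in this section.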

Structure of the parse Table:


1. Each table row is indexed by a state
· Each state corresponds to a top-of-stack configuration.
2. Each column is indexed by a symbol
· Terminals and non-terminals are both valid.
3. Entries are exactly one of:
· Shift<state>
· Reduce<rule>
· Accept
(a blank entry indicates a syntax error)

               Action                    GOTO
State     id    *    +    -    $      E    T    A
I0
I1
I2
I3

Example: Consider the following grammar G:

0. S’ → S
1. S→(L)
2. S→x
3. L→S
4. L→L,S

a. Draw LR(0) DFA States.


b. Draw the parse table
c. Parse the input (x , x)



Solution:
a) LR(0) DFA states

b) LR(0) Parse Table

c) Parsing input (x , x)

Stack          Input        Action    Output

1              (x , x)$     S3
1(3            x , x)$      S2
1(3x2          , x)$        R2        S→x
1(3S7          , x)$        R3        L→S
1(3L5          , x)$        S8
1(3L5,8        x)$          S2
1(3L5,8x2      )$           R2        S→x
1(3L5,8S9      )$           R4        L→L,S
1(3L5          )$           S6
1(3L5)6        $            R1        S→(L)
1S4            $            Accept

LL Vs LR

LL                                                    LR
Does a leftmost derivation.                           Does a rightmost derivation in reverse.
Starts with the root nonterminal on the stack.        Ends with the root nonterminal on the stack.
Ends when the stack is empty.                         Starts with an empty stack.
Uses the stack for designating what is still          Uses the stack for designating what is
to be expected.                                       already seen.
Builds the parse tree top-down.                       Builds the parse tree bottom-up.
Continuously pops a nonterminal off the stack,        Tries to recognize a right-hand side on the
and pushes the corresponding right-hand side.         stack, pops it, and pushes the corresponding
                                                      nonterminal.
Expands the non-terminals.                            Reduces the non-terminals.
Reads the terminals when it pops one off the          Reads the terminals while it pushes them on
stack.                                                the stack.
Pre-order traversal of the parse tree.                Post-order traversal of the parse tree.
