Lec4 SyntaxAnalysis

This document provides an introduction to parsing and context-free grammars. It discusses how a parser uses the grammar of a language to verify that a string of tokens can be generated by the grammar, report any syntax errors, and construct a parse tree representation. It defines key parsing concepts like context-free grammars, productions, terminals, non-terminals, derivations, parse trees, left-most and right-most derivations, ambiguity, and dealing with ambiguity in grammars.

Introduction to Parsing

Ambiguity and Syntax Errors


Outline

• Parser overview

• Context-free grammars (CFG’s)

• Derivations

• Ambiguity

• Syntax errors

What is the job of Syntax Analysis?
• Syntax Analysis is also called Parsing or Hierarchical Analysis.
• A parser implements the grammar of a language, be it Java, C, C++, etc.
• The parser obtains a string of tokens from the lexical analyzer and:
  • verifies that the string can be generated by the grammar for the source language
  • reports any syntax errors in the program
  • constructs a parse tree representation of the program
  • usually calls the lexical analyzer to supply a token when necessary
• The grammar that a parser implements is called a Context-Free Grammar (CFG).
The Functionality of the Parser

• Input: sequence of tokens from the lexer

• Output: parse tree of the program

Comparison with Lexical Analysis:

  Phase    Input                     Output
  Lexer    Sequence of characters    Sequence of tokens
  Parser   Sequence of tokens        Parse tree
Example

• If-then-else statement:
  if (x == y) then z = 1; else z = 2;
• Parser input:
  IF (ID == ID) THEN ID = INT; ELSE ID = INT;
• Possible parser output (parse tree sketch):

  IF-THEN-ELSE
  ├── ==  (ID, ID)
  ├── =   (ID, INT)
  └── =   (ID, INT)
The Role of the Parser
• Not all sequences of tokens are programs ...
• The parser must distinguish between valid and invalid sequences of tokens
• The role is:
  1. To check syntax (= string recognizer)
     and to report syntax errors accurately
  2. To invoke semantic actions
     for static semantics checking, e.g. type checking of expressions, functions, etc.
• We need
  – a language for describing valid sequences of tokens
  – a method for distinguishing valid from invalid sequences of tokens
What is the difference between Syntax and Semantics?

• Syntax is the way in which we construct sentences by following principles and rules.

• Semantics is the interpretation of, and the meaning derived from, a sentence: whether the logically formed sentences make sense or not.
Context-Free Grammars
• Many programming language constructs have a recursive structure

• A STMT is of the form

  if COND then STMT else STMT , or
  while COND do STMT , or
  …

• Context-free grammars are a natural notation for this recursive structure
CFGs (Cont.)
• A CFG consists of
  – a set of terminals T
  – a set of non-terminals N
  – a start symbol S (a non-terminal)
  – a set of productions

  Assuming X ∈ N, the productions are of the form

  X → ε , or
  X → Y1 Y2 ... Yn   where each Yi ∈ N ∪ T
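To make the definition concrete, here is a minimal sketch of how such a CFG could be represented as plain data in Python. The grammar shown is the simplified STMT/COND fragment used later in these slides; the dictionary layout and the name GRAMMAR are illustrative choices, not part of any particular parsing tool.

# A CFG as plain Python data (illustrative sketch).
# Productions map a non-terminal to a list of right-hand sides,
# each right-hand side being a list of symbols (an empty list would mean epsilon).
GRAMMAR = {
    "terminals": {"if", "then", "else", "while", "do",
                  "id", "=", "int", "(", ")", "==", "!="},
    "nonterminals": {"STMT", "COND"},
    "start": "STMT",
    "productions": {
        "STMT": [
            ["if", "COND", "then", "STMT", "else", "STMT"],
            ["while", "COND", "do", "STMT"],
            ["id", "=", "int"],
        ],
        "COND": [
            ["(", "id", "==", "id", ")"],
            ["(", "id", "!=", "id", ")"],
        ],
    },
}

# Sanity check: every right-hand-side symbol is a terminal or a non-terminal.
for lhs, rhss in GRAMMAR["productions"].items():
    assert lhs in GRAMMAR["nonterminals"]
    for rhs in rhss:
        for sym in rhs:
            assert sym in GRAMMAR["terminals"] | GRAMMAR["nonterminals"], sym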
Notational Conventions
• Terminals: lower case letters, operator symbols, punctuation symbols, digits, and boldface strings are all terminals.

• Non-terminals: upper case letters and lower case italic names are usually non-terminals.

• Greek letters such as α, β, γ represent strings of grammar symbols. Thus a generic production can be written as A → α.
• The start symbol is the left-hand side of the first production.
Examples of CFGs

A fragment of our example language (simplified):

  STMT → if COND then STMT else STMT
       | while COND do STMT
       | id = int
Examples of CFGs (cont.)
Grammar for simple arithmetic expressions:

  E → E * E
    | E + E
    | ( E )
    | id
The Language of a CFG
Read productions as replacement rules:

  X → Y1 ... Yn
  means X can be replaced by Y1 ... Yn

  X → ε
  means X can be erased (replaced with the empty string)
Key Idea

(1) Begin with a string consisting of the start symbol "S"
(2) Replace any non-terminal X in the string by the right-hand side of some production X → Y1 ... Yn
(3) Repeat (2) until there are no non-terminals in the string
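As an illustration of this process, here is a minimal sketch that starts from the start symbol and repeatedly replaces the leftmost non-terminal with a randomly chosen right-hand side. It reuses the GRAMMAR dictionary sketched earlier; the function name and the max_steps cutoff are illustrative assumptions.

import random

def derive(grammar, max_steps=50):
    """Sketch of the key idea: keep replacing the leftmost non-terminal
    until only terminals remain (or give up after max_steps)."""
    string = [grammar["start"]]
    for _ in range(max_steps):
        # Find the leftmost non-terminal, if any.
        idx = next((i for i, s in enumerate(string)
                    if s in grammar["nonterminals"]), None)
        if idx is None:
            return string                      # only terminals: a sentence of L(G)
        rhs = random.choice(grammar["productions"][string[idx]])
        string = string[:idx] + rhs + string[idx + 1:]
    return string                              # derivation not finished within max_steps

# Example use:  print(" ".join(derive(GRAMMAR)))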
The Language of a CFG (Cont.)
More formally, we write

  X1 ... Xi ... Xn → X1 ... Xi-1 Y1 ... Ym Xi+1 ... Xn

if there is a production

  Xi → Y1 ... Ym

We write

  X1 ... Xn →* Y1 ... Ym

if X1 ... Xn → ... → ... → Y1 ... Ym in zero or more steps.
The Language of a CFG
Let G be a context-free grammar with start symbol S. Then the language of G is:

  { a1 ... an | S →* a1 ... an and every ai is a terminal }

Terminals

• Terminals are called so because there are no rules for replacing them
• Once generated, terminals are permanent
• Terminals ought to be tokens of the language
Examples

L(G) is the language of the CFG G.

Strings of balanced parentheses: (ⁱ )ⁱ , for i ≥ 0

Two grammars for this language:

  S → ( S )              S → ( S )
    | ε          or      S → ε
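As a quick illustration of this language, here is a minimal sketch (illustrative Python, not from the slides) that checks whether a string of parentheses has the form (ⁱ )ⁱ, i.e. whether it can be derived from S → ( S ) | ε.

def in_language(s: str) -> bool:
    """Mirrors the grammar S -> ( S ) | epsilon."""
    if s == "":
        return True                         # S -> epsilon
    if s.startswith("(") and s.endswith(")"):
        return in_language(s[1:-1])         # S -> ( S )
    return False

assert in_language("((()))") and not in_language("(()")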
Example

A fragment of our example language (simplified):

  STMT → if COND then STMT
       | if COND then STMT else STMT
       | while COND do STMT
       | id = int

  COND → ( id == id )
       | ( id != id )
Arithmetic Example

Simple arithmetic expressions:

  E → E + E | E * E | ( E ) | id

Some elements of the language:

  id            id + id
  ( id ) * id   id * ( id )
Derivations and Parse Trees

A derivation is a sequence of productions

  S → ... → ... → ...

A derivation can be drawn as a tree
  – the start symbol is the tree's root
  – for a production X → Y1 ... Yn, add children Y1 ... Yn to node X
Derivation Example
• Grammar
  E → E + E | E * E | ( E ) | id
• String: id * id + id

  E
  → E + E
  → E * E + E
  → id * E + E
  → id * id + E
  → id * id + id

• Corresponding parse tree (sketch):

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id
Notes on Derivations
• A parse tree has
  – terminals at the leaves
  – non-terminals at the interior nodes

• An in-order traversal of the leaves is the original input

• The parse tree shows the association of operations, which the input string does not (illustrated by the sketch below)
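The following is a minimal sketch (illustrative Python, not from the slides) of a parse-tree node whose in-order leaf traversal reproduces the input tokens, using the tree for id * id + id shown earlier. The Node class and field names are assumptions made for this example.

from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str                                   # terminal or non-terminal label
    children: list = field(default_factory=list)  # empty list means a leaf

def leaves(node):
    """In-order traversal of the leaves: yields the original input tokens."""
    if not node.children:
        yield node.symbol
    else:
        for child in node.children:
            yield from leaves(child)

# Parse tree for id * id + id, with * grouped under the left operand of +.
tree = Node("E", [
    Node("E", [Node("E", [Node("id")]), Node("*"), Node("E", [Node("id")])]),
    Node("+"),
    Node("E", [Node("id")]),
])
assert list(leaves(tree)) == ["id", "*", "id", "+", "id"]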
Left-most and Right-most Derivations
• What was shown before was a left-most derivation
  – at each step, replace the left-most non-terminal

• There is an equivalent notion of a right-most derivation
  – shown below

  E
  → E + E
  → E + id
  → E * E + id
  → E * id + id
  → id * id + id
Right-most Derivation in Detail

  E
  → E + E     (the tree now has root E with children E + E)
  → E + id    (the right-most non-terminal E becomes the leaf id)
Right-most Derivation in Detail (2)

  E
  → E + E
  → E + id
  → E * E + id
  → E * id + id
  → id * id + id

• Note that right-most and left-most derivations have the same parse tree
• The difference is just in the order in which branches are added
Ambiguity
• Grammar:
  E → E + E | E * E | ( E ) | int

• The string int * int + int has two parse trees:
  – one with + at the root, grouping the string as (int * int) + int
  – one with * at the root, grouping the string as int * (int + int)
Ambiguity (Cont.)
• A grammar is ambiguous if it has more than one parse tree for some string
  – equivalently, there is more than one right-most or left-most derivation for some string
• Ambiguity is bad
  – it leaves the meaning of some programs ill-defined
• Ambiguity is common in programming languages
  – arithmetic expressions
  – IF-THEN-ELSE
Dealing with Ambiguity

• There are several ways to handle ambiguity

• The most direct method is to rewrite the grammar unambiguously:

  E → T + E | T
  T → int * T | int | ( E )

• This grammar enforces precedence of * over + (a recursive-descent sketch of it follows)
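The following is a minimal recursive-descent sketch (illustrative Python, not part of the slides) for the unambiguous grammar E → T + E | T, T → int * T | int | ( E ). It only recognizes token sequences, returning True or False; the function names are assumptions made for this example.

def parse(tokens):
    """Recognizer for  E -> T + E | T ,  T -> int * T | int | ( E ).
    `tokens` is a list such as ["int", "*", "int", "+", "int"]."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def parse_E():
        parse_T()
        if peek() == "+":          # E -> T + E
            eat("+")
            parse_E()

    def parse_T():
        if peek() == "(":          # T -> ( E )
            eat("(")
            parse_E()
            eat(")")
        else:                      # T -> int * T | int
            eat("int")
            if peek() == "*":
                eat("*")
                parse_T()

    try:
        parse_E()
        return pos == len(tokens)  # valid only if all input was consumed
    except SyntaxError:
        return False

assert parse(["int", "*", "int", "+", "int"])
assert not parse(["int", "+", "*", "int"])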
The Dangling Else: Example
• Consider the following grammar:

  S → if C then S
    | if C then S else S
    | OTHER

• The expression

  if C1 then if C2 then S3 else S4

  has two parse trees:
  – if C1 then (if C2 then S3) else S4    (else attached to the outer if)
  – if C1 then (if C2 then S3 else S4)    (else attached to the inner if)

• Typically we want the second form
The Dangling Else: A Fix
• else should match the closest unmatched then
• We can describe this in the grammar:

  S   → MIF                        /* all then are matched */
      | UIF                        /* some then are unmatched */
  MIF → if C then MIF else MIF
      | OTHER
  UIF → if C then S
      | if C then MIF else UIF

• Describes the same set of strings
The Dangling Else: Example Revisited
• The expression if C1 then if C2 then S3 else S4 now has only one parse tree:
  – attaching else S4 to the outer if is not valid, because the then expression (if C2 then S3) is not a MIF
  – the valid parse tree attaches else S4 to the inner if: if C1 then (if C2 then S3 else S4)
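In practice, a hand-written recursive-descent parser often gets this behavior without the MIF/UIF rewrite, simply by consuming an optional else greedily so that it binds to the innermost if. Here is a minimal sketch of that alternative mechanism (illustrative Python; the token names C and OTHER stand for any condition and any other statement).

def parse_S(tokens, pos=0):
    """S -> if C then S [ else S ] | OTHER.
    The optional 'else' is consumed greedily, so it binds to the closest 'then'."""
    if tokens[pos] == "if":
        assert tokens[pos + 1] == "C" and tokens[pos + 2] == "then"
        pos = parse_S(tokens, pos + 3)          # then-branch (may itself be an if)
        if pos < len(tokens) and tokens[pos] == "else":
            pos = parse_S(tokens, pos + 1)      # else attaches to this innermost if
        return pos
    assert tokens[pos] == "OTHER"
    return pos + 1

# if C1 then if C2 then S3 else S4 : the else attaches to the inner if
assert parse_S("if C then if C then OTHER else OTHER".split()) == 9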
Ambiguity
• There are no general techniques for handling ambiguity

• It is impossible to automatically convert an ambiguous grammar to an unambiguous one

• Used with care, ambiguity can simplify the grammar
  – it sometimes allows more natural definitions
  – but then we need disambiguation mechanisms
Precedence and Associativity Declarations
• Instead of rewriting the grammar:
  – use the more natural (ambiguous) grammar
  – along with disambiguating declarations

• Most tools allow precedence and associativity declarations to disambiguate grammars

• Examples follow …
Associativity Declarations
• Consider the grammar E → E + E | int
• Ambiguous: two parse trees of int + int + int
  – one grouping the string as (int + int) + int
  – one grouping it as int + (int + int)

• Left associativity declaration: %left +
  (this selects the left-leaning tree, (int + int) + int; a small sketch of left-associative parsing follows)
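To show what left associativity means operationally, here is a minimal sketch (illustrative Python, not a yacc/bison feature) that parses int (+ int)* with a loop and folds the operands to the left, producing the (int + int) + int grouping.

def parse_left_assoc(tokens):
    """Parse  int (+ int)*  and build a left-leaning nested tuple:
    int + int + int  becomes  (('int', '+', 'int'), '+', 'int')."""
    pos = 0

    def eat(tok):
        nonlocal pos
        assert pos < len(tokens) and tokens[pos] == tok, f"expected {tok!r}"
        pos += 1
        return tok

    tree = eat("int")
    while pos < len(tokens) and tokens[pos] == "+":
        eat("+")
        right = eat("int")
        tree = (tree, "+", right)   # fold left: the existing tree is the left operand
    assert pos == len(tokens), "trailing input"
    return tree

print(parse_left_assoc(["int", "+", "int", "+", "int"]))
# (('int', '+', 'int'), '+', 'int')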
Precedence Declarations
• Consider the grammar E → E + E | E * E | int
  and the string int + int * int

• Its two parse trees:
  – one grouping the string as (int + int) * int
  – one grouping it as int + (int * int)

• Precedence declarations: %left + followed by %left *
  (in yacc/bison-style tools, operators declared later have higher precedence, so this selects int + (int * int))
Error Handling
• The purpose of the compiler is
  – to detect non-valid programs
  – to translate the valid ones
• There are many kinds of possible errors (e.g. in C):

  Error kind    Example                  Detected by …
  Lexical       … $ …                    Lexer
  Syntax        … x *% …                 Parser
  Semantic      … int x; y = x(3); …     Type checker
  Correctness   your favorite program    Tester / User
Error Handling
• A good compiler should assist in identifying and locating errors:
  ◦ Lexical errors: important; the compiler can easily recover and continue (e.g. a misspelled identifier or keyword)
  ◦ Syntax errors: most important for the compiler; it can almost always recover (e.g. an arithmetic expression with unbalanced parentheses)
  ◦ Static semantic errors: important; the compiler can sometimes recover (e.g. an operator applied to incompatible operands)
  ◦ Dynamic semantic errors: hard or impossible to detect at compile time; runtime checks are required
  ◦ Logical errors: hard or impossible to detect (e.g. infinite recursive calls)
Syntax Error Handling
• The error handler should
  – report errors accurately and clearly
  – recover from an error quickly
  – not slow down compilation of valid code

• Good error handling is not easy to achieve
Approaches to Syntax Error Recovery
• Approaches, from simple to complex:
  – panic mode
  – error productions
  – automatic local or global correction
• Panic mode is the simplest and most popular method
• When an error is detected:
  – discard tokens until one with a clear role is found
  – continue from there
• Such tokens are called synchronizing tokens
  – typically the statement or expression terminators
  (a small sketch of panic-mode recovery follows)
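The following is a minimal sketch of panic-mode recovery (illustrative Python; the synchronizing token set and the caller-supplied parse_statement function are assumptions made for this example): on a syntax error, tokens are discarded until a synchronizing token such as ';' is found, and parsing resumes just past it.

SYNC_TOKENS = {";", "}"}   # assumed synchronizing tokens (statement terminators)

def parse_statements(tokens, parse_statement):
    """Parse a token list statement by statement; on a syntax error,
    skip to the next synchronizing token and continue (panic mode)."""
    errors, pos = [], 0
    while pos < len(tokens):
        try:
            pos = parse_statement(tokens, pos)   # returns position after the statement
        except SyntaxError as err:
            errors.append((pos, str(err)))
            # Panic mode: discard tokens until a synchronizing token ...
            while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
                pos += 1
            pos += 1                             # ... then continue just past it
    return errors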
Questions???
