Syntax Analysis: Dr. Nguyen Hua Phung Nhphung@hcmut - Edu.vn
Syntax Analysis: Dr. Nguyen Hua Phung Nhphung@hcmut - Edu.vn
Syntax Analysis: Dr. Nguyen Hua Phung Nhphung@hcmut - Edu.vn
07, 2019
2 Context-free grammar
3 Write a grammar
4 Some issues
Roles
read the sequence of tokens
produce as output a parse tree or abstract syntax tree
(AST)
give error messages when detecting syntax errors
token parse
source lexical syntax
tree /
program analyzer analyzer
get next AST
token
symbol
table
Source program: a = b * 4
Lexer output (parser input): ID ASSIGN ID MULOP INTLIT
Parser output:
<assign-stmt>
ID ASSIGN <expr>
<term>
ID INTLIT
L = {an bn | n > 0}
Example
In programming languages, there are some symmetric
structure
(((...))) the number of ( must be equal to that of )
repeat ... repeat ... until ... until
Grammar
<exp> → <exp> ADDOP <exp> (1)
| <exp> MULOP <exp> (2)
| LB <exp> RB (3)
| INTLIT (4)
Derivation
(2)
<exp> ⇒ <exp> MULOP <exp>
(4)
⇒ INTLIT MULOP <exp>
(1)
⇒ INTLIT MULOP <exp> ADDOP <exp>
(4)
⇒ INTLIT MULOP INTLIT ADDOP <exp>
(4)
⇒ INTLIT MULOP INTLIT ADDOP INTLIT
+
{ a1 a2 ...an | ai ∈ T and S ⇒ a1 a2 ...an }
Parse Tree
Start symbol as the parse tree’s root
For a production X → Y1 Y2 ...Yn , add children Y1 Y2 ...Yn to
node X
Source: 12 * (4 + 5)
Parser input: INTLIT MULOP LB INTLIT ADDOP INTLIT
RB
Parser output:
<exp>
INTLIT LB <exp> RB
INTLIT INTLIT
program
declaration function
program → ...
program
declaration → ...
variable_decl → ...
declaration function const_decl → ...
type_decl → ...
variable constant parameter idlist → ...
statement
declaration declaration declaration
function → ...
para_decl → ...
type idlist expression stmt → ...
exp → ...
EBNF
allow to use operators in regular expression in RHS
higher expressiveness
often supported by top-down parsing generators
BNF EBNF (RHS) ANTLR
<exp> → <exp> ’+’ <term>
exp:
| <exp> ’-’ <term> <term> ((’+’|’-’) <term>)*
term ((’+’|’-’) term)*;
| <term>
<else> → ELSE <stmt> else:
(ELSE <stmt>)?
| ("else" stmt)?;
<idlist> → ID ’,’ idlist idlist:
ID (’,’ ID)*
| ID ID ("," ID)* ;
When more than one parse tree can be found for a string of
tokens, the grammar is ambiguous
Ambiguity makes the meaning of some programs ill-defined
For example,
if ADDOP is left-associated, the grammar should be
<exp> → <exp> ADDOP <term>
if ADDOP is right-associated, the grammar should be
<exp> → <term> ADDOP <exp>