Context-Free Grammar (CFG) : Dr. Nadeem Akhtar

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 56

Context-Free Grammar (CFG)

Dr. Nadeem Akhtar


Assistant Professor
Department of Computer Science & IT
The Islamia University of Bahawalpur

PhD – Formal methods in Software engineering


IRISA – University of South Brittany – FRANCE.

1
Context Free Grammars (CFG)
• Context-Free Grammar is a more powerful method of
describing languages.
CFGs have a recursive structure, which makes them useful in a
variety of applications.

- Study of human languages. One way of understanding the


relationship of terms such as noun, verb, and preposition
and their respective phrases leads to a natural recursion
because noun phrases may appear inside verb phrases and
vice versa. Context-free grammars can capture important
aspects of these relationships.

2
Context Free Grammars (CFG)
• An important application of context-free grammars
occurs in the specification and compilation of
programming languages.
A grammar for a programming language is a reference
for people trying to learn the language syntax.

• Designers of compilers and interpreters for


programming languages start by obtaining a grammar
for the language.

3
Context Free Grammars (CFG)
• Parser of a compiler or interpreter extracts the
meaning of a program prior to generating the
compiled code or performing the interpreted
execution.

• A number of methodologies facilitate the


construction of a parser once a context-free
grammar is available. Some tools even automatically
generate the parser from the grammar.

4
Context Free Grammars (CFG)
The collection of languages associated with context-free
grammars are called the context-free languages.

They include all the regular languages and many additional


languages.

A formal definition of context-free grammars and study the


properties of context-free languages.
Pushdown Automata, a class of machines recognizing the
context-free languages. Pushdown automata allow us to gain
additional insight into the power of context-free grammars.

5
Context Free Grammar (CFG)
A CFG has four components
1) A set of terminal symbols, sometimes referred to as “tokens”.
Terminals are the elementary symbols of the language defined by the
grammar.
2) A set of non-terminals sometimes called “syntactic variables”. Each
non-terminal represents a set of strings of terminals
In stmt → if (expr) stmt else stmt
stmt and expr are non-terminals.
3) A set of productions
Each production consists of a non-terminal, called head or left side of
the production, an arrow, and a sequence of terminals and/or non-
terminals, called the body or right side of the production.
4) A designation of one of the non-terminals as the start symbol (head)

6
Regular Languages
• Closed under Union, Concatenation and
Closure (∗)
• Recognizable by finite-state automata
• Denoted by Regular Expressions
• Generated by Regular Grammars

7
Context Free Grammars
• More general productions than regular
grammars
S→w
where w is any string of terminals and non-
terminals
• What languages do these grammars generate?
S →(A) S → ε | aSb
A → ε | aA | ASA

8
Context-free languages more
general than regular languages
• {anbn | n ≥ 0} is not regular
‣ but it is context-free

• Why are they called “context-free”?


‣ Context-sensitive grammars allow more than
one symbol on the left-hand side of productions
– xAy → x(S)y can only be applied to the non-
terminal A when it is in the context of x and y

9
Example derivation in a Grammar
• Grammar: start symbol is A
A → aAa
A→B
B → bB
B→ε

• Sample Derivation:
A ⇒ aAa ⇒ aaAaa ⇒ aaaAaaa ⇒ aaaBaaa ⇒ aaabBaaa ⇒
aaabbBaaa ⇒ aaabbaaa

• Language?

10
Derivations in Tree Form

11
Example CFG
• An example of a context-free grammar, which we call G1 .
A → 0A1
A→B
B→#

Collection of substitution rules, called productions. Each rule appears as a line in the
grammar, comprising a symbol and a string separated by an arrow. The symbol is called a
variable.

The string consists of variables and terminals.


The variable symbols often are represented by capital letters.
The terminals are analogous to the input alphabet and often are represented by
lowercase letters, numbers, or special symbols.
One variable is designated as the start variable. It usually occurs on the left-hand side of
the topmost rule.

12
Example CFG
• An example of a context-free grammar, which we call
G1 .
A → 0A1
A→B
B→#

Grammar G1 contains three rules.


G1 's variables are A and B, where A is the start variable.
Its terminals are 0, 1, and #.
13
Example CFG
Grammar is used to describe a language by generating
each string of that language in the following manner.
1. Write down the start variable. It is the variable on
the left-hand side of the top rule, unless specified
otherwise.
2. Find a variable that is written down and a rule that
starts with that variable. Replace the written down
variable with the right-hand side of that rule.
3. Repeat step 2 until no variables remain.

14
Example-1 CFG
Context-free grammar, G1 .
A → 0A1
A→B
B→#

• Grammar G 1 generates the string 000#111.


The sequence of substitutions to obtain a string is called
a derivation. A derivation of string 000#111 in grammar G 1
is
A ⇒ OA1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
15
Parse tree for 000#111 in G1

16
Example CFG
• All strings generated in this way constitute the
language of the grammar.
• We write L(G1) for the language of grammar G1
.
Grammar G1 shows that L(G1) is {0n#1n | n > 0}.
Any language that can be generated by some
context-free grammar is called a Context-Free
Language (CFL).
17
FORMAL DEFINITION OF A CONTEXT-FREE GRAMMAR

A context-free grammar is a 4-tuple (V, Σ, R, S),


where
1. V is a finite set called the variables
2. Σ is a finite set, disjoint from V, called the
terminals
3. R is a finite set of rules, with each rule being a
variable and a string of variables and terminals,
and
4. S Є V is the start variable.
18
Example-1 CFG
Context-free grammar, G1 .
A → 0A1
A→B
B→#

In grammar G1 , V = {A, B}, Σ = {0, 1, #}, S = A, and R


is the collection of the three rules

19
Example-2 CFG
• Consider grammar G2 = ({S}, { a, b }, R, S).
The set of rules, R, is S → aSb I SS I ε

This grammar generates strings such as abab, aaabbb,


and aababb.

You can see more easily what this language is if you think of a as a
left parenthesis" (" and b as a right parenthesis") ". Viewed in this
way, L(G2) is the language of all strings of properly nested
parentheses.

20
Example-3 CFG
Consider grammar G3 = (V, Σ , R, <EXPR>).

V is {<EXPR>, <TERM>, <FACTOR>} and


Σ is {a, +, x, (, )}.
The rules are
<EXPR) → <EXPR> + <TERM> | <TERM>
<TERM> → <TERM> x <FACTOR> | <FACTOR>
<FACTOR> → (<EXPR>) | a

21
<EXPR) → <EXPR> + <TERM> | <TERM>
<TERM> → <TERM> x <FACTOR> | <FACTOR>
<FACTOR> → (<EXPR>) | a

The two strings a+axa and (a+a)xa can be generated with grammar G3 .
The parse trees are shown in the following figure.

22
Example – Designing CFGs
• Grammar for the language
{0n1n | n ≥ 0} U {1n0n | n ≥ 0}
First construct the grammar
S1 → 0 S1 1 | ε
for the language {0n1n | n ≥ 0} and the grammar
S2 → 1 S2 0 | ε
for the language {1n0n | n ≥ 0}
And then add the rule S → S1 | S2
Grammar:
S → S1 | S2
S1 → 0 S 1 1 | ε
S2 → 1 S2 0 | ε

23
Ambiguous Grammar
Sometimes a grammar can generate the same
string in several different ways. Such a string will
have several different parse trees and thus several
different meanings.

In a programming language, its important that a a


given program should have a unique interpretation.

24
Ambiguous Grammar
If a grammar generates the same string in
several different ways, then that string is derived
ambiguously in that grammar.

If a grammar generates some string ambiguously


we say that the grammar is ambiguous.

25
For example, consider the following grammar:
<EXPR> → <EXPR>+<EXPR> | <EXPR> x <EXPR> | ( <EXPR>) | a
This grammar generates the string a+axa ambiguously.
The following figure shows the two different parse trees.

26
Ambiguous Grammar
A string w is derived ambiguously in context-
free grammar G if it has two or more different
leftmost derivations.

Grammar G is ambiguous if it generates some


string ambiguously.

27
Ambiguous Grammar
• A grammar can have more than one parse tree
generating a given string of terminals. Such a
grammar is said to be ambiguous

• Grammar is ambiguous, a terminal string that


yield of more than one parse tree.

28
Ambiguity
E  E  E | E  E | ( E ) | id
String id * id + id has the following two parse trees

Enforces precedence of * over + Doesn’t enforce this precedence


29
Dealing with Ambiguity
• The most direct way is to re-write the
grammar unambiguously
E  E  E | E  E | ( E ) | id

E  E ' E | E '
E '  id * E ' | id | ( E ) * E ' | ( E )
Enforces precedence of * over +
30
Example
E  E ' E | E '
id + id * id
E '  id * E ' | id | ( E ) * E ' | ( E )
E
id *id + id

E E’ + E

E’ E id E’
+

id E’ id E’
* id *

id id
31
Example
Another Ambiguous Grammar

• S→x Rewrite it as:


 T→x
• S→y
 T→y
• S→z  T→z
• S→S+S  S→S+T
• S→S–S  S→S–T
• S→S*S  S→S*T
• S→S/S  S→S/T
• S → (S)  T→(S)
 S→T
Generates two parse trees for x + y * z Enforces precedence of * over +

TRY DIFFERENT INPUTS AT HOME 32


Context Free Grammar (CFG)
Example
Java if-else statement
If (expression) statement else statement
stmt → if (expr) stmt else stmt
The arrow may be read as “can have the form”.
Such a rule is called a production

33
Context Free Grammar (CFG)
Example
list → list + digit
list → list - digit
list → digit
digit → 0|1|2|3|4|5|6|7|8|9

list → list + digit | list – digit | digit

The terminals are + - 0 1 2 3 4 5 6 7 8 9


34
Context Free Grammar (CFG)
Example
Function call
call → id(optparams)
optparams → params | є
params → params, param | param

35
Context Free Grammar (CFG)
• Example
• Operators on the same line have the same
associativity and precedence
left-associative: + -
left-associative: * /
Two non-terminals expr and term for the two
levels of precedence, non-terminal factor for
generating basic units in expressions

36
Context Free Grammar (CFG)
factor → digit | (expr)
Binary operators * and / have the highest precedence
term → term * factor
| term / factor
| factor
Similarly, expr
expr → expr + term
| expr - term
| term
The resulting grammar is therefore
expr → expr + term | expr – term | term
term → term * factor | term / factor | factor
factor → digit | (expr)

37
Context Free Grammar (CFG)
Example:
A grammar for a subset of Java statement

stmt → id = expression;
| if(expression) stmt
| if(expression) stmt else stmt
| while(expression) stmt
| do stmt while (expression);
| {stmts}

stmts → stmts stmt


| є

38
Context Free Grammar (CFG)
Example:
Grammar for statement blocks and conditional
statements:

stmt → if expr then stmt else stmt


| if stmt then stmt
| begin stmtList end

stmtList → stmt; stmtList | stmt


39
Context Free Grammar (CFG)
• Exercise
Consider the context-free grammar
S → SS+ | SS* | a

(a) Show how the string aa+a* can be generated by


this grammar
(b) Construct a parse tree for this string
(c) What language does this grammar generate?
Justify your answer.
40
CFG vs. Regex
• CFGs are a more powerful notation than
regexes
– Every construct that can be described by a regex
can also be described by the CFG, but not vice-
versa
– Every regular language is a context-free language,
but not vice versa.

41
CFG vs. Regex
Regex: (a|b)*abb Grammar:

A  aA0 | bA0 | aA1


Describe the same
language: the set of  bA2
strings of a’s and b’s
ending with abb  bA3


42
CFG vs. Regex
• Language L = {anbn | n>=1} can be described by a grammar but not by a
regex
• Suppose L was defined by some regex
– We could construct a DFA with a finite number of states, say k, to accept L
Path aj-i State si: For an input beginning
with more than k a’s
Path ai
s0 --- si aibi is in the language: A path
bi from si to state f
Path ajbi is also possible
--- Path bi This DFA accepts both aibi
and ajbi

DFA cannot count, i.e., keep track of


f the number of a’s before it sees the b’s

43
Context-free grammars are widely
used for programming languages
• From the definition of Algol-60:
procedure_identifier::= identifier.
actual_parameter::= string_literal | expression | array_identifier | switch_identifier | procedure_identifier.
letter_string::= letter | letter_string letter.
parameter_delimiter::= "," | ")" letter_string ":" "(".
actual_parameter_list::= actual_parameter | actual_parameter_list parameter_delimiter actual_parameter.
actual_parameter_part::= empty | "(" actual_parameter_list} ")".
function_designator::= procedure_identifier actual_parameter_part.

44
Example
adding_operator::= "+" | "−" .
multiplying_operator::= "×" | "/" | "÷" .
primary::= unsigned_number | variable | function_designator | "(" arithmetic_expression ")".
factor::= primary | factor | factor power primary.
term::= factor | term multiplying_operator factor.
simple_arithmetic_expression::= term | adding_operator term |
simple_arithmetic_expression adding_operator term.
if_clause::= if Boolean_expression then.
arithmetic_expression::= simple_arithmetic_expression |
if_clause simple_arithmetic_expression else arithmetic_expression.

if a < 0 then U+V else if a * b < 17 then U/V else if k <> y then
V/U else 0

45
NFA To CFG Conversion
• We can mechanically construct the CFG from an
NFA
• Converting the NFA for (a|b)*abb into CFG
– For each state i of the NFA, create a non-terminal Ai
– If i has a transition to j on input a, add Ai  aA j
– If i has a transition to j on input ε, add Ai  A j
– If i is an accepting state, add Ai 
– If i is the start state, make Ai the start symbol of the
grammar.
46
BNF: Meta-Syntax for CFGs
• <postal-address> ::= <name-part> <street-address>
<zip-part>
• <name-part> ::= <personal-part> <last-name>
<opt-jr-part> <EOL>
| <personal-part> <name-part>
• <personal-part> ::= <first-name> | <initial> "."
• <street-address> ::= <house-num> <street-name>
<opt-apt-num> <EOL>
• <zip-part> ::= <town-name> "," <state-code>
<ZIP-code> <EOL>
• <opt-jr-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
47
Left-Most Derivation Parse Tree

48
Right-Most Derivation Parse Tree

49
Ambiguity
• There are no general techniques for handling
ambiguity
• It is impossible to automatically convert an
ambiguous grammar into an unambiguous one
• If used sensibly, ambiguity can simplify the
grammar
• Disambiguation Rules: Instead of re-writing the
grammar, we can
– Use the ambiguous grammar
– Along with disambiguation rules. 50
Associativity of Operators
• The operator + associates to the left
An operator with + signs on both sides of it
belongs to the operator to its left.

In most programming languages the four


arithmetic operators, addition, subtraction,
multiplication, and division are left associative.

51
Right Associative Operator
• The operator = associates to the right
right → letter = right | letter
letter → a | b |…| z

Parse tree for 9 – 5 – 2 grows down towards the


left, whereas parse tree for a=b=c grows
down towards the right

52
Precedence of Operators
• Associativity rules for + and * apply to
occurrences of the same operator
• Rule
* has the highest precedence than + if * takes its
operands before + does
• Multiplication and division have higher
precedence than addition and subtraction.
• 9 + 5 * 2 and 9 * 5 + 2 equivalent to 9 + (5 * 2)
and (9 * 2) + 2
53
Questions
• CFG representing the Regular Expression a+
A → aA | a

• CFG representing the Regular Expression b*


B →
B → bB

54
Questions
• CFG representing the Regular Expression a*b+
(i.e. start with any number of a’s followed by non-zero numbers of
b)
S → R | aS
R → b | bR

• A CFG representing the Regular Expression ab+a


(i.e. start with a followed by non-zero numbers of b’s and ends with a)
S → aRa
R → b | bR
55
Questions
Every construct described by a Regular Expression can also
be described by a CFG. Consider the following regular
expression:

(a|b)*abb where Σ = {a, b}


Create an equivalent CFG of the above regular expression

A0 → aA0 | bA0 | aA1


A1 → bA2
A2 → bA3
A3 → є

56

You might also like