Chapter 4 Automata

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 36

Chapter 4

Context-Free Grammar (CFG)


Context Free Grammars (Derivation, Derivation Tree, Ambiguity in
grammars)
Simplification of CFG
Normal Forms (NFs) - Chomsky NF (CNF), Conversion to CNF
Greibach NF (GNF)
Context-Free Grammar (CFG)

CFG stands for context-free grammar. It is a formal grammar which is used to generate all possible patterns of
strings in a given formal language. Context-free grammar G can be defined by four tuples as:
G = (V, T, P, S)  
Where,
G is the grammar, which consists of a set of the production rule. It is used to generate the string of a language.

T is the final set of a terminal symbol. It is denoted by lower case letters.

V is the final set of a non-terminal symbol. It is denoted by capital letters.

P is a set of production rules, which is used for replacing non-terminals symbols(on the left side of the

production) in a string with other terminal or non-terminal symbols(on the right side of the production).

S is the start symbol which is used to derive the string. We can derive the string by repeatedly replacing a non-

terminal by the right-hand side of the production until all non-terminal have been replaced by terminal

symbols.
Example 1:

Construct the CFG for the language having any number of a's over the set ∑= {a}.
Solution:
As we know the regular expression for the above language is
1.r.e. = a*   means zero or more occurrence of a  
Production rule for the Regular expression is as follows:
S → aS    rule 1  
S → ε     rule 2  
Now if we want to derive a string "aaaaaa", we can start with start symbols.
S  
aS   
aaS          rule 1  
aaaS         rule 1  
aaaaS        rule 1  
aaaaaS       rule 1  
aaaaaaS      rule 1  
aaaaaaε      rule 2  
aaaaaa  
The r.e. = a* can generate a set of string {ε, a, aa, aaa,.....}. We can have a null string because S is a start symbol and rule 2
gives S → ε.
Example 2:

Construct a CFG for the regular expression (0+1)*


Solution:
(0+1)* All strings of 0’s and 1’s
The CFG can be given by,
Production rule (P):  
S → 0S | 1S  
S → ε  
The rules are in the combination of 0's and 1's with the start symbol. Since (0+1)*
indicates {ε, 0, 1, 01, 10, 00, 11, ....}. In this set, ε is a string, so in the rule, we
can set the rule S → ε.
Derivation

 Derivation is a sequence of production rules. It is used to get the input string


through these production rules. During parsing, we have to take two decisions.
These are as follows:
We have to decide the non-terminal which is to be replaced.
We have to decide the production rule by which the non-terminal will be replaced.
We have two options to decide which non-terminal to be placed with production
rule.
1. Leftmost Derivation:
 In the leftmost derivation, the input is scanned and replaced with the production
rule from left to right. So in leftmost derivation, we read the input string from left
to right.
Example:

Production rules:
E = E + E  
E = E - E  
E = a | b  
Input
a - b + a  
The leftmost derivation is:
E = E + E  
E = E - E + E  
E = a - E + E  
E = a - b + E  
E = a - b + a  
2. Rightmost Derivation:

In rightmost derivation, the input is scanned and replaced with the production rule from right to left. So in rightmost
derivation, we read the input string from right to left.
Example
Production rules:
E = E + E  
E = E - E  
E = a | b  
Input
a - b + a  
The rightmost derivation is:
E = E - E  
E = E - E + E  
E = E - E + a  
E = E - b + a  
E = a - b + a  
When we use the leftmost derivation or rightmost derivation, we may get the same string. This type of derivation
does not affect on getting of a string.
Examples of Derivation:

Example 1:
Derive the string "abb" for leftmost derivation and rightmost derivation using a
CFG given by,
S → AB | ε   Rightmost derivation:
A → aB  
B → Sb  
Solution:
Leftmost derivation:
Example 2:

Derive the string "aabbabba" for leftmost derivation and rightmost derivation using a CFG given by,
S → aB | bA  
S → a | aS | bAA  
S → b | aS | aBB  
Solution:
Leftmost derivation:
S  
aB            S → aB      
aaBB          B → aBB  
aabB          B → b  
aabbS         B → bS  
aabbaB        S → aB  
aabbabS       B → bS  
aabbabbA      S → bA  
aabbabba      A → a  
Example 2:

Derive the string "aabbabba" for leftmost derivation and rightmost derivation using a CFG given by,
S → aB | bA  
S → a | aS | bAA  
S → b | aS | aBB  
Solution:
Leftmost derivation:
S  
aB            S → aB      
aaBB          B → aBB  
aabB          B → b  
aabbS         B → bS  
aabbaB        S → aB  
aabbabS       B → bS  
aabbabbA      S → bA  
aabbabba      A → a  
Example 3:

Derive the string "00101" for leftmost derivation and rightmost derivation using a CFG given by,
S → A1B  
A → 0A | ε  
B → 0B | 1B | ε  
Solution:
Leftmost derivation:
S  
A1B  
0A1B  
00A1B  
001B  
0010B  
00101B  
00101  
Derivation Tree

 Derivation tree is a graphical representation for the derivation of the given


production rules for a given CFG.
 It is the simple way to show how the derivation can be done to obtain some string
from a given set of production rules.
 The derivation tree is also called a parse tree.
 Parse tree follows the precedence of operators. The deepest sub-tree traversed first.
So, the operator in the parent node has less precedence over the operator in the sub-
tree.
A parse tree contains the following properties:
The root node is always a node indicating start symbols.
The derivation is read from left to right.
The leaf node is always terminal nodes.
The interior nodes are always the non-terminal nodes.
Example 1:

Step 4:
Production rules: Step 2: Step 5:

E = E + E  
E = E * E  
E = a | b | c  
Input Step 3:
a * b + c  
Step 1:

Note: We can draw a derivation tree step by step or directly in one step
Example 2:

Draw a derivation tree for the string "bab" from the CFG given by
S → bSb | a | b  
Solution:
Now, the derivation tree for the string "bbabb" is as follows:
The above tree is a derivation tree drawn for deriving a string
bbabb. By simply reading the leaf nodes, we can obtain the
desired string. The same tree can also be denoted by,
Ambiguity in Grammar

 A grammar is said to be ambiguous if there exists more than one leftmost


derivation or more than one rightmost derivation or more than one parse tree for
the given input string.
 If the grammar is not ambiguous, then it is called unambiguous.
• If the grammar has ambiguity, then it is not good for compiler construction. No
method can automatically detect and remove the ambiguity, but we can remove
ambiguity by re-writing the whole grammar without ambiguity.
Example 1:

Let us consider a grammar G with the production rule


E → I  
E → E + E  
E → E * E  
E → (E)  
I → ε | 0 | 1 | 2 | ... | 9  
Solution:
For the string "3 * 2 + 5", the above
grammar can generate two parse trees
by leftmost derivation:

Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is
ambiguous.
Example 2:

Check whether the given grammar G is ambiguous or not.


E → E + E  
E → E - E  
E → id  
Solution:
From the above grammar String "id + id - id" can be derived in 2 ways:
First Leftmost derivation
E → E + E  
 → id + E  
→ id + E - E  
 → id + id - E  
 → id + id- id  

Since there are two leftmost derivation for


a single string "id + id - id", the grammar G is ambiguous.
Example 3:

Check whether the given grammar G is ambiguous or not.


1.S → aSb | SS  
2.S → ε  
Solution:
For the string "aabb" the above grammar can generate two parse trees

Since there are two parse trees for a single


string "aabb", the grammar G is ambiguous.
Simplification of CFG

 As we have seen, various languages can efficiently be represented by a


context-free grammar. All the grammar are not always optimized that
means the grammar may consist of some extra symbols(non-terminal).
 Having extra symbols, unnecessary increase the length of grammar.
Simplification of grammar means reduction of grammar by removing
useless symbols. The properties of reduced grammar are given below:
Each variable (i.e. non-terminal) and each terminal of G appears in the
derivation of some word in L.
There should not be any production as X → Y where X and Y are non-
terminal.
If ε is not in the language L then there need not to be the production X → ε.
Let us study the reduction process in detail
Removal of Useless Symbols

 A symbol can be useless if it does not appear on the right-hand side of the
production rule and does not take part in the derivation of any string.
 That symbol is known as a useless symbol. Similarly, a variable can be useless if it
does not take part in the derivation of any string. That variable is known as a
useless variable. For Example:
T → aaB | abA | aaT  
A → aA  
B → ab | b  
C → ad  
 In the above example, the variable 'C' will never occur in the derivation of any
string, so the production C → ad is useless. So we will eliminate it, and the other
productions are written in such a way that variable C can never reach from the
starting variable 'T'.
Cont….
 Production A → aA is also useless because there is no way to terminate it. If it
never terminates, then it can never produce a string. Hence this production can
never take part in any derivation.

 To remove this useless production A → aA, we will first find all the variables
which will never lead to a terminal string such as variable 'A'. Then we will
remove all the productions in which the variable 'B' occurs.
Elimination of ε Production

 The productions of type S → ε are called ε productions. These type of productions can only be
removed from those grammars that do not generate ε.
Step 1: First find out all nullable non-terminal variable which derives ε.
Step 2: For each production A → a, construct all production A → x, where x is obtained from a
by removing one or more non-terminal from step 1.
Step 3: Now combine the result of step 2 with the original production and remove ε productions.
Example:
Remove the production from the following CFG by preserving the meaning of it.
S → XYX  
X → 0X | ε  
Y → 1Y | ε  
Solution:
Now, while removing ε production, we are deleting the rule X → ε and Y → ε. To preserve the
meaning of CFG we are actually placing ε at the right-hand side whenever X and Y have
appeared.
Co t…
Let us take
S → XYX  
If the first X at left-hand side is ε. Then
S → YX  
Similarly if the last X in R.H.S. = ε.
Then
S → XY  
If Y = ε then
S → XX  
If Y and X are ε then,
S → X  
If both X are replaced by ε
S → Y  
Now,
Removing Unit Productions

The unit productions are the productions in which one non-terminal


gives another non-terminal. Use the following steps to remove unit
production:
Step 1: To remove X → Y, add production X → a to the grammar rule
whenever Y → a occurs in the grammar.
Step 2: Now delete X → Y from the grammar.
Step 3: Repeat step 1 and step 2 until all unit productions are removed.
For example:

S → 0A | 1B | C  
A → 0S | 00  
B → 1 | A  
C → 01  
Solution:
S → C is a unit production. But while removing S → C we have to consider what C gives. So, we can
add a rule to S.
S → 0A | 1B | 01  
Similarly, B → A is also a unit production so we can modify it as
B → 1 | 0S | 00  
Thus finally we can write CFG without unit production as
S → 0A | 1B | 01  
A → 0S | 00  
B → 1 | 0S | 00  
C → 01  
Chomsky's Normal Form (CNF)

 CNF stands for Chomsky normal form. A CFG(context free grammar) is in


CNF(Chomsky normal form) if all production rules satisfy one of the following
conditions:
Start symbol generating ε. For example, A → ε.
A non-terminal generating two non-terminals. For example, S → AB.
A non-terminal generating a terminal. For example, S → a.
For example:
G1 = {S → AB, S → c, A → a, B → b}  
G2 = {S → aA, A → a, B → c}  
The production rules of Grammar G1 satisfy the rules specified for CNF, so the
grammar G1 is in CNF. However, the production rule of Grammar G2 does not
satisfy the rules specified for CNF as S → aZ contains terminal followed by non-
terminal. So the grammar G2 is not in CNF.
Steps for converting CFG into CNF

Step 1: Eliminate start symbol from the RHS. If the start symbol T is at the right-
hand side of any production, create a new production as:S1 → S  
Where S1 is the new start symbol.
Step 2: In the grammar, remove the null, unit and useless productions.
Step 3: Eliminate terminals from the RHS of the production if they exist with other
non-terminals or terminals. For example, production S → aA can be decomposed
as:
S → RA  
R → a  
Step 4: Eliminate RHS with more than two non-terminals. For example, S → ASB
can be decomposed as:
S → RS  
R → AS  
Example:

Convert the given CFG to CNF. Consider the given grammar G1:
S → a | aA | B  
A → aBB | ε  
B → Aa | b  
Solution:
Step 1: We will create a new production S1 → S, as the start symbol S appears on the
RHS. The grammar will be:
S1 → S  
S → a | aA | B  
A → aBB | ε  
B → Aa | b  
Step 2: As grammar G1 contains A → ε null production, its removal from the grammar
yields:
S1 → S  
S → a | aA | B  
A → aBB  
B → Aa | b | a  
Cont….
Now, as grammar G1 contains Unit production S → B, its removal yield:
S1 → S  
S → a | aA | Aa | b  
A → aBB  
B → Aa | b | a  
Also remove the unit production S1 → S,
its removal from the grammar yields:
S0 → a | aA | Aa | b  
S → a | aA | Aa | b  
A → aBB  
B → Aa | b | a  
Step 3: In the production rule S0 → aA | Aa,
S → aA | Aa, A → aBB and B → Aa,
terminal a exists on RHS with non-terminals.
So we will replace terminal a with X:
Greibach Normal Form (GNF)

 GNF stands for Greibach normal form. A CFG(context free grammar) is in


GNF(Greibach normal form) if all the production rules satisfy one of the
following conditions:
• A start symbol generating ε. For example, S → ε.
• A non-terminal generating a terminal. For example, A → a.
• A non-terminal generating a terminal which is followed by any number of non-terminals.
For example, S → aASB.
For example:
G1 = {S → aAB | aB, A → aA| a, B → bB | b}  
G2 = {S → aAB | aB, A → aA | ε, B → bB | ε}  
 The production rules of Grammar G1 satisfy the rules specified for GNF, so the
grammar G1 is in GNF. However, the production rule of Grammar G2 does not
satisfy the rules specified for GNF as A → ε and B → ε contains ε(only start
symbol can generate ε). So the grammar G2 is not in GNF.
Steps for converting CFG into GNF

Step 1: Convert the grammar into CNF.

If the given grammar is not in CNF, convert it into CNF. You can refer the following
topic to convert the CFG into CNF: Chomsky normal form

Step 2: If the grammar exists left recursion, eliminate it.

If the context free grammar contains left recursion, eliminate it. You can refer the
following topic to eliminate left recursion: Left Recursion

Step 3: In the grammar, convert the given production rule into GNF form.

If any production rule in the grammar is not in GNF form, convert it.
Example:

S → XB | AA  
A → a | SA  
B → b  
X → a  
Solution:
As the given grammar G is already in CNF and there is no left recursion, so we can skip step 1 and step 2 and directly
go to step 3.
The production rule A → SA is not in GNF, so we substitute S → XB | AA in the production rule A → SA as:
S → XB | AA  
A → a | XBA | AAA  
B → b  
X → a  
The production rule S → XB and B → XBA is not in GNF, so we substitute X → a in the production rule S → XB and
B → XBA as:
S → aB | AA  
A → a | aBA | AAA  
B → b  
X → a  
Cont…
Now we will remove left recursion (A → AAA), we get:
S → aB | AA  
A → aC | aBAC  
C → AAC |  ε  
B → b  
X → a  
Now we will remove null production C → ε, we get:
S → aB | AA  
A → aC | aBAC | a | aBA  
C → AAC |  AA  
B → b  
X → a  
The production rule S → AA is not in GNF, so we substitute A → aC | aBAC | a | aBA in
production rule S → AA as:
Cont…
S → aB | aCA | aBACA | aA | aBAA  
A → aC | aBAC | a | aBA  
C → AAC  
C → aCA | aBACA | aA | aBAA  
B → b  
X → a  
The production rule C → AAC is not in GNF, so we substitute A → aC | aBAC | a | aBA in production
rule C → AAC as:
S → aB | aCA | aBACA | aA | aBAA  
A → aC | aBAC | a | aBA  
C →  aCAC | aBACAC | aAC | aBAAC  
C → aCA | aBACA | aA | aBAA  
B → b  
X → a  
Hence, this is the GNF form for the grammar G.
D ! !
EN

You might also like