Properties of Context-Free Languages: Reading: Chapter 7
Properties of Context-Free Languages: Reading: Chapter 7
Properties of Context-Free Languages: Reading: Chapter 7
Reading: Chapter 7
Topics
Normal forms Pumping lemma for CFLs Closure and decision properties of CFLs
A =>
A => B
S * X
X * w,
for some w T*
S * X * w,
reachable generating
for some w T*
1.
SAB | a A b A, S are generating B is not generating (and therefore is useless) ==> Eliminating B (i.e., remove any production involving B)
1.
2.
1. 2. 3.
S a Ab
4.
Now, A is not reachable and therefore is useless Simplified G: What would happen if you reverse the order: 1. Sa i.e., test reachability before generating? Will fail to remove: Ab
8
5.
X * w
Every symbol in T is obviously generating. Suppose for a production A , where is generating Then, A is also generating
Induction:
S * X
S is obviously reachable (from itself) Suppose for a production A 1 2 k, where A is reachable Then, all symbols on the right hand side, {1, 2 , k} are also reachable.
10
Induction:
Eliminating -productions
A =>
11
Eliminating -productions
Caveat: It is not possible to eliminate -productions for languages which include in their word set
So we will target the grammar for the rest of the language
Theorem: If G=(V,T,P,S) is a CFG for a language L, then L-{} has a CFG without -productions Definition: A is nullable if A* If A is nullable, then any production of the form B CAD can be simulated by: B CD | CAD
Basis:
If A is a production in G, then A is nullable (note: A can still have other productions) If there is a production B C1C2Ck, where every Ci is nullable, then B is also nullable
13
Induction:
Eliminating -productions
Given: G=(V,T,P,S) Algorithm:
1. 2.
i.
For each production of the form: AX1X2Xk, where k1, suppose m out of the k Xis are nullable symbols Then G1 will have 2m versions for this production
i.e, all combinations where each Xi is either present or absent
i.
iii.
14
i. ii. iii.
Let L be the language represented by the following CFG G: SAB AaAA | BbBB |
Nullable symbols:
{A, B}
G1: S A | B | AB A a | aA | aAA B b | bB | bBB
G1 can be constructed from G as follows: B b | bB | bB | bBB ==> B b | bB | bBB Similarly, A a | aA | aAA Similarly, S A | B | AB Note: L(G) = L(G1) U {}
15
A=>xxx | yyy | zzz | B=> xxx | yyy | zzz | C=> xxx | yyy | zzz | D=>xxx | yyy | zzz
after
16
AB
Unit production is one which is of the form A B, where both A & B are variables E.g.,
1. 2. 3. 4.
Replace E T with E F | T*F Then, upon recursive application wherever there is a unit production:
E F | T*F | E+T E I | (E) | T*F| E+T E a | b | Ia | Ib | I0 | I1 | (E) | T*F | E+T Now, E has no unit productions
17
i.e., A ; B1 ; Bn ;
Basis: Every pair (A,A) is a unit pair (by definition). Similarly, if AB is a production, then (A,B) is a unit pair. Induction: If (A,B) and (B,C) are unit pairs, and AC is also a unit pair.
18
Add to P1 a new production A, for every B which is a non-unit production If a resulting production is already there in P, then there is no need to add it.
19
(E,I)
(T,T) (T,F)
G1: 1. 2. 3. 4.
E a|b|Ia | Ib | I0 | I1
T T*F T (E) T a|b| Ia | Ib | I0 | I1 F (E) F a| b| Ia | Ib | I0 | I1 I a| b | Ia | Ib | I0 | I1
20
(T,I)
E E+T | T*F | (E) | a| b | Ia | Ib | I0 | I1 T T*F | (E) | a| b | Ia | Ib | I0 | I1 F (E) | a| b | Ia | Ib | I0 | I1 I a | b | Ia | Ib | I0 | I1
Theorem: If G is a CFG for a language that contains at least one string other than , then there is another CFG G1, such that L(G1)=L(G) - , and G1 has:
Algorithm:
Step 1) Step 2) Step 3)
Normal Forms
22
If all productions of the grammar could be expressed in the same form(s), then:
a.
It becomes easy to design algorithms that use the grammar It becomes easy to show proofs and properties
b.
23
A BC Aa
By implication from i and ii: => a CNF also cannot contain any unit productions (also no productions)
24
Checklist: G has no -productions G has no unit productions G has no useless symbols But the normal form for productions is violated
25
Assumption: G has no -productions, unit productions or useless symbols For every terminal a that appears in the body of a production:
i. ii.
1)
create a unique variable, say Xa, with a production Xa a, and replace all other instances of a in G by Xa
2)
A B1B2 Bk (k3)
or
Aa
3)
C2
C1
and so on
AB1C1
C1B2C2 Ck-3Bk-2Ck-2
Ck-2Bk-1Bk
26
Example
G in CNF: G: S => AS | BABC A => A1 | 0A1 | 01 B => 0B | 0 C => 1C | 1 X0 => 0 X1 => 1 S => AS | BY1 Y1 => AY2 Y2 => BC A => AX1 | X0Y3 | X0X1 Y3 => AX1 B => X0B | 0 C => X1C | 1
Example
G: 1. 2. 3. 4. E E+T | T*F | (E) | Ia | Ib | I0 | I1 T T*F | (E) | Ia | Ib | I0 | I1 F (E) | Ia | Ib | I0 | I1 I a | b | Ia | Ib | I0 | I1 1. 2. 3. 4. 5. 6. 7. 8. 9. E EX+T | TX*F | X(EX) | IXa | IXb | IX0 | IX1 T TX*F | X(EX) | IXa | IXb | IX0 | IX1 F X(EX) | IXa | IXb | IX0 | IX1 I Xa | Xb | IXa | IXb | IX0 | IX1 X+ + X* * X+ + X( ( .
Step (1)
1. 2. 3. 4. 5. 6.
E EC1 | TC2 | X(C3 | IXa | IXb | IX0 | IX1 C1 X+T C2 X*F C3 EX) T ... .
28
Languages with
Write down the rest of grammar in CNF Then add production S => at the end
G in CNF:
X0 => 0 X1 => 1 S => AS | BY1 Y1 => AY2 Y2 => BC A => AX1 | X0Y3 | X0X1 Y3 => AX1 B => X0B | 0 C => X1C | 1 29
E.g., consider:
G: S => AS | BABC A => A1 | 0A1 | 01 | B => 0B | 0 | C => 1C | 1 |
30
31
A result that will be useful in proving languages that are not CFLs
Suppose we have a parse tree for a string w, according to a CNF grammar, G=(V,T,P,S)
Let h be the height of the parse tree
S = A0 A1 A2 h
= tree height
Ah-1
Implies:
|w|
2h-1
w
33
Basis: h = 1
Derivation will have to be Sa |w|= 1 = 21-1 .
B h
Ind. Step: h = k
S will have exactly two children: SAB Height of A & B subtrees is at most h-1 w = wA wB, where |wA| 2k-2 and |wB| 2k-2 |w| 2k-1
= height
wA w
wB
34
==> |w| 2k-1 Assuming CNF, any parse tree of height k cannot generate a string longer than 2k-1
Implication:
uvkwxky L
If L= or contains only , then the lemma is trivially satisfied (as it cannot be violated) For any other L which is a CFL:
Let G be a CFG for L in CNF Let m = number of variables in G Choose N=2m. Pick some z L s.t. |z| N the parse tree for z should have height m+1 (by the parse tree theorem)
37
+
Ai = Aj
S = A0
Ai
h m+1
Aj
m+1
Ah-1
u v
a
x
w
z = uvwxy Therefore, vx
38
S = A0
Ai=Aj
h m+1
Aj
Ai Ai
u v
v
x
x
y
==> For all k0:
u z = uwy
uvkwxky L
w z = uvkwxky
39
Proof contd..
Also, since Ais subtree no taller than m+1 ==> the string generated under Ais subtree, which is vwx, cannot be longer than 2m (=N) But, 2m =N ==> |vwx| N This completes the proof for the pumping lemma.
40
Let N <== P/L constant Pick z = aNbNcN Apply pumping lemma to z and show that there exists at least one other string constructed from z (obtained by pumping up or down) that is L
41
Proof contd
==> v, x cannot contain all three symbols (a,b,c) ==> we can pump up or pump down to build another string which is L
42
Example 3
L={
k2 0
| k is any integer)
44
Example 4
45
46
Union Concatenation Kleene closure operator Substitution Homomorphism, inverse homomorphism Intersection Difference Complementation
Note: Reg languages are closed under these operators
47
First prove closure under substitution Using the above result, prove the other closure properties
48
Example:
Let ={0,1} Let: s(0) = {anbn | n 1}, s(1) = {aa,bb} If w=01, s(w)=s(0).s(1)
49
IF L is a CFL and a substititution defined on L, s(L), is s.t., s(a) is a CFL for every symbol a, THEN:
50
G=(V,T,P,S) : CFG for L Because each s(a) is a CFL, there is a CFG for each s(a)
Let Ga = (Va,Ta,Pa,Sa)
P consists of:
S
San
x1
x2
xn
51
52
Let L1 and L2 be CFLs Make Lnew= {ab} s.t., s(a) = L1 and s(b)= L2
==> L1 L2 = s(Lnew)
Let L be a CFL
Let Lnew = {a}* and s(a) = L1
Then, L* = s(Lnew)
54
Let L be a CFL, with grammar G=(V,T,P,S) For LR, construct GR=(V,T,PR,S) s.t.,
If A==> is in P, then:
55
Existential proof:
Grammars?
Why?
We have an example, where intersection is not closed. Therefore, CFLs are not closed under intersection 56
Follows from the fact that CFLs are not closed under intersection
L1 L2 = L1 U L2
57
Follows from the fact that CFLs are not closed under complementation Because, if CFLs are closed under difference, then:
Decision Properties
Emptiness test
Membership test
59
Is a given CFG G ambiguous? Is a given CFL inherently ambiguous? Is the intersection of two CFLs empty? Are two CFLs the same? Is a given L(G) equal to *?
60
Summary
Normal Forms
Chomsky Normal Form Griebach Normal Form Useful in proroving P/L
Closure properties
Closed under: union, concatentation, reversal, Kleen closure, homomorphism, substitution Not closed under: intersection, complementation, difference
61