FL&T Unit 3 - 1 - 1724732026415

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 17

Context-free Languages

Unit II: Context-free languages (07 Hrs.)


Context-free grammars (CFG) and languages (CFL), Chomsky and Greibach normal forms,
parse trees, ambiguity in CFG, pumping lemma for context-free languages, closure properties of
CFLs.
Context-sensitive languages: Context-sensitive grammars (CSG) and languages, linear bounded
automata and equivalence with CSG.

What is a grammar?
A grammar consists of one or more variables that represent classes of strings (i.e., languages)
There are rules that say how the strings in each class are constructed. The construction can use:
1. symbols of the alphabet
2. Strings that are already known to be in one of the classes
3. or both

Definition of Context-Free Grammar:


A GFG (or just a grammar)
G is a tuple
G = (V, T, P, S) where
1. V is the (finite) set of variables (or nonterminals or syntactic categories). Each variable
represents a language, i.e., a set of strings
2. T is a finite set of terminals, i.e., the symbols that form the strings of the language being
defined
3. P is a set of production rules that represent the recursive definition of the language.
4. S is the start symbol that represents the language being defined.
Other variables represent auxiliary classes of strings that are used to help define the language of
the start symbol.

Production rules
Each production rule consists of:
1. A variable that is being (partially) defined by the production. This variable is often called the
head of the production.
2. The production symbol →.
3. A string of zero or more terminals and variables. This string, called the body of the
production, represents one way to form strings in the language of the variable of the head.
In doing so, we leave terminals unchanged and substitute for each variable of the body any string
that is known to be in the language of that variable.

Here is an example of a context-free grammar G1:


A → 0A1
A→B
B→#
 Each line is a substitution rule (or production).
 A, B are variables.
 0, 1, # are terminals.
 The left-hand side of the first rule (A) is the start variable.

What is Derivation?
A sequence of rewritings that transforms the start variable S of a grammar G to a
sentence s is called a derivation of s from G.( symbol ⇒ is used to show derivation)

Example:
Given grammar G is:
S → 0S1
S→ε
The string 0011 is in the language generated.
The derivation is:
S ⇒ 0S1
⇒ 00S11
⇒ 0011

Generation of Derivation Tree

A derivation tree or parse tree is an ordered rooted tree that graphically represents the semantic
information a string derived from a context-free grammar.

Representation Technique

 Root vertex − Must be labeled by the start symbol.


 Vertex − Labeled by a non-terminal symbol.
 Leaves − Labeled by a terminal symbol or ε.

If S → x1x2 …… xn is a production rule in a CFG, then the parse tree / derivation tree will be as
follows −

There are two different approaches to draw a derivation tree −

Top-down Approach −

 Starts with the starting symbol S


 Goes down to tree leaves using productions

Bottom-up Approach −

 Starts from tree leaves


 Proceeds upward to the root which is the starting symbol S
Leftmost and Rightmost Derivation of a String

 Leftmost derivation − A leftmost derivation is obtained by applying production to the


leftmost variable in each step.
 Rightmost derivation − A rightmost derivation is obtained by applying production to
the rightmost variable in each step.

Example

Let any set of production rules in a CFG be

X → X+X | X*X |X| a

over an alphabet {a}.

The leftmost derivation for the string "a+a*a" may be −

X → X+X → a+X → a + X*X → a+a*X → a+a*a

The stepwise derivation of the above string is shown as below −


The rightmost derivation for the above string "a+a*a" may be −

X → X*X

→ X*a

→ X+X*a

→ X+a*a

→ a+a*a

The stepwise derivation of the above string is shown as below −


Context Free Grammars (CFG) and Context Free Languages (CFL):

Context-free languages (CFLs) are generated by context-free grammars. The set of all context-
free languages is identical to the set of languages accepted by pushdown automata, and the set
of regular languages is a subset of context-free languages. An inputed language is accepted by a
computational model if it runs through the model and ends in an accepting final state. All regular
languages are context-free languages, but not all context-free languages are regular. Most
arithmetic expressions are generated by context-free grammars, and are therefore, context-free
languages. Context-free languages and context-free grammars have applications in computer
science and linguistics such as natural language processing and computer language design.
 According to Chomsky hierarchy, grammars are divided of 4 types:

Type 0 known as unrestricted grammar.

Type 1 known as context sensitive grammar.

Type 2 known as context free grammar.


Type 3 Regular Grammar.

1. Type - 3 Grammar
Type-3 grammars generate regular languages. Type-3 grammars must have a single non-
terminal on the left-hand side and a right-hand side consisting of a single terminal or single
terminal followed by a single non-terminal.

The productions must be in the form X → a or X → aY

where X, Y ∈ N (Non terminal)

and a ∈ T (Terminal)

The rule S → ε is allowed if S does not appear on the right side of any rule.

Example
X→ε
X → a | aY
Y→b

2. Type - 2 Grammar

Type-2 grammars generate context-free languages.

The productions must be in the form A → γ

where A ∈ N (Non terminal)

and γ ∈ (T ∪ N)* (String of terminals and non-terminals).

These languages generated by these grammars are be recognized by a non-deterministic


pushdown automaton.

Example
S→Xa
X→a
X → aX
X → abc
X→ε

3. Type - 1 Grammar
Type-1 grammars generate context-sensitive languages. The productions must be in the
form

αAβ→αγβ

where A ∈ N (Non-terminal)

and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)

The strings α and β may be empty, but γ must be non-empty.

The rule S → ε is allowed if S does not appear on the right side of any rule. The languages
generated by these grammars are recognized by a linear bounded automaton.

Example
AB → AbBc
A → bcA
B→b

4. Type - 0 Grammar

Type-0 grammars generate recursively enumerable languages. The productions have no


restrictions. They are any phase structure grammar including all formal grammars.

They generate the languages that are recognized by a Turing machine.

The productions can be in the form of α → β where α is a string of terminals and


nonterminals with at least one non-terminal and α cannot be null. β is a string of terminals
and non-terminals.

Example
S → ACaB
Bc → acB
CB → DB
aD → Db

 Pumping Lemma for Context-free Languages (CFL) :


Pumping Lemma for CFL states that for any Context Free Language L, it is possible to find
two substrings that can be ‘pumped’ any number of times and still be in the same language.

For any language L, we break its strings into five parts and pump second and fourth substring.
Pumping Lemma, here also, is used as a tool to prove that a language is not CFL.

Because, if any one string does not satisfy its conditions, then the language is not CFL.

Thus, if L is a CFL, there exists an integer n, such that for all x ∈ L with |x| ≥ n, there exists u,
v, w, x, y ∈ Σ∗, such that x = uvwxy, and
(1) |vwx| ≤ n
(2) |vx| ≥ 1
(3) for all i ≥ 0: uviwxiy ∈ L

For above example, 0n1n is CFL, as any string can be the result of pumping at two places, one for
0 and other for 1.
Let us prove, L012 = {0n1n2n | n ≥ 0} is not Context-free.
Let us assume that L is Context-free, then by Pumping Lemma, the above given rules follow.
Now, let x ∈ L and |x| ≥ n. So, by Pumping Lemma, there exists u, v, w, x, y such that (1) – (3)
hold.
We show that for all u, v, w, x, y (1) – (3) do not hold.

If (1) and (2) hold then x = 0n1n2n = uvwxy with |vwx| ≤ n and |vx| ≥ 1.
(1) Tells us that vwx does not contain both 0 and 2. Thus, either vwx has no 0’s, or vwx has no
2’s. Thus, we have two cases to consider.
Suppose vwx has no 0’s.
By (2), vx contains a 1 or a 2. Thus uwy has ‘n’ 0’s and uwy either has less than ‘n’ 1’s or has
less than ‘n’ 2’s.
But (3) tells us that uwy = uv0wx0y ∈ L.
So, uwy has an equal number of 0’s, 1’s and 2’s gives us a contradiction. The case where vwx
has no 2’s is similar and also gives us a contradiction. Thus L is not context-free.

 Closure properties

Context-free languages are closed under −

 Union

 Concatenation

 Kleene closure operation

1. Union

Let L1 and L2 be two context free languages. Then L1 ∪ L2 is also context free.

Example

Let L1 = { anbn , n > 0}. Corresponding grammar G1 will have P: S1 → aAb|ab

Let L2 = { cmdm , m ≥ 0}. Corresponding grammar G2 will have P: S2 → cBb| ε

Union of L1 and L2, L = L1 ∪ L2 = { anbn } ∪ { cmdm }

The corresponding grammar G will have the additional production S → S1 | S2


2. Concatenation

If L1 and L2 are context free languages, then L1L2 is also context free.

Example

Union of the languages L1 and L2, L = L1L2 = { anbncmdm }

The corresponding grammar G will have the additional production S → S1 S2

3. Kleene closure

If L is a context free language, then L* is also context free.

Example

Let L = { anbn , n ≥ 0}. Corresponding grammar G will have P: S → aAb| ε

Kleene Star L1 = { anbn }*

The corresponding grammar G1 will have additional productions S1 → SS1 | ε

Context-free languages are not closed under −

 Intersection − If L1 and L2 are context free languages, then L1 ∩ L2 is not


necessarily context free.

 Intersection with Regular Language − If L1 is a regular language and L2 is a


context free language, then L1 ∩ L2 is a context free language.

 Complement − If L1 is a context free language, then L1’ may not be context free.

 Context-Sensitive Grammar:

A Context-sensitive grammar is an unrestricted grammar in which all the productions are of form
Where α and β are strings of non-terminals and terminals.
Context-sensitive grammars are more powerful than context-free grammars because there are
some languages that can be described by CSG but not by context-free grammars and CSL are
less powerful than unrestricted grammar. That’s why context-sensitive grammars are positioned
between context-free and unrestricted grammars in the Chomsky hierarchy.

Context-sensitive grammar has 4-tuples. G = {N, Σ, P, S}, Where


N = Set of non-terminal symbols
Σ = Set of terminal symbols
S = Start symbol of the production
P = Finite set of productions
P is a set of production rules, of the form αAβ → αγβ

Where, A in N, α, β ∈ (N ∪ Σ) and γ ∈ (N ∪ Σ)+


 Context-sensitive Language:

The language that can be defined by context-sensitive grammar is called CSL. Properties of CSL
are:

o Union, intersection and concatenation of two context-sensitive languages is context-


sensitive.
o Complement of a context-sensitive language is context-sensitive.

Example:

1.

Context sensitive grammar (G) N = {S, B, C}, Σ = {1, 2, 3}, S = S

And P:

S → 1SBC | 123

1B → 12

2B → 22

2C → 23

3C → 33

CB → HB,

HB → HC,

HC → BC
2.

Consider the following CSG.


S → abc/aAbc
Ab → bA
Ac → Bbcc
bB → Bb
aB → aa/aaA
What is the language generated by this grammar?

Solution:
S → aAbc
→ abAc
→ abBbcc
→ aBbbcc
→ aaAbbcc
→ aabAbcc
→ aabbAcc
→ aabbBbccc
→ aabBbbccc
→ aaBbbbccc
→ aaabbbccc
The language generated by this grammar is {anbncn | n≥1}.

 A linear bounded automaton

A linear bounded automaton is a multi-track non-deterministic Turing machine with a tape of


some bounded finite length.

Length = function (Length of the initial input string, constant c)

Here,
Memory information ≤ c × Input information

The computation is restricted to the constant bounded area. The input alphabet contains two
special symbols which serve as left end markers and right end markers which mean the
transitions neither move to the left of the left end marker nor to the right of the right end marker
of the tape.

A linear bounded automaton can be defined as an 8-tuple (Q, X, ∑, q0, ML, MR, δ, F) where −

 Q is a finite set of states

 X is the tape alphabet

 ∑ is the input alphabet

 q0 is the initial state

 ML is the left end marker

 MR is the right end marker where MR ≠ ML

 δ is a transition function which maps each pair (state, tape symbol) to (state, tape symbol,
Constant ‘c’) where c can be 0 or +1 or -1

 F is the set of final states

A deterministic linear bounded automaton is always context-sensitive and the linear bounded
automaton with empty language is undecidable.

You might also like