Formal Language and Automata Theory Handout

Download as pdf or txt
Download as pdf or txt
You are on page 1of 23

Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.

Chapter One
Introduction of Automata

 Automata – What is it?


The term "Automata" is derived from the Greek word "αὐτόματα" which means "self-acting".
An automaton (Automata in plural) is an abstract self-propelled computing device which
follows a predetermined sequence of operations automatically.

Also its an abstract model of a digital computer. That has a mechanism to read input, which is a
string over a given alphabet. This input is actually written on an “input file”, which can be read
by the automaton but cannot change it.

Input file is divided into cells, each of which can hold one symbol. The automaton has a
temporary “storage” device, which has unlimited number of cells, the contents of which can be
altered by the automaton. Automaton has a control unit, which is said to be in one of a finite
number of “internal states”. The automaton can change state in a defined way.

An automaton with a finite number of states is called a Finite Automaton (FA) or Finite State
Machine (FSM).

Formal definition of a Finite Automaton


An automaton can be represented by a 5-tuple (Q, ∑, δ, q0, F), where −

 Q is a finite set of states.

 ∑ is a finite set of symbols, called the alphabet of the automaton.

 δ is the transition function.

 q0 is the initial state from where any input is processed (q0 ∈ Q).

 F is a set of final state/states of Q (F ⊆ Q).

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 1
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Related Terminologies
Alphabet
 Definition − An alphabet is any finite set of symbols.

 Example − ∑ = {a, b, c, d} is an alphabet set where ‘a’, ‘b’, ‘c’, and‘d’ are symbols.

String
 Definition − A string is a finite sequence of symbols taken from ∑.

 Example − ‘cabcad’ is a valid string on the alphabet set ∑ = {a, b, c, d}

Length of a String
 Definition − It is the number of symbols present in a string. (Denoted by |S|).

 Examples −

If S = ‘cabcad’, |S|= 6

If |S|= 0, it is called an empty string (Denoted by λ or ε)

Kleene Star
 Definition − The Kleene star, ∑*, is a unary operator on a set of symbols or strings, ∑,
that gives the infinite set of all possible strings of all possible lengths
over ∑ including λ.

 Representation − ∑* = ∑0 ∪ ∑1 ∪ ∑2 ∪……. where ∑p is the set of all possible strings


of length p.

 Example − If ∑ = {a, b}, ∑* = {λ, a, b, aa, ab, ba, bb,………..}

Kleene Closure / Plus


 Definition − The set ∑+ is the infinite set of all possible strings of all possible lengths
over ∑ excluding λ.

 Representation − ∑+ = ∑1 ∪ ∑2 ∪ ∑3 ∪…….

∑+ = ∑* − { λ }

 Example − If ∑ = { a, b } , ∑+ = { a, b, aa, ab, ba, bb,………..}

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 2
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Language
 Definition − A language is a subset of ∑* for some alphabet ∑. It can be finite or
infinite.
 Example1 − If the language takes all possible strings of length 2 over ∑ = {a, b}, then
L = { ab, aa, ba, bb }
 Example2 − If the language takes all possible strings of length 3 over ∑ = {a, b}, then
L = { aaa, aab, abb, aba, baa, bba, bbb,bab }

 Example3 − The set L = { an bn , n>= 0} is also language on ∑={a,b}.

Then L={ ε, ab, aabb, aaaabbbb………} belongs to L, but sting abb is not in L. because
the number of a's and b's should be equal.

Finite Automaton can be classified into two types −

 Deterministic Finite Automaton (DFA)


 Non-deterministic Finite Automaton (NDFA / NFA)
Deterministic Finite Automaton (DFA)
In DFA, for each input symbol, one can determine the state to which the machine will move.
Hence, it is called Deterministic Automaton. As it has a finite number of states, the machine is
called Deterministic Finite Machine or Deterministic Finite Automaton.

Formal Definition of a DFA


A DFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −

 Q is a finite set of states.

 ∑ is a finite set of symbols called the alphabet.

 δ is the transition function where δ: Q × ∑ → Q

 q0 is the initial state from where any input is processed (q0 ∈ Q).

 F is a set of final state/states of Q (F ⊆ Q).

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 3
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Graphical Representation of a DFA


A DFA is represented by digraphs (a combination of letters) called state diagram.

 The vertices represent the states.

 The arcs labeled with an input alphabet show the transitions.

 The initial state is denoted by an empty single incoming arc.

 The final state is indicated by double circles.


Example
Let a deterministic finite automaton be →

Q = {a, b, c},

∑ = {0, 1},

q0 = {a},

F = {c}, and

Transition function δ as shown by the following table −

Present State Next State for Input 0 Next State for Input 1

A A B

B C A

C B C

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 4
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Its graphical representation would be as follows −

Non-deterministic Finite Automaton


In NDFA, for a particular input symbol, the machine can move to any combination of the states
in the machine. In other words, the exact state to which the machine moves cannot be
determined. Hence, it is called Non-deterministic Automaton. As it has finite number of states,
the machine is called Non-deterministic Finite Machine or Non-deterministic Finite
Automaton.

Formal Definition of an NDFA


An NDFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −

 Q is a finite set of states.

 ∑ is a finite set of symbols called the alphabets.

 δ is the transition function where δ: Q × ∑ → 2Q

(Here the power set of Q (2Q) has been taken because in case of NDFA, from a state,
transition can occur to any combination of Q states) exercise

 q0 is the initial state from where any input is processed (q0 ∈ Q).

 F is a set of final state/states of Q (F ⊆ Q).

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 5
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Graphical Representation of an NDFA: (same as DFA)


An NDFA is represented by digraphs called state diagram.

 The vertices represent the states.

 The arcs labeled with an input alphabet show the transitions.

 The initial state is denoted by an empty single incoming arc.

 The final state is indicated by double circles.

Example

Let a non-deterministic finite automaton be →

Q = {a, b, c}

∑ = {0, 1}

q0 = {a}

F = {c}

The transition function δ as shown below −

Present State Next State for Input 0 Next State for Input 1

A a, b b

B C a, c

C b, c c

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 6
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Its graphical representation would be as follows −

DFA vs NDFA
The following table lists the differences between DFA and NDFA.

DFA NDFA

The transition from a state is to a single particular The transition from a state can be to multiple
next state for each input symbol. Hence it is next states for each input symbol. Hence it is
called deterministic. called non-deterministic.

Empty string transitions are not seen in DFA. NDFA permits empty string transitions.

Backtracking is allowed in DFA In NDFA, backtracking is not always possible.

Requires more space. Requires less space.

A string is accepted by a DFA, if it transits to a final A string is accepted by a NDFA, if at least one
state. of all possible transitions ends in a final state.

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 7
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Acceptors, Classifiers, and Transducers


Acceptor (Recognizer):- An automaton that computes a Boolean function is called an acceptor.
All the states of an acceptor is either accepting or rejecting the inputs given to it.
Classifier: - A classifier has more than two final states and it gives a single output when it
terminates.
Transducer: - An automaton that produces outputs based on current input and/or previous state
is called a transducer. Transducers can be of two types −
 Mealy Machine − The output depends both on the current state and the current input.

 Moore Machine − The output depends only on the current state.

Acceptability by DFA and NDFA


A string is accepted by a DFA/NDFA iff the DFA/NDFA starting at the initial state ends in an
accepting state (any of the final states) after reading the string wholly.

A string S is accepted by a DFA/NDFA (Q, ∑, δ, q0, F), if

δ*(q0, S) ∈ F

The language L accepted by DFA/NDFA is

{S | S ∈ ∑* and δ*(q0, S) ∈ F}

A string S′ is not accepted by a DFA/NDFA (Q, ∑, δ, q0, F), iff

δ*(q0, S′) ∉ F

The language L′ not accepted by DFA/NDFA (Complement of accepted language L) is

{S | S ∈ ∑* and δ*(q0, S) ∉ F}

Example

Let us consider the DFA shown in Figure bellow From the DFA, the acceptable strings can be
derived.

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 8
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Strings accepted by the above DFA: {0, 00, 11, 010, 101, ...........}

Strings not accepted by the above DFA: {1, 011, 111, ........}

NDFA to DFA Conversion


Problem Statement
Let X = (Qx, ∑, δx, q0, Fx) be an NDFA which accepts the language L(X). We have to design an
equivalent DFA Y = (Qy, ∑, δy, q0, Fy) such that L(Y) = L(X). The following procedure
converts the NDFA to its equivalent DFA −

Algorithm
Input − An NDFA

Output − An equivalent DFA

Step 1 − Create state table from the given NDFA.

Step 2 − Create a blank state table under possible input alphabets for the equivalent DFA.

Step 3 − Mark the start state of the DFA by q0 (Same as the NDFA).

Step 4 − Find out the combination of States {Q0, Q1,... , Qn} for each possible input alphabet.

Step 5 − Each time we generate a new DFA state under the input alphabet columns, we have to
apply step 4 again, otherwise go to step 6.

Step 6 − The states which contain any of the final states of the NDFA are the final states of the
equivalent DFA.

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 9
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Example
Let us consider the NDFA shown in the figure below.

Q δ(q,0) δ(q,1)

A {a,b,c,d,e} {d,e}

B {c} {e}

C ∅ {b}

D {e} ∅

E ∅ ∅

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 10
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Using the above algorithm, we find its equivalent DFA. The state table of the DFA is shown in
below.

Q δ(q,0) δ(q,1)

[a] [a,b,c,d,e] [d,e]

[a,b,c,d,e] [a,b,c,d,e] [b,d,e]

[d,e] [e] ∅

[b,d,e] [c,e] [e]

[e] ∅ ∅

[c, e] ∅ [b]

[b] [c] [e]

[c] ∅ [b]

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 11
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

The state diagram of the DFA is as follows −

DFA Minimization
DFA Minimized by using Myphill-Nerode theorem and Equivalence Theorem.
But we will see Myphill-Nerode Theorem in this handout.
Algorithm
Input − DFA

Output − Minimized DFA

Step 1 − Draw a table for all pairs of states (Qi, Qj) not necessarily connected directly [All are
unmarked initially]

Step 2 − Consider every state pair (Qi, Qj) in the DFA where Qi ∈ F and Qj ∉ F or vice versa
and mark them. [Here F is the set of final states]

Step 3 − Repeat this step until we cannot mark anymore states −

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 12
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

If there is an unmarked pair (Qi, Qj), mark it if the pair {δ (Qi, A), δ (Qi, A)} is marked for some
input alphabet.

Step 4 − Combine all the unmarked pair (Qi, Qj) and make them a single state in the reduced
DFA.

Example: Let us use Algorithm 2 to minimize the DFA shown below.

Step 1 − We draw a table for all pair of states.

A B C D E F

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 13
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Step 2 – We mark the state pairs.

A B C D E F

C ✔ ✔

D ✔ ✔

E ✔ ✔

F ✔ ✔ ✔

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 14
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Step 3 − We will try to mark the state pairs, with green colored check mark, transitively. If we
input 1 to state ‘a’ and ‘f’, it will go to state ‘c’ and ‘f’ respectively. (c, f) is already marked,
hence we will mark pair (a, f). Now, we input 1 to state ‘b’ and ‘f’; it will go to state ‘d’ and ‘f’
respectively. (d, f) is already marked, hence we will mark pair (b, f).

A B C D e f

C ✔ ✔

D ✔ ✔

E ✔ ✔

F ✔ ✔ ✔ ✔ ✔

After step 3, we have got state combinations {a, b} {c, d} {c, e} {d, e} that are unmarked.

We can recombine {c, d} {c, e} {d, e} into {c, d, e}

Hence we got two combined states as − {a, b} and {c, d, e}

So the final minimized DFA will contain three states {f}, {a, b} and {c, d, e}

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 15
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Chapter Two
Classification of Grammars

Introduction to Grammars
In the literary sense of the term, grammars denote syntactical rules for conversation in natural
languages. Linguistics have attempted to define grammars since the inception of natural
languages like English, Sanskrit, Mandarin, etc.

The theory of formal languages finds its applicability extensively in the fields of Computer
Science. Noam Chomsky gave a mathematical model of grammar in 1956 which is effective
for writing computer languages.

Grammar
A grammar G can be formally written as a 4-tuple (N, T, S, P) where −

 N or VN is a set of variables or non-terminal symbols.

 T or ∑ is a set of Terminal symbols.

 S is a special variable called the Start symbol, S ∈ N

 P is Production rules for Terminals and Non-terminals. A production rule has the form α
→ β, where α and β are strings on VN ∪ ∑ and least one symbol of α belongs to VN.

Example
Grammar G1 −

({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})

Here,

 S, A, and B are Non-terminal symbols;

 a and b are Terminal symbols

 S is the Start symbol, S ∈ N

 Productions, P : S → AB, A → a, B → b

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 16
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Example
Grammar G2 −

(({S, A}, {a, b}, S,{S → aAb, aA → aaAb, A → ε } )

Here,

 S and A are Non-terminal symbols.

 a and b are Terminal symbols.

 ε is an empty string.

 S is the Start symbol, S ∈ N

 Production P : S → aAb, aA → aaAb, A → ε

Derivations from a Grammar


Strings may be derived from other strings using the productions in a grammar. If a
grammar G has a production α → β, we can say that x α y derives x β y in G. This derivation is
written as −

x α y ⇒G x β y

Example
Let us consider the grammar −

G2 = ({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε } )

Some of the strings that can be derived are −

S ⇒ aAb using production S → aAb

⇒ aaAbb using production aA → aAb

⇒ aaaAbbb using production aA → aAb

⇒ aaabbb using production A → ε

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 17
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Language Generated by a Grammar


The set of all strings that can be derived from a grammar is said to be the language generated
from that grammar. A language generated by a grammar G is a subset formally defined by

L(G)={W|W ∈ ∑*, S ⇒G W}

If L(G1) = L(G2), the Grammar G1 is equivalent to the Grammar G2.

Example
If there is a grammar

G: N = {S, A, B} T = {a, b} P = {S → AB, A → a, B → b}

Here S produces AB, and we can replace A by a, and B by b. Here, the only accepted string
is ab, i.e.,

L(G) = {ab}

Example
Suppose we have the following grammar −

G: N = {S, A, B} T = {a, b} P = {S → AB, A → aA|a, B → bB|b}

The language generated by this grammar −

L(G) = {ab, a2b, ab2, a2b2, ………}

= {am bn | m ≥ 1 and n ≥ 1}

Construction of a Grammar Generating a Language


We’ll consider some languages and convert it into a grammar G which produces those
languages.

Example
Problem − Suppose, L (G) = {am bn | m ≥ 0 and n > 0}. We have to find out the
grammar G which produces L(G).

Solution

Since L(G) = {am bn | m ≥ 0 and n > 0}

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 18
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

the set of strings accepted can be rewritten as −

L(G) = {b, ab,bb, aab, abb, …….}

Here, the start symbol has to take at least one ‘b’ preceded by any number of ‘a’ including null.

To accept the string set {b, ab, bb, aab, abb, …….}, we have taken the productions −

S → aS , S → B, B → b and B → bB

S → B → b (Accepted)

S → B → bB → bb (Accepted)

S → aS → aB → ab (Accepted)

S → aS → aaS → aaB → aab(Accepted)

S → aS → aB → abB → abb (Accepted)

Thus, we can prove every single string in L(G) is accepted by the language generated by the
production set.

Hence the grammar −

G: ({S, A, B}, {a, b}, S, { S → aS | B , B → b | bB })

Example
Problem − Suppose, L (G) = {am bn | m > 0 and n ≥ 0}. We have to find out the grammar G
which produces L(G).

Solution −

Since L(G) = {am bn | m > 0 and n ≥ 0}, the set of strings accepted can be rewritten as −

L(G) = {a, aa, ab, aaa, aab ,abb, …….}

Here, the start symbol has to take at least one ‘a’ followed by any number of ‘b’ including null.

To accept the string set {a, aa, ab, aaa, aab, abb, …….}, we have taken the productions −

S → aA, A → aA , A → B, B → bB ,B → λ

S → aA → aB → aλ → a (Accepted)

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 19
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

S → aA → aaA → aaB → aaλ → aa (Accepted)

S → aA → aB → abB → abλ → ab (Accepted)

S → aA → aaA → aaaA → aaaB → aaaλ → aaa (Accepted)

S → aA → aaA → aaB → aabB → aabλ → aab (Accepted)

S → aA → aB → abB → abbB → abbλ → abb (Accepted)

Thus, we can prove every single string in L(G) is accepted by the language generated by the
production set.

Hence the grammar −

G: ({S, A, B}, {a, b}, S, {S → aA, A → aA | B, B → λ | bB })

Chomsky Classification of Grammars


According to Noam Chomosky, there are four types of grammars − Type 0, Type 1, Type 2, and
Type 3. The following table shows how they differ from each other −

Grammar Grammar Accepted Language Accepted Automaton


Type

Type 0 Unrestricted grammar Recursively enumerable Turing Machine


language

Type 1 Context-sensitive Context-sensitive language Linear-bounded


grammar automaton

Type 2 Context-free grammar Context-free language Pushdown automaton

Type 3 Regular grammar Regular language Finite state automaton

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 20
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Take a look at the following illustration. It shows the scope of each type of grammar −

Type - 3 Grammar
Type-3 grammars generate regular languages. Type-3 grammars must have a single non-
terminal on the left-hand side and a right-hand side consisting of a single terminal or single
terminal followed by a single non-terminal.

The productions must be in the form X → a or X → aY

where X, Y ∈ N (Non terminal)

and a ∈ T (Terminal)

The rule S → ε is allowed if S does not appear on the right side of any rule.

Example
X→ε
X → a | aY
Y→b

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 21
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Type - 2 Grammars
Type-2 grammars generate context-free languages.

The productions must be in the form A → γ

where A ∈ N (Non terminal)

and γ ∈ (T ∪ N)* (String of terminals and non-terminals).

These languages generated by these grammars are be recognized by a non-deterministic


pushdown automaton.

Example

S→Xa
X→a
X → aX
X → abc
X→ε

Type - 1 Grammar
Type-1 grammars generate context-sensitive languages. The productions must be in the form

αAβ→αγβ

where A ∈ N (Non-terminal)

and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)

The strings α and β may be empty, but γ must be non-empty.

The rule S → ε is allowed if S does not appear on the right side of any rule. The languages
generated by these grammars are recognized by a linear bounded automaton.

Example

AB → AbBc
A → bcA
B→b

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 22
Formal Language and Automata Theory Handout for CS 3rd Year Program: Weekend 2011 E.C

Type - 0 Grammar
Type-0 grammars generate recursively enumerable languages. The productions have no
restrictions. They are any phase structure grammar including all formal grammars.

They generate the languages that are recognized by a Turing machine.

The productions can be in the form of α → β where α is a string of terminals and nonterminals
with at least one non-terminal and α cannot be null. β is a string of terminals and non-terminals.

Example
S → ACaB
Bc → acB
CB → DB
aD → Db

Prepared By: Mr. Tadese F. Department of Computer Science C.Code:[ CoSc3111] Page 23

You might also like