Regular Expression
Regular Expression
Regular Expression
Unit 1: Chapter 3
Regular Expression are those algebraic expressions used for representing regular languages, the
languages accepted by finite automaton. Regular expressions offer a declarative way to express
the strings we want to accept. This is what the regular expression offer that the automata do not.
Many system uses regular expression as input language. Some of them are:
A regular expression is built up out of simpler regular expression using a set of defining rules.
Each regular expression „r‟ denotes a language L(r). The defining rules specify how L(r) is
formed by combining in various ways the languages denoted by the sub expressions of „r‟.
Let Σ be an alphabet, the regular expression over the alphabet Σ are defined inductively
as follows;
Basic steps:
Now the following operations over basic regular expression define the complex regular
expression as:
-if „r‟ and „s‟ are the regular expressions representing the language L(r) and L(s) then
Note: any expression obtained from Φ, Є, a using above operation and parenthesis where required is a
regular expression.
Regular operator:
Basically, there are three operators that are used to generate the languages that are
regular,
For Example:
L1 = {00, 11}, L2 = (Є, 10}
L1UL2 = {Є, 00, 11, 10}
1. The star operator is of highest precedence. i.e it applies to its left well formed RE.
2. Next precedence is taken by concatenation operator.
3. Finally, unions are taken.
Examples: Write a RE for the set of string that consists of alternating 0’s and 1’s over
{0,1}.
Now if we concatenate these two RE, we get the RE 01 that represent the language {01}.
Then to generate the language of zero or more occurrence of 01, we take Kleen closure. i.e. the
RE (01)* represent the language {01,0101,…………..}
Similarly, the RE for second part is (10)*.
Now finally we take union of above two first part and second part to get the required RE. i.e. the
RE (01)*+(10)* represent the given language.
Regular language:
Let Σ be an alphabet, the class of regular language over Σ is defined inductively as;
- Φ is a regular language representing empty language
- {Є} is a regular language representing language of empty strings.
- For each a ε Σ, {a} is a regular language.
- If L1, L2…………. Ln is regular languages, then so is L1U L2U………..ULn.
- If LI,L2,L3,…………..Ln are regular languages, then so is L1.L2.L3………Ln
-If L is a regular language, then so is L*
Note: strictly speaking a regular expression E is just expression, not a language. We should use
L(E) when we want to refer to the language that E denotes. However it is to common to refer to
say E when we really mean L(E).
Search and Selection: Identifying a subset of items from a larger set on the
basis of a pattern match.
4. Identity law: Φ is identity for union. i.e. for any regular expression
r representing regular expression L(r).
r + Φ = Φ + r = r i.e. ΦUr=r.
Є is identity for concatenation. i.e. Є.r = r = r. Є
For string that have substring either 001 or 100, the regular expression is
(1+0)*.001.(1+0)*+(1+0)*.(100).(1+0)*
For strings that have at most two 0‟s with in it, the regular expression is
1*.(0+Є).1*.(0+Є).1*
For the strings ending with 11, the regular expression is (1+0)*.(11) +
Regular expression that denotes the C identifiers:
(Alphabet + _ )(Alphabet + digit + _ )*
Theorem 1
If L, M and N are any languages, then L(M U N) = LM U LN.
Proof:
Let w = xy be a string, now to prove the theorem it is sufficient to show that „w‟ ε LM U LN.
Also,
xy ε LM implies x ε L and y ε M (by concatenation rule)
And,
xy ε LN implies x ε L and y ε N (by concatenation rule)
Now,
xy ε L(M U N) => x ε L and y ε (M U N) (by concatenation)
y ε (M U N) => y ε M or y ε N (by union rule)
Now, we have x ε L
Here if y ε M then xy ε L(M) (by concatenation)
And if y ε N then xy ε L(N) (by concatenation)
Thus,
We have, L (M U N) = L(M) U L(N)
1)Any language define by one of these finite automata is also defined by RE.
2)Every language defined by RE is also defined by any of these finite automata.
ε-NFA NFA
RE
DFA
1. RE to NFA conversion
We can show that every language L(R) for some RE R, is also a language L(E)
for some epsilon NFA. This say that both RE and epsilon-NFA are equivalent
in terms of language representation.
Theorem 1
Every language defined by a regular expression is also defined by a finite
automaton. [For any regular expression r, there is an Є-NFA that accepts the same
language represented by r].
Proof:
Let L =L(r) be the language for regular expression r, now we have to show there
is an Є-NFA E such that L (E) =L.
The proof can be done through structural induction on r, following the recursive
definition of regular expressions.
For this we know Φ, Є, „a‟ are the regular expressions representing languages {Φ};
an empty language, { Є };language for empty strings and {a} respectively.
1. For union ‟+‟: From basis step we can construct Є-NFA‟s for r1 and r2. Let the
Є-NFA‟s be M1 and M2 respectively
The language of this automaton is L(r1) U L(r2) which is also the language
represented by expression r1+r2.
Here, the path from starting to accepting state go first through the automaton for
r1, where it must follow a path labeled by a string in L(r1), and then through the
automaton for r2, where it follows a path labeled by a string in L(r2). Thus, the
language accepted by above automaton is L(r1).L(r2).
Clearly language of this Є-NFA is L(r*) as it can also just Є as well as string in
L(r), L(r)L(r), L(r)L(r)L(r) and so on. Thus covering all strings in L(r*).
Finally, for regular expression (r), the automaton for r also serves as the
automaton for (r), since the parentheses do not change the language defined by
the expression.
Arden’s Theorem
Let p and q be the regular expressions over the alphabet Σ, if p does not contain
any empty string then r = q + rp has a unique solution r = qp*.
Proof:
Here, r = q + rp ……………… (i)
Let us put the value of r = q + rp on the right hand side of the relation (i), so;
r = q + (q + rp)p
r = q + qp + rp2………………(ii)
To convert the given DFA into a regular expression, here are some of the
assumptions regarding the transition system:
- The transition diagram should not have the Є-transitions.
-There must be only one initial state.
-The vertices or the states in the DFA are as;
q1,q2,……………..qn (Any qi is final state)
-Wij denotes the regular expression representing the set of labels of the edjes from qi to qj.
Thus we can write expressions as;
q1=q1w11+q2w12+q3w31+………………qnwn1+Є
q2=q1w12+q2w22+q3w32+………………+qnwn2
q3=q1w13+q2w23+q3w33+………………+qnwn3
…………………………………………………
…………………………………………………
…………………………………………………
qn=q1w1n+q2wn2+q3wn3+………………………qnwnn
Solving these equations for qi in terms of wij gives the regular expression
eqivalent to given DFA.
q1=q101+q110+ Є
i.e.q1=q1(01+10)+ Є
i.e.q1= Є+q1(01+10) (since r = q+rp)
i.e. q1= Є(01+10)* (using Arden‟s rule)
Since, q1 is final state, the final regular expression for the DFA is
Є(01+10)* = (01+10)*
Exercise: Try some Question from text book as well as from Adesh Kumar Pandey‟s book.Here I
have maintion some of them.
It is shown that the class of language known as regular language has at least four different
descriptions. They are the language accepted by DFA‟s, by NFA‟s, by Є-NFA, and defined by
RE.
Not every language is Regular. To show that a langauge is not regular, the powerfull technique
used is known as Pumping Lemma.
Pumping Lemma
Statement: Let L be a regular language. Then, there exists an integer constant n so that
for any x ε L with |x| ≥ n, there are strings u, v, w such that x = uvw, |uv| ≤ n, |v| > 0.
Then uvkw ε L for all k ≥ 0.
Note: Here y is the string that can be pumped i.e repeating y any number of times or deleting it,
keeps the resulting string in the language.
Proof:
Suppose L is a regular language, then L is accepted by some DFA M. Let M has n states. Also L
is infinite so M accepts some string x of length n or greater. Let length of x, |x| =m where m ≥ n.
Now suppose;
X = a1a2a3………………am where each ai ε Σ be an input symbol to M. Now, consider for j =
1,………….n, qj be states of M
Then,
(q0,x) = (q0,a1a2………..am) [q0 being start state of M]
= (q1,a2………am)
=…………………
=…………………
=…………………
= (qm,Є) [qm being final state]
Since m ≥ n, and DFA M has only n states, so by pigeonhole principle, there exists some i and
j; 0 ≤ i < j ≤ m such that qi =qj.
i.e. string ai+1 ………………aj takes M from state qi back to itself since qi = qj. So we can
say M accepts a1a2…………ai(ai+1…………aj)Kaj+1……………am for all k≥0.
Theory of Computation (Compiled By Tej Shahi,CDCSIT,TU) Page 12
Regular Expression Unit 1 chapter 3
For example:
1. Show that language, L={0r 1r|n ≥0} is not a regular language.
Solution:
Let L is a regular language. Then by pumping lemma, there are strings u, v, w with v≥1 such
that uvkw ε L for k≥0.
Case I:
Let v contain 0‟s only. Then, suppose u = 0p, v = 0q,w = 0r1s ;
Then we must have p+q+r = s (as we have 0r1r) and q>0
Case II
Let v contains 1‟s only. Then u= 0p1q, v= 1r, w=1s
Then p= q+r+s and r>0
Now, 0p1q(1r)k1s=0p1q+rk+s
Only those strings in 0p1q+rk+s belongs to L fpr k =1 otherwise not.
Hence the language is not regular.
Case III
V contains 0‟s and 1‟s both. Then, suppose,
u = 0p, v = 0q1r, w = 1s;
p+q = r+s and q+r>0
Minimization of DFA
Given a DFA M, that accepts a language L (M). Now, configure a DFA M „. During the
course of minimization, it involves identifying the equivalent states and distinguishable
states.
Equivalent States: Two states p & q are called equivalent states, denoted by p ≡ q if and
only if for each input string x, (p, x) is a final state if and only if (q, x) is a final state.
Distinguishable state: Two states p & q are said to be distinguishable states if (for any)
there exists a string x, such that (p, x) is a final state (q, x) is not a final state.
For minimization, the table filling algorithm is used. The steps of the algorithm are;
For identifying the pairs (p, q) with p ≠ q;
Example
Now to solve this problem first we should determine weather the pair is distinguishable or not.