Chapter 4

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 31

Regular Expressions

Defining Languages Using


Regular Expressions
 Previously, we defined the languages:
• L1 = {Xn for n = 1, 2, 3, . . .}
• L2 = {x, xxx, xxxxx, . . .}
 But these are not very precise ways of
defining languages.
 So, now we want a very precise way of
defining a languages, and we will do this
using regular expressions
Regular Expressions
 Regular expressions are written in bold face letters and
are a way of specifying the language.
 Formal way to define the lexical specifications of a
language
 Remove ambiguity altogether
 Called expressions on account of similarity with
arithmetic expressions
 Use *, + and ()
 * shows repetition
 + presents choice or disjunction (some time authors used
U for this purpose)
 () used for grouping
Language-Defining Symbols:
Star Sign
 We now introduce the use of the Kleene star, applied not
to a set, but directly to the letter x and written as a
superscript: x*.
 This simple expression indicates some sequence of x’s
(may be none at all):
x* = Λ or x or x2 or x3…
= xn for some n = 0, 1, 2, 3, …

 We can think of the star as an unknownpower.


 That is, x* stands for a string of x’s, but we do not
specify how many, and it may be the null string .

4
 The notation x* can be used to define languages
by writing, say L = language (x*)

 Since x* is any string of x’s, L is then the


language of all possible strings of x’s of any
length (including Λ).

5
 Given the alphabet = {a, b}, suppose we wish to define the
language L that contains all words of the form one a followed by
some number of b’s (maybe no b’s at all); that is
L = {a, ab, abb, abbb, abbbb, …}

 Using the language-defining symbol, we may write


L = language (ab*)

 This equation obviously means that L is the language in which the


words are the concatenation of an initial a with some or no b’s.

6
 We can apply the Kleene star to the whole
string ab if we want:
(ab)* = Λ or ab or abab or ababab…
 Observe that
(ab)* ≠ a*b*
 because the language defined by the
expression on the left contains the word
abab, whereas the language defined by
7
the expression on the right does not.
 If we want to define the language L1 = {x; xx; xxx; …}
using the language-defining symbol, we can write
L1 = language(xx*)
which means that each word of L1 must start with an x
followed by some (or no) x’s.

 Note that we can also define L1 using the notation + (as


an exponent) introduced in previous lecture
L1 = language(x+)

8
Plus Sign

 Let us introduce another use of the plus


sign. By the expression
x+y
where x and y are strings of characters
from an alphabet, we mean either x or y.

 Care should be taken so as not to confuse


this notation with the notation + (as an
9
exponent).
Example

 Consider the language T over the alphabet


Σ = {a; b; c}:
 T = {a; c; ab; cb; abb; cbb; abbb; cbbb; abbbb;
cbbbb; …}
 In other words, all the words in T begin with
either an a or a c and then are followed by some
number of b’s.
 Using the above plus sign notation, we may
write this as
10
T = language((a+ c)b*)
Example

 Consider a finite language L that contains


all the strings of a’s and b’s of length
three exactly:
L = {aaa, aab, aba, abb, baa, bab, bba,
bbb}
 Thus, we may write
L = language((a+ b)(a + b)(a + b))

11
Example

 In general, if we want to refer to the set of all possible


strings of a’s and b’s of any length whatsoever, we could
write
language((a+ b)*)

 This is the set of all possible strings of letters from


the alphabet Σ = {a, b}, including the null string.

12
Regular Expressions
 Given  = {a,b}
 a* = {Λ, a,aa,aaa,aaa,aaaa,aaaaa, …}
 ab* = {a, ab,abb,abbb,abbbb, …}
 a+b = {a,b}
 (ab)* = {Λ, ab, abab, ababab, …}
 (a+b)* = {Λ, any string of as and bs}
Formal Definition of Regular
Expressions
 The set of regular expression is defined by
following rules
1. Every letter of  and Λ is a regular
expression
2. If r1 and r2 are regular expressions, then so
are
 (r1)
 r1r2
 r1+r2
 r1*

3.Nothing else is a regular expression


Regular Expressions
 Whether following are RE if so what
languages do they generate
a (b + a)*
 bb(a+b)
 (a+b)(a+b)(a+b)
 (a+b)*ba
 (a+b)*a(a+b)*
 (a+b)*aa(a+b)*
Regular Expressions
 Write RE for the following languages over
the  ={a,b}.
 All words ending with b
 All words that start with a
 All words that start with a double letter
 All words that contain at least one double letter
 All words that start and end with a double letter
 All words of length >=3
 All words that contain exactly one a or exactly
one b
 All words that don’t end at b
Regular Expressions
 Example: Give a regular expression for each of the
following over the alphabet { 0, 1 }:

 { w | w begins with a ‘1’ and ends with a ‘0’ }

 { w | w contains exactly three 1’s}

 { w | w contains at least three 1’s}

 { w | w is a string that begin with a ‘1’ and contain


exactly two 0’s }

 Regular expression definition of a language is not unique.

17
[Section 1.3]
Regular expressions

Examples: give regular expressions for the


following languages:
- { w ε {0,1}* | w contains the substring 01 }
or
- { w ε ∑* | w contains the substring 01 }
- {w in {0,1}* | second symbol of w is a 1}
- { w ε {0,1}* | |w| < 4 }
Some exercises on regular expressions

 Example: What is L((a  b)*a(a  b)*)?


 Ans: {w in {a, b}* | w contains at least one
a}
 Write regular expressions for:
1. {w in {a,b}* | |w| is odd }.
2. {w in {a,b}* | w does not have ab as a substring}.
3. {w in {a,b,c}* | no b in w can come before any c in w}.

19
Regular Expression
Identities
Regular Expression Identities
1. u = u = u
2. * = 
3. u+v=v+u
4. u+u=u
5. u* = (u*)*
6. u (v + w) = uv + uw
7. (u + v) w = uw + vw

20
Languages Associated
with Regular
Expressions
Definition

 The following rules define the language associated


with any regular expression:

 Rule 1: The language associated with the regular


expression that is just a single letter is that one-letter
word alone, and the language associated with Λ is just
{Λ}, a one-word language.

 Rule 2: If r1 is a regular expression associated with the


language L1 and r2 is a regular expression associated
with the language L2, then:
(i) The regular expression (r1)(r2) is associated with the product
L1L2, that is the language L1 times the language L2:

language(r1r2) = L1L2
22
Definition contd.
 Rule 2 (cont.):

(ii) The regular expression r1 + r2 is associated


with the language formed by the union of L1
and L2:
language(r1 + r2) = L1 + L2

(iii) The language associated with the regular


expression (r1)* is L1*, the Kleene closure of
the set L1 as a set of words:
23 language(r1*) = L1*
Languages associated with REs
 r1= a, r2 = b, r3 = Λ
 IfL1 is associated with r1 and L2 is
associated r2
 Language(r1r2)= L1L2
 Language(r1+r2) = L1+L2 = L1 U L2
 Language(r1*) = L1* (Kleen’s Closure of L1)
Regular Languages
 How to tell whether a language is regular
 Define an RE for it, if it is possible the language
is Regular other wise non-regular
 Definition
 Any language that can be represented by a
regular expression is called a regular
language
 It is to be noted that if r1, r2 are regular
expressions, corresponding to the languages L1
and L2 then the languages generated by r1+
r2, r1r2( or r2r1) and r1*( or r2*) are also
regular languages.
Regular Languages
 Example
 Consider the language L, defined over
Σ = {a,b}, of strings of length 2,
starting with a, then
 L = {aa, ab}, may be expressed by
the regular expression aa+ab. Hence
L, by definition, is a regular language.
Regular Languages
 All finite languages are regular
 Example
 Consider the language L, defined over Σ
= {a,b}, of strings of length 2, starting
with a, then L = {aa, ab}, may be
expressed by the regular expression
aa+ab. Hence L, by definition, is a
regular language.
Theorem

 If L is a finite language (a language with only finitely many


words), then L can be defined by a regular expression. In
other words, all finite languages are regular.

Proof

 Let L be a finite language. To make one regular expression that


defines L, we turn all the words in L into boldface type and insert
plus signs between them.

 For example, the regular expression that defines the language


L = {baa, abbba, bababa} is (baa + abbba + bababa)

 This algorithm only works for finite languages because an infinite


language would become a regular expression that is infinitely long,
which is forbidden.
28
Equivalent Regular Expressions
 Definition
 Two regular expressions are said to be
equivalent if they generate the same
language.
 Example
 Consider the following regular
expressions
 r1 = (a + b)* (aa + bb)
 r2 = (a + b)*aa + ( a + b)*bb then both
regular expressions define the language
of strings ending in aa or bb
Example

 The language of all words that have at least two a’s can
be defined by the expression:
(a + b)*a(a + b)*a(a + b)*

 Another expression that defines all the words with at


least two a’s is
b*ab*a(a + b)*

 Hence, we can write


(a + b)*a(a + b)*a(a + b)* = b*ab*a(a + b)*

where by the equal sign we mean that these two


expressions are equivalent in the sense that they
30
describe the same language.
Example

 Let V be the language of all strings of a’s and b’s in


which either the strings are all b’s, or else an a followed
by some b’s. Let V also contain the word Λ. Hence,
V = {Λ, a, b, ab, bb, abb, bbb, abbb, bbbb, …}
 We can define V by the expression
b* + ab*
where Λ is included in b*.
 Alternatively, we could define V by
(Λ + a)b*
which means that in front of the string of some b’s, we
have
either an a or nothing.
31

You might also like