CMP3008 LN1 CourseOverview Introduction

Download as pdf or txt
Download as pdf or txt
You are on page 1of 49

CMP3008

Formal Languages
and Automata Theory
Lecture Notes 1
Course Overview & Introduction
Sources
https://eecs.wsu.edu/~ananth/CptS317/Lectures/index.htm
"Introduction to automata theory, languages and
computation" by JE Hopcroft, R Motwani and JD Ullman.
"An Introduction to Formal Languages and Automata Theory" by
Peter Linz
Course overview
Contact Info
Assoc. Prof. C. Okan Şakar
Contact info
okan.sakar@bau.edu.tr
Office: 3rd floor of building D.

3
Course Textbook
• Texbook
Michael Sipser, “Introduction to the Theory of
Computation” (3rd edition), 2012.
• Recommended Book
Peter Linz, “An introduction to formal languages and
automata” (5th edition), 2011.

4
Course Info
• 3 hours a week (theory)
• All material (lecture notes, announcements, results, etc) will be
uploaded to itslearning!
https://buei.itslearning.com/index.aspx

• Check it frequently!

5
Course Learning Outcomes
At the end of the course, you will be able to:

1. Be able to identify classes of formal languages, computational models and their relationships.
2. Be able design regular expressions and finite state automata.
3. Be able to convert Nondeterministic Finite Automata to Deterministic Finite Automata.
4. Be able to convert regular expressions to Nondeterministic Finite Automata.
5. Be able to design grammars and push down automata
6. Be able to design Turing machines
7. Be able prove theorems in automata theory.
8. Become familiar with undecidable and NP-complete problems.

6
Grading Policy
Weight
Assignment Description Scoring
(%)
Each quiz will cover the topics of that week and previous
weeks. There will be approximately 4 quizzes. All the quizzes will
be given during the class time using Itslearning.
The highest 3 grades will be considered for the quiz
Quizzes average. There will not be an announcement for the 100 12
exact starting time announcement for the quizzes.
It is your responsibility to follow the starting time of
the quizzes from itslearning and be ready to attend from a
mobile device or computer.
You will have a midterm exam covering the first seven
Midterm Exam weeks. The midterm exam will be given in Week#7. 100 38
The exact date will be announced later.

Final Exam You will have a final exam covering all chapters.
100 50

TOTAL 100 100

7
Weekly Schedule
Assignments &
Week/Place Course Topic To Do
Deadline
W1 Introduction, Strings, and Languages. • Read the Syllabus Read chapter 0 of the textbook
W2 Finite Automata • Read section 1.1 of the textbook
W3 Nondeterminism • Read section 1.2 of the textbook
W4 Regular Expressions • Read section 1.3 of the textbook Quiz #1
W5 Nonregular Languages • Read section 1.4 of the textbook
W6 Context-Free Grammars • Read section 2.1 of the textbook Quiz #2
W7 Midterm Exam • Study the first 6 weeks’ material Midterm Exam
W8 Pushdown Automata • Read section 2.2 of the textbook
W9 Turing Machines • Read section 3.1 of the textbook
W10 Variants of Turing Machines • Read section 3.2 of the textbook Quiz #3
W11 Time Complexity • Read sections 7.1, 7.2, 7.3, 7.4 of the textbook
W12 Heuristic and Approximation Algoritms • Readings will be provided
W13 Additional NP-Complete Problems • Read section 7.5 of the textbook Quiz #4
W14 Review for the final exam. • Study the all the course topics

8
Introduction
Course Organization
Very broadly, the course will contain following parts:
• Computability
• What is computable or not?
• Complexity Theory
• What is easy (practical) to compute or not?

10
Automata, computability, and complexity
• Computability theory
• Solvable/computable problems
• Unsolvable/Uncomputable Problems
• E.g., Halting problem, PCP (Post Correspondence Problem)
• Complexity theory
• Easy Problems
• E.g., searching, sorting, shortest path, matrix multiplication.
• Hard Problems
• E.g., traveling salesman problem, k-clique, vertex cover, subset-sum.
• Automata theory
• The theories of computability and complexity require a precise definition of a
computer.
• E.g., finite automata, pushdown automata, Turing machines
Traveling Salesman Problem

Source: https://optimization.mccormick.northwestern.edu/index.php/Traveling_salesman_problems
What is Automata Theory?
• Study of abstract computing devices, or “machines”
• Automaton = an abstract computing device
• Note: A “device” need not even be a physical hardware!
• A fundamental question in computer science:
• Find out what different models of machines can do and cannot do
• The theory of computation
• Computability vs. Complexity

13
Alan Turing (1912-1954)
(A pioneer of automata theory)

• Father of Modern Computer Science


• English mathematician
• Studied abstract machines called
Turing machines even before
computers existed
• Heard of the Turing test?
"Can machines think?"

14
Theory of Computation: A Historical
Perspective
1930s • Alan Turing studies Turing machines
• Decidability
• Halting problem
1940-1950s • “Finite automata” machines studied
• Noam Chomsky proposes the
“Chomsky Hierarchy” for formal
languages
1969 Cook introduces “intractable” problems
or “NP-Hard” problems
1970- Modern computer science: compilers,
computational & complexity theory evolve 15
Halting Problem
Which one of the following programs halts (stops) which one does not?
Deals whether it is
int main(){ possible to create
while (1) an algorithm that
printf(“something”); can determine, for
any given input and
} program, whether
that program will
int main(){ eventually halt
printf(“something”); (terminate) or
continue to run
} indefinitely.
"Halting Problem Undecidability Proof" 
Theory of Computation

Basic Concepts
• Automaton: a formal construct that accepts input, produces
output, may have some temporary storage, and can make
decisions
• Formal Language: a set of sentences formed from a set of symbols
according to formal rules
• Grammar: a set of rules for generating the sentences in a formal
language
Review of Mathematical Preliminaries
• Sets: basic notation, operations (union, intersection,
difference, and complementation), disjoint sets, power set,
partitions.
• Functions and Relations: domain, range, total function,
partial function, order of magnitude, equivalence relations.
• Graphs and Trees: vertices, edges, walk, path, simple path,
cycle, loop, root vertex, parent, child, leaves, level, height.
• Proof Techniques: proof by induction and proof by
contradiction.
Formal Languages
Basic Concepts
• Alphabet: set of symbols, i.e. Σ = {a, b}
• String: finite sequence of symbols from Σ, such as v = aba
and w = abaaa
• Empty string (λ (lambda))
• Substring, prefix, suffix
• Operations on strings:
• Concatenation: vw = abaabaaa
• Reverse: wR = aaaba
• Repetition: v2 = abaaba and v0 = λ
• Length of a string: |v| = 3 and |λ| = 0
Strings and Languages
• An alphabet is any nonempty finite set.
• The members of the alphabet are the symbols of the alphabet.
• We generally use capital Greek letters Σ (sigma) to designate
alphabets.
• The following are a few examples of alphabets.
Strings and Languages
• A string over an alphabet is a finite sequence of symbols from that alphabet.
• If Σ1 = {0, 1}, then 01001 is a string over Σ1.
• If Σ2 = {a, b, c, ..., z}, then abracadabra is a string over Σ2.
• If w is a string over Σ, the length of w, written |w|, is the number of symbols
that it contains.
• The string of length zero is called the empty string and is written ε (epsilon).
• If w has length n, we can write w = w1w2 ··· wn where each wi ∈ Σ.
• The reverse of w, written wR, is the string obtained by writing w in the
opposite order (i.e., wnwn−1 · · · w1).
• String z is a substring of w if z appears consecutively within w. For example,
cad is a substring of abracadabra.
Strings and Languages
• If we have string x of length m and string y of length n, the
concatenation of x and y, written xy, is the string obtained by
appending y to the end of x, as in x1 · · · xmy1 · · · yn.
• To concatenate a string with itself many times, we use the
superscript notation xk to mean
Strings and Languages
• The lexicographic order of strings is the same as the familiar
dictionary order.
• We’ll occasionally use a modified lexicographic order, called shortlex
order or simply string order, that is identical to lexicographic order,
except that shorter strings precede longer strings. Thus the string
ordering of all strings over the alphabet {0,1} is

(ε, 0, 1, 00, 01, 10, 11, 000, ...).


Strings and Languages
• Say that string x is a prefix of string y if a string z exists where xz = y,
and that x is a proper prefix of y if in addition x ≠ y.
• A language is a set of strings.
• A language is prefix-free if no member is a proper prefix of another
member.
Formal Languages
Definitions
• Σ* = set of all strings formed by concatenating zero or
more symbols in Σ
• Σ+ = set of all non-empty strings formed by
concatenating symbols in Σ
In other words, Σ+ = Σ* - { λ }
•A formal language is any subset of Σ*
Examples: L1 = { anbn: n ≥ 0 } and L2 = { ab, aa }
• A string in a language is also called a sentence of the
language
Formal Languages
Set Operations
• A language is a set. Therefore, set operations are defined as usual.
• If L1 = { anbn: n ≥ 0 } and L2 = { ab, aa }
• Union: L1 ᴜ L2 = { aa, λ, ab, aabb, aaabbb, … }
• Intersection: L1 ∩ L2 = { ab }
• Difference: L1 - L2 = { λ, aabb, aaabbb, … }
• Complement: L2 = Σ* - { ab, aa }
• Practice: Find L2 – L1
Formal Languages
Other Operations
• New languages can be produced by reversing all strings
in a language, concatenating strings from two languages,
and concatenating strings from the same language.
• If L1 = { anbn: n ≥ 0 } and L2 = { ab, aa }
• Reverse: L 2R = { ba, aa }
• Concatenation: L1L2 = { ab, aa, abab, abaa, aabbab, aabbaa, … }
The concatenation L2L2 or L2 = { abab, abaa, aaab, aaaa }
• Star-Closure: L2* = L2 0 ᴜ L21 ᴜ2L22 ᴜ L23 ᴜ …
• Positive Closure: L2 + = L21 ᴜ L22 ᴜ L23 ᴜ …

• Practice: Find (L2 – L1)R


Grammars
Definition
• Precise mechanism to describe the strings in a
language
• Def. 1.1: A grammar G consists of:
V: a finite set of variable or non-terminal symbols
T: a finite set of terminal symbols
S: a variable called the start symbol
P: a set of productions
• Example 1.11:
V={S}
T = { a, b }
P = { S → aSb, S → λ }
Relation Between Languages & Grammars
• Languages: “A language is a collection of
Or “words” sentences of finite length all constructed
from a finite alphabet of symbols”
• Grammars: “A grammar can be regarded as
a device that enumerates the sentences of a
language” - nothing more, nothing less

• N. Chomsky, Information and Control, Vol 2,


1959

Image source: Nowak et al. Nature, vol 417, 2002 29


Grammars
Derivation of Strings
• Beginning with the start symbol, strings are derived
by repeatedly replacing variable symbols with the
expression on the right-hand side of any applicable
production
• Any applicable production can be used, in arbitrary
order, until the string contains no variable symbols.
• Sample derivation using grammar in Example 1.11:
S  aSb (applying first production)
 aaSbb (applying first production)
 aabb (applying second production)
The Language Generated by a Grammar

• Def. 1.2: For a given grammar G, the language


generated by G, L(G), is the set of all strings
derived from the start symbol
• To show a language L is generated by G:
• Show every string in L can be generated by G
• Show every string generated by L is in G
• A given language can normally be generated by
different grammars
Equivalence of Grammars
• Two grammars are equivalent if they
generate the same language
• For convenience, productions with the same
left-hand sides are written on the same line
• Example 1.14:
V = { A, S }, T = { a, b }, and productions
S → aAb | λ
A → aAb | λ
• The grammars in examples 1.11 and 1.14 can
be shown to be equivalent
Automata
• An automaton is an abstract model of a digital
computer
• An automaton consists of
• An input mechanism
• A control unit
• Possibly, a storage mechanism
• Possibly, an output mechanism
• Control unit can be in any number of internal
states, as determined by a next-state or transition
function.
Diagram of a General Automaton
Application
Grammars for Programming Languages
• The syntax of constructs in a programming
language is commonly described with grammars
• Assume that in a hypothetical programming
language,
• Identifiers consist of digits and the letters a, b, or c
• Identifiers must begin with a letter
• Productions for a sample grammar:
<id> → <letter> <rest>
<rest> → <letter> <rest> | <digit> <rest> | λ
<letter> → a | b | c
<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Finite Automata
• Some Applications
• Software for designing and checking the behavior of
digital circuits
• Lexical analyzer of a typical compiler
• Software for scanning large bodies of text (e.g., web
pages) for pattern finding
• Software for verifying systems of all types that have a
finite number of states (e.g., stock market transaction,
communication/network protocol)

36
Finite Automata : Examples
• On/Off switch action

state

• Modeling recognition of the word “then”

Start state Transition Intermediate Final state


state

37
Structural expressions
• Grammars
• Regular expressions
• E.g., unix style to capture city names such as “Palo Alto CA”:
[A-Z][a-z]*([ ][A-Z][a-z]*)*[ ][A-Z][A-Z]

Start with a letter

Should end w/ 2-letter state code


A string of other
letters (possibly
empty)

Other space delimited words


38
(part of city name)
Formal Proofs

39
Quantifiers
“For all” or “For every”
• Universal proofs
• Notation=
“There exists”
• Used in existential proofs
• Notation=
Implication is denoted by =>
• E.g., “IF A THEN B” can also be written as “A=>B”

40
Proving techniques
• By contradiction
• Start with the statement contradictory to the given statement
• E.g., To prove (A => B), we start with:
• (A and ~B) and then show that could never happen
• By induction
• (3 steps) Basis, inductive hypothesis, inductive step
• By Deduction
• Involves deriving a conclusion based on logical implications and previously
established facts or assumptions.

41
Proving techniques…
• By counter-example
• Show an example that disproves the claim

• Note: There is no such thing called a “proof by example”!


• So when asked to prove a claim, an example that satisfied that claim is not a
proof

42
Proof By Induction Example:
Relationship Between Number of Nodes
(Internal - External)
• A full binary tree with N internal nodes has N+1 external nodes

• Proof by induction:
• Base step: when n=0 (i.e., there is no internal nodes), there is only one node in
the tree which is the root. I.e., the number of external nodes is 1. Proved for the
base case.
• Inductive step: A tree with n internal nodes, has a root, a left subtree (with m
internal nodes) and a right subtree (with n-m-1 internal nodes).
• Inductive hypothesis says left subtree has m+1 external nodes, and right subtree
has (n-m-1)+1 external nodes
• This implies that the overall tree has m+1 +n-m-1 +1 =n+1 external nodes. Proved

43
Deductive Proofs
From the given statement(s) to a conclusion
statement (what we want to prove)
• Logical progression by direct implications

Example for parsing a statement:


• “If y≥4, then 2y≥y2.”
given conclusion

(there are other ways of writing this).

44
Example: Deductive proof
Let Claim 1: If y≥4, then 2y≥y2.

Let x be any number which is obtained by adding the squares of 4 positive integers.
Claim 2:
Given x and assuming that Claim 1 is true, prove that 2x≥x2
• Proof:
1) Given: x = a2 + b2 + c2 + d2
2) Given: a≥1, b≥1, c≥1, d≥1
3) ➔ a2≥1, b2≥1, c2≥1, d2≥1 (by 2)
4) ➔ x ≥ 4 (by 1 & 3)
5) ➔ 2x ≥ x2 (by 4 and Claim 1)
“implies” or “follows”
45
Different ways of saying the same thing
“If H then C”:
i. H implies C
ii. H => C
iii. C if H
iv. H only if C
v. Whenever H holds, C follows

46
“If-and-Only-If” statements
• “A if and only if B” (A <==> B)
• (if part) if B then A ( <= )
• (only if part) A only if B ( => )
(same as “if A then B”)
• “If and only if” is abbreviated as “iff”
• i.e., “A iff B”
• Example:
• Theorem: Let x be a real number.
Then floor of x = ceiling of x if and only if x is an integer.
• Proofs for iff have two parts
• One for the “if part” & another for the “only if part”

47
Proof Example with Proof By Contradiction
• A rational number is a number that can be expressed as the ratio of two integers
n and m so that n and m have no common factor.
• A real number that is not rational is said to be irrational. Show that 2 is
irrational.
Proof Example with Proof By Contradiction
• With proof by contradiction, show that n is an even number if its square is even.

You might also like