Minimize The Number of States in A DFA - Algorithm (3.6, Page 142)

• Minimize the number of states in a DFA
• Algorithm (3.6, page 142):

– Input: a DFA M
– Output: a minimum-state DFA M’
• If some states in M have no transition on some inputs, add transitions
to a “dead” state.
• Let P = {all accepting states, all nonaccepting states}
• Loop: let P’ = {}; for each group G in P do
Partition G into subgroups so that s and t (in G) belong to the same
subgroup if and only if every input a moves s and t to states in the
same group of P
put the new subgroups in P’
if (P != P’) {P = P’; goto loop}
• Build M’ with one state per group of the final P, then remove any dead
states and unreachable states.
– Example: minimize the DFA for (ab|ba)a*
– Example: minimize the DFA for Fig 3.29 (page 121)
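A minimal Python sketch of the partition-refinement loop above, applied to a hand-built DFA for (ab|ba)a*. The state names (s0 to s5, dead), the transition table, and the function name minimize are assumptions made for illustration; they are not taken from the textbook. The sketch only computes the final groups and does not build M’ or remove the dead state.

```python
def minimize(states, alphabet, delta, accepting):
    """Return the final partition of `states` into equivalent groups.

    `delta` maps (state, symbol) -> state and is assumed to be total,
    i.e. the dead state has already been added.
    """
    accepting = set(accepting)
    partition = [g for g in (accepting, set(states) - accepting) if g]
    while True:
        def group_of(state):
            # Index of the group of the current partition containing `state`.
            return next(i for i, g in enumerate(partition) if state in g)

        new_partition = []
        for group in partition:
            buckets = {}
            for s in sorted(group):
                # Two states stay together iff every input leads them
                # into the same group of the current partition.
                signature = tuple(group_of(delta[(s, a)]) for a in alphabet)
                buckets.setdefault(signature, set()).add(s)
            new_partition.extend(buckets.values())
        # Refinement only splits groups, so an equal count means P == P'.
        if len(new_partition) == len(partition):
            return [frozenset(g) for g in new_partition]
        partition = new_partition


# Hypothetical, non-minimal DFA for (ab|ba)a*; s3, s4, s5 are accepting.
STATES = ["s0", "s1", "s2", "s3", "s4", "s5", "dead"]
ALPHABET = ["a", "b"]
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "s2",
    ("s1", "a"): "dead", ("s1", "b"): "s3",
    ("s2", "a"): "s4", ("s2", "b"): "dead",
    ("s3", "a"): "s5", ("s3", "b"): "dead",
    ("s4", "a"): "s5", ("s4", "b"): "dead",
    ("s5", "a"): "s5", ("s5", "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
GROUPS = minimize(STATES, ALPHABET, DELTA, {"s3", "s4", "s5"})
```

For this DFA the three accepting states collapse into one group, so the minimized automaton has four states once the dead state is removed.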
– Question: how can we implement Lex for a specification such as this one?
%%
BEGIN {return(BEGINNUMBER);}
END {return(ENDNUMBER);}
IF {return(IFNUMBER);}
%%
– Lex internals:
• Construct an NFA that recognizes the union of all patterns.
• Convert the NFA to a DFA (record, for each accepting DFA state, which
individual patterns it accepts).
• Minimize the DFA (in the initial partition, keep accepting states for
different patterns in separate groups).
• Simulate the DFA until it terminates (that is, no further transition is
possible).
• Find the last DFA state entered that holds an accepting NFA state
(this picks the longest match). If there is no such state, the input is
an invalid token.
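The last two steps can be sketched in Python as follows. To stay short, the sketch skips the NFA construction and minimization: because the three patterns are literal strings, it builds the DFA directly as a trie. The function names (build_trie_dfa, next_token, scan) and the use of token names as strings are assumptions made for illustration, not how Lex itself is written.

```python
def build_trie_dfa(rules):
    """Build a DFA for literal patterns; state 0 is the start state."""
    delta, accept, next_state = {}, {}, 1
    for text, token in rules:
        state = 0
        for ch in text:
            if (state, ch) not in delta:
                delta[(state, ch)] = next_state
                next_state += 1
            state = delta[(state, ch)]
        accept.setdefault(state, token)  # on a tie the earlier rule wins
    return delta, accept


def next_token(delta, accept, text, pos):
    """Run the DFA from `pos` until no transition applies.

    Returns (token, end) for the last accepting state entered,
    which corresponds to the longest match.
    """
    state, last, i = 0, None, pos
    while i < len(text) and (state, text[i]) in delta:
        state = delta[(state, text[i])]
        i += 1
        if state in accept:
            last = (accept[state], i)
    if last is None:
        raise ValueError(f"invalid token at position {pos}")
    return last


def scan(delta, accept, text):
    """Split `text` into tokens by repeated longest-match scanning."""
    tokens, pos = [], 0
    while pos < len(text):
        token, pos = next_token(delta, accept, text, pos)
        tokens.append(token)
    return tokens


RULES = [("BEGIN", "BEGINNUMBER"), ("END", "ENDNUMBER"), ("IF", "IFNUMBER")]
DELTA, ACCEPT = build_trie_dfa(RULES)
TOKENS = scan(DELTA, ACCEPT, "BEGINIFEND")
```

Remembering the last accepting position, rather than stopping at the first accepting state, is what gives the longest match when one pattern is a prefix of another.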
Chapter 4: Syntax analysis
• Syntax analysis is done by the parser.
– Detects and reports any syntax errors.
– Produces a parse tree from which intermediate code can be
generated.
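[Figure: the lexical analyzer reads the source program and returns a token to the parser on each request for a token; the parser produces a parse tree for the rest of the front end, which produces intermediate code. The lexical analyzer and the parser both use the symbol table.]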
• The syntax of a programming language is
described by a context-free grammar, usually
written in Backus-Naur Form (BNF).
– A grammar gives a precise syntactic specification of a
language.
– For some classes of grammars, tools exist that can
automatically construct an efficient parser. These tools
can also detect syntactic ambiguities and other
problems automatically.
– A compiler based on a grammatical description of a
language is more easily maintained and updated.
• A grammar G = (N, T, P, S)
– N is a finite set of non-terminal symbols
– T is a finite set of terminal symbols
– P is a finite subset of (N ∪ T)* N (N ∪ T)* × (N ∪ T)*
• An element (α, β) ∈ P is written as α -> β
– S is a distinguished symbol in N and is called
the start symbol.
• Language defined by a grammar
– We say “aAb derives awb in one step”, denoted as “aAb=>awb”,
if A->w is a production and a and b are arbitrary strings of terminal
or nonterminal symbols.
– We say a1 derives am if a1=>a2=>…=>am, written as a1=>*am
– The language L(G) defined by G is the set of strings of
terminals w such that S=>*w.
• Example:
A->aA
A->bA
A->a
A->b
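A small Python sketch of the two definitions above, using this example grammar. It assumes every grammar symbol is a single character and that no production shortens a sentential form (true here, since there are no empty right-hand sides). The names one_step and derives are made up for this illustration.

```python
from collections import deque

# The example grammar: A->aA, A->bA, A->a, A->b
PRODUCTIONS = [("A", "aA"), ("A", "bA"), ("A", "a"), ("A", "b")]


def one_step(form, productions=PRODUCTIONS):
    """All sentential forms obtained from `form` by one production step."""
    results = []
    for lhs, rhs in productions:
        at = form.find(lhs)
        while at != -1:
            results.append(form[:at] + rhs + form[at + len(lhs):])
            at = form.find(lhs, at + 1)
    return results


def derives(target, start="A", productions=PRODUCTIONS):
    """Breadth-first search for start =>* target.

    Forms longer than `target` are pruned, which is only sound
    because no production shrinks a form.
    """
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for nxt in one_step(form, productions):
            if len(nxt) <= len(target) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

With this grammar, derives("abba") holds through A=>aA=>abA=>abbA=>abba, while the empty string is not in L(G).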
• Chomsky Hierarchy (classification of grammars)
• A grammar is said to be
– regular if it is
• right-linear, where each production in P has the form
A -> wB or A -> w. Here, A and B are nonterminals and w is a terminal
• or left-linear, where each production in P has the form
A -> Bw or A -> w
– context-free if each production in P is of the form
A -> α, where A ∈ N and α ∈ (N ∪ T)*
– context-sensitive if each production in P is of the
form α -> β, where |α| ≤ |β|
– unrestricted if each production in P is of the form
α -> β, where α ≠ ε
• Context-free grammars are sufficient to describe
the syntax of most programming languages.
• Example: a grammar for arithmetic expressions.
<expr> -> <expr> <op> <expr>
<expr> -> ( <expr> )
<expr> -> - <expr>
<expr> -> id
<op> -> + | - | * | /
– derive -(id) from the grammar:
<expr> => -<expr> => -(<expr>) => -(id)
– sentence: a string of terminals that can be derived from S
– sentential form: a string of terminals and/or nonterminals that can be
derived from S.
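– derive id + id * id from the grammar (writing E for <expr>):
E=>E+E=>E+E*E=>E+E*id=>E+id*id=>id+id*id
– leftmost/rightmost derivation -- each step replaces the
leftmost/rightmost nonterminal. The derivation above is rightmost; a
leftmost derivation of the same string is:
E=>E+E=>id+E=>id+E*E=>id+id*E=>id+id*id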
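The difference between the two derivation orders can be replayed mechanically. The Python sketch below assumes the simplified productions E->E+E, E->E*E and E->id that the derivations on this slide use; the helper names rewrite and derivation are hypothetical.

```python
NONTERMINALS = {"E"}


def rewrite(form, rhs, leftmost=True):
    """Replace the leftmost (or rightmost) nonterminal in `form` by `rhs`."""
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        raise ValueError("no nonterminal left to rewrite")
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + rhs + form[i + 1:]


def derivation(choices, leftmost=True):
    """Apply the chosen right-hand sides in order and record every form."""
    form, steps = ["E"], ["E"]
    for rhs in choices:
        form = rewrite(form, rhs, leftmost)
        steps.append("".join(form))
    return "=>".join(steps)


# Right-hand sides of the assumed productions for E.
PLUS, TIMES, ID = ["E", "+", "E"], ["E", "*", "E"], ["id"]

LEFTMOST = derivation([PLUS, ID, TIMES, ID, ID], leftmost=True)
RIGHTMOST = derivation([PLUS, TIMES, ID, ID, ID], leftmost=False)
```

Both sequences end in id+id*id; only the order in which the nonterminals are expanded differs.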
– Parse tree:
• A parse tree pictorially shows how the start symbol of a grammar
derives a specific string in the language. Given a context-free
grammar, a parse tree has the following properties:
– The root is labeled by the start symbol
– Each leaf is labeled by a token or the empty string
– Each interior node is labeled by a nonterminal
– If A is the nonterminal labeling some interior node and X1, X2, ..., Xn are
the labels of the children of that node from left to right, then
A->X1 X2 ... Xn is a production of the grammar.
– The leaves of the parse tree, read from left to right, are called the “yield”
of the parse tree. The yield is the string derived from the
nonterminal at the root of the parse tree.
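As a small illustration of the yield, the Python sketch below represents a parse tree as nested tuples. This representation (a leaf is a string, an interior node is a pair of a nonterminal label and a list of children) and the name tree_yield are assumptions chosen for the example.

```python
def tree_yield(node):
    """Return the leaves of a parse tree from left to right.

    A leaf is a string (a token, or "" for the empty string);
    an interior node is (nonterminal_label, [children]).
    """
    if isinstance(node, str):
        return [node] if node else []  # the empty string adds nothing
    _label, children = node
    leaves = []
    for child in children:
        leaves.extend(tree_yield(child))
    return leaves


# Parse tree for -(id) under <expr> -> - <expr> | ( <expr> ) | id
TREE = ("expr", ["-", ("expr", ["(", ("expr", ["id"]), ")"])])
YIELD = "".join(tree_yield(TREE))
```

Here YIELD is the string -(id), the same string that the derivation from <expr> produces.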
– An ambiguous grammar is one that can generate two or more
parse trees that yield the same string
– Example:
string -> string + string
string -> string - string
string -> 0|1|2|3|4|5|6|7|8|9
string=>string + string=>string - string + string=>*9 - 5 + 2 (grouped as (9 - 5) + 2)
string=>string - string=>string - string + string=>*9 - 5 + 2 (grouped as 9 - (5 + 2))
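One way to check the ambiguity is to count the parse trees of a token sequence. The Python sketch below does this for the grammar above by trying every + or - as the operator at the root; the function name count_parse_trees and the list-of-tokens input format are assumptions for this illustration.

```python
from functools import lru_cache

DIGITS = set("0123456789")
OPERATORS = {"+", "-"}


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under
    string -> string + string | string - string | 0|1|...|9
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees deriving tokens[i:j] from `string`.
        if j - i == 1:
            return 1 if tokens[i] in DIGITS else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:
                # tokens[k] is the operator at the root of the tree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


TREES = count_parse_trees(["9", "-", "5", "+", "2"])
```

For 9 - 5 + 2 the count is 2, one tree per grouping, and the two groupings evaluate to different values (6 and 2), which is why the ambiguity matters.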
