Cits2211lectures Grammars PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 28

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

CITS2211 Discrete Structures


Lectures for Semester 2 2012
Part 6: Grammars
Unit coordinator: Mark Reynolds
Notes: Rachel Cardell-Oliver

October 6, 2014

Context Free or Sensitive

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Recall that a language is simply a set of words.


That is language L is subset of the set I where I is the set of all
strings over an alphabet I .

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

For example, English, German and Java are all languages some
strings are legal , and hence in the language, while some strings
are not in the language.
The walrus talks loudly
Loudly walrus talks the
for (int i=0; i<10; i++) { j=j+i; }
for System.out.(int i)<10; ++ }
I have arrived
Ich bin angekommen (literally, I am in come-d)

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

We would like to be able to distinguish between legal strings and


the others.
For Java, and computer languages in general, this is crucial
because a compiler needs to be able to reject illegal constructs.
For English, the problem is much harder, but the pay-off for
accurate natural language processing would be immense.
So far we have seen regular expressions as a way of defining a set
of strings and finite state machines and Turing Machines that are
able to recognise or generate languages.

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Grammars (informal)
A grammar is a set of rules that describe how to form legal strings
in the language.
For English we have the following loose rule
sentence noun-phrase verb-phrase
which we interpret as saying
A valid sentence consists of a noun-phrase followed by a
verb-phrase
To complete the grammar, we then need to define noun-phrase,
verb-phrase and so on, which are defined in the same way
noun-phrase article noun
verb-phrase verb adverb

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Terminals

To actually get some specific sentences from the grammar, we


need some terminating symbols, or symbols which cannot be
substituted any further.
article the
noun walrus
noun banana
verb talks
adverb loudly
adverb quickly

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

This grammar allows us to produce a range of syntactically legal


sentences in English.
the banana talks quickly
the walrus talks loudly
Although we do not know how to interpret these sentences
semantically, we can all recognise it as a correctly-formed sentence.

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Formal Grammars

A phrase-structure grammar or Type 0 grammar G is a 4-tuple


G = (V , VT , S, P) where
1

V is some alphabet

VT is a non-empty subset of V called the terminals

S is an element of V VT called the start symbol

P is a finite set of productions of the form where is


a word over V containing a non-terminal symbol and is a
word over V

Introduction

Formal Grammar

Languages from Grammars

Example
Suppose that
V = {I , L, D, a, b, . . . , z, 0, 1, . . . 9}
VT = {a, b, . . . , z, 0, 1, . . . 9}
S =I
and the productions are as follows
I L
I IL
I ID
L a...L z
D 0...D 9
Example words: a42, doctor , r 2d2

Chomsky hierarchy

Context Free or Sensitive

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Derivations in a grammar
Let G be a grammar with alphabet V , and let w1 and w2 be two
words over V .
Then we say that w1 directly generates w2 if there is a production

such that
1 w contains an instance of
1
2 w is obtained from w by replacing this instance of with
2
1
If w1 directly generates w2 , then we denote this by
w1 w2
If there is a sequence,
w1 w2 w3 wn
then we say that

w1 wn

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Example derivations

We can see that


I ID ILD IaD LaD zaD za0
and so we can conclude that

I za0
As the word za0 consists entirely of terminals, there are no further
productions that can be applied to it.
We say that za0 is one of the words in the language generated by
the grammar.

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Language generated by a grammar

Formally, the language generated by a grammar is the collection of


words that can be generated starting with the start symbol, and
applying productions in all possible ways until only terminals
remain.
Symbolically,

L(G ) = {w VT | S w }
We now have two different approaches to formal languages:
Recognition is the act of determining whether or not a word is
contained in a given language.
Generation is the process of applying rules to generate words
guaranteed to be contained in a language.

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

A more complicated grammar


1

The alphabet is
V = {0, A, B, E , F , W , X , Y , Z , S}

The terminals are


VT = {0}

The production rules are


S FA
0X X 0
S FBA
Y 0 0Y
FB F 0EB0 FX F 0W
EB 0
YA Z 0A
EB XBY
W 0 0W

0Z Z 0
WBZ EB
F
A

This grammar generates the language consisting of all strings


containing an odd number of zeros. (Try it!)

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

The productions that generate the empty string are called


erasing productions.
A grammar is said to obey the erasing convention if either it has
no erasing productions, or the only erasing production is S .

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Types of Grammar

Context-Sensitive (Type 1)
A grammar is context-sensitive if it obeys the erasing convention
and for every non-erasing production , the word is at least
as long as .
Almost any language one can think of is context-sensitive
Example: S aSBc | abc, cB Bc, bB bb

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Types of Grammar

Context-Sensitive (Type 1)
A grammar is context-sensitive if it obeys the erasing convention
and for every non-erasing production , the word is at least
as long as .
Almost any language one can think of is context-sensitive
Example: S aSBc | abc, cB Bc, bB bb
One derivation is: S aSBc aabcBc aabBcc aabbcc and
more generally, any string an b n c n for n 1

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Context-Free (Type 2)
A grammar is context-free if it obeys the erasing convention and
for every production , is a single non-terminal symbol.
Example: S SS | (S) | () and S aSb, S ab

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Context-Free (Type 2)
A grammar is context-free if it obeys the erasing convention and
for every production , is a single non-terminal symbol.
Example: S SS | (S) | () and S aSb, S ab
One derivation is: S SS SSS (S)SS (())()() That is,
expressions with balanced parentheses. While the second grammar
generates an b n .

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Regular (Type 3)
A grammar is regular if it obeys the erasing convention and for
every production , is a single nonterminal and is of the
form t or tW where t is a single terminal symbol and W is a
non-terminal symbol.
Example: S 0S, S 1S and S

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Regular (Type 3)
A grammar is regular if it obeys the erasing convention and for
every production , is a single nonterminal and is of the
form t or tW where t is a single terminal symbol and W is a
non-terminal symbol.
Example: S 0S, S 1S and S
Some derivations are: 00, 1101, 0010101111, 111111, or more
generally, (0 + 1)

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

The Chomsky hierarchy

The collection of languages generated by Type 0, Type 1, Type 2


and Type 3 grammars forms a hierarchy called the Chomsky
hierarchy.
A natural question to ask is the relationship between the Chomsky
hierarchy and computational devices such as finite state machines
and Turing machines.

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Regular ContextFree ContextSensitive Type0

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Languages and Machines


In fact, there is a completely natural relationship between the
Chomsky hierarchy and types of computing machine that recognise
languages.
Type 0 languages, the most general, are precisely those that can be
recognized by Turing machines, which (by the CT thesis) are the
most general computational devices.
Similarly, regular languages, the most restricted, are precisely those
that can be recognized by finite state machines, which we viewed
as our minimal computational devices.
Are there natural classes of machines that can recognize
context-free and context-sensitive languages?

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Two types of Automata

Context-free languages are precisely the class of languages that can


be recognized by pushdown automata, which are finite state
machines that can only use a stack as a memory device.
Context-sensitive languages are the class of languages that can be
recognized by linear bounded automata which are Turing machines
that can only use the tape containing the original input string.
The most important formal languages for Computer Science, are
probably context-free languages, because many computer
languages are described in such a form that portions of them (if
not all of them) can be described by a context free grammar.

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Pumping Lemma for Context Free Languages

Theorem: For every context-free language L there exists some


integer p 1 such that for any string s L with |s| p there
exist substrings where s = uvxyz such that,
1

|vxy | p,

|vy | 1, and

n : N. uv n xy n z L

As in the FSM version of this theorem, we can use the pumping


lemma to show that a given language is not context free. (see
FSM notes for details)

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

A context-free grammar

A good example of a context-free grammar is given by our earlier


example. The example was introduced with no particular
interpretation, but in fact it generates valid identifiers in a
(hypothetical) programming language. Its purpose is clearer if we
rewrite it in the following form:
identifier letter
identifier identifier letter
identifier identifier digit
letter a . . . letter z
digit 0 . . . digit 9

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Chomsky Normal form

Theorem Any context-free language (without A productions)


is generated by a grammar in which all productions are of the form
A BC or A a where A, B and C are non-terminals and a is
any terminal.
Normal form can be used to make it easier to write context free
grammars. A procedure exists for converting grammars into normal
form (but you are not required to know this procedure for cits2211)

Introduction

Formal Grammar

Languages from Grammars

Chomsky hierarchy

Context Free or Sensitive

Backus-Naur form

Context-free grammars related to computer languages are often


given in a special shorthand notation known as Backus-Naur form.
<identifier> :: = <letter> | <identifer> <letter> |
<identifier> <digit>
<letter> :: = a | b | c | ... | z
<digit> :: = 0 | 1 | ... | 9

In BNF, the non-terminals are identifed by the angle brackets, and


productions with the same left-hand side are combined into a
single statement with the OR symbol.

You might also like