1 Removing Ambiguity From Grammars
1 Removing Ambiguity From Grammars
1. E ⇒ E + E ⇒ E + E ∗ E.
2. E ⇒ E ∗ E ⇒ E + E ∗ E.
E E
E + E E * E
E * E E + E
(b)
(a)
Example 2 says that the grammar in example 1 is ambiguous grammar. There are
two causes of ambiguity in grammar in example 1.
1. The precedence of operators is not respected. While Figure 1(a) properly groups
* before + operator, Figure 1(b) is also a valid perse tree and groups the +
ahead of the *. We need to force only the structure of Figure 1(a) to be legal
in an unambiguous grammar.
2. A sequence of identical operators can group either from the left or from the right.
For example, if *’s in Figure 1 were replaced by +’s, we would see two different
parse trees for the string E + E + E. Since addition and multiplication are
associative, it does not matter whether we group from the left or the right,
but to eliminate ambiguity, we must pick one. The conventional approach is
to insist on grouping from the left, so the structure of Figure 1(b) is the only
correct grouping of two + signs.
1
The solution to the problem of enforcing precedence is to introduce several different
variables, each of which represents those expressions that share a level of binding
strength. Specially:
1. A factor is an expression that can not be broken apart by any adjacent oper-
ator, either a * or a +. The only factors in our expression language are:
E → T /E + T
T → F/T ∗ F
F → I/(E)
I → a/b/Ia/Ib/I0/I1
Example 3: The grammar in Table 1 allows only one parse tree for the string
a + a ∗ a; it is shown in the Figure 2.
The fact that the grammar in Table 1 is unambiguous may be far from obvious.
Here are the key observations that explain why no string in the language can have
two different parse trees.
• Any string derived from T , a term, must be a sequence of one or more factors,
connected by *’s. A factor is either a single identifier or any parenthesized
expression.
2
E
E + T
T T * F
F F I
I I a
a a
• Because of the form of the two rules for T , the only parse tree for a sequence
of factors is the one that breaks f1 ∗ f2 ∗ . . . ∗ fn , for n > 1 into a term
f1 ∗ f2 ∗ . . . ∗ fn−1 and a factor fn . The reason is that F can not derive
expression like fn−1 ∗ fn without introducing parentheses around them. Thus,
it is not possible that when using the production T → T ∗ F , the F derives
anything but the last of the factor. That is, the parse tree for a term can only
look like Figure 3.
T * F
T * F
T * F
3
2 Leftmost Derivations as a Way to Express Am-
biguity
While derivations are not unique, even if the grammar is unambiguous, it turns out
that, in an unambiguous grammar, leftmost derivation will be unique, and rightmost
derivations will be unique. We shall consider leftmost derivations only, and state
the result for rightmost derivations.
As an example, see the two parse trees in Figure 4 that each yield a + a ∗ a (see the
grammar in Example 1). If we construct leftmost derivations from them we get the
following leftmost derivations from trees (a) and (b), respectively.
E E
E + E E * E
I E * E E + E I
a I I I I a
a a a a
(a) (b)
Figure 4: Trees with yield a + a ∗ a, demonstrating the ambiguity of expression
grammar (see Example 1)
Note that these two leftmost derivations differ. This example does not prove the
theorem, demonstrates how the differences in the trees force different steps to be
taken in the leftmost derivation.
Theorem: For each grammar G = (V, T, R, S) and string w ∈ T ∗ , w has two
distinct parse trees if and only if w has two distinct leftmost derivations from S.
• For proof see Hopcroft, Motwani and Ullman book.
3 Inherent Ambiguity
A context-free language L is said to be inherently ambiguous if all its grammars
are ambiguous. If even one grammar for L is unambiguous, then L is an unambiguous
language. For example, the language of expressions generated by the grammar in
Example 1 is actually unambiguous. Even though that grammar is ambiguous, there
4
is another grammar for the same language that is unambiguous (see the grammar
in Table 1).
We shall not prove that there are inherently ambiguous languages. Rather we shall
discuss one example of a language that can be proved inherently ambiguous, and we
shall explain intuitively why every grammar for the language must be ambiguous.
The language L in question is:
L = {an bn cm dm |n ≥ 1, m ≥ 1} ∪ {an bm cm dn |n ≥ 1, m ≥ 1}
L is a context-free language. The obvious grammar for L is shown in the Table 2.
It uses separate set of rules to generate the two kind of strings in L.
S → AB/C
A → aAb/ab
B → cBd/cd
C → aCd/aDd
D → bDc/bc
This grammar is ambiguous. For example, the string aabbccdd has the two leftmost
derivations:
S S
A B C
a b c B d a C d
A
b c d a D d
a
b D c
b c
(a) (b)
The proof that all grammars for L must be ambiguous is complex. However the
essence is as follows. We need to argue that all but a finite number of strings whose
counts of the four symbols a, b, c and d, are all equal must be generated in two
different ways: one in which the a’s and b’s are generated to be equal and the c’s
and d’s are generated to be equal, and a second way, where the a’s and d’s are
generated to be equal and likewise the b’s and c’s.