Optimal Binary Search Tree

Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

DYNAMIC PROGRAMMING: OPTIMAL STATIC BINARY SEARCH TREES

This lecture note describes an application of the dynamic programming paradigm on computing
the optimal static binary search tree of n keys to minimize the expected search time for a given
search probability distribution.
Suppose we are given a collection of n keys K1 < K2 < ... < K n which are to be stored in a binary
search tree. After the tree has been constructed, only search operations will be performed i.e.
there will be no insertions or deletions. We are also given a probability density function P where
P(i) is the probability of searching for key K i . There are many different binary search trees
where the n given keys can be stored. For a particular tree T with these keys, the average number of comparisons to find a key, for the given probability density is
n

P(i) ( depthT (K i ) + 1 ),
i=1
where depthT (K i ) denotes the depth of the node where K i is stored in T . The problem we would
like to solve is to find, among all the possible binary search trees that contain the n keys, one
which minimizes this quantity. Such a tree is called an optimal (static) binary search tree. Note
that there may be several optimal binary trees for the given density function. This is why we
speak of an optimal, rather than the optimum binary search tree.
A simple way to accomplish this is to try out all possible binary trees with n nodes, computing the average number of comparisons to find a key in each tree considered, and selecting a tree
with the minimum average. Unfortunately, this simple strategy is ridiculously inefficient because
2n
there are too many trees to try out. In particular, there are ( )/(n + 1) different binary trees with
n
n nodes (if interested in the derivation of this formula, see Knuth, vol. I, pp. 388-389). Thus, if
there are 20 keys, we have to try out 131,282,408,400 different trees. Computing the average
number of comparisons in each at the rather astonishing speed of 1sec per tree, will still take
2188 hours or approximately 91 days and nights of computing to find an optimal binary search
tree (for just 20 keys)! Fortunately, there is a much more efficient, if less straightforward, way to
find an optimal binary search tree. Let T be a binary search tree that contains K i , K i+1 , .. . , K j
for some 1 i j n. We shall see shortly why it is useful to consider trees that contain subsets
of successive keys. We define the cost of T as,
j

c(T ) = P(l) ( depthT (K l ) + 1 )


l=i

Hence, if T contains all n keys (i.e. i = 1 and j = n), the cost of T is precisely the expected number of comparisons to find a key for the given density function. Thus, we can rephrase our
problem as follows: Given a density function for the n keys, find a minimum cost tree with n
nodes.
Before giving the algorithm to find an optimal binary search tree, we prove two key facts.
Lemma 1: Let T be a binary search tree containing keys K i , K i+1 , ... , K j , T L and T R be the
left and right subtrees of T respectively. Then
This is not so if T is missing some of the keys, because in that case the probabilities of the keys that are in T
do not sum to 1 that is, P is not a proper density function relative to the set of keys in the tree.

-2j

c(T ) = c(T L ) + c(T R ) + P(l) .


l=i

Proof: This is an easy consequence of the definition of cost of a tree. You should prove it on your
own.
Lemma 2: Let T be a binary search tree that has minimum cost among all trees containing keys
K i , K i+1 , . .. , K j and let K m be the key at the root of T (so i m j ). Then T L , the left subtree
of T , is a binary search tree that has minimum cost among all trees containing keys
K i , . . . , K m1 ; and T R , the right subtree of T , is a binary search tree that has minimum cost
among all trees containing keys K m+1 , ... , K j .
Proof: We prove the contrapositive. That is, if either the left or right subtree of T fails to satisfy
the minimality property asserted in the lemma we show that T does not really have the minimum
possible cost among all trees that contain K i , ... , K j .
Let T L and T R be minimum cost binary search trees that contain K i , ... , K m1 and
K m+1 , . . . , K j respectively. Thus, c(T L ) c(T L ) and c(T R ) c(T R ) (*).
Further, let T be the tree with key K m in the root, and left and right subtrees T L and T R respectively. Evidently, T is a binary search tree that contains keys K i , ... , K j . If T L or T R do not
have the minimality property asserted by the lemma, it must be that either c(T L ) > c(T L ), or
c(T R ) > c(T R ). This, together with (*) implies that c(T L ) + c(T R ) > c(T L ) + c(T R ). From this
and Lemma 1 we have,
j

l=i

l=i

c(T ) = c(T L ) + c(T R ) + P(l) > c(T L ) + c(T R ) + P(l) = c(T).


Thus, c(T ) > c(T) and T is not a minimum cost binary search tree among all trees that contain
keys K i , . . . , K j .
Computing an Optimal Binary Search Tree
Lemma 2 is the basis of an efficient algorithm to find an optimal binary search tree. Let T ij
denote an optimal binary search tree containing keys K i , K i+1, ..., K j . Then T1n is precisely the
optimal binary search tree that we want to construct. Lemma 2 says that T ij must be of the following form:

Km

i,m-1

Tm+1,j

That is, its root has key K m for some m, i m j, and its subtrees are T i,m1 and T m+1, j , i.e.
minimum cost subtrees containing keys K i , ... , K m1 and K m+1 , ... , K j respectively. But T i,m1
and T m+1, j are "smaller" trees than T ij . This suggests proceeding inductively, starting with small
minimum cost trees (each containing just one key) and progressively building larger and larger
minimum cost trees, until we have a minimum cost tree with n nodes which is what we are looking for.

-3More specifically, we start the induction with minimum cost trees each of which contains
exactly one key, and proceed by constructing minimum cost trees with 2, 3, ... , n successive
keys. Note that there are exactly n d + 1 groups of d successive keys for each d = 1, .. . , n.
Thus, instead of considering all possible trees with n nodes we consider only n (minimum cost)
trees with 1 node each, n 1 (minimum cost) trees with 2 nodes each, ..., 1 minimum cost tree
2n
with n nodes, i.e. a total of n(n + 1)/2 trees much fewer than ( )/(n + 1).
n
So now the question is how to compute these trees T ij inductively. The basis of the induction, i.e. when j i = 0 is trivial. In this case we have j = i and the minimum cost binary search
tree that stores K i (in fact the only such tree!) is a single node containing K i ; its cost is
c(T ii ) = P(i).
For the inductive step, take j i > 0 and assume that we have already computed all the
T uv s and their costs, for v u < j i. Let T imj be the tree with K m in the root, and left and right
subtrees T i,m1 and T m+1, j respectively. As we saw before, Lemma 2 implies that T ij is the minimum cost tree among the T imj s. Thus we can find T ij simply by trying out all the T imj s for
m = i, i + 1, ... , j. In fact Lemma 1 tells us how to compute c(T imj ) efficiently, so that "trying out"
each possible m will not take too long: Since (m 1) i and j (m + 1) are both < j i, we have
already (inductively) computed T i,m1 and T m+1, j as well as their costs, c(T i,m1) and c(T m+1, j ).
Lemma 1 then tells us how to get c(T imj ) in terms of these. Note that when m = i the left subtree
of T imj is T i,i1 and when m = j the right subtree of T imj is T j+1, j . We define T ij to be empty if
i > j, and the cost of an empty tree to be 0.
Figure 1 shows this algorithm in pseudo-code. The algorithm takes as input an array
Prob[1. . n], which specifies the probability density (i.e. Prob[i] = P(i)). It computes two twodimensional arrays, Root and Cost, where Root[i, j] is the root of T ij , and Cost[i, j] = c(T ij ), for
1 i j n. To help compute Root and Cost the algorithm maintains a third array, SumOfi

Prob, where SumOfProb[i] = P(l) for 1 i n, and SumOfProb[0] = 0.

Note that

l=1

P(l) = SumOfProb[j] SumOfProb[i 1].

l=i
The algorithm of Figure 1 does not explicitly construct an optimal binary search tree but
such a tree is implicit in the information in array Root. As an exercise you should write an algorithm which, given Root and an array Key[1.. n], where Key[i] = K i , constructs an optimal
binary search tree.
It is not hard to show that this algorithm has worst case time complexity (n3). A slight
modification of this algorithm leads to a (n2) complexity (if interested, see D.E. Knuth, "Optimum binary search trees," Acta Informatica, vol. 1 (1971), pp. 14-25.)

Unsuccessful Searches
In this discussion we have only considered successful searches. However, if we take into
account unsuccessful searches, maybe the constructed tree is no longer optimal. Fortunately, this
problem can be taken care of in a straightforward manner. To find an optimal binary search tree
in the case where both successful and unsuccessful searches are taken into account, we must
For technical reasons that will become apparent when you look at the algorithm carefully we need to set
Recall that T i,i1 is empty and thus has cost 0.

Cost[i, i 1] = 0 for 1 i n + 1.

-4algorithm OptimalBST ( Prob[1.. n] );


begin
(* initialization *)
for i 1 to n + 1 do Cost[i, i 1] 0;
SumOfProb[0] 0;
for i 1 to n do begin
SumOfProb[i] Prob[i] + SumOfProb[i 1];
Root[i, i] i;
Cost[i, i] Prob[i]
end;
for d 1 to n 1 do (* compute info about trees with d + 1 consecutive keys *)
for i 1 to n d do begin (* compute Root[i, j] and Cost[i, j] *)
j i + d;
MinCost + ;
for m i to j do begin (* find m between i and j so that c(T imj ) is minimum *)
c Cost[i, m 1] + Cost[m + 1, j] + SumOfProb[ j] SumOfProb[i 1];
if c < MinCost then begin
MinCost c;
rm
end
end;
Root[i, j] r;
Cost[i, j] MinCost
end
end
Figure 1: Algorithm for optimal binary search tree.
know the probability density for both successful and unsuccessful searches. So, in addition to
P(i) we are also given Q(i) for 0 i n, where
Q(0) = prob. of searching for keys < K1 ;
Q(i) = prob. of searching for keys x, K i < x < K i+1 , for 1 i < n;
Q(n) = prob. of searching for keys > K n .
In each binary search tree containing keys K1 , K2 , ... , K n we add n + 1 external nodes
E0 , E1 , . . . , E n . This is illustrated below; the external nodes are drawn in boxes, as usual.
K2
K4

K1
E0

E4

K3

1
E2

E3

The average number of comparisons for a successful or unsuccessful search in such a tree T is

-5n

P(i) ( depthT (K i ) + 1 ) +

i=1

Q(i) depthT (E i ).

i=0

The left term is the contribution to the average number of comparisons by the successful searches
and the right term is the contribution to the average number of comparisons by the unsuccessful
searches. Now we want to find a tree that minimizes this quantity.
We can proceed exactly as before, except that the definition of the cost of a tree T with consecutive keys K i , K i+1 , ... , K j is slightly modified to account for the unsuccessful searches.
Namely, it becomes,
j

c(T ) = P(l) ( depthT (K l ) + 1 ) +


l=i

Q(l) depthT (E l ).

l=i1

With this cost function Lemma 1 is slightly different:


Lemma 1: Let T be a binary search tree containing keys K i , K i+1 , ... , K j , T L and T R be the
left and right subtrees of T respectively. Then
j

c(T ) = c(T L ) + c(T R ) + Q(i 1) + (P(l) + Q(l)).


l=i

Everything else works out exactly as before. In particular, Lemma 2 is still valid (check this!).
As an exercise show how to modify the algorithm in Figure 1 to account for these changes.
Example: Suppose we want to find an optimal binary search tree for the dictionary {begin, end,
goto, repeat, until} for the following probabilities of searching for these keys (P(i)s) and keys
alphabetically in between (the Q(i)s):

Q(0) =. 05

K1 = begin
P(1) =.1
Q(1) =. 05

K2 = end
P(2) = .1
Q(2) = . 2

K3 = goto
P(3) =. 05
Q(3) = . 2

K 4 = repeat
P(4) =. 05
Q(4) =.1

K5 = until
P(5) =. 05
Q(5) = . 05

The trees T ij as computed by the algorithm on this example are shown in the table below with
their costs. The computation proceeds row by row (i.e. to compute the trees and costs in row d,
all the trees up to row d 1 must have been previously computed). The optimal binary search
tree T1,5 is shown on the last row.
You are strongly encouraged to trace this example carefully. Even if you think you understand
the algorithm from the previous abstract discussion, you may be surprised at how much better
youll understand it after working out an example.

-6-

d = -1 :
E0

E1

E2

E3

E4

E5

T
10

T
21

T
32

T
43

T
54

T65

c(T

c(T )=0
10

)=0
21

c(T

)=0
32

c(T 43 )=0

c(T

54

)=0

c(T 65 )=0

d=0:
K
1
E0

E1

c(T

11

K
3

K
2
E1

)=.2

E2

c(T

22

E2

)=.35

c(T

K4
E3

33

)=.45

E3
c(T

K
5
E4

44

)=.35

E4

E5

c(T )=.2
55

d=1:
K
3

K
2
K
1
E0

K
2

E2
E1

c(T 12 ) = .7

E1
c(T

E3
E2

23

) = .95

K4

K
3
K4

E2
E3

E3
E4

c(T 34 ) = .95

K
5
E4
c(T ) = .65
45

E5

-7-

d=2:
K
2
K
3

K
1
E0

K
2

E2

E1

K
3

K
3

E3

E1

E2

K4

E2

K4
E3

K
5

E3

E4

E4
c(T

13

) = 1.4

c(T

24

) = 1.45

c(T

35

E5

) = 1.35

d=3:
K
3
K
2
K
1
E0

K
3
K4

E2

E3

K
2
E4

E1

K4
E2

E3

E1
c(T

K
5
E4

14

) = 1.95

c(T 25 ) = 1.85

d=4:

K
3
K
2
K
1
E0

K4
E2

K
5

E3
E4

E1
c(T 15 ) = 2.35

E5

E5

You might also like