Advanced Data Structures Notes


DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

ADVANCED DATA STRUCTURES


III-Year/I-sem
Regulation(R10)
Lecture Notes

UNIT-I
Topics: Dictionaries: Sets, Dictionaries, Hash Tables, Open Hashing, Closed Hashing
(Rehashing Methods), Hashing Functions (Division Method, Multiplication Method,
Universal Hashing), Analysis of Closed Hashing (Unsuccessful Search, Insertion,
Successful Search, Deletion), Hash Table Restructuring, Skip Lists, Analysis of Skip Lists.

1.1 Dictionaries:
A dictionary is a container of elements from a totally ordered universe that supports the basic
operations of inserting/deleting elements and searching for a given element. In this chapter, we
present hash tables, which provide an efficient implicit realization of a dictionary. Efficient
explicit implementations include binary search trees and balanced search trees. The abstract data
type Set includes dictionaries, priority queues, etc. as subclasses.
1.2 Sets:
A set is a collection of well-defined elements. The members of a set are all different.
A set ADT can be defined to comprise the following operations:
1. Union (A, B, C)
2. Intersection (A, B, C)
3. Difference (A, B, C)
4. Merge (A, B, C)
5. Find (x)
6. Member (x, A) or Search (x, A)
7. Makenull (A)
8. Equal (A, B)
9. Assign (A, B)
10.Insert (x, A)
11.Delete (x, A)
12.Min (A) (if A is an ordered set)
Set implementation: possible data structures include the following (a bit-vector sketch is shown below):
o Bit Vector
o Array
o Linked List
  - unsorted
  - sorted
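As a concrete illustration of the bit-vector representation, here is a minimal C sketch; it assumes a fixed universe of at most 64 elements, and the type and function names are illustrative, not from the original notes.

#include <stdint.h>
#include <stdbool.h>

typedef uint64_t Set;   /* bit x is set  <=>  element x is in the set */

Set  set_makenull(void)             { return 0; }
Set  set_union(Set a, Set b)        { return a | b; }
Set  set_intersection(Set a, Set b) { return a & b; }
Set  set_difference(Set a, Set b)   { return a & ~b; }
bool set_member(int x, Set a)       { return (a >> x) & 1u; }
Set  set_insert(int x, Set a)       { return a | ((Set)1 << x); }
Set  set_delete(int x, Set a)       { return a & ~((Set)1 << x); }
bool set_equal(Set a, Set b)        { return a == b; }

With this representation, every operation above is a single machine word operation, i.e., O(1) for a universe that fits in one word.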
Dictionaries:
A dictionary is a dynamic set ADT with the operations:
1. Makenull (D)
2. Insert (x, D)
3. Delete (x, D)
4. Search (x, D)
Dictionaries are useful in implementing symbol tables, text retrieval systems, database systems,
page mapping tables, etc.
Implementation:
1. Fixed-length arrays
2. Linked lists: sorted, unsorted, skip lists
3. Hash tables: open, closed
4. Trees

o Binary Search Trees (BSTs)
o Balanced BSTs
AVL Trees
Red-Black Trees
o Splay Trees
o Multiway Search Trees
2-3 Trees
B Trees
o Tries
Let n be the number of elements in a dictionary D. The following summarizes the worst case
performance of some basic implementation methods: with simple array and linked list
implementations, Search, Insert, Delete, and Min each take O(n) time, except that a sorted list
supports Min in O(1) time. Among these, the sorted list has the best average case performance.
In this chapter, we discuss two data structures for dictionaries, namely Hash Tables and
Skip Lists.
1.3 Hash Tables:
An extremely effective and practical way of implementing dictionaries.
O(1) time for search, insert, and delete in the average case.
O(n) time in the worst case; by careful design, we can make the probability that more
than constant time is required to be arbitrarily small.
Hash Tables
o Open or External
o Closed or Internal
1.4 Open hashing
Let:
- U be the universe of keys: integers, character strings, complex bit patterns.
- B be the set of hash values (also called buckets or bins); B = {0, 1, ..., m - 1}, where m > 0 is a positive integer.
A hash function h : U → B associates buckets (hash values) with keys.

Two main issues:
1. Collisions. If x1 and x2 are two different keys, it is possible that h(x1) = h(x2). This is called a
collision. Collision resolution is the most important issue in hash table implementations.
2. Hash functions. Choosing a hash function that minimizes the number of collisions and also
hashes uniformly is another critical issue.
Collision Resolution by Chaining
Put all the elements that hash to the same value in a linked list. See Figure 3.1.
Figure 3.1: Collision resolution by chaining


Example:
See Figure 3.2. Consider the keys 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.
Let the hash function be: h(x) = x % 7

Figure 3.2: Open hashing: An example


Bucket lists

o unsorted lists
o sorted lists (these are better)
Insert (x, T)
Insert x at the head of list T[h(key (x))]
Search (x, T)
Search for an element x in the list T[h(key (x))]
Delete (x, T)
Delete x from the list T[h(key (x))]
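As a concrete illustration, here is a minimal C sketch of chained (open) hashing with h(x) = x % 7, matching the example above; the names (Node, table, chain_insert, ...) are illustrative assumptions, not from the original notes.

#include <stdlib.h>
#include <stdbool.h>

#define M 7                         /* number of buckets */

typedef struct Node { int key; struct Node *next; } Node;
Node *table[M];                     /* T[0..M-1], each an unsorted list */

int h(int x) { return x % M; }

void chain_insert(int x) {          /* insert at the head of T[h(x)] */
    Node *n = malloc(sizeof *n);
    n->key = x;
    n->next = table[h(x)];
    table[h(x)] = n;
}

bool chain_search(int x) {          /* scan the list T[h(x)] */
    Node *p;
    for (p = table[h(x)]; p != NULL; p = p->next)
        if (p->key == x) return true;
    return false;
}

void chain_delete(int x) {          /* unlink the first node with key x */
    Node **pp = &table[h(x)];
    while (*pp != NULL && (*pp)->key != x) pp = &(*pp)->next;
    if (*pp != NULL) { Node *dead = *pp; *pp = dead->next; free(dead); }
}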
Worst case complexity of all these operations is O(n)
In the average case, the running time is O(1 + α), where the load factor
α = n / m
with n = number of elements stored and m = number of hash values or buckets.
It is assumed that the hash value h(k) can be computed in O(1) time. If n is O(m), the average
case complexity of these operations becomes O(1)!
1.5 Closed hashing:
All elements are stored in the hash table itself
Avoids pointers; only computes the sequence of slots to be examined.
Collisions are handled by generating a sequence of rehash values. Given a key x, it has a hash
value h(x, 0) and a set of rehash values
h(x, 1), h(x, 2), ..., h(x, m-1)
each in the range {0, 1, ..., m - 1}. We require that for every key x, the probe sequence
< h(x, 0), h(x, 1), h(x, 2), ..., h(x, m-1) >
be a permutation of < 0, 1, ..., m-1 >.
This ensures that every hash table position is eventually considered as a slot for storing a
record with a key value x.
Search (x, T)
Search will continue until you find the element x (successful search) or an empty slot
(unsuccessful search).
Delete (x, T)
No deletion if the search is unsuccessful.
If the search is successful, then put the label DELETED in the slot (different from an empty slot).
Insert (x, T)
No need to insert if the search is successful.
If the search is unsuccessful, insert at the first slot marked DELETED (or the first empty slot)
encountered along the probe sequence. A C sketch of these operations follows.
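A minimal sketch of closed hashing with linear probing and DELETED tombstones, assuming a small fixed table; the names (probe, closed_search, ...) are illustrative.

#include <stdbool.h>

#define M 7
enum slot_state { EMPTY, OCCUPIED, DELETED };

int  keys[M];
enum slot_state state[M];           /* globals start out all EMPTY */

int probe(int x, int i) { return (x % M + i) % M; }   /* linear probing */

int closed_search(int x) {          /* returns slot index, or -1 */
    int i;
    for (i = 0; i < M; i++) {
        int s = probe(x, i);
        if (state[s] == EMPTY) return -1;              /* unsuccessful */
        if (state[s] == OCCUPIED && keys[s] == x) return s;
        /* DELETED slots are skipped; the search continues */
    }
    return -1;
}

bool closed_insert(int x) {
    int i;
    if (closed_search(x) != -1) return false;          /* already present */
    for (i = 0; i < M; i++) {
        int s = probe(x, i);
        if (state[s] != OCCUPIED) {                    /* first DELETED or empty slot */
            keys[s] = x;
            state[s] = OCCUPIED;
            return true;
        }
    }
    return false;                                      /* table full */
}

void closed_delete(int x) {
    int s = closed_search(x);
    if (s != -1) state[s] = DELETED;                   /* tombstone, not EMPTY */
}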


1.6 Rehashing methods
Denote h(x, 0) simply by h(x).
1. Linear probing
h(x, i) = (h(x) + i) mod m
2. Quadratic probing
h(x, i) = (h(x) + C1 i + C2 i²) mod m
where C1 and C2 are constants.
3. Double hashing
h(x, i) = (h(x) + i h'(x)) mod m
where h' is a second hash function.

A Comparison of Rehashing Methods

Linear probing:     m distinct probe sequences;   primary clustering
Quadratic probing:  m distinct probe sequences;   no primary clustering, but secondary clustering
Double hashing:     m² distinct probe sequences;  no primary clustering and no secondary clustering
An example:
Assume linear probing with the following hashing and rehashing functions:
h(x, 0) = x%7

h(x, i) = (h(x, 0) + i)%7
Start with an empty table.

Insert (20, T), Insert (30, T), Insert (9, T), Insert (45, T), Insert (14, T):
0: 14
1: empty
2: 30
3: 9
4: 45
5: empty
6: 20

Search (35, T), Delete (9, T):
0: 14
1: empty
2: 30
3: deleted
4: 45
5: empty
6: 20

Search (45, T), Search (52, T), Search (9, T), Insert (45, T), Insert (10, T):
0: 14
1: empty
2: 30
3: 10
4: 45
5: empty
6: 20

Delete (45, T), Insert (16, T):
0: 14
1: empty
2: 30
3: 10
4: 16
5: empty
6: 20
Another example: let m be the number of slots. Assume that every even numbered slot is
occupied and every odd numbered slot is empty, and that any hash value between 0 and m-1 is
equally likely to be generated. With linear probing:
Expected number of probes for a successful search = 1
Expected number of probes for an unsuccessful search = (1/2)(1) + (1/2)(2) = 1.5

1.7 Hashing Function:
What is a good hash function? It should satisfy the simple uniform hashing property.
Let U be the universe of keys and let the hash values be 0, 1, ..., m-1. Assume that each key is
drawn independently from U according to a probability distribution P; i.e., for k ∈ U,
P(k) = probability that k is drawn.
Then simple uniform hashing requires that
Σ_{k : h(k) = j} P(k) = 1/m   for each j = 0, 1, ..., m - 1
that is, each bucket is equally likely to be occupied.
Example of a hash function that satisfies the simple uniform hashing property: suppose the keys
are known to be random real numbers k, independently and uniformly distributed in the range
[0, 1). Then
h(k) = ⌊km⌋
satisfies the simple uniform hashing property.
Qualitative information about P is often useful in the design process. For example,
consider a compiler's symbol table in which the keys are arbitrary character strings
representing identifiers in a program. It is common for closely related symbols, say pt,
pts, ptt, to appear in the same program. A good hash function would minimize the chance
that such variants hash to the same slot.
A common approach is to derive a hash value in a way that is expected to be independent
of any patterns that might exist in the data.
o The division method computes the hash value as the remainder when the key is
divided by a prime number. Unless that prime is somehow related to patterns in
the distribution P, this method gives good results.

Division method:
A key is mapped into one of m slots using the function
h(k) = k mod m
This requires only a single division, and hence is fast. m should not be:
o a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k;
o a power of 10, since then the hash function does not depend on all the decimal digits of k;
o 2^p - 1, since if k is a character string interpreted in radix 2^p, two strings that are identical
except for a transposition of two adjacent characters will hash to the same value.
Good values for m:
o primes not too close to exact powers of 2.
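A minimal C sketch of the division method for character-string keys, interpreting the string as a number in radix 256; the prime 769 (not close to 512 or 1024) is an illustrative choice.

/* Division-method hash for strings: reduce the radix-256 value mod m. */
unsigned div_hash(const char *key, unsigned m) {
    unsigned hv = 0;
    for ( ; *key != '\0'; key++)
        hv = (hv * 256 + (unsigned char)*key) % m;   /* Horner's rule, mod m */
    return hv;                                       /* a value in 0 .. m-1 */
}

/* e.g., div_hash("pt", 769) and div_hash("tp", 769) differ, as desired. */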
Multiplication Method:
There are two steps:
1. Multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA.
2. Multiply this fractional part by m and take the floor:
h(k) = ⌊m (kA mod 1)⌋
where
kA mod 1 = kA - ⌊kA⌋
so that
h(k) = ⌊m (kA - ⌊kA⌋)⌋




An advantage of this method is that the value of m is not critical. We typically choose it to be
a power of 2, m = 2^p for some integer p, so that we can easily implement the function on most
computers as follows.
Suppose the word size is w, and assume that k fits into a single word. First multiply k by the
w-bit integer ⌊A · 2^w⌋. The result is a 2w-bit value
r1 · 2^w + r0
where r1 is the high-order word of the product and r0 is the low-order word.
The desired p-bit hash value consists of the p most significant bits of r0.
The method works practically with any value of A, but works better with some values than
others. The optimal choice depends on the characteristics of the data being hashed. Knuth
recommends
A = (√5 - 1)/2 = 0.6180339887... (the reciprocal of the golden ratio)
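A minimal C sketch of this word-level recipe, assuming w = 32 and m = 2^p with 1 ≤ p ≤ 32; the constant 2654435769 is the usual 32-bit value of ⌊A · 2^32⌋ for Knuth's A.

#include <stdint.h>

unsigned mul_hash(uint32_t k, int p) {
    const uint32_t s = 2654435769u;   /* ⌊A * 2^w⌋ with A = (sqrt(5)-1)/2, w = 32 */
    uint32_t r0 = k * s;              /* 32-bit multiply keeps exactly the low word r0 */
    return r0 >> (32 - p);            /* the p most significant bits of r0 */
}

For example, mul_hash(k, 10) yields a bucket in 0..1023 for a table with m = 2^10 slots.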
1.8 Universal Hashing:
This involves choosing a hash function randomly in a way that is independent of the keys that
are actually going to be stored. We select the hash function at random from a carefully designed
class of functions.
Let H be a finite collection of hash functions that map a given universe U of keys into
the range {0, 1, 2, ..., m - 1}.
H is called universal if for each pair of distinct keys x, y ∈ U, the number of hash
functions h ∈ H for which h(x) = h(y) is precisely |H| / m.
With a function randomly chosen from H, the chance of a collision between x and y,
where x ≠ y, is exactly 1/m.
Example of a universal class of hash functions:
Let the table size m be prime. Decompose a key x into r + 1 bytes (i.e., characters or fixed-width
binary strings), so that
x = (x0, x1, ..., xr)
and assume that the maximum value of a byte is less than m.
Let a = (a0, a1, ..., ar) denote a sequence of r + 1 elements chosen randomly from the set
{0, 1, ..., m - 1}. Define a hash function h_a by
h_a(x) = Σ_{i=0}^{r} a_i x_i mod m
With this definition, H = {h_a} can be shown to be universal. Note that it has m^{r+1}
members.
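A minimal C sketch of this class, assuming 4-byte keys (r + 1 = 4) and the prime m = 257, which exceeds the maximum byte value 255; the names are illustrative.

#include <stdlib.h>

#define M 257                 /* a prime larger than any byte value */
#define R_PLUS_1 4            /* keys are decomposed into 4 bytes here */

unsigned a[R_PLUS_1];         /* the randomly chosen coefficients */

void choose_hash(void) {      /* pick h_a at random from the family (seed rand first) */
    int i;
    for (i = 0; i < R_PLUS_1; i++)
        a[i] = rand() % M;
}

unsigned ha(const unsigned char x[R_PLUS_1]) {
    unsigned sum = 0;
    int i;
    for (i = 0; i < R_PLUS_1; i++)
        sum = (sum + a[i] * x[i]) % M;    /* Σ a_i x_i mod m */
    return sum;
}

choose_hash() is called once when the table is created; thereafter ha() is a fixed function, and the universality guarantee is over the random choice of the coefficients.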
1.9 Analysis of closed hashing:
Load factor α = n/m.
Assume uniform hashing: the probe sequence
< h(k, 0), ..., h(k, m - 1) >
for each key k is equally likely to be any permutation of
< 0, 1, ..., m - 1 >
Result 1: Unsuccessful Search
The expected number of probes in an unsuccessful search is at most 1/(1 - α).
Proof: In an unsuccessful search, every probe but the last accesses an occupied slot that
does not contain the desired key, and the last slot probed is empty.

Let the random variable X denote the number of occupied slots probed before hitting an
empty slot. X can take values 0, 1, ..., n. It is easy to see that the expected number of
probes in an unsuccessful search is 1 + E[X]. Let
p_i = P{X = i}, for i = 0, 1, 2, ...
For i > n, p_i = 0, since we can find at most n occupied slots.
Thus the expected number of probes is
1 + Σ_{i≥1} i p_i    (*)
To evaluate (*), define
q_i = P{at least i probes access occupied slots}
and use the identity
Σ_{i≥1} i p_i = Σ_{i≥1} q_i
To compute q_i, we proceed as follows:
q_1 = n/m
since the probability that the first probe accesses an occupied slot is n/m.
With uniform hashing, a second probe, if necessary, is to one of the remaining m - 1
unprobed slots, n - 1 of which are occupied. Thus
q_2 = (n/m)((n - 1)/(m - 1))
since we make a second probe only if the first probe accesses an occupied slot.
In general,
q_i = (n/m)((n - 1)/(m - 1)) ... ((n - i + 1)/(m - i + 1)) ≤ (n/m)^i = α^i
since (n - j)/(m - j) ≤ n/m for 0 ≤ j < m.
Thus the expected number of probes in an unsuccessful search is
1 + Σ_{i≥1} i p_i = 1 + Σ_{i≥1} q_i ≤ 1 + α + α² + α³ + ... = 1/(1 - α)
Intuitive interpretation of the above: one probe is always made; with probability
approximately α a second probe is needed; with probability approximately α² a third probe
is needed; and so on.
Result 2: Insertion
Insertion of an element requires at most 1/(1 - α) probes on average.
The proof is obvious, because inserting a key requires an unsuccessful search followed by a
placement of the key in the first empty slot found.
Result 3: Successful Search
The expected number of probes in a successful search is at most
(1/α) ln(1/(1 - α)) + 1/α
assuming that each key in the table is equally likely to be searched for.
A search for a key k follows the same probe sequence as was followed when the element
with key k was inserted. Thus, if k was the (i + 1)-th key inserted into the hash table, the
expected number of probes made in a search for k is at most
1/(1 - i/m) = m/(m - i)
Averaging over all the n keys in the hash table gives the expected number of probes in a
successful search:
(1/n) Σ_{i=0}^{n-1} m/(m - i) = (m/n) Σ_{i=0}^{n-1} 1/(m - i) = (1/α)(H_m - H_{m-n})
where H_i = Σ_{j=1}^{i} 1/j is the i-th Harmonic number.
Using the well known bounds
ln i ≤ H_i ≤ ln i + 1,
we obtain
(1/α)(H_m - H_{m-n}) ≤ (1/α)(ln m + 1 - ln(m - n))
= (1/α) ln(m/(m - n)) + 1/α
= (1/α) ln(1/(1 - α)) + 1/α

Result 4: Deletion
The expected number of probes in a deletion is at most (1/α) ln(1/(1 - α)) + 1/α.
The proof is obvious, since a deletion always follows a successful search.
Figure 3.3: Performance of closed hashing


Figure 3.3 depicts the performance of closed hashing for all the four operations discussed above.

1.10 Hash Table Restructuring
When a hash table has a large number of entries (i.e., say n ≥ 2m in an open hash
table or n ≥ 0.9m in a closed hash table), the average time for operations can become
quite substantial. In such cases, one idea is to simply create a new hash table with more
buckets (say twice as many, or any appropriately larger number).
The currently existing elements will then have to be inserted into the new table. This calls for:
o rehashing all the key values
o transferring all the records
This effort will be less than what it took to insert them into the original table, and
subsequent dictionary operations will be more efficient and can more than make up for
the overhead of creating the larger table. A sketch of the rehashing loop is given below.
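A minimal C sketch of the restructuring step for a chained table, reusing the illustrative Node type from the chaining sketch in Section 1.4; it allocates a larger table, rehashes every key with the new modulus, and transfers the existing nodes.

#include <stdlib.h>

typedef struct Node { int key; struct Node *next; } Node;

Node **rehash_all(Node **old, int old_m, int new_m) {
    Node **new_table = calloc(new_m, sizeof(Node *));
    int b;
    for (b = 0; b < old_m; b++) {
        Node *p = old[b];
        while (p != NULL) {                  /* transfer every record */
            Node *next = p->next;
            int s = p->key % new_m;          /* rehash with the new modulus */
            p->next = new_table[s];
            new_table[s] = p;
            p = next;
        }
    }
    free(old);                               /* the nodes themselves are reused */
    return new_table;
}

A typical call would be table = rehash_all(table, m, 2 * m); the whole pass is O(n + m).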

1.11 Skip Lists
Skip lists use probabilistic balancing rather than strictly enforced balancing.
Although skip lists have bad worst-case performance, no input sequence consistently
produces the worst-case performance (much like quicksort).
It is very unlikely that a skip list will be significantly unbalanced. For example, in a
dictionary of more than 250 elements, the chance that a search will take more than 3
times the expected time is less than 10^{-6}.
Skip lists have balance properties similar to those of search trees built by random
insertions, yet do not require insertions to be random.

Figure 3.4: A singly linked list


Consider a singly linked list as in Figure 3.4. We might need to examine every node of the list
when searching a singly linked list.
Figure 3.5 shows a sorted list where every other node has an additional pointer, to the node two ahead
of it in the list. Here we have to examine no more than ⌈n/2⌉ + 1 nodes.


Figure 3.5: Every other node has an additional pointer



Figure 3.6: Every second node has a pointer two ahead of it


In the list of Figure 3.6, every second node has a pointer two ahead of it, and every fourth node has a
pointer four ahead of it. Here we need to examine no more than ⌈n/4⌉ + 2 nodes.
In Figure 3.7, every (2^i)-th node has a pointer 2^i nodes ahead (i = 1, 2, ...); the number of
nodes to be examined can then be reduced to ⌈log₂ n⌉ while only doubling the number of pointers.

Figure 3.7: Every (2^i)-th node has a pointer to a node 2^i nodes ahead (i = 1, 2, ...)


Figure 3.8: A skip list


A node that has k forward pointers is called a level k node. If every (2^i)-th node has a
pointer 2^i nodes ahead, then:
# of level 1 nodes: 50%
# of level 2 nodes: 25%
# of level 3 nodes: 12.5%
and so on.
Such a data structure can be used for fast searching but insertions and deletions will be
extremely cumbersome, since levels of nodes will have to change.
What would happen if the levels of nodes were chosen randomly, but in the same
proportions (Figure 3.8)?
o The level of a node is chosen randomly when the node is inserted.
o A node's i-th pointer, instead of pointing to the node 2^{i-1} nodes ahead, points
to the next node of level i or higher.
o In this case, insertions and deletions will not change the level of any node.
o Some arrangements of levels would give poor execution times, but it can be shown
that such arrangements are rare.
Such a linked representation is called a skip list.
Each element is represented by a node the level of which is chosen randomly when the
node is inserted, without regard for the number of elements in the data structure.
A level i node has i forward pointers, indexed 1 through i. There is no need to store the
level of a node in the node.
MaxLevel is the maximum number of levels a node can have.
o Level of a list = the highest node level currently in the list
o Level of an empty list = 1
o Level of the header = MaxLevel

Initialization:
An element NIL is allocated and given a key greater than any legal key. All levels of all lists are
terminated with NIL. A new list is initialized so that the level of list = maxlevel and all forward
pointers of the list's header point to NIL
Search:
We search for an element by traversing forward pointers that do not overshoot the node
containing the element being searched for. When no more progress can be made at the current
level of forward pointers, the search moves down to the next level. When we can make no more
progress at level 1, we must be immediately in front of the node that contains the desired element
(if it is in the list).
Insertion and Deletion:
Insertion and deletion are through search and splice
update [i] contains a pointer to the rightmost node of level i or higher that is to the left of
the location of insertion or deletion.
If an insertion generates a node with a level greater than the previous maximum level, we
update the maximum level and initialize appropriate portions of update list.
After a deletion, we check to see if we have deleted the maximum level element of the
list and if so, decrease the maximum level of the list.
Figure 3.9 provides an example of Insert and Delete. The pseudocode for Search, Insert,
and Delete is shown below.
search (list, searchKey)
{
    x = list->header;
    for (i = list->level; i >= 1; i--) {
        while (x->forward[i]->key < searchKey)
            x = x->forward[i];
    }
    x = x->forward[1];
    if (x->key == searchKey) return true;
    else return false;
}
Figure 3.9: A skip list
insert (list, searchKey)
{
    x = list->header;
    for (i = list->level; i >= 1; i--) {
        while (x->forward[i]->key < searchKey)
            x = x->forward[i];
        update[i] = x;
    }
    x = x->forward[1];
    if (x->key == searchKey) return "key already present";
    else {
        newLevel = randomLevel();
        if (newLevel > list->level) {
            for (i = list->level + 1; i <= newLevel; i++)
                update[i] = list->header;
            list->level = newLevel;
        }
        x = makenode(newLevel, searchKey);
        for (i = 1; i <= newLevel; i++) {
            x->forward[i] = update[i]->forward[i];
            update[i]->forward[i] = x;
        }
    }
}
delete (list, searchKey)
{
    x = list->header;
    for (i = list->level; i >= 1; i--) {
        while (x->forward[i]->key < searchKey)
            x = x->forward[i];
        update[i] = x;
    }
    x = x->forward[1];
    if (x->key == searchKey) {
        for (i = 1; i <= list->level; i++) {
            if (update[i]->forward[i] != x) break;
            update[i]->forward[i] = x->forward[i];
        }
        free(x);
        while ((list->level > 1) &&
               (list->header->forward[list->level] == NIL))
            list->level = list->level - 1;
    }
}

1.12 Analysis of Skip Lists
In a skip list of 16 elements, we may have
- 9 elements at level 1
- 3 elements at level 2
- 3 elements at level 3
- 1 element at level 6
One important question is: where do we start our search? Analysis shows we should start
from level L(n), where
L(n) = log₂ n    (for p = 1/2)
In general, if p is the probability fraction,
L(n) = log_{1/p} n
where p is the fraction of the nodes with level i pointers which also have level (i + 1)
pointers.
However, starting at the highest level does not alter the efficiency in a significant way.
Another important question to ask is:
What should MaxLevel be? A good choice is
MaxLevel = L(N) = log_{1/p} N
where N is an upper bound on the number of elements in a skip list.
The complexity of search, delete, and insert is dominated by the time required to search for the
appropriate element. This in turn is proportional to the length of the search path, which is
determined by the pattern in which elements with different levels appear as we traverse
the list.
Insert and delete involve an additional cost proportional to the level of the node being
inserted or deleted.

Analysis of Expected Search Cost
We analyze the search path backwards, traveling up and to the left. We pretend that the level of
a node is determined only when it is observed while backtracking the search path.
1. From level 1, we climb up to level L(n).
2. We move leftward.
3. We climb from L(n) to MaxLevel.
At a particular point in the climb, we are at the i-th forward pointer of a node x, and we have no
knowledge about the levels of the nodes to the left of x or about the level of x itself (other than that the
level of x must be at least i). See situation (a) in the figure.
Assume x is not the header.
If the level of x is exactly i, then we are in situation (b). If the level of x is greater than i, we are in situation (c).
P{we are in situation (c)} = p
Each time we are in situation (c), we climb up one level. Let
C(k) = expected length of a search path that climbs up k levels in an infinite list.
Then:
C(0) = 0
C(k) = p (cost in situation (c)) + (1 - p)(cost in situation (b))
We get
C(k) = p (1 + C(k - 1)) + (1 - p)(1 + C(k))
which simplifies to
C(k) = 1/p + C(k - 1), and hence C(k) = k/p
Our assumption that the list is infinite is pessimistic.
When we bump into the header in our backward climb, we simply climb up without
performing any leftward (horizontal) movements.
This gives us an upper bound of
(L(n) - 1)/p
on the expected length of a path that climbs up from level 1 to level L(n) in a list of n elements
(because L(n) - 1 levels are being climbed).
The rest of the journey includes:
1. leftward movements, the number of which is bounded by the number of elements of level
L(n) or higher; this has an expected value of 1/p;
2. upward movements from level L(n) to the maximum level in the list.
To bound the second quantity, note that for a single node
P{level > k} = p^k
P{level ≤ k} = 1 - p^k
P{all n nodes have level ≤ k} = (1 - p^k)^n
P{maximum level in the list > k} = 1 - (1 - p^k)^n ≤ n p^k
It follows that the
Expected maximum level ≤ L(n) + 1/(1 - p)
Putting our results together, we get:
Total expected cost to climb out of a list of n elements ≤ L(n)/p + 1/(1 - p)
which is O(log n).
We can also compute the probability that the actual cost of a search exceeds the expected cost by more
than a specified ratio.
Examples:
1. p = …, n = 256: P{search will take longer than 3 times the expected cost} = …
2. p = …, n = 4096: P{search will take longer than 2 times the expected cost} = …
3. p = …, n = 1024: P{search will take longer than 3 times the expected cost} = …
How do we choose p?
- Decreasing p increases the variability of running times.
- With p = 1/2, generation of a random level can be done from a stream of random bits.
- Decreasing p decreases the number of pointers per node.
- Some constant overheads are related to L(n) rather than L(n)/p.
- p = 1/4 is probably the best choice, unless variability of running times is an issue, in which
case p = 1/2 is better.

Assignment questions:
Dictionaries
1. Explain the skip list representation of a dictionary with an example.
2. What are the data members of the skipList class? Write the constructor for skipList.
3. Write short notes on:
i. Sets
ii. Dictionaries
iii. Hash Tables
4. Discuss:
i. Open Hashing
ii. Closed Hashing
iii. Rehashing Methods
5. Explain the hashing functions:
i. Division Method
ii. Multiplication Method
iii. Universal Hashing
6. Give the analysis of closed hashing (unsuccessful search, insertion, successful
search, deletion).
7. How is a hash table restructured?
8. Explain skip lists and the analysis of skip lists.












UNIT-II
Topics: AVL Trees: Maximum Height of an AVL Tree, Insertions and
Deletions. 2-3 Trees : Insertion, Deletion.

2.1 Balanced Trees: AVL Trees
Also called Height-Balanced Binary Search Trees.
Search, Insertion, and Deletion can be implemented in worst case O(log n) time
Definition
An AVL tree is a binary search tree in which
1. the heights of the right subtree and left subtree of the root differ by at most 1;
2. the left subtree and the right subtree are themselves AVL trees.
A node is said to be
- left-high if its left subtree has the greater height,
- right-high if its right subtree has the greater height,
- equal if the heights of the left and right subtrees are the same.
Figure 5.1: Examples of AVL trees


Examples: Several examples of AVL trees are shown in Figure 5.1.


Figure 5.2: An AVL tree with height h


2.2 Maximum Height of an AVL Tree
What is the maximum height of an AVL tree having exactly n nodes? To answer this question,
we pose the converse question: what is the minimum number of nodes (sparsest possible AVL
tree) that an AVL tree of height h can have?
Let F_h be an AVL tree of height h having the minimum number of nodes. F_h can be
visualized as in Figure 5.2.
Let F_l and F_r be the AVL trees which are the left subtree and right subtree, respectively, of
F_h. Then F_l or F_r must have height h - 2.
Suppose F_l has height h - 1, so that F_r has height h - 2. Note that F_l has to be an AVL tree
having the minimum number of nodes among all AVL trees of height h - 1, and similarly
F_r has the minimum number of nodes among all AVL trees of height h - 2. Thus we
have
|F_h| = |F_{h-1}| + |F_{h-2}| + 1
where |F_h| denotes the number of nodes in F_h. Such trees are called Fibonacci trees; some
are shown in Figure 5.3. Note that |F_0| = 1 and |F_1| = 2.
Adding 1 to both sides, we get
|F_h| + 1 = (|F_{h-1}| + 1) + (|F_{h-2}| + 1)
Thus the numbers |F_h| + 1 are Fibonacci numbers. Using the approximate formula for
Fibonacci numbers, we get
h ≈ 1.44 log₂ |F_h|


The sparsest possible AVL tree with n nodes has height
h ≈ 1.44 log₂ n
The worst case height of an AVL tree with n nodes is therefore at most
1.44 log₂ n

Figure 5.3: Fibonacci trees




Figure 5.4: Rotations in a binary search tree

2.3 AVL Trees: Insertions and Deletions
While inserting a new node or deleting an existing node, the resulting tree may violate the
(stringent) AVL property. To reinstate the AVL property, we use rotations. See Figure
5.4.
Rotation in a BST
Left rotation and right rotation can be realized by a three-way rotation of pointers.
Left Rotation:
temp = p->right;
p->right = temp->left;
temp->left = p;
p = temp;
Left rotation and right rotation preserve:
* the BST property
* the inorder ordering of keys
A C sketch of both rotations is given below.
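A C sketch mirroring the pointer manipulation above; the node type and the use of a pointer-to-pointer for the subtree root are illustrative assumptions.

typedef struct node {
    struct node *left, *right;
    int key;
} node;

/* Left rotation: the right child becomes the new subtree root. */
void rotate_left(node **pp) {
    node *p = *pp;
    node *temp = p->right;
    p->right = temp->left;    /* temp's left subtree moves under p */
    temp->left = p;           /* p becomes temp's left child */
    *pp = temp;               /* temp is the new subtree root */
}

/* Right rotation, the mirror image of the above. */
void rotate_right(node **pp) {
    node *p = *pp;
    node *temp = p->left;
    p->left = temp->right;
    temp->right = p;
    *pp = temp;
}

Passing node** lets the rotation update whichever parent field (or root pointer) referred to the rotated subtree. Both rotations preserve the BST property and the inorder ordering of keys.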
Problem scenarios in AVL tree insertions:
o the left subtree of a node has height greater by 2:
  - the left child of the node is left-high (A)
  - the left child of the node is right-high (B)
o the right subtree of a node has height greater by 2:
  - the right child of the node is left-high (C)
  - the right child of the node is right-high (D)
The AVL tree property may be violated at any node, not necessarily the root. Fixing the
AVL property involves doing a series of single or double rotations.
A double rotation involves a left rotation followed or preceded by a right rotation.
In an AVL tree of height h, no more than O(h) rotations are required to fix the AVL
property.


Figure 5.5: Insertion in AVL trees: Scenario D




Figure 5.6: Insertion in AVL trees: Scenario C




Figure 5.7: Deletion in AVL trees: Scenario 1




Figure 5.8: Deletion in AVL trees: Scenario 2



Insertion, problem scenario 1 (scenario D): scenario A is symmetrical to this. See Figure 5.5.
Insertion, problem scenario 2 (scenario C): scenario B is symmetrical to this. See Figure 5.6.
Deletion, problem scenario 1: depending on the original height of T_2, the height of the tree
will either be unchanged (if the height of T_2 is h) or be reduced (if the height of T_2 is h - 1).
See Figure 5.7. There is a scenario symmetric to this.
Deletion, problem scenario 2: see Figure 5.8. As usual, there is a symmetric scenario.

2.4 2-3 Trees
Definition
A 2-3 Tree is a null tree (zero nodes) or a single node tree (only one node) or a multiple
node tree with the following properties:
1. Each interior node has two or three children
2. Each path from the root to a leaf has the same length.

Fields of a Node:
Internal node: (p1, k1, p2, k2, p3), where
p1: pointer to the first child
p2: pointer to the second child
p3: pointer to the third child
k1: smallest key that is a descendant of the second child
k2: smallest key that is a descendant of the third child
Leaf node: key, plus the other fields of the record.
Records are placed at the leaves; each leaf contains a record (and key). An illustrative C
declaration of this layout follows.
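An illustrative C declaration of this node layout; the tagged union distinguishing internal nodes from leaves is an assumption, not from the original notes.

typedef struct node23 {
    int is_leaf;                           /* 1 for a leaf, 0 for an internal node */
    union {
        struct {                           /* internal node */
            struct node23 *p1, *p2, *p3;   /* up to three children (p3 may be NULL) */
            int k1;                        /* smallest key under the second child */
            int k2;                        /* smallest key under the third child */
        } internal;
        struct {                           /* leaf node: the record lives here */
            int key;
            /* ... other fields of the record ... */
        } leaf;
    } u;
} node23;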
Example: See Figure 5.23


Figure 5.23: An example of a 2-3 tree

Search
The values recorded at the internal nodes can be used to guide the search path.
To search for a record with key value x, we first start at the root. Let k1 and k2 be the two
values stored there.
1. If x < k1, move to the first child.
2. If x ≥ k1 and the node has only two children, move to the second child.
3. If x ≥ k1 and the node has three children, move to the second child if x < k2, and to the
third child if x ≥ k2.
Eventually, we reach a leaf. x is in the tree if and only if x is at this leaf.
Path Lengths
A 2-3 tree with k levels has between 2^{k-1} and 3^{k-1} leaves.
Thus a 2-3 tree with n elements (leaves) requires
at least 1 + log₃ n levels
at most 1 + log₂ n levels
2.5 2-3 Trees: Insertion
For an example, see Figure 5.24.

Figure 5.24: Insertion in 2-3 trees: An example






Figure 5.25: Deletion in 2-3 trees: An Example



Insert (x)
1. Locate the node v which should be the parent of x.
2. If v has two children, make x another child of v, place it in the proper order, and
adjust k1 and k2 at node v to reflect the new situation.
3. If v has three children, split v into two nodes v and v'. Make the two smallest among the
four children stay with v and assign the other two as children of v'. Then recursively
insert v' among the children of w, where w is the parent of v.
The recursive insertion can proceed all the way up to the root, making it necessary
to split the root. In this case, create a new root, thus increasing the number of
levels by 1.
2.6 2-3 Trees: Deletion
Delete (x)
1. Locate the leaf L containing x and let v be the parent of L.
2. Delete L. This may leave v with only one child. If v is the root, delete v and let its lone
child become the new root; this reduces the number of levels by 1. If v is not the root, let
p be the parent of v.
- If p has a child with 3 children that is adjacent (a sibling) to v, transfer an appropriate
one of its children to v.
- If the children of p adjacent to v have only two children, transfer the lone child of v to an
adjacent sibling of v and delete v.
- If p now has only one child, repeat recursively with p in place of v. The recursion
can ripple all the way up to the root, leading to a decrease in the number of levels.
Example: See Figure 5.25.

Assignment questions:
Balanced Trees
1. Write short notes on balancing trees.
2. What is an AVL tree? What is the maximum height of an AVL tree? What are the
application areas of AVL trees?
3. Describe the operations performed on an AVL tree, with examples of insertions and
deletions.
4. What are 2-3 trees? Explain the insertion and deletion operations. What are the
application areas of 2-3 trees?
5. Start with an empty 2-3 tree and insert the keys 2, 1, 5, 6, 7, 4, 3, 8, 9, 10, and 11
in this order. Draw the 2-3 tree following each insertion. Then remove 5, 7, and 9 from
the 2-3 tree constructed for the above elements and draw the 2-3 tree after each removal.



UNIT-III
Topics: Priority Queues :
Binary Heaps : Implementation of Insert and Delete min, Creating Heap.
Binomial Queues : Binomial Queue Operations, Binomial Amortized Analysis, Lazy
Binomial Queues

3.1 Priority Queues
A priority queue is an important abstract data type in Computer Science. The major operations
supported by priority queues are INSERT and DELETEMIN. They find extensive use in:
- implementing schedulers in operating systems and distributed systems
- representing event lists in discrete event simulation
- implementing numerous graph algorithms efficiently
- selecting the k-th largest or k-th smallest element in a list (order statistics problems)
- sorting applications
Simple ways of implementing a priority queue include:
- linked lists, sorted and unsorted
- binary search trees
Binary Heaps
Heaps (occasionally called partially ordered trees) are a very popular data structure for
implementing priority queues.
A heap is either a min-heap or a max-heap. A min-heap supports the insert and deletemin
operations, while a max-heap supports the insert and deletemax operations.
Heaps can be binary or d-ary: binary heaps are special forms of binary trees, while d-ary
heaps are a special class of general trees.
Binary heaps were first introduced by Williams in 1964.
We discuss binary min-heaps; the discussion is identical for binary max-heaps.
DEF. A binary heap is a complete binary tree with elements from a partially ordered set, such
that the element at every node is less than (or equal to) the element at its left child and the
element at its right child. Figure 6.1 shows an example of a heap.
Figure 6.1: An example of a heap and its array
representation

Since a heap is a complete binary tree, the elements can be conveniently stored in an
array. If an element is at position i in the array, then its left child is at position 2i
and its right child at position 2i + 1. By the same token, a non-root element at
position i has its parent at position ⌊i/2⌋.
Because of its structure, a heap with height k has between 2^k and 2^{k+1} - 1 elements.
Therefore a heap with n elements has height ⌊log₂ n⌋.
Because of the heap property, the minimum element is always at the root of
the heap. Thus the findmin operation has worst-case O(1) running time.
3.2 Implementation of Insert and Deletemin
Insert
To insert an element, say x, into a heap with n elements, we first create a hole in position (n+1)
and see if the heap property would be violated by putting x into the hole. If the heap property is not
violated, then we have found the correct position for x. Otherwise, we ``push-up'' or ``percolate-
up'' x until the heap property is restored. To do this, we slide the element that is in the hole's
parent node into the hole, thus bubbling the hole up toward the root. We continue this process
until x can be placed in the hole. See Figure 6.2 for an example.


Figure 6.2: Insertion into a heap


The worstcase complexity of insert is O(h), where h is the height of the heap. Thus insertions are
O(log n), where n is the number of elements in the heap.
Deletemin
When the minimum is deleted, a hole is created at the root level. Since the heap now has one less
element and the heap is a complete binary tree, the element in the last position has to be
relocated. We first do this by placing the last element in the hole created at the root. This may
leave the heap property violated at the root level. We now ``push-down'' or ``percolate-
down'' the hole at the root until the violation of the heap property is stopped. While pushing down
the hole, it is important to slide it down to the smaller of its two children (pushing that child up);
this is done so as not to create another violation of the heap property. See Figure 6.3. It is easy to
see that the worst-case running time of deletemin is O(log n), where n is the number of elements
in the heap. A C sketch of both operations follows.
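A minimal C sketch of insert and deletemin on an array-based min-heap using the 1-based indexing described above; the Heap type and the fixed capacity are illustrative assumptions.

#include <stdio.h>

#define CAP 100

typedef struct {
    int a[CAP + 1];   /* a[1..n] holds the heap; a[0] is unused */
    int n;            /* current number of elements */
} Heap;

/* Percolate the hole up from the new last position until x fits. */
void heap_insert(Heap *h, int x) {
    int i = ++h->n;                      /* hole starts at position n+1 */
    while (i > 1 && h->a[i / 2] > x) {   /* parent of i is i/2 */
        h->a[i] = h->a[i / 2];           /* slide the parent down into the hole */
        i /= 2;
    }
    h->a[i] = x;
}

/* Remove and return the minimum; percolate the root hole down. */
int heap_deletemin(Heap *h) {
    int min = h->a[1];
    int last = h->a[h->n--];             /* last element must be relocated */
    int i = 1, child;
    while (2 * i <= h->n) {
        child = 2 * i;                   /* left child */
        if (child < h->n && h->a[child + 1] < h->a[child])
            child++;                     /* pick the smaller child */
        if (h->a[child] < last) {
            h->a[i] = h->a[child];       /* slide the smaller child up */
            i = child;
        } else break;
    }
    h->a[i] = last;
    return min;
}

int main(void) {
    Heap h = { .n = 0 };
    int keys[] = { 5, 9, 4, 2, 1, 6 }, i;
    for (i = 0; i < 6; i++) heap_insert(&h, keys[i]);
    for (i = 0; i < 6; i++) printf("%d ", heap_deletemin(&h));
    return 0;                            /* prints: 1 2 4 5 6 9 */
}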


Figure 6.3: Deletemin


3.3 Creating Heap
Given a set of n elements, the problem here is to create a heap of these elements.
The obvious approach is to insert the n elements, one at a time, into an initially empty heap.
Since the worstcase complexity of insert is O(log n), this approach has a worstcase
running time of O(n log n).
Another approach, proposed by Floyd in 1964, is to use a procedure called ``pushdown''
repeatedly, starting with the array consisting of the given n elements in the input order.
o The function pushdown(first, last) assumes that the elements a[first], ..., a[last]
obey the heap property, except possibly the children of a[first]. The function
pushes down a[first] until the heap property violation is stopped.
o The following code will accomplish what is required:
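One plausible realization, as a sketch assuming a 1-based array a[1..n] of ints (the original code did not survive extraction):

void pushdown(int a[], int first, int last)
{
    int r = first;                     /* r: position being pushed down */
    while (2 * r <= last) {
        int c = 2 * r;                 /* left child of r */
        if (c < last && a[c + 1] < a[c])
            c++;                       /* pick the smaller child */
        if (a[c] < a[r]) {
            int t = a[r]; a[r] = a[c]; a[c] = t;
            r = c;
        } else
            break;                     /* heap property restored */
    }
}

void buildheap(int a[], int n)
{
    int i;
    for (i = n / 2; i >= 1; i--)       /* leaves already obey the heap property */
        pushdown(a, i, n);
}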


Figure 6.4 shows an example of this for the array [5 9 4 2 1 6].
The worstcase complexity of this approach can be shown to be O(n) by virtue of the
following result.
Figure 6.4: Creation of heap


Result: For the perfect binary tree of height h containing 2^{h+1} - 1 nodes, the sum of the heights
of the nodes is 2^{h+1} - 1 - (h + 1).
Proof: The perfect (full) binary tree of height h has 1 node at height h, 2 nodes at height h - 1,
4 nodes at height h - 2, and so on.
Therefore the required sum S is given by
S = Σ_{i=0}^{h-1} 2^i (h - i) = h + 2(h - 1) + 4(h - 2) + ... + 2^{h-1} · 1
Multiplying both sides by 2 yields
2S = 2h + 4(h - 1) + 8(h - 2) + ... + 2^h · 1
Subtracting the first equation from the second, we obtain
S = 2^{h+1} - 1 - (h + 1)
It is easy to see that the above is an upper bound on the sum of the heights of the nodes of a
complete binary tree. Since a complete binary tree of height h has between 2^h and 2^{h+1}
nodes, the sum is therefore O(n), where n is the number of nodes in the heap.
Since the worstcase complexity of the heap building algorithm is of the order of the sum of
heights of the nodes of the heap built, we then have the worst-case complexity of heap building
as O (n).
3.4 Binomial Queues
Support merge, insert, and deletemin operations in O(log n) worstcase time.

A binomial queue is a forest of heap-ordered trees.
o Each of the heap-ordered trees is of a constrained form known as a binomial tree.


Figure 6.5: Examples of Binomial Trees


A binomial tree of height 0 is a one-node tree; a binomial tree B_k of height k is formed
by attaching a binomial tree B_{k-1} to the root of another binomial tree B_{k-1}.
See Figure 6.5 for examples of binomial trees.
A binomial tree B_k consists of a root with children B_0, B_1, ..., B_{k-1}.
B_k has exactly 2^k nodes, and the number of nodes at depth d is the binomial coefficient C(k, d).
If we impose heap order on the binomial trees and allow at most one binomial tree of any
height, we can uniquely represent a priority queue of any size by a forest of binomial
trees.
Example: a priority queue of size 13 could be represented by the forest B_3, B_2, B_0. A
natural way of writing this is: 1101. See Figure 6.6 for a binomial queue with six
elements.


Figure 6.6: A binomial queue H_1 with six elements


3.5 Binomial Queue Operations
Find-min
This is implemented by scanning the roots of all the trees. Since there are at most ⌈log n⌉ different
trees, this leads to a worstcase complexity of O(log n).
Alternatively, one can keep track of the current minimum and perform find-min in O(1) time if
we remember to update the minimum whenever it changes during other operations.
Merge
Let H_1 have 6 elements and H_2 have 7 elements. At an intermediate step of the merge,
we are left with
1 tree of height 0
3 trees of height 2
Note that of the 3 binomial trees of height 2, we could pick any pair to merge into a tree of the
next height. Since merging two binomial trees takes constant time and there are O(log n) binomial
trees, merge takes O(log n) time in the worstcase. See Figures 6.7 and 6.8 for two examples.
Figure 6.7: Examples of Merging




Figure 6.8: Merge of H_1 and H_2


Insertion
This is a special case of merging, since we merely create a one-node tree and perform a merge.
The worstcase complexity is therefore O(log n).
More precisely, if the priority queue into which the element is being inserted has the property
that the smallest nonexistent binomial tree is B_i, the running time is proportional to i + 1.
E.g., insertion into H_3 (which has 13 elements) terminates in two steps.
Since each tree of a particular degree is present in a binomial queue with probability 1/2, if we
define the random variable X as the number of steps in an insert operation, then:
Figure 6.9: Examples of Inserts



P{X = 1} = 1/2, P{X = 2} = 1/4, P{X = 3} = 1/8, and so on. Thus the average number of steps
in an insert operation is
E[X] = Σ_{i ≥ 1} i / 2^i = 2
so we expect an insertion to terminate in two steps on the average. Furthermore, performing n
inserts on an initially empty binomial queue takes O(n) worstcase time.
See Figures 6.9 and 6.10 for an example.
Deletemin
- First find the binomial tree with the smallest root; let this be B_k, and let H be the original
priority queue.
- Remove B_k from the forest in H, forming another binomial queue H'.
- Now remove the root of B_k, creating binomial trees B_0, B_1, ..., B_{k-1}, which collectively
form a binomial queue H''.
- Finally, merge H' and H''.

Figure 6.10: Merge of H' and H''


It is easy to see that the worstcase complexity of deletemin is O(log n).
Implementation of a Binomial Queue
- The deletemin operation requires the ability to find all subtrees of the root, so the children
of each node should be available (say, in a linked list).
- Deletemin requires that the children be ordered by the size of their subtrees.
- We need to make sure it is easy to merge trees. Two binomial trees can be merged only if
they have the same size, hence the size of a tree must be stored in its root. While
merging, one of the trees becomes the last child of the other, so we should keep track of
the last child of each node. A good data structure to use is a circular doubly linked list
in which each node has the following form:
data | first child | left sibling | right sibling | rank (number of children)
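An illustrative C declaration of this node form (the field names are assumptions):

typedef struct bnode {
    int data;
    struct bnode *first_child;
    struct bnode *left_sibling;    /* circular doubly linked list of siblings */
    struct bnode *right_sibling;
    int rank;                      /* rank of the subtree = number of children */
} bnode;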
3.6 Binomial Amortized Analysis
Amortized Analysis of Merge
To merge two binomial queues, an operation similar to the addition of binary integers is
performed. At any stage, we may have zero, one, two, or three B_k trees, depending on whether or not
the two priority queues contain a B_k tree and whether or not a B_k tree is carried over from
the previous step:
o If there is zero or one B_k tree, it is placed as a tree in the resulting binomial queue.
o If there are two, they are merged into a B_{k+1} tree and carried over.
o If there are three, one is retained and the other two are merged and carried over.
Result 1:
A binomial queue of n elements can be built by n successive insertions in O(n) time.
Brute force analysis: define the cost of each insertion to be
o 1 time unit plus an extra unit for each linking step.
Thus the total will be n units plus the total number of linking steps.
o Half of the insertions (the 1st, 3rd, and every odd-numbered one) require no linking steps,
since there is no B_0 present.
o A quarter of the insertions (the 2nd, 6th, 10th, ...) require only one linking step.
o One eighth of the insertions require two linking steps, and so on.
In this way we can bound the total number of linking steps by n.
However, this analysis does not help when we try to analyze a sequence of operations that
includes more than just insertions.
Amortized Analysis
Consider the result of an insertion:
o If there is no B_0 tree, then the insertion costs one time unit. The result of the insertion
is that there is now a B_0 tree, and the forest has one more tree.
o If there is a B_0 tree but no B_1 tree, then the insertion costs 2 time units. The new
forest will have a B_1 tree but no B_0 tree; the number of trees in the forest is
unchanged.
o An insertion that costs 3 time units creates a B_2 tree but destroys a B_0 and a B_1 tree,
yielding one less tree in the forest.
o In general, an insertion that costs c units results in a net increase of 2 - c trees, since
a B_{c-1} tree is created and all B_i trees with 0 ≤ i < c - 1 are removed.
Thus expensive insertions remove trees and cheap insertions create trees.
Let
t_i = the actual cost of the i-th insertion
c_i = the number of trees in the forest after the i-th insertion
We have c_0 = 0 and, for every insertion,
t_i + (c_i - c_{i-1}) = 2
Result 2:
The amortized running times of Insert, Deletemin, and Merge are O(1), O(log n), and
O(log n) respectively.
To prove this result we choose: potential function = number of trees in the queue.
Insertion
With t_i and c_i as above, the amortized cost of the i-th insertion is
a_i = t_i + (c_i - c_{i-1}) = 2 for all i
Therefore
Σ t_i = Σ a_i - (c_n - c_0) = 2n - (c_n - c_0)
As long as (c_n - c_0) is non-negative, we are done.
In any case, (c_n - c_0) is bounded by log n if we start with an empty queue.
Merge:
Assume that the two queues to be merged have n_1 and n_2 nodes with T_1 and T_2 trees,
and let n = n_1 + n_2. The actual time to perform the merge is
t_i = O(log n_1 + log n_2) = O(max(log n_1, log n_2)) = O(log n)
The potential change (c_i - c_{i-1}) is at most log n, since there can be at most log n trees
after the merge.
Deletemin:
The analysis here follows the same argument as for merge.
3.7 Lazy Binomial Queues
These are binomial queues in which merging is done lazily: to merge two binomial queues, we
simply concatenate the two lists of binomial trees. In the resulting forest, there may be several
trees of the same size.
Because of the lazy merge, merge and insert both take O(1) worstcase time.
Deletemin:
o Convert the lazy binomial queue into a standard binomial queue.
o Do deletemin as in a standard queue.
Fibonacci Heaps
A Fibonacci heap supports all basic heap operations in O(1) amortized time, with the exception of
deletemin and delete, which take O(log n) amortized time.
Fibonacci heaps generalize binomial queues by adding two new concepts:
- a different implementation of decrease-key;
- lazy merging: two heaps are merged only when it is required.
It can be shown that in a Fibonacci heap, any node of rank r ≥ 1 has at least F_{r+1} descendants,
where F_k denotes the k-th Fibonacci number.

Assignment Questions:

Priority Queues
1. What are binary heaps? What are the application areas of binary heaps?
2. Discuss the implementation of creating a heap, insert, and deletemin.
3. What are binomial queues? Explain the binomial queue operations.
4. Explain binomial amortized analysis.
5. What are lazy binomial queues?







UNIT-IV
Topics: Graphs: Operations on Graphs: vertex insertion, vertex deletion, find vertex, edge
addition, edge deletion. Graph Traversals: Depth First Search and Breadth First
Search (non-recursive).
Graph storage representation: adjacency matrix, adjacency lists.

4.1 Graphs (Directed Graphs)
In numerous situations in Computer Science, Mathematics, Engineering, and many other
disciplines, graphs have found wide use in representing arbitrary relationships among data
objects. There are two broad categories of graphs: directed graphs (digraphs) and undirected
graphs. In this chapter, we study some important graph-based algorithms in Computer Science
and examine data structure related issues.
A directed graph or digraph G comprises
1. a set of vertices V
2. a set of directed edges or arcs, E (an arc is an ordered pair of vertices)
Example: See Figure 7.1


Figure 7.1: A digraph with 4 vertices and 5
arcs

V = {1, 2, 3, 4}
E = {(1,2), (1,3), (2,4), (3,2), (4,3)}
If there is an arc (v, w), we say w is adjacent to v.
A path is a sequence of vertices v1, v2, ..., vn such that the vertex pairs (v1, v2), (v2, v3),
..., (vn-1, vn) are arcs in the graph. The length of a path is the number of arcs on the path.
A single vertex v by itself denotes a path of length zero.
A path v1, v2, ..., vn is said to be simple if the vertices are distinct, except possibly v1 and
vn. If v1 = vn and n > 1, we call such a path a simple cycle.

Graph operations:
Edge: a bidirectional connection between two vertices.
Arc: a directed connection from one vertex to another.
Create a vertex node which has the following fields:
1. pointer to the next vertex
2. data (name of the vertex)
3. indegree
4. outdegree
5. a boolean flag which tells whether the vertex has been visited or not
6. pointer to the arc list, where an arc node has the fields: 1. data (name of the destination
vertex), 2. a boolean flag, 3. pointer to the next arc.

1. Insert vertex: create a vertex node.
2. Delete vertex: to delete a vertex, check that its indegree and outdegree are zero; if not, we
cannot delete the vertex.
3. Insert arc: create an arc node and connect it to the corresponding vertex.
4. Delete arc: delete the arc node from the corresponding source and destination vertices.
5. Find vertex: traverse through the adjacency list and return the vertex if found, else report
an unsuccessful find.
Note: An implementation of the graph operations using an adjacency list is given at the end of
the unit; illustrative C declarations for these nodes are sketched below.
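Illustrative C declarations for the vertex and arc nodes just described; the field names are assumptions, and this sketch stores a pointer to the destination vertex in the arc rather than its name.

typedef struct arc {
    struct vertex *dest;          /* destination vertex of this arc */
    int processed;                /* boolean flag */
    struct arc *next_arc;        /* next arc out of the same vertex */
} arc;

typedef struct vertex {
    struct vertex *next_vertex;  /* next vertex in the vertex list */
    char *data;                  /* name of the vertex */
    int indegree, outdegree;
    int visited;                 /* boolean: visited or not */
    arc *first_arc;              /* head of this vertex's adjacency list */
} vertex;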
6. Graph traversal:
1. Depth first traversal
2. Breadth first traversal
4.2 Depth first Traversal:
An important way of traversing all vertices of a digraph, with applications in many
problems.
Let L[v] be the adjacency list for vertex v. To do a depth first search of all vertices
emanating from v, we have the following recursive scheme:

void dfs (vertex v)
{
    vertex w;
    mark v as visited;
    for (each vertex w in L[v])
        if (w is unvisited)
            dfs(w);
}

Figure 7.10: Depth first search of a digraph




To do a DFS on the entire digraph, we first mark all vertices as unvisited and then do a DFS
from every vertex that is still unvisited:

{
    for (each v in V)
        mark v as unvisited;
    for (each v in V)
        if (v is unvisited)
            dfs(v);
}
Example: See Figure 7.10.
DFS Arcs (N(v) denotes the DFS number of vertex v):
Tree arcs: a → b, N(a) < N(b), leading to unvisited vertices.
Non-tree arcs:
o back arcs: a → b, N(a) ≥ N(b), b is an ancestor of a
o forward arcs: a → b, N(a) < N(b), b is a proper descendant of a
o cross arcs: a → b, where b is neither an ancestor nor a descendant of a
4.3 Breadth First Traversal:
Level by level traversal
A queue data structure is used



Figure 7.11: Breadth-first search of the digraph in Figure 7.10


The complexity of both DFS and BFS is O(n + e), i.e., O(e) when e >= n.
Implementation of Breadth-First Search
void bfs (v)  /* visits all vertices connected to v
                 in breadth-first fashion */
vertex x, y;
vertexqueue Q;
{
    mark[v] = visited;
    enqueue (v, Q);
    while (Q not empty)
    {
        x = front (Q);
        dequeue (Q);
        for (each vertex y adjacent to x)
            if (mark[y] == unvisited)
            {
                mark[y] = visited;
                enqueue (y, Q);
                insert ((x, y), T);   /* add arc (x, y) to the BFS tree T */
            }
    }
}

Example: See Figure 7.11.
Undirected graphs:
G = (V, E)
Each edge is an unordered pair of vertices

v and w are adjacent if (v, w) or equivalently (w, v) is an edge. The edge (v, w) is incident
upon the vertices v and w.
A path is a sequence of vertices v1, v2, ..., vn, such that (vi, vi+1) is an edge for
1 <= i < n. A path is simple if all vertices on it are distinct, except possibly v1 and vn.
The path has length n - 1 and connects v1 and vn.
A graph is connected if every pair of vertices is connected by a path.
A subgraph G' = (V', E') of a graph G = (V, E) is a graph such that
1. V' is a subset of V
2. E' contains some edges (u, v) of E such that both u and v are in V'.
If E' contains all edges (u, v) of E for which u, v are in V', the subgraph G' is called an
induced subgraph.
A connected component of a graph is an induced subgraph which is connected and which is
not a proper subgraph of any other connected subgraph of G (i.e., a maximal connected induced
subgraph).
A simple cycle is a simple path of length at least 3 that connects a vertex to itself.
A graph is cyclic if it contains at least one (simple) cycle.
A free tree is a connected, acyclic graph.


Figure 8.1: Examples of undirected graphs




Two useful properties of free trees:
1. Every free tree with n vertices contains exactly n - 1 edges.
2. If we add any edge to a free tree, we get a cycle.
See Figure 8.1 for several examples.
Depth first traversal(undirected graphs):
See Figure 8.2 for an example. Assume the following adjacency lists.
Vertex Adj. List

a (b, c, d, e)
b (a, d, e)
c (a, f, g)
d (a, b, e)
e (a, b, d)
f (c, g)
g (c, f)

Figure 8.2: Depth-first search of an undirected graph

DFS of an undirected graph involves only two types of arcs.
1.Tree arcs.
2.Back arcs (there is no distinction between back arcs and forward arcs)
Cross arcs also don't exist because given any two vertices, if there exists an arc between
them, then one of them will be an ancestor and the other a descendant in the DFS. Thus
all arcs that would have been cross arcs in the DFS of a digraph would become tree arcs
in the case of an undirected graph.
Breadth First Traversal (Undirected Graphs)
Example: See Figure 8.3.
Assume the adjacency lists:
Vertex Adj. List

a (b, c, d, e)
b (a, d, e)
c (a, f, g)
d (a, b, e)
e (a, b, d)
f (c, g)
g (c, f)



Figure 8.3: Breadth-first search of an undirected graph







4.4 Graph storage representation:

1. Adjacency Matrix: The matrix is of order (n x n), where n is the number of vertices; entry
(i, j) is 1 if there is an arc from vertex i to vertex j, and 0 otherwise. (The matrix for
the example digraph is not reproduced here.)

2. Adjacency List: Figures 7.2 and 7.3 provide two different adjacency list representations
for the same digraph.

Figure 7.2: Adjacency list representation of the digraph


Figure 7.3: Adjacency list using cursors
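
To make the two representations concrete, here is a minimal sketch in standard C++ for the
digraph of Figure 7.1; the variable names M and adj are illustrative.

#include <vector>

int main()
{
    const int n = 4;
    // Adjacency matrix: M[i][j] == 1 iff there is an arc from vertex i+1 to j+1
    int M[n][n] = {
        {0, 1, 1, 0},   // arcs (1,2) and (1,3)
        {0, 0, 0, 1},   // arc  (2,4)
        {0, 1, 0, 0},   // arc  (3,2)
        {0, 0, 1, 0}    // arc  (4,3)
    };

    // Adjacency lists: adj[v] holds the vertices adjacent to v
    std::vector<std::vector<int>> adj(n + 1);
    adj[1] = {2, 3};
    adj[2] = {4};
    adj[3] = {2};
    adj[4] = {3};

    (void)M; (void)adj;   // the representations themselves are the point here
}

The matrix costs O(n^2) space regardless of the number of arcs, while the lists cost O(n + e).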

Graph operations implementation using C++:
The following program implements these operations on a graph: 1. vertex insertion,
2. vertex deletion, 3. finding a vertex, 4. arc insertion and deletion.

#include<iostream.h>
#include<conio.h>
#include<stdlib.h>
struct vertex
{
vertex *pnextvertex;
int data;
int indegree;
int outdegree;
short processed;
struct arc *parc;
};
struct arc
{
struct vertex *destination;
arc *pnextarc;
};
class graph
{
public:
int count;
vertex *first;

public:
graph(void);
//~graph(void);
int insertvertex(int);
int deletevertex(int);
int insertarc(int,int);
int deletearc(int,int);
int findvertex(int);
void display();
};
graph::graph(void)
{
first=NULL;
count=0;
}
int graph::insertvertex(int x)
{
vertex *locptr,*newptr,*predptr;
newptr=new vertex;
if(newptr)
{
newptr->pnextvertex=NULL;
newptr->data=x;
newptr->indegree=0;
newptr->outdegree=0;
newptr->processed=0;
newptr->parc=NULL;
count++;
}
else
return -1;
locptr=first;
if(!locptr)
first=newptr;
else
{
predptr=NULL;
while(locptr&&x>locptr->data)
{
predptr=locptr;
locptr=locptr->pnextvertex;
}
if(!predptr)
first=newptr;
else
predptr->pnextvertex=newptr;

newptr->pnextvertex=locptr;
}
return 1;
}
void graph::display()
{
vertex *p;
arc *ssl;
p=first;
while(p!=NULL)              // walk the vertex list
{
ssl=p->parc;
cout<<p->data;
while(ssl!=NULL)            // walk this vertex's arc list
{
cout<<"->" ;
cout<<ssl->destination->data;
ssl=ssl->pnextarc;
}
p=p->pnextvertex;
cout<<"\n";
}
}
int graph::deletevertex(int dv)
{
vertex *predptr,*walkptr;
if(!first)
return -2;
predptr=NULL;
walkptr=first;
while(walkptr&&dv>walkptr->data)
{
predptr=walkptr;
walkptr=walkptr->pnextvertex;
}
if(!walkptr||dv!=(walkptr->data))
return -2;
if(walkptr->indegree>0||(walkptr->outdegree>0))
return -1;
if(!predptr)
first=walkptr->pnextvertex;
else
predptr->pnextvertex=walkptr->pnextvertex;
count--;
delete walkptr;

return 1;
}
int graph::insertarc(int s,int d)
{
arc *newptr,*arcpredptr,*arcwalkptr;
vertex *vertfromptr,*verttoptr;
newptr=new arc;
if(!newptr)
return (-1);
vertfromptr=first;
while(vertfromptr&&s>vertfromptr->data)
vertfromptr=vertfromptr->pnextvertex;
if(!vertfromptr||s!=vertfromptr->data)
return -2;
verttoptr=first;
while(verttoptr&&d>verttoptr->data)
verttoptr=verttoptr->pnextvertex;
if(!verttoptr||d!=(verttoptr->data))
return -3;
++vertfromptr->outdegree;
++verttoptr->indegree;
newptr->destination=verttoptr;
if(!vertfromptr->parc)
{
vertfromptr->parc=newptr;
newptr->pnextarc=NULL;
return 1;
}
arcpredptr=NULL;
arcwalkptr=vertfromptr->parc;
while(arcwalkptr&&d>=arcwalkptr->destination->data)
{
arcpredptr=arcwalkptr;
arcwalkptr=arcwalkptr->pnextarc;
}
if(!arcpredptr)
vertfromptr->parc=newptr;
else
arcpredptr->pnextarc=newptr;
newptr->pnextarc=arcwalkptr;
return 1;
}
int graph::deletearc(int so,int de)
{
vertex *fromvertexptr,*tovertexptr;
arc *prearcptr,*arcwalkptr;

if(!first)
return (-2);
fromvertexptr=first;
while(fromvertexptr&&so>fromvertexptr->data)
fromvertexptr=fromvertexptr->pnextvertex;
if(!fromvertexptr||so!=fromvertexptr->data)
return -2;
if(!fromvertexptr->parc)
return -3;
prearcptr=NULL;
arcwalkptr=fromvertexptr->parc;
while(arcwalkptr&&de>arcwalkptr->destination->data)
{
prearcptr=arcwalkptr;
arcwalkptr=arcwalkptr->pnextarc;
}

if(!arcwalkptr||de!=arcwalkptr->destination->data)
return -3;
tovertexptr=arcwalkptr->destination;
--fromvertexptr->outdegree;
--tovertexptr->indegree;
if(!prearcptr)
fromvertexptr->parc=arcwalkptr->pnextarc;
else
prearcptr->pnextarc=arcwalkptr->pnextarc;
delete arcwalkptr;
return 1;
}
int graph::findvertex(int f)
{
vertex *predptr,*walkptr;
if(!first)
return -2;
predptr=NULL;
walkptr=first;
while(walkptr&&f>walkptr->data)
{
predptr=walkptr;
walkptr=walkptr->pnextvertex;
}
if(!walkptr||f!=walkptr->data)
return -2;
else
return 1;
}



main()
{
int ch,v,z,del,s,d,so,de,fi;
graph ob;
clrscr();
while(1)
{
cout<<"\n";

cout<<"MENU"<<"\n1.insertvertex\n2.display\n3.count\n4.deletevertex\n5.insertarc";
cout<<"\n6.deletearc\n7.findvertex\n8.exit";
cout<<"\n";
cout<<"enter ur choice from the menu:";
cin>>ch;
switch(ch)
{
case 1://insert vertex
cout<<"\n enter the vertex:";
cin>>v;
z=ob.insertvertex(v);
if(z==1)
cout<<"\ninsertvertex"<<" "<<v<<" "<<"is successful";
else
cout<<"\ninsertvertex"<<" "<<v<<" "<<"is unsuccessful due to
memory overflow";
break;
case 2:ob.display();
break;
case 3:cout<<"\nthe number of vertices in the adjacency list
are:"<<""<<ob.count;
break;
case 4:cout<<"\n enter the vertex to be deleted:";
cin>>del;
z=ob.deletevertex(del);
if(z==1)
cout<<"\n the vertex"<<" "<<del<<" "<<"is deleted successfully";
else if(z==-1)
cout<<"\n the vertex"<<" "<<del<<" "<<"cannot be deleted";
else
cout<<"\n the graph is empty or the vertex was not found";
break;
case 5:cout<<"\n enter the source vertex:";
cin>>s;

cout<<"\n enter the destination vertex:";
cin>>d;
z=ob.insertarc(s,d);
if(z==1)
cout<<"\n arc inserted successfully between"<<" "<<s<<" "<<d;
else if(z==-2||z==-3)
cout<<"\ncan't find source"<<" "<<s<<" "<<"or destination"<<" "<<d;
else
cout<<"\ngraph empty";
break;
case 6:cout<<"\nenter the source vertex:";
cin>>so;
cout<<"\n enter the destination vertex:";
cin>>de;
z=ob.deletearc(so,de);
if(z==1)
cout<<"\n arc deleted successfully between"<<so<<" "<<de;
else if(z==-2||z==-3)
cout<<"\n graph is empty or the vertices"<<" "<<so<<" "<<de<<" "<<"not found";
break;
case 7:cout<<"\n enter the vertex to be found:";
cin>>fi;
z=ob.findvertex(fi);
if(z==1)
cout<<"\n the vertex"<<" "<<fi<<"is found";
else
cout<<"\n the vertex"<<" "<<fi<<"is not found";
break;
case 8:exit(0);
default:cout<<"\n valid data.....try again";
}
}
}
OUTPUT:
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 1

Enter the vertex :1
Insert vertex 1 is successful
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 1
Enter the vertex :2
Insert vertex 2 is successful
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 1
Enter the vertex :3
Insert vertex 3 is successful
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 1
Enter the vertex :4
Insert vertex 4 is successful
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex

8.exit
Enter ur choice from the menu : 3
The number of vertices in the adjacency list are: 4
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 5
Enter the source vertex :1
Enter the destination vertex :3
Arc inserted successfully b/w 1 3
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 5
Enter the source vertex :1
Enter the destination vertex :2
Arc inserted successfully b/w 1 2
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 5
Enter the source vertex :3
Enter the destination vertex :1
Arc inserted successfully b/w 3 1
MENU
1.insert vertex
2.display
3.count

4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 2
1->2->3
2
3->1
4
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 3
The number of vertices in the adjacency list are: 4
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 4
Enter the vertex to be deleted :1
The vertex 1 cannot be deleted
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 4
Enter the vertex to be deleted :4
The vertex 4 is deleted successfully
MENU

1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 4
Enter the vertex to be deleted :2
The vertex 2 cannot be deleted
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 6
Enter the source vertex :1
Enter the destination vertex :2
Arc deleted successfully b/w 1 2
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit
Enter ur choice from the menu : 2
1->3
2
3->1
MENU
1.insert vertex
2.display
3.count
4.deletevertex
5.insertarc
6.deletearc
7.findvertex
8.exit

Enter ur choice from the menu : 8

Implementation of dft:
#include<iostream.h>
#include<conio.h>
#include<stdlib.h>
int cost[10][10],i,j,k,n,stk[10],top,v,visit[10],visited[10];
void main()
{
int m;
clrscr();
cout<<"enter no of vertices";
cin>>n;
cout<<"enter no of edges";
cin>>m;
cout<<"\nEDGES\n";
for(k=1;k<=m;k++)
{
cin>>i>>j;
cost[i][j]=1;
}
cout<<"enter intial vertex";
cin>>v;
cout<<"order of visited vertices";
cout<<v<<" ";
visited[v]=1;
k=1;
while(k<n)
{
for(j=n;j>=1;j--)
if(cost[v][j]!=0 && visited[j]!=1 && visit[j]!=1)
{
visit[j]=1;
stk[top]=j;
top++;
}
v=stk[--top];
cout<<v<<" ";
k++;
visit[v]=0;
visited[v]=1;   // record v as fully visited
}
getch();
}



Implementation of bft:
#include<iostream.h>
#include<conio.h>
#include<stdlib.h>
int cost[10][10],i,j,k,n,qu[10],front,rare,v,visit[10],visited[10];
void main()
{
int m;
clrscr();
cout<<"enter no of vertices";
cin>>n;
cout<<"enter no of edges";
cin>>m;
cout<<"\nEDGES\n";
for(k=1;k<=m;k++)
{
cin>>i>>j;
cost[i][j]=1;
}
cout<<"enter intial vertex";
cin>>v;
cout<<"order of visited vertices";
cout<<v<<" ";
visited[v]=1;
k=1;
while(k<n)
{
for(j=1;j<=n;j++)
if(cost[v][j]!=0 && visited[j]!=1 && visit[j]!=1)
{
visit[j]=1;
qu[rare++]=j;
}
v=qu[front++];
cout<<v<<" ";
k++;
visit[v]=0;
visited[v]=1;   // record v as fully visited
}
getch();
}


Assignment Questions:

Graphs

1. Write notes on the following operations on graphs:
i. Vertex insertion
ii. vertex deletion
iii. find vertex
iv. edge addition
v. edge deletion
2. Discuss Graph Traversals- Depth First Search and Breadth First Search.
3. Give the implementation of DFS and BFS (Non recursive) .
4. Explain Graph storage Representation
i. Adjacency matrix
ii. adjacency lists.























UNIT-5
Topics: Graph algorithms : Minimum-Cost Spanning Trees- Prim's Algorithm, Kruskal's
Algorithm Shortest Path Algorithms: Dijkstra's Algorithm, All Pairs Shortest Paths
Problem:Floyd's Algorithm, Warshall's Algorithm.

5.1 Graph algorithms:
Minimum-Cost Spanning Trees
Let G = (V, E) be a connected graph in which each edge (u, v) in E has an associated cost
C(u, v).
A spanning tree for G is a subgraph of G that is a free tree connecting all the vertices in
V. The cost of a spanning tree is the sum of the costs of its edges.
An MST of G is a spanning tree of G having a minimum cost.
See Figure 8.4 for several examples.
Figure 8.4: Spanning trees in a connected graph

MST Property

Suppose G = (V, E) is a connected graph with costs defined on all edges e in E. Let U be some
proper subset of V. If (u, v) is an edge of lowest cost such that u is in U and v is in
V - U, then there exists an MST that includes (u, v) as an edge. See Figure 8.5.

Figure 8.5: An illustration of MST property

Proof of MST Property
Suppose to the contrary that there is no MST for the graph G which includes the edge (u, v).
Let T be any MST of G. By our assumption, T does not contain (u, v). Adding (u, v) to T will
introduce a cycle, since T is a free tree. This cycle involves (u, v); therefore there is a
path from v to u that does not pass through this edge. This means that there is another edge
(u', v') in T such that u' is in U and v' is in V - U. See Figure 8.6.

Figure 8.6: Construction of a minimal spanning tree


Deleting edge (u', v') breaks the cycle and yields a spanning tree T' whose cost is certainly
no more than that of T, since C(u, v) <= C(u', v'). Thus we have constructed an MST that
includes (u, v).

Figure 8.7: An example graph for finding an MST

To illustrate, consider the graph in Figure 8.7 and refer to Figures 8.8 and 8.9. Consider the sets:
U = {1, 2, 5}

V - U = {3, 4, 6}.
Figure 8.8: A spanning tree in the above graph, with cost 26

Spanning tree with cost = 26. Now, least cost edge from U to V - U is (1,3).
By including (1,3) in the above spanning tree, a cycle will form (for example, 1-2-3-1). Let us
replace the edge (2,3) by the edge (1,3).

Figure 8.9: Another spanning tree, but with cost 22


This has yielded a spanning tree with cost = 22.
DEF.
A set of edges T in a connected graph is promising if it can be extended to produce a
minimal spanning tree for the graph.
By definition, the empty set T is always promising, since a weighted connected graph always
has at least one MST.
Also, if a promising set of edges T is a spanning tree, then it must be an MST.
Def: An edge is said to leave a given set of nodes if exactly one end of this edge is in the set.
MST Lemma: Let
G = (V, E) be a weighted connected graph,
U be a strict subset of the nodes of G,
T be a promising set of edges in E such that no edge in T leaves U, and
e be a least cost edge that leaves U.
Then the set of edges T' = T union {e} is promising.

Figure 8.10: Illustration of MST Lemma

Proof
Since T is promising, it can be extended to obtain an MST, say S. If e is in S, there is
nothing to prove.
If e is not in S, then when we add edge e to S we create exactly one cycle (since S is a
spanning tree). In this cycle, since e leaves U, there exists at least one other edge, say f,
that also leaves U (otherwise the cycle could not close). See Figure 8.10.
If we now remove f, the cycle disappears and we obtain a new tree R that spans G.
Note that R = (S union {e}) - {f}.
Also note that weight of e <= weight of f, since e is a least cost edge leaving U.
Therefore R is also an MST and it includes edge e. Furthermore, T union {e} is contained in
R, and so can be extended to the MST R. Thus T union {e} is a promising set of edges.
5.2 Prim's Algorithm
This algorithm is directly based on the MST property. Assume that V = {1, 2, ..., n}.
{
    T = empty set;
    U = { 1 };
    while (U != V)
    {
        let (u, v) be the lowest cost edge such that u is in U and v is in V - U;
        T = T union {(u, v)};
        U = U union {v};
    }
}
See Figure 8.11 for an example. This is an O(n^2) algorithm.
Proof of Correctness of Prim's Algorithm
Theorem: Prim's algorithm finds a minimum spanning tree.
Proof: Let G = (V,E) be a weighted, connected graph. Let T be the edge set that is grown in
Prim's algorithm. The proof is by mathematical induction on the number of edges in T and using
the MST Lemma.

Basis: The empty set is promising since a connected, weighted graph always has at least one
MST.
Induction Step: Assume that T is promising just before the algorithm adds a new edge
e = (u, v). Let U be the set of nodes grown in Prim's algorithm. Then all three conditions
in the MST Lemma are satisfied, and therefore T union {e} is also promising.
When the algorithm stops, U includes all vertices of the graph and hence T is a spanning
tree. Since T is also promising, it will be an MST.
Implementation of Prim's Algorithm
Use two arrays, closest and lowcost.
For i in V - U, closest[i] gives the vertex in U that is closest to i.
For i in V - U, lowcost[i] gives the cost of the edge (i, closest[i]).
Figure 8.11: Illustration of Prim's algorithm

Figure 8.12: An example graph for illustrating Prim's algorithm


At each step, we can scan lowcost to find the vertex in V - U that is closest to U. Then we
update lowcost and closest taking into account the new addition to U.
Complexity: O(n^2)
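
As a concrete illustration of the closest/lowcost bookkeeping, here is a minimal O(n^2)
sketch in standard C++; the cost matrix C, the constant INF, and the function name prim are
illustrative (vertices are 0 .. n-1, with vertex 0 playing the role of vertex 1 above).

#include <limits>
#include <vector>

const double INF = std::numeric_limits<double>::infinity();

double prim(const std::vector<std::vector<double>>& C)
{
    int n = (int)C.size();
    std::vector<double> lowcost(n);
    std::vector<int> closest(n);
    std::vector<bool> inU(n, false);

    inU[0] = true;                        // U = {0}
    for (int i = 1; i < n; ++i) {
        lowcost[i] = C[0][i];             // cost of the edge (i, closest[i])
        closest[i] = 0;
    }
    double total = 0;
    for (int step = 1; step < n; ++step) {
        int w = -1;                       // scan lowcost to find the vertex
        for (int i = 1; i < n; ++i)       // in V - U closest to U
            if (!inU[i] && (w == -1 || lowcost[i] < lowcost[w]))
                w = i;
        total += lowcost[w];              // add edge (w, closest[w]) to T
        inU[w] = true;                    // U = U union {w}
        for (int i = 1; i < n; ++i)       // update lowcost and closest
            if (!inU[i] && C[w][i] < lowcost[i]) {
                lowcost[i] = C[w][i];
                closest[i] = w;
            }
    }
    return total;                         // cost of the MST
}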
Example: Consider the graph shown in Figure 8.12.


U = {1}    V - U = {2, 3, 4, 5, 6}

i in V-U   closest[i] (in U)   lowcost[i]
2          1                   6
3          1                   1
4          1                   5
5          1                   infinity
6          1                   infinity

Select vertex 3 to include in U


U = {1, 3}    V - U = {2, 4, 5, 6}

i in V-U   closest[i] (in U)   lowcost[i]
2          3                   5
4          1                   5
5          3                   6
6          3                   4

Now select vertex 6

U = {1, 3, 6}    V - U = {2, 4, 5}

i in V-U   closest[i] (in U)   lowcost[i]
2          3                   5
4          6                   2
5          3                   6

Now select vertex 4, and so on
5.3 Kruskal's Algorithm
Complexity is O(e log e), where e is the number of edges. It can be made even more efficient
by a proper choice of data structures. It is a greedy algorithm.
Algorithm:

Let G = (V, E) be the given graph, with |V| = n.
{
    Start with a graph T = (V, empty set) consisting of only the
    vertices of G and no edges; /* This can be viewed as n connected
    components, each vertex being one connected component */
    Arrange E in the order of increasing costs;
    for (i = 1; i <= n - 1; i++)
    {
        Select the next smallest cost edge;
        if (the edge connects two different connected components)
            add the edge to T;
    }
}

At the end of the algorithm, we will be left with a single component that comprises all the
vertices, and this component will be an MST for G.
Proof of Correctness of Kruskal's Algorithm
Theorem: Kruskal's algorithm finds a minimum spanning tree.
Proof: Let G = (V, E) be a weighted, connected graph. Let T be the edge set that is
grown in Kruskal's algorithm. The proof is by mathematical induction on the number of
edges in T.
o We show that if T is promising at any stage of the algorithm, then it is still
promising when a new edge is added to it in Kruskal's algorithm
o When the algorithm terminates, it will happen that T gives a solution to the
problem and hence an MST.
Basis: T = empty set is promising, since a weighted connected graph always has at least one
MST.
Induction Step: Let T be promising just before adding a new edge e = (u, v). The edges of T
divide the nodes of G into one or more connected components; u and v will be in two different
components. Let U be the set of nodes in the component that includes u. Note that
o U is a strict subset of V,
o T is a promising set of edges such that no edge in T leaves U (since an edge of T either
has both ends in U or has neither end in U),
o e is a least cost edge that leaves U (since Kruskal's algorithm, being greedy, would have
chosen e only after examining edges shorter than e).
The above three conditions are precisely those of the MST Lemma, and hence we can conclude
that T union {e} is also promising. When the algorithm stops, T gives not merely a spanning
tree but a minimal spanning tree, since it is promising.

Figure 8.13: An illustration of Kruskal's algorithm


Program
void kruskal (vertex-set V; edge-set E; edge-set T)
int ncomp; /* current number of components */
priority-queue edges /* partially ordered tree */
mfset components; /* merge-find set data structure */
vertex u, v; edge e;
int nextcomp; /* name for new component */
int ucomp, vcomp; /* component names */
{
makenull (T); makenull (edges);
nextcomp = 0; ncomp = n;
for (each v in V) /* initialize a component to have one vertex of V */
{ nextcomp++ ;
initial (nextcomp, v, components);
}
for (each e in E)
insert (e, edges); /* initialize priority queue of edges */
while (ncomp > 1)
{
e = deletemin (edges);
let e = (u, v);
ucomp = find(u, components);
vcomp = find(v, components);
if (ucomp != vcomp)
{
merge (ucomp, vcomp, components);
ncomp = ncomp - 1;
}
}
}

Implementation
Choose a partially ordered tree for representing the sorted set of edges
To represent connected components and interconnecting them, we need to implement:
1. MERGE (A, B, C): merge components A and B of C and call the result A or B, arbitrarily.
2. FIND (v, C): returns the name of the component of C of which vertex v is a member. This
operation is used to determine whether the two vertices of an edge are in the same or in
different components.
3. INITIAL (A, v, C): makes A the name of the component in C containing only one vertex,
namely v.
The above data structure is called an MFSET
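
A minimal sketch of an MFSET in standard C++ follows, using the common parent-array
realization with union by size and path compression (this particular realization is an
assumption; the text only specifies the MERGE/FIND/INITIAL interface).

#include <utility>
#include <vector>

struct MFSET {
    std::vector<int> parent, size;
    MFSET(int n) : parent(n), size(n, 1) {   // INITIAL: n singleton components
        for (int v = 0; v < n; ++v) parent[v] = v;
    }
    int find(int v) {                        // FIND, with path compression
        return parent[v] == v ? v : parent[v] = find(parent[v]);
    }
    void merge(int a, int b) {               // MERGE, by size
        a = find(a); b = find(b);
        if (a == b) return;
        if (size[a] < size[b]) std::swap(a, b);
        parent[b] = a;
        size[a] += size[b];
    }
};

In Kruskal's loop, an edge (u, v) is accepted exactly when find(u) != find(v), after which
merge(u, v) joins the two components.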
Running Time of Kruskal's Algorithm
Creation of the priority queue: if there are e edges, it is easy to see that it takes
O(e log e) time to insert the edges into a partially ordered tree; O(e) algorithms are also
possible for this step.
Each deletemin operation takes O(log e) time in the worst case. Thus finding and deleting the
least-cost edges, over all the while iterations, contributes O(e log e) in the worst case.
The total time for performing all the merge and find operations depends on the method used:
O(e log e) without path compression, and O(e alpha(e)) with path compression, where alpha(e)
is the inverse of an Ackermann function.
Example: See Figure 8.13.
E = {(1,3), (4,6), (2,5), (3,6), (3,4), (1,4), (2,3), (1,2), (3,5), (5,6) }
5.4 Single Source Shortest Paths Problem: Dijkstra's Algorithm
Greedy algorithm
It works by maintaining a set S of "special" vertices whose shortest distance from the source
is already known. At each step, a "non-special" vertex is absorbed into S. The absorption of
an element of V - S into S is done by a greedy strategy.
Let V = {1, 2, ..., n}, let vertex 1 be the source, and let C[i, j] be the cost of the arc
(i, j), taken as infinite if the arc is not present (7.1). The array entry D[i] holds the
cost of the current shortest special path from the source to i. The following are the steps
of the algorithm.
{
    S = { 1 };
    for (i = 2; i <= n; i++)
        D[i] = C[1, i];
    for (i = 1; i <= n - 1; i++)
    {
        choose a vertex w in V - S such that D[w] is a minimum;
        S = S union {w};
        for (each vertex v in V - S)
            D[v] = min (D[v], D[w] + C[w, v]);
    }
}
The above algorithm gives the costs of the shortest paths from source vertex to every
other vertex.
The actual shortest paths can also be constructed by modifying the above algorithm.
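
A minimal sketch of the algorithm in standard C++ follows; the matrix C, the constant INF,
and the function name dijkstra are illustrative, with vertices 0 .. n-1 and vertex 0 as the
source.

#include <limits>
#include <vector>

const double INF = std::numeric_limits<double>::infinity();

std::vector<double> dijkstra(const std::vector<std::vector<double>>& C)
{
    int n = (int)C.size();
    std::vector<bool> inS(n, false);
    std::vector<double> D(n);
    inS[0] = true;                       // S = {source}
    for (int i = 1; i < n; ++i)
        D[i] = C[0][i];                  // cost of the direct arc, or INF
    D[0] = 0;
    for (int step = 1; step < n; ++step) {
        int w = -1;                      // choose w in V - S with D[w] minimal
        for (int v = 1; v < n; ++v)
            if (!inS[v] && (w == -1 || D[v] < D[w]))
                w = v;
        inS[w] = true;                   // S = S union {w}
        for (int v = 1; v < n; ++v)      // relax arcs out of w
            if (!inS[v] && D[w] + C[w][v] < D[v])
                D[v] = D[w] + C[w][v];
    }
    return D;                            // shortest path costs from the source
}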
Theorem: Dijkstra's algorithm finds the shortest paths from a single source to all other nodes of
a weighted digraph with positive weights.
Proof: Let V = {1, 2, ..., n} and let 1 be the source vertex. We use mathematical induction
to show that
(a) if a node i (not equal to 1) is in S, then D[i] gives the length of the shortest path
from the source to i;
(b) if a node i is not in S, then D[i] gives the length of the shortest special path from the
source to i.
Basis: Initially S = {1}, and hence (a) is vacuously true. For i not in S, the only special
path from the source is the direct edge (if present) from the source to i, and D is
initialized accordingly. Hence (b) is also true.
Induction for condition (a)
The induction hypothesis is that both (a) and (b) hold just before we add a new vertex v
to S.
For every node already in S before adding v, nothing changes, so condition (a) is still
true.
We only have to show (a) for v, which is just added to S.



Figure 7.4: The shortest path to v cannot visit x

Before adding it to S, we must check that D[v] gives the length of the shortest path from
source to v. By the induction hypothesis, D[v] certainly gives the length of the shortest
special path. We therefore have to verify that the shortest path from the source to v does
not pass through any nodes that do not belong to S.

Suppose to the contrary that when we follow the shortest path from the source to v, we
encounter nodes not belonging to S. Let x be the first such node encountered (see Figure
7.4). The initial segment of the path from the source to x is a special path, and by part (b)
of the induction hypothesis, the length of this path is D[x]. Since edge weights are
non-negative, the total distance to v via x is greater than or equal to D[x]. However, since
the algorithm has chosen v ahead of x, D[x] >= D[v]. Thus the path via x cannot be shorter
than the shortest special path leading to v.
Induction step for condition (b): Let w be a node not in S, w != v. When v is added to S,
there are two possibilities for the shortest special path from the source to w:
1. It remains as before.
2. It now passes through v (and possibly other nodes in S).
In the first case, there is nothing to prove. In the second case, let y be the last node of S visited
before arriving at w. The length of such a path is D[y] + C[y,w].
At first glance, to compute the new value of D[w], it looks as if we should compare the old
value of D[w] with D[y] + C[y, w] for every y in S (including v). This comparison was,
however, already made for all y in S except v, when y was added to S in the algorithm. Thus
the new value of D[w] can be computed simply by comparing the old value with D[v] + C[v, w],
which is what the algorithm does.
When the algorithm stops, all the nodes but one are in S, and it is clear that the vector
(D[1], D[2], ..., D[n]) contains the lengths of the shortest paths from the source to the
respective vertices.
Example: Consider the digraph in Figure 7.5.

Figure 7.5: A digraph example for Dijkstra's algorithm


Initially:
S = {1};  D[2] = 10, D[3] = infinity, D[4] = 30, D[5] = 100
Iteration 1
Select w = 2, so that S = {1, 2}
D[3] = min(infinity, D[2] + C[2, 3]) = 60 (7.2)
D[4] = min(30, D[2] + C[2, 4]) = 30 (7.3)
D[5] = min(100, D[2] + C[2, 5]) = 100
Iteration 2
Select w = 4, so that S = {1, 2, 4}
D[3] = min(60, D[4] + C[4, 3]) = 50 (7.4)
D[5] = min(100, D[4] + C[4, 5]) = 90
Iteration 3
Select w = 3, so that S = {1, 2, 4, 3}
D[5] = min(90, D[3] + C[3, 5]) = 60
Iteration 4
Select w = 5, so that S = {1, 2, 4, 3, 5}
D[2] = 10 (7.5)
D[3] = 50 (7.6)
D[4] = 30 (7.7)
D[5] = 60
Complexity of Dijkstra's Algorithm
With an adjacency matrix representation, the running time is O(n^2). By using an adjacency
list representation and a partially ordered tree data structure for organizing the set V - S,
the complexity can be shown to be O(e log n), where e is the number of edges and n is the
number of vertices in the digraph.
All Pairs Shortest Paths Problem: Floyd's Algorithm
Given a weighted digraph G = (V, E) with cost matrix C, the problem is to find the shortest
path between every ordered pair of vertices.

5.5 Floyd's Algorithm
This is an O(n^3) algorithm, where n is the number of vertices in the digraph.
It uses the principle of Dynamic Programming.
Let V = {1, 2, ..., n}. The algorithm uses a matrix A[1..n][1..n] to compute the lengths of
the shortest paths. Initially,
A[i, j] = C[i, j]   if i != j     (7.12)
A[i, i] = 0
Note that C[i, j] is taken as infinite if there is no directed arc from i to j.
The algorithm makes n passes over A. Let A0, A1, ..., An be the values of A on the n passes,
with A0 being the initial value. Just after the (k-1)th pass, A(k-1)[i, j] gives the length
of the shortest path from i to j that does not pass through any vertex numbered k or higher
(the endpoints i and j themselves excepted) (7.13, 7.14).
See Figure 7.8.
The kth pass explores whether the vertex k lies on an optimal path from i to j, for all
pairs i, j.


Figure 7.8: Principle of Floyd's algorithm

We use
A_k[i, j] = min( A_(k-1)[i, j], A_(k-1)[i, k] + A_(k-1)[k, j] )
The algorithm:

void Floyd (float C[n][n], float A[n][n])
{
    int i, j, k;
    for (i = 0; i <= n - 1; i++)
        for (j = 0; j <= n - 1; j++)
            A[i][j] = C[i][j];
    for (i = 0; i <= n - 1; i++)
        A[i][i] = 0;
    for (k = 0; k <= n - 1; k++)
        for (i = 0; i <= n - 1; i++)
            for (j = 0; j <= n - 1; j++)
                if (A[i][k] + A[k][j] < A[i][j])
                    A[i][j] = A[i][k] + A[k][j];
}

Figure 7.9: A digraph example for Floyd's algorithm


Example: See Figure 7.9.
Starting from the cost matrix C of this digraph, the algorithm computes the successive
matrices A0, A1, A2, and A3 (7.15); the intermediate matrices are omitted here.
With an adjacency matrix representation, Floyd's algorithm has a worst case complexity of
O(n^3), where n is the number of vertices.
If Dijkstra's algorithm is used for the same purpose, then with an adjacency list
representation the worst case complexity will be O(ne log n). Thus if e is O(n^2), the
complexity will be O(n^3 log n), while if e is O(n), the complexity is O(n^2 log n).
5.6 Warshall's Algorithm

Def: Given the adjacency matrix C of any digraph G = (V, E), the matrix A is called the
transitive closure of C if, for all i, j in V,
A[i, j] = 1 if there is a path of length one or more from vertex i to vertex j,
        = 0 otherwise.
Warshall's algorithm enables us to compute the transitive closure of the adjacency matrix of
any digraph. Warshall's algorithm predates Floyd's algorithm and simply uses the following
formula in the kth pass of Floyd's algorithm:
A_k[i, j] = A_(k-1)[i, j] or (A_(k-1)[i, k] and A_(k-1)[k, j])
where the matrix elements are treated as boolean values 0 or 1 and "or" and "and" denote
logical or and logical and respectively. It is easy to see that Warshall's algorithm has a
worst case complexity of O(n^3), where n is the number of vertices of the graph.
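
A minimal in-place sketch in standard C++ (the function name warshall is illustrative): A
starts as the boolean adjacency matrix and ends as the transitive closure.

#include <vector>

void warshall(std::vector<std::vector<int>>& A)
{
    int n = (int)A.size();
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                // A[i][j] = A[i][j] OR (A[i][k] AND A[k][j])
                A[i][j] = A[i][j] | (A[i][k] & A[k][j]);
}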

Assignment Questions:
Graph algorithms
1. Explain Minimum-Cost Spanning Trees
i. Prim's Algorithm,
ii. Kruskal's Algorithm
2. Write notes on Shortest Path Algorithms: Dijkstra's Algorithm.
3. Explain All Pairs Shortest Paths Problem:
i. Floyd's Algorithm
ii. Warshall's Algorithm






















UNIT-VI
Topics: Sorting Methods : Order Statistics: Lower Bound on Complexity for Sorting
Methods: Lower Bound on Worst Case Complexity, Lower Bound on Average Case
Complexity, Heap Sort, Quick Sort, Radix Sorting, Merge Sort.


6.1 Order statistics:
Given a list of n records and an integer k, find the record whose key is the kth in the
sorted order of keys. Note that
k = 1 corresponds to finding the minimum,
k = n corresponds to finding the maximum, and
k = ceil(n/2) corresponds to finding the median.
Algorithm 1:
One can arrange the n keys into a heap and pick the kth largest in k steps, where each step
takes logarithmic time in the number of elements of the heap. Since the construction of the
initial heap can be accomplished in worst case O(n) time, the worst case complexity of this
algorithm is
O(n + k log n).
Therefore, if k is O(n / log n), or symmetrically if n - k is O(n / log n), the above
algorithm will have O(n) worst case complexity.
Algorithm 2:
A variation of quicksort.
Select (i, j, k)  /* finds the kth element among A[i] ... A[j] */
{
    pick a pivot element v;
    partition A[i] ... A[j] so as to get
        A[i] ... A[m - 1] with keys < v and
        A[m] ... A[j] with keys >= v;
    if (k <= m - i)
        Select (i, m - 1, k);
    else
        Select (m, j, k - (m - i));
}
Worst case complexity of the above algorithm is O(n^2) (as in quicksort).
Average Case:
o Note that Select calls itself only once at a time, whereas quicksort called itself twice
each time.
o On an average, Select calls itself on a subarray about half as long as the given subarray.
To be conservative, suppose that each call of Select is on an array some fixed fraction
alpha < 1 of the size of the array in the previous call. Then
T(n) <= Cn + T(alpha n),
which can be shown to be O(n).
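
The following is a minimal iterative sketch of Algorithm 2 in standard C++, using a
Lomuto-style partition with the last element as pivot for clarity (the text's pivot rule
would work equally well); the name select_kth is illustrative.

#include <utility>
#include <vector>

// Returns the kth smallest element of a (1 <= k <= a.size()).
int select_kth(std::vector<int> a, int k)
{
    int lo = 0, hi = (int)a.size() - 1;
    while (true) {
        if (lo == hi) return a[lo];
        int pivot = a[hi];
        int m = lo;                       // a[lo..m-1] < pivot after the scan
        for (int i = lo; i < hi; ++i)
            if (a[i] < pivot)
                std::swap(a[i], a[m++]);
        std::swap(a[m], a[hi]);           // pivot lands in its final position m
        int rank = m - lo + 1;            // pivot's rank within a[lo..hi]
        if (k == rank) return a[m];
        if (k < rank) hi = m - 1;         // recurse left: Select(lo, m-1, k)
        else { k -= rank; lo = m + 1; }   // recurse right with adjusted k
    }
}

For example, select_kth({7, 2, 9, 4, 1}, 3) returns 4, the third smallest key.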
Algorithm 3:
This is also a variation of quicksort; the main idea is to find a good pivot. Assume that all
the elements are distinct; the algorithm works otherwise too.
1. Divide the n elements into groups of 5, leaving aside between 0 and 4 elements that
cannot be placed in a group. Sort each group of 5 elements by any algorithm and take the
middle element from each group. This yields about n/5 middle elements (medians).
2. Use the Select algorithm recursively to find the median of these n/5 middle elements, and
choose it as the pivot.
Note that this pivot exceeds about half of the middle elements, i.e., about n/10 of them, and
each of those middle elements exceeds two elements of its own group. Thus the pivot exceeds
at least about 3n/10 elements. Also, by a similar argument, the pivot is less than at least
about 3n/10 elements.
If n >= 75, this guarantees that the pivot is greater than at least n/4 of the elements and
less than at least n/4 of the elements. Consequently, when we partition an array with this
pivot, the kth element is isolated to within a range of at most 3n/4 of the elements.

Implementation
keytype select (int i, int j, int k)
/* returns the key of the kth element among A[i] ... A[j] */
{
    if ((j - i) < 75)
        find the kth element by some simple algorithm;
    else {
        for (m = 0; m <= (j - i - 4)/5; m++)
            find the third (middle) element among A[i + 5m] ... A[i + 5m + 4]
            and swap it with A[i + m];   /* gather the group medians at the front */
        pivot = select (i, i + (j - i - 4)/5, (j - i - 4)/10);
        m = partition (i, j, pivot);
        if (k <= m - i)
            return (select (i, m - 1, k));
        else
            return (select (m, j, k - (m - i)));
    }
}
The worst case complexity of the above algorithm can be described by
T(n) <= C1                          if n <= 75
T(n) <= C2 n + T(n/5) + T(3n/4)     if n > 75
where T(n/5) accounts for the recursive call that finds the median of the middle elements and
T(3n/4) for the recursive call on the remaining range.
Let us guess the solution T(n) <= Cn. For n <= 75 this holds for C large enough; assume
T(m) <= Cm for all m < n. Then
T(n) <= C2 n + Cn/5 + 3Cn/4 = C2 n + (19/20) Cn.
Picking C >= 20 C2, we have C2 n - Cn/20 <= 0, and hence T(n) <= Cn. Thus T(n) is O(n).
Instead of groups of 5, we could choose groups of 7; it is easy to show that the worst case
complexity is again O(n). In general, any size of the groups that ensures that the sum of the
two arguments of T(.) is less than n will yield O(n) worst case complexity.
6.2 Lower bound on complexity for sorting methods:
Result 1: The worst case complexity of any sorting algorithm that only uses key comparisons
is Omega(n log n).
Result 2: The average case complexity of any sorting algorithm that only uses key comparisons
is Omega(n log n).
The above results are proved using a decision tree, which is a binary tree in which the nodes
represent the status of the algorithm after making some comparisons.
Consider a node x in a decision tree, and let y be its left child and z its right child. See
Figure 9.4.

Figure 9.4: A decision tree scenario


Basically, y represents a state consisting of the information known at x plus the fact that
the key k1 is less than the key k2. For a decision tree for insertion sort on 3 elements,
see Figure 9.5.

Figure 9.5: Decision tree for a 3-element insertion sort


6.3 Lower bound on worst case complexity:
Given a list of n distinct elements, there are n! possible outcomes that represent correct
sorted orders.




o any decision tree describing a correct sorting algorithm on a list of n elements will
have at least n! leaves.
o In fact, if we delete nodes corresponding to unnecessary comparisons and if we
delete leaves that correspond to an inconsistent sequence of comparison results,
there will be exactly n! leaves.
The length of a path from the root to a leaf gives the number of comparisons made when
the ordering represented by that leaf is the sorted order for a given input list L.
The worst case complexity of an algorithm is given by the length of the longest path in
the associated decision tree.
To obtain a lower bound on the worst case complexity of sorting algorithm, we have to
consider all possible decision trees having n! leaves and take the minimum longest path.
In any decision tree, it is clear that the longest path will have a length of at least
log(n!). Since log(n!) is approximately n log n, this already suggests the bound. More
precisely, n! >= (n/2)^(n/2), so that
log(n!) >= (n/2) log(n/2) = (n/2) log n - n/2.
Thus any sorting algorithm that only uses comparisons has a worst case complexity
Omega(n log n).
6.4 Lower bound on average case complexity:
We shall show that in any decision tree with K leaves, the average depth of a leaf is at
least log K. In fact, we show this for any binary tree with K leaves.
Suppose the result is not true, and let T be a counterexample with the fewest nodes. T cannot
be a single node, because log 1 = 0. Let T have k leaves. T can only be of one of the
following two forms. Now see Figure 9.6.



Figure 9.6: Two possibilities for a counterexample with fewest nodes

Suppose T is of the form of Tree 1 (a root with a single subtree rooted at n1). The tree
rooted at n1 has fewer nodes than T but the same number of leaves, each at a smaller depth,
and is hence an even smaller counterexample than T. Thus T cannot be of the Tree 1 form.
Suppose T is of the form of Tree 2 (a root with two subtrees T1 and T2 rooted at n1 and n2,
having k1 and k2 leaves respectively, with k1 + k2 = k). The trees T1 and T2 are smaller than
T, and therefore
average depth of a leaf of T1 >= log k1
average depth of a leaf of T2 >= log k2.
Since every leaf of T lies one level below the corresponding leaf of T1 or T2, the average
depth of a leaf of T is at least
(k1 (log k1 + 1) + k2 (log k2 + 1)) / k = (k1 log 2k1 + k2 log 2k2) / k.
This last expression is minimized when k1 = k2 = k/2, where its value is exactly log k. Thus
the average depth of T is at least log k, which contradicts the premise that the average
depth of T is less than log k. Thus T cannot be of the form of Tree 2 either.
Hence, in any decision tree with n! leaves, the average path length to a leaf is at least
log(n!), which is Omega(n log n).
6.5 Heap sort
Worst case as well as average case running time is O(n log n).
Heap: Recall that a heap is a complete binary tree such that the weight of every node is
less than or equal to the weights of its children.

Figure 9.1: Example of a heap

A heap with n elements can be conveniently represented as the first n elements of an array.
Furthermore, the children of a[i] can be found in a[2i] (left child) and a[2i + 1] (right
child).
See Figure 9.1 for an example.
Generic heap sort algorithm:
{
Insert all records to form a heap S;
while (S is not empty)
{
y = min (S);
print the value of y;
delete y from S;
}

}

Heap sort crucially uses a function called pushdown (first, last).
This assumes that the elements a[first], a[first + 1], ..., a[last] obey the heap property, except
possibly the children of a[first]. The function pushes a[first] down until the heap property is
restored.
Example: See Figure 9.2. Here, a[1], ..., a[10] form a heap, except that the children of a[1]
violate the heap property.
Figure 9.2: Illustration of some heap operations

Heapsort Algorithm
{
    for (i = n/2; i >= 1; i--)
        pushdown (i, n);    /* initial heap construction */
    for (i = n; i >= 2; i--)
    {
        swap records a[i] and a[1];
        pushdown (1, i - 1);
    }
}
It can be shown that the initial heap construction takes O(n) time in the worst case.
The sorting portion takes worst case O(nlog n) time.
Heap sort can also be used for computing order statistics, i.e., the kth lowest in a list of
records.
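
A minimal sketch of pushdown and the heap sort driver in standard C++ (1-based array
positions with a min-heap as above, so the array ends up in descending order); the names are
illustrative.

#include <utility>
#include <vector>

void pushdown(std::vector<int>& a, int first, int last)
{
    int r = first;                   // a[r] is the only possible violator
    while (2 * r <= last) {
        int c = 2 * r;               // left child
        if (c + 1 <= last && a[c + 1] < a[c])
            c = c + 1;               // pick the smaller child
        if (a[c] < a[r]) {
            std::swap(a[c], a[r]);   // push the violator down one level
            r = c;
        } else break;                // heap property restored
    }
}

void heapsort(std::vector<int>& a, int n)   // uses a[1..n]; a[0] is ignored
{
    for (int i = n / 2; i >= 1; --i)        // initial heap construction: O(n)
        pushdown(a, i, n);
    for (int i = n; i >= 2; --i) {
        std::swap(a[i], a[1]);              // move the current minimum to a[i]
        pushdown(a, 1, i - 1);              // restore the heap on a[1..i-1]
    }
}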


6.6 Quick sort
Divide and conquer algorithm designed by C. A. R. Hoare in 1962.
Worst case O(n^2) time, but average case O(n log n) time. Better average case performance
than heap sort.
Algorithm:
To sort the records a[i], a[i + 1],..., a[j], in place.
quicksort (i, j)
{
    if (a[i] ... a[j] contain at least two distinct keys)
    {
        let v be the larger of the first two distinct keys;
        partition a[i] ... a[j] so that, for some k between i + 1 and j,
            a[i] ... a[k - 1] all have keys < v, and
            a[k] ... a[j] all have keys >= v;
        quicksort (i, k - 1);
        quicksort (k, j);
    }
}

v is called the pivot. It could be any element such that the resulting partition desirably
has two roughly equal sized groups.
Algorithm for partitioning:
int partition (int i; int j; key-type pivot)
/* partitions the elements a[i] ... a[j]
   wrt the pivot and returns the position k */
{
    int l, r;
    l = i;   /* l starts from the left end */
    r = j;   /* r starts from the right end */
    do {
        swap the records a[l] and a[r];
        while (a[l].key < pivot)
            l = l + 1;
        while (a[r].key >= pivot)
            r = r - 1;
    } while (l <= r);
    return (l);
}
For an example, see Figure 9.3.
Worst case arises when the input is already sorted: O(n^2).
Average case: O(n log n).
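
Below is a minimal runnable sketch in standard C++ of the same scheme, with the pivot chosen
as the larger of the first two distinct keys and the two-pointer partition inlined; the
function quicksort shown here is an illustrative transliteration, not the text's exact code.

#include <algorithm>
#include <utility>
#include <vector>

void quicksort(std::vector<int>& a, int i, int j)   // sorts a[i..j] in place
{
    int p = i;
    while (p < j && a[p] == a[p + 1]) ++p;   // find the first two distinct keys
    if (p == j) return;                      // no two distinct keys: done
    int pivot = std::max(a[p], a[p + 1]);    // the larger of the two

    int l = i, r = j;
    while (l <= r) {
        while (a[l] < pivot) ++l;            // keys < pivot stay on the left
        while (a[r] >= pivot) --r;           // keys >= pivot stay on the right
        if (l < r) std::swap(a[l], a[r]);
        else break;
    }
    // now a[i..l-1] < pivot <= a[l..j], with i < l <= j
    quicksort(a, i, l - 1);
    quicksort(a, l, j);
}

For example, starting from {5, 3, 8, 5, 1}, quicksort(a, 0, 4) leaves {1, 3, 5, 5, 8}.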

Figure 9.3: Quicksort applied to a list of 10 elements

Quick sort average case analysis:
Assume that all initial orderings of the keys are equally likely;
Assume that the keys are distinct



Note that the presence of equal keys will make the sorting easier, not harder.
Also assume that when we call quicksort(i, j), all orders for A[i] ... A[j] are equally
likely.




Let T(n) = the average time taken by quicksort to sort n elements. T(1) = C1, where C1 is
some constant. Recall that the pivot is the larger of the first two distinct elements.
When n > 1, quicksort splits the subarray, taking C2 n time, where C2 is another constant.
Also, since the pivot is the larger of the first two elements, left groups tend to be larger
than the right groups. The left group can have i elements, where i = 1, 2, ..., n - 1, since
the left group has at least one element and the right group also has at least one element.
Let us fix i and try to compute the probability
P{ left group has i elements }.
Now, the left group has i elements precisely when the pivot is the (i + 1)st smallest element
among the n elements; the pivot occupies one of the first two positions, and the other of
these two positions holds one of the i smaller elements (if the pivot is in position 1, then
the element in position 2 is one of the i smaller elements, and vice-versa).
P{ position 1 holds the pivot and position 2 one of the i smaller elements } = (1/n)(i/(n-1)),
and the same holds with positions 1 and 2 interchanged. Therefore,
P{ left group has i elements } = 2i / (n(n-1)).
This leads to
T(n) <= C2 n + sum over i = 1 to n-1 of [2i / (n(n-1))] (T(i) + T(n-i)).
Replacing i by n - i in the terms involving T(n - i), so that each T(i) collects the
coefficient 2i + 2(n - i) = 2n, the above simplifies to
T(n) <= C2 n + (2/(n-1)) sum over i = 1 to n-1 of T(i).
This is the recurrence that one would get if all sizes between 1 and n - 1 for the left group
were equally likely, i.e., if we had picked a truly random pivot at each step. Thus picking
the larger of the two elements doesn't really affect the size distribution.
We shall guess the solution
T(n) <= C n log n
for some constant C and prove its correctness using induction.
Basis: n = 2: C n log n = 2C, which bounds the constant cost of sorting two elements for C
large enough.
Induction Step: Assume T(i) <= C i log i for all i < n. Then
T(n) <= C2 n + (2C/(n-1)) sum over i = 1 to n-1 of i log i.
To bound the sum, use log i <= log n - 1 for i <= n/2 and log i <= log n for i > n/2:
sum over i = 1 to n-1 of i log i <= (log n) n(n-1)/2 - n^2/8,
so that, after simplification,
T(n) <= C2 n + C n log n - C n/4.
Picking C >= 4 C2, we have C2 n - Cn/4 <= 0, and hence T(n) <= C n log n.
Thus T(n) is O(n log n).
6.7 Merge sort
Divide and conquer algorithm
O(n log n) worst case performance
Useful in external sorting applications.
Algorithm


list-type mergesort (list-type L; int n)
{
    if (n == 1)
        return (L);
    else {
        split L into two halves L1 and L2;
        return (merge (mergesort (L1, n/2), mergesort (L2, n - n/2)));
    }
}

Let T(n) be the running time of mergesort on a list of size n. Then
T(n) <= C1                   (n = 1)
T(n) <= 2 T(n/2) + C2 n      (n > 1)
It can be shown that T(n) is O(n log n).
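
A minimal sketch in standard C++, operating on vectors rather than the text's list-type;
names are illustrative.

#include <vector>

std::vector<int> merge(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> out;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())   // repeatedly take the smaller head
        out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}

std::vector<int> mergesort(const std::vector<int>& L)
{
    size_t n = L.size();
    if (n <= 1) return L;                                 // base case
    std::vector<int> L1(L.begin(), L.begin() + n / 2);    // split into halves
    std::vector<int> L2(L.begin() + n / 2, L.end());
    return merge(mergesort(L1), mergesort(L2));
}

mergesort({3, 1, 2}) returns {1, 2, 3}; each level of the recursion does O(n) merging work
over O(log n) levels, which is exactly the recurrence above.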
Mergesort as an External Sorting Procedure
Assume that the data to be sorted is stored in a file
The essential idea is to organize a file into progressively larger runs, where a run is a
sequence of records r1, ..., rk such that r_i <= r_(i+1) for i = 1, ..., k - 1.
We say a file r1, r2, ..., rm of records is organized into runs of length k if, for every
i >= 1 such that ki <= m, the sequence
r_(k(i-1)+1), r_(k(i-1)+2), ..., r_(ki)
is a run of length k; and furthermore, if m is not divisible by k and m = pk + q where q < k,
the sequence (r_(m-q+1), ..., r_m), which is called the tail, is a run of length q.
Example
The following is organized into runs of length 3 with the tail having length 2.
7 15 29 8 11 13 16 22 31 5 12
Basic step
Consider two files f1 and f2 organized into runs of length k. Assume that:
1. If ni is the number of runs (including tails) of fi (i = 1, 2), then |n1 - n2| <= 1.
2. At most one of f1 and f2 has a tail.
3. The one with a tail (if any) has at least as many runs as the other.
In mergesort, we read one run from each of f1 and f2, merge the runs into a run of length 2k,
and append it to one of two files g1 and g2. By alternating between g1 and g2, these files
will be organized into runs of length 2k, satisfying (1), (2), and (3).
Algorithm
First divide all n records into two files f1 and f2 as evenly as possible. f1 and f2 can be
regarded as organized into runs of length 1.
Merge the runs of length 1 and distribute them into files g1 and g2, which will now be
organized into runs of length 2.
Now make f1 and f2 empty, and merge g1 and g2 into f1 and f2, creating runs of length 4.
Repeat ...
Example
f1: 28 3 93 54 65 30 90 10 69 8 22
f2: 31 5 96 85 9 39 13 8 77 10
(runs of length 1; merge into g1 and g2)

g1: 28 31 | 93 96 | 9 65 | 13 90 | 69 77 | 22
g2: 3 5 | 54 85 | 30 39 | 8 10 | 8 10
(runs of length 2; merge into f1 and f2)

f1: 3 5 28 31 | 9 30 39 65 | 8 10 69 77
f2: 54 85 93 96 | 8 10 13 90 | 22
(runs of length 4; merge into g1 and g2)

g1: 3 5 28 31 54 85 93 96 | 8 10 22 69 77
g2: 8 9 10 13 30 39 65 90
(runs of length 8; merge into f1 and f2)

f1: 3 5 8 9 10 13 28 30 31 39 54 65 85 90 93 96
f2: 8 10 22 69 77
(runs of length 16; merge into g1 and g2)

g1 will be left with the sorted sequence.
After i passes, we have two files consisting of runs of length 2^i. If 2^i >= n, then one of
the two files will be empty and the other will contain a single run of length n (which is the
sorted sequence).
Therefore the number of passes = ceil(log n).
Each pass requires the reading of two files and the writing of two files, of combined length
about n. The total number of blocks read or written in a pass is therefore about 2n/b, where
b is the number of records that fit into a block.
Total number of block reads or writes for the entire sorting = O((n log n)/b).
The procedure to merge reads and writes only one record at a time. Thus it is not required to
have a complete run in memory at any time.
Mergesort need not start with runs of length 1. We could start with a pass that, for some
appropriate k, reads groups of k records into main memory, sorts them (say by quicksort), and
writes them out as runs of length k.
Multiway Mergesort
Here we merge m files at a time. Let the files f1, ..., fm be organized as runs of length k.
We can read m runs, one from each file, and merge them into one run of length mk. This run is
placed on one of m files g1, ..., gm, each getting a run in turn.
The merging process involves, at each step, selecting the minimum among the m currently
smallest unselected records, one from each file. By using a priority queue that supports
insert and deletemin in logarithmic time (e.g., a partially ordered tree), each selection can
be accomplished in O(log m) time.
Number of passes = ceil(log_m n).
Effort in each pass = O(n log2 m), so the overall complexity is O(n log2 m * log_m n).
Advantages
We save, by a factor of log2 m, the number of times we read each record.
We can process data O(m) times faster with m disk units.
6.8 Radix sort
Here we use some special information about the keys and design sorting algorithms which beat
the O(n log n) lower bound for comparison-based sorting methods.
Consider sorting n integers in the range 0 to n^2 - 1. We do it in two phases.
Phase 1:
We use n bins, one for each of the integers 0, 1, ..., n - 1. We place each integer i of the
list to be sorted into the bin numbered (i mod n). Each bin will then contain a list of
integers leaving the same remainder when divided by n.
At the end, we concatenate the bins in order to obtain a list L.
Phase 2:
The integers on the list L are redistributed into bins, but using the bin selection function
floor(i/n), appending each integer to the end of its bin's list. Finally, concatenate the
lists to get the final sorted sequence.
Example
n = 10
Initial list: 36, 9, 0, 25, 1, 49, 64, 16, 81, 4
Phase 1: bin = i mod 10 = rightmost digit of i
Bin 0: 0
Bin 1: 1 81
Bin 2: -
Bin 3: -
Bin 4: 64 4
Bin 5: 25
Bin 6: 36 16
Bin 7: -
Bin 8: -
Bin 9: 9 49
Concatenation would now yield the list:
L: 0, 1, 81, 64, 4, 25, 36, 16, 9, 49
Phase 2: bin = floor(i/10) = tens digit of i
Bin 0: 0 1 4 9
Bin 1: 16
Bin 2: 25
Bin 3: 36
Bin 4: 49
Bin 5: -
Bin 6: 64
Bin 7: -
Bin 8: 81
Bin 9: -
The concatenation now yields:
0, 1, 4, 9, 16, 25, 36, 49, 64, 81
In general, assume that the key-type consists of k components f1, f2, ..., fk, of types
t1, t2, ..., tk. Suppose it is required to sort the records in lexicographic order of their
keys. That is,
(a1, a2, ..., ak) < (b1, b2, ..., bk)
if one of the following holds:
1. a1 < b1
2. a1 = b1; a2 < b2
3. a1 = b1; a2 = b2; a3 < b3
.
.
.
k. a1 = b1; ...; a(k-1) = b(k-1); ak < bk

Radix sort proceeds in the following way.
First binsort all records on f_k, the least significant field, then concatenate the bins
lowest value first.
Then binsort the above concatenated list on f_(k-1), and again concatenate the bins lowest
value first.
In general, after binsorting on f_k, f_(k-1), ..., f_i, the records will appear in
lexicographic order if the key consisted of only the fields f_i, ..., f_k.

void radixsort()
/* Sorts a list A of n records with keys consisting of fields f1, ..., fk of
   types t1, ..., tk. The function uses k arrays B1, ..., Bk, where Bi is of
   type array[ti] of list-type, i = 1, ..., k, and list-type is a linked list
   of records. */
{
    for (i = k; i >= 1; i--)
    {
        for (each value v of type ti)
            make Bi[v] empty;            /* clear bins */
        for (each record r on list A)
            move r from A onto the end of bin Bi[v],
            where v is the value of the field fi of the key of r;
        for (each value v of type ti, from lowest to highest)
            concatenate Bi[v] onto the end of A;
    }
}

Elements to be sorted are presented in the form of a linked list, obviating the need for
copying a record. We just move records from one list to another.
For concatenation to be done quickly, we need pointers to the end of lists.
Complexity of Radix Sort
Inner loop 1 takes O(s_i) time, where s_i = number of different values of type t_i.
Inner loop 2 takes O(n) time.
Inner loop 3 takes O(s_i) time.
Thus the total time
= sum over i = 1, ..., k of O(s_i + n)
= O(kn + sum of the s_i)
= O(n + sum of the s_i) for constant k.

For example, if keys are integers in the range 0 to n^k - 1, then we can view the integers as radix-n integers, each k digits long. Then t_i has range 0 ... n - 1 for 1 <= i <= k.
Thus s_i = n and the total running time is O(n).
Similarly, if keys are character strings of length k, for constant k, then s_i = 128 (say) for i = 1, ..., k, and again we get the total running time as O(n).

Assignment Questions:
Sorting Methods
1. Explain the lower bound on complexity for sorting.
2. Give the methods for
i. Lower Bound on Worst Case Complexity
ii. Lower Bound on Average Case Complexity
3. Give the implementation of, and explain with an example:
i. Heap Sort
ii. Quick Sort
iii. Radix Sorting
iv. Merge Sort.
UNIT-VII
Topics: Pattern matching and Tries : Pattern matching algorithms- the Boyer Moore
algorithm, the Knuth-Morris-Pratt algorithm
Tries: Definitions and concepts of digital search tree, Binary trie, Patricia, Multi-way trie

7.1 Pattern Matching
Pattern matching is to find a pattern, which is relatively small, in a text, which is
supposed to be very large. Patterns and texts can be one-dimensional, or two-
dimensional. In the case of one-dimensional, examples can be text editor and DNA
analysis. In the text editor, we have 26 characters and some special symbols, whereas in
the DNA case, we have four characters of A, C, G, and T. In the text editor, a pattern is
often a word, whose length is around 10, and the length of the text is a few hundred up to
one million. In DNA analysis, a gene is a few hundred characters long and the human genome is
about 3 billion characters long.
In the two-dimensional case, the typical application is pattern matching in computer vision. A pattern is about 100 × 100 and a text typically 1024 × 1024. Either one-dimensional or two-dimensional, the text is very large, and therefore a fast algorithm to find the occurrence(s) of the pattern in it is needed. We start from a naive algorithm for the one-dimensional case.
Size of text: n
Text:    a b a b a a ...
Pattern: a b a b b a b a b
Size of pattern: m
At first the pattern is set to the left end of the text, and the matching process starts. After a mismatch is found (here at the fifth character, where the text has 'a' and the pattern 'b'), the pattern is shifted one place right and a new matching process starts, and so on. The pattern and text are in arrays pat[1..m] and text[1..n] respectively.
Algorithm 1. Naive pattern matching algorithm
1. j := 1;
2. while j <= n-m+1 do begin
3.     i := 1;
4.     while (i <= m) and (pat[i] = text[j]) do begin
5.         i := i+1;
6.         j := j+1
7.     end;
8.     if i <= m then j := j-i+2  /* shift the pattern one place right */
9.     else write ("found at ", j-i+1)
10. end.
The worst case happens when pat and text are all a's except for a b at the end, such as pat =
aaaaab and text = aaaaaaaaaaaaaaaaaaaaaaaaaaaaab. The time is obviously O(mn). On
average the situation is not as bad, but in the next section we introduce a much better
algorithm. We call the operation pat[i] = text[j] a comparison between characters, and
measure the complexity of a given algorithm by the number of character comparisons.
The rest of the computing time is proportional to this measure.
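For comparison with the C++ implementations given later for KMP and Boyer-Moore, here is a minimal C++ version of Algorithm 1 using 0-based indexing (the name naiveSearch is introduced here for illustration):

#include <cstring>

// Return the 0-based index of the first occurrence of pat in text, or -1.
int naiveSearch(const char* text, const char* pat) {
    int n = strlen(text), m = strlen(pat);
    for (int j = 0; j + m <= n; j++) {        // each candidate alignment
        int i = 0;
        while (i < m && pat[i] == text[j + i])
            i++;                              // one character comparison per step
        if (i == m) return j;                 // all m characters matched
    }
    return -1;
}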

7.2 Knuth-Morris-Pratt algorithm (KMP algorithm)
In computer science, the Knuth-Morris-Pratt string searching algorithm (or KMP algorithm)
searches for occurrences of a "word" W within a main "text string" S by employing the
observation that when a mismatch occurs, the word itself embodies sufficient information to
determine where the next match could begin, thus bypassing re-examination of previously
matched characters.
The algorithm was conceived in 1974 by Donald Knuth and Vaughan Pratt, and independently
by James H. Morris. The three published it jointly in 1977.
where W = "ABCDABD" and S = "ABC ABCDAB ABCDABCDABDE". At any given time,
the algorithm is in a state determined by two integers:
m which denotes the position within S which is the beginning of a prospective match for
W
i the index in W denoting the character currently under consideration.
In each step we compare S[m+i] with W[i] and advance if they are equal. This is depicted, at the
start of the run, like
1 2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W: ABCDABD
i: 0123456

We proceed by comparing successive characters of W to "parallel" characters of S, moving from
one to the next if they match. However, in the fourth step, we get S[3] is a space and W[3] = 'D',
a mismatch. Rather than beginning to search again at S[1], we note that no 'A' occurs between
positions 0 and 3 in S except at 0; hence, having checked all those characters previously, we
know there is no chance of finding the beginning of a match if we check them again. Therefore
we move on to the next character, setting m = 4 and i = 0.
1 2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W: ABCDABD
i: 0123456

We quickly obtain a nearly complete match "ABCDAB" when, at W[6] (S[10]), we again have a
discrepancy. However, just prior to the end of the current partial match, we passed an "AB"
which could be the beginning of a new match, so we must take this into consideration. As we
already know that these characters match the two characters prior to the current position, we
need not check them again; we simply reset m = 8, i = 2 and continue matching the current
character. Thus, not only do we omit previously matched characters of S, but also previously
matched characters of W.
1 2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W: ABCDABD
i: 0123456

This search fails immediately, however, as the pattern still does not contain a space, so as in the
first trial, we return to the beginning of W and begin searching at the next character of S: m = 11,
reset i = 0.
1 2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W: ABCDABD
i: 0123456

Once again we immediately hit upon a match "ABCDAB" but the next character, 'C', does not
match the final character 'D' of the word W. Reasoning as before, we set m = 15, to start at the
two-character string "AB" leading up to the current position, set i = 2, and continue matching
from the current position.
1 2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W: ABCDABD
i: 0123456

This time we are able to complete the match, whose first character is S[15].
Algorithm:
algorithm kmp_search:
    input:
        an array of characters, S (the text to be searched)
        an array of characters, W (the word sought)
    output:
        an integer (the zero-based position in S at which W is found)

    define variables:
        an integer, m ← 0 (the beginning of the current match in S)
        an integer, i ← 0 (the position of the current character in W)
        an array of integers, T (the table, computed elsewhere)

    while m + i < length(S) do
        if W[i] = S[m + i] then
            if i = length(W) - 1 then
                return m
            let i ← i + 1
        else
            let m ← m + i - T[i]
            if T[i] > -1 then
                let i ← T[i]
            else
                let i ← 0

    (if we reach here, we have searched all of S unsuccessfully)
    return the length of S
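For the example word W = "ABCDABD" used above, the table T (with T[0] = -1, as the test T[i] > -1 in the pseudocode assumes) works out to:

i    :  0  1  2  3  4  5  6
W[i] :  A  B  C  D  A  B  D
T[i] : -1  0  0  0  0  1  2

so a mismatch at i = 6 resets i to T[6] = 2, which is exactly the "reset i = 2" step in the trace above.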
Efficiency of KMP:
Since the two portions of the algorithm have, respectively, complexities of O(k) and O(n) (where k is the length of W and n the length of S), the complexity of the overall algorithm is O(n + k).
These complexities are the same, no matter how many repetitive patterns are in W or S.
Implementation of KMP:
#include<iostream.h>
#include<conio.h>
#include<string.h>
class str
{
private:
char t[58],p[67];
int f[67]; /* failure table: int, so it can hold character positions */
int i,j,m,n;
public:
void failure(char[]);
int kmpmatch(char[],char[]);
};
void str::failure(char x[])
{
m=strlen(x);
j=0;
i=1;
f[0]=0;
while(i<m)
{
if(x[j]==x[i])
{
f[i]=j+1;
i++;
j++;
}
else if(j>0)
j=f[j-1];
else
{
f[i]=0;
i++;
}
}
}
int str::kmpmatch(char t[],char p[])
{
failure(p);
i=0;j=0;
n=strlen(t);
while(i<n)
{
if(t[i]==p[j])
{
if(j==m-1)
return(i-m+1);
i=i+1;
j=j+1;
}
else if(j>0)
j=f[j-1];
else
i++;
}
return -1;
}
void main()
{
int i,j,m,n;
str b;
clrscr();
char t[50],p[20];
cout<<"\nEnter the text(without spaces):";
cin>>t;
cout<<"\nEnter the pattern to match:";
cin>>p;
int a=b.kmpmatch(t,p);
if(a!=-1)
cout<<"\nMatch at position"<<" "<<a;
else
cout<<"\nsorry!Pattern not found";
getch();
}
7.3 Boyer Moore Algorithm
In computer science, the Boyer-Moore string search algorithm is an efficient string searching
algorithm that is the standard benchmark for practical string search literature.[1] It was developed
by Robert S. Boyer and J Strother Moore in 1977.[2] The algorithm preprocesses the string being
searched for (the pattern), but not the string being searched in (the text). It is thus well-suited for
applications in which the pattern either is much shorter than the text or persists across
multiple searches. The Boyer-Moore algorithm uses information gathered during the preprocessing
step to skip sections of the text, resulting in a lower constant factor than many other string
algorithms. In general, the algorithm runs faster as the pattern length increases.
Definitions:
S[i] refers to the character at index i of string S, counting from 1.
S[i..j] refers to the substring of string S starting at index i and ending at j, inclusive.
A prefix of S is a substring S[1..i] for some i in range [1, n], where n is the length of S.
A suffix of S is a substring S[i..n] for some i in range [1, n], where n is the length of S.
The string to be searched for is called the pattern and is referred to with symbol P.
The string being searched in is called the text and is referred to with symbol T.
The length of P is n.
The length of T is m.
An alignment of P to T is an index k in T such that the last character of P is aligned with
index k of T.
A match or occurrence of P occurs at an alignment if P is equivalent to T[(k-n+1)..k].
A N P A N M A N -
P A N - - - - - -
- P A N - - - - -
- - P A N - - - -
- - - P A N - - -
- - - - P A N - -
- - - - - P A N -
Alignments of pattern PAN to text ANPANMAN, from k=3 to k=8. A match occurs at
k=5
Description:
The Boyer-Moore algorithm searches for occurrences of P in T by performing explicit character
comparisons at different alignments. Instead of a brute-force search of all alignments (of which
there are m - n + 1), Boyer-Moore uses information gained by preprocessing P to skip as many
alignments as possible.
The algorithm begins at alignment k = n, so the start of P is aligned with the start of T.
Characters in P and T are then compared starting at index n in P and k in T, moving backward:
the strings are matched from the end of P to the start of P. The comparisons continue until either
the beginning of P is reached (which means there is a match) or a mismatch occurs upon which
the alignment is shifted to the right according to the maximum value permitted by a number of
rules. The comparisons are performed again at the new alignment, and the process repeats until
the alignment is shifted past the end of T, which means no further matches will be found.
The shift rules are implemented as constant-time table lookups, using tables generated during the
preprocessing of P.
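One of these tables implements the bad character rule: on a mismatch, shift P so that the mismatched text character lines up with its right-most occurrence in P (or past the mismatch, if it does not occur in P). A minimal sketch of its preprocessing, assuming an ASCII alphabet (the name buildLast is introduced here; the class implementation later in this section computes the same table in its last() member):

// last[c] = right-most index of character c in P, or -1 if c does not occur in P.
void buildLast(const char* P, int m, int last[128]) {
    for (int c = 0; c < 128; c++)
        last[c] = -1;
    for (int j = 0; j < m; j++)
        last[(unsigned char)P[j]] = j;   // later occurrences overwrite earlier ones
}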
The Good Suffix Rule
Description
- - - - X - - K - - - - -
M A N P A N A M A N A P -
A N A M P N A M - - - - -
- - - - A N A M P N A M -
Demonstration of good suffix rule with pattern ANAMPNAM.
The good suffix rule is markedly more complex in both concept and implementation than the bad
character rule. It is the reason comparisons begin at the end of the pattern rather than the start,
and is formally stated thus:[3]

Suppose for a given alignment of P and T, a substring t of T matches a suffix of P, but a
mismatch occurs at the next comparison to the left. Then find, if it exists, the right-most copy t'
of t in P such that t' is not a suffix of P and the character to the left of t' in P differs from the
character to the left of t in P. Shift P to the right so that substring t' in P is below substring t in T.
If t' does not exist, then shift the left end of P past the left end of t in T by the least amount so
that a prefix of the shifted pattern matches a suffix of t in T. If no such shift is possible, then shift
P by n places to the right. If an occurrence of P is found, then shift P by the least amount so that
a proper prefix of the shifted P matches a suffix of the occurrence of P in T. If no such shift is
possible, then shift P by n places, that is, shift P past T.
Preprocessing
The good suffix rule requires two tables: one for use in the general case, and another for use
when either the general case returns no meaningful result or a match occurs. These tables will be
designated L and H respectively. Their definitions are as follows:[3]

For each i, L[i] is the largest position less than n such that string P[i..n] matches a suffix of
P[1..L[i]] and such that the character preceding that suffix is not equal to P[i-1]. L[i] is defined to
be zero if there is no position satisfying the condition.
Let H[i] denote the length of the largest suffix of P[i..n] that is also a prefix of P, if one exists. If
none exists, let H[i] be zero.
Both of these tables are constructible in O(n) time and use O(n) space. The alignment shift for
index i in P is given by n - L[i] or n - H[i]. H should only be used if L[i] is zero or a match has
been found.
Performance:
The Boyer-Moore algorithm as presented in the original paper has worst-case running time of
O(n+m) only if the pattern does not appear in the text.
Implementation of Boyer-Moore:
#include<iostream.h>
#include<conio.h>
#include<string.h>
class bm
{
public:
char M[20],P[15];
public:
int last(char);
int psize();
int boyer(int,int);
int min(int,int);
};

int bm::last(char ch)
{
int l[150],i,k;
for(i=0;i<128;i++)
l[i]=-1;
k=psize();
for(i=0;i<k;i++)
l[P[i]]=i;
return l[ch];
}
int bm::psize()
{
int y;
y=strlen(P);
return y;
}
int bm::boyer(int n,int m)
{
int i=m-1;
int j=m-1;
do
{
if(P[j]==M[i])
{
if(j==0)
return i;
else
{
i=i-1;
j=j-1;
}
}
else
{
i=i+m-min(j,1+last(M[i]));
j=m-1;
}} while(i<=(n-1));
return -1;
}
int bm::min(int i,int j)
{
return((j>i)?i:j);
}

void main()
{
bm b1;
int m,n,x;
clrscr();
cout<<"\nEnter Text:";
cin>>b1.M;
cout<<"\nEnter Pattern:";
cin>>b1.P;
m=strlen(b1.P);
n=strlen(b1.M);
x=b1.boyer(n,m);
if(x==-1)
cout<<"\nNot found";
else
cout<<"\nPattern found at"<< <<x;
}


7.4 Tries
Digital search trees:
Definition:
A digital search tree is a binary tree in which each node contains one element.
The element-to-node assignment is determined by the binary representation of the
element keys.
We number the bits in the binary representation of a key from left to right beginning at
one.
Ex: bit one of 1000 is 1, and bits two, three, and four are 0.
All keys in the left subtree of a node at level i have bit i equal to zero, whereas those in
the right subtree of nodes at this level have bit i = 1.
Assume a fixed number of bits. If the tree is not empty, then:
- the root contains one dictionary pair (any pair);
- all remaining pairs whose key begins with a 0 are in the left subtree;
- all remaining pairs whose key begins with a 1 are in the right subtree;
- the left and right subtrees are digital subtrees on the remaining bits.
For example, such a digital search tree can contain the keys 1000, 0010, 1001, 0001, 1100, 0000.

Example:
Start with an empty digital search tree and insert a pair whose key is 0110; it becomes the root.
Now insert a pair whose key is 0010: bit one is 0, so it becomes the left child of 0110.
Now insert a pair whose key is 1001: bit one is 1, so it becomes the right child of 0110.
Now insert a pair whose key is 1011: bit one leads right to 1001, and bit two (0) places it as the left child of 1001.
Now insert a pair whose key is 0000: bit one leads left to 0010, and bit two (0) places it as the left child of 0010.
Search and Insert:
The digital search tree functions to search and insert are quite similar to the
corresponding functions for binary search trees.
The essential difference is that the subtree to move to is determined by a bit in the search
key rather than by the result of the comparison of the search key and the key in the
current node.
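A minimal C++ sketch of this bit-directed insert (the node layout, the names, and the fixed key width b are choices made here for illustration):

struct DstNode {
    unsigned key;
    DstNode *left, *right;
    DstNode(unsigned k) : key(k), left(nullptr), right(nullptr) {}
};

// Insert key k into the digital search tree rooted at root (keys are b bits wide,
// bit 1 being the left-most bit, as in the text above).
DstNode* dstInsert(DstNode* root, unsigned k, int b) {
    if (!root) return new DstNode(k);
    DstNode* p = root;
    for (int i = 1; i <= b; i++) {
        if (p->key == k) return root;                        // key already present
        DstNode*& child = ((k >> (b - i)) & 1) ? p->right : p->left;
        if (!child) { child = new DstNode(k); return root; } // empty spot found
        p = child;
    }
    return root;
}

Running dstInsert repeatedly with b = 4 on the keys 0110, 0010, 1001, 1011, 0000 reproduces the insertion example above.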
Try to build the digital search tree:
A 00001
S 10011
E 00101
R 10010
C 00011
H 01000
I 01001
N 01110
G 00111
X 11000
M 01101
P 10000
When we are dealing with very long keys, the cost of a key comparison is high. We can reduce the
number of key comparisons to one by using a related structure called Patricia.
We shall develop this structure in three steps. First, we introduce a structure called a
binary trie.
Then we transform binary tries into compressed binary tries.
Finally, from compressed binary tries we obtain Patricia.
7.5 Binary Trie
A binary trie is a binary tree that has two kinds of nodes: branch nodes and element
nodes.
A branch node has the two data members LeftChild and RightChild; it has no data member for an element.
An element node has the single data member data.
Branch nodes are used to build a binary tree search structure similar to that of a digital search tree. This structure leads to the element nodes.
Six element binary trie:


Compressed Binary trie:
The binary trie contains branch nodes whose degree is one. By adding another data
member, BitNumber , to each branch node, we can eliminate all degree-one branch nodes
from the trie. The BitNumber data member of a branch node gives the bit number of the
key that is to be used at this node.
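A self-contained C++ sketch of search in a compressed binary trie (the node layout, the flag isElement, and the helper bit() are assumptions made here; only the tested bits are examined on the way down, so the caller must verify the key at the end):

struct Element { unsigned key; /* plus whatever data is stored */ };

const int MAX_BITS = 4;                  // assumed fixed key width
int bit(unsigned k, int i) {             // i-th bit from the left, bit 1 first
    return (k >> (MAX_BITS - i)) & 1;
}

struct TrieNode {
    bool isElement;                      // true for element nodes
    Element data;                        // used only by element nodes
    int bitNumber;                       // bit tested here (branch nodes only)
    TrieNode *leftChild, *rightChild;
};

// Follow the tested bits of key down to an element node (or NULL).
TrieNode* trieSearch(TrieNode* t, unsigned key) {
    while (t && !t->isElement)
        t = bit(key, t->bitNumber) ? t->rightChild : t->leftChild;
    return t;    // caller must check t != NULL and t->data.key == key
}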
Binary Trie with degree one nodes eliminated:


7.6 Patricia (Practical Algorithm To Retrieve Information Coded In Alphanumeric)
Compressed binary tries may be represented using nodes of a single type. The new nodes,
called augmented branch nodes, are the original branch nodes augmented by the data
member data. The resulting structure is called Patricia and is obtained from a compressed
binary trie in the following way:
(1)Replace each branch node by an augmented branch node.
(2)Eliminate the element nodes.
(3)Store the data previously in the element node in the data data members of the
augmented branch nodes. Since every nonempty compressed binary trie has one less
branch node than it has element nodes, it is necessary to add one augmented branch node.
This node is called the head node . The remaining structure is the left subtree of the head
node. The head node has BitNumber equal to zero. Its right-child data member is not
used. The assignment of data to augmented branch nodes is done in such a way that the
BitNumber in the augmented branch node is less than or equal to the BitNumber in the
parent of the element node that contained this data.
(4)Replace the original pointers to element nodes by pointers to the respective augmented
branch nodes.

typedef struct patricia_tree *patricia;
struct patricia_tree {
int bit_number;
element data;
patricia left_child, right_child;
};
patricia root;
patricia search:
patricia search(patricia t, unsigned k)
{
    /* search the Patricia tree t; return the last node y encountered;
       if k == y->data.key, the key is in the tree */
    patricia p, y;
    if (!t) return NULL;   /* empty tree */
    y = t->left_child;
    p = t;
    while (y->bit_number > p->bit_number) {
        p = y;
        y = (bit(k, y->bit_number)) ? y->right_child : y->left_child;
    }
    return y;
}
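This function (and the insert function below) assumes a helper bit(k, i) that returns the i-th bit of a key, counting from the left as in section 7.4. A minimal sketch for fixed-width keys (the constant MAX_BITS is an assumption introduced here):

#define MAX_BITS 16   /* assumed number of bits in a key */

int bit(unsigned k, int i)
{
    /* i-th bit of k from the left: bit 1 is the most significant */
    return (k >> (MAX_BITS - i)) & 1;
}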
Patricia Insert:
void insert(patricia *t, element x)
{
    /* insert x into the Patricia tree *t */
    patricia s, p, y, z;
    int i;
    if (!(*t)) {    /* empty tree */
        *t = (patricia)malloc(sizeof(patricia_tree));
        if (IS_FULL(*t)) {
            fprintf(stderr, "The memory is full\n");
            exit(1);
        }
        (*t)->bit_number = 0;
        (*t)->data = x;
        (*t)->left_child = *t;    /* the single node points to itself */
        return;
    }
    y = search(*t, x.key);
    if (x.key == y->data.key) {
        fprintf(stderr, "The key is in the tree. Insertion fails.\n");
        exit(1);
    }
    /* find the first bit where x.key and y->data.key differ */
    for (i = 1; bit(x.key, i) == bit(y->data.key, i); i++);
    /* search the tree using the first i-1 bits */
    s = (*t)->left_child;
    p = *t;
    while (s->bit_number > p->bit_number && s->bit_number < i) {
        p = s;
        s = (bit(x.key, s->bit_number)) ? s->right_child : s->left_child;
    }
    /* add x as a child of p */
    z = (patricia)malloc(sizeof(patricia_tree));
    if (IS_FULL(z)) {
        fprintf(stderr, "The memory is full\n");
        exit(1);
    }
    z->data = x;
    z->bit_number = i;
    z->left_child = (bit(x.key, i)) ? s : z;
    z->right_child = (bit(x.key, i)) ? z : s;
    if (s == p->left_child) p->left_child = z;
    else p->right_child = z;
}
Assignment Questions:
Pattern matching and Tries
1. Explain Pattern matching algorithms
i. the Boyer Moore algorithm
ii. the Knuth-Morris-Pratt algorithm
2. Define tries. Give the concepts of digital search trees. What are the applications of tries?
3. Write short notes on
i. binary trie
ii. Patricia
iii. Multi-way trie
4. Describe an efficient algorithm to find the longest palindrome that is a suffix of a string T of length n.
UNIT-VIII
Topics: File Structures: Fundamental File Processing Operations-opening files, closing
files, Reading and Writing file contents, Special characters in files.
Fundamental File Structure Concepts- Field and record organization, Managing fixed-length, fixed-field buffers.

8.1 Fundamental file processing operations
physical file
A file as seen by the operating system, and which actually exists on secondary storage.
logical file
A file as seen by a program.
Programs read and write data from logical files.
Before a logical file can be used, it must be associated with a physical file.
This act of connection is called "opening" the file.
Data in a physical file is persistent.
Data in a logical file is temporary.
A logical file is identified (within the program) by a program variable or constant.
C++ supports file access on three levels:
Unbuffered, unformatted file access using handles.
Buffered, formatted file access using the FILE structure.
Buffered, formatted file access using classes.
8.1.1Opening Files
open
To associate a logical program file with a physical system file.
protection mode
The security status of a file, defining who is allowed to access a file and which access
modes are allowed.
access mode
The type of (file) access allowed.
file descriptor
A cardinal number used as the identifier for a logical file by operating systems such as
UNIX and PC-DOS.
In C++, a file is opened by using library functions.
The name of the physical file must be supplied to an open function.
The open function must also be supplied with an access mode.
The open function can also be supplied with a protection mode.
The access mode has several aspects:
Is the file to be accessed by reading, by writing, or by both?
What should be done with existing contents of the file?
Should a new file be created if none exists?
Should any character translation be done?
For handle level access, the logical file is declared as an int.
The handle is also known as a file descriptor.
The C++ open function is used to open a file for handle level access.
The value returned by the open is assigned to the file variable.
For FILE level access, the logical file is declared as a FILE *.
The C++ fopen function is used to open a file for FILE level access.
The value returned by the fopen is assigned to the file variable.
For class level access, the logical file is declared as an fstream (or as an ifstream or an
ofstream.)
The C++ open method of the class is used to open a file for class level access.
8.1.2 Closing Files
close
To disassociate a logical program file from a physical system file.
Closing a file frees system resources for reuse.
Data may not be actually written to the physical file until a logical file is closed.
A program should close a file when it is no longer needed.
The C++ close function is used to close a file for handle level access.
The logical file to be closed is an argument to the handle close function.
The C++ fclose function is used to close a file for FILE level access.
The logical file to be closed is an argument to the FILE fclose function.
The C++ close method of the class is used to close a file for class level access.
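The three levels can be illustrated side by side. A minimal sketch, assuming a UNIX-like system for the handle-level calls (the file name data.txt is arbitrary):

#include <fcntl.h>     // open and O_ flags (handle level)
#include <unistd.h>    // close (handle level)
#include <cstdio>      // fopen, fclose (FILE level)
#include <fstream>     // fstream (class level)

int main() {
    int fd = open("data.txt", O_RDWR | O_CREAT, 0644);            // handle level
    FILE* fp = fopen("data.txt", "r+");                           // FILE level
    std::fstream fs("data.txt", std::ios::in | std::ios::out);    // class level

    fs.close();    // class level
    fclose(fp);    // FILE level
    close(fd);     // handle level
    return 0;
}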
8.1.3 Reading and writing
read
To transfer data from a file to program variable(s).
write
To transfer data to a file from program variable(s) or constant(s).
end-of-file
A physical location just beyond the last datum in a file.
The read and write operations are performed on the logical file with calls to library
functions.
For read, one or more variables must be supplied to the read function, to receive the data
from the file.
For write, one or more values (as variables or constants) must be supplied to the write
function, to provide the data for the file.
For unformatted transfers, the amount of data to be transferred must also be supplied.
The C++ read function is used to read a file with handle level access.
The C++ write function is used to write a file with handle level access.
The C++ fread function is used to read a file with FILE level access.
The C++ fwrite function is used to write a file with FILE level access.
The C++ read method of the class is used to read a file with class level access.
The C++ write method of the class is used to write a file with class level access.
The acronym for end-of-file is EOF.
When a file reaches EOF, no more data can be read.
Data can be written at or past EOF.
The C++ eof function is used to detect end-of-file with handle level access.
The handle eof function detects when the file pointer is at end-of-file.
The C++ feof function is used to detect end-of-file with FILE level access.
The FILE feof function detects when the file pointer is past end-of-file.
The C++ eof method of the class is used to detect end-of-file with class level access.
The stream class eof function detects when the file pointer is past end-of-file.
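A short sketch of buffered, unformatted reading with end-of-file detection at the FILE level (the file name and buffer size are illustrative):

#include <cstdio>

int main() {
    FILE* fp = fopen("data.txt", "rb");
    if (!fp) return 1;
    char buf[64];
    // fread returns the number of items actually read; 0 once past end-of-file.
    while (fread(buf, sizeof(char), sizeof(buf), fp) > 0) {
        /* process buf here */
    }
    if (feof(fp)) { /* the file pointer is past end-of-file */ }
    fclose(fp);
    return 0;
}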


Seeking:
seek
To move to a specified location in a file.
byte offset
The distance, measured in bytes, from the beginning.
Seeking moves an attribute in the file called the file pointer.
C++ library functions allow seeking.
In DOS, Windows, and UNIX, files are organized as streams of bytes, and locations are
in terms of byte count.
Seeking can be specified from one of three reference points:
o The beginning of the file.
o The end of the file.
o The current file pointer position.
The C++ lseek function is used to seek with handle level access.
The C++ fseek function is used to seek with FILE level access.
The C++ seekg method of the class is used to seek with class level access for read (get.)
The C++ seekp method of the class is used to seek with class level access for write (put.)
C++ allows, but does not require, separate file pointers and seek functions for reading
and writing.
Most implementations of C++ do not have separate file pointers.
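For example, at the FILE level the three reference points look like this (a sketch):

#include <cstdio>

int main() {
    FILE* fp = fopen("data.txt", "rb");
    if (!fp) return 1;
    fseek(fp, 100L, SEEK_SET);    // 100 bytes from the beginning of the file
    fseek(fp, -20L, SEEK_END);    // 20 bytes before the end of the file
    fseek(fp, 5L, SEEK_CUR);      // 5 bytes beyond the current position
    printf("%ld\n", ftell(fp));   // current byte offset of the file pointer
    fclose(fp);
    return 0;
}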
8.1.4 Special characters in files
Specifics of files can vary with the operating system.
The C++ language was written originally for the UNIX operating system.
UNIX and DOS (Windows) systems handle separators between lines differently.
In UNIX files, lines are separated by a single new line character (ASCII line feed).
In DOS (Windows) files, lines are separated by two characters (ASCII carriage return and line feed).
When DOS files are opened in text mode, the internal separator ('\n') is translated to the external separator (<CR><LF>) during read and write.
When DOS files are opened in binary mode, the internal separator ('\n') is not translated to the external separator (<CR><LF>) during read and write.
In DOS (Windows) files, end-of-file can be marked by a "control-Z" character (ASCII
SUB).
In C++ implementations for DOS, a control-Z in a file is interpreted as end-of-file.
Other operating systems may handle line separation and end-of-file differently.
Implementations of C++ should treat text files so that the internal representations are the
same as UNIX.
On UNIX systems, files opened in text mode or binary mode behave the same way.
8.2 Fundamental file structure concepts
Persistent = retained after execution of the program which created it.
When we build file structures, we are making it possible to make data persistent. That is,
one program can store data from memory to a file, and terminate. Later, another program
can retrieve the data from the file, and process it in memory.
In this chapter, we look at file structures which can be used to organize the data within
the file, and at the algorithms which can be used to store and retrieve the data
sequentially.
8.2.1 Field and Record organization
Record:A subdivision of a file, containing data related to a single entity.
Field:A subdivision of a record containing a single attribute of the entity which the record
describes.
stream of bytes:A file which is regarded as being without structure beyond separation
into a sequential set of bytes.
Within a program, data is temporarily stored in variables.
Individual values can be aggregated into structures, which can be treated as a single
variable with parts.
In C++, classes are typically used as an aggregate structure.
C++ Person class (version 0.1):
class Person {
public:
char FirstName [11];
char LastName[11];
char Address [21];
char City [21];
char State [3];
char ZIP [6];   // 5 digits plus the terminating '\0'
};
With this class declaration, variables can be declared to be of type Person. The
individual fields within a Person can be referred to as the name of the variable and the
name of the field, separated by a period (.).
C++ Program:
#include <iostream.h>
#include <string.h>
class Person {
public:
char FirstName [11];
char LastName[11];
char Address [31];
char City [21];
char State [3];
char ZIP [6];   // 5 digits plus the terminating '\0'
};

void Display (Person);

int main () {
Person Clerk;
Person Customer;

strcpy (Clerk.FirstName, "Fred");
strcpy (Clerk.LastName, "Flintstone");
strcpy (Clerk.Address, "4444 Granite Place");
strcpy (Clerk.City, "Rockville");
strcpy (Clerk.State, "MD");
strcpy (Clerk.ZIP, "00001");

strcpy (Customer.FirstName, "Lily");
strcpy (Customer.LastName, "Munster");
strcpy (Customer.Address, "1313 Mockingbird Lane");
strcpy (Customer.City, "Hollywood");
strcpy (Customer.State, "CA");
strcpy (Customer.ZIP, "90210");

Display (Clerk);
Display (Customer);
}

void Display (Person Someone) {
cout << Someone.FirstName << Someone.LastName
<< Someone.Address << Someone.City
<< Someone.State << Someone.ZIP;
}
In memory, each Person will appear as an aggregate, with the individual values being
parts of the aggregate:
Person
Clerk
FirstName LastName Address City State ZIP
Fred Flintstone 4444 Granite Place Rockville MD 00001
The output of this program will be:
FredFlintstone4444 Granite PlaceRockvilleMD00001LilyMunster1313 Mockingbird
LaneHollywoodCA90210
Obviously, this output could be improved. It is marginally readable by people, and it
would be difficult to program a computer to read and correctly interpret this output.
Stream file:
In the Windows, DOS, UNIX, and LINUX operating systems, files are not internally
structured; they are streams of individual bytes.

F r e d F l i n t s t o n e 4 4 4 4 G r a n ...
The only file structure recognized by these operating systems is the separation of a text
file into lines.
o For Windows and DOS, two characters are used between lines, a carriage return
(ASCII 13) and a line feed (ASCII 10);
o For UNIX and LINUX, one character is used between lines, a line feed (ASCII
10);
The code in applications programs can, however, impose internal organization on stream
files.
Delineation of records in files:
Fixed length records: A record which is predetermined to be the same length as the other records
in the file.

Record 1 Record 2 Record 3 Record 4 Record 5
The file is divided into records of equal size.
All records within a file have the same size.
Different files can have different length records.
Programs which access the file must know the record length.
Offset, or position, of the nth record of a file can be calculated.
There is no external overhead for record separation.
There may be internal fragmentation (unused space within records.)
There will be no external fragmentation (unused space outside of records) except for
deleted records.
Individual records can always be updated in place.

Example (80 byte records):
0 66 69 72 73 74 20 6C 69 6E 65 0 0 1 0 0 0 first line......
10 0 0 0 0 0 0 0 0 FF FF FF FF 0 0 0 0 ................
20 68 FB 12 0 DC E0 40 0 3C BA 42 0 78 FB 12 0 h.....@.<.B.x...
30 CD E3 40 0 3C BA 42 0 8 BB 42 0 E4 FB 12 0 ..@.<.B...B.....
40 3C 18 41 0 C4 FB 12 0 2 0 0 0 FC 3A 7C 0 <.A..........:|.
50 73 65 63 6F 6E 64 20 6C 69 6E 65 0 1 0 0 0 second line.....
60 0 0 0 0 0 0 0 0 FF FF FF FF 0 0 0 0 ................
70 68 FB 12 0 DC E0 40 0 3C BA 42 0 78 FB 12 0 h.....@.<.B.x...
80 CD E3 40 0 3C BA 42 0 8 BB 42 0 E4 FB 12 0 ..@.<.B...B.....
90 3C 18 41 0 C4 FB 12 0 2 0 0 0 FC 3A 7C 0 <.A..........:|.
Advantage: the offset of each record can be calculated from its record number. This makes
direct access possible. There is no space overhead.
Disadvantage: there will probably be internal fragmentation (unusable space within records.)
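Because every record has the same length, the nth record can be read directly by seeking to the byte offset n × record-length. A sketch using the 80-byte records of the example above (class level; readRecord is a name introduced here):

#include <fstream>

const int RECLEN = 80;   // the fixed record length of the example above

// Read record number n (counting from 0) into buf; return true on success.
bool readRecord(std::fstream& f, int n, char buf[RECLEN]) {
    f.seekg((long)n * RECLEN, std::ios::beg);   // offset = n * record length
    f.read(buf, RECLEN);
    return f.good();
}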
Delimited variable length records:
variable length record
A record which can differ in length from the other records of the file.
delimited record
A variable length record which is terminated by a special character or sequence of
characters.
delimiter
A special character or group of characters stored after a field or record, which indicates
the end of the preceding unit.
Record 1 # Record 2 # Record 3 # Record 4 # Record 5 #
The records within a file are followed by a delimiting byte or series of bytes.
The delimiter cannot occur within the records.
Records within a file can have different sizes.
Different files can have different length records.
Programs which access the file must know the delimiter.
Offset, or position, of the nth record of a file cannot be calculated.
There is external overhead for record separation equal to the size of the delimiter per record.
There should be no internal fragmentation (unused space within records.)
There may be external fragmentation (unused space outside of records) after file updating.
Individual records cannot always be updated in place.

Example (Delimiter = ASCII 30 (1E hex) = RS character):
0 66 69 72 73 74 20 6C 69 6E 65 1E 73 65 63 6F 6E first line.secon
10 64 20 6C 69 6E 65 1E d line.
Example (Delimiter = '\n'):
0 46 69 72 73 74 20 28 31 73 74 29 20 4C 69 6E 65 First (1st) Line
10 D A 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 6C ..Second (2nd) l
20 69 6E 65 D A ine..
Disadvantage: the offset of each record cannot be calculated from its record number. This
makes direct access impossible.
Disadvantage: there is space overhead for the delimiter per record.
Advantage: there will probably be no internal fragmentation (unusable space within records.)
Length prefixed variable length records:

110 Record 1 40 Record 2 100 Record 3 80 Record 4 70 Record 5
The records within a file are prefixed by a length byte or bytes.
Records within a file can have different sizes.
Different files can have different length records.
Programs which access the file must know the size and format of the length prefix.
Offset, or position, of the nth record of a file cannot be calculated.
There is external overhead for record separation equal to the size of the length prefix per
record.
There should be no internal fragmentation (unused space within records.)
There may be external fragmentation (unused space outside of records) after file updating.
Individual records cannot always be updated in place.
Example:
0 A 0 46 69 72 73 74 20 4C 69 6E 65 B 0 53 65 ..First Line..Se
10 63 6F 6E 64 20 4C 69 6E 65 1F 0 54 68 69 72 64 cond Line..Third
20 20 4C 69 6E 65 20 77 69 74 68 20 6D 6F 72 65 20 Line with more
30 63 68 61 72 61 63 74 65 72 73 characters
Disadvantage: the offset of each record cannot be calculated from its record number. This
makes direct access impossible.
Disadvantage: there is space overhead for the length prefix.
Advantage: there will probably be no internal fragmentation (unusable space within records.)



Indexed variable length records:

An auxiliary file can be used to point to the beginning of each record.
In this case, the data records can be contiguous.
If the records are contiguous, the only access is through the index file.
Example:
Index File:
0 12 0 0 0 25 0 0 0 47 0 0 0 ....%...G...

Data File:
0 46 69 72 73 74 20 28 31 73 74 29 20 53 74 72 69 First (1st) Stri
10 6E 67 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 53 ngSecond (2nd) S
20 74 72 69 6E 67 54 68 69 72 64 20 28 33 72 64 29 tringThird (3rd)
30 20 53 74 72 69 6E 67 20 77 68 69 63 68 20 69 73 String which is
40 20 6C 6F 6E 67 65 72 longer
Advantage: the offset of each record is contained in the index, and can be looked up from its
record number. This makes direct access possible.
Disadvantage: there is space overhead for the index file.
Disadvantage: there is time overhead for the index file.
Advantage: there will probably be no internal fragmentation (unusable space within records.)
The time overhead for accessing the index file can be minimized by reading the entire
index file into memory when the files are opened.
Fixed field count records:
Records can be recognized if they always contain the same (predetermined) number of fields.
Delineation of fields in a record:
Fixed length fields:

Field 1 Field 2 Field 3 Field 4 Field 5
Each record is divided into fields of correspondingly equal size.
Different fields within a record have different sizes.
Different records can have different length fields.
Programs which access the record must know the field lengths.
There is no external overhead for field separation.
There may be internal fragmentation (unused space within fields.)
Delimited variable length fields:

Field 1 ! Field 2 ! Field 3 ! Field 4 ! Field 5 !
The fields within a record are followed by a delimiting byte or series of bytes.
Fields within a record can have different sizes.
Different records can have different length fields.
Programs which access the record must know the delimiter.
The delimiter cannot occur within the data.
If used with delimited records, the field delimiter must be different from the record
delimiter.
There is external overhead for field separation equal to the size of the delimiter per field.
There should be no internal fragmentation (unused space within fields.)
Length prefixed variable length fields:

12 Field 1 4 Field 2 10 Field 3 8 Field 4 7 Field 5
The fields within a record are prefixed by a length byte or bytes.
Fields within a record can have different sizes.
Different records can have different length fields.
Programs which access the record must know the size and format of the length prefix.
There is external overhead for field separation equal to the size of the length prefix per
field.
There should be no internal fragmentation (unused space within fields.)
Representing record or field length:
Record or field length can be represented in either binary or character form.
The length can be considered as another hidden field within the record.
This length field can be either fixed length or delimited.
When character form is used, a space can be used to delimit the length field.
A two byte fixed length field could be used to hold lengths of 0 to 65535 bytes in binary
form.
A two byte fixed length field could be used to hold lengths of 0 to 99 bytes in decimal
character form.
A variable length field delimited by a space could be used to hold effectively any length.
In some languages, such as strict Pascal, it is difficult to mix binary values and character
values in the same file.
The C++ language is flexible enough so that the use of either binary or character format
is easy.
Tagged fields:
Tags, in the form "Keyword=Value", can be used in fields.
Use of tags does not in itself allow separation of fields, which must be done with another
method.
Use of tags adds significant space overhead to the file.
Use of tags does add flexibility to the file structure.
Fields can be added without affecting the basic structure of the file.
Tags can be useful when records have sparse fields - that is, when a significant number of
the possible attributes are absent.
Mixing numbers and Characters: Use of a File Dump
A file dump gives us the ability to look inside a file at the actual bytes that are stored.
Octal Dump: od -xc filename
e.g. The number 40, stored as ASCII characters and as a short integer

Byte order:
The byte order of integers (and floating point numbers) is not the same on all computers.
This is hardware dependent (CPU), not software dependent.
Many computers store numbers as might be expected: 40 (decimal) = 28 (hex) is stored in a four-byte integer as 00 00 00 28.
PCs reverse the byte order, and store numbers with the least significant byte first: 40 (decimal) = 28 (hex) is stored in a four-byte integer as 28 00 00 00.
On most computers, the number 40 would be stored in character form in its ASCII
values: 34 30.
IBM mainframe computers use EBCDIC instead of ASCII, and would store "40" as F4
F0.
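The byte order of the machine at hand can be inspected directly (a sketch; the output noted in the comment assumes a little-endian PC):

#include <cstdio>

int main() {
    int n = 40;                                     // 28 in hexadecimal
    const unsigned char* b = (const unsigned char*)&n;
    for (unsigned i = 0; i < sizeof n; i++)
        printf("%02X ", b[i]);                      // prints "28 00 00 00" on a PC
    return 0;
}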
8.3 Managing fixed-length, fixed-field buffers
To manage fixed-length, fixed-field buffers, instead of writing the size of each field or each record into the file, we can write methods that control the fixed length of each field.
The class FixedLengthBuffer is a subclass of IOBuffer and supports fixed-length records. The class FixedFieldBuffer, a subclass of FixedLengthBuffer, additionally records the size of each field. The FixedFieldBuffer class is given below:
class FixedFieldBuffer:public FixedLengthBuffer
{
public:
FixedFieldBuffer(int maxFields,int RecordSize=3000);
FixedFieldBuffer(int maxFields,int *fieldSize);
int AddField(int fieldSize);//define the next field
int Pack(const void* field,int size=-1);
int Unpack(void * field,int maxBytes=-1);
int NumberOfFields()const;// return number of defined fields
protected:
int * FieldSize;//array to hold field sizes
int maxFields;//max number of fields
int NumFields;//actual number of defined fields
};
The AddField method is used to specify the size of the next field. The total number of fields defined so far can be obtained using the NumberOfFields method.
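A hedged usage sketch for the Person record of section 8.2.1, given a Person variable clerk filled in as before (the field widths are illustrative; the buffer classes are assumed to behave as described in the text):

FixedFieldBuffer buf(6);    // a Person record has six fields
buf.AddField(10);           // FirstName
buf.AddField(10);           // LastName
buf.AddField(20);           // Address
buf.AddField(20);           // City
buf.AddField(2);            // State
buf.AddField(5);            // ZIP

// Pack the fields of one Person into the buffer, in declaration order:
buf.Pack(clerk.FirstName);
buf.Pack(clerk.LastName);
buf.Pack(clerk.Address);
buf.Pack(clerk.City);
buf.Pack(clerk.State);
buf.Pack(clerk.ZIP);
// buf.NumberOfFields() now returns 6.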



Assignment Questions:
File Structures
1. Explain the fundamental File Processing Operations
i. opening files
ii. closing files
iii. Reading and Writing file contents
iv. Special characters in files.
2. Discuss the fundamental File Structure Concepts
i. Field and record organization
ii. Managing fixed-length, fixed-field buffers.

