Subject: Advance Data Structures
Course Code: BCA-PC(L)-323
Lesson No. 1
BASICS OF TREES
REVISED/UPDATED SLM BY KAPILA DEVI
STRUCTURE
1.1 Learning Objectives
1.2 Introduction
1.3 Tree
1.4 Binary Trees
1.4.1 Binary Representation of Tree
1.4.2 Binary Tree Traversal
1.4.3 Implementation of Binary Trees
1.4.4 Deletion from a Binary Tree
1.5 Threaded Binary Trees
1.6 Implementation of Threaded Binary Tree
1.7 Check your progress
1.8 Summary
1.9 Keywords
1.10 Review Questions
1.11 References/Suggested Reading
1.1 LEARNING OBJECTIVES
1.2 INTRODUCTION
In the last few units we studied linear data structures. In many realistic situations we need to represent data and relationships which are not linear in nature, and for that purpose we need non-linear data structures. A tree structure means that the data is organized so that items of information are related by branches. So far we have been studying mainly linear types of data structures such as arrays, lists, stacks and queues; now we will study a non-linear data structure called a Tree. This structure is mainly used to represent data containing a hierarchical relationship between elements. Familiar examples of such structures are: family trees, the hierarchy of positions in an organization, an algebraic expression involving operations for which certain rules of precedence are prescribed, etc. For example, suppose we wish to use a data structure to represent a person and all of his or her descendants. Assume that the person's name is Alan and he has 3 children, John, Joe and Johnson. Also suppose that Joe has 3 children, Alec, Mary and Chris, and Johnson has one child, Peter. We can represent Alan and his descendants quite naturally with the tree structure shown in Figure 1.1.
Fig 1.1
1.3 TREE
Definition of Tree: A Tree is a finite set of one or more nodes such that there is a specially designated node called the root, and the remaining nodes are partitioned into n ≥ 0 disjoint sets S1, ..., Sn, where each of these sets is itself a tree. S1, ..., Sn are called the subtrees of the root. If we look at Figure 1.1 we see that the root of the tree is Alan. The tree has three subtrees, whose roots are Joe, John and Johnson.
The condition that S1, ..., Sn be disjoint sets prohibits subtrees from ever connecting together.
Node: A node stands for the item of information plus the branches to other items. Consider the tree of Figure 1.2; it has 13 nodes.
Degree: The number of subtrees of a node is called its degree. In Figure 1.2 the degree of node 1 is 3.
Leaf or Terminal Nodes: Nodes that have degree zero are called leaf or terminal nodes. In Figure 1.2 nodes 6, 7, 9, 10, 11, 12 and 13 are leaf nodes; the other nodes of the tree are called non-leaf nodes.
Children: The roots of the subtrees of a node I are called the children of node I; I is the parent of its children.
Siblings: Children of the same parent are called siblings; 8, 9 and 10 are siblings, just as 11 and 12 are.
Level: The 'level' of a node is defined by initially letting the root be at level 1. If a
node is at level l, then its children are at level l+1.
Height or Depth: The height or depth of a Tree is defined as the maximum level of
any node in the Tree.
Forest: A 'Forest' is a set of n ≥ 0 disjoint trees.
1.4 BINARY TREES
A Binary Tree is a finite set of elements that is either empty or is partitioned into three disjoint subsets. The first subset contains a single element called the root of the tree. The other two subsets are themselves binary trees, called the left and right subtrees of the original tree. A left or right subtree can be empty. Figure 1.3 shows a typical binary tree. A node of a binary tree can have at most two branches.
If A is the root of a binary tree and B and C are the roots of its left and right subtrees respectively, then A is said to be the father of B and C, and B and C are called the left and right sons respectively. If every non-leaf node in a binary tree has non-empty left and right subtrees, the tree is termed a Strictly Binary Tree. The binary tree of Figure 1.4 is a strictly binary tree. A strictly binary tree with n leaves always contains 2n-1 nodes.
A Complete Binary Tree is a strictly binary tree of depth d all of whose leaves are at level d. Figure 1.5 represents a complete binary tree.
Fig. 1.6
1.4.1 BINARY REPRESENTATION OF TREE
The importance of using binary trees to represent trees is that, for binary trees, only about 1/2 of the link fields are zero. In arriving at the binary tree representation of a tree we shall implicitly make use of the fact that the order of the children of a node is not important. The "parent-child" relationship between the nodes of a tree is converted into the "leftmost-child, next-right-sibling" relationship: every node has at most one leftmost child and at most one next right sibling. The right child of the root node of every resulting binary tree will be empty.
Example: Suppose we have a tree as given in the Fig 1.7
Fig.1.7
The binary tree corresponding to tree of Fig 1.7 is obtained by connecting together
all the siblings of a node and deleting all links from a node to its children except
for the link to its leftmost child. The corresponding node structure is shown below:
Using the transformation defined above we obtain the representation for the tree as shown in Fig 1.8.
Fig. 1.8
On tilting it roughly 45° clockwise we get Fig 1.9.
Fig. 1.9
/* Node structure assumed for the binary search tree */
struct btree
{
    int data;
    struct btree *left, *right;
};

/* Insert num into the binary search tree rooted at s and return the root.
   s points to the existing tree. */
struct btree *insert(struct btree *s, int num)
{
    struct btree *temp, *p;

    temp = malloc(sizeof(struct btree));
    temp->data = num;
    temp->left = temp->right = NULL;

    if (s == NULL)                  /* if current tree is empty */
        s = temp;
    else
    {
        p = s;
        while (1)                   /* walk down until a vacant link is found */
        {
            if (p->data <= num)     /* if num is greater or equal, go right */
            {
                if (p->right == NULL)
                {
                    p->right = temp;
                    break;
                }
                p = p->right;
            }
            else                    /* else go left */
            {
                if (p->left == NULL)
                {
                    p->left = temp;
                    break;
                }
                p = p->left;
            }
        }
    }
    return s;
}
The functioning of the above code can be explained through an example. Suppose
we have to make a tree with the following data:
5, 3, 7, 8, 2, 4, 6, 1.
When 5 arrives the tree is empty, so a tree is formed having only one node, pointed to by temp and s. Both child pointers of this node are NULL.
The next number is 3. In the while loop this number is compared with 5; because 5 is greater than 3, control moves to the left child of 5, which is NULL, so 3 is attached to the left of 5. The next number is 7, which is greater than 5, hence it is attached to the right of 5. Proceeding in the same manner, the binary tree of Figure 1.12 is produced.
Fig. 1.12
Such a binary tree has the property that all elements in the left subtree of a node n
are less than the contents of n, and all elements in the right subtree of n are greater
than or equal to the contents of n. A binary tree with this property is called a Binary Search Tree.
In the second phase we can traverse a tree in any of three fashions (i.e. inorder, preorder, postorder). A recursive method can be used to traverse a binary tree. The method starts by checking whether the tree is empty or not. If the tree is not empty, i.e. our pointer s of struct btree type doesn't point to NULL, then we proceed. Depending upon the desired fashion of traversing, we call the traversing function recursively, passing either the left child pointer or the right child pointer of the node as an argument. The functions to implement this are given below:
/* Inorder traversal of a binary search tree */
void inorder(struct btree *s)
{
    if (s != NULL)
    {
        inorder(s->left);
        printf("%d ", s->data);
        inorder(s->right);
    }
    else
        return;
}
/* Preorder traversal of a binary search tree */
void preorder(struct btree *s)
{
    if (s != NULL)
    {
        printf("%d ", s->data);
        preorder(s->left);
        preorder(s->right);
    }
    else
        return;
}
/* Postorder traversal of a binary search tree */
void postorder(struct btree *s)
{
    if (s != NULL)
    {
        postorder(s->left);
        postorder(s->right);
        printf("%d ", s->data);
    }
    else
        return;
}
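For illustration, here is a minimal driver for the functions above. It is a sketch: it assumes the insert() function as reconstructed earlier (returning the root) together with the three traversal functions, and builds the tree of Figure 1.12.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int keys[] = {5, 3, 7, 8, 2, 4, 6, 1};
    int i;
    struct btree *root = NULL;

    for (i = 0; i < 8; i++)
        root = insert(root, keys[i]);   /* builds the BST of Figure 1.12 */

    inorder(root);   printf("\n");      /* prints 1 2 3 4 5 6 7 8 */
    preorder(root);  printf("\n");      /* prints 5 3 2 1 4 7 6 8 */
    postorder(root); printf("\n");      /* prints 1 2 4 3 6 8 7 5 */
    return 0;
}

Note that the inorder traversal of a binary search tree always yields the keys in sorted order.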
Fig. 1.13
Figure 1.14 represents case (c), in which the node to be deleted has one child; we have to adjust the pointer of the parent of the node to be deleted so that after deletion it points to the child of the node being deleted. In Figure 1.14(a) the node with data 1 is to be deleted. After deletion, the left pointer of node 6 is made to point to the child of node 1, i.e. node 4, and the structure would be as shown in Figure 1.14(b).
Fig. 1.14
In case (d) the value is replaced by the smallest value in the right subtree or the largest key value in the left subtree; subsequently the emptied node is deleted recursively. Figure 1.15 represents this case. If the node with data value 6 is to be deleted, then first its value is replaced by the smallest value in its right subtree. Adjustment is done for the parent of this node (with the smallest value) according to case (b) or (c), and then the node is deleted.
Fig. 1.15
Function for implementing the deletion procedure is given below:
/* Delete num from the binary search tree pointed to by root.
   (Assumes the node being deleted is not the root itself.) */
void delete(struct btree *root, int num)
{
    int found;
    struct btree *par, *x, *xsucc;

    if (root == NULL)
    {
        printf("\nEmpty Tree");
        return;
    }

    par = x = NULL;
    /* search() is assumed to locate the node to be deleted (x) and its
       parent (par) */
    found = search(root, num, &par, &x);
    if (found == 0)
    {
        printf("\nData to be deleted, not found");
        return;
    }

    /* If the node to be deleted has two children, copy the data of its
       inorder successor into it and delete the successor instead */
    if (x->left != NULL && x->right != NULL)
    {
        par = x;
        xsucc = x->right;
        while (xsucc->left != NULL)
        {
            par = xsucc;
            xsucc = xsucc->left;
        }
        x->data = xsucc->data;
        x = xsucc;
    }

    /* If the node to be deleted has no child */
    if (x->left == NULL && x->right == NULL)
    {
        if (par->right == x)
            par->right = NULL;
        else
            par->left = NULL;
        free(x);
        return;
    }

    /* If the node to be deleted has only a right child */
    if (x->left == NULL && x->right != NULL)
    {
        if (par->left == x)
            par->left = x->right;
        else
            par->right = x->right;
        free(x);
        return;
    }

    /* If the node to be deleted has only a left child */
    if (x->left != NULL && x->right == NULL)
    {
        if (par->left == x)
            par->left = x->left;
        else
            par->right = x->left;
        free(x);
        return;
    }
}
Fig. 1.16
Fig.1.17
In order to avoid dangling pointers, the entire tree is treated as the left subtree of a head node. The structure of any node of a threaded binary tree will be as shown in Fig 1.18 below.
Fig 1.18
This structure can be declared as:
struct thtree
{
    int lthread;                /* 1 (TRUE) if lchild is a thread */
    int rthread;                /* 1 (TRUE) if rchild is a thread */
    int data;
    struct thtree *lchild;
    struct thtree *rchild;
};
The function to implement insertion into a threaded binary tree is given below:
#define TRUE  1
#define FALSE 0

/* Insert x into the threaded binary search tree pointed to by s */
insert(struct thtree *s, int x)
{
    struct thtree *head, *p, *z;
    head = s;

    /* allocate a new node */
    z = malloc(sizeof(struct thtree));
    z->lthread = TRUE;              /* indicates a thread */
    z->data = x;                    /* assign new data */
    z->rthread = TRUE;              /* indicates a thread */

    /* if tree is empty */
    if (s == NULL)
    {
        head = malloc(sizeof(struct thtree));
        head->lthread = FALSE;      /* indicates a link */
        head->lchild = z;           /* z becomes left child of head node */
        head->data = -9999;         /* garbage */
        head->rthread = FALSE;
        s = head;
        z->lchild = head;           /* left thread to head */
        z->rchild = head;           /* right thread to head */
    }
    else                            /* if tree is not empty */
    {
        p = head->lchild;
        while (p != head)           /* traverse till a thread back to the head is found */
        {
            if (p->data > x)
            {
                if (p->lthread != TRUE)     /* a link: keep descending left */
                    p = p->lchild;
                else
                {
                    z->lchild = p->lchild;
                    p->lchild = z;
                    p->lthread = FALSE;     /* indicates a link */
                    z->rthread = TRUE;
                    z->rchild = p;          /* right thread to inorder successor */
                    return;
                }
            }
            else if (p->data < x)
            {
                if (p->rthread != TRUE)     /* a link: keep descending right */
                    p = p->rchild;
                else
                {
                    z->rchild = p->rchild;
                    p->rchild = z;
                    p->rthread = FALSE;     /* indicates a link */
                    z->lthread = TRUE;
                    z->lchild = p;          /* left thread to inorder predecessor */
                    return;
                }
            }
        }
    }
}
Fig. 1.19
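The chief payoff of threading is that an inorder traversal needs neither recursion nor a stack: whenever a right thread is met, it leads directly to the inorder successor. The following is a minimal sketch under the node structure and head-node convention assumed above:

/* Inorder traversal of a threaded binary tree; no stack or recursion
   is needed. (A sketch; assumes the head-node convention used by the
   insert function above.) */
void thtraverse(struct thtree *head)
{
    struct thtree *p = head->lchild;        /* root is left child of head */

    while (p != head)
    {
        while (p->lthread == FALSE)         /* slide down to the leftmost node */
            p = p->lchild;
        printf("%d ", p->data);
        while (p->rthread == TRUE)          /* threads lead to inorder successors */
        {
            p = p->rchild;
            if (p == head)
                return;
            printf("%d ", p->data);
        }
        p = p->rchild;                      /* ordinary right link */
    }
}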
1.8 SUMMARY
12. In an M-way search tree any node has at most m children and m-1 keys.
13. A 'B' tree of order m can have at most m children and m-1 keys, and at least ⌈m/2⌉ children (except the root node and the leaves).
14. We have to split a node if, before insertion, it is already full to its capacity.
15. A 'B' tree grows in the upward direction, and the number of levels increases only after splitting.
1.9 KEYWORDS
Tree: A finite set of one or more nodes used to represent data containing a hierarchical relationship between elements.
Node: The item of information.
Degree: The number of subtrees of a node.
Leaf: A node that has degree zero.
Siblings: Children of the same parent.
Children: The roots of the subtrees of a node.
Forest: A set of n ≥ 0 disjoint trees.
Binary tree: A finite set of elements that is either empty or is partitioned into three disjoint subsets: a root, a left subtree and a right subtree.
Complete Binary Tree: A strictly binary tree of depth d all of whose leaves are at level d.
Binary Search Tree: A binary tree that is either empty or in which each node contains an identifier, all identifiers in its left subtree are less than the identifier in its root node, and all identifiers in its right subtree are greater than the identifier in its root node.
Height Balanced Tree: An empty tree is height balanced. A non-empty binary tree with HL and HR as its left and right subtrees is height balanced if HL and HR are height balanced and |hL - hR| ≤ 1, where hL and hR are the heights of HL and HR respectively.
'B' Tree: An M-way search tree that is either empty or is of height ≥ 1 and satisfies the following properties:
1. The root node has at least 2 children.
2. All nodes other than the root node and failure nodes have at least ⌈m/2⌉ children.
3. All failure nodes are at the same level.
13. How does the use of threads remove the drawbacks of binary search trees?
14. Write a function to create a Threaded Binary Tree.
15. Insert the following keys in an empty AVL Tree: MARCH, MAY,
NOVEMBER, AUGUST, APRIL, JANUARY, DECEMBER, JULY,
FEBRUARY, JUNE, OCTOBER and SEPTEMBER
16. Insert the following keys into a 'B' Tree of order 3:
a, f, b, k, h, m, e, s, r, e, l, n, u, p.
17. Explain the method for deleting an element from a 'B' Tree. Explain when we
require merging of nodes while deleting with examples.
1.11 REFERENCES/SUGGESTED READING
1. An Introduction to Data Structures and Algorithms, Birkhäuser Boston / Springer, New York.
2. Seymour Lipschutz, “Data Structures”, Tata McGraw-Hill.
3. Horowitz, Sahni & Anderson-Freed, “Fundamentals of Data Structures in C”, Orient Longman.
4. Tremblay, J.P. and Sorenson, P.G., “An Introduction to Data Structures with Applications”, McGraw-Hill International Student Edition, New York.
5. Yedidyah Langsam, Moshe J. Augenstein and Aaron M. Tenenbaum, “Data Structures using C”, Prentice Hall of India Pvt. Ltd., New Delhi.
6. Mark Allen Weiss, “Data Structures and Algorithm Analysis in C”, Addison-Wesley (An Imprint of Pearson Education), Mexico City; Prentice-Hall of India Pvt. Ltd., New Delhi.
Subject: Advance Data Structures
Course Code: BCA-PC(L)-323
Lesson No. 2
SPECIAL TREES
REVISED/UPDATED SLM BY KAPILA DEVI
STRUCTURE
2.1 Learning Objectives
2.2 Introduction
2.3 Height Balanced Trees or AVL Trees
2.3.1 Building Height Balanced Trees
2.4 Multiway Search Trees
2.5 B Tree
2.5.1 Insertion into A B Tree
2.5.2 Deletion from A B Tree
2.6 Check your progress
2.7 Summary
2.8 Keywords
2.9 Review Questions
2.10 References/Suggested Reading
2.1 LEARNING OBJECTIVES
2.2 INTRODUCTION
In the last few units we studied linear data structures. In many realistic situations we need to represent data and relationships which are not linear in nature, and for that purpose we need non-linear data structures. A tree structure means that the data is organized so that items of information are related by branches. So far we have been studying mainly linear types of data structures such as arrays, lists, stacks and queues; now we will study a non-linear data structure called a Tree. This structure is mainly used to represent data containing a hierarchical relationship between elements. Familiar examples of such structures are: family trees, the hierarchy of positions in an organization, an algebraic expression involving operations for which certain rules of precedence are prescribed, etc. For example, suppose we wish to use a data structure to represent a person and all of his or her descendants. Assume that the person's name is Alan and he has 3 children, John, Joe and Johnson. Also suppose that Joe has 3 children, Alec, Mary and Chris, and Johnson has one child, Peter. We can represent Alan and his descendants quite naturally with the tree structure shown in Figure 2.1.
Fig 2.1
Fig 2.2
Fig 2.3
The tree given above is height balanced, but it is not completely balanced; the tree in the preceding figure is height balanced and completely balanced as well. An almost height balanced tree is called an AVL tree. An AVL tree may or may not be perfectly balanced.
We can give a recursive definition of AVL trees as follows:
Definition
An empty tree is height balanced. If T is a non-empty binary tree with HL and HR as its left and right subtrees, then T is height balanced if:
i) HL and HR are height balanced, and
ii) |hL - hR| ≤ 1, where hL and hR are the heights of HL and HR respectively.
In order to prevent a tree from becoming unbalanced we associate a 'balance factor' with each node of the height balanced tree. This indicator contains one of three values, which we denote as Left (L), Right (R) or Balanced (B), according to the following definitions:
Left: A node will be called 'left heavy' if the longest path in its left subtree is one longer than the longest path in its right subtree.
Balanced: A node will be called 'balanced' if the longest paths in both of its subtrees are equal.
Right: A node will be called 'right heavy' if the longest path in its right subtree is one longer than the longest path in its left subtree.
In a balanced tree each node must be in one of these three states. If there exists a node in a tree where this is not true, then such a tree is said to be unbalanced. In other words, a tree is height balanced if all of its nodes have a balance factor of +1, 0 or -1.
Fig 2.4 gives examples of trees which are balanced; Fig 2.5 gives examples of unbalanced trees.
Figure 2.4
Fig 2.5
The nearest ancestor of the inserted node whose balance factor is +1 or -1 is located; this particular node is called the pivot node.
Inclusive of and below the pivot node, recompute all balance factors along the insertion path. It can be shown that no nodes other than these can possibly change their balance factors using the AVL method.
Determine whether the absolute value of the pivot node's balance factor switched from 1 to 2.
If there was such a switch, perform a manipulation of tree pointers centered at the pivot node to bring the tree back into height balance. Because the visual effect of this pointer manipulation is to "rotate" the subtree whose root is the pivot node, the operation is frequently referred to as an AVL rotation.
Each node of an AVL tree has the property that the height of its left subtree is either one more than, equal to, or one less than the height of its right subtree.
Balance Factor (BF) = (height of left subtree - height of right subtree)
If the two subtrees are of the same height, BF = 0.
If the right subtree is higher, BF = -1.
If the left subtree is higher, BF = +1.
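As an illustration, the balance factor of a node can be computed directly from the subtree heights. This is only a sketch over the struct btree node of Lesson 1; a real AVL implementation stores the balance factor in each node rather than recomputing heights on every insertion:

/* Height of a tree, and balance factor of its root, computed
   recursively. (Illustrative only; an AVL tree keeps bf in the node.) */
int height(struct btree *t)
{
    int hl, hr;
    if (t == NULL)
        return 0;
    hl = height(t->left);
    hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

int bf(struct btree *t)
{
    return height(t->left) - height(t->right);  /* +1, 0 or -1 when balanced */
}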
Suppose that after the insertion of a new node Y the absolute value of the pivot node's balance factor becomes 2. The following characterization of rotation types is obtained (A denotes the pivot node):
LL New Node Y is inserted in the left subtree of the left subtree of A.
LR Y is inserted in the right subtree of the left subtree of A.
RR Y is inserted in the right subtree of the right subtree of A.
RL Y is inserted into the left subtree of the right subtree of A.
pivot = y;
if bf(pivot) = 0 then
    bf(right(pivot)) = 0;
    bf(left(pivot)) = 0;
else if bf(pivot) = 1 then
    bf(pivot) = 0;
    bf(left(pivot)) = 0;
    bf(right(pivot)) = -1;
else
    bf(pivot) = 0;
    bf(right(pivot)) = 1;
    bf(left(pivot)) = 0;
endif;
endif;
Algorithm to implement RL rotation is given below:
x = left(pivot);
y = right(x);
left(pivot) = right(y);
right(x) = left(y);
left(y) = x;
right(y) = pivot;
pivot = y;
if bf(pivot) = 0 then
    bf(left(pivot)) = 0;
    bf(right(pivot)) = 0;
else if bf(pivot) = 1 then
    bf(pivot) = 0;
    bf(left(pivot)) = 0;
    bf(right(pivot)) = -1;
else
    bf(pivot) = 0;
    bf(left(pivot)) = 1;
    bf(right(pivot)) = 0;
endif;
endif;
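For comparison with the double-rotation pseudocode above, the LL case is handled by a single rotation that lifts the pivot's left child. The following C sketch is illustrative only (rotation code is not given in the text, and the function name is an assumption):

/* Single rotation for the LL case: the pivot's left child y becomes
   the new root of the subtree. (An illustrative sketch.) */
struct btree *rotate_ll(struct btree *pivot)
{
    struct btree *y = pivot->left;
    pivot->left = y->right;     /* y's right subtree moves under the pivot */
    y->right = pivot;
    return y;                   /* new root of this subtree */
}

The RR case is the mirror image, and the LR/RL cases can be composed from two such single rotations.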
Example: Creation of an AVL Tree
Assume that the insertions are made in the order – March, May, November, August,
April, January, December, July, February, June, October and September.
The tree after each insertion, and after any rebalancing required, is shown in the accompanying figures (e.g. after inserting March no rebalancing is needed).
2.4 MULTIWAY SEARCH TREES
Binary search trees generalize directly to multiway search trees (M-way search trees), in which, for some integer m called the order of the tree, each node has at most m children. If k ≤ m is the number of children, then the node contains exactly k-1 keys, which partition all the keys into k subsets. If some of these subsets are empty, then the corresponding children in the tree are empty.
An M-way search tree, T, is a tree in which all nodes are of degree ≤ m. If T is empty (i.e. T = 0) then T is an M-way search tree. When T is not empty it has the following properties:
1. T is a node of the type
   n, A0, (K1, A1), (K2, A2), ..., (Kn, An)
   where the Ai, 0 ≤ i ≤ n, are pointers to the subtrees of T, the Ki, 1 ≤ i ≤ n, are key values, and 1 ≤ n < m.
2. Ki < Ki+1, 1 ≤ i < n.
3. All key values in the subtree Ai are less than the key value Ki+1, 0 ≤ i < n.
4. All key values in the subtree An are greater than Kn.
5. The subtrees Ai, 0 ≤ i ≤ n, are also M-way search trees.
Node    Schematic Format
a       2, b, (20, c), (40, d)
b       2, 0, (10, 0), (15, 0)
c       2, 0, (25, 0), (30, e)
d       2, 0, (45, 0), (50, 0)
e       1, 0, (35, 0)
Fig 2.7 Example of a 3-way search tree
2.5 B TREE
o The leaves of the tree must all be at the same level, and
o All nodes except the leaves must have at least a minimum number of children.
A B-tree of order M has the following properties:
1. Each node has a maximum of M children and a minimum of M/2 children. The root node can have from two up to the maximum number of children.
2. Each node has one fewer keys than children, with a maximum of M-1 keys.
3. Keys are arranged in a defined order within the node. All keys in the subtree to the left of a key are predecessors of the key, and those on the right are successors of the key.
4. When a new key is to be inserted into a full node, the node is split into two nodes, and the key with the median value is moved up into the parent node. In case the parent node is the root, a new root is created.
5. All leaves are at the same level, i.e. there is no empty subtree above the level of the leaves.
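A node of a B-tree of order M can be declared along the lines of the M-way node format n, A0, (K1, A1), ..., (Kn, An) given in the previous section. This is a sketch; the names and the order chosen are illustrative:

#define M 5                       /* order of the B-tree (illustrative) */

struct bnode {
    int n;                        /* number of keys currently in the node */
    int key[M - 1];               /* keys K1..Kn in ascending order */
    struct bnode *child[M];       /* children A0..An; child[i] leads to keys
                                     lying between key[i-1] and key[i] */
};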
B-trees of high order are desirable since they result in a reduction in the number of disk accesses needed to search an index. If the index has N entries then a B-tree of order m = N+1 would have only one level. This choice of m is clearly not reasonable, since by assumption the index is too large to fit in internal memory, so the single node representing the index cannot be read into memory and processed. In arriving at a reasonable choice of m, we must keep in mind that we are really interested in minimizing the total amount of time needed to search the B-tree for a value x.
The first four keys will be inserted into one node. They are sorted into proper
order as they are inserted.
2. k:
Now there is no room left for the fifth key 'k', so its insertion causes the node to split into two, and the median key 'f' moves up to create a new node, which is the new root.
3. d, h, m:
Since the split nodes are only half full, the next three keys can be inserted
without difficulty. These insertions can require rearrangement of the keys
within the node.
4. j:
Insertion of key 'j' again splits a node, and this time it is 'j' itself that is the median key, and it therefore moves up to join f in the root.
5. e, s, i, r:
6. x:
7. c, l, n, t, u:
8. p:
The insertion of 'p' first splits the node originally containing k, l, m, n, sending the median key m upwards into the node containing c, f, j, r, which is already full. Hence this node in turn splits, and a new root containing 'j' is created.
2.5.2 DELETION FROM A 'B' TREE
If deletion of a value causes a node to be less than half full, it is combined with its
neighbouring nodes and this can propagate all the way to the root.
If the leaf contains more than the minimum of entries, then any one entry can be
deleted with no further action. If the leaf contains the minimum number, then we
first look at the two leaves (or, in the case of a node on the outside, one leaf) that
are immediately adjacent to each other and are children of the same node. If one of
these has more than the minimum number of entries, then one of them can be moved
into the parent node, and the entry from the parent node is moved into the leaf
where the deletion is occurring.
If the adjacent leaf has only the minimum number of entries, then the two leaves
and the median entry from the parent can all be combined as one leaf, which will
contain no more than the maximum number of entries allowed. If this step leaves
the parent node with too few entries, then the process propagates upward.
In the limiting case, the last entry is removed from the root, and then the height of the tree decreases. If the key to be deleted is present in a non-leaf node, and its immediate successor or predecessor is in a leaf, then the immediate predecessor or successor is promoted into the position occupied by the deleted key and the key is deleted from the leaf.
Example of deletion from a B-tree:
1. Delete h, r:
2. Delete p:
2.7 SUMMARY
2.8 KEYWORDS
Tree: A finite set of one or more nodes used to represent data containing a hierarchical relationship between elements.
Node: The item of information.
Degree: The number of subtrees of a node.
Leaf: A node that has degree zero.
Siblings: Children of the same parent.
Children: The roots of the subtrees of a node.
Forest: A set of n ≥ 0 disjoint trees.
Binary tree: A finite set of elements that is either empty or is partitioned into three disjoint subsets: a root, a left subtree and a right subtree.
Complete Binary Tree: A strictly binary tree of depth d all of whose leaves are at level d.
Binary Search Tree: A binary tree that is either empty or in which each node contains an identifier, all identifiers in its left subtree are less than the identifier in its root node, and all identifiers in its right subtree are greater than the identifier in its root node.
Height Balanced Tree: An empty tree is height balanced. A non-empty binary tree with HL and HR as its left and right subtrees is height balanced if HL and HR are height balanced and |hL - hR| ≤ 1, where hL and hR are the heights of HL and HR respectively.
'B' Tree: An M-way search tree that is either empty or is of height ≥ 1 and satisfies the following properties:
1. The root node has at least 2 children.
2. All nodes other than the root node and failure nodes have at least ⌈m/2⌉ children.
3. All failure nodes are at the same level.
2.9 SELF ASSESSMENT TEST
FEBRUARY, JUNE, OCTOBER and SEPTEMBER
16. Insert the following keys into a 'B' Tree of order 3:
a, f, b, k, h, m, e, s, r, e, l, n, u, p.
17. Explain the method for deleting an element from a 'B' Tree. Explain when we
require merging of nodes while deleting with examples.
2.10 REFERENCES/SUGGESTED READING
Subject: Advance Data Structures
Course Code: BCA-PC(L)-323
Lesson No. 3
HASHING
REVISED/UPDATED SLM BY KAPILA DEVI
STRUCTURE
3.1 Learning Objectives
3.2 Introduction
3.3 Hash Tables
3.4 Hashing Function
3.4.1 Mid Square
3.4.2 Division
3.4.3 Folding
3.4.4 Digit Analysis
3.5 Terms Associated with Hash Tables
3.5.1 Identifier Density
3.5.2 Loading Factor
3.5.3 Synonyms
3.5.4 Collision
3.6 Bucket Overflow
3.7 Handling of bucket Overflows
3.8 Hash Indices
3.9 Check your Progress
3.10 Summary
3.11 Keywords
3.12 Self-Assessment Test
3.13 Answers to Check Your Progress
3.14 References/Suggested Reading
3.2 INTRODUCTION
In this unit we will study hashing. One disadvantage of sequential file organization is that we must access an index structure to locate data, or must use binary search, and that results in more I/O operations. File organization based on the technique of hashing allows us to avoid accessing an index structure.
- Hashing is the process of mapping a large amount of data items to a smaller table with the help of a hashing function.
- Hashing is also known as a hashing algorithm or message digest function.
- It is a technique to convert a range of key values into a range of indexes of an array.
- It is used to facilitate faster searching when compared with linear or binary search.
- Hashing allows updating and retrieving any data entry in constant time O(1).
- Constant time O(1) means the operation does not depend on the size of the data.
- Hashing is used with a database to enable items to be retrieved more quickly.
- It is used in the encryption and decryption of digital signatures.
In tree tables, the search for an identifier key is carried out via a sequence of comparisons. Hashing differs from this in that the address or location of an identifier X is obtained by computing some arithmetic function f of X; f(X) gives the address of X in the table. This address will be referred to as the hash or home address of X.
The memory available to maintain the symbol table is assumed to be sequential. This memory is referred to as the hash table, HT. The hash table is partitioned into b buckets, HT(0), ..., HT(b-1). Each bucket is capable of holding s records; thus a bucket is said to consist of s slots, each slot being large enough to hold one record.
Usually s = 1 and each bucket can hold exactly one record. A hashing function f(x) is used to perform an identifier transformation on X; f(x) maps the set of possible identifiers onto the integers 0 through b-1.
We use the term bucket to denote a unit of storage that can store one or more records. A bucket is typically a disk block but could be chosen to be smaller or larger than a disk block.
If the probability that f(x) = i is 1/b for all buckets i, then a random X has an equal chance of hashing into any of the b buckets. A hash function satisfying this property will be termed a "uniform hash function".
The worst possible hash function maps all search key values to the same
bucket. This function is undesirable because all the records have to be kept in
the same bucket.
An ideal hash function distributes the stored keys uniformly across all the
buckets so that every bucket has the same number of records. We want to
choose a hash function that assigns search key values to buckets such that the
following holds:
The distribution is uniform, that is, each bucket is assigned the same
number of search key values from the set of all possible search key values.
The distribution is random, that is, in the average case, each bucket will
have nearly the same number of values assigned to it, regardless of the
actual distribution of search key values.
Suppose we assign “a” = 1, “b” = 2, etc. to all alphabetical characters. We can then simply compute a number for each string by using the sum of its characters, as follows:
“abc” = 1 + 2 + 3 = 6, “def” = 4 + 5 + 6 = 15, “ghi” = 7 + 8 + 9 = 24
If we assume that we have a table of size 5 to store these strings, we can compute the location of each string by taking the sum mod 5. So we store “abc” in 6 mod 5 = 1, “def” in 15 mod 5 = 0, and “ghi” in 24 mod 5 = 4:
Location:  0     1     2     3     4
Entry:     def   abc   -     -     ghi
Now the idea is that if we are given a string, we can immediately compute its location using a simple hash function: the sum of the characters mod the table size. Using this hash value, we can search for the string. This seems to be a great way to store a dictionary, and more generally the idea of hashing seems to be a great way to store pairs of (key, value) in a table.
Problem with Hashing
The method discussed above seems too good to be true, as we begin to see when we think more about the hash function. First of all, the hash function we used, the sum of the letters, is a bad one. If we have permutations of the same letters, “abc”, “bac”, etc. in the set, we will end up with the same value for the sum and hence the same location. In this case, the strings would hash into the same location, creating what we call a “collision”. This is obviously not a good thing. Secondly, we need to find a good table size, preferably a prime number, so that even if the sums are different, collisions can be avoided when we take the mod of the sum to find the location. So we ask two questions.
Question 1: How do we pick a good hash function?
Question 2: How do we deal with collisions?
The problem of storing and retrieving data in O(1) time comes down to answering the above questions. Picking a “good” hash function is key to successfully implementing a hash table. What we mean by “good” is that the function must be easy to compute and must avoid collisions as much as possible. If the function is hard to compute, then we lose the advantage gained for lookups in O(1). And even if we pick a very good hash function, we still will have to deal with “some” collisions.
Finding a good hash Function
It is difficult to find a “perfect” hash function, that is, a function that has no collisions. But we can do “better” by using hash functions as follows. Suppose we need to store a dictionary in a hash table. A dictionary is a set of strings, and we can define a hash function as follows. Assume that S is a string of length n and
S = S1 S2 ... Sn
H(S) = H(“S1 S2 ... Sn”) = S1 + p*S2 + p^2*S3 + ... + p^(n-1)*Sn
where p is a prime number. Each string will then lead to a distinctive number, but when we take the number mod the table size, it is still possible that we may have collisions, though perhaps fewer than when using a naïve hash function like the sum of the characters.
Although the above function reduces collisions, we still have to deal with the fact that the function must be easy to compute. Rather than directly computing the above polynomial, we can reduce the number of computations by rearranging the terms as follows:
H(S) = S1 + p(S2 + p(S3 + ... + p(Sn-1 + p*Sn) ... ))
This rearrangement of terms allows us to compute a good hash value quickly.
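The rearranged form is exactly Horner's rule, so the hash can be computed in a single scan of the string. A minimal sketch follows; the prime p, the table size and the use of raw character codes are illustrative assumptions:

#include <string.h>

#define TABLE_SIZE 101            /* a prime table size (illustrative) */

unsigned hash(const char *s)
{
    unsigned p = 31;              /* a small prime (illustrative) */
    unsigned h = 0;
    int i;

    /* Horner's rule, innermost term first: h = S1 + p(S2 + p(S3 + ...)) */
    for (i = (int)strlen(s) - 1; i >= 0; i--)
        h = ((unsigned char)s[i] + p * h) % TABLE_SIZE;
    return h;                     /* bucket number in 0..TABLE_SIZE-1 */
}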
Several kinds of uniform hash functions are in use. We shall describe a few of them:
Table 3.1
3.4.2 DIVISION
Another simple choice for a hash function is obtained by using the modulo (mod) operator. The identifier X is divided by some number M and the remainder is used as the hash address of X:
fD(X) = X mod M
This gives bucket addresses in the range 0 to M-1, and so the hash table must be at least of size b = M. M should be a prime number such that M does not divide r^k ± a, where r is the radix of the character set and k and a are very small numbers.
Fig. 3.2 Identifier A1 right and left justified and zero filled (6 bits per character)
3.4.3 FOLDING
In this method the identifier X is partitioned into several parts, all but the last being of the same length. These parts are then added together to obtain the hash address for X. There are two ways to carry out this addition.
In the first, all but the last part are shifted so that the least significant bit of each part lines up with the corresponding bit of the last part. The different parts are then added together to get f(x).
Fig. 3.4 Folding at boundaries (Pi^r = reverse of Pi)
3.5.3 SYNONYMS
The hash function f almost always maps several different identifiers into the same bucket. Two identifiers I1, I2 are said to be synonyms with respect to f if f(I1) = f(I2). Distinct synonyms are entered into the same bucket as long as not all the s slots in that bucket have been used.
3.5.4 COLLISION
A collision is said to occur when two non-identical identifiers are hashed into the same bucket. When the bucket size is 1, collision and overflow occur simultaneously.
3.6 BUCKET OVERFLOW
So far we have assumed that, when a record is inserted, the bucket to which it is mapped has space to store the record. If the bucket does not have enough space, a bucket overflow is said to have occurred. Bucket overflow can occur for several reasons:
INSUFFICIENT BUCKETS
The number of buckets, which we denote by nb, must be chosen such that nb > nr/fr, where nr denotes the total number of records that will be stored and fr denotes the number of records that will fit in a bucket.
SKEW
Some buckets are assigned more records than others, so a bucket may overflow
even when other buckets still have space. This situation is called bucket skew.
3.7 HANDLING OF BUCKET OVERFLOWS
When situation of overflow occurs it should be resolved and the records must
be placed somewhere else in the table.
We shall describe here some of the methods used for overflow resolution.
OVERFLOW CHAINING
Despite allocation of a few more buckets than required, bucket overflow can
still occur. We handle bucket overflow using Overflow Buckets. If a record
has be inserted into a bucket b, and b is alre ady full, an overflow bucket is
provided for b, and the record is inserted into the overflow bucket. If the
overflow bucket is also full, another overflow bucket is provi ded and so on.
All the overflow buckets of a given bucket are chained together in a linked
list. Overflow handling using such a linked list is ca lled Overflow Chaining.
Hash table is considered as an array of pointers to the records, that is, an array
of linked list.
ADVANTAGES OF CHAINING
Space Saving: Since the hash table is a contiguous array, enough space must be set aside at compilation time to avoid overflow. On the other hand, if the hash table contains only pointers to the records, then the size of the hash table may be reduced.
Collision Resolution: Chaining allows simple and efficient collision
handling. To handle collision only a link field is to be added.
Overflow: It is no longer necessary that the size of the hash table exceeds
the number of records.
If there are more records than entries in the table, it means that some of the
linked lists serve the purpose of containing more than one record.
DELETION
Deletion proceeds in exactly the same way as deletion from a simple linked
list. So in a chained hash table deletion becomes a quick and easy task.
OPEN HASHING
The form of hash structure that we have just described is sometimes referred
to as closed hashing. Under an alternate approach, called open hashing, the
set of buckets is fixed and there are no overflow chains, instead if a bucket is
full, records are inserted in some other bucket in the initial set of buckets B.
Two policies are used when assigning a new bucket to a record.
LINEAR PROBING
One policy is to use the next bucket (in cyclic order) that has space. This
policy is called "linear probing".
It starts with a hash address (the location where the collision occurred) and
does a sequential search through a table for the desired key or an empty
location. Hence, this method searches in straight line, and it is therefore called
linear probing.
The table should be considered circular, so that when the last location is
reached, the search proceeds to the first location of the table.
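A minimal sketch of linear probing over a table of integer keys is given below. The table size B and the EMPTY sentinel are illustrative assumptions, and the table is assumed to be initialised to EMPTY and never allowed to become completely full:

#define B     11                  /* number of buckets (illustrative) */
#define EMPTY -1                  /* marks a free slot */

int table[B];                     /* assumed initialised to EMPTY */

int h(int x) { return x % B; }    /* division-method hash */

void lp_insert(int x)
{
    int i = h(x);
    while (table[i] != EMPTY)     /* probe the next bucket, wrapping around */
        i = (i + 1) % B;
    table[i] = x;
}

int lp_search(int x)
{
    int i = h(x);
    while (table[i] != EMPTY)
    {
        if (table[i] == x)
            return i;             /* found: return the bucket index */
        i = (i + 1) % B;
    }
    return -1;                    /* reached an empty slot: not present */
}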
REHASHING
Another policy, in which we compute another hash function for a new record,
is used. If this position is filled, then some other method is needed to get
another position and so on.
CLUSTERING
The major drawback of linear probing is that as the table becomes about half
full, there is a tendency towards clustering, that is records start to appear in
long strings of adjacent positions with gaps between the strings.
QUADRATIC PROBING
If there is a collision at hash address h, this method probes the table at locations h+1, h+4, h+9, ..., that is, at location h+i^2 for i = 1, 2, ...; the increment function is i^2. Quadratic probing reduces clustering, but it will not probe all locations in the table.
KEY DEPENDENT INCREMENTS
In key dependent increments we assume a function which can be part of the
key itself.
Example: We could truncate the key to a single character and use its code as
the increment.
Fig. 3.6
3.8 HASH INDICES
Hashing can also be used for index structures besides file organization. Search keys are organized by a hash index, with their associated pointers, into a hash file structure.
We apply a hash function on a search key to identify a memory block (bucket) and store that key and its associated pointers in the bucket. Figure 3.6 shows a secondary hash index on the EMPLOYEE file for the search key emp_no. The hash function computes the modulo 7 of the sum of the digits of the emp_no. The hash index has seven buckets, each of size 2. In this example, emp_no is a primary key.
Since the file itself is organized using hashing, there is no need for a separate hash index on it. Hence a hash index is never needed as a primary index structure; it is used only as a secondary structure. The EMPLOYEE file has emp_no (primary key) and emp_name fields.
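As a small worked example, the bucket number for the hash function just described (the digit sum of emp_no taken modulo 7) can be computed as follows:

/* Bucket number for an emp_no: (sum of its decimal digits) mod 7. */
int bucket(int emp_no)
{
    int sum = 0;
    while (emp_no > 0)
    {
        sum += emp_no % 10;       /* add the last digit */
        emp_no /= 10;
    }
    return sum % 7;               /* seven buckets: 0..6 */
}

For example, bucket(5671) = (5+6+7+1) mod 7 = 19 mod 7 = 5.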
3.10 SUMMARY
3.11 KEYWORDS
Hash Table: The memory available to maintain the symbol table is assumed to
be sequential. Their memory is referred to as t he hash table.
Collision: A collision is said to occur when two non-identical identifiers are hashed into the same bucket.
Bucket Skew: Situation where some buckets are assigned more records than
others, so a bucket may overflow even when other buckets still have space.
3.13 ANSWERS TO CHECK YOUR PROGRESS
1. c
2. d
3. Answers to fill in the blanks
a. bucket address
b. Digit Analysis
c. Collision
d. insufficient buckets, bucket skew
e. hash index
6. Mark Allen Weiss, “Data Structures and Algorithm Analysis in C”, Addison-Wesley (An Imprint of Pearson Education), Mexico City; Prentice-Hall of India Pvt. Ltd., New Delhi.
Subject: Advance Data Structures
Course Code: BCA-PC(L)-323
Lesson No. 4
GRAPHS
REVISED/UPDATED SLM BY KAPILA DEVI
STRUCTURE
4.1 Learning Objectives
4.2 Introduction
4.3 Graph
4.3.1 Representation of Graphs
4.3.2 Adjacency Matrix
4.3.3 Adjacency List
4.3.4 Adjacency Multilist
4.4 Traversal of a Graph
4.4.1 Depth First Search
4.4.2 Breadth First Search
4.5 Weighted Graphs
4.6 Spanning Trees
4.7 Coding for Graph Traversal
4.8 Check your Progress
4.9 Summary
4.10 Keywords
4.11 Self-Assessment Test
4.12 Answers to Check your Progress
4.13 References/Suggested Reading
4.1 LEARNING OBJECTIVES
4.2 INTRODUCTION
In the last few units we studied trees. In this unit we will study another very important non-linear data structure, the graph. We will also study different representations and implementations of graphs. A graph is a superset of a tree: when certain restrictions and limitations are imposed on a graph it becomes a tree. Hence every tree is a graph, but every graph is not necessarily a tree.
4.3 GRAPH
A graph G consists of a non-empty set V called the set of nodes (points, vertices) of the graph, a set E which is the set of edges of the graph, and a mapping from the set of edges E to pairs of elements of V.
Any two nodes which are connected by an edge in a graph are called "adjacent
nodes".
In a graph G(V,E), an edge which is directed from one node to another is called a "directed edge", while an edge which has no specific direction is called an "undirected edge". A graph in which every edge is directed is called a "directed graph" or a "digraph". A graph in which every edge is undirected is called an "undirected graph".
If some of the edges in a graph are directed and some are undirected, then the graph is called a "mixed graph".
Let (V,E) be a graph and x ∈ E be a directed edge associated with the ordered pair of nodes (u,v). Then the edge x is said to be "initiating" or "originating" in the node u and "terminating" or "ending" in the node v. The nodes u and v are also called the "initial and terminal" nodes of the edge x. An edge x ∈ E which joins the nodes u and v, whether it be directed or undirected, is said to be "incident" to the nodes u and v.
An edge of a graph which joins a node to itself is called a "loop". In some directed as well as undirected graphs we may have certain pairs of nodes joined by more than one edge; such edges are called "parallel". Any graph which contains some parallel edges is called a "multigraph". If there is no more than one edge between a pair of nodes, such a graph is called a "simple graph". A graph in which weights are assigned to every edge is called a "weighted graph". In a graph, a node which is not adjacent to any other node is called an "isolated node". A graph containing only isolated nodes is called a "null graph". In a directed graph, for any node v the number of edges which have v as their initial node is called the "outdegree" of the node v. The number of edges which have v as their terminal node is called the "indegree" of v, and the sum of the outdegree and indegree of a node v is called its "total degree".
In the case of an undirected graph, the total degree of v is equal to the number of edges incident on v. The total degree of a loop is 2 and that of an isolated node is 0.
Any sequence of edges of a digraph, such that the terminal node of each edge is the initial node of the edge (if any) appearing next in the sequence, defines a path of the graph. A path is said to traverse through the nodes appearing in the sequence, originating in the initial node of the first edge and ending at the terminal node of the last edge in the sequence. The number of edges in the sequence of a path is called the "length" of the path.
A path of a digraph in which the edges are distinct is called a simple path (edge simple). A path in which all the nodes through which traversing is done are distinct is called an "elementary path (node simple)".
A path which originates and ends in the same node is called a "cycle (circuit)". A cycle is called "elementary" if it does not traverse through any node more than once.
Fig. 4.1
A graph G consists of two sets V and E. V is a finite non-empty set of vertices. E is a set of pairs of vertices; these pairs are called edges. V(G) and E(G) will represent the sets of vertices and edges of graph G. In an undirected graph the pair of vertices representing any edge is unordered; thus the pairs (V1, V2) and (V2, V1) represent the same edge.
In a directed graph each edge is represented by a directed pair <V1, V2>; V1 is the tail and V2 the head of the edge. Thus <V2, V1> and <V1, V2> represent two different edges.
A subgraph of G is a graph G' such that V(G') ⊆ V(G) and E(G') ⊆ E(G). A path from vertex Vp to vertex Vq in graph G is a sequence of vertices Vp, Vi1, Vi2, ..., Vin, Vq such that (Vp, Vi1), (Vi1, Vi2), ..., (Vin, Vq) are edges of E(G). If G' is directed then the path consists of the edges <Vp, Vi1>, <Vi1, Vi2>, ..., <Vin, Vq> in E(G'). The length of a path is the number of edges on it. A simple path is a path in which all vertices, except possibly the first and the last, are distinct. E.g., paths 1, 2, 4, 3 and 1, 3, 4, 2 are both of length 3 in G1; the first is a simple path while the other is not.
A cycle is a simple path in which the first and last vertices are the same. In an undirected graph G, two vertices V1 and V2 are said to be connected if there is a path in G from V1 to V2 (since G is undirected, this means there must also be a path from V2 to V1).
An undirected graph is said to be connected if for every pair of distinct vertices Vi,
Vj in V(G) there is a path from Vi to Vj in G. A connected component or simply a
component of an undirected graph is a maximal connected subgraph.
e.g.,
A directed graph is said to be strongly connected if for every pair of distinct vertices Vi, Vj in V(G) there is a directed path from Vi to Vj and also from Vj to Vi.
e.g.,
Fig. 4.6 Adjacency Matrix Representation of Graph
A simple digraph which doesn't have any cycle is called "acyclic". Naturally such
graphs cannot have any loop.
Example:
Let V = {v1, v2, v3}.
V × V = {(v1,v2), (v1,v3), (v2,v1), (v2,v3), (v3,v1), (v3,v2), (v1,v1), (v2,v2), (v3,v3)}
Consider the following graph on the vertex set V. Reading the direction from row to column, its matrix is:

         v1  v2  v3
    v1    0   1   0
A = v2    0   0   0
    v3    1   1   1

This matrix is also called the "adjacency matrix".
For an undirected graph the adjacency matrix is symmetric, for example:
0 1 1
1 0 1
1 1 1
For a given digraph G (V,E) an Adjacency Matrix depends upon the ordering of the
elements of V. For different orderings of the elements of V we get different
adjacency matrices of the same graph G.
Path Matrix
We can extend the idea of matrix representation to multigraphs and weighted graphs. For simple undirected graphs such an extension simply gives a symmetric adjacency matrix. In the case of a multigraph or a weighted graph we write aij = wij, where wij denotes the weight of the edge (Vi, Vj); if (Vi, Vj) ∉ E then we write wij = 0.
For a null graph the adjacency matrix is a null matrix. If there are loops at each node but no other edges in the graph, then the adjacency matrix is the identity or unit matrix.
Let us denote the elements of A² by aij(2). Consider the graph whose adjacency matrix is:

         V1  V2  V3  V4
    V1    0   1   0   1
A = V2    1   0   0   0
    V3    1   1   0   1
    V4    0   1   0   0

aij(2) = Σ (k=1 to n) aik · akj

For any fixed k, aik · akj = 1 if both aik and akj equal 1, i.e. if (Vi, Vk) and (Vk, Vj) are edges of the graph.

The matrices A², A³ and A⁴ for the graph represented by the above matrix are:

          V1  V2  V3  V4               V1  V2  V3  V4
     V1    1   1   0   0          V1    1   1   0   1
A² = V2    0   1   0   1     A³ = V2    0   1   0   1
     V3    1   2   0   1          V3    2   2   0   1
     V4    0   0   0   0          V4    1   1   0   0

          V1  V2  V3  V4
     V1    1   2   0   1
A⁴ = V2    1   1   0   1
     V3    2   3   0   2
     V4    1   1   0   0
(The superscripts (1), (2), (3), ... can be considered as the length of the path.)
The matrix representing the paths of length less than or equal to n which exist from Vi to Vj is
Bn = A + A² + ... + Aⁿ
First Iteration
k = 1; i = 1; j = 1, 2, 3 for P:
P11 = P11 ∨ (P11 ∧ P11) = 0 ∨ (0 ∧ 0) = 0
P12 = P12 ∨ (P11 ∧ P12) = 1 ∨ (0 ∧ 1) = 1
P13 = P13 ∨ (P11 ∧ P13) = 0 ∨ (0 ∧ 0) = 0
P21 = P21 ∨ (P21 ∧ P11) = 0
P22 = P22 ∨ (P21 ∧ P12) = 0
P23 = 0
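The iteration traced above is one step of Warshall's algorithm, which converts the adjacency matrix into the path (reachability) matrix by OR-ing in, for each intermediate vertex k, the paths that pass through k. A sketch in C follows; the vertex count N is illustrative:

#define N 3                       /* number of vertices (illustrative) */

/* Warshall's algorithm: on exit p[i][j] is 1 iff there is a path from
   vertex i to vertex j. p starts as a copy of the adjacency matrix. */
void warshall(int p[N][N])
{
    int i, j, k;
    for (k = 0; k < N; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                p[i][j] = p[i][j] || (p[i][k] && p[k][j]);
}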
In the adjacency list representation of a graph, the n rows of the adjacency matrix are represented as n linked lists; the nodes in list i represent the vertices adjacent to vertex i.
Each list has a head node. The head nodes are sequential, providing easy random access to the adjacency list of any particular vertex.
e.g., Graph Adjacency List
Fig. 4.7
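A common C declaration for this representation keeps the head nodes in an array (giving the random access mentioned above) and chains the adjacent vertices in linked lists. The names below are illustrative:

#include <stdlib.h>

#define N 8                       /* number of vertices (illustrative) */

struct edgenode {
    int vertex;                   /* an adjacent vertex */
    struct edgenode *next;        /* next vertex in this adjacency list */
};

struct edgenode *head[N];         /* sequential head pointers, one per vertex */

void add_edge(int u, int v)       /* add v to u's adjacency list */
{
    struct edgenode *e = malloc(sizeof(struct edgenode));
    e->vertex = v;
    e->next = head[u];
    head[u] = e;
}

For an undirected graph each edge (u, v) is entered twice: add_edge(u, v) and add_edge(v, u).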
where M is a one-bit mark field that may be used to indicate whether or not the
edge has been examined.
e.g.,
Given a graph G = (V,E) and a vertex V in V(G), we are interested in visiting all vertices in G that are reachable from V (i.e. all vertices connected to V).
A DFS of an undirected graph proceeds as follows: the start vertex V is visited; next an unvisited vertex W adjacent to V is selected and a depth first search is initiated from W. (Adjacent means connected by an edge to the search node.)
Algorithm DFS(V)
1. Given an undirected graph G = (V,E) with n vertices and an array VISITED[n] initially set to zero, this algorithm visits all vertices reachable from V. VISITED[ ] is global.
   VISITED[V] ← 1
2. Check each vertex W adjacent to V and call the algorithm recursively:
   for each vertex W adjacent to V do
       if VISITED[W] = 0 then call DFS(W)
   end
3. [Exit]
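Under the adjacency matrix representation the algorithm translates directly into C. The following sketch assumes a global matrix g, a global visited array (zeroed before the first call) and N vertices:

#include <stdio.h>

#define N 8                       /* number of vertices (illustrative) */

int g[N][N];                      /* adjacency matrix */
int visited[N];                   /* must be zeroed before the first call */

void dfs(int v)
{
    int w;
    visited[v] = 1;
    printf("V%d ", v + 1);        /* visit the vertex */
    for (w = 0; w < N; w++)
        if (g[v][w] && !visited[w])   /* unvisited vertex adjacent to v */
            dfs(w);
}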
Fig. 4.9
Traversing by DFS: V1, V2, V4, V8, V5, V3, V6, V7
Traversing by BFS: V1, V2, V3, V4, V5, V6, V7, V8
Analysis of the Algorithms
1. DFS: If G is represented by its adjacency matrix, then the time to determine all vertices adjacent to V is O(n). Since at most n vertices are visited, the total time is O(n²).
2. BFS: Each vertex visited gets into the queue exactly once, so the outer loop is iterated at most n times. If an adjacency matrix is used, then the inner for loop takes O(n) time for each vertex visited. The total time is therefore O(n²).
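For comparison, BFS replaces the implicit recursion stack by an explicit queue, and each vertex enters the queue exactly once, which is where the O(n²) bound comes from. A sketch reusing the globals of the DFS example above:

void bfs(int v)
{
    int queue[N], front = 0, rear = 0, w;

    visited[v] = 1;
    queue[rear++] = v;                 /* enqueue the start vertex */
    while (front < rear)
    {
        v = queue[front++];            /* dequeue */
        printf("V%d ", v + 1);         /* visit the vertex */
        for (w = 0; w < N; w++)
            if (g[v][w] && !visited[w])
            {
                visited[w] = 1;        /* mark when enqueued, not dequeued */
                queue[rear++] = w;
            }
    }
}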
4.5 WEIGHTED GRAPHS
A graph in which weights are assigned to every edge is called a weighted graph.
Shortest Paths in Weighted Graphs
Let G = (V, E, w) be a weighted graph, where w is the weight (in terms of time or cost) assigned to each edge. The length of a path in G is defined as the sum of the lengths of the edges in the path.
Dijkstra's Shortest Path Algorithm
Let T ⊆ V with A ∉ T (A = starting node), and let P be the set of vertices V-T. For a vertex t in T, let L(t) denote the length of a shortest path among all paths from A to t that do not include any other vertex in T. We call L(t) the index of t with respect to P.
Let x be a vertex in T. Let P' be P ∪ {x} and T' be T-{x}. Let L'(t) denote the index of a vertex t in T' with respect to P'. We claim that
L'(t) = min[L(t), L(x) + w(x, t)] ----- (i)
Algorithm
1. Initially let P = {A} and T = V-{A}; for every vertex t in T, let L(t) = w(A, t).
2. Select the vertex in T that has the smallest index with respect to P; let it be denoted as vertex x.
3. If x is the vertex we wish to reach from A, stop. If not, let P' = P ∪ {x} and T' = T-{x}. For every vertex t in T', compute its index with respect to P' according to (i).
4. Repeat steps 2 and 3 using P' as P and T' as T.
5. Stop.
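The steps above translate into the following sketch over a weight matrix. INF stands in for the ∞ entries, and the names and sizes are illustrative:

#define N   6                     /* number of vertices (illustrative) */
#define INF 1000000               /* stands in for "no edge" */

/* Dijkstra's shortest path lengths from vertex a over weight matrix w;
   on exit L[t] is the length of the shortest path from a to t. */
void dijkstra(int w[N][N], int a, int L[N])
{
    int inP[N] = {0};             /* inP[v] = 1 once v has been moved to P */
    int i, t, x;

    for (t = 0; t < N; t++)       /* step 1: L(t) = w(A, t) */
        L[t] = (t == a) ? 0 : w[a][t];
    inP[a] = 1;

    for (i = 1; i < N; i++)
    {
        x = -1;                   /* step 2: smallest index in T */
        for (t = 0; t < N; t++)
            if (!inP[t] && (x < 0 || L[t] < L[x]))
                x = t;
        inP[x] = 1;               /* step 3: P' = P U {x} */
        for (t = 0; t < N; t++)   /* recompute indices w.r.t. P' by (i) */
            if (!inP[t] && L[x] + w[x][t] < L[t])
                L[t] = L[x] + w[x][t];
    }
}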
Example
Find the shortest path from A to Z for the given graph:
Fig. 4.10
Solution
Initially P = {A} and T = {B, C, D, E, Z}.
The lengths of the different vertices (with respect to P) are:
L(B) = 1, L(C) = 4, L(D) = L(E) = L(Z) = ∞
so L(B) = 1 is the smallest value.
Now let P = {A, B} and T = {C, D, E, Z}:
L(C) = 3, L(D) = 8, L(E) = 6, L(Z) = ∞
So L(C) = 3 is the smallest value.
Now P = {A, B, C} and T' = {D, E, Z}:
L'(D) = min{8, ∞} = 8, L'(E) = min{6, 3+1} = 4, L'(Z) = min{∞, 3+∞} = ∞
Successive observations for the remaining iterations are shown in the figures below.
Fig. 4.11
Thus the shortest path from A to Z is (A, B, C, E, D, Z) of length 9.
       V1     V2     V3    V4    V5    V6
V1      -     10     16    11    10    17
V2     10      -    9.5     ∞     ∞  19.5
V3     16    9.5      -     7     ∞    12
V4     11      ∞      7     -     8     7
V5     10      ∞      ∞     8     -     9
V6     17   19.5     12     7     9     -
Fig. 4.12
1. Prim's Method
Start from vertex V1 and connect it to its nearest neighbour, say Vk. Now consider V1 and Vk as one subgraph and connect this subgraph to its closest neighbour (i.e. to a vertex other than V1 and Vk that has the smallest entry among all entries in rows 1 and k).
2. Kruskal's Method
In this approach the minimum cost spanning tree T is built edge by edge. Edges are considered for inclusion in T in ascending order of cost. An edge is included in T if it doesn't form a cycle with the edges already in T. Since G is connected and has n > 0 vertices, exactly n-1 edges will be selected for inclusion in T.
Edge     Cost    Action
(2,3)      5     Accept
(2,4)      6     Accept
(3,4)     10     Reject (would form a cycle)
(2,6)     11     Accept
(4,5)     18     Accept
Fig 4.13
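Cycle detection in Kruskal's method is usually done with a union-find structure: an edge whose endpoints already lie in the same set would close a cycle and is rejected. A compact sketch follows; the edge array is assumed to be already sorted by ascending cost, and the names are illustrative:

#define NV 6                      /* number of vertices (illustrative) */

struct edge { int u, v, cost; };

int parent[NV];

int find(int i)                   /* root of the set containing i */
{
    while (parent[i] != i)
        i = parent[i];
    return i;
}

int kruskal(struct edge e[], int ne)
{
    int i, total = 0, taken = 0;
    for (i = 0; i < NV; i++)
        parent[i] = i;            /* each vertex starts in its own set */
    for (i = 0; i < ne && taken < NV - 1; i++)
    {
        int ru = find(e[i].u), rv = find(e[i].v);
        if (ru != rv)             /* accept: no cycle is formed */
        {
            parent[ru] = rv;      /* union the two components */
            total += e[i].cost;
            taken++;
        }                         /* else reject: the edge would form a cycle */
    }
    return total;                 /* cost of the minimum spanning tree */
}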
4.9 SUMMARY
1. A graph consists of two non-empty subsets E(G) and V(G), where V(G) is a set
of vertices and E(G) is a set of edges connecting those vertices.
2. Graph is a superset of tree. Every tree is a graph but every graph is not
necessarily a tree.
3. A graph in which every edge is directed is called directed graph or digraph. A
graph in which every edge is undirected is called an undirected graph.
4. A graph can be represented in various ways
a. By Adjacency Matrix
b. Path Matrix
c. Adjacency List
d. Adjacency Multilist.
5. There are two methods to traverse a graph.
a. Depth first search
b. Breadth first search
6. Spanning tree is a tree obtained from a graph which covers all its vertices.
7. A graph in which weights are assigned to every edge is ca lled Weighted Graph.
4.10 KEYWORDS
5. What are the different representations of a graph? Explain adjacency matrix
representation of a graph.
6. Write the adjacency matrix representation of the following graphs:
17. Write Prim's algorithm for finding minimal spanning tree of any graph. Find the
minimal spanning trees of the graph of previous questions by Prim's algorithm.
18. By considering the complete graph with n vertices, show that the number of spanning trees is at least 2^(n-1) - 1.
19. Prove that when DFS and BFS are applied to a connected graph the edges of the
graph form a tree.
20. What do you understand by the shortest path from one node to another in a weighted graph? Write Dijkstra's algorithm to find the shortest path in a weighted graph. Find the shortest path from S to T using Dijkstra's algorithm in the following graphs:
i)
ii)
4.12 ANSWERS TO CHECK YOUR PROGRESS
Subject: Advance Data Structures
Course Code: BCA-PC(L)-323
Lesson No. 5
SORTING METHODS
REVISED/UPDATED SLM BY KAPILA DEVI
STRUCTURE
5.1 Learning Objectives
5.2 Introduction
5.3 Sorting
5.3.1. Bubble Sort
5.3.2. Selection Sort
5.3.3. Insertion Sort
5.3.4. Merge Sort
5.3.5. Quick Sort
5.3.6. Heap Sort
5.4 Check your Progress
5.5 Summary
5.6 Keywords
5.7 Self-Assessment Test
5.8 Answers to Check Your Progress
5.9 References/Suggested Reading
5.1 LEARNING OBJECTIVES
After studying this unit, you should be able to:
1. Describe searching techniques
2. Describe sequential, binary and index sequential search
3. Define sorting
4. Describe various types of sorting
5. Describe 'O' notation
6. Describe bubble sort and quick sort techniques
5.2 INTRODUCTION
In the previous units, we have discussed many data structures and their storage
representations. Algorithms for a number of operations such as insertion and
deletion, which are commonly performed on these structures were described in
detail. This unit discusses two additional operations that are frequently performed
on data structures, namely sorting and searching. We will see that efficient algorithms for these operations can be realized when data are properly structured.
5.3 SORTING
Sorting refers to the operation of arranging data in some given order, such as increasing or decreasing with numerical data, or alphabetically with character data.
Let P be a list of n elements P1, P2, P3, ..., Pn in memory. Sorting P refers to the operation of rearranging the contents of P so that they are in increasing order (numerically or lexicographically), that is,
P1 < P2 < P3 < ... < Pn
Since P has n elements, there are n! ways that the contents can appear in P. These ways correspond precisely to the n! permutations of 1, 2, ..., n. Accordingly, each sorting algorithm must take care of these n! possibilities.
CLASSIFICATION OF SORTING
Sorting can be classified in two types:
1. Internal Sorting
2. External Sorting
Internal Sorting: used when the records to be sorted are held in internal (main) memory.
External Sorting: used when the records to be sorted reside in secondary memory.
EFFICIENCY CONSIDERATIONS
The most important considerations are:
1. The amount of machine time necessary for running the program.
2. The amount of space necessary for the program.
In most computer applications, one is optimized at the expense of the other.
The actual time units to sort a file of size n varies from machine to machine, from
one program to another, and from one set of data to another.
So, we find out the corresponding change in the amount of time required to sort a
file induced by a change in the file size n. The time efficiency is not measured by
the number of time units required but rather by the number of critical operations
performed.
Critical operations are those that take most of the execution time: for example, key
comparisons (that is, comparisons of the keys of two records in the file to
determine which is greater), movement of records or of pointers to records, or the
interchange of two records.
This enhancement speeds up the process.
Although n-1 passes are required to sort a file in the worst case, in the preceding
example the file was sorted after five iterations, making the last two iterations
unnecessary. To eliminate the unnecessary passes we keep a record of whether or
not any interchange is made in a given pass, so that further passes can be eliminated.
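A minimal C sketch of bubble sort with this interchange flag follows; the function and variable names are ours, not from the text.

void bubble(int x[], int n)
{
    int i, pass, t;
    int switched = 1;                     /* flag: was any interchange made? */

    for (pass = 0; pass < n - 1 && switched; pass++)
    {
        switched = 0;                     /* no interchange made yet this pass */
        for (i = 0; i < n - pass - 1; i++)
            if (x[i] > x[i + 1])
            {
                t = x[i];                 /* interchange x[i] and x[i+1] */
                x[i] = x[i + 1];
                x[i + 1] = t;
                switched = 1;             /* record the interchange */
            }
    }
}

If a pass completes with switched still 0, the file is sorted and the loop stops, which is what makes the sort O(n) on already-sorted input.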
If the initial file is sorted, only one comparison is made on each pass, so the sort is
O(n). If the file is initially sorted in reverse order, the sort is O(n^2), since the total
number of comparisons is:
(n-1) + (n-2) + ... + 3 + 2 + 1 = (n-1) * n / 2
which is O(n^2).
The closer the file is to sorted order, the more efficient the simple insertion sort
becomes. The space requirement for the sort consists of only one temporary
variable, Y. The speed of the sort can be improved somewhat by using a binary
search to find the proper position for x[k] in the sorted file
x[0], x[1], ..., x[k-1]
This reduces the total number of comparisons from O(n^2) to O(n log n).
However, even if the correct position i for x[k] is found in O(log n) steps, each of
the elements x[i+1], ..., x[k-1] must be moved one position. This latter operation,
performed n times, requires O(n^2) replacements.
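For reference, a minimal C sketch of the simple insertion sort discussed above (the names are ours):

void insertsort(int x[], int n)
{
    int i, k, y;

    for (k = 1; k < n; k++)
    {
        y = x[k];                     /* the single temporary variable */
        for (i = k - 1; i >= 0 && x[i] > y; i--)
            x[i + 1] = x[i];          /* shift larger elements one place right */
        x[i + 1] = y;                 /* insert y into its proper position */
    }
}

Replacing the inner comparison scan by a binary search reduces the number of comparisons as described above, but the shifting of elements still makes the sort O(n^2) overall.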
12 25 57 48 37 92 86 33
At this point, 25 is in its proper position in the array (x[1]), each element below
that position (12) is less than or equal to 25, and each element above that position
(57, 48, 37, 92, 86 and 33) is greater than or equal to 25.
Since 25 is in its final position, the original problem has been decomposed into the
problem of sorting the two subarrays:
(12) and (57 48 37 92 86 33)
The first of these subarrays has one element, so there is no need to sort it. Repeating
the process on the subarray x[2] through x[7] yields:
12 25 (48 37 33) 57 (92 86)
and further repetitions yield
12 25 (37 33) 48 57 (92 86)
12 25 (33) 37 48 57 (92 86)
12 25 33 37 48 57 (92 86)
12 25 33 37 48 57 (86) 92
12 25 33 37 48 57 86 92
Algorithm to Implement Quick Sort
qsort (int x[], int m, int n)
{
    /* m and n contain the lower and upper bounds of the subarray */
    /* the array is sorted in non-decreasing order */
    int i, j, k, t;
    if (m < n)
    {
        i = m;
        j = n + 1;
        k = x[m];                    /* key (pivot) */
        while (1)
        {
            do
            {
                i = i + 1;
            } while (i <= n && x[i] < k);
            do
            {
                j = j - 1;
            } while (x[j] > k);
            if (i < j)
            {
                t = x[i];            /* interchange x[i] and x[j] */
                x[i] = x[j];
                x[j] = t;
            }
            else
                break;
        }
        t = x[m];                    /* place the key in its final position j */
        x[m] = x[j];
        x[j] = t;
        qsort (x, m, j - 1);         /* sort the two subarrays */
        qsort (x, j + 1, n);
    }
}
We shall illustrate the mechanics of this method by applying it to an array of
numbers. Suppose the array A initially appears as:
(15, 20, 5, 8, 95, 12, 80, 17, 9, 55)
Figure 5.2 shows a quick sort applied to this array.
A(1) A(2) A(3) A(4) A(5) A(6) A(7) A(8) A(9) A(10)
15 20 5 8 95 12 80 17 9 55
9 20 5 8 95 12 80 17 () 55
9 () 5 8 95 12 80 17 20 55
9 12 5 8 95 () 80 17 20 55
9 12 5 8 () 95 80 17 20 55
9 12 5 8 15 95 80 17 20 55
Fig. 5.2 Quick sort of an array
The following steps are involved:
1. Remove the first data item, 15, mark its position and scan the array from right to
left, comparing data item values with 15. When you find the first smaller value,
remove it from its current position and put it in position A(1).
2. Scan line 2 from left to right beginning with position A(2), comparing data item
values with 15. When you find the first value greater than 15, extract it and store
it in the position marked by parentheses in line 2.
3. Begin the right-to-left scan of line 3 with position A(8), looking for a value
smaller than 15. When you find it, extract it and store it in the position marked
by the parentheses in line 3.
4. Begin scanning line 4 from left to right at position A(3), find a value greater than
15, remove it, mark its position, and store it inside the parentheses in line 4. This
is shown in line 5.
5. Now, when you scan line 5 from right to left beginning at position A(7), you find
no value smaller than 15. Moreover, you come to a parenthesized position,
position A(5). This is the location to put the first data item, 15, as shown in line 6.
At this stage 15 is in its correct place relative to the final sorted array.
ANALYSIS OF QUICK SORT
Assumption: i) the file is of size n, where n is a power of 2.
Say n = 2^m, so that m = log2 n.
Average Case Behaviour
The proper position for the pivot always turns out to be the exact middle of the
subarray. There will be approximately n comparisons on the first pass, after which
the file will split into two subfiles, each of size approximately n/2.
For each of these two files there will be approximately n/2 comparisons. So, after
halving the subfiles m times, there are n files of size 1. Thus the total number of
comparisons for the entire sort is approximately:
n + 2*(n/2) + 4*(n/4) + 8*(n/8) + ... + n*(n/n)
or
n + n + n + ... + n (m terms)
There are m terms because the file is divided m times. Thus the total number of
comparisons is:
O(n * m) = O(n log n) [as m = log2 n]
A complete binary tree is said to satisfy the "Heap Condition" if the key of each
node is greater than or equal to the keys of its children. Thus, the root node will
have the largest key value. Trees can be represented as arrays by first numbering
the nodes (starting from the root) from left to right.
The key values of the nodes are then assigned to array positions whose index is
given by the number of the node.
The relationships of a node can also be determined from the array representation.
If a node is at position j, its children will be at positions 2j and 2j+1, and its parent
will be at position Trunc(j/2). A heap is a complete binary tree in which each node
satisfies the heap condition, represented as an array. An operation on a heap works
in two steps:
1. The required node is inserted, deleted, or replaced.
2. The first step may cause a violation of the heap condition, so the heap is
traversed and modified to rectify any such violations.
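These index relations translate directly into C. A small sketch (ours), assuming the 1-based numbering used in the text, with position 0 of the array left unused:

#define LEFT(j)   (2 * (j))          /* left child of node j  */
#define RIGHT(j)  (2 * (j) + 1)      /* right child of node j */
#define PARENT(j) ((j) / 2)          /* integer division = Trunc(j/2) */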
INSERTION IN HEAP
4. R > P; therefore, the heap condition is still violated.
5. Swap R and P.
6. The heap condition is now satisfied by all nodes.
In the algorithm below, the vector K contains the records of the heap, and KEY
contains the key of the record being inserted into an existing heap.
1. [Build heap]
Repeat through step 7 for Q = 2, 3, ..., N
2. [Initialize construction phase]
I ← Q
KEY ← K[Q]
3. [Obtain parent of new record]
J ← Trunc(I/2)
4. [Place new record in existing heap]
Repeat through step 6 while I > 1 and KEY > K[J]
5. [Interchange record]
K[I] ← K[J]
6. [Obtain next parent]
I ← J
J ← Trunc(I/2)
If J < 1
then J ← 1
7. [Copy new record into its proper place]
K[I] ← KEY
8. [Finished]
Return
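A C rendering of this insertion algorithm, as a sketch (the function name is ours; k[1..n] is an existing heap, 1-based as in the text):

void heap_insert(int k[], int n, int key)
{
    int i = n + 1;               /* position of the new record */
    int j = i / 2;               /* its parent: Trunc(i/2) */

    while (i > 1 && key > k[j])
    {
        k[i] = k[j];             /* pull the smaller parent down */
        i = j;
        j = i / 2;               /* obtain next parent */
    }
    k[i] = key;                  /* copy new record into its proper place */
}

Calling heap_insert(k, q - 1, k[q]) for q = 2, 3, ..., N performs the build-heap loop of step 1.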
DELETION FROM HEAP
When we delete a node from the tree, the child with the greater value is promoted
to replace that node.
Consider the deletion of M from heap 2.
The larger of M's children is promoted to position 5.
An efficient sorting method is based on heap construction and the removal of nodes
in order from the heap. This algorithm is guaranteed to sort N elements in O(N log N)
steps.
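A C sketch of root deletion with this promotion rule (the names are ours; k[1..*n] is a heap, 1-based as before):

int heap_delete_max(int k[], int *n)
{
    int max = k[1];              /* the root holds the largest key */
    int key = k[(*n)--];         /* the last record will replace the root */
    int i = 1, j = 2;

    while (j <= *n)
    {
        if (j < *n && k[j + 1] > k[j])
            j++;                 /* j now indexes the larger child */
        if (key >= k[j])
            break;               /* heap condition restored */
        k[i] = k[j];             /* promote the larger child */
        i = j;
        j = 2 * i;
    }
    k[i] = key;
    return max;
}

Heap sort follows directly: build a heap, then repeatedly delete the maximum and store it in the array position freed by the shrinking heap.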
CONSTRUCTION OF HEAP
Two methods of heap construction and then removal in order from the heap to sort
the list are as follows:
Top-Down Heap Construction
Insert items one at a time into an initially empty heap, keeping the heap condition
intact at every step.
Bottom-Up Heap Construction
Build a complete binary tree with the items in the order presented; then, working
from the rightmost node back toward the root, modify the tree to satisfy the heap
condition.
For example, build a heap using both methods of construction.
Let the input string be "PROFESSIONAL".
Top-Down Construction
Fig. 5.6
Bottom-up Construction
Fig. 5.7
iv) Remove 'P' and store in x[9].
ix) Similarly, the remaining 3 nodes are removed and the heap modified to get
the sorted list AEFILNOOPRSS.
Fig. 5.8
5.4 CHECK YOUR PROGRESS
5.5 SUMMARY
1. Searching and sorting are two important operations associated with any data
structure.
2. Efficient and reliable data processing depends upon sorted data.
3. The internal and external sorting methods have their relative efficiencies in
different applications.
4. There are three methods for performing searching:
a. Sequential Searching
b. Binary Searching
c. Index Sequential Searching
5. Binary search reduces the drawbacks of sequential search.
6. In bubble sort, in each pass each element is compared with its successor, i.e.,
x[i] with x[i+1], and the two are interchanged if they are not in the proper order.
7. The total number of comparisons in bubble sort is O(n^2).
8. In quick sort, an element x is chosen from a specific position within the array
such that each element in positions 0 through that position is less than or equal
to x, and each element in positions greater than that of x is greater than or
equal to x.
5.6 KEYWORDS
Sequential Search: The simplest searching technique, in which an unordered table
is scanned sequentially until the desired record is found.
Binary Search: The searching technique applied for searching the entries in a
sorted table.
Sorting: The operation of arranging data in some given order, such as increasing or
decreasing with numerical data or alphabetically, with character data.
Bubble Sort: A sorting technique where the file to be sorted is passed through
sequentially several times. In each pass each element is compared with its successor
and interchanged if they are not in the proper order.
Quick Sort: A divide-and-conquer algorithm that partitions the array around a
chosen element so that smaller keys precede it and larger keys follow it, and then
sorts the two resulting subarrays recursively.
5.9 REFERENCES / SUGGESTED READINGS
5. Yedidyan Langsam, Moshe J. Augenstein and Aaron M. Tenenbaum, “Data
Structures using C”, Prentice Hall of India Pvt. Ltd., New Delhi.
6. Mark Allen Weiss, “Data structures and Algorithm Analysis in C”,
Addison-Wesley (An Imprint of Pearson Education), Mexico City,
Prentice-Hall of India Pvt. Ltd., New Delhi.
Subject: Advance Data Structures
Course Code: BCA-PC(L)-323
Lesson No. 6
SEARCHING TECHNIQUES
REVISED /UPDATED SLM BY KAPILA DEVI
STRUCTURE
6.1 Learning Objectives
6.2 Introduction
6.3 Sequential Search
6.4 Binary Search
6.5 Indexed sequential Search
6.6 Check your Progress
6.7 Summary
6.8 Keywords
6.9 Self-Assessment Test
6.10 Answers to Check Your Progress
6.11 References/Suggested Reading
6.1 LEARNING OBJECTIVES
After studying this unit, you should be able to:
1. Describe searching techniques
2. Describe sequential, binary and index sequential search
3. Define sorting
4. Describe various types of sorting
5. Describe 'O' notation
6. Describe bubble sort and quick sort techniques
6.2 INTRODUCTION
In the previous units, we have discussed many data structures and their storage
representations. Algorithms for a number of operations such as insertion and
deletion, which are commonly performed on these structures, were described in
detail. This unit discusses an additional operation that is frequently performed on
data structures, namely searching. We will see that efficient algorithms for this
operation can be realized when data are properly structured.
Information retrieval in the required format is the common activity in all computer
applications. This involves searching, sorting and merging. Searching methods are
designed to take advantage of the file organization and optimize the search for a
particular record or establish its absence.
/* Algorithm to search a list of values to find the required value */
Input: Array List of size N; target value T.
Output: Position of T in the list.
Begin
1. Set Found to false
   Set I to 0
2. While (I < N) and (Found is false)
   If List[I] = T
   Found = true
   Else
   I = I + 1
3. If Found is false
   T is not present in the list
End
A variant of this scheme uses a sentinel. The first step of the algorithm initializes
the key value of the sentinel record, in position n+1, to the target x. In the second
step, a sequential search is performed on the n+1 records. If the index of the record
found denotes record R(n+1), the search has failed; otherwise the search is
successful and the index found is that of the desired record.
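A minimal C sketch of this sentinel variant (the names and the 0-based indexing are ours; the caller must leave room for the sentinel in position n):

int seq_search(int r[], int n, int key)
{
    int i = 0;

    r[n] = key;                  /* step 1: plant the sentinel */
    while (r[i] != key)          /* step 2: the sentinel guarantees termination */
        i++;
    return i;                    /* i == n means the search has failed */
}

Because the sentinel guarantees that the loop terminates, no separate bounds test (i < n) is needed inside the loop.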
EFFICIENCY OF SEQUENTIAL SEARCH
Let us assume that no insertion or deletion is made, i.e. we are searching through a
table of constant size n. If the record we are searching for is the first one in the
table, only one comparison is performed. If the record is the last one in the table,
n comparisons are necessary. Hence, the average number of comparisons done by
sequential search is:
(1 + 2 + ... + n) / n = (n + 1) / 2
Sequential search is easy to write and is efficient for short lists. It does not require
sorted data, but it is impractical for long lists.
Features of Sequential Search Algorithm
1. It is used for unsorted and unordered small list of elements.
2. It has a time complexity of O(n), which means the time is linearly dependent
on the number of elements; this is not bad, but not that good either.
3. It has a very simple implementation.
int binary_search(int k[], int n, int key)
{
    int low = 0, hi = n - 1, mid;

    while (low <= hi)
    {
        mid = (low + hi) / 2;
        if (key == k[mid])
            return (mid);        /* found */
        if (key < k[mid])
            hi = mid - 1;
        else
            low = mid + 1;
    }                            /* end while */
    return (-1);                 /* not found */
}
Each comparison in the binary search reduces the number of possible candidates by
a factor of 2. Thus, the maximum number of key comparisons is approximately
log2 n.
If the index is one third the size of the file, every third record of the file is
represented in the index.
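A minimal sketch of indexed sequential search in C, under the assumptions implied above: the file r[0..n-1] is sorted by key, and the index holds (key, position) pairs for every third record. All names here are illustrative.

struct entry { int key; int pos; };

int indexed_search(int r[], int n, struct entry idx[], int m, int key)
{
    int i, lo = 0, hi = n;

    /* scan the (small) index to bracket the block that could hold key */
    for (i = 0; i < m && idx[i].key <= key; i++)
        lo = idx[i].pos;
    if (i < m)
        hi = idx[i].pos;

    /* sequential search within the bracketed block only */
    for (i = lo; i < hi; i++)
        if (r[i] == key)
            return i;
    return -1;                   /* not found */
}

Since the index is much smaller than the file, far fewer records are examined than in a plain sequential search.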
For any defined problem, there can be N number of solutions; this is true in general.
If I have a problem and I discuss it with all of my friends, they will all suggest
different solutions, and I am the one who has to decide which solution is the best
under the circumstances.
Similarly, for any problem which must be solved using a program, there can be an
infinite number of solutions. Let's take a simple example to understand this. Below
we have two different algorithms to find the square of a number (for a moment,
forget that the square of any number n is n*n):
One solution to this problem can be: run a loop n times, adding n to a running
total each time.
/*
we have to calculate the square of n
*/
square = 0
for i = 1 to n
do square = square + n
// when the loop ends, square holds n*n
return square
Or, we can simply use a mathematical operator * to find the square.
/*
we have to calculate the square of n
*/
return n*n
In the above two simple algorithms, you saw how a single problem can have many
solutions. While the first solution required a loop which executes n times, the
second solution used the mathematical operator * to return the result in one line.
So which one is the better approach? Of course, the second one.
What is Time Complexity?
Time complexity of an algorithm signifies the total time required by the program to
run till its completion.
The time complexity of algorithms is most commonly expressed using the big O
notation, an asymptotic notation for representing time complexity. We will study
it in detail later in this lesson.
Time complexity is most commonly estimated by counting the number of
elementary steps performed by an algorithm to finish execution. In the example
above, the loop in the first algorithm runs n times, so its time complexity is at
least n, and as the value of n increases the time taken also increases. The second
algorithm, by contrast, has constant time complexity: it never depends on the value
of n and always gives the result in one step.
Since an algorithm's performance may vary with different types of input data, we
usually use the worst-case time complexity of an algorithm, because that is the
maximum time taken for any input of size n.
Calculating Time Complexity
Let us now take up the next big topic related to time complexity: how to calculate
it. This can be confusing at times, but we will try to explain it in the simplest way.
Now the most common metric for calculating time complexity is Big O notation.
This removes all constant factors so that the running time can be estimated in
relation to N, as N approaches infinity. In general, you can think of it like this:
statement;
Above we have a single statement. Its Time Complexity will be Constant. The
running time of the statement will not change in relation to N.
for (i = 0; i < N; i++)
{
statement;
}
The time complexity for the above algorithm will be Linear. The running time of
the loop is directly proportional to N. When N doubles, so does the running time.
for (i=0; i < N; i++)
{
for (j = 0; j < N; j++)
{
statement;
}
}
This time, the time complexity for the above code will be Quadratic. The running
time of the two nested loops is proportional to the square of N. When N doubles,
the running time increases fourfold.
while (low <= high)
{
    mid = (low + high) / 2;
    if (target < list[mid])
        high = mid - 1;
    else if (target > list[mid])
        low = mid + 1;
    else
        break;
}
This is an algorithm to break a set of numbers into halves in order to search for a
particular value (we will study this in detail later). This algorithm has a
Logarithmic time complexity. The running time of the algorithm is proportional
to the number of times N can be divided by 2 (here N is high - low). This is because
the algorithm divides the working area in half with each iteration.
void quicksort (int list[], int left, int right)
{
    if (left < right)   /* base case: a sublist of 0 or 1 elements is sorted */
    {
        /* partition() is assumed to rearrange list[left..right] and
           return the pivot's final index */
        int pivot = partition (list, left, right);
        quicksort (list, left, pivot - 1);
        quicksort (list, pivot + 1, right);
    }
}
Taking the previous algorithm forward, above we have a small sketch of Quick Sort
(we will study this in detail later). In Quick Sort, we divide the list into halves
every time, but each level of division processes all N elements (where N is the size
of the list). Hence the time complexity will be N*log(N): there are log N levels of
division, each doing a linear amount of work, so the algorithm is a combination of
linear and logarithmic behaviour.
NOTE: In general, doing something with every item in one dimension is linear,
doing something with every item in two dimensions is quadratic, and dividing the
working area in half is logarithmic.
Types of Notations for Time Complexity
Now we will discuss and understand the various notations used for Time
Complexity.
1. Big Oh denotes "fewer than or the same as" <expression> iterations.
2. Big Omega denotes "more than or the same as" <expression> iterations.
3. Big Theta denotes "the same as" <expression> iterations.
4. Little Oh denotes "fewer than" <expression> iterations.
5. Little Omega denotes "more than" <expression> iterations.
Understanding Notations of Time Complexity with Example
O(expression) is the set of functions that grow slower than or at the same rate as
expression. It indicates the maximum time required by an algorithm for all input
values. It represents the worst case of an algorithm's time complexity.
Omega(expression) is the set of functions that grow faster than or at the same rate
as expression. It indicates the minimum time required by an algorithm for all input
values. It represents the best case of an algorithm's time complexity.
Theta(expression) consists of all the functions that lie in both O(expression) and
Omega(expression). It indicates the average bound of an algorithm. It represents the
average case of an algorithm's time complexity.
Suppose you've calculated that an algorithm takes f(n) operations, where
f(n) = 3*n^2 + 2*n + 4 // n^2 means the square of n
Since this polynomial grows at the same rate as n^2, you could say that the
function f lies in the set Theta(n^2). (It also lies in the sets O(n^2) and
Omega(n^2) for the same reason.)
The simplest explanation is that Theta denotes the same rate of growth as the
expression. Hence, as f(n) grows at the rate of n^2, its time complexity is best
represented as Theta(n^2).
‘O’ Notation in detail
Given two functions f(n) and g(n), we say that f(n) is of the order of g(n), or that
f(n) is O(g(n)), if there exist positive integers a and b such that
f(n) <= a * g(n) for all n >= b.
It follows, for example, that if f(n) is a polynomial of degree k, then f(n) is O(n^k).
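As a worked illustration (ours), take the polynomial considered earlier, f(n) = 3*n^2 + 2*n + 4. Choosing a = 9 and b = 1 gives
f(n) = 3*n^2 + 2*n + 4 <= 3*n^2 + 2*n^2 + 4*n^2 = 9*n^2 for all n >= 1,
so f(n) is O(n^2) by the definition above.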
Let log_m n and log_k n be two functions, and let x_m = log_m n and x_k = log_k n.
Then
m^(x_m) = n and k^(x_k) = n
(since m^x = n means that log_m n = x), so that
m^(x_m) = k^(x_k).
Taking log_m of both sides,
x_m = log_m(k^(x_k)).
Now it can easily be shown that log_z(x^y) equals y * log_z x for any x, y and z, so
the last equation can be written as
log_m n = x_k * log_m k [since x_m = log_m n]
or as
log_m n = (log_m k) * log_k n [since x_k = log_k n].
Thus log_m n and log_k n are constant multiples of each other.
If f(n) = c * g(n), then f(n) is O(g(n)); thus log_m n is O(log_k n) and log_k n is
O(log_m n) for any m and k.
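For instance, taking m = 2 and k = 10, log_2 n = (log_2 10) * log_10 n ≈ 3.32 * log_10 n, so the two logarithms differ only by the constant factor log_2 10.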
The following facts establish an order hierarchy of functions:
C is O(1) for any constant C.
C is O(log n), but log n is not O(1).
C * log_k n is O(log n) for any constants C and k.
C * log_k n is O(n), but n is not O(log n).
6.7 SUMMARY
1. Searching and sorting are two important operations associated with any data
structure.
2. Efficient and reliable data processing depends upon sorted data.
3. The internal and external sorting methods have their relative efficiencies in
different applications.
4. There are three methods for performing searching:
a. Sequential Searching
b. Binary Searching
c. Index Sequential Searching
5. Binary search reduces the drawbacks of sequential search.
6. In bubble sort, in each pass each element is compared with its successor, i.e.,
x[i] with x[i+1], and the two are interchanged if they are not in the proper order.
7. The total number of comparisons in bubble sort is O(n^2).
8. In quick sort, an element x is chosen from a specific position within the array
such that each element in positions 0 through that position is less than or equal
to x, and each element in positions greater than that of x is greater than or
equal to x.
6.8 KEYWORDS
Sequential Search: The simplest searching technique, in which an unordered table
is scanned sequentially until the desired record is found.
Binary Search: The searching technique applied for searching the entries in a
sorted table.
Time Complexity: Time complexity of an algorithm signifies the total time
required by the program to run till its completion.
6.9 SELF ASSESSMENT TEST
6.11 REFERENCES / SUGGESTED READINGS
1. Birkhäuser-Boston, An Introduction to Data Structures and Algorithms,
Springer, New York.
2. Seymour Lipschutz, “Data Structure”, Tata McGraw Hill.
3. Horowitz, Sahni & Anderson-Freed, “Fundamentals of Data Structure in C”,
Orient Longman.
4. Trembley, J.P. And Sorenson P.G., “An Introduction to Data Structures with
Applications”, McGraw- Hill International Student Edition, New York.
5. Yedidyan Langsam, Moshe J. Augenstein and Aaron M. Tenenbaum, “Data
Structures using C”, Prentice Hall of India Pvt. Ltd., New Delhi.
6. Mark Allen Weiss, “Data structures and Algorithm Analysis in C”,
Addison-Wesley (An Imprint of Pearson Education), Mexico City,
Prentice-Hall of India Pvt. Ltd., New Delhi.