351CS42 Data Structure
351CS42 Data Structure
UNIT - I
Fundamentals of algorithm analysis: Introduction -Big ‘O’
notations, Time and space complexity of algorithms, Elementary
data structures and their applications, Structure and Problem
Solving.
Fundamentals of algorithm:
What is an Algorithm?
An algorithm is a step-by-step recipe for solving an instance of a
problem. Every single procedure that a computer performs is an
algorithm. In this way, every step a computer takes is the result of a
single instruction in an algorithm. An algorithm is a precise
procedure for solving a problem in finite number of steps. Each
algorithm is a module designed to handle a specific problem. An
algorithm states the actions to be executed and the order in which
these actions are to be executed.
For example, on the back of every box of cake mix is an algorithm for
baking the cake. The ingredients necessary for preparing the cake
(like the cake mix, eggs, oil, and water) are listed first. Then comes a
list of instructions,
Here the problem is to bake a cake. The order in which the actions
are to be executed is important. If we have changed any order, you
will not get the required output.
Definition of algorithm:
An algorithm is a finite set of instructions that if followed
accomplishes a particular task.
An algorithm is a set of rules for carrying out calculation either
by hand or on a machine.
An algorithm is a finite step-by-step procedure to achieve a
required result.
An algorithm is a sequence of computational steps that
transform the input into the output.
An algorithm is a sequence of operations performed on data
that have to be organized in data structures.
An algorithm is an abstraction of a program to be executed on
a physical machine (model of Computation).
Algorithms are not dependent on a particular programming
language, machine, system, or compiler.
Algorithm design is all about the mathematical theory behind
the design of good programs.
2
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Deterministic algorithm:
Algorithm which has the property that the result of every
operation is uniquely defined are termed as deterministic
algorithm
3
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Complexity of Algorithm:
The various complexities of algorithms are
(1) Time complexity
(2) Space complexity
Time complexity:
The time complexity of algorithm is the amount of
computer time it needs top run to completion.
4
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Worst case
Average Case
Best case.
Worst case:
Worst case is the longest time that an algorithm will use over all
instances of size n for a given problem to produce a desired result.
Often this can be represented by a function f(n) such as f(n)=n2 or
f(n)=n log n. We write T (n) =O (f (n)) for the worst case time
complexity. Roughly, this means the algorithm will take no more than
f (n) operations. Most of an initial course on algorithm is devoted to
worst-case analysis.
Average Case:
Average case is the average time (or average space) that the
algorithm will use over all instances of size n for a given problem to
produce a desired result. It depends on the probability distribution of
instances of the problem.
Best case:
5
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Best case is the shortest time (or least space) that the algorithm
will use over all instances of size n for a given problem to produce a
desired result. Often this can be represented by a function f (n) such
as f (n) = n2 or f (n) =n log n. We write
T (n)=Ω(f(n))for the best case. Roughly, this means the algorithm will
take no less than f (n) operations. The best case is seldom interesting.
When the worst and best case performance of an algorithm are the
same we can write T(n)=ø f(n).Roughly, this says the algorithm
always uses f(n) operations on all instances of size n.
Example :
An algorithm to find the minimum and maximum
element in an array
Algorithm MaxMin ( a[],n,max,min)
Max := min := a[i]
{
for i= 2 to n do
{
if (a[i]> max) then max=a[i];
else if( a[i] <min) then min=a[i];
}
}
If the elements are in increasing order then best case occurs and the
number of comparisons is (n-1).
When the elements are in decreasing order then worst case occurs
and the number of elements compared is 2(n-1).
On the average a[i] is greater than max half the time and so the
average no. of comparisons are (3n/2)-1.
pace complexity:
The space complexity of an algorithm is the amount of
memory it needs to run the completion.
The space requirements s(p) of any algorithm is given by
s(p) = c + s(p) ( instance characteristics) where c is a constant.
Instruction space
Data Space
Environment Space
Instruction Space:
Instruction Space is the space in memory occupied by the
compiled version of the program. We consider this space as a
constant space for any value of n. We normally ignore this value,
but remember that it is there. The instruction space is independent
of the size of the problem.
Data Space:
Data Space is the space in memory, which used to hold the
variables, data structures, allocated memory, and other data
elements. The data space is related to the size of the problem.
Environment Space:
Environment Space is the space in memory used on the run time
stack for each function call. This is related to the run time stack
and holds the returning address of the previous function. The
7
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Space occupied is
Data Space: x
8
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Priori estimates:
Task of determining how much computing time and
storage an algorithm requires
Ex:
X=x+1 -> amount of time a single execution takes and the
number of times it is executed.
Posteriori Testing:
It is the process of executing a correct program on data
sets and measuring the time and space it takes to compute the
result
Asymptotic Notations:
Asymptotic Notations are methods used to estimate and
represent the efficiency of an algorithm using simple formula. This
can be useful for separating algorithms that do drastically different
amounts of work of work for large inputs.
For example,
Ø(n) Means that the rate varies directly with the size of the
input
Ø(n2)Means that the rate varies with the square of the size of
the input
Ø(nn) Means that the rate varies exponentially with the size
of the input
0(log n) Means that the rate varies with the logarithm
(power to which 2 has to be raised to get n ) of the size of the
input.
The letter ‘O’ in the word Big-oh stands for “order of” and the
word “Big” informs that we are mainly concerned with very large
values of n, i.e., only the largest term in the formula needed. The Big –
Oh notation does not attempt to provide exact running times for an
algorithm, and hence constant multiplies can be ignored when
considering an algorithm since we are concerned only with the order.
Big-Oh estimates how fast the execution time grows as the size of the
input grows.
10
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
11
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Arrays:
The simplest type of data structure is a linear (or one
dimensional) away. By linear we mean a list of a finite number of
similar data elements reference respectively by a set of consecutive
numbers, usually 1,2,3……n If we choose the name A for the away
then the elements of A are denoted by subscript notation
A (1), A(2), A(3), A(4)….. A(n)
The number k of A(K) is called a subscript and A (k) is called a
subscripted variable.
Linked List:
A linked list is a linear collection of data elements called nodes.
The linear order is represented by means of pointers.
INFO is called the information field and it consists of the data. The
link represent the pointer or (address) of the next data field or node.
The last node in the list is Null that is it does not contain an address.
Trees:
Data frequently contain a hierarchical relationship between various
elements. The data structure which reflects this relationship is called a
rooted tree graph or simply a tree.
Stack:
It is a linear list in which insertions and deletions can take place
only at one end called the top.
Queue: (FIFO)
It is a linear list in which deletion take place at the front of the list,
addition take place at the end of the list. (Rear)
Graph:
12
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Record structure:
A file is a group of records; a record is a group of data items. The
data items may be of the type.
Problem solving:
Problem solving is a creative process of finding out appropriate
(or) feasible solution for a given problem.
Problem:
It is any task or activity that is being taken for to find a solution.
Solution:
It is the method of approach identified to solve a particular
problem. The efficient approach of solving a problem is called feasible
solution.
UNIT-II
Arrays: ordered lists, representation of arrays, sparse matrices,
linked lists: singly and doubly linked lists, stacks, queues, multiples
stacks and queues, Applications: polynomial arithmetic, infix,
postfix and prefix arithmetic expression conversion and evaluations.
INTRODUCTION:
This unit introduces a data structure called Arrays. The
simplest form of array is a one-dimensional array that may be
defined as a finite ordered set of homogeneous elements, which is
stored in contiguous memory locations. For example, an array may
contain all integers or all characters or any other data type, but may
not contain a mix of data types.
13
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
14
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
String in Memory
C concedes a fact that the user would use strings very often and
hence provides a short cut for initialization of strings.
For example, the string used above can also be initialized as
char name[ ] =“sentence\n”;
Note that, in this declaration ‘\0’ is not necessary. C inserts the null
character automatically.
Multidimensional arrays are defined in the same manner as one-
dimensional arrays, except that a separate pair of square brackets is
required for each subscript. Thus a two-dimensional array will
require two pairs of square brackets; a three-dimensional array will
require three pairs of square brackets and so on.
15
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
SPARSE MATRICES:
16
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
18
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
the memory locations reserved for the array. The second column
occupies the next set and so forth. The schematic of a column major
representation is shown in Figure.
Consider the following two-dimensional array:
a b c d
e f g h
i j k l
To make its equivalent column major representation, we perform
the following process:
Transpose the elements of the array. Then, the representation will
be same as that of the row major representation.
By application of above mentioned process, we get {a, e, i, b, f, j, c, g,
k, d, h, i}
Count 1 2 3 4 5 6 7 8
11 22 33 44 55 66 77
19
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Insertion
In the array implementation of lists, elements are stored in
continuous locations. To add an element to the list at the end, we
can add it without any problem. But, suppose if we want to insert
the element at the beginning or middle of the list, then we have to
rewrite all the elements after the position where the element has to
be inserted. We have to shift (n)th element to (n+1)th position, where
‘n’ is number of elements in the list. The (n-1) th element to (n)th
position and this will continue until the (r) th element to (r+1)th
position, where ‘r’ is the position of insertion. For doing this, the
count will be incremented.
From the above example, if we want to add element ‘35’ after
element ‘33’. We have to shift 77 to 8 th position, 66 to 7th position, so
on, 44 to 5th position.
Before Insertion
Count 1 2 3 4 5 6 7
11 22 33 44 55 66 77
1 2 3 4 5 6 7 8
Step 1
Count 11 22 33 44 55 66 77 77
1 2 3 4 5 6 7 8
Step 2
Count 11 22 33 44 55 66 66 77
1 2 3 4 5 6 7 8
20
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Step 3
Count 11 22 33 44 55 55 66 77
1 2 3 4 5 6 7 8
Step 4
Count 11 22 33 44 44 55 66 77
1 2 3 4 5 6 7 8
Step 5
Count 11 22 33 35 44 55 66 77
Deletion:
To delete an element in the list at the end, we can delete it without
any problem. But, suppose if we want to delete the element at the
beginning or middle of the list, then, we have to rewrite all the
elements after the position where the element that has to be deleted
exists. We have to shift (r+1)th element to rth position, where n is the
number of elements in the list. And then the count is decremented.
Before deletion
Count 1 2 3 4 5 6 7
11 22 33 44 55 66 77
21
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
1 2 3 4 5 6 7
Step 1 11 22 33 55 55 66 77
Count
1 2 3 4 5 6 7
Step 2
11 22 33 55 66 66 77
Count
Step 3 1 2 3 4 5 6
Count 11 22 33 55 66 77
Linked lists are among the simplest and most common data
structures, and are used to implement many important abstract data
structures, such as stacks, queues, hash tables, symbolic
expressions, skip lists, and many more.
Basic concepts:
Each record of a linked list is often called an element or node.
The field of each node that contains the address of the next node is
usually called the next link or next pointer. The remaining fields are
known as the data, information, value, or payload fields.
LINKED LISTS-IMPLEMENTATION:
The Linked list is a chain of structures in which each structure
consists of data as well as pointer, which stores the address (link) of
the next logical structure in the list.
23
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Singly-linked list:
Singly-linked lists contain nodes which have a data field as well as a
next field, which points to the next node in the linked list.
Doubly-linked list :
In a doubly-linked list, each node contains, besides the next-node
link, a second link field pointing to the previous node in the
sequence. The two links may be called forward(s) and backwards.
24
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
25
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
STACK:
26
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
pop: the top item is taken from the stack, decreasing stack size by
one. In the case where there was no top item (i.e. the stack was
empty), a stack underflow occurs.
27
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
28
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
The following algorithm for ADD and DELETE the item in the
Stack
End DELETE
Queue:
The queue data structure is characterised by the fact that additions
are made at the end ,or tail, of the queue while removals are made from
the front, or head, of the queue. Fort his reason, a stack is referred to as
a FIFO structure (First-In First-Out).
Operations:
30
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
The following algorithms for ADD and DELETE item in the Queue:
31
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Front front + 1
Item Q(front)
End DELETEQ
A stack is generally First In, Last Out, and a queue is First In First
Out.
Item can be added or removed only at one end in stack and in a
queue insertion at the rear and deletion from the front.
The basic operation of stack are 'push' and 'pop', on other hand of
queue are 'enque' and 'dequeue‘
You can think of a queue like a line at the bank. The first person to
get there will get to the teller first. If a bunch of people come while
all the tellers are busy, they stand in line in the order in which they
arrived.
That is to say, new people (items) are added to the end of the line
and the first person in line is the only one who is called to a teller. In
real life this is known as "first come, first served." In programming
terms it's known as first-in-first-out, or FIFO.
You can think of a stack like a deck of cards. You can put down
cards into a pile on your table one at a time, but if you want to draw
cards, you can only draw them from the top of the deck one at a
time. Unlike a queue, the first card to be put down is the last card to
be used. This is known as first-in-last-out, or FILO (also called LIFO
for last-in-first-out).
32
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
INFIX: From our school times we have been familiar with the
expressions in which operands surround the operator, e.g. x+y,
6*3 etc this way of writing the Expressions is called infix
notation.
33
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Algorithm
1) Examine the next element in the input.
2) If it is an operand, output it.
3) If it is opening parenthesis, push it on stack.
4) If it is an operator, then
i) If stack is empty, push operator on stack.
ii) If the top of the stack is opening parenthesis, push
operator on stack.
Example
Suppose we want to convert 2*3/(2-1)+5*(4-1) into Postfix form:
Reversed Expression: )1-4(*5+)1-2(/3*2
Char Stack Contents(Top on Postfix
Scanned right) Expression
2 Empty 2
* * 2
3 * 23
/ / 23*
34
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
( /( 23*
2 /( 23*2
- /(- 23*2
1 /(- 23*21
) / 23*21-
+ + 23*21-/
5 + 23*21-/5
* +* 23*21-/5
( +*( 23*21-/5
4 +*( 23*21-/54
- +*(- 23*21-/54
1 +*(- 23*21-/541
) +* 23*21-/541-
Empty 23*21-/541-*+
35
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Example
Suppose we want to convert 2*3/(2-1)+5*(4-1) into Prefix form: Reversed
Expression: )1-4(*5+)1-2(/3*2
Stack Prefix
Char
Contents(Top on Expression(right to
Scanned
right) left)
) )
1 ) 1
- )- 1
4 )- 14
( Empty 14-
* * 14-
5 * 14-5
+ + 14-5*
) +) 14-5*
1 +) 14-5*1
- +)- 14-5*1
36
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
2 +)- 14-5*12
( + 14-5*12-
/ +/ 14-5*12-
3 +/ 14-5*12-3
* +/* 14-5*12-3
2 +/* 14-5*12-32
Empty 14-5*12-32*/+
Arithmetic Expressions
Algebraic C
Expression Expression
axb–c a*b–c
(m + n) (x (m + n) * (x
+ y) + y)
(ab / c) a*b/c
37
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
(x / y) + c x/y+c
Evaluation of Expressions
Expressions are evaluated using an assignment statement of
the form
Variable = expression;
.
main ()
{
float a, b, c x, y, z;
a = 9;
b = 12;
c = 3;
x = a – b / 3 + c * 2 – 1;
y = a – b / (3 + c) * (2 – 1);
z = a – ( b / (3 + c) * 2) – 1;
38
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
output
x = 10.00
y = 7.00
z = 4.00
High priority * / %
Low priority + -
39
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
40
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Explicit Conversion
Many times there may arise a situation where we want to force
a type conversion in a way that is different from automatic
conversion.
........................female students
Ratio =........-------------------
........................male students
41
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
UNIT - III
Tree structures:
Introduction:
A tree is a data structure that represents hierarchical relationship
between individual data items. A parent child relationship is
Cu
an example of a hierarchical
sto relationship in which there exists
well defined order ofmeprecedence between the two data items
being related. r
Ad
Co Nam dre Ite
de e ss m
42
DMI - St. Eugene university
Ite Ite Ite
m m m
1 2 3
054CS201 Data Structure and Algorithms
The highest level of a tree is called the root node. In fig.1 customer is
the root node. The child nodes that are directly connected to
the root node are called the child node. In fig 1 code, name,
address and items are the child nodes of the parent node
customer. The child nodes of a given parent called are siblings.
In fig 1 code, name, and items are siblings of the parent
customer.
Item
43
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Binary tree:
The most important tree structure is a binary tree. A binary tree is a
finite set of data elements arranged into a structure consisting
of a root and two disjoint sub trees, (ie) the left sub tree and
right sub tree.
E F
+ C
44
DMI - St. Eugene university
A B
054CS201 Data Structure and Algorithms
B C
D E
F G
Level 1
B C
45 Level 2
D EDMI - St. Eugene university
F G
Level 3
H I J K L M N O
054CS201 Data Structure and Algorithms
1. linear implementation
2. linked implementation
Linear implementation:
The root of the binary tree is stored in the first location of the
array. If the node is in the nth location of the array, the left child
is stored at location 2n and the right child is stored at location
(2n+1)
46
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
1 /
2 -
3 *
4 +
5 C
6 E
7 F
8 A
9 B
10
11
12
Unused Locations
13
14
15
Advantages:
1. It is simple method. Given the child node , its parent node can
be immediately identified and vice versa.
2. This method can be implemented easily, in older languages and
also in which only static memory allocation is directly
available.
Disadvantages:
Unused locations cause wastage of memory
47
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
- *
+ C E F
A B
Disadvantages:
i) Memory is wasted in the name of NULL pointers
ii) It is difficult to determine the parent node, if the child
node is given and vice versa
48
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Preorder traversals:
The three steps involved in the preorder traversals are
These ordered steps are repeated recursively, until all the nodes are
processed.
Pseudo algorithm:
In order traversal:
The three steps involved in the in order traversal are
1. Process the left sub tree
2. Process the root node
3. Process the right sub tree
Pseudo algorithm:
Pseudo algorithm:
CALL POSTORDER(RIGHT(ROOT))
CALL TRAVERSE(DATA(ROOT))
RETURN
END POSTORDER
The post order traversal of the binary tree in which the expression is
stored in infix notations results in postfix notation of the given
expression.
Example
51
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
The following fig shows the threaded binary tree with a head node.
LINK
LINK
PTR
PTR
TR
TL
R
L
1 HEA 1
D
1 / 1
1 - 1 1 * 1
1 + 1 0 C 0 0 E 0 0 F 0
0 A 0 0 B 0
Operations on a tree:
Node insertion in a tree:
PROCEDURE TREE
INSERT(ROOT,DATA,LLINK,RLINK,AVAIL,N,LINK,PTR)
VAR N, DATA(N),RLINK(N),ROOT,AVAIL,ITEM :INTEGER
52
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
READ ITEM;
ROOT: = PTR
CALL GETNODE(AVAIL, LINK, NN, PTR)
DATA(ROOT) := ITEM
LLINK(ROOT) :=NULL
RLINK(ROOT):=NULL
FOR I = 2 TO N DO
READ ITEM
CALL GETNODE(AVAIL, LINK, NN, PTR)
DATA(PTR) := ITEM
CASE 1:
Link(p) having no right child
Non-null left sub tree of the node to be deleted
No right child for llink(p)
Check for the greatest node in the left sub tree
NX: =NP
NP: =LLINK (NX)
RLINK (NP):=RLINK (NX)
CALL RETURNNODE (NX)
CASE 2:
The node to be deleted has a right child, but no left child
NX :=NP
NP:=RLINK(NX)
CALL RETURNNODE(X)
CASE 3:
The node to be deleted has no child( root node)
NX := NP;
NP := NULL
CALL RETURN(NX)
GRAPHS:
INTRODUCTION:
54
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Types of graph:
The different types of graph are
i. directed graph
ii. undirected graph
Directed graphs:
A directed graph is a structure in which the edges between
different nodes are directionally oriented. These connecting
edges are called directed edges. A directed edge is also known
as an arc. A directed graph is also known as graph.
Out degree:
The outdegree of a node in a directed graph is the number of
directed edges existing from or coming out of the node
Indegree:
The indegree of a node in a directed graph is the number of
directed edges entering or coming towards the nodes.
Sink node:
55
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Representations of Graphs
Adjacency Matrices
Adjacency matrices require O(|V |2) space, and so they are space-
efficient only when they are dense (that is, when the graphs have
many edges). Time-wise, the adjacency matrices allow easy addition
and deletion of edges.
56
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Adjacency Lists
Traversal
57
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Algorithm:-
Step-1: Set the starting point of traversal and push it inside the stack
Step-2: Pop the stack and add the popped vertex to the list of visited
vertices
Step-3: Find the adjacent vertices for the most recently visited
vertex( from the Adjacency Matrix)
Step-4: Push these adjacent vertices inside the stack (in the increasing
order of their depth) if they are not visited and not there in the stack
already
Step-5: Go to step-2 if the stack is not empty
Graph Algorithms
Minimal spanning trees_shorest paths-Analysis.
58
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
PRIM’S ALGORITHM
Definition :Minimum spanning tree(MST)
A spanning tree for a connected undirected graph G
=(V,E) is sub graph t =(V,E’)if t is a tree.
A B
A B A A B
B
C D C D C D
C D
(a (b (c (d
(a) )G =(V,E) ) (b,c, & d) =some) of example for spanning
)
trees
n weighted Graph G =(V,E,W) ,the weight of a sub graph is the sum
of the weighted of the edges in the subgraph.A MST for a
weighted graph is spanning tree with minimum weight.
A B
A B
A
B
B
D
D
C
C D C D
C
(a (b (c (d
) ) ) )
59
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
(a)G =(V,E,W)
(b) and (c) are minimum spanning trees (W=6)when comparing to (d)
(W=7)
The Strategy:
1. Prim’s algorithm selects a starting vertex from the given
graph and classifies the start vertex under “tree
vertices”.
2. The nodes adjacent to tree vertices are identified and
classified under “fringe vertices “,and the remaining
vertices are classified as “unseen vertices”.
3. the key step of the algorithm is selection of new vertex from
the “fringe” and an incident egde,which is now added to “tree
vertices”/After this new vertex from the “fringe” is depends
on the weight of the edge process continues until the “fringe”.
Note:
a) Tree vertices : in the tree constructed so far.
b) Fringe vertices :not in the tree ,but adjacent to
some vertex in the tree.
c) Unseen vertices: all other
Algorithm:
PrimMST (G,n) //out line
Initialize all vertices as unseen.
Select an arbitrary vertex s to start the tree; reclassify it as
tree.
Reclassify all vertices adjunct to s as fringe.
While there are fringe vertices
Select an edge of minimum
Weight between a tree vertex t and a fringe vertex v;
Reclassify v as tree; add edge tv to the tree;
Reclassify all unseen vertices adjunct to v as fringe.
Example:
60
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
This is our original weighted graph. The numbers near the edges
61
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
62
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Here, the only vertices available are C and G. C is 5 away from E, and
G is 9 away from E. C is chosen, so it is highlighted along with
the arc EC.
63
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Now all the vertices have been selected and the minimum spanning
tree is shown in green. In this case, it has weight 39.
Return F;
Definition : forest
Example:
65
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
This is our original graph. The numbers near the arcs indicate their
weight. None of the arcs are highlighted
AD and CE are the shortest arcs, with length 5, and AD has been
arbitrarily chosen, so it is highlighted
66
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
CE is now the shortest arc that does not form a cycle, with length 5,
so it is highlighted as the second arc
The next arc, DF with length 6, is highlighted using much the same
method
67
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
68
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Finally, the process finishes with the arc EG of length 9, and the
minimum spanning tree is found
Dijkstra ‘s Algorithm
Dijkstra’s Algorithm is used to find the minimum weight path
from a specified source vertex to every other vertex in a weight
directed or undirected graph. the weight of a path is the sum
of the weights on the edges in that path.
when weight is interpreted as distance, a minimum weight path is
called as shortest path .this algorithm requires that edge
69
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
The strategy:
1. Dijkstra’s algorithm selects a staring vertex from the given
graph and classified the start vertex under “tree vertices”.
2. The nodes adjunct to tree vertices are identified and
classified under “fringe vertices” and the remaining vertices
are classified as “unseen vertices”.
3. The key step of the algorithm is selection of new vertex and
an incident edge from the “fringe” which is now added to
“tree vertices”. After this new inclusion reclassified.the
selection of new vertex from the “fringe” is depends on the
total minimum weight of the edge from the start vertex s to
current vertex z.
Algorithm:
dijktraSP(G.n) //out line
Initialize all vertices as unseen
Start the tree with the specified source vertex s;reclassify
it as tree.
Define d(s,s) =0
Reclassify all vertices adjacent to s as fringe .
While there are fringe vertices.
Select an edge between a tree vertex t and a
Fringe vertex v such that (d(s,t) + W(tv) is
minimum;
Reclassify v as tree; add edge tb to the tree;
Define d(s,v) = (d(s,t) + W (tv))
Reclassify al unseen vertices adjacent to v as
fringe.
An Example
So far I've listed the instructions that make up the algorithm. But
to really understand it, let's follow the algorithm on an example.
We shall run Dikjstra's shortest path algorithm on the following
graph, starting at the source vertex a.
70
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Step :1
Step:2
71
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
step:4
73
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
UNIT IV
Searching & Sorting: Binary Search tree, Insertion & Deletion, AVL
Trees, Hash Function, Hash Table, internal sort : Radix sort,
insertion sort, exchange sort, selection sort, quick sort, shell sort,
merge sort, heap sort, external sort: k-way merge sort, balanced
merge sort, polyphase merge sort.
Searching:
56
26 200
18 28 190 213
12 24 27
Both the left and right subtrees must also be binary search
trees.
The left subtree of a node contains only values less than the
node's value.
74
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
The major advantage of binary search trees over other data structures
is that the related sorting algorithms and search algorithms such as
in-order traversal can be very efficient.
Operations
Searching
We begin by examining the root node. If the tree is null, the value we
are searching for does not exist in the tree. Otherwise, if the value
equals the root, the search is successful. If the value is less than the
root, search the left subtree. Similarly, if it is greater than the root,
search the right subtree. This process is repeated until the value is
found or the indicated subtree is null. If the searched value is not
found before a null subtree is reached, then the item must not be
present in the tree.
75
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Insertion
Insertion begins as a search would begin; if the root is not equal to the
value, we search the left or right subtrees as before. Eventually, we
will reach an external node and add the value as its right or left child,
depending on the node's value. In other words, we examine the root
and recursively insert the new node to the left subtree if the new
value is less than the root, or the right subtree if the new value is
greater than or equal to the root.
76
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
if key == node.key:
return TreeNode(node.left, key, value, node.right)
if key < node.key:
return TreeNode(binary_tree_insert(node.left, key, value),
node.key, node.value, node.right)
else:
return TreeNode(node.left, node.key, node.value,
binary_tree_insert(node.right, key, value))
Deletion
Some examples of Binary Search Tree insertions and deletions. There
are several cases to be considered:
Deleting a leaf: Deleting a node with no children is easy, as we
can simply remove it from the tree.
Deleting a node with one child: Delete it and replace it with its
child.
Deleting a node with two children: Suppose the node to be
deleted is called N. We replace the value of N with either its in-
order successor (the left-most child of the right subtree) or the
in-order predecessor (the right-most child of the left subtree).
77
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
AVL TREES:
The balance factor of a node is the height of its right subtree minus
the height of its left subtree and a node with balance factor 1, 0, or -1
is considered balanced. A node with any other balance factor is
considered unbalanced and requires rebalancing the tree. The balance
factor is either stored directly at each node or computed from the
heights of the subtrees.
Operations:
Insertion:
Pictorial description of how rotations cause rebalancing in an AVL
tree.
Insertion into an AVL tree may be carried out by inserting the given
value into the tree as if it were an unbalanced binary search tree, and
then retracing one's steps toward the root updating the balance factor
of the nodes.
If the balance factor becomes -1, 0, or 1 then the tree is still in AVL
form, and no rotations are necessary.
If the balance factor becomes 2 or -2 then the tree rooted at this node
is unbalanced, and a tree rotation is needed. At most a single or
double rotation will be needed to balance the tree.
The other two cases are identical to the previous two, but with the
original balance factor of -2 and the left subtree outweighing the right
subtree.
Only the nodes traversed from the insertion point to the root of the
tree need be checked, and rotations are a constant time operation, and
because the height is limited to O(log(n)), the execution time for an
insertion is O(log(n)).
80
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Deletion:
If the node is a leaf, remove it. If the node is not a leaf, replace it with
either the largest in its left subtree (inorder predecessor) or the
smallest in its right subtree (inorder successor), and remove that
node. The node that was found as replacement has at most one
subtree. After deletion retrace the path back up the tree (parent of the
replacement) to the root, adjusting the balance factors as needed.
Hash Function
insert, retrieve, update, delete all start by applying the hash function
to the key
fast to compute
even distribution
Hash Table:
A hash table, or a hash map, is a data structure that
associates keys with values. The primary operation it supports
efficiently is a lookup: given a key (e.g., a person's name), find the
corresponding value (e.g., that person's telephone number). It works
by transforming the key using a hash function into a hash, a number
that is used as an index in an array to locate the desired location
("bucket") where the values should be.
Basic operation:
Collision resolution:
The problem arises because we have two keys that hash in the same
array entry, a collision. There are two ways to resolve collision:
83
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Since the intent of hashing is to create unique hash (or location) for
each key, when the hash function produces the same hash for
multiple keys an alternate location must be determined. The process
of finding an alternate location is called collision resolution. A
collision resolution strategy ensures future key lookup operations
return the correct, respective records.
Separate chaining:
Sorting Techniques:
84
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Introduction:
Sorting is a technique that is used to rearranging the unordered list
either ascending or descending order. The sorting algorithms are very
useful in our real world applications.
Internal Sort:
Data are assumed to be in the high speed,r andom access
memory.
External Sort:
Algorithm for sorting large set of data stored on external
devise, that is slower storage.
Different between internal & external sort
85
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Radix Sort:
A least significant digit (LSD) radix sort operates in O(nk) time, where
n is the number of keys, and k is the average key length. You can get
86
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
In other words:
1. Take the least significant digit (or group of bits, both being
examples of radices) of each key.
2. Group the keys based on that digit, but otherwise keep the
original order of keys. (This is what makes the LSD radix sort a
stable sort).
3. Repeat the grouping process with each more significant digit.
The sort in step 2 is usually done using bucket sort or counting sort,
which are efficient in this case since there are usually only a small
number of digits.
An example
The first counting pass starts on the least significant digit of each key,
producing an array of bucket sizes:
A second counting pass on the next more significant digit of each key
will produce an array of bucket sizes:
A third and final counting pass on the most significant digit of each
key will produce an array of bucket sizes:
6 (bucket size for digits of 0: 002, 024, 045, 066, 075, 090)
1 (bucket size for digits of 1: 170)
1 (bucket size for digits of 8: 802)
At least one LSD radix sort implementation now counts the number
of times that each digit occurs in each column for all columns in a
88
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
single counting pass. (See the external links section.) Other LSD radix
sort implementations allocate space for buckets dynamically as the
space is needed.
Insertion Sort:
simple implementation
efficient for (quite) small data sets
efficient for data sets that are already substantially sorted: the
time complexity is O(n + d), where d is the number of
inversions
stable (i.e., does not change the relative order of elements with
equal keys)
89
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Algorithm
Pseudo code of the complete algorithm follows, where the arrays are
zero-based and the for-loop includes both the top and bottom limits
(as in Pascal):
insertionSort(array A)
90
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
begin
for i := 1 to length[A]-1 do
begin
value := A[i];
j := i-1;
while j ? 0 and A[j] > value do
begin
A[j + 1] := A[j];
j := j-1;
end;
A[j+1] := value;
end;
end;
Exchange Sorting:
91
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
9 5 7 2
5 9 7 2
5 7 9 2
5 7 2 9
5 2 7 9
2 5 7 9
Selection Sort:
92
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Method:
Effectively, we divide the list into two parts: the sub list of items
already sorted, which we build up from left to right and is found at
the beginning, and the sub list of items remaining to be sorted,
occupying the remainder of the array.
64 25 12 22 11
11 25 12 22 64
11 12 25 22 64
11 12 22 25 64
11 12 22 25 64
Quick Sort
Quick sort also referred as partition Exchange sort was
developed by C.A.R.Hoare. It is a sorting algorithm, which performs
very well on larger list than any other sorting methods.
The main idea of the quick sort is to divide the initial unsorted list
into two parts, such that every element in the first list is less than all
93
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
the elements present in the second list. The procedure will repeat
both parts which can be sorted until the sequences reduce to length 1.
The first step of the algorithm requires choosing a pivot value that
will be used to divide big and small numbers. Usually the first
element of the list is chosen as the pivot value. Once the pivot value
has been selected, all elements smaller than the pivot are placed
towards the beginning of the set and all the elements larger than the
pivot are placed to the right.
Recursion steps used in quick sort
1. Choose a pivot value. We take the value of the middle element
as pivot value, but it can be any value, which is in range of
sorted values, even if it doesn't present in the array.
2. Partition. Rearrange elements in such a way, that all elements
which are lesser than the pivot go to the left part of the array
and all elements greater than the pivot, go to the right part of
the array. Values equal to the pivot can stay in any part of the
array. Notice, that array may be divided in non-equal parts.
3. Sort both parts. Apply quick sort algorithm recursively to the
left and the right parts.
Quick(arr,first,Last)
Where arr is an array of n elements, first refers to the first
index of the array and last refers to the of the array.
if first < last then
assign pivot =arr[first]. I = first j=last
repeat while I <j
repeat while arr[i]<=pivot and I <last
increment I by 1
[end of step 4 while loop]
repeat while arr[j]<=pivot and j > first
decrement j by 1
[end of step 7 while loop]
if I < j then
interchange arr[i] and arr[j]
[end of step 10 if loop]
.[end of step 3 while loop]
94
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Call Quick(arr,first,j-1)
.Call quick(arr,j+1,last)
.[end of step 1 if loop]
.Print the sorted array arr
EXAMPLE
20 14 33 __ 75 96 51 84 62
20 14 33 45 75 96 51 84 62
20 14 33 45 75 96 51 84 62
20 14 33 45 __ 96 51 84 62
20 14 33 45 62 96 51 84 __
20 14 33 45 62 __ 51 84 96
20 14 33 45 62 51 __ 84 96
20 14 33 45 62 51 75 84 96
95
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Now compare the each value if that left element greater than right
element ,interchange the those values.
20 14 33 45 62 51 75 84 96
20 14 33 45 62 51 75 84 96
Now compare 62 > 51 ,the value should be interchange
14 20 33 45 51 62 75 84 96
Shell Sort:
Advantage of Shell sort is that it’s only efficient for medium size
lists. For bigger lists, the algorithm is not the best choice. Fastest of
all O (N^2) sorting algorithms.
5 times faster than the bubble sort and a little over twice as fast as
the insertion sort, its closest competitor
13 14 94 33 82
25 59 94 65 23
45 27 73 25 39
10
10 14 73 25 23
13 27 94 33 39
25 59 94 65 82
45
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char s[255];
printf("Enter a string:");
gets(s);
shell(s, strlen(s));
printf("The sorted string is: %s.\n", s);
return 0;
}
inc ← round(n/2)
while inc > 0 do:
for i = inc .. n − 1 do:
temp ← a[i]
j←i
while j ≥ inc and a[j − inc] > temp do:
a[j] ← a[j − inc]
j ← j − inc
a[j] ← temp
inc ← round(inc / 2.2)
98
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Merge Sort:
Merge sort is an O (n log n) comparison-based sorting algorithm. In
most implementations it is stable, meaning that it preserves the input
order of equal elements in the sorted output. It is an example of the
divide and conquer algorithmic.
Step:1
99
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Step:2
Step:3
Step:4
Step:5
100
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Step:6
Step:7
Step:8
Step9:
101
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Pseudo-code
for i ← 0 to n-2 do
min ← i
for j ← (i + 1) to n-1 do
if A[j] < A[min]
min ← j
swap A[i] and A[min]
HEAP SORT:
A heap is defined to be a complete binary tree with the
property that the value of each node is at least as large as the value of
its child nodes, if they exist. The root node of the heap has the largest
value in the tree. There are three types of heaps.
They are
Min heap
Max heap
A max heap also referred also as descending heap of size n is defined
as a complete binary tree of n node such that the content of each node
is less than or equal to the contents of its parent node.
A min heap also referred as ascending heap of size n is defined as a
complete binary tree of n nodes such that the content of each node is
greater than or equal to the contents of its parent node.
102
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Step 1:
Construct the heap:
4
0
103
DMI
9 - St. Eugene
3 university
0 5
9 4 5 7
054CS201 Data Structure and Algorithms
4
0
Step 2 9 3
0 5
9 4 5 7
0 5 0 0
Step 3
9
0
4 3
0 5
Step 4 9 4 5 7
0 5 0 0
9
0
9 3
0 5
104
4 DMI -4St. Eugene
5 university
7
0 5 0 0
054CS201 Data Structure and Algorithms
Step 5
9
0
9
0 7
0
4 4 5 3
5 0 0 5
4
0
9 3
0 5
9 4 5 7
0 5 0 0
Step 2
105
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
4
0
9 3
0 5
9 4 5 7
0 5 0 0
step3
3
5
9 4
0 0
9 4 5 7
Step4 0 5 0 0
3
5
9 4
0 0
9 4 5 7
0 5 0 0
Step5
3 106
5 Eugene university
DMI - St.
4 4
9 9 5 7
5 0
054CS201 Data Structure and Algorithms
External sort:
In each of the merge passes, the length of the strings are increased by
a factor of k and number
107
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
A B C D De s c riptio n
0 0 16(1) 16(1) Dis tribution pa s s
Polyphase merge sorts are ideal for sorting and merging large files.
Two pairs of input and output files are opened as file streams. At the
end of each iteration, input files are deleted; output files are closed
and reopened as input files. The use of file streams makes it possible
to sort and merge files which can not be loaded into the computer's
main memory.
Strings of level
Level Distribution
1 1,1,1
2 2,2,1
3 4,3,2
4 7,6,4
5 13,11,7
……. …….
A B C D Comments
13(1) 11(1) 7(1) 0 Dis tribution Pas s
UNIT – V
SYLLABUS:
Files:
File is a collection of records and lines
At the lowest level, many modern operating systems consider files simply
as a one-dimensional sequence of bytes. At a higher level, where the
content of the file is being considered, these binary digits may represent
integer values, text characters, image pixels, audio or anything else.
At any instant in time, a file might have a size, normally expressed as
number of bytes, that indicates how much storage is associated with the file.
DBMS unix
DBMS
File System
110 512-byte
Hard disk DMI - St. Eugene university pages
Hard disk
054CS201 Data Structure and Algorithms
File Organization
For example,
A payroll file might contain information concerning all the employees in a
company and their payroll details; each record in the payroll file concerns
just one employee, and all the records have the common trait of being
related to payroll—this is very similar to placing all payroll information into a
specific filing cabinet in an office that does not have a computer. A text file
may contain lines of text, corresponding to printed lines on a piece of paper.
Alternatively, a file may contain an arbitrary binary image (a BLOB) or it may
contain an executable.
File operations:
1. Creation
2. Update includes record insertion, modification and deletion
3. Retrieval includes inquiry, report generation.
111
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Queries
Software that translates a users requests (phrased in a query
language) into instructions that can be used directly for file or
database access.
Query will search for words/ phrases in the selected databases.
A request for information from a database.
Query types
4 types are used
1. Simple query – the value of a single key is specified
2. Range query – a range of values for a single query is specified
3. Functional query – some function of key values in the file is specified
4. Boolean query – a Boolean combination of Q1-Q3 being logical operators
and or not.
Sequential organization:
In a Sequential file the records are arranged serially, one after another, like
cards in a dealing shoe. The only way to access records in a Sequential file,
is serially. A program must start at the first record and read all the
succeeding records until the required record is found or until the end of the
file is reached.
112
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Beginning of file
Record I-1
End of file
113
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
If the equation of a surface does not contain one of the variables (x,
y, or z) then the surface is a cylinder with elements parallel to the axis
of that variable and having the curve of intersection by a plane
orthogonal to the axis of that variable as directrix curve. A cylinder
with axis parallel to the z-axis can be described by parametric
114
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Cylinder
Disk Addressing
Fundamental unit (block) of information is the sector (generally a
power of 2 in size)
Sectors are arranged on tracks on a platter
If multiple platters, we organize the tracks into cylinders
We may also organize groups of cylinders to make partitions
File systems work in terms of logical blocks, So one lower level issue
on mass storage devices is the mapping of logical block address to physical
blocks Platter #, Cylinder # (Track #), Sector #
HASH INDEXED
• A hash index organizes the search keys, with their pointers, into a
hash file.
• Hash indices never primary even though they provide direct access.
Example of Hash Index
Dynamic Hashing
• More effective then static hashing when the database grows or
shrinks
• Extendable hashing splits and coalesces buckets appropriately with
the database size.
– i.e. buckets are added and deleted on demand.
Data Structure
• Hash indices are typically a prefix of the entire hash value.
• More then one consecutive index can point to the same bucket.
– The indices have the same hash prefix which can be shorter then the
length of the index.
116
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
TREE INDEXING:
117
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
B-Trees
Introduction
Definitions
A multiway tree of order m is an ordered tree where each node has at most
m children. For each node, if k is the actual number of children in the node,
then k - 1 is the number of keys in the node. If the keys and subtrees are
arranged in the fashion of a search tree, then this is called a multiway
search tree of order m.
For example, the following is a multiway search tree of order 4. Note that
the first row in each node shows the keys, while the second row shows the
pointers to the child nodes. Of course, in any useful application there would
be a record of data associated with each key, so that the first row in each
node might be an array of records where each record contains a key and its
associated data. Another approach would be to have the first row of each
node contain an array of records where each record contains a key and a
record number for the associated data record, which is found in another file.
This last method is often used when the data records are large. The
example software will use the first method.
118
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Then a multiway search tree of order 4 has to fulfill the following conditions
related to the ordering of the keys:
Note that if less than the full number of keys are in the Node,
these 4 conditions are truncated so that they speak of the
appropriate number of keys and branches.
This generalizes in the obvious way to multiway search trees with other
orders.
Example B-Tree
119
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Operations on a B-Tree
When inserting an item, first do a search for it in the B-tree. If the item is not
already in the B-tree, this unsuccessful search will end at a leaf. If there is
room in this leaf, just insert the new item here. Note that this may require
that some existing keys be moved one to the right to make room for the new
item. If instead this leaf node is full so that there is no room to add the new
item, then the node must be "split" with about half of the keys going into a
new node to the right of this one. The median (middle) key is moved up into
the parent node. (Of course, if that node has no room, then it may have to
be split as well.) Note that when adding to an internal node, not only might
we have to move some keys one position to the right, but the associated
pointers have to be moved right as well. If the root node is ever split, the
median key moves up into a new root node, thus causing the tree to
increase in height by one.
Let's work our way through an example similar to that given by Kruse. Insert
the following letters into what is originally an empty B-tree of order 5: C N G
A H E K Q M F W L T Z D P R X Y S Order 5 means that a node can have a
maximum of 5 children and 4 keys. All nodes other than the root must have
a minimum of 2 keys. The first 4 letters get inserted into the same node,
resulting in this picture:
When we try to insert the H, we find no room in this node, so we split it into
2 nodes, moving the median item G up into a new root node. Note that in
practice we just leave the A and C in the current node and place the H and
N into a new node to the right of the old one.
120
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Inserting M requires a split. Note that M happens to be the median key and
so is moved up into the parent node.
121
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
The letters F, W, L, and T are then added without needing any split.
Deleting an Item
In the B-tree as we left it at the end of the last section, delete H. Of course,
we first do a lookup to find H. Since H is in a leaf and the leaf has more than
the minimum number of keys, this is easy. We move the K over where the H
had been and the L over where the K had been. This gives:
122
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Next, delete the T. Since T is not in a leaf, we find its successor (the next
item in ascending order), which happens to be W, and move W up to
replace the T. That way, what we really have to do is to delete W from the
leaf, which we already know how to do, since this leaf has extra keys. In
ALL cases we reduce deletion to a deletion in a leaf, by using this method.
Next, delete R. Although R is in a leaf, this leaf does not have an extra key;
the deletion results in a node with only one key, which is not acceptable for
a B-tree of order 5. If the sibling node to the immediate left or right has an
123
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
extra key, we can then borrow a key from the parent and move a key up
from this sibling. In our specific case, the sibling to the right has an extra
key. So, the successor W of S (the last key in the node where the deletion
occurred), is moved down from the parent, and the X is moved up. (Of
course, the S is moved over so that the W can be inserted in its proper
place.)
124
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Of course, you immediately see that the parent node now contains only one
key, G. This is not acceptable. If this problem node had a sibling to its
immediate left or right that had a spare key, then we would again "borrow" a
key. Suppose for the moment that the right sibling (the node with Q X) had
one more key in it somewhere to the right of Q. We would then move M
down to the node with too few keys and move the Q up where the M had
been. However, the old left subtree of Q would then have to become the
right subtree of M. In other words, the N P node would be attached via the
pointer field to the right of M's new location. Since in our example we have
no way to borrow a key from a sibling, we must again combine with the
sibling, and move down the M from the parent. In this case, the tree shrinks
in height by one.
125
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Advantages b -tree
TRIE INDEXING
The term trie comes from "retrieval." Following the etymology, the inventor,
Edward Fredkin, pronounces it [tɹi] ("tree"). However, it is pronounced [tɹaɪ]
("try") by other authors.
In the example shown, keys are listed in the nodes and values below them.
Each complete English word has an (arbitrary) integer value associated with
it. A trie can be seen as a deterministic finite automaton, although the
symbol on each edge is often implicit in the order of the branches.
It is not necessary for keys to be explicitly stored in nodes. (In the figure,
words are shown only to illustrate how the trie works.)
One method is to prune from the tree all the branches that do not lead to
any key. In english, for example, there are no words that begin with the
letters ‘bb’, ‘bc’, ‘bf’,’bg’,….., but there are words beginning with ‘ba’, ‘bd’, or
126
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
‘be’. Hence all the brances and nodes for nonexistent words can be
removed from the tree. The resulting tree is called a trie. (This term
originated as letters extracted from the word retrieval, but it is usually
pronounced like the word “try”.
A trie is an m-ary tree .It s not a search tree of order m; that is, the
ordering of key values in the nodes does not conform to the rules of an m-
way search tree. The order of a trie is determined by the radix used to
represent key values. The radix of key values represented by digits is 10;
the radix of key values represented by alphabetic characters is 26 (or 27 if
the space is included).Each node of a trie of order m is essentially a one
dimensional array of m pointers. Each element in the array corresponds to
one of the elements of the radix set. A node in a 10-ary trie is shown in Fig.
The height of a trie is determined by the length of the key field. For a
node on the jth level of a 10 ary trie, Pi points to a sub tree representing all
key values whose jth digit is i.For example, P4 on the sixth level of a 10 –ary
trie points to a sub tree representing all key values whose sixth digit is 4.
127
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Example
Fig. illustrates part of a trie for structuring a collection of 4 digit key values.
This example trie represents the following collection of key values:
32xx………
35xx………
52xx………
59xx……
When a trie is used as an index, the leaf nodes contain addresses of data
records with the corresponding key values. A trie can also be used to
represent a collection of values, without a file of associated data. In that
case, the leaf nodes would simply contain placeholders indicating the
presence or absence of a corresponding value.
128
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Algorithms
We can describe trie lookup (and membership) easily. Given a recursive trie
type, storing an optional value at each node, and a list of children tries,
indexed by the next character, (here, represented as a Haskell data type):
data Trie a =
Trie { value :: Maybe a
, children :: [(Char,Trie a)] }
129
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Direct addressing -> Equal size records the available disk space is divided
out into nodes large enough to hold a record with primary key =Employee#,
the record for EMP .
Record B
510
620 Record D
130
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
750
Record A
This is very similar to the scheme for the case of direct addressing with
variable size records.
A sequential file is accessed one record at a time, from first to last, in order.
Each record can be of varying length.
A direct (or random) file is accessed in any order, by record number. Each
record must be of identical length.
A sequential file might look like this, with records separated by commas:
bear,skunk,moose,fish,alligator
A direct file with the same data would look (something) like this:
bear_____skunk____moose____fish_____alligator
Hashing is important not only for addressing into a relative file, but also for
addressing into main memory tables. The hash function is applied to record
keys to calculate the subscripts of corresponding records in the table. Thus
131
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
a record can be found with one probe (unless a collision) ; no table search is
needed.
Yes
Apply hash function to
Access record record’s key
Resolve collisions if
necessary
Store record
Static Hashing
• Hashing provides a means for accessing data without the use of an
index structure.
• Data is addressed on disk by computing a function on a search key
instead.
Organization
132
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Inverted files
o Providing the linkage because an index & the file of data records are
called inversion.
o It is similar to multilists.
The difference is that which in multilists record with the same key value are
linked together with link
In the case of inverted files this link in for is kept in the index itself
133
DMI - St. Eugene university
054CS201 Data Structure and Algorithms
Cellular partitions
134
DMI - St. Eugene university