
054CS201 Data Structure and Algorithms

UNIT - I
Fundamentals of algorithm analysis: Introduction - Big 'O' notation, time and space complexity of algorithms, elementary data structures and their applications, structure and problem solving.

Fundamentals of algorithms:

What is an Algorithm?
An algorithm is a step-by-step recipe for solving an instance of a
problem. Every procedure that a computer performs is an algorithm,
and every step a computer takes is the result of a single instruction
in an algorithm. An algorithm is a precise procedure for solving a
problem in a finite number of steps. Each algorithm is a module
designed to handle a specific problem. An algorithm states the
actions to be executed and the order in which they are to be
executed.

An algorithm is a well-ordered collection of clear and simple
instructions of definite and effectively computable operations that,
when executed, produces a result and stops in a finite amount of
time rather than running on forever.
The various steps in developing algorithms are:

 Devising the algorithm
 Validating the algorithm
 Expressing the algorithm
 Determining the complexity of the algorithm.

For example, on the back of every box of cake mix is an algorithm for
baking the cake. The ingredients necessary for preparing the cake
(like the cake mix, eggs, oil, and water) are listed first. Then comes a
list of instructions,

 Preheat the oven to 200 degrees.
 Mix the ingredients in a bowl with a spoon for 20 minutes or until moist.
 Pour into a greased 10-by-10 inch pan.
DMI - St. Eugene university

 Bake for 30 minutes.
 Cool for 15 minutes before eating.

Here the problem is to bake a cake. The order in which the actions
are executed is important; if the order is changed, the required
output will not be obtained.

Definition of algorithm:
 An algorithm is a finite set of instructions that if followed
accomplishes a particular task.
 An algorithm is a set of rules for carrying out calculations either
by hand or on a machine.
 An algorithm is a finite step-by-step procedure to achieve a
required result.
 An algorithm is a sequence of computational steps that
transform the input into the output.
 An algorithm is a sequence of operations performed on data
that have to be organized in data structures.
 An algorithm is an abstraction of a program to be executed on
a physical machine (model of Computation).
 Algorithms are not dependent on a particular programming
language, machine, system, or compiler.
 Algorithm design is all about the mathematical theory behind
the design of good programs.

Why study algorithm design?

 Programming is a very complex task, and there are a number
of aspects of programming that make it so. The first is
that most programming projects are very large, requiring the
coordinated efforts of many people. (This is the topic of a course
like software engineering.)
 The next is that many programming projects involve storing
and accessing large quantities of data efficiently. (This is the
topic of courses on data structures and databases.)
 The last is that many programming projects involve solving
complex computational problems, for which simplistic or


naive solutions may not be efficient enough. The complex
problems may involve numerical data (the subject of courses
on numerical analysis), but often they involve discrete data.
This is where the topic of algorithm design and analysis becomes
important.

In addition, all algorithms must satisfy the following criteria:

(1) Input: zero or more quantities are externally supplied.

(2) Output: at least one quantity is produced.

(3) Definiteness: each instruction is clear and unambiguous.

(4) Finiteness: the algorithm terminates after a finite number of
steps.

(5) Effectiveness: every instruction must be basic enough to be
carried out exactly, by hand if necessary, in a finite amount of time.

Classification of algorithms or types of algorithms:

(1) Recursive algorithm:
An algorithm is said to be recursive if the same
algorithm is invoked in its own body.
(2) Iterative algorithm:
An algorithm is said to be iterative if a set of
statements is executed repeatedly.
(3) Heuristic algorithm:
An algorithm which possesses the following two
characteristics is defined as a heuristic algorithm:
(i) it will usually produce a good solution, even though
such a solution need not be the optimal one;
(ii) it can be implemented easily and quickly.

(4) Deterministic algorithm:
An algorithm which has the property that the result of every
operation is uniquely defined is termed a deterministic
algorithm.


(5) Non-deterministic algorithm:
An algorithm which contains operations whose outcomes are
not uniquely defined, but are limited to a specified set of
possibilities, is termed a non-deterministic algorithm.

Complexity of Algorithm:
The various complexities of algorithms are
(1) Time complexity
(2) Space complexity

Time complexity:
The time complexity of an algorithm is the amount of
computer time it needs to run to completion.

Time complexity of an algorithm is the amount of time
(or the number of steps) needed by a program to complete its task;
it describes the way in which the number of steps required by an
algorithm varies with the size of the problem it is solving. Time
complexity is normally expressed as an order of magnitude; e.g.,
O(n²) means that if the size of the problem (n) doubles, then the
algorithm will take four times as many steps to complete.

We can define the time complexity of a given algorithm for
computing a function f() as the total number of statements that are
executed in computing the value f(n). Thus, time complexity is a
function of n. In practice it is often more convenient to consider it
as a function of |n|.

The time T(P) taken by a program P is the sum of the compile
time and the run time.

The compile time does not depend on the instance
characteristics. The run time is denoted by Tp(instance
characteristics),

i.e., T(P) = compile time + run time

tp(n) = Ca ADD(n) + Cs SUB(n) + Cm MUL(n) + Cd DIV(n) + ...


where n denotes the instance characteristics, and Ca, Cs, Cm, Cd,
and so on denote the time needed for an addition, subtraction,
multiplication, division, and so on.

Time complexity of an algorithm is generally classified into three
types. They are:

 Worst case
 Average case
 Best case.

Worst case:
Worst case is the longest time that an algorithm will use over all
instances of size n for a given problem to produce a desired result.
Often this can be represented by a function f(n), such as f(n) = n²
or f(n) = n log n. We write T(n) = O(f(n)) for the worst-case time
complexity. Roughly, this means the algorithm will take no more than
f(n) operations. Most of an initial course on algorithms is devoted to
worst-case analysis.

Worst-case time complexity does not always give an
accurate prediction of execution time, particularly relative time.
Depending on the constants involved, the bound may be an
impractically large number, and the bound may not be very tight.
To be sure, we have to actually measure the time.

Average Case:
Average case is the average time (or average space) that the
algorithm will use over all instances of size n for a given problem to
produce a desired result. It depends on the probability distribution of
instances of the problem.

Best case:


Best case is the shortest time (or least space) that the algorithm
will use over all instances of size n for a given problem to produce a
desired result. Often this can be represented by a function f(n),
such as f(n) = n² or f(n) = n log n. We write
T(n) = Ω(f(n)) for the best case. Roughly, this means the algorithm
will take no fewer than f(n) operations. The best case is seldom
interesting. When the worst-case and best-case performance of an
algorithm are the same, we can write T(n) = Θ(f(n)). Roughly, this
says the algorithm always uses f(n) operations on all instances of
size n.
Example:
An algorithm to find the minimum and maximum
element in an array:
Algorithm MaxMin(a[], n, max, min)
{
max := min := a[1]
for i := 2 to n do
{
if (a[i] > max) then max := a[i];
else if (a[i] < min) then min := a[i];
}
}

If the elements are in increasing order, the best case occurs and the
number of comparisons is (n-1).
When the elements are in decreasing order, the worst case occurs
and the number of comparisons is 2(n-1).
On the average, a[i] is greater than max half the time, and so the
average number of comparisons is (3n/2)-1.

Space complexity:
The space complexity of an algorithm is the amount of
memory it needs to run to completion.
The space requirement S(P) of any algorithm P is given by
S(P) = c + Sp(instance characteristics), where c is a constant.

Space complexity of a program is the amount of
memory used at once by the algorithm until it completes its
execution. The way in which the amount of storage space required by

an algorithm varies with the size of the problem it is solving. The
space needed by the program is generally occupied by the following:
 A fixed amount of memory occupied by the space for
the program code and the space occupied by the
variables used in the program.
 A variable amount of memory occupied by the
component variables whose size depends on the
problem being solved. This space increases or decreases
depending upon whether the program uses iterative or
recursive procedures.
Space complexity is normally expressed as an order of magnitude;
e.g., O(n²) means that if the size of the problem (n) doubles, then four
times as much working storage will be needed. The following are the
different spaces considered in determining the amount of memory
used by the algorithm.

 Instruction space
 Data Space
 Environment Space

Instruction Space:
Instruction Space is the space in memory occupied by the
compiled version of the program. We consider this space as a
constant space for any value of n. We normally ignore this value,
but remember that it is there. The instruction space is independent
of the size of the problem.

Data Space:
Data Space is the space in memory which is used to hold the
variables, data structures, allocated memory, and other data
elements. The data space is related to the size of the problem.

Environment Space:
Environment Space is the space in memory used on the run time
stack for each function call. This is related to the run time stack
and holds the returning address of the previous function. The


memory each function utilizes on the stack is a constant, as each
item on the stack holds a return value and a pointer.

Let us see the space occupied by an iterative procedure and a
recursive function call.
Iterative factorial example:
long fact(long n)
{
long x = 1;
for (long i = 1; i <= n; i++)
x = i * x;
return x;
}
The space occupied is
Data Space: i, n, and x
Environment Space: almost nothing, because the function is called
only once.

This algorithm has a space complexity of O(1) because it doesn't
depend on n (the size of the problem). No matter how big the
problem becomes, the space complexity remains the same, since
the same variables are used and the function is called only once.

Recursive factorial example:
long fact(long x)
{
if (x <= 1)
return 1;
else
return x * fact(x - 1);
}

Space occupied is

Data Space: x

Environment Space: fact() is called recursively, and so the amount
of space this program uses is based on the size of the problem.

So the space complexity is


O(n) = (x + function call) * n

since

memory needed for fact(x) = (x + function call) + memory needed for fact(x-1)
= (x + function call) + (x + function call) + ... + (x + function call)   [one term for each of x, x-1, ..., 1]

Note that in measuring space complexity, variables are always
allocated space whether they are used in the program or not.
Space complexity is not as big an issue as time complexity,
because space can be reused whereas time cannot.

Methods used to test algorithms:

There are two methods used for testing an algorithm:
(i) Performance analysis (a priori estimates)
(ii) Performance measurement (a posteriori testing)

A priori estimates:
This is the task of determining how much computing time and
storage an algorithm requires.
Ex:
x = x + 1 -> the amount of time a single execution takes, and the
number of times it is executed.

The product of these two will be the total time taken. It is a manual
method; it is also called human testing.

A posteriori testing:
It is the process of executing a correct program on data
sets and measuring the time and space it takes to compute the
result.

Asymptotic Notations:
Asymptotic notations are methods used to estimate and
represent the efficiency of an algorithm using a simple formula. This
can be useful for separating algorithms that do drastically different
amounts of work for large inputs.

The complexity of an algorithm is usually represented in Big O, Θ,
or Ω notation applied to f(n).

For example,

 Θ(n) means that the running time varies directly with the size
of the input.
 Θ(n²) means that the running time varies with the square of the
size of the input.
 Θ(2ⁿ) means that the running time varies exponentially with the
size of the input.
 O(log n) means that the running time varies with the logarithm
(the power to which 2 has to be raised to get n) of the size of the
input.

The following notations are used to estimate
the efficiency of an algorithm:
 Big-Oh notation (O)
 Big-Omega notation (Ω)
 Big-Theta notation (Θ)
 Little-oh notation (o)
 Little-omega notation (ω)
Big "Oh" Notation:
This notation is used to define the worst-case running time of
an algorithm and is concerned with very large values of n.

The letter 'O' in Big-oh stands for "order of", and the word
"Big" indicates that we are mainly concerned with very large
values of n, i.e., only the largest term in the formula is needed. The
Big-Oh notation does not attempt to provide exact running times for
an algorithm, and hence constant multipliers can be ignored,
since we are concerned only with the order. Big-Oh estimates how
fast the execution time grows as the size of the input grows.

We define f(n) = O(g(n)) if f(n) ≤ c·g(n) for all n ≥ n0,
where c is a positive constant and n0 is the threshold beyond which
the inequality holds; c·g(n) is thus an upper bound on f(n).
Different functions have different computing times; the common
computing times are listed below with their names.


Computing Time    Name
O(1)              Constant
O(log n)          Logarithmic
O(n)              Linear
O(n²)             Quadratic
O(n³)             Cubic

Many algorithms can be written in multiple ways. Usually the
simplest, most obvious algorithm will also be the slowest; cleverer,
less intuitive algorithms developed over the years may be faster.
Some common running times for algorithms, in big-Oh
notation, are:

 Linear running time
 Quadratic running time
 Exponential running time
 Polynomial running time
 Logarithmic running time.

In linear running time the execution time is directly
proportional to the input size n, i.e., O(n).
In quadratic running time the execution time is directly
proportional to n squared, i.e., O(n²).
In exponential running time the execution time is directly
proportional to 2 raised to the power n, i.e., O(2ⁿ).
In polynomial running time the execution time is directly
proportional to n raised to the power k, i.e., O(nᵏ) where k >= 1.
In logarithmic running time the execution time is log n, where
n is the input size, i.e., O(log n).

Elementary data structures:

(Data are simply values or sets of values. A data item refers to a
single unit of values. Data items that are divided into sub-items are
called group items; those that are not are called elementary items.)
For example, an employee's name may be divided into three sub-items
(first name, middle name, and last name), but the social security
number would normally be treated as a single item.


Collections of data are frequently organized into a hierarchy of
fields, records, and files.

Elements of data structures:

Arrays:
The simplest type of data structure is a linear (or one-dimensional)
array. By linear we mean a list of a finite number of
similar data elements referenced respectively by a set of consecutive
numbers, usually 1, 2, 3, ..., n. If we choose the name A for the array,
then the elements of A are denoted by the subscript notation
A(1), A(2), A(3), A(4), ..., A(n)
The number k in A(k) is called a subscript, and A(k) is called a
subscripted variable.

Linked List:
A linked list is a linear collection of data elements called nodes.
The linear order is represented by means of pointers.

INFO is called the information field, and it consists of the data. The
LINK field contains the pointer to (address of) the next node. The
link of the last node in the list is NULL; that is, it does not contain
an address.

Trees:
Data frequently contain a hierarchical relationship between various
elements. The data structure which reflects this relationship is called a
rooted tree graph or simply a tree.

Stack:
It is a linear list in which insertions and deletions can take place
only at one end called the top.

Queue: (FIFO)
It is a linear list in which deletions take place at the front of the list
and additions take place at the end of the list (the rear).

Graph:


Data sometimes contain a relationship between pairs of elements
which is not necessarily hierarchical in nature.

Record structure:
A file is a group of records; a record is a group of data items. The
data items may be of different types.

Applications of elementary data types:

a. Some of the applications of elementary data types are
sorting, searching, insertion, deletion, and traversing.
b. Solving polynomial arithmetic.
c. Converting and evaluating arithmetic expressions, i.e.,
infix, postfix, and prefix.

Problem solving:
Problem solving is a creative process of finding out appropriate
(or) feasible solution for a given problem.

Problem:
It is any task or activity for which a solution is to be found.

Solution:
It is the method of approach identified to solve a particular
problem. An efficient approach to solving a problem is called a
feasible solution.

UNIT-II
Arrays: ordered lists, representation of arrays, sparse matrices,
linked lists: singly and doubly linked lists, stacks, queues, multiple
stacks and queues, Applications: polynomial arithmetic, infix,
postfix and prefix arithmetic expression conversion and evaluations.
INTRODUCTION:
This unit introduces a data structure called Arrays. The
simplest form of array is a one-dimensional array that may be
defined as a finite ordered set of homogeneous elements, which is
stored in contiguous memory locations. For example, an array may
contain all integers or all characters or any other data type, but may
not contain a mix of data types.

The general form for declaring a single-dimensional array is:

data_type array_name [expression];

where data_type represents the data type of the array (integer,
char, float, etc.), array_name is the name of the array, and
expression indicates the number of elements in the array.
For example, consider the following C declaration:
int a[100];
It declares an array of 100 integers.
The amount of storage required to hold an array is directly related
to its type and size. For a single-dimension array, the total size in
bytes required for the array is computed as shown below:
Memory required (in bytes) = sizeof(data type) X length of
array
The first array index value is referred to as its lower bound (in C
it is always 0) and the maximum index value is called its upper
bound. The number of elements in the array, called its range, is
given by (upper bound - lower bound) + 1.
We store values in arrays during program execution. Let us
now see the process of initializing an array while declaring it:
int a[4] = {34, 60, 93, 2};
int b[] = {2, 3, 4, 5};
float c[] = {-4, 6, 81, -60};
We conclude the following facts from these examples:
If the array is initialized at the time of declaration, then the
dimension of the array is optional.
Until the array elements are given specific values, they
contain garbage values.

ARRAYS AND POINTERS:


The C compiler does not check the bounds of arrays. It is your job
to do the necessary bounds checking wherever needed.

One of the most common arrays is a string, which is simply an array


of characters terminated by a null character. The value of the null


character is zero. A string constant is a one-dimensional array of
characters terminated by the null character ('\0').

For example, consider the following:

char message[] = {'e', 'x', 'a', 'm', 'p', 'l', 'e', '\0'};

Also, consider the following string, which is stored in an array:
"sentence\n"
Figure 2.1 shows the way a character array is stored in memory. Each
character in the array occupies one byte of memory, and the last
character is always '\0'. Note that '\0' and '0' are not the same. The
elements of the character array are stored in contiguous memory
locations.

s e n t e n c e \n \0

String in Memory
C concedes the fact that the user will use strings very often and
hence provides a shortcut for the initialization of strings.
For example, the string used above can also be initialized as
char name[] = "sentence\n";
Note that, in this declaration, '\0' is not necessary; C inserts the
null character automatically.
Multidimensional arrays are defined in the same manner as one-
dimensional arrays, except that a separate pair of square brackets is
required for each subscript. Thus a two-dimensional array will
require two pairs of square brackets; a three-dimensional array will
require three pairs of square brackets and so on.

The format of declaration of a multidimensional array in C is given
below:

data_type array_name [expr 1][expr 2] ... [expr n];

where data_type is the type of the array, such as int, char, etc.,
array_name is the name of the array, and expr 1, expr 2, ..., expr n
are positive-valued integer expressions.

The schematic of a two-dimensional array of size 3X5 .


a[0][0] a[0][1] a[0][2] a[0][3] a[0][4]   Row 0
a[1][0] a[1][1] a[1][2] a[1][3] a[1][4]   Row 1
a[2][0] a[2][1] a[2][2] a[2][3] a[2][4]   Row 2

Schematic of a Two-Dimensional Array


In the case of a two-dimensional array, the following formula yields
the number of bytes of memory needed to hold it:
bytes = size of 1st index X size of 2nd index X sizeof(base type)
The pointers and arrays are closely related. As you know, an array
name without an index is a pointer to the first element in the array.
Consider the following array:
char p[10];
p and &p[0] are identical because the address of the first element of
an array is the same as the address of the array.
So an array name without an index generates a pointer.
Conversely, a pointer can be indexed as if it were declared to be an
array.
For example, consider the following program fragment:
int *x, a[10];
x = a;
x[5] = 100;
*(x + 5) = 100;
Both assignment statements place the value 100 in the sixth element
of a. Furthermore, the (0,4) element of a two-dimensional array may
be referenced in the following two ways: either by array indexing,
a[0][4], or by the pointer expression *((int *)a + 4).
In general, for any two-dimensional array, a[j][k] is equivalent to:
*((base_type *)a + (j * row_length) + k)

SPARSE MATRICES:


Matrices with a good number of zero entries are called sparse
matrices.
Consider the following matrices:
(a) Triangular matrix
(b) Tridiagonal matrix
A triangular matrix is a square matrix in which all the elements
either above or below the main diagonal are zero. Triangular
matrices are sparse matrices. A tridiagonal matrix is a square
matrix in which all the elements except those on the main diagonal
and on the diagonals on its immediate upper and lower sides are
zero. Tridiagonal matrices are also sparse matrices.

Let us consider a sparse matrix from the storage point of view.
Suppose that the entire sparse matrix is stored. Then a
considerable amount of the memory which stores the matrix consists
of zeroes. This is nothing but wastage of memory. In real-life
applications, such wastage may run to megabytes. So an efficient
method of storing sparse matrices has to be looked into.
Figure 2.4 shows a sparse matrix of order 7X6.
0 1 2 3 4 5
0 0 0 0 5 0 0
1 0 4 0 0 0 0
2 0 0 0 0 9 0
3 0 3 0 2 0 0
4 1 0 2 0 0 0
5 0 0 0 0 0 0
6 0 0 8 0 0 0

Figure 2.4: Representation of a sparse matrix of order 7x6


REPRESENTATION OF ARRAYS:
It is not uncommon to find a large number of programs which
process the elements of an array in sequence. But does it mean that
the elements of an array are also stored in sequence in memory?
The answer depends on the operating system under which the
program is running. However, the elements of an array are stored

in sequence to the extent possible. If they are being stored in
sequence, then how are they sequenced? Are the elements
stored row-wise or column-wise? Again, it depends on the
operating system. The former is called row major order and the
latter is called column major order.
Row Major Representation:
The first method of representing a two-dimensional
array in memory is the row major representation. Under this
representation, the first row of the array occupies the first set of
memory locations reserved for the array, the second row occupies
the next set, and so forth.

The schematic of the row major representation of an array is shown
below. Let us consider the following two-dimensional array:
a b c d
e f g h
i j k l

To make its equivalent row major representation, we perform the
following process:
Move the elements of the second row, starting from the first element,
to the memory locations adjacent to the last element of the first row.
When this step is applied to all the rows except for the first row, you
have a single row of elements. This is the row major
representation.
By application of the above-mentioned process, we get {a, b, c, d, e, f, g,
h, i, j, k, l}

Row 0 Row 1 Row 2 Row i


….
Schematic of a Row major representation of an Array
Column Major Representation:
The second method of representing a two-dimensional array
in memory is the column major representation. Under this
representation, the first column of the array occupies the first set of


the memory locations reserved for the array. The second column
occupies the next set and so forth. The schematic of a column major
representation is shown in Figure.
Consider the following two-dimensional array:
a b c d
e f g h
i j k l
To make its equivalent column major representation, we perform
the following process:
Transpose the elements of the array; the row major representation
of the transpose is then the column major representation of the
original array.
By application of the above-mentioned process, we get {a, e, i, b, f, j, c, g,
k, d, h, l}

Col 0 Col 1 Col 2 Col i


….

Schematic of a Column major representation of an Array

ARRAY IMPLEMENTATION OF LISTS:

In the array implementation of lists, we use an array to hold the
entries and a separate counter to keep track of how many
positions are occupied. A structure is declared which consists
of the array and the counter:
typedef struct
{
int count;
int entry[200];
} list;
For simplicity, we have taken list entry as integer. Of course, we
can also take list entry as structure of employee record or student
record, etc.

Count = 7
Position: 1  2  3  4  5  6  7  8
Entry:   11 22 33 44 55 66 77  -

Insertion
In the array implementation of lists, elements are stored in
contiguous locations. To add an element at the end of the list, we
can add it without any problem. But suppose we want to insert
the element at the beginning or middle of the list; then we have to
shift all the elements after the position where the element has to
be inserted. We have to shift the (n)th element to the (n+1)th position,
where 'n' is the number of elements in the list, the (n-1)th element to
the (n)th position, and so on until the (r)th element is moved to the
(r+1)th position, where 'r' is the position of insertion. After doing
this, the count is incremented.
From the above example, if we want to add element '35' after
element '33', we have to shift 77 to the 8th position, 66 to the 7th
position, and so on, until 44 is moved to the 5th position.

Before insertion (count = 7):
Position: 1  2  3  4  5  6  7
Entry:   11 22 33 44 55 66 77

Position: 1  2  3  4  5  6  7  8
Step 1:  11 22 33 44 55 66 77 77
Step 2:  11 22 33 44 55 66 66 77
Step 3:  11 22 33 44 55 55 66 77
Step 4:  11 22 33 44 44 55 66 77
Step 5:  11 22 33 35 44 55 66 77   (count = 8)

Program 3.1 demonstrates the insertion of an element at the desired
position.

Deletion:
To delete an element at the end of the list, we can delete it without
any problem. But suppose we want to delete an element at the
beginning or middle of the list; then we have to shift all the
elements after the position of the element to be deleted. We have to
shift the (r+1)th element to the (r)th position, and so on up to the
(n)th element, where 'n' is the number of elements in the list and 'r'
is the position of deletion. Then the count is decremented.

From the above example, if we want to delete the element '44' from
the list, we have to shift 55 to the 4th position, 66 to the 5th
position, and 77 to the 6th position.

Before deletion (count = 7):
Position: 1  2  3  4  5  6  7
Entry:   11 22 33 44 55 66 77

Step 1:  11 22 33 55 55 66 77
Step 2:  11 22 33 55 66 66 77
Step 3:  11 22 33 55 66 77   (count = 6)

Program 3.2 will demonstrate deletion of an element from the linear


array

List Implementation using Linked Lists


Linked list:
In computer science, a linked list is a data structure that consists of a
sequence of data records such that in each record there is a field that
contains a reference (i.e., a link) to the next record in the sequence.

Linked lists are among the simplest and most common data
structures, and are used to implement many important abstract data
structures, such as stacks, queues, hash tables, symbolic
expressions, skip lists, and many more.

Linear collection of self-referential class objects, called nodes


Connected by pointer links
Accessed via a pointer to the first node of the list
Link pointer in the last node is set to null to mark the list’s end

Use a linked list instead of an array when


You have an unpredictable number of data elements
You want to insert and delete quickly.

Basic concepts:
Each record of a linked list is often called an element or node.
The field of each node that contains the address of the next node is
usually called the next link or next pointer. The remaining fields are
known as the data, information, value, or payload fields.

Conventions of Linked List:


There are several conventions for the link to indicate the end of the
list.
1. a null link that points to no node (0 or NULL)
2. a dummy node that contains no item
3. a reference back to the first node, making it a circular list.

LINKED LISTS-IMPLEMENTATION:
The Linked list is a chain of structures in which each structure
consists of data as well as pointer, which stores the address (link) of
the next logical structure in the list.

A linked list is a data structure used to maintain a


dynamic series of data. Think of a linked list as a line of bogies of
train where each bogie is connected on to the next bogie. If you
know where the first bogie is, you can follow its link to the next one.
By following links, you can find any bogie of the train. When you
get to a bogie that isn’t holding (linked) on to another bogie, you
know you are at the end.

Linked lists work in the same way, except programmers usually


refer to nodes instead of bogies. A single node is defined in the
same way as any other user defined type or object, except that it
also contains a pointer to a variable of the same type as itself.


Types of Linked list:


Singly-linked list,
doubly-linked list,
multiply-linked lists

Singly-linked list:
Singly-linked lists contain nodes which have a data field as well as a
next field, which points to the next node in the linked list.

Doubly-linked list :
In a doubly-linked list, each node contains, besides the next-node
link, a second link field pointing to the previous node in the
sequence. The two links may be called forward(s) and backwards.

Circularly linked List:


In the last node of a list, the link field often contains a null reference,
a special value that is interpreted by programs as meaning "there is
no such node". A less common convention is to make it point to the
first node of the list; in that case the list is said to be circular or
circularly linked; otherwise it is said to be open or linear.


STACK:


A stack is a limited version of an array. New elements, or nodes as
they are often called, can be added to a stack and removed from a
stack only from one end. For this reason, a stack is referred to as a
LIFO structure (Last-In First-Out).

The two main operations applicable to a stack are:


push: an item is put on top of the stack, increasing the stack size by
one. As stack size is usually limited, this may provoke a stack
overflow if the maximum size is exceeded.

pop: the top item is taken from the stack, decreasing stack size by
one. In the case where there was no top item (i.e. the stack was
empty), a stack underflow occurs.

Additional primitives can be defined:


IsEmpty reports whether the stack is empty
IsFull reports whether the stack is full
Initialise creates/initialises the stack
Destroy deletes the contents of the stack (may be
implemented by reinitialising the stack)


The following algorithm for ADD and DELETE the item in the
Stack

ADD the item into the Stack:


Procedure ADD (item, stack, n, top)
If top >= n then call stack_full
Top ← top + 1
Stack(top) ← item
End ADD

DELETE the item from the Stack:


Procedure DELETE (item, stack, top)
If top <= 0 then call stack_empty
Item ← stack(top)
Top ← top - 1
End DELETE

Queue:
The queue data structure is characterised by the fact that additions
are made at the end, or tail, of the queue while removals are made from
the front, or head, of the queue. For this reason, a queue is referred
to as a FIFO structure (First-In First-Out).

Operations:

The main primitive operations of a queue are known as:


Add adds a new node
Remove removes a node

Additional primitives can be defined:


IsEmpty reports whether the queue is empty
IsFull reports whether the queue is full
Initialise creates/initialises the queue
Destroy deletes the contents of the queue (may
be implemented by re-initialising


The following algorithms for ADD and DELETE item in the Queue:

Procedure ADDQ (item, Q, n, rear)
If rear = n then call queue_full
Rear ← rear + 1
Q(rear) ← item
End ADDQ

Algorithm for Delete the item from the Queue:


Procedure DELETEQ (item, Q, front, rear)
If front = rear then call queue_empty
Front ← front + 1
Item ← Q(front)
End DELETEQ

A stack is First In, Last Out, and a queue is First In, First Out.
In a stack, items can be added or removed only at one end; in a queue,
insertion is at the rear and deletion is from the front.
The basic operations of a stack are 'push' and 'pop'; those of a
queue are 'enqueue' and 'dequeue'.
You can think of a queue like a line at the bank. The first person to
get there will get to the teller first. If a bunch of people come while
all the tellers are busy, they stand in line in the order in which they
arrived.
That is to say, new people (items) are added to the end of the line
and the first person in line is the only one who is called to a teller. In
real life this is known as "first come, first served." In programming
terms it's known as first-in-first-out, or FIFO.

You can think of a stack like a deck of cards. You can put down
cards into a pile on your table one at a time, but if you want to draw
cards, you can only draw them from the top of the deck one at a
time. Unlike a queue, the first card to be put down is the last card to
be used. This is known as first-in-last-out, or FILO (also called LIFO
for last-in-first-out).

Algebraic Expressions, an Introduction:

An algebraic expression is a legal combination of operands and


operators. Operand is the quantity (unit of data) on which a
mathematical operation is performed. Operand may be a
variable like x, y, z or a constant like 5, 4,0,9,1 etc. Operator is a
symbol which signifies a mathematical or logical operation
between the operands. Example of familiar operators include
+,-,*, /, ^ (throughout the tutorial '^' will be referred to as


Exponential Operator) etc. Considering these definitions of


operands and operators now we can write an example of
expression as x+y*z. Note the phrase "LEGAL combination" in
the definition of an Algebraic Expression, in aforementioned
example of expression x+y*z, the operands x , y, z and the
operators +,* form some legal combination. Take another
example +xyz*, in this expression operators and operands do
not make any LEGAL combination; this expression is not a
valid algebraic expression.

An Algebraic Expression can be represented using three


different notations:

INFIX: From our school days we have been familiar with expressions
in which the operands surround the operator, e.g. x+y, 6*3 etc.
This way of writing expressions is called infix notation.

PREFIX: Prefix notation also Known as Polish notation, is a


symbolic logic invented by Polish mathematician Jan
Lukasiewicz in the 1920's. In the prefix notation, as the name
only suggests, operator comes before the operands, e.g. +xy,
*+xyz etc.

POSTFIX: Postfix notation is also known as Reverse Polish


notation. They are different from the infix and prefix notations
in the sense that in the postfix notation, the operator comes
after the operands, e.g. xy+, xyz+* etc.

For example, should the expression 3+5*4 evaluate to 32, i.e.
(3+5)*4, or to 23, i.e. 3+(5*4)? To resolve this ambiguity, the
precedence (priority) of the operators was defined. Operator
precedence governs evaluation order.


Converting Expression from Infix to Postfix using STACK

To convert an expression from infix to postfix, we are going to


use a stack.

Algorithm
1) Examine the next element in the input.
2) If it is an operand, output it.
3) If it is opening parenthesis, push it on stack.
4) If it is an operator, then
i) If stack is empty, push operator on stack.
ii) If the top of the stack is opening parenthesis, push
operator on stack.

iii) If it has higher priority than the top of stack, push


operator on stack.
iv) Else pop the operator from the stack and output it,
repeat step 4.
5) If it is a closing parenthesis, pop operators from the stack
and output them until an opening parenthesis is encountered.
pop and discard the opening parenthesis.
6) If there is more input go to step 1
7) If there is no more input, unstack the remaining operators to
output.

Example
Suppose we want to convert 2*3/(2-1)+5*(4-1) into Postfix form:
Char Scanned    Stack Contents (top on right)    Postfix Expression
2 Empty 2
* * 2
3 * 23
/ / 23*


( /( 23*
2 /( 23*2
- /(- 23*2
1 /(- 23*21
) / 23*21-
+ + 23*21-/
5 + 23*21-/5
* +* 23*21-/5
( +*( 23*21-/5
4 +*( 23*21-/54
- +*(- 23*21-/54
1 +*(- 23*21-/541
) +* 23*21-/541-
Empty 23*21-/541-*+

So, the Postfix Expression is 23*21-/541-*+

Converting Expression from Infix to Prefix using STACK

It is a bit trickier algorithm, in this algorithm we first reverse


the input expression so that a+b*c will become c*b+a and then
we do the conversion and then again the output string is
reversed. Doing this has an advantage that except for some
minor modifications the algorithm for Infix->Prefix remains
almost same as the one for Infix->Postfix.
Algorithm:
1) Reverse the input string.
2) Examine the next element in the input.
3) If it is operand, add it to output string.
4) If it is closing parenthesis, push it on stack.
5) If it is an operator, then


i) If stack is empty, push operator on stack.


ii) If the top of stack is closing parenthesis, push operator on
stack.
iii) If it has same or higher priority than the top of stack, push
operator on stack.
iv) Else pop the operator from the stack and add it to output
string, repeat step 5.
6) If it is a opening parenthesis, pop operators from stack and
add them to output string until a closing parenthesis is
encountered. Pop and discard the closing parenthesis.
7) If there is more input go to step 2
8) If there is no more input, unstack the remaining operators
and add them to output string.
9) Reverse the output string.

Example
Suppose we want to convert 2*3/(2-1)+5*(4-1) into Prefix form.
Reversed Expression: )1-4(*5+)1-2(/3*2

Char Scanned    Stack Contents (top on right)    Prefix Expression (right to left)
) )
1 ) 1
- )- 1
4 )- 14
( Empty 14-
* * 14-
5 * 14-5
+ + 14-5*
) +) 14-5*
1 +) 14-5*1
- +)- 14-5*1


2 +)- 14-5*12
( + 14-5*12-
/ +/ 14-5*12-
3 +/ 14-5*12-3
* +/* 14-5*12-3
2 +/* 14-5*12-32
Empty 14-5*12-32*/+

Reverse the output string : +/*23-21*5-41 So, the final Prefix


Expression is +/*23-21*5-41.

Arithmetic Expressions

An expression is a combination of variables constants and


operators written according to the syntax of C language. In C
every expression evaluates to a value i.e., every expression
results in some value of a certain type that can be assigned to a
variable. Some examples of C expressions are shown in the
table given below.

Algebraic Expression      C Expression

a x b - c                 a * b - c
(m + n) (x + y)           (m + n) * (x + y)
ab / c                    a * b / c
3x^2 + 2x + 1             3*x*x + 2*x + 1
(x / y) + c               x / y + c

Evaluation of Expressions
Expressions are evaluated using an assignment statement of
the form

Variable = expression;

Variable is any valid C variable name. When the statement is


encountered, the expression is evaluated first and then
replaces the previous value of the variable on the left hand
side. All variables used in the expression must be assigned
values before evaluation is attempted.

Example of evaluation statements are


x=a*b–c
y=b/c*a
z = a – b / c + d;

The following program illustrates the effect of the presence of
parentheses in expressions.

int main()
{
    float a, b, c, x, y, z;
    a = 9;
    b = 12;
    c = 3;
    x = a - b / 3 + c * 2 - 1;
    y = a - b / (3 + c) * (2 - 1);
    z = a - (b / (3 + c) * 2) - 1;
    printf("x = %f\n", x);
    printf("y = %f\n", y);
    printf("z = %f\n", z);
    return 0;
}

output

x = 10.000000
y = 7.000000
z = 4.000000

Precedence in Arithmetic Operators


An arithmetic expression without parenthesis will be
evaluated from left to right using the rules of precedence of
operators. There are two distinct priority levels of arithmetic
operators in C.

High priority * / %
Low priority + -

Rules for evaluation of expression

1. Parenthesized sub-expressions are evaluated first, from left to
   right.
2. If parentheses are nested, the evaluation begins with the
   innermost sub-expression.
3. The precedence rule is applied in determining the order of
   application of operators in evaluating sub-expressions.
4. The associativity rule is applied when two or more operators of
   the same precedence level appear in a sub-expression.
5. Arithmetic expressions are evaluated from left to right using the
   rules of precedence.
6. When parentheses are used, the expressions within parentheses
   assume the highest priority.

Type conversions in expressions


Implicit type conversion
C permits mixing of constants and variables of different types in an
expression. C automatically converts any intermediate values to the
proper type so that the expression can be evaluated without losing
any significance. This automatic conversion is known as implicit
type conversion; during evaluation it adheres to strict rules of
type conversion. If the operands are of different types, the lower
type is automatically converted to the higher type before the
operation proceeds, and the result is of the higher type.

The following rules apply during the evaluation of expressions.

All short and char operands are automatically converted to int; then:

1. If one operand is long double, the other will be converted to
   long double and the result will be long double.
2. If one operand is double, the other will be converted to double
   and the result will be double.
3. If one operand is float, the other will be converted to float and
   the result will be float.
4. If one operand is unsigned long int, the other will be converted
   to unsigned long int and the result will be unsigned long int.
5. If one operand is long int and the other is unsigned int, then
   a. if unsigned int can be converted to long int, the unsigned int
      operand will be converted and the result will be long int;
   b. else both operands will be converted to unsigned long int and
      the result will be unsigned long int.
6. If one operand is long int, the other will be converted to long
   int and the result will be long int.
7. If one operand is unsigned int, the other will be converted to
   unsigned int and the result will be unsigned int.

Explicit Conversion
Many times there may arise a situation where we want to force
a type conversion in a way that is different from automatic
conversion.

Consider, for example, the calculation of the ratio of female to
male students in a class:

Ratio = female students / male students

If female students and male students are declared as integers, the
fractional part of the division will be truncated and the ratio will
be wrong. This problem can be solved by locally converting one of
the variables to floating point, as shown below.

Ratio = (float) female students / male students

The cast (float) converts female students to floating point for the
purpose of evaluating the expression. Then, by the rule of automatic
conversion, the division is performed in floating-point mode,
retaining the fractional part of the result. The process of such a
local conversion is known as explicit conversion, or casting a
value. The general form is (type name) expression.

UNIT - III

Trees: Binary trees: Definition, traversal, threaded binary tree,


Counting Binary Trees. Graphs: Representation, traversal, connected
components, shortest path and transitive closure, topological sort,
activity network, critical path, path enumeration. Dijkstra's
Algorithm, Floyd-Warshall's Algorithm, Minimum Spanning Tree
Definitions

Tree structures:
Introduction:
A tree is a data structure that represents a hierarchical
relationship between individual data items. A parent-child
relationship is an example of a hierarchical relationship in which
there exists a well-defined order of precedence between the two data
items being related.

The figure below represents the tree structure of a customer record.

[Figure 1: root node Customer with child nodes Code, Name, Address
and Items; Items in turn has child nodes Item1, Item2 and Item3]

The highest level of a tree is called the root node; in fig. 1,
Customer is the root node. The nodes directly connected below a node
are called its child nodes; in fig. 1, Code, Name, Address and Items
are the child nodes of the parent node Customer. The child nodes of a
given parent are called siblings; in fig. 1, Code, Name, Address and
Items are siblings.

A branch in a tree structure is a link between a parent and its
child. The root of the tree is the ancestor of all the nodes in the
tree. A node with no children is called a leaf node; in fig. 1, Code
is a leaf node.

[Figure 2: subtree rooted at Items, with child nodes Item1, Item2
and Item3]


A subset of a tree that is itself a tree is known as a subtree.
Fig. 2 depicts a subtree of the tree structure given in fig. 1; the
root node of this subtree is Items.

Binary tree:
The most important tree structure is the binary tree. A binary tree
is a finite set of data elements arranged into a structure
consisting of a root and two disjoint subtrees, i.e. the left
subtree and the right subtree.

[Figure: binary tree for the expression (A + B - C) / (E * F):
root /, with left subtree rooted at - (children + and C, where +
has children A and B) and right subtree rooted at * (children E
and F)]

Forms of Binary tree:

1. Strictly binary tree:

If every non-leaf node in a binary tree has non-empty left and right
subtrees, the tree is termed a strictly binary tree.

[Figure: a strictly binary tree in which every non-leaf node has
exactly two children]

2. Complete binary tree:

A complete binary tree of depth d is a strictly binary tree all of
whose leaves are at level d.

[Figure: a complete binary tree of depth 3 - level 0: A; level 1:
B, C; level 2: D, E, F, G; level 3: H, I, J, K, L, M, N, O]

A binary tree of depth d is an almost complete binary tree if:

1) each leaf in the tree is either at level d or at level d-1, and
2) for any node nd in the tree with a right descendant at level d,
   all the left descendants of nd that are leaves are also at
   level d.

Implementation of binary trees:

Binary tree structures can be implemented in two ways. They


are

1. linear implementation
2. linked implementation

Linear implementation:

Binary tree structures can be implemented linearly using arrays.
Linear implementation uses a one-dimensional array of size
(2^(d+1) - 1), where d is the depth of the tree, i.e. the maximum
level of any node in the tree.

The root of the binary tree is stored in the first location of the
array. If a node is stored in the nth location of the array, its
left child is stored at location 2n and its right child at location
(2n + 1).


1 /
2 -
3 *
4 +
5 C
6 E
7 F
8 A
9 B
10
11
12
Unused Locations
13
14
15

The linear representation of the binary tree

Advantages:

1. It is a simple method. Given a child node, its parent node can be
   immediately identified, and vice versa.
2. This method can be implemented easily, even in older languages
   and in languages where only static memory allocation is directly
   available.

Disadvantages:
Unused locations cause wastage of memory

Linked Implementation of a binary tree:


Binary tree structures can be implemented using linked lists
and pointers.


The linked representation of the binary tree structure:

[Figure: each node of the tree for (A + B - C) / (E * F) drawn as a
record with a left link, a data field and a right link]

Location Data Llink Rlink


1 / 2 7
2 - 3 5
3 + 5 6
4 C Null Null
5 A Null Null
6 B Null Null
7 * 8 9
8 E Null Null
9 F Null Null
Linked representation using three arrays
Advantages:
It is most efficient method for implementing binary tree

Disadvantages:
i) Memory is wasted in the name of NULL pointers
ii) It is difficult to determine the parent node, if the child
node is given and vice versa


iii) It is more difficult to implement binary trees using


linked representation in languages that do not offer dynamic
storage techniques.

Binary tree traversals:

Traversing a tree is nothing but processing the nodes of the


tree such that each node is visited only once.

The possible tree traversals are

1) Pre order traversals


2) In order traversals
3) Post order traversals

Preorder traversals:
The three steps involved in the preorder traversals are

1) Process the root node


2) Process the left sub tree
3) Process the right sub tree

These ordered steps are repeated recursively, until all the nodes are
processed.

Pseudo algorithm:

PROCEDURE PREORDER(ROOT ,NN,LEFT,RIGHT)


VAR ROOT,NN,LEFT(NN),RIGHT(NN) :INTEGER
VAR DATA(NN): CHARACTER
IF ROOT=NULL THEN
RETURN
ENDIF
CALL TRAVERSE(DATA(ROOT))
CALL PREORDER(LEFT(ROOT))
CALL PREORDER(RIGHT(ROOT))
RETURN
END PREORDER
The preorder traversal results in prefix notation of the
expression stored in the binary tree in infix form.

In order traversal:
The three steps involved in the in order traversal are
1. Process the left sub tree
2. Process the root node
3. Process the right sub tree

Pseudo algorithm:

PROCEDURE INORDER(ROOT ,NN,LEFT,RIGHT)


VAR ROOT,NN,LEFT(NN),RIGHT(NN) :INTEGER
VAR DATA(NN): CHARACTER
IF ROOT=NULL THEN
RETURN
ENDIF
CALL INORDER (LEFT(ROOT))
CALL TRAVERSE(DATA(ROOT))
CALL INORDER(RIGHT(ROOT))
RETURN
END INORDER
The in order traversal of the binary tree in which the expression is
stored results in infix notation of the given expression.

Post order traversal:


The three steps involved in the in order traversal are

1. Process the left sub tree


2. Process the right sub tree
3. Process the root node

Pseudo algorithm:

PROCEDURE POSTORDER(ROOT ,NN,LEFT,RIGHT)


VAR ROOT,NN,LEFT(NN),RIGHT(NN) :INTEGER
VAR DATA(NN): CHARACTER
IF ROOT=NULL THEN
RETURN
ENDIF
CALL POSTORDER (LEFT(ROOT))

CALL POSTORDER(RIGHT(ROOT))
CALL TRAVERSE(DATA(ROOT))
RETURN
END POSTORDER

The post order traversal of the binary tree in which the expression is
stored in infix notations results in postfix notation of the given
expression.

Example

In this binary search tree


 Preorder traversal sequence: F, B, A, D, C, E, G, I, H (root, left, right)
 Inorder traversal sequence: A, B, C, D, E, F, G, H, I (left, root, right)
 Postorder traversal sequence: A, C, E, D, B, H, I, G, F (left, right,
root)

Threaded binary trees:

In the linked representation of the binary tree, more memory


space is wasted in the name of null pointers. To overcome this
disadvantage, the binary trees can be threaded.


Threaded binary tree is a tree structure, that eliminates the need


for recursive traversal through a tree by making use of null
pointers to point to the preceding and succeeding nodes.

The following figure shows the threaded binary tree with a head node.

[Figure: threaded binary tree for (A + B - C) / (E * F) with a head
node; each node carries left and right thread flags (TL, TR), and
the null left and right pointers are replaced by threads pointing to
the node's predecessor and successor]

Advantages of threaded binary tree:

1. Threading a binary tree is used to improve significantly the


processing speed of a list implemented with a binary tree.
2. Threading a binary tree eliminates the need for recursion in
traversing a tree.

Operations on a tree:
Node insertion in a tree:

PROCEDURE TREEINSERT(ROOT, DATA, LLINK, RLINK, AVAIL, N, LINK, PTR)
VAR N, DATA(N), LLINK(N), RLINK(N), ROOT, AVAIL, ITEM: INTEGER

READ ITEM
CALL GETNODE(AVAIL, LINK, NN, PTR)
ROOT := PTR
DATA(ROOT) := ITEM
LLINK(ROOT) := NULL
RLINK(ROOT) := NULL
FOR I = 2 TO N DO
READ ITEM
CALL GETNODE(AVAIL, LINK, NN, PTR)
DATA(PTR) := ITEM

L1: IF DATA(ROOT) > DATA(PTR) THEN


WRITE “ GO LEFT”
IF LLINK(ROOT)=NULL THEN
LLINK(ROOT) :=PTR
ELSE
ROOT= LLINK(ROOT)
GOTO L1
ENDIF
ELSE
IF DATA(ROOT) < DATA(PTR) THEN
WRITE “GO RIGHT”
IF RLINK(ROOT) =NULL THEN
RLINK(ROOT) :=PTR
ELSE
ROOT := RLINK(ROOT)
GOTO L1
ENDIF
ELSE
WRITE “DUPLICATION NOT POSSIBLE”
ENDIF
END FOR
END TREEINSERT

Node deletion in a tree:


In the deletion algorithm we use pointers to:
(1) the root node
(2) the left child
(3) the right child

Three cases of node deletion on a binary tree:

1. The node to be deleted has a left child without a right


child
2. The node to be deleted has a right child, but no left child
3. The node to be deleted has no children

CASE 1:
Link(p) having no right child
 Non-null left sub tree of the node to be deleted
 No right child for llink(p)
 Check for the greatest node in the left sub tree

Since llink(p) has no right child, llink(p) itself is the greatest
node in the left subtree.

NX: =NP
NP: =LLINK (NX)
RLINK (NP):=RLINK (NX)
CALL RETURNNODE (NX)

CASE 2:
The node to be deleted has a right child, but no left child
NX :=NP
NP:=RLINK(NX)
CALL RETURNNODE(X)

CASE 3:
The node to be deleted has no children
NX := NP;
NP := NULL
CALL RETURN(NX)

GRAPHS:

INTRODUCTION:


A graph is a set of objects called nodes and edges. A node is a
data element of the graph, and an edge is a connecting path between
two nodes of the graph.

Types of graph:
The different types of graph are

i. directed graph
ii. undirected graph
Directed graphs:
A directed graph is a structure in which the edges between different
nodes are directionally oriented. These connecting edges are called
directed edges; a directed edge is also known as an arc. A directed
graph is also known as a digraph.

Strongly connected digraph

A directed graph is said to be strongly connected if, for every pair
of nodes, there exists a directed path from each node to the other.

Weakly connected digraph

A directed graph is said to be weakly connected if, for a pair of
nodes node1 and node2, there exists a path from node1 to node2 or
from node2 to node1, but not necessarily both.

Terms related to digraphs:

Outdegree:
The outdegree of a node in a directed graph is the number of
directed edges exiting from, or coming out of, the node.

Indegree:
The indegree of a node in a directed graph is the number of
directed edges entering or coming towards the nodes.

For example, if the only directed edge entering node A is D → A,
then the indegree of A is 1.

Sink node:


A node which acts as a repository of information is called a sink


node. The out degree of sink node is 0.
Source node
A node which shares the information is called a source node.
The indegree of a source node is 0.

Representations of Graphs

Adjacency Matrices

Graphs G = (V, E) can be represented by adjacency matrices
G[1..|V|, 1..|V|], where the rows and columns are indexed by the
nodes and the entry G[vi, vj] represents the edge from vi to vj. In
the case of unlabeled graphs, the entries are just boolean values.

Adjacency matrices require O(|V|^2) space, and so they are
space-efficient only when the graphs are dense (that is, when the
graphs have many edges). Time-wise, adjacency matrices allow easy
addition and deletion of edges.


Adjacency Lists

A representation of the graph consisting of a list of nodes, with each node


containing a list of its neighboring nodes.

This representation takes O(|V| + |E|) space.

Traversal

Graph traversal refers to the problem of visiting all the nodes in a


graph in a particular manner. Tree traversal is a special case of graph
traversal. In contrast to tree traversal, in general graph traversal, each
node may have to be visited more than once, and a root-like node that
connects to all other nodes might not exist.

Depth First Search:-


Here, the starting point can be any vertex, and the next vertex to
be visited is decided by the traversal you choose. In DFS, from the
current vertex we move to one of its unvisited adjacent vertices and
keep going as deep as possible along that path before backing up and
trying the remaining adjacent vertices. We repeat this same
procedure for every visited vertex of the graph.

This is best explained with the help of a graph.


If V1 is the starting point of the traversal, then V2, V3 and V8
are its adjacent vertices and so appear in the adjacency list of
V1. As this is DFS, we visit V8 first, and then V2 and V3. To keep
track of which vertex is to be visited next, we make use of a
STACK, which holds these vertices in DFS order. Let's see the
algorithm.

Algorithm:-
Step-1: Set the starting point of traversal and push it inside the stack
Step-2: Pop the stack and add the popped vertex to the list of visited
vertices
Step-3: Find the adjacent vertices for the most recently visited
vertex( from the Adjacency Matrix)
Step-4: Push these adjacent vertices inside the stack (in the increasing
order of their depth) if they are not visited and not there in the stack
already
Step-5: Go to step-2 if the stack is not empty

Graph Algorithms
Minimal spanning trees - shortest paths - Analysis.


The greedy method can be effectively applied to graph
optimization problems. Some examples are:
1. Prim's Algorithm: to find a minimum spanning
tree in an undirected graph.
2. Kruskal's Algorithm: to find a minimum
spanning tree for G, or a minimum spanning tree
collection if G is not connected.
3. Dijkstra's Algorithm: to find single-source shortest
paths in undirected and directed graphs.

PRIM'S ALGORITHM
Definition: Minimum spanning tree (MST)
A spanning tree for a connected undirected graph G = (V,E) is a
subgraph t = (V,E') such that t contains every vertex of G and t is a tree.

(Figure: (a) G = (V,E); (b), (c) and (d) are some examples of spanning
trees of G.)
In a weighted graph G = (V,E,W), the weight of a subgraph is the sum
of the weights of the edges in the subgraph. An MST for a
weighted graph is a spanning tree with minimum weight.



(a) G = (V,E,W)
(b) and (c) are minimum spanning trees (W = 6) when compared to (d)
(W = 7)

The Strategy:
1. Prim's algorithm selects a starting vertex from the given
graph and classifies the start vertex under "tree
vertices".
2. The nodes adjacent to tree vertices are identified and
classified under "fringe vertices", and the remaining
vertices are classified as "unseen vertices".
3. The key step of the algorithm is the selection of a new vertex from
the "fringe", together with an incident edge, which is then added to
the "tree vertices". After this inclusion the fringe is reclassified.
The selection of the new vertex from the "fringe" depends on the
weight of the edge, and the process continues until the "fringe" is
empty.

Note:
a) Tree vertices: in the tree constructed so far.
b) Fringe vertices: not in the tree, but adjacent to
some vertex in the tree.
c) Unseen vertices: all other vertices.
Algorithm:
PrimMST (G, n) // outline
    Initialize all vertices as unseen.
    Select an arbitrary vertex s to start the tree; reclassify it as
    tree.
    Reclassify all vertices adjacent to s as fringe.
    While there are fringe vertices
        Select an edge of minimum weight between a tree vertex t
        and a fringe vertex v;
        Reclassify v as tree; add edge tv to the tree;
        Reclassify all unseen vertices adjacent to v as fringe.
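A hedged sketch of this outline in Python, keeping the candidate fringe edges in a priority queue; the small 4-vertex graph is a made-up example, not the one in the figures below:

```python
import heapq

def prim_mst(adj, start):
    """Prim's MST sketch; adj maps vertex -> list of (weight, neighbour) pairs."""
    tree = {start}                           # "tree vertices"
    # fringe: candidate edges (weight, tree_vertex, fringe_vertex)
    fringe = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(fringe)
    edges, total = [], 0
    while fringe:
        w, u, v = heapq.heappop(fringe)      # minimum-weight edge to the fringe
        if v in tree:
            continue                         # v was already reclassified as tree
        tree.add(v)
        edges.append((u, v, w))
        total += w
        for w2, x in adj[v]:
            if x not in tree:
                heapq.heappush(fringe, (w2, v, x))
    return total, edges

# Hypothetical 4-vertex weighted graph (assumed for illustration)
adj = {
    "A": [(1, "B"), (4, "C")],
    "B": [(1, "A"), (2, "C"), (6, "D")],
    "C": [(4, "A"), (2, "B"), (3, "D")],
    "D": [(6, "B"), (3, "C")],
}
total, edges = prim_mst(adj, "A")
print(total, edges)  # 6 [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 3)]
```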

Example:


This is our original weighted graph. The numbers near the edges

indicate their weight.


Vertex D has been arbitrarily chosen as a starting point. Vertices A, B,
E and F are connected to D through a single edge. A is the
vertex nearest to D and will be chosen as the second vertex
along with the edge AD.


The next vertex chosen is the vertex nearest to either D or A. B is 9


away from D and 7 away from A, E is 15, and F is 6. F is the
smallest distance away, so we highlight the vertex F and the
arc DF.

The algorithm carries on as above. Vertex B, which is 7 away from A,


is highlighted

In this case, we can choose between C, E, and G. C is 8 away from B, E


is 7 away from B, and G is 11 away from F. E is nearest, so we
highlight the vertex E and the arc BE.


Here, the only vertices available are C and G. C is 5 away from E, and
G is 9 away from E. C is chosen, so it is highlighted along with
the arc EC.

Vertex G is the only remaining vertex. It is 11 away from F, and 9
away from E. E is nearer, so we highlight G and the arc EG.


Now all the vertices have been selected and the minimum spanning
tree is shown in green. In this case, it has weight 39.

Kruskal's minimum spanning tree algorithm:

In the previous topic we studied the spanning tree, the minimum
spanning tree, and the method of constructing a minimum spanning
tree under appropriate conditions with the greedy method. Here we
examine Kruskal's MST algorithm, which also uses a greedy
strategy. The differences between Prim's and Kruskal's MST
construction are discussed at the end of the topic.
The Strategy:
1. To construct the MST, edges of the graph are
selected in non-decreasing order of cost (or distance).
2. The set of edges selected so far for the
spanning tree forms a forest, but not necessarily one
tree.
3. Terminate when all edges have been
processed.
Algorithm:
KruskalMST (G, n) // outline
    F = ∅; // F holds the selected edges
    R = E; // R is the set of remaining edges
    While (R is not empty)
        Remove the lightest edge vw from R;
        If (vw does not make a cycle in F)
            Add vw to F;

Return F;

Definition: forest

A forest is a set of n >= 0 disjoint trees. The notion of a forest is
very close to that of a tree, because if we remove the root of a tree,
we get a forest. For example, if we remove the root A of a tree whose
root has three subtrees, we get a forest with three trees.
Algorithm: Kruskal's MST
Input: G = (V,E,W), a weighted graph, with |V| = n, |E| = m
Output: F, a subset of E which forms a minimum spanning tree for
G, or a minimum spanning tree collection if G is not connected
kruskalMST (G, n, F)
    Build a minimizing priority queue, pq, of the edges of G,
    prioritized by weight.
    Initialize a union-find structure, sets, in which each vertex of G
    is in its own set.
    F = ∅;
    While (isEmpty(pq) == false)
        vwEdge = getMin(pq);
        deleteMin(pq);
        int vset = find(sets, vwEdge.from);
        int wset = find(sets, vwEdge.to);
        if (vset ≠ wset)
            Add vwEdge to F;
            Union(sets, vset, wset);
    Return F;

(a) sets - a structure defined in the algorithm that corresponds to the
equivalence relation (≡).
(b) Union-Find - Find(v) returns the unique identifier of the
equivalence class of vertex v; if s and t are identifiers of
distinct equivalence classes, then Union(s,t) merges them.
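A minimal Python sketch of the same idea, with a simple union-find standing in for the sets structure. The edge list is modeled on the worked example below (vertices A..G mapped to 0..6), so treat the exact weights as an assumption:

```python
def kruskal_mst(n, edges):
    """Kruskal sketch: vertices are 0..n-1; edges are (weight, u, v) tuples."""
    parent = list(range(n))                  # union-find: each vertex in its own set

    def find(x):                             # Find: identifier of x's class
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):            # non-decreasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                         # edge joins two distinct trees
            parent[ru] = rv                  # Union
            forest.append((u, v, w))
    return forest

# Assumed weights, patterned after the example graph below (A..G -> 0..6)
edges = [(7, 0, 1), (5, 0, 3), (8, 1, 2), (9, 1, 3), (7, 1, 4), (5, 2, 4),
         (15, 3, 4), (6, 3, 5), (8, 4, 5), (9, 4, 6), (11, 5, 6)]
mst = kruskal_mst(7, edges)
print(sum(w for _, _, w in mst))  # 39
```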

Example:

This is our original graph. The numbers near the arcs indicate their
weight. None of the arcs are highlighted

AD and CE are the shortest arcs, with length 5, and AD has been
arbitrarily chosen, so it is highlighted


CE is now the shortest arc that does not form a cycle, with length 5,
so it is highlighted as the second arc

The next arc, DF with length 6, is highlighted using much the same
method


The next-shortest arcs are AB and BE, both with length 7. AB is


chosen arbitrarily, and is highlighted. The arc BD has been
highlighted in red, because there already exists a path (in
green) between B and D, so it would form a cycle (ABD) if it
were chosen.

The process continues to highlight the next-smallest arc, BE with


length 7. Many more arcs are highlighted in red at this stage:
BC because it would form the loop BCE, DE because it would
form the loop DEBA, and FE because it would form FEBAD.


Finally, the process finishes with the arc EG of length 9, and the
minimum spanning tree is found

Differences between Prim's and Kruskal's Algorithms

Prim's Algorithm:
(a) Input: a weighted graph G = (V,E,W), necessarily connected.
(b) Output: a spanning tree with minimum weight (no cycles)
when G is connected.
(c) Worst-case running time is O(n^2), which is better for
dense graphs.

Kruskal's Algorithm:
(a) Input: a weighted graph G = (V,E,W), not necessarily
connected.
(b) Output: a spanning tree with minimum weight, or a spanning
tree collection with minimum weight when G is not connected
(no cycles).
(c) Worst-case running time is O(m log m), which is better
for sparse graphs.

Dijkstra's Algorithm
Dijkstra's algorithm is used to find the minimum-weight path
from a specified source vertex to every other vertex in a weighted
directed or undirected graph. The weight of a path is the sum
of the weights on the edges in that path.
When weight is interpreted as distance, a minimum-weight path is
called a shortest path. This algorithm requires an edge-weighted
graph G = (V,E,W) and a source vertex s. The problem
is to find a shortest path from s to each vertex.

The strategy:
1. Dijkstra's algorithm selects a starting vertex from the given
graph and classifies the start vertex under "tree vertices".
2. The nodes adjacent to tree vertices are identified and
classified under "fringe vertices", and the remaining vertices
are classified as "unseen vertices".
3. The key step of the algorithm is the selection of a new vertex and
an incident edge from the "fringe", which is then added to the
"tree vertices". After this new inclusion the fringe is reclassified.
The selection of the new vertex z from the "fringe" depends on the
total minimum weight of the path from the start vertex s to the
current vertex z.

Algorithm:
dijkstraSP (G, n) // outline
    Initialize all vertices as unseen.
    Start the tree with the specified source vertex s; reclassify
    it as tree.
    Define d(s,s) = 0.
    Reclassify all vertices adjacent to s as fringe.
    While there are fringe vertices
        Select an edge between a tree vertex t and a
        fringe vertex v such that d(s,t) + W(tv) is
        minimum;
        Reclassify v as tree; add edge tv to the tree;
        Define d(s,v) = d(s,t) + W(tv);
        Reclassify all unseen vertices adjacent to v as
        fringe.
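The outline can be sketched with a priority queue of (distance, vertex) pairs. The small graph used here is an assumption, chosen so that its final distances agree with the arithmetic in the example that follows (2 + 1 = 3, 3 + 1 = 4):

```python
import heapq

def dijkstra(adj, s):
    """Dijkstra sketch; adj maps vertex -> list of (weight, neighbour) pairs."""
    dist = {s: 0}                        # d(s, v) for tree/fringe vertices
    pq = [(0, s)]                        # candidates ordered by d(s,t) + W(tv)
    while pq:
        d, t = heapq.heappop(pq)
        if d > dist.get(t, float("inf")):
            continue                     # stale entry: t already settled closer
        for w, v in adj[t]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd             # define (or improve) d(s, v)
                heapq.heappush(pq, (nd, v))
    return dist

# Assumed example graph (edge weights chosen to match the step values below)
adj = {
    "a": [(4, "b"), (2, "c")],
    "b": [(1, "d")],
    "c": [(1, "b"), (5, "d")],
    "d": [],
}
print(dijkstra(adj, "a"))  # {'a': 0, 'b': 3, 'c': 2, 'd': 4}
```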

An Example

So far I've listed the instructions that make up the algorithm. But
to really understand it, let's follow the algorithm on an example.
We shall run Dijkstra's shortest path algorithm on the following
graph, starting at the source vertex a.

Step 1:

The minimum shortest-path weight is 0 + 4 = 4

Step:2


The minimum shortest-path weight is 2 + 1 = 3


Step:3

We skip c; it has already been settled. But a shorter path is found
for d:
The minimum shortest-path weight is 3 + 1 = 4

step:4

The total shortest-path weight is 4



This completes our description of Dijkstra's shortest path


algorithm. Other shortest path algorithms exist (see the
References section at the end of this article), but Dijkstra's is one
of the simplest, while still offering good performance in most
cases.


UNIT IV

Searching & Sorting: Binary Search tree, Insertion & Deletion, AVL
Trees, Hash Function, Hash Table, internal sort : Radix sort,
insertion sort, exchange sort, selection sort, quick sort, shell sort,
merge sort, heap sort, external sort: k-way merge sort, balanced
merge sort, polyphase merge sort.

SEARCHING AND SORTING

Searching:

Searching is the process of locating a given value (the search key)
within a collection of data and returning its position, or reporting
that it is not present.

Binary Search Tree:

A binary search tree (BST) is a binary tree data structure which


has the following properties:

            56
          /    \
       26        200
      /  \      /   \
    18    28  190   213
   /  \   /
  12  24 27

Each node (item in the tree) has a distinct value.

 Both the left and right subtrees must also be binary search
trees.
 The left subtree of a node contains only values less than the
node's value.


 The right subtree of a node contains only values greater than


the node's value.

The major advantage of binary search trees over other data structures
is that the related sorting algorithms and search algorithms such as
in-order traversal can be very efficient.

Binary search trees can choose to allow or disallow duplicate values,


depending on the implementation.

Operations

Operations on a binary tree require comparisons between nodes.


These comparisons are made with calls to a comparator, which is a
subroutine that computes the total order (linear order) on any two
values. This comparator can be explicitly or implicitly defined,
depending on the language in which the BST is implemented.

Searching

Searching a binary tree for a specific value can be a recursive or


iterative process. This explanation covers a recursive method.

We begin by examining the root node. If the tree is null, the value we
are searching for does not exist in the tree. Otherwise, if the value
equals the root, the search is successful. If the value is less than the
root, search the left subtree. Similarly, if it is greater than the root,
search the right subtree. This process is repeated until the value is
found or the indicated subtree is null. If the searched value is not
found before a null subtree is reached, then the item must not be
present in the tree.

Here is the search algorithm in the Python programming language:

def search_binary_tree(node, key):
    if node is None:
        return None  # key not found
    if key < node.key:
        return search_binary_tree(node.left, key)
    elif key > node.key:
        return search_binary_tree(node.right, key)
    else:  # key is equal to node key
        return node.value  # found key

Insertion

Insertion begins as a search would begin; if the root is not equal to the
value, we search the left or right subtrees as before. Eventually, we
will reach an external node and add the value as its right or left child,
depending on the node's value. In other words, we examine the root
and recursively insert the new node to the left subtree if the new
value is less than the root, or the right subtree if the new value is
greater than or equal to the root.

Here's how a typical binary search tree insertion might be performed


in C++:

/* Inserts the node pointed to by "newNode" into the subtree rooted at


"treeNode" */
void InsertNode(Node* &treeNode, Node *newNode)
{
if (treeNode == NULL)
treeNode = newNode;
else if (newNode->key < treeNode->key)
InsertNode(treeNode->left, newNode);
else
InsertNode(treeNode->right, newNode);
}

The above "destructive" procedural variant modifies the tree in place.


It uses only constant space, but the previous version of the tree is lost.
Alternatively, as in the following Python example, we can reconstruct
all ancestors of the inserted node; any reference to the original tree
root remains valid, making the tree a persistent data structure:

def binary_tree_insert(node, key, value):
    if node is None:
        return TreeNode(None, key, value, None)
    if key == node.key:
        return TreeNode(node.left, key, value, node.right)
    if key < node.key:
        return TreeNode(binary_tree_insert(node.left, key, value),
                        node.key, node.value, node.right)
    else:
        return TreeNode(node.left, node.key, node.value,
                        binary_tree_insert(node.right, key, value))

Deletion
Some examples of Binary Search Tree insertions and deletions. There
are several cases to be considered:
 Deleting a leaf: Deleting a node with no children is easy, as we
can simply remove it from the tree.
 Deleting a node with one child: Delete it and replace it with its
child.
 Deleting a node with two children: Suppose the node to be
deleted is called N. We replace the value of N with either its
in-order successor (the left-most child of the right subtree) or
the in-order predecessor (the right-most child of the left subtree).

struct Binary_Search_Tree *Bst_Delete(struct Binary_Search_Tree *node, int val)
{
    struct Binary_Search_Tree *successor, *node_delete;
    if (node)
    {
        if (node->val > val)
            node->left = Bst_Delete(node->left, val);
        else if (node->val < val)
            node->right = Bst_Delete(node->right, val);
        else
        {
            if (node->left == NULL)
            {
                node_delete = node;
                node = node->right;
                free(node_delete);
            }
            else if (node->right == NULL)
            {
                node_delete = node;
                node = node->left;
                free(node_delete);
            }
            else /* use inorder_predecessor every alternate delete for better
                    tree balancing */
            {
                successor = inorder_successor_Bst(node->right);
                node->val = successor->val;
                node->right = Bst_Delete(node->right, successor->val);
            }
        }
        return node;
    }
    else
    {
        printf("\nNode to be deleted not found\n");
        return NULL;
    }
}

AVL TREES:

* It was described in 1962 by two Russian mathematicians,
G. M. Adelson-Velskii and E. M. Landis. An AVL tree is a self-
balancing binary search tree, and it was the first such data structure
to be invented. In an AVL tree, the heights of the two child subtrees of
any node differ by at most one; therefore, it is also said to be height-
balanced. Insertions and deletions may require the tree to be
rebalanced by one or more tree rotations.

The balance factor of a node is the height of its right subtree minus
the height of its left subtree and a node with balance factor 1, 0, or -1
is considered balanced. A node with any other balance factor is
considered unbalanced and requires rebalancing the tree. The balance
factor is either stored directly at each node or computed from the
heights of the subtrees.

Operations:

The basic operations of an AVL tree generally involve carrying out


the same actions as would be carried out on an unbalanced binary

search tree, but preceded or followed by one or more operations


called tree rotations, which help to restore the height balance of the
subtrees.

Insertion:
Pictorial description of how rotations cause rebalancing in an AVL
tree.

Insertion into an AVL tree may be carried out by inserting the given
value into the tree as if it were an unbalanced binary search tree, and
then retracing one's steps toward the root updating the balance factor
of the nodes.

If the balance factor becomes -1, 0, or 1 then the tree is still in AVL
form, and no rotations are necessary.

If the balance factor becomes 2 or -2 then the tree rooted at this node
is unbalanced, and a tree rotation is needed. At most a single or
double rotation will be needed to balance the tree.

There are basically four cases which need to be accounted for, of


which two are symmetric to the other two. For simplicity, the root of
the unbalanced sub tree will be called P, the right child of that node
will be called R, and the left child will be called L. If the balance factor
of P is 2, it means that the right sub tree outweighs the left sub tree of
the given node, and the balance factor of the right child (R) must then
be checked. If the balance factor of R is 1, it means the insertion
occurred on the (external) right side of that node and a left rotation is
needed (tree rotation) with P as the root. If the balance factor of R
is -1, this means the insertion happened on the (internal) left side of that
node. This requires a double rotation. The first rotation is a right
rotation with R as the root. The second is a left rotation with P as the
root.

The other two cases are identical to the previous two, but with the
original balance factor of -2 and the left subtree outweighing the right
subtree.

Only the nodes traversed from the insertion point to the root of the
tree need be checked. Rotations are a constant-time operation, and
because the height is limited to O(log(n)), the execution time for an
insertion is O(log(n)).


Deletion:

If the node is a leaf, remove it. If the node is not a leaf, replace it with
either the largest in its left subtree (inorder predecessor) or the
smallest in its right subtree (inorder successor), and remove that
node. The node that was found as replacement has at most one
subtree. After deletion retrace the path back up the tree (parent of the
replacement) to the root, adjusting the balance factors as needed.

The retracing can stop if the balance factor becomes -1 or 1 indicating


that the height of that subtree has remained unchanged. If the balance
factor becomes 0 then the height of the subtree has decreased by one
and the retracing needs to continue. If the balance factor becomes -2
or 2 then the subtree is unbalanced and needs to be rotated to fix it. If
the rotation leaves the subtree's balance factor at 0 then the retracing
towards the root must continue since the height of this subtree has
decreased by one. This is in contrast to an insertion where a rotation
resulting in a balance factor of 0 indicated that the subtree's height
has remained unchanged.

Hash Function

 A hash function maps a key to an index within in a range


 Desirable properties:

 simple and quick to calculate

 even distribution, avoid collision as much as possible

A hash function is used to map a key to an array index (home


address)

search starts from here

insert, retrieve, update, delete all start by applying the hash function
to the key

Some hash functions

 if KeyType is int - key % TSize



 if KeyType is a string - convert the string to an integer and
then take % TSize

Goals for a hash function:

 fast to compute
 even distribution

 cannot guarantee no collisions unless all key values are


known in advance
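A minimal sketch of such a function for string keys; the multiplier 31 is a common but arbitrary choice (an assumption here, not something the text prescribes):

```python
def hash_string(key, tsize):
    """Convert a string key to an integer, then take % tsize
    to get a home address in 0..tsize-1."""
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)   # fold each character into the running value
    return h % tsize

TSIZE = 101
home = hash_string("apple", TSIZE)   # home address in 0..TSIZE-1
print(home)
print(42 % TSIZE)                    # integer keys use key % TSize directly: 42
```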

Hash Table:
A hash table, or a hash map, is a data structure that
associates keys with values. The primary operation it supports
efficiently is a lookup: given a key (e.g., a person's name), find the
corresponding value (e.g., that person's telephone number). It works
by transforming the key using a hash function into a hash, a number
that is used as an index in an array to locate the desired location
("bucket") where the values should be.

 a hash table is an array of size Tsize

 has index positions 0 .. Tsize-1

 two types of hash tables

 open hash table

 array element type is a <key, value> pair

 all items stored in the array

 chained hash table

 element type is a pointer to a linked list of nodes


containing <key, value> pairs

 items are stored in the linked list nodes

 keys are used to generate an array index



 home address (0 .. Tsize-1)

Basic operation:

A hash table works by transforming the key using a hash function


into a hash, a number that is used as an index in an array to locate the
desired location ("bucket") where the values should be. The number is
normally converted into the index by taking a modulo operation, or
sometimes bit masking is used where the array size is a power of two.
The optimal hash function for any given use of a hash table can vary
widely, however, depending on the nature of the key.

Typical operations on a hash table include insertion, deletion and


lookup (although some hash tables are pre calculated so that no
insertions or deletions, only lookups are done on a live system). These
operations are all performed in amortized constant time, which
makes maintaining and accessing a huge hash table very efficient.

It is also possible to create a hash table statically where, for example,


there is a fairly limited fixed set of input values - such as the value in
a single byte (or possibly two bytes ) from which an index can be
constructed directly (see section below on creating hash tables). The
hash table can also be used simultaneously for tests of validity on the
values that are disallowed.

Collision resolution:
The problem arises when two keys hash to the same array entry - a
collision. There are two ways to resolve a collision:

 Hashing with Chaining: every hash table entry contains a


pointer to a linked list of keys that hash in the same entry

 Hashing with Open Addressing: every hash table entry


contains only one key. If a new key hashes to a table entry
which is filled, systematically examine other table entries
until you find one empty entry to place the new key


Since the intent of hashing is to create unique hash (or location) for
each key, when the hash function produces the same hash for
multiple keys an alternate location must be determined. The process
of finding an alternate location is called collision resolution. A
collision resolution strategy ensures future key lookup operations
return the correct, respective records.

An efficient collision resolution strategy is an important part of any


hash table. Consider the following case, derived using the birthday
paradox, of a hash table containing 1 million indices. Even if a hash
function were to output random indices uniformly distributed over
the array, there is a 95% chance of a collision occurring before it
contains 2500 records.

There are a number of collision resolution techniques, but the most


popular are open addressing and chaining.

Separate chaining:

Hash collision resolved by chaining.

Sometimes called simply chaining or direct chaining, in its simplest


form each slot in the array is a linked list, or the head cell of a linked
list, where the list contains the elements that hashed to the same
location. Insertion requires finding the correct slot, then appending to
either end of the list in that slot; deletion requires searching the list
and removal.
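A minimal chained hash table along these lines (the class and method names are made up for illustration; Python lists stand in for the linked lists):

```python
class ChainedHashTable:
    """Minimal chained hash table: each slot holds a chain of
    <key, value> pairs that hashed to the same home address."""

    def __init__(self, tsize=11):
        self.tsize = tsize
        self.table = [[] for _ in range(tsize)]

    def _home(self, key):
        return hash(key) % self.tsize    # key -> home address 0..tsize-1

    def insert(self, key, value):
        chain = self.table[self._home(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: update in place
                chain[i] = (key, value)
                return
        chain.append((key, value))       # otherwise append to the chain

    def lookup(self, key):
        for k, v in self.table[self._home(key)]:
            if k == key:
                return v
        return None                      # key not in the table

t = ChainedHashTable()
t.insert("alice", "555-1234")
t.insert("bob", "555-9876")
print(t.lookup("alice"))  # 555-1234
```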

Sorting Techniques:


Introduction:
Sorting is a technique used to rearrange an unordered list into either
ascending or descending order. Sorting algorithms are very useful in
our real-world applications.

Some of the applications are:

1. Entries in telephone books.
2. Dictionaries in alphabetical order.
3. Working with large sets of data in computers is facilitated
when the data are stored in sorted order.

The following factors should be considered while choosing a
sorting technique:
1. Programming time
2. Execution time of the sorting technique
3. Number of comparisons required for sorting the list
4. Main or auxiliary memory space needed for the sorting technique
Algorithms are classified under two groups depending on storage of
the entries and speed.
1. Internal Sort
2. External Sort

Internal Sort:
Data are assumed to be in high-speed, random-access
memory.

External Sort:
An algorithm for sorting a large set of data stored on an external
device, that is, on slower storage.
Difference between internal & external sort

Internal sort:
1) Data are assumed to be in high-speed random-access memory.
2) Sorting takes place within n locations.
3) Examples: Quick sort, Heap sort.

External sort:
1) Algorithms for sorting large sets of data stored on external devices.
2) Not necessarily restricted to n locations.
3) Example: Merge sort.


Radix Sort:

Radix sort is a sorting algorithm that sorts integers by
processing individual digits. Because integers can represent strings of
characters (e.g., names or dates) and specially formatted floating
point numbers, radix sort is not limited to integers. Radix sort dates
back as far as 1887 to the work of Herman Hollerith on tabulating
machines.

Radix sort is an algorithm that sorts a list of fixed-size numbers of


length k in O(n · k) time by treating them as bit strings. We first sort
the list by the least significant bit while preserving their relative order
using a stable sort. Then we sort them by the next bit, and so on from
right to left, and the list will end up sorted. Most often, the counting
sort algorithm is used to accomplish the bitwise sorting, since the
number of values a bit can have is small.

Most digital computers internally represent all of their data as


electronic representations of binary numbers, so processing the digits
of integer representations by groups of binary digit representations is
most convenient. Two classifications of radix sorts are least significant
digit (LSD) radix sorts and most significant digit (MSD) radix sorts.
LSD radix sorts process the integer representations starting from the
least significant digit and move towards the most significant digit.
MSD radix sorts work the other way around.

Least significant digit radix sorts:

A least significant digit (LSD) radix sort is a fast stable sorting


algorithm which can be used to sort keys in lexicographic order. Keys
may be a string of characters, or numerical digits in a given 'radix'.
The processing of the keys begins at the least significant digit (i.e., the
rightmost digit), and proceeds to the most significant digit (i.e., the
leftmost digit). The sequence in which digits are processed by a least
significant digit (LSD) radix sort is the opposite of the sequence in
which digits are processed by a most significant digit (MSD) radix
sort.

A least significant digit (LSD) radix sort operates in O(nk) time, where
n is the number of keys, and k is the average key length. You can get

this performance for variable-length keys by grouping all of the keys


that have the same length together and separately performing an LSD
radix sort on each group of keys for each length, from shortest to
longest, in order to avoid processing the whole list of keys on every
sorting pass.

In other words:

1. Take the least significant digit (or group of bits, both being
examples of radices) of each key.
2. Group the keys based on that digit, but otherwise keep the
original order of keys. (This is what makes the LSD radix sort a
stable sort).
3. Repeat the grouping process with each more significant digit.

The sort in step 2 is usually done using bucket sort or counting sort,
which are efficient in this case since there are usually only a small
number of digits.

An example

1. Original, unsorted list:

170, 45, 75, 90, 2, 24, 802, 66

2. Sorting by least significant digit (1s place) gives:

170, 90, 2, 802, 24, 45, 75, 66

Notice that we keep 2 before 802, because 2 occurred before


802 in the original list, and similarly for pairs 170 & 90 and 45
& 75.

3. Sorting by next digit (10s place) gives:

2, 802, 24, 45, 66, 170, 75, 90

4. Sorting by most significant digit (100s place) gives:

2, 24, 45, 66, 75, 90, 170, 802



It is important to realize that each of the above steps requires just a


single pass over the data, since each item can be placed in its correct
bucket without having to be compared with other items.
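The passes above can be sketched as an LSD radix sort on decimal digits; the per-pass bucketing is stable, as the example requires (buckets preserve the original order of the keys they receive):

```python
def radix_sort(nums):
    """LSD radix sort sketch for non-negative integers,
    one decimal digit per pass, least significant digit first."""
    if not nums:
        return nums
    exp = 1
    while max(nums) // exp > 0:
        buckets = [[] for _ in range(10)]        # one bucket per digit 0..9
        for n in nums:                           # stable: keep original order
            buckets[(n // exp) % 10].append(n)
        nums = [n for b in buckets for n in b]   # concatenate buckets in order
        exp *= 10                                # move to the next digit
    return nums

print(radix_sort([170, 45, 75, 90, 2, 24, 802, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Running this on the list from the example reproduces the intermediate orders shown above after the 1s and 10s passes.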

Some LSD radix sort implementations allocate space for buckets by


first counting the number of keys that belong in each bucket before
moving keys into those buckets. The number of times that each digit
occurs is stored in an array. Consider the previous list of keys viewed
in a different way:

170, 045, 075, 090, 002, 024, 802, 066

The first counting pass starts on the least significant digit of each key,
producing an array of bucket sizes:

2 (bucket size for digits of 0: 170, 090)


2 (bucket size for digits of 2: 002, 802)
1 (bucket size for digits of 4: 024)
2 (bucket size for digits of 5: 045, 075)
1 (bucket size for digits of 6: 066)

A second counting pass on the next more significant digit of each key
will produce an array of bucket sizes:

2 (bucket size for digits of 0: 002, 802)


1 (bucket size for digits of 2: 024)
1 (bucket size for digits of 4: 045)
1 (bucket size for digits of 6: 066)
2 (bucket size for digits of 7: 170, 075)
1 (bucket size for digits of 9: 090)

A third and final counting pass on the most significant digit of each
key will produce an array of bucket sizes:

6 (bucket size for digits of 0: 002, 024, 045, 066, 075, 090)
1 (bucket size for digits of 1: 170)
1 (bucket size for digits of 8: 802)

At least one LSD radix sort implementation now counts the number
of times that each digit occurs in each column for all columns in a

single counting pass. (See the external links section.) Other LSD radix
sort implementations allocate space for buckets dynamically as the
space is needed.

Insertion Sort:

Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly sorted lists, and is often used as part of
more sophisticated algorithms. It works by taking elements from the
list one by one and inserting them in their correct position into a new
sorted list. In arrays, the new list and the remaining elements can
share the array's space, but insertion is expensive, requiring shifting
all following elements over by one. The insertion sort works just like
its name suggests - it inserts each item into its proper place in the
final list. The simplest implementation of this requires two list
structures - the source list and the list into which sorted items are
inserted. To save memory, most implementations use an in-place sort
that works by moving the current item past the already sorted items
and repeatedly swapping it with the preceding item until it is in
place. Shell sort (see below) is a variant of insertion sort that is more
efficient for larger lists. This method is much more efficient than the
bubble sort, though it has more constraints. However, insertion sort
provides several advantages:

 simple implementation
 efficient for (quite) small data sets

 efficient for data sets that are already substantially sorted: the
time complexity is O(n + d), where d is the number of
inversions

 more efficient in practice than most other simple quadratic (i.e., O(n^2)) algorithms such as selection sort or bubble sort: the average number of comparisons is about n^2/4, and the running time is linear in the best case

 stable (i.e., does not change the relative order of elements with
equal keys)


 in-place (i.e., only requires a constant amount O(1) of additional memory space)

 online (i.e., can sort a list as it receives it)

Algorithm

In abstract terms, every iteration of insertion sort removes an element from the input data, inserting it into the correct position in the already-sorted list, until no input elements remain. The choice of which element to remove from the input is arbitrary, and can be made using almost any choice algorithm.

Sorting is typically done in-place. The resulting array after k iterations has the property that the first k entries are sorted. In each iteration the first remaining entry of the input is removed and inserted into the result at the correct position, thus extending the result:

The most common variant of insertion sort, which operates on arrays, can be described as follows:

1. Suppose there exists a function called Insert designed to insert a value into a sorted sequence at the beginning of an array. It operates by beginning at the end of the sequence and shifting each element one place to the right until a suitable position is found for the new element. The function has the side effect of overwriting the value stored immediately after the sorted sequence in the array.

2. To perform an insertion sort, begin at the left-most element of the array and invoke Insert to insert each element encountered into its correct position. The ordered sequence into which the element is inserted is stored at the beginning of the array in the set of indices already examined. Each insertion overwrites a single value: the value being inserted.

Pseudo code of the complete algorithm follows, where the arrays are
zero-based and the for-loop includes both the top and bottom limits
(as in Pascal):

insertionSort(array A)
begin
    for i := 1 to length[A]-1 do
    begin
        value := A[i];
        j := i-1;
        while j >= 0 and A[j] > value do
        begin
            A[j + 1] := A[j];
            j := j-1;
        end;
        A[j+1] := value;
    end;
end;

The advantage of insertion sort is that it is relatively simple and easy to implement.
The disadvantage of insertion sort is that it is not efficient for a large list or input size.

Sort: 34 8 64 51 32 21
34 8 64 51 32 21
The algorithm sees that 8 is smaller than 34, so 8 is inserted before it.
8 34 64 51 32 21
51 is smaller than 64, so 51 is inserted before 64.
8 34 51 64 32 21
The algorithm sees 32 as another smaller number and
moves it to its appropriate location between 8 and 34.
8 32 34 51 64 21
The algorithm sees 21 as another smaller number and
moves it in between 8 and 32.
Final sorted numbers:
8 21 32 34 51 64
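The pseudo code above translates directly into C. The following is a small sketch of our own (the function name insertion_sort is not from the text):

```c
/* In-place insertion sort: a[0..i-1] is always sorted; a[i] is shifted
   left past all larger elements until it reaches its proper place. */
static void insertion_sort(int a[], int n)
{
    for (int i = 1; i < n; i++) {
        int value = a[i];
        int j = i - 1;
        /* shift larger elements one place to the right */
        while (j >= 0 && a[j] > value) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = value;
    }
}
```

Running it on the list 34 8 64 51 32 21 above reproduces the final order 8 21 32 34 51 64.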

Exchange Sorting:


The second class of sorting algorithm that we consider comprises algorithms that sort by exchanging pairs of items until the sequence is sorted. In general, an algorithm may exchange adjacent elements as well as widely separated ones. In fact, since the insertion sorts considered in the preceding section accomplish the insertion by swapping adjacent elements, insertion sorting can be considered a kind of exchange sort. The reason for creating a separate category for insertion sorts is that the essence of those algorithms is insertion into a sorted list. On the other hand, an exchange sort does not necessarily make use of such a sorted list.

Make n-1 compares of adjacent items, swapping when needed. Do this n-1 times, with one less compare on each pass.

9 5 7 2

Compare 9 to 5; 9 is greater than 5 so swap them

5 9 7 2

Compare 9 to 7, 9 is greater than 7 so swap them

5 7 9 2

Compare 9 to 2, 9 is greater than 2 so swap them

5 7 2 9

Compare 7 to 2, 7 is greater than 2 so swap them

5 2 7 9

Compare 5 to 2, 5 is greater than 2 so swap them

2 5 7 9

End of n-1 pass, sort is done.
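The pass-by-pass exchange procedure just traced is the classic bubble sort. A minimal C sketch of our own (the function name bubble_sort is an assumption for the example):

```c
/* Bubble sort: n-1 passes of adjacent compares and swaps, with one
   fewer compare on each successive pass, as in the trace above. */
static void bubble_sort(int a[], int n)
{
    for (int pass = 0; pass < n - 1; pass++) {
        for (int i = 0; i < n - 1 - pass; i++) {
            if (a[i] > a[i + 1]) {      /* swap adjacent items out of order */
                int t = a[i];
                a[i] = a[i + 1];
                a[i + 1] = t;
            }
        }
    }
}
```

Applied to 9 5 7 2 it yields 2 5 7 9, as in the worked example.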

Selection Sort:


Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n^2) complexity, making it inefficient on large lists, and it generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and it also has performance advantages over more complicated algorithms in certain situations.

Method:

The algorithm works as follows:

1. Find the minimum value in the list.
2. Swap it with the value in the first position.
3. Repeat the steps above for the remainder of the list (starting at the second position).

Effectively, we divide the list into two parts: the sub-list of items already sorted, which we build up from left to right and is found at the beginning, and the sub-list of items remaining to be sorted, occupying the remainder of the array.

Here is an example of this sort algorithm sorting five elements:

64 25 12 22 11

11 25 12 22 64

11 12 25 22 64

11 12 22 25 64

11 12 22 25 64
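The three steps above can be sketched in C as follows (our own illustration; the function name selection_sort is assumed):

```c
/* Selection sort: repeatedly find the minimum of the unsorted part
   and swap it into the next position of the sorted part. */
static void selection_sort(int a[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)   /* find minimum of a[i..n-1] */
            if (a[j] < a[min])
                min = j;
        int t = a[i];                     /* swap it into position i */
        a[i] = a[min];
        a[min] = t;
    }
}
```

Sorting 64 25 12 22 11 with this sketch passes through exactly the intermediate states shown above.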

Quick Sort
Quick sort, also referred to as partition-exchange sort, was developed by C. A. R. Hoare. It is a sorting algorithm which performs very well on larger lists, better than many other sorting methods.
The main idea of quick sort is to divide the initial unsorted list into two parts, such that every element in the first list is less than all

the elements present in the second list. The procedure is then repeated on both parts until each sequence is reduced to length 1.

The first step of the algorithm requires choosing a pivot value that
will be used to divide big and small numbers. Usually the first
element of the list is chosen as the pivot value. Once the pivot value
has been selected, all elements smaller than the pivot are placed
towards the beginning of the set and all the elements larger than the
pivot are placed to the right.
Recursion steps used in quick sort
1. Choose a pivot value. We take the value of the middle element
as pivot value, but it can be any value, which is in range of
sorted values, even if it doesn't present in the array.
2. Partition. Rearrange elements in such a way, that all elements
which are lesser than the pivot go to the left part of the array
and all elements greater than the pivot, go to the right part of
the array. Values equal to the pivot can stay in any part of the
array. Notice, that array may be divided in non-equal parts.
3. Sort both parts. Apply quick sort algorithm recursively to the
left and the right parts.

Quick sort- algorithm:

Quick(arr, first, last)
Here arr is an array of n elements, first refers to the first index of the range and last refers to the last index of the range.

if first < last then
    pivot := arr[first]; i := first; j := last
    while i < j do
        while arr[i] <= pivot and i < last do
            i := i + 1
        [end of inner while loop]
        while arr[j] > pivot and j > first do
            j := j - 1
        [end of inner while loop]
        if i < j then
            interchange arr[i] and arr[j]
        [end of if]
    [end of outer while loop]
    interchange arr[first] and arr[j]
    call Quick(arr, first, j-1)
    call Quick(arr, j+1, last)
[end of if]
Print the sorted array arr

EXAMPLE

The given Array list is


45 14 62 51 75 96 33 84 20
 Pivot element = 45; a vacant position is created at that index.
__ 14 62 51 75 96 33 84 20
 Search backward from the end to find an element < 45, move it into the vacant position, and create a vacant position at its old index:
20 14 62 51 75 96 33 84 ___
 Search forward from the front to find an element >= 45, move it into the vacant position, and create a vacant position at its old index:
20 14 __ 51 75 96 33 84 62
 The process continues until the correct position for the pivot element 45 is found; it is stored in the location E[split point], which divides the array into two sub-ranges.
20 14 33 51 75 96 __ 84 62

20 14 33 __ 75 96 51 84 62

20 14 33 45 75 96 51 84 62

20 14 33 45 75 96 51 84 62

20 14 33 45 __ 96 51 84 62

20 14 33 45 62 96 51 84 __

20 14 33 45 62 __ 51 84 96

20 14 33 45 62 51 __ 84 96

20 14 33 45 62 51 75 84 96


The same procedure is then applied recursively to each sub-range. In the left sub-range, 20 is greater than 14, so the two values are interchanged:

14 20 33 45 62 51 75 84 96

In the right sub-range, 62 is greater than 51, so those values are interchanged:

14 20 33 45 51 62 75 84 96

The array is now sorted.
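A compact C sketch of the whole procedure, using the first element of the range as the pivot as described above (the code and the name quick_sort are our own, not from the text):

```c
/* Quick sort of a[first..last], first element as pivot. */
static void quick_sort(int a[], int first, int last)
{
    if (first >= last)
        return;
    int pivot = a[first];
    int i = first, j = last;
    while (i < j) {
        while (a[i] <= pivot && i < last) i++;   /* scan right for a big item */
        while (a[j] > pivot) j--;                /* scan left for a small item */
        if (i < j) {                             /* exchange the pair */
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
    a[first] = a[j];        /* put the pivot into its final place */
    a[j] = pivot;
    quick_sort(a, first, j - 1);   /* sort both sub-ranges */
    quick_sort(a, j + 1, last);
}
```

Calling quick_sort on the example array 45 14 62 51 75 96 33 84 20 (with first = 0, last = 8) produces 14 20 33 45 51 62 75 84 96.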

Shell Sort:

Shell sort is a sorting algorithm that is a generalization of insertion sort, based on two observations:

 insertion sort is efficient if the input is "almost sorted", and
 insertion sort is typically inefficient because it moves values just one position at a time.

It was devised by Donald Shell, who published it under his own name in 1959. It was the first algorithm to break the quadratic time barrier, although the sub-quadratic time bound was only proven some years later. Shell sort works by comparing elements that are distant from one another, rather than only adjacent elements as in an ordinary insertion sort.

Shell sort uses a sequence h1, h2, …, ht called the increment sequence. Any increment sequence is fine as long as h1 = 1, though some choices are better than others. It makes multiple passes through a list and sorts a number of equally sized sets using the insertion sort.

Shell sort improves on the efficiency of insertion sort by quickly shifting values toward their destination. It is also known as diminishing increment sort. The distance between comparisons decreases as the sorting algorithm runs, until the last phase, in which adjacent elements are compared.

Advantage of Shell sort: it is efficient for medium-sized lists, and it is the fastest of the simple O(n^2)-family sorting algorithms, about 5 times faster than the bubble sort and a little over twice as fast as the insertion sort, its closest competitor. For very large lists, however, it is not the best choice.

Disadvantage of Shell sort: it is a more complex algorithm, and it is not nearly as efficient as the merge, heap, and quick sorts.

For example, consider a list of numbers like [ 13 14 94 33 82 25 59 94 65 23 45 27 73 25 39 10 ]. If we started with a step-size of 5, we could visualize this as breaking the list of numbers into a table with 5 columns. This would look like this:

13 14 94 33 82
25 59 94 65 23
45 27 73 25 39
10

We then sort each column, which gives us

10 14 73 25 23
13 27 94 33 39
25 59 94 65 82
45

When read back as a single list of numbers, we get [ 10 14 73 25 23 13 27 94 33 39 25 59 94 65 82 45 ]. Here, the 10, which was all the way at the end, has moved all the way to the beginning. This list is then again sorted using a 3-gap sort, and then a 1-gap sort (simple insertion sort).

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

/* Shell sort the first count characters of items, using the
   fixed increment sequence 9, 5, 3, 2, 1. */
void shell(char *items, int count)
{
    int i, j, gap, k;
    char x;
    char a[5] = {9, 5, 3, 2, 1};

    for (k = 0; k < 5; k++) {
        gap = a[k];
        for (i = gap; i < count; ++i) {
            x = items[i];
            /* test j >= 0 before reading items[j] */
            for (j = i - gap; (j >= 0) && (x < items[j]); j = j - gap)
                items[j + gap] = items[j];
            items[j + gap] = x;
        }
    }
}

int main(void)
{
    char s[255];

    printf("Enter a string:");
    if (fgets(s, sizeof s, stdin) == NULL)   /* gets() is unsafe; use fgets() */
        return 1;
    s[strcspn(s, "\n")] = '\0';              /* strip the trailing newline */
    shell(s, strlen(s));
    printf("The sorted string is: %s.\n", s);

    return 0;
}

Shell sort algorithm in pseudocode


input: an array a of length n

inc ← round(n/2)
while inc > 0 do:
for i = inc .. n − 1 do:
temp ← a[i]
j←i
while j ≥ inc and a[j − inc] > temp do:
a[j] ← a[j − inc]
j ← j − inc
a[j] ← temp
inc ← round(inc / 2.2)

Merge Sort:
Merge sort is an O(n log n) comparison-based sorting algorithm. In most implementations it is stable, meaning that it preserves the input order of equal elements in the sorted output. It is an example of the divide-and-conquer paradigm:

Divide the array into two halves.
Recursively sort each half.
Merge the two halves to make a sorted whole.

Merging the Elements:

Keep track of the smallest remaining element in each sorted half.
Insert the smaller of the two elements into an auxiliary array.
Repeat until done.

(Steps 1 through 9 in the original are figures illustrating this divide-and-merge process on a sample array; the figures cannot be reproduced here.)

Pseudo-code

The following pseudocode sorts an array A of n elements, indexed 0 to n-1, in increasing order:

mergeSort(A, lo, hi)
    if lo >= hi then return
    mid ← (lo + hi) / 2
    mergeSort(A, lo, mid)
    mergeSort(A, mid + 1, hi)
    merge(A, lo, mid, hi)

merge(A, lo, mid, hi)
    copy A[lo..mid] into L and A[mid+1..hi] into R
    i ← 0, j ← 0, k ← lo
    while i < length(L) and j < length(R) do
        if L[i] <= R[j] then A[k] ← L[i]; i ← i + 1
        else A[k] ← R[j]; j ← j + 1
        k ← k + 1
    copy any remaining elements of L, then of R, into A
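The divide, sort, and merge steps described above can be sketched in C as follows (our own illustration; the name merge_sort and the caller-supplied tmp buffer are assumptions for the example):

```c
/* Merge sort of a[lo..hi]; tmp is a scratch array of the same size. */
static void merge_sort(int a[], int tmp[], int lo, int hi)
{
    if (lo >= hi)
        return;
    int mid = (lo + hi) / 2;
    merge_sort(a, tmp, lo, mid);        /* recursively sort each half */
    merge_sort(a, tmp, mid + 1, hi);

    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)         /* merge the two sorted halves */
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid) tmp[k++] = a[i++]; /* copy any leftovers */
    while (j <= hi)  tmp[k++] = a[j++];

    for (k = lo; k <= hi; k++)          /* copy the merged run back */
        a[k] = tmp[k];
}
```

Using a[i] <= a[j] (rather than <) when merging is what makes the sort stable, as noted above.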

HEAP SORT:
A heap is defined to be a complete binary tree with the property that the value of each node is at least as large as the value of its child nodes, if they exist. The root node of the heap has the largest value in the tree. There are two types of heaps.

They are
 Max heap
 Min heap
A max heap, also referred to as a descending heap, of size n is defined as a complete binary tree of n nodes such that the content of each node is less than or equal to the content of its parent node.
A min heap, also referred to as an ascending heap, of size n is defined as a complete binary tree of n nodes such that the content of each node is greater than or equal to the content of its parent node.


The heap in a heap sort is simply the binary tree in which the elements to be sorted are stored. A loop is then used to remove each element of the heap: when the root element is removed, it remains part of the array, swapped with the last item of the heap, which lies directly before the item discarded in the previous iteration. With each iteration the heap becomes smaller.

Algorithm: Heap Sort
Input: E, an unsorted array, and n >= 1, the number of elements; the range of indexes is 1 ... n.
Output: E, with elements in nondecreasing order of their keys.

void heapSort(Element[] E, int n)
    int heapsize;
    constructHeap(E, n);
    // repeatedly remove the root element and rearrange the heap
    for (heapsize = n; heapsize >= 2; heapsize--)
        Element curMax = E[1];
        Element k = E[heapsize];
        fixHeap(E, heapsize - 1, 1, k);
        E[heapsize] = curMax;
    return;
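A working C sketch of this procedure, using 0-based array indexes rather than the 1-based indexes of the algorithm above (the code and the names fix_heap and heap_sort are our own):

```c
/* Sift a[i] down so the subtree rooted at i obeys the max-heap
   property; n is the current heap size (0-based indexes). */
static void fix_heap(int a[], int n, int i)
{
    for (;;) {
        int largest = i;
        int l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i)
            break;
        int t = a[i]; a[i] = a[largest]; a[largest] = t;
        i = largest;                 /* continue sifting down */
    }
}

static void heap_sort(int a[], int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)   /* construct the max heap */
        fix_heap(a, n, i);
    for (int size = n; size >= 2; size--) {
        int t = a[0];                      /* current maximum (the root) */
        a[0] = a[size - 1];
        a[size - 1] = t;                   /* place it at the end */
        fix_heap(a, size - 1, 0);          /* restore the smaller heap */
    }
}
```

Applied to the example array 40 90 35 90 45 50 70, this produces 35 40 45 50 70 90 90.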

Example for Heap Sort:


Given array: 40 90 35 90 45 50 70

(The figures that follow in the original, showing the step-by-step construction of the max heap and of the min heap for this array, cannot be reproduced here.)

External sort:

External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted does not fit into the main memory of a computing device (usually RAM) and a slower kind of memory (usually a hard drive) needs to be used.

K-WAY MERGE SORT:

Definition:
A merge sort that sorts a data stream using repeated merges. It distributes the input into k streams by repeatedly reading a block of input that fits in memory, called a run, sorting it, then writing it to the next stream. It merges runs from the k streams into an output stream. It then repeatedly distributes the runs in the output stream to the k streams and merges them until there is a single sorted output.

The procedure of the 2-way balanced merge is easily extended to the general k-way balanced merge; in each pass the number of strings is divided by k. Let n = k^m be the number of strings generated in the first pass. The number of merge passes required is then m = log_k n. In each of the merge passes, the lengths of the strings are increased by a factor of k and the number of strings is reduced by a factor of k.

Balanced merge (2-way) with 32 strings and 4 tape units:


A      B      C      D      Description
0      0      16(1)  16(1)  Distribution pass
8(2)   8(2)   0      0      2-way merge
0      0      4(4)   4(4)   2-way merge
2(8)   2(8)   0      0      2-way merge
0      0      1(16)  1(16)  2-way merge
1(32)  0      0      0      2-way merge

Balanced merge sort

Definition: A k-way merge sort in which the number of input and output data streams is the same. With k = 2 it is also called a two-way merge sort.

• The first pass consists of creating strings of length 2 by comparing and, if necessary, exchanging pairs of items.

• As a result, the pass generates n/2 strings, each of length 2, where n is the number of items in the given file.

• The second pass consists of merging pairs of strings generated in the previous pass.

• Successive passes are similar to the second pass; in each pass the number of strings is reduced by half and the sizes of the strings are doubled.

(The original shows, as a figure, an example file of eight numbers together with the strings produced after the 1st, 2nd and 3rd passes of a balanced two-way merge; the figure cannot be reproduced here.)

Polyphase merge sort

A polyphase merge sort is an algorithm which decreases the number of runs at every iteration of the main loop by merging runs in pairs. Typically, a merge sort splits items into groups then recursively sorts each group. Once the groups are sorted, they are merged into a final, sorted sequence.

Polyphase merge sorts are ideal for sorting and merging large files.
Two pairs of input and output files are opened as file streams. At the
end of each iteration, input files are deleted; output files are closed
and reopened as input files. The use of file streams makes it possible
to sort and merge files which can not be loaded into the computer's
main memory.

Strings of level

Level Distribution

1 1,1,1
2 2,2,1
3 4,3,2
4 7,6,4
5 13,11,7
……. …….

Polyphase sort with 31 string and 4 tape units:

A      B      C      D      Comments
13(1)  11(1)  7(1)   0      Distribution pass
6(1)   4(1)   0      7(3)   Merge pass 1, 3-way merging
2(1)   0      4(5)   3(3)   Merge pass 2, 3-way merging
0      2(9)   2(5)   1(3)   Merge pass 3, 3-way merging
1(17)  1(9)   1(5)   0      Merge pass 4, 3-way merging
0      0      0      1(31)  Merge pass 5, 3-way merging


054CS201 Data Structure and Algorithms

UNIT – V
SYLLABUS:

Files : Files, Queries and sequential organization; Cylinder surface


indexing, Hashed Indexed, Tree Indexing, B-Trees, Trie Indexing,
Sequential file organization, random file organization, Hashed file
organization, Inverted files, cellular partitions.

Files:
File is a collection of records and lines

At the lowest level, many modern operating systems consider files simply
as a one-dimensional sequence of bytes. At a higher level, where the
content of the file is being considered, these binary digits may represent
integer values, text characters, image pixels, audio or anything else.
At any instant in time, a file might have a size, normally expressed as
number of bytes, that indicates how much storage is associated with the file.

(Figure: a 10 MB table corresponds to 20,000 pages; a DBMS accesses the hard disk either directly or through the Unix file system, which uses 512-byte pages.)
054CS201 Data Structure and Algorithms

File Organization

For example,
A payroll file might contain information concerning all the employees in a
company and their payroll details; each record in the payroll file concerns
just one employee, and all the records have the common trait of being
related to payroll—this is very similar to placing all payroll information into a
specific filing cabinet in an office that does not have a computer. A text file
may contain lines of text, corresponding to printed lines on a piece of paper.
Alternatively, a file may contain an arbitrary binary image (a BLOB) or it may
contain an executable.

Files and folders arranged in a hierarchy


In modern computer systems, files are typically accessed using names
(filenames). In some operating systems, the name is associated with the file
itself.

A File is a collection of related records.

There are 6 basic kinds of files are available


1. Master file  it represents a static view
2. Transaction file  it contain data to add a new record/ remove/
modify an existing record on a master file.
3. Report file  data that are formatted for presentation to a
user
4. Work file  temporary file in a system
5. Program file  it contains instruction
6. Text file  it contain alphanumeric and graphic data
input using a text editor program.

Modes of accessing files:

Input  only read by the program


Output  only written by a program
Input and output  both read from and written to during a program’s
execution

File operations:
1. Creation
2. Update  includes record insertion, modification and deletion
3. Retrieval  includes inquiry, report generation.


4. Maintenance, including restructuring, reorganization

Queries and sequential organization

Queries
 Software that translates a users requests (phrased in a query
language) into instructions that can be used directly for file or
database access.
 Query will search for words/ phrases in the selected databases.
 A request for information from a database.

Query types
4 types are used:
1. Simple query – the value of a single key is specified.
2. Range query – a range of values for a single key is specified.
3. Functional query – some function of key values in the file is specified.
4. Boolean query – a Boolean combination of queries of types 1-3, using the logical operators and, or, not.

Example: Request – query language

 FIND EMP_NAME OF EMP_PAY_RECORD WHERE EMP_NO =


12751
 FIND ALL EMP_NAME, EMP_NO OF EMP_PAY_RECORD
WHERE EMP_DEPT_NAME = “MIS”
 FIND ALL EMP_NAME, EMP_NO OF EMP_PAY_RECORD
WHERE 20,000 < EMP_SAL<40,000.
Where clause gives the qualification criteria.

Sequential organization:

Sequential organization is the simplest file organization in COBOL.

In a Sequential file the records are arranged serially, one after another, like
cards in a dealing shoe. The only way to access records in a Sequential file,
is serially. A program must start at the first record and read all the
succeeding records until the required record is found or until the end of the
file is reached.

Sequential files may be Ordered or Unordered (the latter should really be called Serial files). The ordering of the records in a file has a significant impact on the way in which it is processed and the processing that can be done on it.


2 types of sequential file:

1. Serial access  it takes longer to access a record, but is less expensive.
2. Direct access  faster to access a record, but more expensive.

(Figure: the records of a sequential file laid out one after another, from the beginning of the file through record i-1 to the end of the file.)

 Sequential files can be stored on either serial or direct access devices.
 It is the simplest kind of file structure.
 Unordered: insertion order
 Ordered: key order
 Simple to maintain
 Provides good performance for processing large numbers of records.

Cylinder – Surface Indexing


In mathematics, specifically in topology, a surface is a two-dimensional topological manifold.

A cylinder is a special case of a quadric surface. Geometrically, a cylinder is the surface generated by a line that moves so that it is always parallel to a fixed line and always intersects a fixed curve. Any position of the generating line is an element of the cylinder, and the fixed curve that all of these elements intersect is the directrix curve.


If the equation of a surface does not contain one of the variables (x, y, or z), then the surface is a cylinder with elements parallel to the axis of that variable, having as directrix curve the curve of intersection with a plane orthogonal to that axis. A cylinder with axis parallel to the z-axis can be described by parametric equations of the form


A cylinder is circular if the traces in planes perpendicular to the axis of the cylinder are circles; in particular, the directrix curve is a circle. The standard form and parametric form of an equation for a circular cylinder with elements parallel to the z-axis are as follows:

A cylinder is elliptic if the traces in planes perpendicular to the axis of the cylinder are ellipses; in particular, the directrix curve is an ellipse. The standard form and parametric form of an equation for an elliptic cylinder with elements parallel to the z-axis are as follows:

A cylinder is parabolic if the traces in planes perpendicular to the axis are parabolas; in particular, the directrix curve is a parabola. The standard form and parametric form of an equation for a parabolic cylinder with elements parallel to the z-axis are as follows:

Following figure is the example for Cylinder Surface-indexing


Cylinder

Disk Addressing
 Fundamental unit (block) of information is the sector (generally a
power of 2 in size)
 Sectors are arranged on tracks on a platter
 If multiple platters, we organize the tracks into cylinders
 We may also organize groups of cylinders to make partitions
 File systems work in terms of logical blocks, so one lower-level issue on mass storage devices is the mapping of a logical block address to a physical block: Platter #, Cylinder # (Track #), Sector #

HASH INDEXED

• A hash index organizes the search keys, with their pointers, into a hash file.
• Hash indices are never primary indices, even though they provide direct access.

Dynamic Hashing
• More effective than static hashing when the database grows or shrinks.
• Extendable hashing splits and coalesces buckets appropriately with the database size.
– i.e. buckets are added and deleted on demand.

The Hash Function

• Typically produces a large number of values, uniformly and randomly.
• Only part of the value is used, depending on the size of the database.

Data Structure
• Hash indices are typically a prefix of the entire hash value.
• More than one consecutive index can point to the same bucket.
– The indices share the same hash prefix, which can be shorter than the length of the index.

Example of hash index

General Extendable Hash Structure


TREE INDEXING:

The basic tree-based indexing method is discrimination tree indexing. The tree reflects exactly the structure of terms. A more complex tree-based method is abstraction tree indexing. The nodes are labeled with lists of terms, in a manner that reflects the substitution of variables from one term to another: the domain of the variable substitutions in a node is the codomain of the substitutions in a subnode (substitutions are mappings from variables to terms).


B-Trees

Introduction

A B-tree is a specialized multiway tree designed especially for use on disk. In a B-tree each node may contain a large number of keys. The number of subtrees of each node, then, may also be large.

Definitions

A multiway tree of order m is an ordered tree where each node has at most
m children. For each node, if k is the actual number of children in the node,
then k - 1 is the number of keys in the node. If the keys and subtrees are
arranged in the fashion of a search tree, then this is called a multiway
search tree of order m.

For example, the following is a multiway search tree of order 4. Note that
the first row in each node shows the keys, while the second row shows the
pointers to the child nodes. Of course, in any useful application there would
be a record of data associated with each key, so that the first row in each
node might be an array of records where each record contains a key and its
associated data. Another approach would be to have the first row of each
node contain an array of records where each record contains a key and a
record number for the associated data record, which is found in another file.
This last method is often used when the data records are large. The
example software will use the first method.


Then a multiway search tree of order 4 has to fulfill the following conditions
related to the ordering of the keys:

 The keys in each node are in ascending order.


 At every given node (call it Node) the following is true:

 The subtree starting at record Node.Branch[0]


has only keys that are less than Node.Key[0].
 The subtree starting at record Node.Branch[1]
has only keys that are greater than
Node.Key[0] and at the same time less than
Node.Key[1].
 The subtree starting at record Node.Branch[2]
has only keys that are greater than
Node.Key[1] and at the same time less than
Node.Key[2].
 The subtree starting at record Node.Branch[3]
has only keys that are greater than
Node.Key[2].

 Note that if less than the full number of keys are in the Node,
these 4 conditions are truncated so that they speak of the
appropriate number of keys and branches.

This generalizes in the obvious way to multiway search trees with other
orders.
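The ordering conditions above can be captured in a small C node structure with a search routine. This is a sketch of our own, not from the text: the names MNode and msearch are assumptions, and the order is fixed at 4 to match the example.

```c
#include <stddef.h>

/* One node of a multiway search tree of order 4, following the
   ordering conditions above. */
struct MNode {
    int nkeys;                 /* actual number of keys, at most 3 */
    int key[3];                /* keys in ascending order */
    struct MNode *branch[4];   /* branch[i]: keys between key[i-1] and key[i];
                                  branch[0] below key[0], branch[nkeys] above
                                  key[nkeys-1] */
};

/* Return the node containing target, or NULL if it is absent. */
static struct MNode *msearch(struct MNode *node, int target)
{
    while (node != NULL) {
        int i = 0;
        while (i < node->nkeys && target > node->key[i])
            i++;                       /* find the first key >= target */
        if (i < node->nkeys && target == node->key[i])
            return node;               /* found in this node */
        node = node->branch[i];        /* descend into the proper subtree */
    }
    return NULL;
}
```

The same descent logic is what both B-tree lookup and B-tree insertion use to locate the leaf where a key belongs.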

Example B-Tree

The following is an example of a B-tree of order 5. This means that (other than the root node) all internal nodes have at least ceil(5 / 2) = ceil(2.5) = 3 children (and hence at least 2 keys). Of course, the maximum number of children that a node can have is 5 (so that 4 is the maximum number of keys). According to condition 4, each leaf node must contain at least 2 keys. In practice, B-trees usually have orders a lot bigger than 5.


Operations on a B-Tree

Inserting a New Item

When inserting an item, first do a search for it in the B-tree. If the item is not
already in the B-tree, this unsuccessful search will end at a leaf. If there is
room in this leaf, just insert the new item here. Note that this may require
that some existing keys be moved one to the right to make room for the new
item. If instead this leaf node is full so that there is no room to add the new
item, then the node must be "split" with about half of the keys going into a
new node to the right of this one. The median (middle) key is moved up into
the parent node. (Of course, if that node has no room, then it may have to
be split as well.) Note that when adding to an internal node, not only might
we have to move some keys one position to the right, but the associated
pointers have to be moved right as well. If the root node is ever split, the
median key moves up into a new root node, thus causing the tree to
increase in height by one.

Let's work our way through an example similar to that given by Kruse. Insert the following letters into what is originally an empty B-tree of order 5:

C N G A H E K Q M F W L T Z D P R X Y S

Order 5 means that a node can have a maximum of 5 children and 4 keys. All nodes other than the root must have a minimum of 2 keys. The first 4 letters get inserted into the same node, resulting in this picture:

When we try to insert the H, we find no room in this node, so we split it into
2 nodes, moving the median item G up into a new root node. Note that in
practice we just leave the A and C in the current node and place the H and
N into a new node to the right of the old one.


Inserting E, K, and Q proceeds without requiring any splits:

Inserting M requires a split. Note that M happens to be the median key and
so is moved up into the parent node.


The letters F, W, L, and T are then added without needing any split.

Deleting an Item

In the B-tree as we left it at the end of the last section, delete H. Of course,
we first do a lookup to find H. Since H is in a leaf and the leaf has more than
the minimum number of keys, this is easy. We move the K over where the H
had been and the L over where the K had been. This gives:


Next, delete the T. Since T is not in a leaf, we find its successor (the next
item in ascending order), which happens to be W, and move W up to
replace the T. That way, what we really have to do is to delete W from the
leaf, which we already know how to do, since this leaf has extra keys. In
ALL cases we reduce deletion to a deletion in a leaf, by using this method.

Next, delete R. Although R is in a leaf, this leaf does not have an extra key; the deletion results in a node with only one key, which is not acceptable for a B-tree of order 5. If the sibling node to the immediate left or right has an extra key, we can then borrow a key from the parent and move a key up from this sibling. In our specific case, the sibling to the right has an extra key. So, the successor W of S (the last key in the node where the deletion occurred) is moved down from the parent, and the X is moved up. (Of course, the S is moved over so that the W can be inserted in its proper place.)
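The borrowing step just described can be sketched in isolation. In the Python fragment below (plain dicts, our own names), the parent's separating key moves down into the underfull child and the right sibling's first key moves up to replace it, exactly as W and X do above:

```python
def borrow_from_right(parent, i):
    """Fix an underfull child i by rotating one key through the parent:
    parent's key i moves down into child i, and the smallest key of the
    right sibling moves up to take its place."""
    child = parent['children'][i]
    right = parent['children'][i + 1]
    child['keys'].append(parent['keys'][i])     # separator moves down
    parent['keys'][i] = right['keys'].pop(0)    # sibling's first key moves up
    if right['children']:                       # internal nodes: the sibling's
        child['children'].append(right['children'].pop(0))  # first subtree moves too

parent = {'keys': ['W'],
          'children': [{'keys': ['S'], 'children': []},
                       {'keys': ['X', 'Y', 'Z'], 'children': []}]}
borrow_from_right(parent, 0)
print(parent['keys'])                  # ['X']
print(parent['children'][0]['keys'])   # ['S', 'W']
print(parent['children'][1]['keys'])   # ['Y', 'Z']
```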

Finally, let's delete E. This one causes lots of problems. Although E is in a leaf, the leaf has no extra keys, nor do the siblings to the immediate right or left. In such a case the leaf has to be combined with one of these two siblings. This includes moving down the parent's key that was between those of these two leaves. In our example, let's combine the leaf containing F with the leaf containing A C. We also move down the D.


Of course, you immediately see that the parent node now contains only one
key, G. This is not acceptable. If this problem node had a sibling to its
immediate left or right that had a spare key, then we would again "borrow" a
key. Suppose for the moment that the right sibling (the node with Q X) had
one more key in it somewhere to the right of Q. We would then move M
down to the node with too few keys and move the Q up where the M had
been. However, the old left subtree of Q would then have to become the
right subtree of M. In other words, the N P node would be attached via the
pointer field to the right of M's new location. Since in our example we have
no way to borrow a key from a sibling, we must again combine with the
sibling, and move down the M from the parent. In this case, the tree shrinks
in height by one.


Advantages of tries

 Looking up keys is fast: a key of length m takes worst-case O(m) time. The simple operations tries use during lookup, such as array indexing using a character, are fast on real machines.
 Tries can require less space when they contain a large number of short strings, because the keys are not stored explicitly and nodes are shared between keys with common initial subsequences.
 Tries facilitate longest-prefix matching: finding the key that shares the longest possible prefix with a given string.

TRIE INDEXING

In computer science, a trie, or prefix tree, is an ordered tree data structure


that is used to store an associative array where the keys are usually strings.
Unlike a binary search tree, no node in the tree stores the key associated
with that node; instead, its position in the tree shows what key it is
associated with. All the descendants of a node have a common prefix of the
string associated with that node, and the root is associated with the empty
string. Values are normally not associated with every node, only with leaves
and some inner nodes that correspond to keys of interest.

The term trie comes from "retrieval." Following the etymology, the inventor,
Edward Fredkin, pronounces it [tɹi] ("tree"). However, it is pronounced [tɹaɪ]
("try") by other authors.

In the example shown, keys are listed in the nodes and values below them.
Each complete English word has an (arbitrary) integer value associated with
it. A trie can be seen as a deterministic finite automaton, although the
symbol on each edge is often implicit in the order of the branches.

It is not necessary for keys to be explicitly stored in nodes. (In the figure,
words are shown only to illustrate how the trie works.)

Though it is most common, tries need not be keyed by character strings. The same algorithms can easily be adapted to serve similar functions of ordered lists of any construct, e.g., permutations on a list of digits, permutations on a list of shapes, etc.

One method is to prune from the tree all the branches that do not lead to any key. In English, for example, there are no words that begin with the letters ‘bb’, ‘bc’, ‘bf’, ‘bg’, …, but there are words beginning with ‘ba’, ‘bd’, or ‘be’. Hence all the branches and nodes for nonexistent words can be removed from the tree. The resulting tree is called a trie. (This term originated as letters extracted from the word retrieval, but it is usually pronounced like the word “try”.)

A trie of order m can be defined formally as being either empty or consisting of an ordered sequence of exactly m tries of order m.
To this point we have based our searches through tree structures on comparisons between key values. A trie is a data structure that instead uses the representation of key values as sequences of characters or digits to guide searches through the structure. The name trie was suggested by E. Fredkin (1960) because it is part of the word retrieval.

A trie is an m-ary tree. It is not a search tree of order m; that is, the ordering of key values in the nodes does not conform to the rules of an m-way search tree. The order of a trie is determined by the radix used to represent key values. The radix of key values represented by digits is 10; the radix of key values represented by alphabetic characters is 26 (or 27 if the space is included). Each node of a trie of order m is essentially a one-dimensional array of m pointers. Each element in the array corresponds to one of the elements of the radix set. A node in a 10-ary trie is shown in Fig.

Node structure for a 10-ary trie

Note that the node is composed entirely of pointers. The position of a pointer in the node determines the value to which it corresponds.

The height of a trie is determined by the length of the key field. For a node on the jth level of a 10-ary trie, Pi points to a subtree representing all key values whose jth digit is i. For example, P4 on the sixth level of a 10-ary trie points to a subtree representing all key values whose sixth digit is 4.
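This node structure is easy to sketch in Python: each node is simply a list of 10 slots, one per digit. In the fragment below (the function names are our own, and the final digit's slot holds the key itself, standing in for the address of a data record), keys are assumed to be fixed-length digit strings like those in the example that follows:

```python
RADIX = 10

def new_trie_node():
    # a node is just an array of RADIX pointers, one per possible digit
    return [None] * RADIX

def trie_insert(root, key):
    """Insert a key given as a string of digits, e.g. '2302'."""
    node = root
    for d in key[:-1]:
        i = int(d)
        if node[i] is None:
            node[i] = new_trie_node()
        node = node[i]
    node[int(key[-1])] = key       # leaf slot: the key (or a record address)

def trie_find(root, key):
    node = root
    for d in key[:-1]:
        node = node[int(d)]
        if node is None:           # no branch for this digit
            return None
    return node[int(key[-1])]

root = new_trie_node()
for key in ['2302', '2303', '2350', '2581']:
    trie_insert(root, key)
print(trie_find(root, '2302'))   # 2302
print(trie_find(root, '2304'))   # None
```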


Example

Fig. illustrates part of a trie for structuring a collection of 4 digit key values.
This example trie represents the following collection of key values:

2302, 2303, 2305, 2306, 2307, 2308

2350, 2352, 2353, 2355, 2356, 2358, 2359

2581, 2582, 2584, 2585, 2586, 2588, 2589

2590, 2591, 2593, 2594, 2596, 2597, 2599

2900, 2901, 2903, 2904, 2905, 2906, 2907, 2908,


2909

32xx………

35xx………

52xx………

59xx……

7312, 7313, 7315, 7316

7390, 7391, 7393, 7395, 7397, 7399

7490, 7492, 7494, 7496, 7498.

When a trie is used as an index, the leaf nodes contain addresses of data
records with the corresponding key values. A trie can also be used to
represent a collection of values, without a file of associated data. In that
case, the leaf nodes would simply contain placeholders indicating the
presence or absence of a corresponding value.

A node in a trie used to represent alphabetic key values contains 26 pointers. For a node on the jth level of such a trie, Pi points to a subtree containing all key values whose jth character is the ith letter of the alphabet. For example, P10 on the third level of the trie points to a subtree containing all key values whose third character is J.


Algorithms

We can describe trie lookup (and membership) easily. Given a recursive trie type, storing an optional value at each node, and a list of children tries, indexed by the next character (here represented as a Haskell data type):

data Trie a =
    Trie { value    :: Maybe a
         , children :: [(Char, Trie a)] }

We can look up a value in the trie as follows:

find :: String -> Trie a -> Maybe a
find []     t = value t
find (k:ks) t = case lookup k (children t) of
                  Nothing -> Nothing
                  Just t' -> find ks t'
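For comparison, the same lookup plus an insert operation can be sketched in Python, with a dict of children per node (the class shape and method names are ours, chosen only for illustration):

```python
class Trie:
    def __init__(self):
        self.value = None            # payload for the key ending here, if any
        self.children = {}           # next character -> child Trie

    def insert(self, key, value):
        node = self
        for ch in key:               # create missing branches on the way down
            node = node.children.setdefault(ch, Trie())
        node.value = value

    def find(self, key):
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:         # no branch for this character
                return None
        return node.value

t = Trie()
t.insert("tea", 3)
t.insert("ted", 4)
print(t.find("tea"))   # 3
print(t.find("te"))    # None  (a prefix of a key, but not itself a key)
```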

Sequential File Organization


1. A sequential file is designed for efficient processing of records in
sorted order on some search key.
o Records are chained together by pointers to permit fast
retrieval in search key order.
o Pointer points to next record in order.
o Records are stored physically in search key order (or as
close to this as possible).
o This minimizes number of block accesses.
2. It is difficult to maintain physical sequential order as records are
inserted and deleted.
o Deletion can be managed with the pointer chains.
o Insertion poses problems if no space where new record
should go.
o If space, use it, else put new record in an overflow block.
o Adjust pointers accordingly.
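The points above can be sketched with an in-memory stand-in, where list indices play the role of block addresses and a next pointer keeps records retrievable in search-key order regardless of where each record physically lands (the class and field names are our own):

```python
class SeqFile:
    def __init__(self):
        self.blocks = []     # (key, record, next_index) tuples; index = "block address"
        self.head = None     # address of the record with the smallest key

    def insert(self, key, record):
        idx = len(self.blocks)              # new record goes in the next free block
        # find the predecessor in key order and splice the pointer chain
        prev, cur = None, self.head
        while cur is not None and self.blocks[cur][0] < key:
            prev, cur = cur, self.blocks[cur][2]
        self.blocks.append((key, record, cur))
        if prev is None:
            self.head = idx
        else:
            k, r, _ = self.blocks[prev]
            self.blocks[prev] = (k, r, idx)

    def scan(self):
        """Yield keys in search-key order by following the pointer chain."""
        cur = self.head
        while cur is not None:
            key, record, cur = self.blocks[cur]
            yield key

f = SeqFile()
for key in [30, 10, 20]:
    f.insert(key, "rec %d" % key)
print(list(f.scan()))   # [10, 20, 30]
```

A real sequential file would also try to keep the physical order close to the key order; here only the pointer chain carries the ordering, as it does for records placed in overflow blocks.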

Random file organization

 Records are stored at random locations on disks.

 This randomization can be achieved by any one of several techniques (direct addressing, directory lookup, hashing).


Direct addressing -> With equal-size records, the available disk space is divided into nodes large enough to hold a record; with primary key = Employee#, the record for EMP is stored at the address given directly by its key.

Direct Access Index for data

    Entry    Pointer to:
    510      Record B
    620      Record D
    750      Record A

Directory lookup:

This is very similar to the scheme for the case of direct addressing with
variable size records.

Difference between sequential file and random file

A sequential file is accessed one record at a time, from first to last, in order.
Each record can be of varying length.

A direct (or random) file is accessed in any order, by record number. Each
record must be of identical length.

A sequential file might look like this, with records separated by commas:

bear,skunk,moose,fish,alligator

A direct file with the same data would look (something) like this:

bear_____skunk____moose____fish_____alligator

HASHED FILE ORGANIZATION

Hashing is important not only for addressing into a relative file, but also for addressing into main-memory tables. The hash function is applied to record keys to calculate the subscripts of corresponding records in the table. Thus a record can be found with one probe (unless there is a collision); no table search is needed.

Hashing can be used in conjunction with directory lookup:

Is the record's key in the directory?
  Yes: access the record directly at the address stored in the directory.
  No: apply the hash function to the record's key, resolve collisions if necessary, store the record, and insert the record's key and address in the directory.

The performance of a hashing technique can be measured by its processing requirements and the I/O needed to store or retrieve a record. The performance of a particular hash function is dependent upon

1. The distribution of key values that are actually in use.


2. The number of key values that are actually in use relative to
the size of the address space.
3. The number of records that can be stored at a given address
without causing a collision.
4. The collision resolution technique used.

Static Hashing
• Hashing provides a means for accessing data without the use of an
index structure.
• Data is addressed on disk by computing a function on a search key
instead.
Organization


• A bucket in a hash file is a unit of storage (typically a disk block) that can hold one or more records.
Bucket Overflow
• How does bucket overflow occur?
– Not enough buckets to handle the data.
– A few buckets have considerably more records than others. This is referred to as skew:
• Multiple records have the same hash value.
• Non-uniform hash function distribution.
– If a bucket is full, link another bucket to it.
– The system must then check overflow buckets for querying and updates. This is known as closed hashing.
Alternatives
• Open hashing
– The number of buckets is fixed.
– Overflow is handled by using the next bucket in cyclic order that has space. This is known as linear probing.

• Advantage: performance does not decrease as the database size increases.
– Space is conserved by adding and removing buckets as necessary.
• Disadvantage: an additional level of indirection for operations.
– Complex implementation.
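Linear probing can be sketched with a toy table, one record per bucket (the trivial hash function and the names below are ours): on overflow, the store routine simply tries the next bucket in cyclic order.

```python
NBUCKETS = 7

def h(key):
    return key % NBUCKETS          # toy hash function

def store(table, key, record):
    i = h(key)
    for _ in range(NBUCKETS):
        if table[i] is None or table[i][0] == key:
            table[i] = (key, record)
            return i               # bucket where the record landed
        i = (i + 1) % NBUCKETS     # next bucket in cyclic order
    raise RuntimeError("file is full")

def fetch(table, key):
    i = h(key)
    for _ in range(NBUCKETS):
        if table[i] is None:       # an empty bucket ends the probe chain
            return None
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % NBUCKETS
    return None

table = [None] * NBUCKETS
store(table, 10, "rec A")    # 10 % 7 = 3
store(table, 17, "rec B")    # also hashes to 3; probes on to bucket 4
print(fetch(table, 17))      # rec B
print(fetch(table, 24))      # None
```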

Inverted files

o Providing the linkage between an index and the file of data records is called inversion.
o It is similar to multilists.

The difference is that in multilists, records with the same key value are linked together with link fields; in the case of inverted files, this link information is kept in the index itself.

[Figure: Employee# index, occupation index, and sex index]


A,B,C,D,E indicate addresses of the records in the file

Indexes for a fully inverted file
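A miniature inverted index over a toy employee file might look like this in Python (the record contents and the dict-of-lists shape are invented for illustration). Each secondary-key value maps directly to the full list of record addresses, so a conjunctive query is answered by intersecting address lists without touching the records themselves:

```python
# record address -> (Employee#, occupation, sex)
records = {'A': ('E1', 'programmer', 'M'),
           'B': ('E2', 'analyst',    'F'),
           'C': ('E3', 'programmer', 'F')}

occupation_index = {}
sex_index = {}
for addr, (_, occ, sex) in records.items():
    occupation_index.setdefault(occ, []).append(addr)
    sex_index.setdefault(sex, []).append(addr)

# records with occupation = 'programmer' AND sex = 'F': intersect the lists
hits = set(occupation_index['programmer']) & set(sex_index['F'])
print(sorted(hits))   # ['C']
```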

Cellular partitions

 In order to reduce file search times, the storage media may be divided into cells. A cell may be an entire disk pack or it may simply be a cylinder. Lists are localized to lie within a cell.
 Thus, if we had a multilist organization in which the list for key1 = prog included records on several different cylinders, then we could break this list into several smaller lists, where each prog list included only those records in the same cylinder.
 The index entry for prog will now contain several entries of the type (addr, length), where addr is a pointer to the start of a list of records with key1 = prog and length is the number of records on this list.
 By doing this, all records in the same cell (i.e., cylinder) may be accessed without moving the read/write heads. In case a cell is a disk pack, then using cellular partitions it is possible to search different cells in parallel (provided the system hardware permits simultaneous reading/writing from several disk drives).
 It should be noted that in any real situation a judicious combination of the techniques of this section would be called for; i.e., the file may be inverted on certain keys, ringed on others, and a simple multilist on yet other keys.
