Data Structure Using C


UNIT-I

Introduction Algorithm Specification, Performance analysis, Performance Measurement. Arrays: Arrays,


Dynamically Allocated Arrays. Structures and Unions. Sorting: Motivation, Quick sort, how fast can we
sort, Merge sort, Heap sort

What is data structure?

A data structure is a data organization, management and storage format that enables efficient access
and modification. In other words, a data structure is a collection of data values, the relationships among them, and the
functions or operations that can be applied to the data.

Introduction
A Data Structure can be defined as a group of data elements which provides an efficient way of
storing and organizing data in the computer so that it can be used efficiently. Some examples of data
structures are arrays, linked lists, stacks, queues, etc. Data structures are widely used in almost every aspect
of computer science, e.g. operating systems, compiler design, artificial intelligence, graphics and many
more.
Data structures are the main part of many computer science algorithms as they enable
programmers to handle data in an efficient way. They play a vital role in enhancing the performance of
software or a program, as the main function of the software is to store and retrieve the user's data as fast as
possible.

Basic Terminology
Data structures are the building blocks of any program or software. Choosing the appropriate data
structure for a program is the most difficult task for a programmer. The following terminology is used as far as
data structures are concerned:

Data: Data can be defined as an elementary value or a collection of values.


For example, a student's name and ID are data about the student.

Group Items: Data items which have subordinate data items are called group items.
For example, the name of a student can have a first name and a last name.

Record: Record can be defined as the collection of various data items.


For example, if we talk about the student entity, then its name, address, course and marks can be
grouped together to form the record for the student.

File: A file is a collection of various records of one type of entity.


For example, if there are 60 employees, then there will be 60 records in the related file,
where each record contains the data about one employee.

Attribute and Entity: An entity represents a class of certain objects. It contains various attributes. Each
attribute represents a particular property of that entity.

Field: Field is a single elementary unit of information representing the attribute of an entity.



Need of Data Structures
As applications are getting more complex and the amount of data is increasing day by day, the
following problems may arise:

Processor speed: To handle a very large amount of data, high-speed processing is required, but as the data
grows day by day to billions of records per entity, the processor may fail to deal with that much
data.

Data Search: Consider an inventory of 10^6 items in a store. If our application needs to search for a
particular item, it needs to traverse all 10^6 items every time, which slows down the search process.

Multiple requests: If thousands of users are searching the data simultaneously on a web server, then there
is a chance that even a very large server may fail during that process. In order to solve the above
problems, data structures are used. Data is organized to form a data structure in such a way that not all items
need to be searched and the required data can be found almost instantly.

Advantages of Data Structures

Efficiency: Efficiency of a program depends upon the choice of data structures.


For example: suppose, we have some data and we need to perform the search for a particular record.
In that case, if we organize our data in an array, we will have to search sequentially element by element.
Hence, using array may not be very efficient here. There are better data structures which can make the search
process efficient like ordered array, binary search tree or hash tables.

Reusability: Data structures are reusable, i.e. once we have implemented a particular data structure, we can
use it at any other place. Implementation of data structures can be compiled into libraries which can be used
by different clients.

Abstraction: Data structure is specified by the ADT which provides a level of abstraction. The client
program uses the data structure through interface only, without getting into the implementation details.

Data Structure Classification



Linear Data Structures: A data structure is called linear if all of its elements are arranged in a linear
order. In linear data structures, the elements are stored in a non-hierarchical way where each element has a
successor and a predecessor, except the first and last elements.

Types of Linear Data Structures:

Arrays: An array is a collection of similar type of data items and each data item is called an element of the
array. The data type of the element may be any valid data type like char, int, float or double.

The elements of array share the same variable name but each one carries a different index number
known as subscript. The array can be one dimensional, two dimensional or multidimensional.

For example, for an array age of size 100, the individual elements are:

age[0], age[1], age[2], age[3], ......... age[98], age[99].
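As a minimal sketch in C (the array name age, its size and the stored values are assumptions of the example), declaring and accessing such an array looks like this:

#include <stdio.h>

int main(void)
{
    int age[100];                 /* one-dimensional array of 100 ints, indexed 0..99 */

    age[0] = 21;                  /* the subscript selects an individual element */
    age[99] = 35;

    printf("%d %d\n", age[0], age[99]);
    return 0;
}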

Linked List: Linked list is a linear data structure which is used to maintain a list in the memory. It can be
seen as the collection of nodes stored at non-contiguous memory locations. Each node of the list contains a
pointer to its adjacent node.

Stack: A stack is a linear list in which insertions and deletions are allowed only at one end, called the top.
A stack is an abstract data type (ADT) that can be implemented in most programming languages. It
is named a stack because it behaves like a real-world stack,
for example a pile of plates or a deck of cards.

Queue: A queue is a linear list in which elements can be inserted only at one end, called the rear, and deleted only
at the other end, called the front.

It is an abstract data type, similar to a stack. A queue is open at both ends and therefore follows the First-In-
First-Out (FIFO) methodology for storing the data items.

Non Linear Data Structures: This data structure does not form a sequence i.e. each item or element is
connected with two or more other items in a non-linear arrangement. The data elements are not arranged in
sequential structure.

Types of Non Linear Data Structures:

Trees: Trees are multilevel data structures with a hierarchical relationship among their elements, known as
nodes. The bottommost nodes in the hierarchy are called leaf nodes while the topmost node is called the root
node. Each node contains pointers to its adjacent nodes.

A tree data structure is based on the parent-child relationship among the nodes. Each node in the tree can have
any number of children except the leaf nodes (which have none), whereas each node has exactly one parent except
the root node (which has none). Trees can be classified into many categories.



Graphs: A graph can be defined as a pictorial representation of a set of elements (represented by vertices)
connected by links known as edges. A graph is different from a tree in the sense that a graph can have cycles
while a tree cannot.

Operations on data structure

1) Traversing: Every data structure contains the set of data elements. Traversing the data structure means
visiting each element of the data structure in order to perform some specific operation like searching or
sorting.
Example: If we need to calculate the average of the marks obtained by a student in 6 different subjects,
we need to traverse the complete array of marks and calculate the total sum, and then divide that sum by
the number of subjects, i.e. 6, in order to find the average.
2) Insertion: Insertion can be defined as the process of adding an element to the data structure at any
location.
If a data structure of fixed size n is already full, no further element can be inserted; this condition is called overflow.
3) Deletion: The process of removing an element from the data structure is called Deletion. We can delete an
element from the data structure at any random location.
If we try to delete an element from an empty data structure then underflow occurs.
4) Searching: The process of finding the location of an element within the data structure is called Searching.
There are two algorithms to perform searching, Linear Search and Binary Search.
5) Sorting: The process of arranging the data structure in a specific order is known as Sorting. There are
many algorithms that can be used to perform sorting, for example, insertion sort, selection sort, bubble sort,
etc.
6) Merging: When two lists, List A of size M and List B of size N, containing elements of a similar type, are
clubbed or joined to produce a third list, List C of size (M+N), the process is called merging.

Characteristics of a Data Structure


 Correctness − Data structure implementation should implement its interface correctly.
 Time Complexity − Running time or the execution time of operations of data structure must be as
small as possible.
 Space Complexity − Memory usage of a data structure operation should be as little as possible.

Algorithm
An algorithm is a procedure having well defined steps for solving a particular problem. An algorithm is a
finite set of logic or instructions, written in order to accomplish a certain predefined task. It is not the
complete program or code; it is just the solution (logic) of a problem, which can be represented either as an
informal description, using a flowchart, or as pseudo code.

The major categories of algorithms are given below:


 Sort: Algorithm developed for sorting the items in certain order.
 Search: Algorithm developed for searching the items inside a data structure.
 Delete: Algorithm developed for deleting the existing element from the data structure.
 Insert: Algorithm developed for inserting an item inside a data structure.
 Update: Algorithm developed for updating the existing element inside a data structure.



The performance of algorithm is measured on the basis of following properties:

Time complexity: It is a way of representing the amount of time needed by a program to run to
completion.
Space complexity: It is the amount of memory space required by an algorithm during the course of its
execution. Space complexity is important in situations where limited memory is available and in multi-user
systems.
Each algorithm must have:
Specification: Description of the computational procedure.
Pre-conditions: The condition(s) on input.
Body of the Algorithm: A sequence of clear and unambiguous instructions.
Post-conditions: The condition(s) on output.

Example: Design an algorithm to multiply the two numbers x and y and display the result in z.

Step 1: START
Step 2: declare three integers x, y & z
Step 3: define values of x & y
Step 4: multiply values of x & y
Step 5: stores the output of step 4 in z
Step 6: print z
Step 7: STOP

Alternatively, the algorithm can be written as:

Step 1 START MULTIPLY


Step 2 get values of x & y
Step 3 z← x * y
Step 4 display z
Step 5 STOP

Characteristics of an Algorithm
An algorithm must follow the mentioned below characteristics:

 Input: An algorithm must have zero or more well defined inputs.


 Output: An algorithm must have one or more well defined outputs, and they should match the desired
output.
 Finiteness: An algorithm must terminate after a finite number of steps.
 Independent: An algorithm must have step-by-step directions which are independent of any
programming code.
 Unambiguous: An algorithm must be unambiguous and clear. Each of its steps and inputs/outputs
must be clear and lead to only one meaning.



Algorithm:
Algorithm is a finite set of instructions that is followed, accomplishes a particular task.
(OR)
An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required
output for any legitimate (lawful, legal, reasonable genuine) input in a finite amount of time.

There is something or someone capable of understanding and following the instructions given. We call this a
"computer".

All algorithms must satisfy the following criteria:


1. Input: Zero or more quantities are externally supplied. Input is not necessary for all
algorithms.
2. Output: At least one quantity is produced. This is a must for all algorithms.
3. Definiteness: Each instruction is clear and unambiguous.
4. Finiteness: The instructions (steps) of an algorithm must be finite, meaning the algorithm terminates after a
finite number of steps (instructions).
5. Effectiveness: Every instruction must be very basic so that it can be carried out, in principle, by a
person using only pencil and paper. It is not enough that each operation be definite; it also must be
feasible (possible or realistic).
The study of algorithms includes many important and active areas of research. There are four distinct areas of
study one can identify:

1. How to devise algorithms


2. How to validate algorithms
3. How to analyze algorithms
4. How to test algorithms
Example for algorithm:
Write an algorithm to add two numbers entered by user.
Step 1: Start.
Step 2: Declare variables num1, num2 and sum.
Step 3: Read values num1 and num2.
Step 4: Add num1 and num2 and assign the result to sum.
sum←num1+num2.
Step 5: Display sum.
Step 6: Stop.



Algorithm specification.

Pseudocode Conventions
 An algorithm can be described in many ways, such as English for specifying the algorithm step by step and
a flowchart for a graphical representation.
 But these two ways work well only if the algorithm is small or simple. For large or complex algorithms we
use pseudo code.
 Pseudo code is a kind of structured English for describing algorithms.
 It allows the designer to focus on the logic of the algorithm without being distracted (diverted) by
details of language syntax.
 At the same time, the pseudo code needs to be complete. It describes the entire logic of the algorithm
so that implementation becomes a rote mechanical task of translating it line by line into source code.

NOTE: Alternatively, we can express the same algorithm in a pseudocode:

Example for pseudocode:

ALGORITHM Sum(num1,num2)
Input: read num1,num2
Output: write sum of num1&num2
{
Read num1,num2;
Sum = num1+num2;
Write sum;
}
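To illustrate the remark above that pseudo code can be translated line by line into source code, a possible C rendering of the Sum pseudocode is sketched below (the variable names mirror the pseudocode; no input checking is attempted):

#include <stdio.h>

int main(void)
{
    int num1, num2, sum;

    /* Input: read num1, num2 */
    scanf("%d %d", &num1, &num2);

    /* Sum := num1 + num2 */
    sum = num1 + num2;

    /* Output: write sum */
    printf("%d\n", sum);
    return 0;
}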
Pseudo code conventions:
1. " // " is used to provide a comment line.
2. Matching braces "{}" are used to form blocks. A compound statement (i.e., a collection of simple
statements) can be represented as a block.
3. An identifier begins with a letter. Data types of variables are not explicitly declared; the types will be
clear from the context, as will whether a variable is global or local to a procedure.
For example:
node=record
{
datatype_1 data_1;
.
.
datatype_n data_n;
node *link;
}
4. Assignment of values to variables is done using assignment statement.
<variable> := <expression>;
5. There are
 two Boolean values, true and false;
 logical operators and, or and not, instead of "&&, ||, !" respectively;
 relational operators "<, ≤, =, ≠, ≥ and >".



6. Elements of multidimensional arrays are accessed using [ and ]. For example, if A is a two
dimensional array, the (i, j)th element of the array is denoted as A[i, j]. Array indices (index) start at
zero.
7. Looping statements are employed: while, repeat-until and for, as shown below.

   While loop
   General representation:                Pseudo code representation:
       while (condition) {                    while <condition> do {
           stmt1;                                 <stmt 1>;
           stmtn;                                 <stmt n>;
       }                                      }

   Do-while loop
   General representation:                Pseudo code representation:
       do {                                   repeat {
           stmt1;                                 <stmt 1>;
           stmtn;                                 <stmt n>;
       } while (condition);                   } until <condition>

   For loop
   General representation:                Pseudo code representation:
       for (initialization; condition;        for variable := value1 to n step step do {
            expression) {                          <stmt 1>;
           stmt1;                                  <stmt n>;
           stmtn;                              }
       }

8. Conditional statements (if and switch) have the following forms:

   General form:                           Pseudo code form:
       if (condition)                          if <condition> then
           stmt1;                                  <stmt 1>;
       else                                    else
           stmt2;                                  <stmt 2>;

       switch (expression) {                   case
           case label: stmt1;                  {
                       break;                      :<condition 1>: <stmt 1>
           .                                       .
           default: stmt;                          :<condition n>: <stmt n>
       }                                           :else: <stmt n+1>
                                               }

9 Input and output are done using the instructions read and write. No format is used to specify the size
of input or output quantities.

10 There is only one type of procedure: Algorithm. An algorithm consists of a heading and a body. The
heading takes the form

Algorithm Name (<parameter list>)



Algorithm that finds and returns the maximum of n given numbers

In general way (program):
    A[n] = {9, 3, ...};
    Result = A[0];
    for (i = 1; i <= n-1; i++)
        if (A[i] > Result)
            Result = A[i];
    printf("%d", Result);

In algorithm (pseudo code):
    Algorithm Max(A, n)
    // A is an array of size n
    {
        Result := A[1];
        for i := 2 to n do
            if A[i] > Result then Result := A[i];
        return Result;
    }
Algorithm for selection sort

Program:
    main() {
        int i, j, k, temp, a[20], n = 20;
        for (i = 0; i < n; i++) {
            k = i;
            for (j = i+1; j < n; j++)
                if (a[j] < a[k]) k = j;
            temp = a[i]; a[i] = a[k]; a[k] = temp;
        }
    }

Algorithm:
    Algorithm SelectionSort(a, n)
    // Sort the array a of n elements.
    {
        for i := 1 to n do {
            j := i;
            for k := i+1 to n do
                if (a[k] < a[j]) then j := k;
            t := a[i]; a[i] := a[j]; a[j] := t;
        }
    }
Recursive Algorithms:
A function that calls itself is called a recursive function.
void main()
{
    Fun1();    // main() calls Fun1()
}
void Fun1()
{
    Fun1();    // Fun1() calls itself, so it is recursive
}
For example
// a part of a C program
void main()
{
    int i = 5;
    printf("%d", fact(i));
}
int fact(int i)
{
    if (i <= 1)
    {
        return 1;
    }
    return i * fact(i - 1);
}



Similarly, an algorithm is said to be recursive if the same algorithm is invoked in its body. There are two
types of recursive algorithms.
 Direct recursive algorithm: an algorithm that calls itself is directly recursive.

 Indirect recursive algorithm: an algorithm is indirectly recursive if it calls another algorithm which in turn
calls it back. That is, algorithm A is said to be indirectly recursive if it calls
another algorithm which in turn calls A.
Example for recursive algorithm:
Towers of Hanoi Problem simple rules:
 Only one disk can be moved at a time.
 Each move consists of taking the upper disk from one of the stacks and placing it on top of another
stack i.e. a disk can only be moved if it is the uppermost disk on a stack.
 No disk may be placed on top of a smaller disk.

A very elegant solution for the Towers of Hanoi problem uses recursion.

For n disks, a total of 2^n - 1 moves are required.

Algorithm for solving "Towers of Hanoi" by using recursion:

Algorithm TowersOfHanoi(n, x, y, z)
// Move the top n disks from tower x to tower y, using tower z as intermediate storage.
{
    if (n ≥ 1) then
    {
        TowersOfHanoi(n-1, x, z, y);
        write("Move top disk from tower x to top of tower y");
        TowersOfHanoi(n-1, z, y, x);
    }
}
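A direct C translation of the above recursive algorithm might look as follows (a sketch; the towers are passed as characters only so that the moves can be printed):

#include <stdio.h>

/* Move the top n disks from tower x to tower y, using tower z as intermediate storage. */
void TowersOfHanoi(int n, char x, char y, char z)
{
    if (n >= 1)
    {
        TowersOfHanoi(n - 1, x, z, y);           /* move n-1 disks from x to z */
        printf("Move disk %d from tower %c to tower %c\n", n, x, y);
        TowersOfHanoi(n - 1, z, y, x);           /* move n-1 disks from z to y */
    }
}

int main(void)
{
    TowersOfHanoi(3, 'A', 'C', 'B');             /* 2^3 - 1 = 7 moves for three disks */
    return 0;
}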



Explanation:
Assume that the number of disks in n. To get the largest disk to the bottom of Tower y, we move the
remaining n-1 disks to tower z. And then move the largest disk form tower x to tower b.
Now tower y has largest disk and tower x is empty and tower z has n-1 in decreasing order.
Here we left the remaining task that is moving n-1 disks form tower Z to tower Y. For this we use same
procedure by using tower x is intermediate storage.

For example:
Input: number of disks n = 2
Output: total number of moves = 2^n - 1
2^2 - 1 = 3 moves, as shown below
S-1: Disk 1 moved from A to B
S-2: Disk 2 moved from A to C
S-3: Disk 1 moved from B to C
Similarly for number of disks =3
Input : 3
Output :
S-1: Disk 1 moved from A to C
S-2: Disk 2 moved from A to B
S-3: Disk 1 moved from C to B
S-4: Disk 3 moved from A to C
S-5: Disk 1 moved from B to A
S-6: Disk 2 moved from B to C
S-7: Disk 1 moved from A to C

Performance Analysis
Introduction:
Analyze Algorithms or performance analysis:
If an algorithm is executed, it uses
 the computer's CPU  to perform the operations, and
 memory (both RAM and ROM)  to hold the program and data.
Analysis of algorithms is the task of determining how much computing time and storage memory an
algorithm requires.

There are many criteria upon which we can judge an algorithm to say it performs well.
1) Does it do what we want it to do?
2) Does it work correctly according to the original specifications of the task?
3) Is there documentation that describes how to use it and how it works?
4) Are procedures created in such a way that they perform logical sub functions?
5) Is the code readable?

Performance evaluation has two major phases


 A priori estimates  performance analysis
 A posteriori testing  performance measurement



Space Complexity:
Definition:
The space complexity of an algorithm is the amount of memory it needs to run to completion.

The space needed by an algorithm is the sum of the following components:

 A fixed part that is independent of the characteristics (e.g., number and size) of the inputs and outputs.
This includes the instruction space (space for the code) + space for simple variables + fixed-size
component variables + space for constants, and so on.

 A variable part that consists of the space needed by component variables whose size is dependent
on the particular problem instance being solved + the space needed by referenced variables +
the recursion stack space.

The space requirement S(P) of any algorithm P can be written as

S(P) = c + S_P(instance characteristics)
where c is a constant and S_P depends on the instance characteristics.

Find the space complexity of the iterative algorithm for the sum of n numbers.

Algorithm:
    Algorithm Sum(a, n)
    // a is an array of size n
    {
        s := 0;
        for i := 1 to n do
            s := s + a[i];
        return s;
    }

Space complexity S(P):
    s  1 word
    i  1 word
    n  1 word
    a  n words
    So S(P) ≥ n + 3.
(A word is a string of bits stored in computer memory; its size is typically 4 to 64 bits.)

Find the space complexity of the recursive algorithm:

Algorithm:
    Algorithm RSum(a, n)
    // a is an array of size n
    {
        if (n ≤ 0) then return 0;
        else return a[n] + RSum(a, n-1);
    }

Space complexity:
    The recursion stack space includes space for the formal parameters, the local variables and the return
    address. The return address requires 1 word of memory. Each call to RSum requires at least 3 words
    (space for the value of n, the return address and a pointer to a[]).
    The depth of recursion is n + 1, so the recursion stack space needed is ≥ 3(n + 1).



Time complexity T(P):

Definition:
Time complexity of an algorithm is the amount of computer time it needs to run to completion.
Here RUN means Compile + Execution.
The time complexity is
T(P) = t_c + t_p

where t_c is the compile time and t_p is the run (execution) time. We neglect t_c because the compile time does
not depend on the instance characteristics, and the compiled program will be run several times without recompilation.

So T(P) = t_p, where t_p depends on the instance characteristics.

For example:
Suppose program P performs operations like ADD, SUB, MUL, etc.
If we knew the characteristics of the compiler to be used, we could proceed to determine the number of
additions, subtractions, multiplications, divisions, compares, loads, stores and so on, and express t_P(n) as follows:

t_P(n) = c_a ADD(n) + c_s SUB(n) + c_m MUL(n) + c_d DIV(n) + ...

where
n  the instance characteristic;
c_a, c_s, c_m, c_d, ...  the times needed for an addition, subtraction, multiplication, division, etc.;
ADD, SUB, MUL, DIV, ...  functions whose values are the numbers of additions, subtractions,
multiplications, etc. that are performed when the code for P is used on an instance with characteristic n.
Performance Measurement
Program step: A program step is a syntactically or semantically meaningful segment of a program that has
an execution time independent of the instance characteristics.

Example:
 A comment // counts as zero steps.
 An assignment statement (which does not involve any calls to other algorithms)
counts as one step.
 For iterative statements such as "for, while and repeat-until", we consider the step count
only for the control part of the statement.
For a while loop "while <expr> do": the step count for the control part is the
number of steps needed to evaluate <expr>.

For a for loop "for i := <expr> to <expr1> do": the step count for the control part is the
sum of the step counts of <expr> and <expr1> for the first execution, and one for each remaining execution of the for statement.

We determine the number of steps needed by a program to solve a particular problem. For this there
are two methods.



1) The first method is to introduce a new variable "count" into the program for counting the number of steps
in the program.

2) The second method is to build a "table" in which we list the total number of steps contributed by each
statement.

Example for 1st method:

Find the time complexity of the iterative algorithm for the sum of n numbers.

Algorithm:
    Algorithm Sum(a, n)
    {
        // count is global; it is initially zero
        s := 0;
        count := count + 1;        // for the assignment
        for i := 1 to n do
        {
            count := count + 1;    // for the "for" loop
            s := s + a[i];
            count := count + 1;    // for the assignment
        }
        count := count + 1;        // for the last execution of the for loop
        count := count + 1;        // for the return statement
        return s;
    }

Finally the value of count is 2n + 3, so the total number of steps is 2n + 3.

Find the time complexity of the recursive algorithm:

Algorithm:
    Algorithm RSum(a, n)
    {
        count := count + 1;        // for the if condition
        if (n ≤ 0) then
        {
            count := count + 1;    // for the return statement
            return 0;
        }
        else
        {
            count := count + 1;    // for the addition, function invocation and return
            return a[n] + RSum(a, n-1);
        }
    }

If n = 0 then t_RSum(0) = 2.
If n > 0 the count increases by 2 plus the steps of the recursive call, i.e.
    t_RSum(n) = 2 + t_RSum(n-1)
              = 2 + 2 + t_RSum(n-2)
              = 2 + 2 + 2 + t_RSum(n-3)
              ...
              = 2(n) + t_RSum(0) = 2n + 2

In the above example the recursive algorithm has a slightly smaller step count (2n + 2) than the iterative
algorithm (2n + 3).
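The same counting idea can be tried out directly in C; the sketch below instruments the iterative sum with a global count variable (the variable names and the sample array are assumptions of the example):

#include <stdio.h>

int count = 0;                      /* global step counter, initially zero */

int sum(int a[], int n)
{
    int s = 0, i;
    count++;                        /* for the assignment s := 0 */
    for (i = 0; i < n; i++)
    {
        count++;                    /* for each test of the for loop */
        s = s + a[i];
        count++;                    /* for the assignment inside the loop */
    }
    count++;                        /* for the last (failing) loop test */
    count++;                        /* for the return statement */
    return s;
}

int main(void)
{
    int a[] = {1, 2, 3, 4, 5};
    int s = sum(a, 5);
    printf("sum = %d, steps = %d\n", s, count);    /* steps = 2*5 + 3 = 13 */
    return 0;
}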



Example for 2nd method:

Find time complexity of Algorithm of sum of ‘n’ numbers.


Statements s/e Frequency Total steps
1. Algorithm sum(a,n) 0 _ 0
2. { 0 _ 0
3. S:=0; 1 1 1
4. for i:=1 to n do 1 n+1 n+1
5. s:=s+a[i]; 1 n n
6. return s; 1 1 1
7. } 0 _ 0

Total steps = 2n + 3

Find time complexity for Recursive algorithm


Statements                              s/e    Frequency      Total steps
                                               n=0    n>0     n=0    n>0
1. Algorithm RSUM(a,n)                  0      -      -       0      0
2. {                                    0      -      -       0      0
3. If(n≤0) then                         1      1      1       1      1
4. return 0;                            1      1      0       1      0
5. Else                                 0      -      -       0      0
6. return a[n]+RSUM(a,n-1);             1+x    0      1       0      1+x
7. }                                    0      -      -       0      0

Total                                                         2      2+x

Here x = t_RSum(n-1)
Asymptotic Notation:
A problem may have numerous (many) algorithmic solutions. In order to choose the best algorithm
for a particular task, you need to be able to judge how long a particular solution will take to run.

Asymptotic notation of an algorithm is a mathematical representation of its complexity

Asymptotic notation is used to judge the best algorithm among numerous algorithms for a particular
problem.
Asymptotic complexity is a way of expressing the main component of algorithms like
 Cost
 Time complexity
 Space complexity
Some asymptotic notations are
1. Big oh  O
2. Omega  Ω
3. Theta  θ
4. Little oh  o
5. Little omega  ω



1. Big - Oh Notation (O)

Big - Oh notation is used to define the upper bound of an algorithm in terms of Time Complexity.

That means Big - Oh notation always indicates the maximum time required by an algorithm for all input
values. That means Big - Oh notation describes the worst case of an algorithm time complexity.

Big - Oh Notation can be defined as follows...

The function f(n) = O(g(n)) (read as "f of n is big oh of g of n") iff (if and only if) there exist positive
constants c and n0 such that f(n) <= c*g(n) for all n, n >= n0.
f(n)=O(g(n))

Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and time
required is on Y-Axis

In above graph after a particular input value n0, always C g(n) is greater than f(n) which indicates the
algorithm's upper bound.

Example

Consider the following f(n) and g(n)...


f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as O(g(n)) then it must satisfy f(n) <= C g(n) for some constant C > 0 and all n >= n0,
with n0 >= 1.

f(n) <= C g(n)


⇒ 3n + 2 <= C n

The above condition is always TRUE for C = 4 and n >= 2.


By using Big - Oh notation we can represent the time complexity as follows...
3n + 2 = O(n)



2. Big - Omega Notation (Ω)

Big - Omega notation is used to define the lower bound of an algorithm in terms of Time Complexity.

That means Big - Omega notation always indicates the minimum time required by an algorithm for all input
values. That means Big - Omega notation describes the best case of an algorithm time complexity.

Big - Omega Notation can be defined as follows...

The function f(n) = Ω(g(n)) (read as "f of n is omega of g of n") iff (if and only if) there exist positive
constants c and n0 such that f(n) >= c*g(n) for all n, n >= n0.
f(n) = Ω(g(n))

Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and time
required is on Y-Axis

In above graph after a particular input value n0, always C x g(n) is less than f(n) which indicates the
algorithm's lower bound.

Example

Consider the following f(n) and g(n)...


f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Ω(g(n)) then it must satisfy f(n) >= C g(n) for some constant C > 0 and all n >= n0,
with n0 >= 1.

f(n) >= C g(n)


⇒ 3n + 2 >= C n

The above condition is always TRUE for C = 1 and n >= 1.


By using Big - Omega notation we can represent the time complexity as follows...
3n + 2 = Ω(n)



3. Big - Theta Notation (Θ)

Big - Theta notation is used to define the tight (exact) bound of an algorithm in terms of Time Complexity.

That means Big - Theta notation bounds the running time of an algorithm both from above and from below for
all sufficiently large inputs, and so describes the exact order of growth of an algorithm's time complexity.

Big - Theta Notation can be defined as follows...

The function f(n) = Θ(g(n)) (read as "f of n is theta of g of n") iff (if and only if) there exist positive
constants c1, c2 and n0 such that c1*g(n) <= f(n) <= c2*g(n) for all n, n >= n0.
f(n) = Θ(g(n))

Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and time
required is on Y-Axis

In the above graph, after a particular input value n0, C1 g(n) is always less than f(n) and C2 g(n) is always greater
than f(n), which indicates that the algorithm is bounded both from below and from above (a tight bound).

Example

Consider the following f(n) and g(n)...


f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Θ(g(n)) then it must satisfy C1 g(n) <= f(n) <= C2 g(n) for some constants C1,
C2 > 0 and all n >= n0, with n0 >= 1.

C1 g(n) <= f(n) <= C2 g(n)


⇒ C1 n <= 3n + 2 <= C2 n

The above condition is always TRUE for C1 = 1, C2 = 4 and n >= 1.


By using Big - Theta notation we can represent the time complexity as follows...
3n + 2 = Θ(n)



4. Little oh: o
The function f(n) = o(g(n)) (read as "f of n is little oh of g of n") iff
    lim (n → ∞) f(n)/g(n) = 0

Example:
3n + 2 = o(n²), since

    lim (n → ∞) (3n + 2)/n² = 0

5. Little omega: ω
The function f(n) = ω(g(n)) (read as "f of n is little omega of g of n") iff

    lim (n → ∞) g(n)/f(n) = 0


Example:
n² = ω(3n + 2), since

    lim (n → ∞) (3n + 2)/n² = 0

(Graph visualizing the relationships between these notations.)

Performance Measurement

Order of Growth

 Any algorithm is expected to work fast for any input size.


 For example, for sorting of elements when the input size is small (n = 5), the algorithm works fine.
 When we increase the size of the input we can analyze how the algorithm behaves, for example when
n = 5000 elements.
 Order of growth means how your algorithm performs according to the input size. When the input
changes, the algorithm's performance also changes; the study of this change in performance is the analysis of
the algorithm.



For example: Linear Search

The simplest and most obvious way to search a table is to use Linear search i.e. examine the 1st, 2nd,
3rd,...entries until the entry with the required key is found or the end of the table is reached without the entry
being found. The body of the search function could be written as follows if Linear search is used:

int found;
int i;

found = FALSE;
i = 0;
while (!found && i < n)
{
    if (!strcmp(word, table[i].key))   /* strcmp returns zero if the strings are the same */
    {
        found = TRUE;
        index = i;
    }
    else
        i++;
}
return found;

 The efficiency of Linear search is now considered. In assessing the efficiency of an algorithm it is
usual to count the number of occurrences of some typical operation as a function of the size of the
problem.
 In searching the major operation is the number of comparisons of the searched-for key against the
table entries and the problem size is taken to be the number of entries in the table. Hence in Linear
search of a table with n entries we have:

Best Case:

Is the minimum number of steps that can be achieved for given parameters.

Find at first place - one comparison

Worst Case

Is the maximum number of steps that can be achieved for given parameters

Find at nth place or not at all - n comparisons

Average Case

Average case is the situation in which it is not either best case or worst case

Element is found somewhere in the middle of the list

Probability of successful search is p

Probability of unsuccessful search is 1-p



Suppose the given element is found at position i; the probability of finding the element at position i is p/n.

C_avg(n) = [1*p/n + 2*p/n + 3*p/n + ... + n*p/n] + n*(1-p)

             successful search                 unsuccessful search

         = (p/n)[1 + 2 + 3 + ... + n] + n(1-p)

         = (p/n)[n(n+1)/2] + n(1-p)

         = p(n+1)/2 + n(1-p)

C_avg(n) = p(n+1)/2 + n(1-p)

For a successful search, p = 1, and the above equation becomes

C_avg(n) = 1*(n+1)/2 + n(1-1)

         = (n+1)/2

For an unsuccessful search, p = 0, and the above equation becomes

C_avg(n) = 0*(n+1)/2 + n(1-0)

         = n

Hence linear search is an O(n) process, i.e. it takes time proportional to the number of entries. For
example, if n were doubled then, on average, the search would take twice as long.

 Performance measurement is the process of executing a correct program on different data sets to
measure the time and space that it takes to compute the results. Complexity of a program is generally
some function of the instance characteristics

 The ultimate test is performed to ensure that the program developed on the basis of the designed
algorithm, runs satisfactorily. Testing a program involves two phases viz., debugging and profiling.



 Debugging is the process of executing a program with sample datasets to determine if the results
obtained are satisfactory.

 However, it is pointed out that “debugging can only indicate the presence of errors but not their
absence” i.e., a program that yields unsatisfactory results on a sample data set is definitely faulty, but
on the other hand a program producing the desirable results on one/more data sets need not be correct.
In order to actually prove that a program is perfect, a process of “proving” is necessary wherein the
program is analytically proved to be correct and in such cases, it is bound to yield perfect results for
all possible sets of data.

 On the other hand, profiling or performance measurement is the process of executing a correct
program on different data sets to measure the time and space that it takes to compute the results. Note
that several different programs may do a given job satisfactorily.

 This is the final stage of algorithm evaluation. A question to be answered when the program is ready
for execution, (after the algorithm has been devised, made a priori analysis of that, coded into a
program debugged and compiled) is how do we actually evaluate the time taken by the program?
Obviously, the time required to read the input data or give the output should not be taken into
account. If somebody is keying in the input data through the keyboard or if data is being read from an
input device, the speed of operation is dependent on the speed of the device, but not on the speed of
the algorithm. So, we have to exclude that time while evaluating the programs

 The entire procedure explained above is called “profiling”. However, unfortunately, the times
provided by the system clock are not always dependable. Most often, they are only indicative in
nature and should not be taken as an accurate measurement. Especially when the time durations
involved are of the order of 1-2 milliseconds, the figures tend to vary often between one run and the
other, even with the same program and all same input values .

Common Asymptotic Notations

constant − O(1)
logarithmic − O(log n)
linear − O(n)
n log n − O(n log n)
quadratic − O(n²)
cubic − O(n³)
polynomial − n^O(1)
exponential − 2^O(n)

ARRAYS

One Dimensional array

An array is a container which can hold a fixed number of items, and these items should be of the same type. Most
of the data structures make use of arrays to implement their algorithms. Following are the important terms for
understanding the concept of an array.



 Element − each item stored in an array is called an element.
 Index − each location of an element in an array has a numerical index, which is used to identify the
element.

Array Representation
Arrays can be declared in various ways in different languages. For illustration, consider a C array declaration
such as int arr[10];. The important points to be considered are:
 Index starts with 0.
 Each element can be accessed via its index; for example, arr[6] refers to the seventh element of the array.

Basic Operations
 Traverse − print all the array elements one by one.
 Insertion − Adds an element at the given index.
 Deletion − Deletes an element at the given index.
 Search − Searches an element using the given index or by the value.
 Merging -- two arrays
 Sorting an array in ascending order or descending order

In C, when an array with static storage duration is declared with a size, or when an array is only partially
initialized, the remaining elements are given default values as follows.

Data Type     Default Value

bool          false

char          '\0'

int           0

float         0.0

double        0.0

wchar_t       0
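A minimal sketch of this behaviour (assuming a hosted C compiler): elements of an array with static storage duration, or elements left out of a partial initializer, start at zero, while a plain local array holds indeterminate values until assigned.

#include <stdio.h>

static int global_arr[5];            /* static storage: all elements default to 0 */

int main(void)
{
    int partial[5] = {1, 2};         /* the remaining three elements are implicitly 0 */
    int local[5];                    /* automatic storage: values are indeterminate (garbage) */

    printf("%d %d\n", global_arr[3], partial[4]);   /* prints 0 0 */
    local[0] = 7;                    /* a local element must be assigned before it is read */
    printf("%d\n", local[0]);
    return 0;
}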



Traverse Operation
Traversing a linear structure means moving through it sequentially, element by element. Processing the data
of an element may be complex, but the general pattern is: process the current element, move to the next one,
and repeat until there are no more elements.

Algorithm for array Traversal

Step 1. Start
Step 2. [INITIALIZATION] Set I = lower_bound
Step 3. Repeat steps 4 to 5 while I <= upper_bound
Step 4. Apply PROCESS to A[I]
Step 5. Set I = I + 1
        [End of loop]
Step 6. Exit
Example Program to read and display n numbers using an array
#include<stdio.h>
#include<conio.h>
int main( )
{
    int i, n, arr[20];
    clrscr( );
    printf("\n enter the number of elements in the array: ");
    scanf("%d", &n);
    for(i=0; i<n; i++)
    {
        printf("\n arr[%d] = ", i);
        scanf("%d", &arr[i]);
    }
    printf("\n the array elements are");
    for(i=0; i<n; i++)
        printf("\t %d", arr[i]);
    return 0;
}

Output
Enter the number of elements in the array:5
Arr[0] =1
Arr[1]=2
Arr[2]=3
Arr[3]=4
Arr[4]=5
The elements in the array are 1 2 3 4 5



Insertion Operation
Insert operation is to insert one or more data elements into an array. Based on the requirement, a new
element can be added at the beginning, end, or any given index of array.

Algorithm for inserting an element at the end of an existing array

Step 1. Start
Step 2. Set upper_bound = upper_bound+1
Step 3. Set A[upper_bound] = val
Step 4. Stop
Example

Data [ ]={12, 23, 34, 45, 56, 67, 78, 89, 90,100};

a) Calculate the length of the array


b) Find the lower and upper-bound
c) Show the memory representation of the array
d) If the new data element with the value 75 has to be inserted find its position
e) Insert the 75 and show the memory representation

Solution:

Length of the array = number of elements = 10


Lower bound = 0, upper bound = 9

Index Data[0] Data[1] Data[2] Data[3] Data[4] Data[5] Data[6] Data[7] Data[8] Data[9]
Data value 12 23 34 45 56 67 78 89 90 100

The new element 75 belongs at position Data[6] (between 67 and 78) if the sorted order is to be preserved. After inserting 75:

Index Data[0] Data[1] Data[2] Data[3] Data[4] Data[5] Data[6] Data[7] Data[8] Data[9] Data[10]
Data value 12 23 34 45 56 67 75 78 89 90 100

Algorithm for insert an element in middle of the array

Step 1. Start
Step 2. [INITIALIZATION] set I = N
Step 3. Repeat Steps 4 and 5 While I > = pos
Step 4. Set A[ I+1 ] = A[ I ]
Step 5. Set I = I – 1
[End of the loop]
Step 6. Set N = N + 1
Step 7. Set A[pos] = value
Step 8. Stop



Example

Initial data array

Data[0] Data[1] Data[2] Data[3] Data[4] Data[5]


45 23 34 12 56 20

Calling Insert (Data, 3, 100) will lead to the following in the array

Data[0] Data[1] Data[2] Data[3] Data[4] Data[5] Data[6]


45 23 34 12 56 20 20

Data[0] Data[1] Data[2] Data[3] Data[4] Data[5] Data[6]


45 23 34 12 56 56 20

Data[0] Data[1] Data[2] Data[3] Data[4] Data[5] Data[6]


45 23 34 12 12 56 20

Data[0] Data[1] Data[2] Data[3] Data[4] Data[5] Data[6]


45 23 34 100 12 56 20
Example program

#include <stdio.h>
#include <conio.h>
int main()
{
int i, n, num, pos, arr[10] ;
clrscr();
printf("\n Enter the number of elements in the array:" );
scanf("%d", &n);
for(i=0;i<n;i++)
{
printf("\n Arr[%d] = :",i);
scanf("%d", &arr [i]);
}
printf("\n Enter the number to be inserted:");
scanf("%d",&num);
printf("\n Enter the position at which the number has to be added:");
scanf("%d", &pos);
for(i=n;i>=pos;i--)
{
arr[i+1] = arr[i];
}
arr[pos] = num;
n++;
printf("\n The array after insertion is: ",&num);
for(i=0;i<n;i++)
{
printf (" \n Arr [%d]=%d",i,arr[i]);
}
getch();
return 0;
}



Output

Enter the number of elements in the array :4


Arr[0] =1
Arr[1] =2
Arr[2] =3
Arr[3] =4
Enter the number to be inserted: 9
Enter the position at which the number has to be added: 2
The array after insertion is
Arr[0] =1
Arr[1] =2
Arr[2] =9
Arr[3] =3
Arr[4] =4

Algorithm for Deleting a last Element from an array

1. Start
2. Set upper_bound = upper_bound -1
3. Stop

Algorithm for Deleting a Middle Element from an array

1. [INITIALIZATION] set I = pos


2. Repeat steps 3 and 4 while I < = N – 1
3. Set A [ I] =A [ I + 1 ]
4. Set I = I + 1
[End of the loop]
5. Set N= N-1
6. Exit
Example
Initial data array
Data[0] Data[1] Data[2] Data[3] Data[4] Data[5]
45 23 34 12 56 20

Calling Delete (Data, 2) will lead to the following in the array

Data[0] Data[1] Data[2] Data[3] Data[4] Data[5]


45 23 12 12 56 20

Data[0] Data[1] Data[2] Data[3] Data[4] Data[5]


45 23 12 56 56 20

Data[0] Data[1] Data[2] Data[3] Data[4] Data[5]


45 23 12 56 20 20

Data[0] Data[1] Data[2] Data[3] Data[4]


45 23 12 56 20



Example program to delete an element from array at specified position

#include <stdio.h>
#define MAX_SIZE 100
int main()
{
    int arr[MAX_SIZE];
    int i, size, pos;
    printf("Enter size of the array : ");
    scanf("%d", &size);
    printf("Enter elements in array : ");
    for(i=0; i<size; i++)
    {
        scanf("%d", &arr[i]);
    }
    printf("Enter the element position to delete : ");
    scanf("%d", &pos);

    if(pos < 1 || pos > size)
    {
        printf("Invalid position! Please enter position between 1 to %d", size);
    }
    else
    {
        for(i=pos-1; i<size-1; i++)
        {
            arr[i] = arr[i + 1];
        }
        size--;
    }
    printf("\nElements of array after delete are : ");
    for(i=0; i<size; i++)
    {
        printf("%d\t", arr[i]);
    }
    return 0;
}

Output
Enter size of the array: 5
Enter elements in array: 10 20 30 40 50
Enter the element position to delete: 2
Elements of array after delete are: 10 30 40 50



Merging of two arrays
Merging of two arrays in a third array means first copying the contents of the first array into the
third array and then copying the contents of the second array into the third array. Hence the merged array
contains the contents of the first array followed by the contents of the second array.
If the arrays are unsorted, then merging the arrays is very simple, as one just needs to copy the
contents of one array after the other into the third array.
But merging the arrays is not a trivial task when the two arrays are sorted and the merged array also
needs to be sorted.

Array 1 90 56 89 77 69

Array 2 45 88 76 99 12 58 81

Array 3 90 56 89 77 69 45 88 76 99 12 58 81
Merging of two unsorted arrays

If we have two sorted arrays and the resultant merged array also needs to be sorted, then the
task of merging the arrays becomes a little more difficult.
Array 1 20 30 40 50 60

Array 2 15 22 31 45 56 62 78

Array 3 15 20 22 30 31 40 45 50 56 60 62 78
Merging of two sorted arrays

The merged array is formed using two sorted arrays. Here we first compare the 1st element of
array 1 with the 1st element of array 2, and then put the smaller element in the merged array. Since 20 >
15, we put 15 as the first element in the merged array. Then we compare the 2nd element of the second array
with the 1st element of the first array; since 20 < 22, 20 is now stored as the second element of the merged array,
and the process continues until the merged array is complete.
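The comparison-and-copy process described above can be sketched in C as follows (the function name merge and the sample array sizes are assumptions of the example):

#include <stdio.h>

/* Merge sorted arrays a[0..m-1] and b[0..n-1] into c[0..m+n-1]. */
void merge(int a[], int m, int b[], int n, int c[])
{
    int i = 0, j = 0, k = 0;

    while (i < m && j < n)             /* copy the smaller of the two current elements */
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];

    while (i < m) c[k++] = a[i++];     /* copy any leftovers from a[] */
    while (j < n) c[k++] = b[j++];     /* copy any leftovers from b[] */
}

int main(void)
{
    int a[] = {20, 30, 40, 50, 60};
    int b[] = {15, 22, 31, 45, 56, 62, 78};
    int c[12], k;

    merge(a, 5, b, 7, c);
    for (k = 0; k < 12; k++)
        printf("%d ", c[k]);           /* 15 20 22 30 31 40 45 50 56 60 62 78 */
    printf("\n");
    return 0;
}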

Passing Arrays to Functions

Passing data values

    main()
    {
        int arr[5] = {1, 2, 3, 4, 5};
        func(arr[3]);
    }

    void func(int num)
    {
        printf("%d", num);
    }

Passing addresses

    main()
    {
        int arr[5] = {1, 2, 3, 4, 5};
        func(&arr[3]);
    }

    void func(int *num)
    {
        printf("%d", *num);
    }

Passing the entire array

    main()
    {
        int arr[5] = {1, 2, 3, 4, 5};
        func(arr);
    }

    void func(int arr[5])
    {
        int i;
        for (i = 0; i < 5; i++)
            printf("%d", arr[i]);
    }

Time Complexity
Algorithm Average Case Worst Case

Access O(1) O(1)

Search O(n) O(n)

Insertion O(n) O(n)

Deletion O(n) O(n)

Space Complexity
In array, space complexity for worst case is O(n).

Advantages of Array
 Array provides the single name for the group of variables of the same type therefore, it is easy to
remember the name of all the elements of an array.
 Traversing an array is a very simple process, we just need to increment the base address of the array
in order to visit each element one by one.
 Any element in the array can be directly accessed by using the index.

Memory Allocation of the array


As we have mentioned, all the data elements of an array are stored at contiguous locations in the main
memory. The name of the array represents the base address or the address of first element in the main
memory. Each element of the array is represented by a proper indexing.

The indexing of the array can be defined in three ways.

0 (zero - based indexing) : The first element of the array will be arr[0].
1 (one - based indexing) : The first element of the array will be arr[1].
n (n - based indexing) : The first element of the array can reside at any random index number.

Suppose we have an array arr of size 5 that follows the 0-based indexing approach and whose base address is the
100th byte; this will be the address of arr[0]. Here the size of int is 4 bytes, therefore each element will take
4 bytes in the memory, so arr[1] is stored at address 104, arr[2] at 108, and so on.



In 0 based indexing, if the size of an array is n then the maximum index number, an element can have is n-1.
However, it will be n if we use 1 based indexing.

Accessing Elements of an array


To access any random element of an array we need the following information:
 Base Address of the array.
 Size of an element in bytes.
 Which type of indexing, array follows?
Address of any element of a 1D array can be calculated by using the following formula:

Byte address of element A[i] = base address + size * ( i - first index)


Example:

In an array, A[-10 ..... +2 ], Base address (BA) = 999, size of an element = 2 bytes,
find the location of A[-1].
L(A[-1]) = 999 + [(-1) - (-10)] x 2
= 999 + 18
= 1017
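The same calculation can be written as a small helper function (purely illustrative; the function name and parameters are assumptions of the example):

#include <stdio.h>

/* Byte address of A[i] = base address + element size * (i - first index) */
long element_address(long base_address, int element_size, int first_index, int i)
{
    return base_address + (long)element_size * (i - first_index);
}

int main(void)
{
    /* A[-10 .. +2], base address 999, 2-byte elements: address of A[-1] */
    printf("%ld\n", element_address(999, 2, -10, -1));   /* prints 1017 */
    return 0;
}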
Memory Allocation Process
Global variables, static variables and program instructions get their memory in permanent storage
area whereas local variables are stored in a memory area called Stack.
The memory space between these two regions is known as Heap memory. This region is used for
dynamic memory allocation during execution of the program. The size of heap keeps changing.

Dynamic Memory Allocation


The process of allocating memory at runtime is known as dynamic memory allocation. Library
routines known as memory management functions are used for allocating and freeing memory during
execution of a program. These functions are defined in stdlib.h header file.



Function      Description

malloc()      allocates the requested number of bytes and returns a void pointer pointing to the first byte of
              the allocated space

calloc()      allocates space for an array of elements, initializes them to zero and then returns a void pointer
              to the memory

free()        releases previously allocated memory

realloc()     modifies the size of previously allocated space

Let’s understand the difference between static memory allocation and dynamic memory allocation.

Static memory allocation Dynamic memory allocation

Memory is allocated at compile time. Memory is allocated at run time.

Memory can't be increased while executing program. Memory can be increased while executing program.

used in array. used in linked list.

Malloc () function
 The malloc() function allocates single block of requested memory.
 It doesn't initialize memory at execution time, so it has garbage value initially.
 It returns NULL if memory is not sufficient.

The syntax of malloc() function is:-


ptr=(cast-type*)malloc(byte-size)
(or)
void* malloc(byte-size)
malloc() function is used for allocating block of memory at runtime. This function reserves a block of
memory of the given size and returns a pointer of type void. This means that we can assign it to any type of
pointer using typecasting. If it fails to allocate enough space as specified, it returns a NULL pointer.

Example of the malloc() function.

#include<stdio.h>
#include<stdlib.h>
#include<conio.h>
int main()
{
    int n, i, *ptr, sum=0;
    clrscr();
    printf("Enter number of elements: ");
    scanf("%d", &n);
    ptr=(int*)malloc(n*sizeof(int));   //memory allocated using malloc
    if(ptr==NULL)
    {
        printf("Sorry! unable to allocate memory");
        exit(0);
    }
    printf("Enter elements of array: ");
    for(i=0; i<n; ++i)
    {
        scanf("%d", ptr+i);
        sum+=*(ptr+i);
    }
    printf("Sum=%d", sum);
    free(ptr);
    getch();
    return 0;
}
Output:
Enter number of elements: 3
Enter elements of array: 10
10
10
Sum=30
calloc() function
 The calloc() function allocates multiple block of requested memory.
 It initially initialize all bytes to zero.
 It returns NULL if memory is not sufficient.

The syntax of calloc() function is:


ptr=(cast-type*)calloc(number, byte-size)
(or)
void *calloc(number of items, element-size)
calloc() is another memory allocation function that is used for allocating memory at runtime. calloc function
is normally used for allocating memory to derived data types such as arrays and structures. If it fails to
allocate enough space as specified, it returns a NULL pointer.

Let's see the example of calloc() function.


#include<stdio.h>
#include<stdlib.h>
#include<conio.h>
int main(){
    int n, i, *ptr, sum=0;
    clrscr();
    printf("Enter number of elements: ");
    scanf("%d", &n);
    ptr=(int*)calloc(n, sizeof(int));   //memory allocated using calloc
    if(ptr==NULL)
    {
        printf("Sorry! unable to allocate memory");
        exit(0);
    }
    printf("Enter elements of array: ");
    for(i=0; i<n; ++i)
    {
        scanf("%d", ptr+i);
        sum+=*(ptr+i);
    }
    printf("Sum=%d", sum);
    free(ptr);
    getch();
    return 0;
}
Output:
Enter number of elements: 3
Enter elements of array: 10
20
30
Sum=60

realloc() function
If memory is not sufficient for malloc() or calloc(), you can reallocate the memory by realloc()
function. In short we can say that, it changes the memory size.

Let's see the syntax of realloc() function.


ptr=realloc(ptr, new-size)
(or)
void* realloc(pointer, new-size)

Example for : realloc()


int *x;
x = (int*)malloc(50 * sizeof(int));
x = (int*)realloc(x, 100 * sizeof(int));   //the block is resized to hold 100 ints
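A slightly fuller sketch of growing a block with realloc() (the sizes and variable names are assumptions of the example; note that the return value of realloc() is checked before the old pointer is overwritten):

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *x, *tmp;
    int i;

    x = (int*)malloc(50 * sizeof(int));        /* initial block of 50 ints */
    if (x == NULL)
        return 1;
    for (i = 0; i < 50; i++)
        x[i] = i;

    tmp = (int*)realloc(x, 100 * sizeof(int)); /* grow to 100 ints; old contents are kept */
    if (tmp == NULL)
    {
        free(x);                               /* the original block is still valid on failure */
        return 1;
    }
    x = tmp;

    for (i = 50; i < 100; i++)                 /* the newly added part is uninitialized */
        x[i] = i;

    printf("x[49]=%d x[99]=%d\n", x[49], x[99]);
    free(x);
    return 0;
}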
free() function
The memory occupied by malloc() or calloc() functions must be released by calling free() function.
Otherwise, it will consume memory until program exit.

Syntax of free() function.


free(ptr)



Difference between malloc() and calloc()

calloc()                                                  malloc()
1. calloc() initializes the allocated memory with the     1. malloc() initializes the allocated memory with
   value 0.                                                  garbage values.
2. Number of arguments is 2.                              2. Number of arguments is 1.
3. Syntax:                                                3. Syntax:
   (cast_type *)calloc(blocks, size_of_block);               (cast_type *)malloc(size_in_bytes);

Passing an array to a function:


As mentioned earlier, the name of the array represents the starting address, i.e. the address
of the first element of the array. All the elements of the array can be traversed by using the base address.

Example:
#include <stdio.h>
int summation(int[]);
void main ()
{
int arr[5] = {0,1,2,3,4};
int sum = summation(arr);
printf("%d",sum);
}
int summation (int arr[])
{
int sum=0,i;
for (i = 0; i<5; i++)
{
sum = sum + arr[i];
}
return sum;
}
The above program defines a function named as summation which accepts an array as an argument.
The function calculates the sum of all the elements of the array and returns it.

Two Dimensional Arrays


2D array can be defined as an array of arrays. The 2D array is organized as matrices which can be
represented as the collection of rows and columns.
However, 2D arrays are created to implement a relational database look alike data structure. It
provides ease of holding bulk of data at once which can be passed to any number of functions wherever
required.

How to declare 2D Array


The syntax of declaring two dimensional array is very much similar to that of a one dimensional array.
int arr[max_rows][max_columns];



In a two dimensional array, the elements are organized in the form of rows and columns. The first
element of the first row is represented by a[0][0], where the number shown in the first index is the number of
the row while the number shown in the second index is the number of the column.

How do we access data in a 2D array


The elements of 2D arrays can be random accessed. Similar to one dimensional arrays, we can access
the individual cells in a 2D array by using the indices of the cells. There are two indices attached to a
particular cell, one is its row number while the other is its column number.

However, we can store the value stored in any particular cell of a 2D array to some variable x by using the
following syntax.

int x = a[i][j];
where i and j is the row and column number of the cell respectively.

Example:
for ( int i=0; i<n ;i++)
{
for (int j=0; j<n; j++)
{
a[i][j] = 0;
}
}
We know that, when we declare and initialize one dimensional array in C programming simultaneously, we
don't need to specify the size of the array. However this will not work with 2D arrays. We will have to define
at least the second dimension of the array.

The syntax to declare and initialize the 2D array is


int arr[2][2] = {0,1,2,3};
The number of elements that can be present in a 2D array will always be equal to (number of rows *
number of columns).



Example : Storing User's data into a 2D array and printing it.

#include <stdio.h>
#include <conio.h>
void main ()
{
    int arr[3][3], i, j;
    clrscr();
    for (i=0; i<3; i++)
    {
        for (j=0; j<3; j++)
        {
            printf("Enter a[%d][%d]: ", i, j);
            scanf("%d", &arr[i][j]);
        }
    }
    printf("\n printing the elements ....\n");
    for(i=0; i<3; i++)
    {
        printf("\n");
        for (j=0; j<3; j++)
            printf("%d\t", arr[i][j]);
    }
    getch();
}

Mapping 2D array to 1D array


When it comes to mapping a two dimensional array, most of us might wonder why this mapping is
required. 2D arrays exist from the user's point of view: they are created to implement a
relational database table lookalike data structure, but in computer memory the storage technique for a 2D array
is similar to that of a one dimensional array.
The size of a two dimensional array is equal to the product of the number of rows and the number
of columns present in the array. We therefore need to map the two dimensional array onto a one dimensional
array in order to store it in memory.
A 3 x 3 two dimensional array is shown in the following image. However, this array needs to be
mapped to a one dimensional array in order to store it into the memory.

There are two main techniques of storing 2D array elements into memory



1. Row major ordering
In row major ordering, all the rows of the 2D array are stored into the memory contiguously.
Considering the array shown in the above image, its memory allocation according to row major order is.

First, the 1st row of the array is stored into the memory completely, then the 2nd row of the array is
stored into the memory completely and so on till the last row.

2. Column major ordering


According to column major ordering, all the columns of the 2D array are stored into the memory
contiguously. The memory allocation of the array is:

First, the 1st column of the array is stored into the memory completely, then the 2nd column of the
array is stored into the memory completely, and so on till the last column of the array.
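The two orderings correspond to two different offset formulas. The sketch below computes both for one element of a 3 x 3 array (0-based indexing is assumed; rows and cols name the array dimensions):

#include <stdio.h>

int main(void)
{
    int rows = 3, cols = 3;
    int i = 1, j = 2;                        /* element a[1][2] of the 3 x 3 array */

    int row_major_index = i * cols + j;      /* rows stored one after another: 1*3 + 2 = 5 */
    int col_major_index = j * rows + i;      /* columns stored one after another: 2*3 + 1 = 7 */

    printf("row major position: %d\n", row_major_index);
    printf("column major position: %d\n", col_major_index);
    return 0;
}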

APPLICATION OF ARRAYS
 Arrays are widely used to implement mathematical vectors, matrices, and other kinds of rectangular
tables.
 Many databases include one-dimensional arrays whose elements are records.
 Arrays are also used to implement other data structures such as strings, stacks, queues, heaps, and
hash tables. We will read about these data structures in the subsequent chapters.
 Arrays can be used for sorting elements in ascending or descending order.

Structure
A structure is a composite data type that defines a grouped list of variables that are to be placed under
one name in a block of memory. It allows different variables to be accessed by using a single pointer to the
structure.



Syntax
struct structure_name
{
data_type member1;
data_type member2;
.
.
.
.

data_type memberN;
};
Let's see the example to define a structure for an entity employee.
struct employee
{
int id;
char name[10];
float salary;
};

Here, struct is the keyword; employee is the name of the structure; id, name, and salary are the members or
fields of the structure.

Advantages
 It can hold variables of different data types.
 We can create objects containing different types of attributes.
 It allows us to re-use the data layout across programs.
 It is used to implement other data structures like linked lists, stacks, queues, trees, graphs etc.



Example program
#include<stdio.h>
#include<conio.h>
void main( )
{
struct employee
{
int id ;
float salary ;
int mobile ;
};
struct employee e1,e2,e3 ;
clrscr();
printf ("\nEnter ids, salary & mobile no. of 3 employee\n");
scanf ("%d %f %d", &e1.id, &e1.salary, &e1.mobile);
scanf ("%d%f %d", &e2.id, &e2.salary, &e2.mobile);
scanf ("%d %f %d", &e3.id, &e3.salary, &e3.mobile);
printf ("\n Entered Result ");
printf ("\n%d %f %d", e1.id, e1.salary, e1.mobile);
printf ("\n%d%f %d", e2.id, e2.salary, e2.mobile);
printf ("\n%d %f %d", e3.id, e3.salary, e3.mobile);
getch();
}
Union
Like a structure, a union in the C language is a user-defined data type that is used to store different
types of elements.
At any instant, only one member of the union can occupy the memory. In other words, the size of the
union at any instance is equal to the size of its largest member.

Advantage of union over structure


It occupies less memory because it occupies the size of the largest member only.
Disadvantage of union over structure
Only the last entered data can be stored in the union. It overwrites the data previously stored in the
union.

Union example

#include <stdio.h>
#include <string.h>



union employee
{ int id;
char name[50];
}e1; //declaring e1 variable for union
int main( )
{
//store first employee information
e1.id=101;
strcpy(e1.name, "Super star");//copying string into char array
//printing first employee information
printf( "employee 1 id : %d\n", e1.id);
printf( "employee 1 name : %s\n", e1.name);
return 0;
}
Output:
employee 1 id : 1869508435
employee 1 name : Super star
Here the id field prints a garbage value because both members of the union share the same memory, and the
strcpy() call overwrote the value that was stored in id earlier.

Sorting
Sorting refers to the operation or technique of arranging and rearranging sets of data in some specific
order. A collection of records is called a list, where every record has one or more fields. The field which
contains a unique value for each record is termed the key field.
For example, a phone number directory can be thought of as a list where each record has three fields -
'name' of the person, 'address' of that person, and their 'phone number'. Being unique, the phone number can
work as a key to locate any record in the list.
Sorting is the operation performed to arrange the records of a table or list in some order according to
some specific ordering criterion. Sorting is performed according to some key value of each record.
The records are sorted either numerically or alphanumerically, and are then arranged in ascending or
descending order depending on the value of the key. An example is sorting a list of marks obtained by the
students of a class in a particular subject.

Categories of Sorting

The techniques of sorting can be divided into two categories.

 Internal Sorting
 External Sorting

Internal Sorting: If all the data that is to be sorted can be accommodated at one time in the main memory, an
internal sorting method is used.

External Sorting: When the data that is to be sorted cannot be accommodated in the memory at the same
time and some of it has to be kept in auxiliary storage such as a hard disk, floppy disk, or magnetic tape, then
external sorting methods are used.



The complexity of the sorting algorithms
The complexity of a sorting algorithm measures the running time of the algorithm as a function of the
number 'n' of items to be sorted. Which sorting method is suitable for a problem depends on several
considerations. The most noteworthy of these considerations are:
 The length of time spent by the programmer in programming a specific sorting program
 Amount of machine time necessary for running the program
 The amount of memory necessary for running the program
The Efficiency of Sorting Techniques
To determine the amount of time required to sort an array of 'n' elements by a particular method, the normal
approach is to analyze the method to find the number of comparisons (or exchanges) required by it. Most of
the sorting techniques are data sensitive, and so their metrics depend on the order in which the elements
appear in the input array.
Various sorting techniques are analyzed in various cases and named these cases are:

 Best case
 Worst case
 Average case
Hence, the result of these cases is often a formula giving the average time required for a particular sort of
size 'n'. Most of the sort methods have time requirements that range from O(n log n) to O(n^2).
Types of Sorting Techniques
 Bubble Sort
 Selection Sort
 Insertion Sort
 Merge Sort
 Quick Sort
 Heap Sort
 Radix Sort
Quick Sort Algorithm
Quick sort is a fast sorting algorithm used to sort a list of elements. The quick sort algorithm was invented
by C. A. R. Hoare.
The quick sort algorithm separates the list of elements into two parts and then sorts each part
recursively; that is, it uses the divide and conquer strategy. In quick sort, the partition of the list is
performed based on an element called the pivot. Here the pivot element is one of the elements in the list.
The list is divided into two partitions such that "all elements to the left of the pivot are smaller than the
pivot and all elements to the right of the pivot are greater than or equal to the pivot".

Step by Step Process/ Algorithm


In Quick sort algorithm, partitioning of the list is performed using following steps...

Step 1 - Consider the first element of the list as pivot (i.e., Element at first position in the list).
Step 2 - Define two variables i and j. Set i and j to first and last elements of the list respectively.
Step 3 - Increment i until list[i] > pivot then stop.
Step 4 - Decrement j until list[j] < pivot then stop.
Step 5 - If i < j then exchange list[i] and list[j].
Step 6 - Repeat steps 3,4 & 5 until i > j.
Step 7 - Exchange the pivot element with list[j] element.

Complexity of the Quick Sort Algorithm

To sort an unsorted list with 'n' elements, quick sort makes ((n-1) + (n-2) + (n-3) +......+1) = (n (n-
1))/2 comparisons in the worst case. With the first element taken as the pivot, this worst case occurs when the
list is already sorted (or reverse sorted), because every partition is maximally unbalanced; when the partitions
are roughly balanced, only about n log n comparisons are needed.

Worst Case : O (n^2)

Best Case : O (n log n)
Average Case : O (n log n)

Example Program for Quick Sort

#include<stdio.h>
#include<conio.h>
void quickSort(int [10],int,int);
void main()
{
int list[20],size,i;
printf("Enter size of the list: ");
scanf("%d",&size);
printf("Enter %d integer values: ",size);
for(i = 0; i < size; i++)
scanf("%d",&list[i]);
quickSort(list,0,size-1);
printf("List after sorting is: ");
for(i = 0; i < size; i++)
printf(" %d",list[i]);
getch();
}
void quickSort(int list[10],int first,int last){
int pivot,i,j,temp;
if(first < last)
{
pivot = first;
i = first;
j = last;
while(i < j)
{
while(list[i] <= list[pivot] && i < last)
i++;
while(list[j] > list[pivot])
j--;
if(i <j)
{
temp = list[i];
list[i] = list[j];



list[j] = temp;
}
}
temp = list[pivot];
list[pivot] = list[j];
list[j] = temp;
quickSort(list,first,j-1);
quickSort(list,j+1,last);
}
}
Out Put
Enter the size of the list: 8
Enter the 8 Integer values: 5 3 8 1 4 6 2 7
List after sorting is: 1 2 3 4 5 6 7 8

Merge Sort Algorithm


Merge Sort follows the rule of Divide and Conquer to sort a given set of numbers/elements,
recursively, hence consuming less time.
Simple methods such as Selection Sort and Insertion Sort have a worst-case running time of O(n^2). As the
size of the input grows, insertion and selection sort can take a long time to run.
Merge sort, on the other hand, runs in O(n log n) time in all the cases.
Divide and Conquer
If we can break a single big problem into smaller sub-problems, solve the smaller sub-problems and
combine their solutions to find the solution for the original big problem, it becomes easier to solve the whole
problem.

In Merge Sort, the given unsorted array with n elements is divided into n sub arrays, each having one
element, because a single element is always sorted in itself. Then, it repeatedly merges these sub arrays, to
produce new sorted sub arrays, and in the end, one complete sorted array is produced.

The concept of Divide and Conquer involves three steps:

 Divide the problem into multiple small problems.


 Conquer the sub problems by solving them. The idea is to break down the problem into atomic sub
problems, where they are actually solved.
 Combine the solutions of the sub problems to find the solution of the actual problem.



How Merge Sort Works?
As we have already discussed that merge sort utilizes divide-and-conquer rule to break the problem
into sub-problems, the problem in this case being, sorting a given array.
In merge sort, we break the given array midway, for example if the original array had 6 elements,
then merge sort will break it down into two sub arrays with 3 elements each.

But breaking the original array into 2 smaller sub arrays is not helping us in sorting the array.
So we will break these sub arrays into even smaller sub arrays, until we have multiple sub arrays with
single element in them. Now, the idea here is that an array with a single element is already sorted, so once we
break the original array into sub arrays which has only a single element, we have successfully broken down
our problem into base problems.
And then we have to merge all these sorted sub arrays, step by step to form one single sorted array.
Let's consider an array with the values {14, 7, 3, 12, 9, 11, 6, 12}



In merge sort we follow the following steps:

 We take a variable p and store the starting index of our array in this. And we take another variable r
and store the last index of array in it.
 Then we find the middle of the array using the formula (p + r)/2 and mark the middle index as q, and
break the array into two sub arrays, from p to q and from q + 1 to r index.
 Then we divide these 2 sub arrays again, just like we divided our main array and this continues.
 Once we have divided the main array into sub arrays with single elements, then we start merging the
sub arrays.
Example program
#include<stdio.h>
#include<conio.h>
void mergeSort(int[],int,int);
void merge(int[],int,int,int);
void main ()
{
int a[10]= {10, 9, 7, 101, 23, 44, 12, 78, 34, 23};
int i;
clrscr();
mergeSort(a,0,9);
printf("printing the sorted elements");
for(i=0;i<10;i++)
{
printf("\n%d\n",a[i]);
}
getch();
}
void mergeSort(int a[], int beg, int end)
{
int mid;
if(beg<end)
{
mid = (beg+end)/2;
mergeSort(a,beg,mid);
mergeSort(a,mid+1,end);
merge(a,beg,mid,end);
}
}
void merge(int a[], int beg, int mid, int end)
{
int i=beg,j=mid+1,k,index = beg;
int temp[10];
while(i<=mid && j<=end)
{
if(a[i]<a[j])
{
temp[index] = a[i];



i = i+1;
}
else
{
temp[index] = a[j];
j = j+1;
}
index++;
}
if(i>mid)
{
while(j<=end)
{
temp[index] = a[j];
index++;
j++;
}
}
else
{
while(i<=mid)
{
temp[index] = a[i];
index++;
i++;
}
}
k = beg;
while(k<index)
{
a[k]=temp[k];
k++;
}
}
Output:
Printing the sorted elements
7
9
10
12
23
23
34
44
78
101



Heap Sort Algorithm
Heap sort is one of the sorting algorithms used to arrange a list of elements in order. The heap sort
algorithm uses a tree concept called a heap. In this sorting algorithm, a Max Heap can be used to arrange the
list of elements in descending order and a Min Heap to arrange the list of elements in ascending order.

Step by Step Process/ Algorithm

Step 1 - Construct a Binary Tree with given list of Elements.


Step 2 - Transform the Binary Tree into Min Heap.
Step 3 - Delete the root element from Min Heap using Heapify method.
Step 4 - Put the deleted element into the Sorted list.
Step 5 - Repeat the same until Min Heap becomes empty.
Step 6 - Display the sorted list.

Example Problem

Complexity of the Heap Sort Algorithm

To sort an unsorted list with 'n' number of elements, following are the complexities...

Worst Case : O (n log n)


Best Case : O (n log n)
Average Case : O (n log n)

Example program (this version repeatedly moves the root of a Max Heap to the end of the array, so the array is sorted in ascending order in place)
#include<stdio.h>
#include<conio.h>
int temp;
void heapify(int arr[], int size, int i)
{
int largest = i;
int left = 2*i + 1;
int right = 2*i + 2;
if (left < size && arr[left] >arr[largest])
largest = left;



if (right < size && arr[right] > arr[largest])
largest = right;
if (largest != i)
{
temp = arr[i];
arr[i]= arr[largest];
arr[largest] = temp;
heapify(arr, size, largest);
}
}
void heapSort(int arr[], int size)
{
int i;
for (i = size / 2 - 1; i >= 0; i--)
heapify(arr, size, i);
for (i=size-1; i>=0; i--)
{
temp = arr[0];
arr[0]= arr[i];
arr[i] = temp;
heapify(arr, i, 0);
}
}
void main()
{
int arr[] = {1, 10, 2, 3, 4, 1, 2, 100,23, 2};
int i;
int size = sizeof(arr)/sizeof(arr[0]);
clrscr();
heapSort(arr, size);
printf("printing sorted elements\n");
for (i=0; i<size; ++i)
printf("%d\n",arr[i]);
getch();
}
Output:
Printing sorted elements
1
1
2
2
2
3
4
10
23
100



UNIT – 2
Stack, Queue and Linked lists
Stacks, Stacks using Dynamic Arrays, Queues, Circular Queues Using Dynamic Arrays, Evaluation of
Expressions, Multiple Stacks and Queues. Linked lists: Singly Linked Lists and Chains, Representing
Chains in C, Linked Stacks and Queues, Additional List Operations, Doubly Linked Lists.

INTRODUCTION

Stack is an important data structure which stores its elements in an ordered manner. We will explain the
concept of stacks using an analogy. You must have seen a pile of plates where one plate is placed on top of
another as shown in Fig. Now, when you want to remove a plate, you remove the topmost plate first. Hence,
you can add and remove an element (i.e., a plate) only at/from one position which is the topmost position.
A stack is a linear data structure which uses the same principle, i.e., the elements in a stack are
added and removed only from one end, which is called the TOP. Hence, a stack is called a LIFO (Last-
In-First-Out) data structure, as the element that was inserted last is the first one to be taken out.
Now the question is: where do we need stacks in computer science? The answer is in function calls.
Consider an example where we are executing function A. In the course of its execution, function A calls
another function B. Function B in turn calls another function C, which calls function D, and function D finally
calls function E.

In order to keep track of the returning point of each active function, a special stack called system
stack or call stack is used. Whenever a function calls another function, the calling function is pushed onto the
Top of the stack. This is because after the called function gets executed, the control is passed back to the
calling function.
Now when function E is executed, function D will be removed from the top of the stack and
executed. Once function D gets completely executed, function C will be removed from the stack for
execution. The whole procedure will be repeated until all the functions get executed. Let us look at the stack
after each function is executed.
The system stack ensures a proper execution order of functions. Therefore, stacks are frequently used
in situations where the order of processing is very important, especially when the processing needs to be
postponed until other conditions are fulfilled.

ARRAY REPRESENTATION OF STACKS


In the computer’s memory, stacks can be represented as a linear array. Every stack has a variable
called TOP associated with it, which is used to store the address of the topmost element of the stack. It is this
position where the element will be added to or deleted from. There is another variable called MAX, which is
used to store the maximum number of elements that the stack can hold. If TOP = NULL, then it indicates that
the stack is empty and if TOP = MAX–1, then the stack is full. (You must be wondering why we have written
MAX–1. It is because array indices start from 0.)

The stack in Fig. shows that TOP = 4, so insertions and deletions will be done at this position. In the above
stack, five more elements can still be stored.

OPERATIONS ON A STACK
A stack supports three basic operations: push, pop, and peek. The push operation adds an element to
the top of the stack and the pop operation removes the element from the top of the stack. The peek operation
returns the value of the topmost element of the stack.

Push Operation
The push operation is used to insert an element into the stack. The new element is added at the topmost
position of the stack. However, before inserting the value, we must first check if TOP=MAX–1, because if
that is the case, then the stack is full and no more insertions can be done. If an attempt is made to insert a
value in a stack that is already full, an OVERFLOW message is printed.

To insert an element with value 6, we first check if TOP=MAX–1. If the condition is false, then we
increment the value of TOP and store the new element at the position given by stack [TOP].

Algorithm to PUSH an element in a stack

Step 1: If Top = Max-1, Then


Print “Overflow”
Goto Step 4
[End Of If]
Step 2: Set Top = Top + 1
Step 3: Set Stack [Top] = Value
Step 4: End



To insert an element in a stack, we first check for the OVERFLOW condition, then TOP is incremented so
that it points to the next location in the array, and the value is stored in the stack at the location pointed to by TOP.

Pop Operation
The pop operation is used to delete the topmost element from the stack. However, before deleting the
value, we must first check if TOP=NULL, because if that is the case, then it means the stack is empty and no
more deletions can be done. If an attempt is made to delete a value from a stack that is already empty, an
UNDERFLOW message is printed.

Algorithm to POP an element from a stack

Step 1: If Top = Null, Then


Print “Underflow”
Goto Step 4
[End Of If]
Step 2: Set Val = Stack[Top]
Step 3: Set Top = Top - 1
Step 4: End

To delete the topmost element, we first check if TOP=NULL. If the condition is false, then we decrement the
value pointed by TOP.

To delete an element from a stack, we first check for the UNDERFLOW condition, the value of the location
in the stack pointed by TOP is stored in VAL, TOP is decremented.

Peek/Peep Operation
Peek is an operation that returns the value of the topmost element of the stack without deleting it
from the stack. However, the Peek operation first checks if the stack is empty, i.e., if TOP = NULL, then an
appropriate message is printed, else the value is returned.

Algorithm for Peek/Peep Operation


Step 1: If Top =Null, Then
Print “Stack Is Empty”
Go To Step 3
[End of If]
Step 2: Return Stack [Top]
Step 3: End

Here, the Peek operation will return 5, as it is the value of the topmost element of the stack.
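The three operations described above can be combined into a small C illustration. The sketch below is not from the text; it assumes an integer stack of size MAX and uses -1 (in place of NULL) as the empty marker for TOP.

#include <stdio.h>
#define MAX 10

int stack[MAX];
int top = -1;                       /* top = -1 means the stack is empty */

void push(int value)
{
    if (top == MAX - 1)             /* overflow check */
        printf("OVERFLOW\n");
    else
        stack[++top] = value;       /* increment TOP, then store the value */
}

int pop(void)
{
    if (top == -1) {                /* underflow check */
        printf("UNDERFLOW\n");
        return -1;
    }
    return stack[top--];            /* return the value, then decrement TOP */
}

int peek(void)
{
    if (top == -1) {
        printf("STACK IS EMPTY\n");
        return -1;
    }
    return stack[top];              /* topmost value; the stack is not changed */
}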



Stacks using Dynamic Arrays
 Shortcoming of static stack implementation: is the need to know at compile-time, a good
bound(MAX_STACK_SIZE) on how large the stack will become.
 This shortcoming can be overcome by
→ using a dynamically allocated array for the elements &
→ then increasing the size of the array as needed
 Initially, capacity=1 where capacity=maximum no. of stack-elements that may be stored in array.
 The CreateS() function can be implemented as follows
Stack CreateS()::=
struct element {
int key; };
element *stack;
MALLOC(stack,sizeof(*stack));
int capacity=1;
int top=-1;
Boolean IsEmpty(Stack)::= top<0;
Boolean IsFull(Stack) ::= top>=capacity-1;
void stackFull()
{
REALLOC(stack,2*capacity*sizeof(*stack));
capacity=2*capacity;
}
 Once the stack is full, the realloc() function is used to increase the size of the array, as the push sketch below shows.
 In array-doubling, we double array-capacity whenever it becomes necessary to increase the capacity
of an array.
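A minimal push function, assuming the declarations above (stack, top, capacity and the stackFull function), might look like the following sketch.

void push(element item)
{   /* add item to the dynamically allocated stack; grow the array first if it is full */
    if (top >= capacity - 1)
        stackFull();                /* array doubling via realloc */
    stack[++top] = item;
}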
ANALYSIS
 In worst case, the realloc function needs to
→ allocate 2*capacity*sizeof(*stack) bytes of memory and
→ copy capacity*sizeof(*stack) bytes of memory from the old array into the new one.
 The total time spent over all array doublings = O(2^k), where capacity = 2^k.
 Since the total number of pushes is more than 2^(k-1), the total time spent in array doubling is
O(n), where n = total number of pushes.

MULTIPLE STACKS
While implementing a stack using an array, we had seen that the size of the array must be known in
advance. If the stack is allocated less space, then frequent OVERFLOW conditions will be encountered.
To deal with this problem, the code will have to be modified to reallocate more space for the array. In case
we allocate a large amount of space for the stack, it may result in sheer wastage of memory. Thus, there lies a
trade-off between the frequency of overflows and the space allocated. So, a better solution to deal with this
problem is to have multiple stacks or to have more than one stack in the same array of sufficient size.



An array STACK[n] is used to represent two stacks, Stack A and Stack B. The value of n is such that the
combined size of both the stacks will never exceed n. While operating on these stacks, it is important to note
one thing Stack A will grow from left to right, whereas Stack B will grow from right to left at the same time.
Extending this concept to multiple stacks, a stack can also be used to represent number of stacks in the same
array. That is, if we have a STACK[n], then each stack i will be allocated an equal amount of space bounded
by indices b[i] and e[i].

 Assume that we have ‘n’ stacks, we can divide the available memory into ‘n’ segments (Fig: 3.18).
 Let ‘i’ = stack number of one of n stacks.
 Let boundary[i] (0<=i<MAX_STACKS) points to position immediately to left of bottom element of
stack i, while top[i] (0<=i<MAX_STACKS) points to top element.
 Stack i is empty iff boundary[i]=top[i]
 The relevant declaration are:
#define MEMORY_SIZE 100 /* size of memory */
#define MAX_STACKS 10 /* max number of stacks plus 1 */
element memory[MEMORY_SIZE];
int top[MAX_STACKS];
int boundary[MAX_STACKS];
int n; /* number of stacks entered by the user */
 To divide the array into equal segment, we use the following code:
top[0]=boundary[0]= -1;
for(j=1;j<n; j++)
top[j]=boundary[j]=(MEMORY_SIZE/n)*j;
boundary[n]=MEMORY_SIZE-1;

Figure 3.18: Initial configuration for n stacks in memory[m]

Program 3.16: Add an item to the ith stack; Program 3.17: Delete an item from the ith stack
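Programs 3.16 and 3.17 are not reproduced here; the sketch below shows how the add and delete operations for the ith stack typically look under the conventions above (stackFull and stackEmpty are assumed to be the error-handling routines mentioned in the text).

void push(int i, element item)
{   /* add item to the ith stack */
    if (top[i] == boundary[i + 1])
        stackFull(i);               /* stack i has run out of space */
    memory[++top[i]] = item;
}

element pop(int i)
{   /* remove the top element from the ith stack */
    if (top[i] == boundary[i])
        return stackEmpty();        /* stack i is empty */
    return memory[top[i]--];
}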



Figure 3.19: Configuration when stack i meets stack i+1, but the memory is not full

 In the push function, the condition top[i]==boundary[i+1] implies only that a particular stack
ran out of memory, not that the entire memory is full. (In fact, there may be a lot of
unused space between other stacks in the array memory.)
 Therefore, we create an error recovery function, stackFull, which determines if there is
any free space in memory.
 If there is space available, it should shift the stacks so that space is allocated to the full
stack.
 We can guarantee that stackFull adds elements as long as there is free space in array
memory if we:
1) Determine the least j, i<j<n such that there is free space between stacks j and
j+1 i.e. top[j]<boundary[j+1]. If there is such a j, then move stacks i+1,i+2 . . . .
. .j one position to the right. This creates a space between stacks i and i+1.
2) If there is no j as in (1), then look to the left of stack i. Find the largest j such
that 0<=j<i and there is space between stacks j and j+1 i.e.
top[j]<boundary[j+1]. If there is such a j, then move stacks j+1,j+2. . . . . .i one
space to the left. This also creates a space between stacks i and i+1.
3) If there is no j satisfying either (1) or (2), then all MEMORY_SIZE spaces of
memory are utilized and there is no free space. In this case, stackFull
terminates with an error message

APPLICATIONS OF STACKS
In this section we will discuss typical problems where stacks can be easily applied for a simple and
efficient solution. The topics that will be discussed in this section include the following:
 Reversing a list
 Parentheses checker
 Conversion of an infix expression into a postfix expression
 Evaluation of a postfix expression
 Conversion of an infix expression into a prefix expression
 Evaluation of a prefix expression
 Recursion
 Tower of Hanoi
 Parsing
 Browsers
 Editors
 Tree Traversals

Evaluation of Arithmetic Expressions


Polish Notations
Infix, postfix, and prefix notations are three different but equivalent notations of writing algebraic
expressions. But before learning about prefix and postfix notations, let us first see what an infix notation is.
We all are familiar with the infix notation of writing algebraic expressions.
While writing an arithmetic expression using infix notation, the operator is placed in between the
operands. For example, in A+B the plus operator is placed between the two operands A and B. Although it
is easy for us to write expressions using infix notation, computers find it difficult to parse, as the computer
needs a lot of information to evaluate the expression. Information is needed about operator precedence and


associativity rules, and brackets which override these rules. So, computers work more efficiently with
expressions written using prefix and postfix notations.
Postfix notation was developed by Jan Łukasiewicz who was a Polish logician, mathematician, and
philosopher. His aim was to develop a parenthesis-free prefix notation (also known as Polish notation) and a
postfix notation, which is better known as Reverse Polish Notation or RPN.
In postfix notation, as the name suggests, the operator is placed after the operands. For example, if an
expression is written as A+B in infix notation, the same expression can be written as AB+ in postfix notation.
The order of evaluation of a postfix expression is always from left to right. Even brackets cannot alter the
order of evaluation.

The expression (A + B) * C can be written as:


[AB+]*C
AB+C* in the postfix notation

A postfix expression does not even follow the rules of operator precedence. The operator which occurs first in
the expression is applied first to the operands. For example, given the postfix notation AB+C*, during
evaluation the addition will be performed prior to the multiplication.
Thus we see that in a postfix notation, operators are applied to the operands that are immediately to their
left. In the example AB+C*, + is applied on A and B, then * is applied on the result of the addition and C.
Although a prefix notation is also evaluated from left to right, the only difference between a postfix
notation and a prefix notation is that in a prefix notation, the operator is placed before the operands. For
example, if A+B is an expression in infix notation, then the corresponding expression in prefix notation is
given by +AB.
While evaluating a prefix expression, the operators are applied to the operands that are present
immediately on the right of the operator. Like postfix, prefix expressions also do not follow the rules of
operator precedence and associativity, and even brackets cannot alter the order of evaluation.

Convert the following infix expressions into postfix expressions.

Solution
(i) (A–B) * (C+D)
[AB–] * [CD+]
AB–CD+*

(ii) (A + B) / (C + D) – (D * E)
[AB+] / [CD+] – [DE*]
[AB+CD+/] – [DE*]
AB+CD+/DE*–

Conversion of an Infix Expression into a Postfix Expression

Let I be an algebraic expression written in infix notation. I may contain parentheses, operands, and
operators. For simplicity of the algorithm we will use only the +, –, *, /, % operators. The precedence of these
operators can be given as follows:

Higher priority *, /, %



Lower priority +, –

No doubt, the order of evaluation of these operators can be changed by making use of parentheses.
For example, if we have an expression A + B * C, then first B * C will be done and the result will be added
to A. But the same expression if written as, (A + B) * C, will evaluate A + B first and then the result will be
multiplied with C.
The algorithm given below transforms an infix expression into postfix expression, as shown in Fig.
The algorithm accepts an infix expression that may contain operators, operands, and parentheses. For
simplicity, we assume that the infix operation contains only modulus (%), multiplication (*), division (/),
addition (+), and subtraction (–) operators, and that operators with the same precedence are performed from left-
to-right.
The algorithm uses a stack to temporarily hold operators. The postfix expression is obtained from
left-to-right using the operands from the infix expression and the operators which are removed from the
stack. The first step in this algorithm is to push a left parenthesis on the stack and to add a corresponding
right parenthesis at the end of the infix expression. The algorithm is repeated until the stack is empty.

Algorithm to convert an Infix notation into postfix notation

Step 1: Add ")" to the end of the infix expression


Step 2: Push “(“on to the stack
Step 3: Repeat until each character in the infix notation is scanned
IF a “(“is encountered, push it on the stack
IF an operand (whether a digit or an alphabet) is encountered,
Add it to the postfix expression.
IF a “)” is encountered, then;
a . Repeatedly pop from stack and add it to the postfix expression until a “(” is
encountered.
b. Discard the “(“. That is, remove the “(“from stack and do not add it to the postfix expression
IF an operator X is encountered, then;
a Repeatedly pop from stack and add each operator (popped from the stack) to the
postfix expression which has the same precedence or a higher precedence than X
b. Push the operator X to the stack
Step 4: Repeatedly pop from the stack and add it to the postfix expression until the stack is empty
Step 5: EXIT

Convert the following infix expression into postfix expression using the algorithm:
A – (B / C + (D % E * F) / G) * H
Adding the enclosing parentheses gives (A – (B / C + (D % E * F) / G) * H), and applying the algorithm
yields the postfix expression ABC/DE%F*G/+H*–.



Evaluation of a Postfix Expression
The ease of evaluation acts as the driving force for computers to translate an infix notation into a
postfix notation. That is, given an algebraic expression written in infix notation, the computer first converts
the expression into the equivalent postfix notation and then evaluates the postfix expression.
Both these tasks converting the infix notation into postfix notation and evaluating the postfix
expression make extensive use of stacks as the primary tool.
Using stacks, any postfix expression can be evaluated very easily. Every character of the postfix
expression is scanned from left to right. If the character encountered is an operand, it is pushed on to the
stack. However, if an operator is encountered, then the top two values are popped fromthe stack and the
operator is applied on these values. The result is then pushed on to the stack.
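As an illustration (not from the text), the following C sketch evaluates a postfix expression made of single-digit operands and the operators +, -, * and /, using an array as the stack.

#include <stdio.h>
#include <ctype.h>

int evaluatePostfix(const char *exp)
{
    int stack[50], top = -1, a, b;
    for (; *exp; exp++) {
        if (isdigit(*exp))
            stack[++top] = *exp - '0';          /* operand: push it */
        else if (*exp != ' ') {
            b = stack[top--];                   /* operator: pop two operands */
            a = stack[top--];
            switch (*exp) {
            case '+': stack[++top] = a + b; break;
            case '-': stack[++top] = a - b; break;
            case '*': stack[++top] = a * b; break;
            case '/': stack[++top] = a / b; break;
            }
        }
    }
    return stack[top];                          /* the final result */
}

int main(void)
{
    printf("%d\n", evaluatePostfix("9 3 4 * 8 + 4 / -"));   /* prints 4 */
    return 0;
}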

Let us now take an example that makes use of this algorithm. Consider the infix expression given as 9 – ((3 *
4) + 8) / 4. Evaluate the expression.



The infix expression 9 – ((3 * 4) + 8) / 4 can be written as 9 3 4 * 8 + 4 / – using postfix notation.
Scanning the postfix expression from left to right and applying the procedure above gives 3 * 4 = 12,
12 + 8 = 20, 20 / 4 = 5, and finally 9 – 5 = 4, so the expression evaluates to 4.

Queues
A queue is an important data structure which is extensively used in computer applications. In this section we
will study the operations that can be performed on a queue. We will also discuss the implementation of a
queue using both arrays and linked lists, illustrate different types of queues like multiple queues, double
ended queues, circular queues, and priority queues, and list some real-world applications of queues.
INTRODUCTION
Let us explain the concept of queues using the analogies given below.
 People moving on an escalator. The people who got on the escalator first will be the first one to step
out of it.
 People waiting for a bus. The first person standing in the line will be the first one to get into the bus.
 People standing outside the ticketing window of a cinema hall. The first person in the line will get the
ticket first and thus will be the first one to move out of it.
 Luggage kept on conveyor belts. The bag which was placed first will be the first to come out at the
other end.
 Cars lined at a toll bridge. The first car to reach the bridge will be the first to leave.

A queue is a FIFO (First-In, First-Out) data structure in which the element that is inserted first is the first
one to be taken out. The elements in a queue are added at one end called the REAR and removed from the
other end called the FRONT. Queues can be implemented by using either arrays or linked lists.

ARRAY REPRESENTATION OF QUEUES


Queues can be easily represented using linear arrays. As stated earlier, every queue has front and rear
variables that point to the position from where deletions and insertions can be done, respectively.

In above Fig FRONT = 0 and REAR = 5. Suppose we want to add another element with value 45, then
REAR would be incremented by 1 and the value would be stored at the position pointed by REAR.

Here, FRONT = 0 and REAR = 6. Every time a new element has to be added, we repeat the same procedure.
If we want to delete an element from the queue, then the value of FRONT will be incremented. Deletions are
done from only this end of the queue. Here, FRONT = 1 and REAR = 6.



However, before inserting an element in a queue, we must check for overflow conditions. An overflow will
occur when we try to insert an element into a queue that is already full. When REAR = MAX – 1, where
MAX is the size of the queue, we have an overflow condition. Note that we have written MAX – 1 because
the index starts from 0.

ALGORITHM TO INSERT AN ELEMENT IN A QUEUE

Step 1: If Rear = Max - 1
Write Overflow
Goto Step 4
[End of If]
Step 2: If Front = -1 and Rear = -1
Set Front = Rear = 0
Else
Set Rear = Rear + 1
[End of If]
Step 3: Set Queue [Rear] = Num
Step 4: Exit

In the algorithm to insert an element in a queue, in Step 1 we first check for the overflow condition. In
Step 2, we check if the queue is empty. In case the queue is empty, then both FRONT and REAR are set to
zero, so that the new value can be stored at the 0th location. Otherwise, if the queue already has some values,
then REAR is incremented so that it points to the next location in the array. In Step 3, the value is stored in
the queue at the location pointed to by REAR.

Similarly, before deleting an element from a queue, we must check for underflow conditions. An underflow
condition occurs when we try to delete an element from a queue that is already empty. If FRONT = –1 and
REAR = –1, it means there is no element in the queue.

ALGORITHM TO DELETE AN ELEMENT FROM A QUEUE

In the algorithm to delete an element from a queue, in Step 1 we check for the underflow condition. An underflow
occurs if FRONT = –1 or FRONT > REAR. However, if the queue has some values, then FRONT is
incremented so that it now points to the next value in the queue.



Note: The process of inserting an element in the queue is called en-queue, and the process of deleting an
element from the queue is called de-queue.
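A minimal C sketch of these two operations (not from the text), using an integer array of size MAX and FRONT = REAR = -1 to denote an empty queue:

#include <stdio.h>
#define MAX 10

int queue[MAX];
int front = -1, rear = -1;

void enqueue(int num)
{
    if (rear == MAX - 1)                     /* overflow */
        printf("OVERFLOW\n");
    else if (front == -1)                    /* first element of an empty queue */
        queue[front = rear = 0] = num;
    else
        queue[++rear] = num;
}

int dequeue(void)
{
    if (front == -1 || front > rear) {       /* underflow */
        printf("UNDERFLOW\n");
        return -1;
    }
    return queue[front++];                   /* delete from the FRONT end */
}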

TYPES OF QUEUES
A queue data structure can be classified into the following types:
1. Circular Queue
2. Deque
3. Priority Queue
4. Multiple Queues

Circular Queues
In linear queues, we have discussed so far that insertions can be done only at one end called theREAR
and deletions are always done from the other end called the FRONT.

Here, FRONT = 0 and REAR = 9.


Now, if you want to insert another value, it will not be possible because the queue is completelyfull. There is
no empty space where the value can be inserted. Consider a scenario in which twosuccessive deletions are
made.

Here, front = 2 and REAR = 9.

Even though there is space available, the overflow condition still exists because the condition rear =
MAX – 1 still holds true. This is a major drawback of a linear queue.
To resolve this problem, we have two solutions. First, shift the elements to the left so that the vacant
space can be occupied and utilized efficiently. But this can be very time-consuming, especially when the
queue is quite large. The second option is to use a circular queue. In the circular queue, the first index comes
right after the last index.

The circular queue will be full only when front = 0 and rear = Max – 1. A circular queue is
implemented in the same manner as a linear queue is implemented. The only difference will be in the code
that performs insertion and deletion operations. For insertion, we now have to check for the following three
conditions:

 If front = 0 and rear = MAX – 1, then the circular queue is full.


 If rear != MAX – 1, then rear will be incremented and the value will be inserted.

 If front != 0 and rear = MAX – 1, then it means that the queue is not full. So, set rear = 0 and insert
the new element there.

ALGORITHM TO INSERT AN ELEMENT IN A CIRCULAR QUEUE

In the algorithm to insert an element in a circular queue, in Step 1 we check for the overflow condition.
In Step 2, we make two checks: first to see if the queue is empty, and second to see if the REAR end has
already reached the maximum capacity while there are certain free locations before the FRONT end. In Step
3, the value is stored in the queue at the location pointed to by REAR.
After seeing how a new element is added in a circular queue, let us now discuss how deletions are
performed in this case. To delete an element, again we check for three conditions.



 If front = –1, then there are no elements in the queue. So, an underflow condition will be reported.
 If the queue is not empty and front = rear, then after deleting the element at the front, the queue
becomes empty and so front and rear are set to –1.
 If the queue is not empty and front = MAX–1, then after deleting the element at the front, front is set
to 0.

ALGORITHM TO DELETE AN ELEMENT FROM A CIRCULAR QUEUE

In the algorithm to delete an element from a circular queue, in Step 1 we check for the underflow
condition. In Step 2, the value of the queue at the location pointed to by FRONT is stored in VAL. In Step 3,
we make two checks: first to see if the queue has become empty after the deletion and second to see if
FRONT has reached the maximum capacity of the queue. The value of FRONT is then updated based on the
outcome of these checks.
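The insertion and deletion conditions described above can be coded as in the following sketch (not from the text). One extra check, front == rear + 1, is included here to catch the case where REAR has wrapped around and caught up with FRONT, which also means the circular queue is full.

#include <stdio.h>
#define MAX 10

int cqueue[MAX];
int front = -1, rear = -1;

void insertCQ(int num)
{
    if ((front == 0 && rear == MAX - 1) || front == rear + 1)
        printf("OVERFLOW\n");                 /* circular queue is full */
    else if (front == -1)
        cqueue[front = rear = 0] = num;       /* queue was empty */
    else if (rear == MAX - 1 && front != 0)
        cqueue[rear = 0] = num;               /* wrap REAR around to index 0 */
    else
        cqueue[++rear] = num;
}

int deleteCQ(void)
{
    int val;
    if (front == -1) {                        /* underflow */
        printf("UNDERFLOW\n");
        return -1;
    }
    val = cqueue[front];
    if (front == rear)
        front = rear = -1;                    /* queue becomes empty */
    else if (front == MAX - 1)
        front = 0;                            /* wrap FRONT around to index 0 */
    else
        front++;
    return val;
}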
CIRCULAR QUEUES USING DYNAMICALLY ALLOCATED ARRAYS

• Shortcoming of static queue implementation: is the need to know at compile-time, a good


bound(MAX_QUEUE_SIZE) on how large the queue will become.
• This shortcoming can be overcome by
→using a dynamically allocated array for the elements &
→then increasing the size of the array as needed
• In array-doubling, we double array-capacity whenever it becomes necessary to increase the
capacity of an array (Figure3.8).

Figure 3.8: Doubling queue capacity

• To get a proper circular queue configuration, we must slide the elements in the right segment to the
right end of the array (Figure 3.8d).
• The array doubling and the slide to the right together copy at most 2*capacity - 2 elements.
• The number of elements copied can be limited to capacity - 1 by customizing the array doubling code
so as to obtain the configuration of Figure 3.8e. This configuration may be obtained as follows:
1) Create a new array newQueue of twice the capacity.
2) Copy the second segment to positions in newQueue begining at0.
3) Copy the first segment to positions in newQueue begining at capacity-front-1.

Program 3.9: Add to a circular queue



Program 3.10: Doubling queue capacity
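Program 3.10 itself is not reproduced above. The following is a minimal sketch (assuming <stdlib.h> is included and that the globals queue, front, rear and capacity follow the convention that a full circular queue holds capacity-1 elements) of one way to double the capacity and pack the existing elements at the start of the new array:

void queueFull(void)
{
    int i, j = 0, size = capacity - 1;               /* number of elements currently stored */
    element *newQueue = malloc(2 * capacity * sizeof(*newQueue));

    /* copy the existing elements, in queue order, to the start of newQueue */
    for (i = (front + 1) % capacity; j < size; i = (i + 1) % capacity)
        newQueue[j++] = queue[i];

    front = 2 * capacity - 1;     /* front stays one slot before the first element */
    rear = size - 1;
    capacity *= 2;
    free(queue);
    queue = newQueue;
}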

Multiple Queues
When we implement a queue using an array, the size of the array must be known in advance. If the queue is
allocated less space, then frequent overflow conditions will be encountered. To deal with this problem, the
code will have to be modified to reallocate more space for the array.

In case we allocate a large amount of space for the queue, it will result in sheer wastage of the memory.
Thus, there lies a tradeoff between the frequency of overflows and the space allocated.

So a better solution to deal with this problem is to have multiple queues or to have more than one queue in
the same array of sufficient size. Figure 8.31 illustrates this concept.

In the figure, an array Queue[n] is used to represent two queues, Queue A and Queue B. The value of n is
such that the combined size of both the queues will never exceed n. While operating on these queues, it is
important to note one thing queue A will grow from left to right, whereas queue B will grow from right to
left at the same time.

Extending the concept to multiple queues, a queue can also be used to represent n number of queues in the
same array. That is, if we have a QUEUE[n], then each queue i will be allocated an equal amount of space
bounded by indices b[i] and e[i].

APPLICATIONS OF QUEUES
 Queues are widely used as waiting lists for a single shared resource like printer, disk, CPU.
 Queues are used to transfer data asynchronously (data not necessarily received at the same rate as sent)
between two processes (IO buffers), e.g., pipes, file IO, sockets.



 Queues are used as buffers on MP3 players and portable CD players, iPod playlist.
 Queues are used in Playlist for jukebox to add songs to the end, play from the front of the list.
 Queues are used in operating system for handling interrupts. When programming a real-time system
that can be interrupted, for example, by a mouse click, it is necessary to process the interrupts
immediately, before proceeding with the current job. If the interrupts have to be handled in the order
of arrival, then a FIFO queue is the appropriate data structure.

LINKED LISTS

A linked list is an ordered collection of finite, homogeneous data elements called nodes, where the linear order
is maintained by means of links or pointers.

A linked list is a data structure consisting of a group of nodes which together represent a sequence. Under
the simplest form, each node is composed of data and a reference (in other words, a link) to the next node in
the sequence; more complex variants add additional links. This structure allows for efficient insertion or
removal of elements from any position in the sequence.

Memory Representation of Linked Lists

START
1

DATA NEXT
1 H 4
2
3
4 E 7
5
6
7 L 8
8 L 10
9
10 O -1

Let us see how a linked list is maintained in memory. In order to form a linked list, we need a
structure called a node which has two fields, DATA and NEXT. DATA stores the information part and NEXT
stores the address of the next node in the sequence. The variable START is used to store the address of the
first node.
Representing Chains in C
Types of Linked Lists

 Single linked list


 Double linked list
 Circular linked list

Representation of a Linked List

There are two ways to represent a linked list in memory:



1. Static representation using array
2. Dynamic representation using free pool of storage

Linked lists are a way to store data with structures so that the programmer can automatically create a new
place to store data whenever necessary. Specifically, the programmer writes a struct definition that contains
variables holding information about something and that has a pointer to a struct of its same type (it has to be
a pointer--otherwise, every time an element was created, it would create a new element, infinitely). Each of
these individual structs or classes in the list is commonly known as a node or element of the list.

In memory a linked list is often described as looking like this:


----------            ----------
-  Data  -            -  Data  -
----------            ----------
- Pointer- ---------> - Pointer-
----------            ----------
Head node in linked list

The entry point into a linked list is called the head of the list. It should be noted that head is not a
separate node, but the reference to the first node. If the list is empty then the head is a null reference.
Linked list is a dynamic data structure.

Node is represented as

struct node
{
int data;
struct node *next;
};
Linked List Operations
 Display/ Traverse− Displays the complete list.
 Search − Searches an element using the given key.
 Insertion − Adds an element at the beginning of the list.
 Deletion − Deletes an element at the beginning of the list.
Traversing/Display
Start with the head and access each node until you reach null. Do not change the head reference
Algorithm for Traversing a Linked List
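The algorithm figure is not reproduced here; the following C sketch (assuming the node structure declared above and that <stdio.h> is included) walks the list from START and prints each data value without changing the head reference.

void display(struct node *start)
{
    struct node *ptr = start;          /* work on a copy, do not move the head */
    while (ptr != NULL) {
        printf("%d ", ptr->data);      /* visit the current node */
        ptr = ptr->next;               /* step to the next node */
    }
}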



Algorithm to Search a Linked List

Step 1: [Initialize] Set Ptr = Start


Step 2: Repeat Step 3 While Ptr! = Null
Step 3: If Val = Ptr->Data
Set Pos = Ptr
Go To Step 5
Else
Set Ptr = Ptr->Next
[End of If]
[End of Loop]
Step 4: Set Pos = Null
Step 5: Exit

Inserting a Node at the Beginning

Suppose we want to add a new node with the data 9 as the first node of the list. Then the following
changes are done in the linked list. Allocate memory for the new node and initialize its DATA part to 9.
Add the new node as the first node of the list by making the NEXT part of the new node contain the address
held in START. Finally, make START point to the new first node of the list.

Algorithm for GETNODE(NODE) to get a pointer of a memory block which suits the type NODE:



Algorithm to Insert a New Node in the Beginning
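The algorithm figure is not reproduced here; a minimal C sketch of the same steps (assuming <stdlib.h> for malloc and the node structure given earlier) is shown below. A call such as start = insertAtBeginning(start, 9); makes the new node the first node of the list.

struct node *insertAtBeginning(struct node *start, int val)
{
    struct node *newNode = (struct node *) malloc(sizeof(struct node));
    newNode->data = val;        /* initialize the DATA part of the new node */
    newNode->next = start;      /* new node points to the old first node */
    return newNode;             /* the new node becomes the new START */
}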



Example:

Inserting a Node at the End


If we want to add a node with data 9 as the last node of the list, then we insert a new node at the end of the
linked list. Take a pointer variable initialized with START, so that the pointer points to the first node of the
linked list. With the help of a loop, traverse the linked list to reach the last node. Once the last node is reached,
make its NEXT field store the address of the new node; the NEXT field of the new node will contain NULL.
Algorithm to Insert a New Node at the End of the Linked List
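The algorithm figure is not reproduced here; a minimal C sketch of the same steps is:

struct node *insertAtEnd(struct node *start, int val)
{
    struct node *ptr, *newNode = (struct node *) malloc(sizeof(struct node));
    newNode->data = val;
    newNode->next = NULL;            /* the new node will be the last node */
    if (start == NULL)               /* empty list: the new node is also the first node */
        return newNode;
    for (ptr = start; ptr->next != NULL; ptr = ptr->next)
        ;                            /* traverse to the last node */
    ptr->next = newNode;             /* the old last node now points to the new node */
    return start;
}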

Example:
INSERTING A NODE AT ANY POSITION:



Deletions from List:
Like Insertions, there are also various cases of deletions
(i) Deletion at front of the list
(ii) Deletion at end of the list
(iii) Deletion at any position in the list
Return Node (PTR):
RETURNNODE (PTR) returns a node having pointer PTR to the free pool of storage.
Algorithm for Return Node (PTR):

Deleting the First Node


Algorithm to Delete the First Node from the Linked List
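The algorithm figure is not reproduced here; a minimal C sketch of the same steps is:

struct node *deleteFirst(struct node *start)
{
    struct node *ptr = start;
    if (start == NULL)          /* underflow: nothing to delete */
        return NULL;
    start = start->next;        /* START now points to the second node */
    free(ptr);                  /* return the old first node to free storage */
    return start;
}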



Ex:

ALGORITHM TO DELETE THE LAST NODE:



Ex:

DELETION AT ANY POSITION IN THE LIST


ALGORITHM:



Ex:



Linked Stacks and Queues:

(Figure: a linked stack, where top points to the first node, and a linked queue, where front and rear point to the first and last nodes; each node holds a data field and a link field.)

Directions of links for both the stack and queue are such as to facilitate easy insertion and deletion of nodes.
In the case of figure (a), one can easily add a node at the top or delete one from the top. In figure (b), one can
easily add a node at the rear and both addition and deletion can be performed at the front, though for a queue
we normally would not wish to add nodes at the front.
We assume initial condition for the stack is:
Top[i]=NULL, 0<=i<MAX_STACKS
And boundary condition is:
Top[i]=NULL iff the ith stack is empty
Functions push and pop add and delete items to/from a stack. The code for each is straightforward. Function
push creates a new node, temp, and places item in the data field and top in the link field. The variable top is
then changed to point to temp. A typical function call to add an element to the ith stack would be
push(i, item). Function pop returns the top element and changes top to point to the address contained in its
link field. The removed node is then returned to system memory. A typical function call to delete an element
from the ith stack would be item = pop(i);

void push(int i, element item)
{ /* add item to the ith stack */
stackPointer temp;
MALLOC(temp, sizeof(*temp));
temp->data = item;
temp->link = top[i];
top[i] = temp;
}

Add to a linked stack



element pop(int i)
{ /* remove the top element from the ith stack */
stackPointer temp = top[i];
element item;
if (!temp)
return stackEmpty();
item = temp->data;
top[i] = temp->link;
free(temp);
return item;
}
Delete from a linked stack

To represent m<=MAX_QUEUES queues simultaneously, we begin with the declarations:


We assume that the initial condition for the queue is:
Front[i]=NULL, 0<=i<MAX_QUEUES
And the boundary condition is:
Front[i]=NULL, iff the ith queue is empty
Functions addq and deleteq implement the add and delete operations for multiple queues. Function addq is
more complex than push because we must check for an empty queue. If the queue is empty, we change front
to point to the new node; otherwise we change rear’s link field to point to the new node. In either case, we
then change rear to point to the new node. Function deleteq is similar to pop since we are removing a node
that is currently at the start of the list. Typical function calls would be addq(i, item) and item = deleteq(i);

void addq(int i, element item)
{ /* add item to the rear of queue i */
queuePointer temp;
MALLOC(temp, sizeof(*temp));
temp->data = item;
temp->link = NULL;
if (front[i])
rear[i]->link = temp;
else
front[i] = temp;
rear[i] = temp;
}
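Function deleteq is not reproduced in the text; a minimal sketch under the same conventions (queueEmpty is assumed to be an error routine analogous to stackEmpty) is:

element deleteq(int i)
{   /* delete an element from the front of queue i */
    queuePointer temp = front[i];
    element item;
    if (!temp)
        return queueEmpty();
    item = temp->data;
    front[i] = temp->link;
    free(temp);
    return item;
}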
Additional List Operations
Operations for Chains
Inverting a chain is another useful operation.
typedef struct listNode *listPointer;
struct listNode
{
char data;
listPointer link;
};

For a list of length>=1 nodes, the while loop is executed length times and so the computing time is linear or
O(length).
listPointer invert(listPointer lead)
{ /* invert the list pointed to by lead */
listPointer middle, trail;
middle = NULL;
while (lead)
{
trail = middle;
middle = lead;
lead = lead->link;      /* move lead to the next node */
middle->link = trail;   /* reverse the link of the current node */
}
return middle;
}

Merging two single linked lists into one list:


Two single linked lists, namely L1 and L2, are available and we want to merge the list L2 after L1. Also
assume that HEADER1 and HEADER2 are the header nodes of the lists L1 and L2, respectively. Merging
can be done by setting the pointer of the link field of the last node in the list L1 with the pointer of the first
node in L2. The header node of the list L2 should be returned to the pool of free storage. Merging two single
linked lists into one list is illustrated in the figure.

Another function is one that concatenates two chains, ptr1 and ptr2. The complexity of this function is
O(length of list ptr1).

listPointer concatenate(listPointer ptr1, listPointer ptr2)
{ /* concatenating singly linked lists: produce the list ptr1 followed by ptr2 */
listPointer temp;
if (!ptr1) return ptr2;
if (!ptr2) return ptr1;
for (temp = ptr1; temp->link; temp = temp->link)
;                        /* find the last node of the first list */
temp->link = ptr2;
return ptr1;
}



Operations for Circular Lined Lists:
In a single linked list, the link field of the last node is null (hereafter a single linked list may be read as
ordinary linked list), but a number of advantages can be gained if we utilize this link field to store the pointer
of the header node. A linked list where the last node points to the header node is called a circular linked
list. The figure shows a pictorial representation of a circular linked list.

By keeping a pointer last to the last node in the list rather than to the first, we are able to insert an element at
both the front and end with ease. Had we kept a pointer to the first node instead of the last node, inserting at
the front would require us to move down the entire length of the list until we find the last node so that
we can change the pointer in the last node to point to the new first node.

void insertFront(listPointer *last, listPointer node)
{ /* insert node at the front of the circular list whose last node is (*last) */
if (!(*last))
{
/* list is empty; change last to point to the new node */
*last = node;
node->link = node;
}
else
{
node->link = (*last)->link;
(*last)->link = node;
}
}

int length(listPointer last)
{ /* find the length of the circular list last */
listPointer temp;
int count = 0;
if (last)
{
temp = last;
do
{
count++;
temp = temp->link;
} while (temp != last);
}
return count;
}



Doubly Linked List:
In a single linked list one can move beginning from the header node to any node in one direction only (from
left to right). This is why a single linked list is also termed a one-way list. On the other hand, a double linked
list is a two-way list because one can move in either direction, either from left to right or from right to left.
This is accomplished by maintaining two link fields instead of one as in a single linked list. A structure of a
node for a double linked list is represented as in Figure.

From the figure, it can be noticed that two links, viz. RLINK and LLINK, point to the nodes on the right side
and left side of the node, respectively. Thus, every node, except the header node and the last node, points to
its immediate predecessor and immediate successor.
Operations on a Double Linked List
In this section, only the insertion and deletion operations are discussed.
Inserting a node into a double linked list
Figure shows a schematic representation of various cases of inserting a node into a double linked list. Let us
consider the algorithms of various cases of insertion.
Inserting a node in the front
The algorithm InsertFront_DL is used to define the insertion operation at the front of a double linked list.



Inserting a node at the end
The algorithm InsertEnd_DL is to insert a node at the end into a double linked list.

Inserting a node at any position in the list :


The algorithm InsertAny_DL is used to insert a node at any position into a double linked list.



Deleting a node from a double linked list
Deleting a node from a double linked list may take place from any position in the list, as shown in the figure.
Let us consider each of these cases separately.
Deleting a node from the front of a double linked list



Deleting a node at the end of a double linked list

Deleting a node from any position in a double linked list

Circular double Linked List:


The advantages of both double linked list and circular linked list are incorporated into another type of list
structure called circular double linked list and it is known to be the best of its kind. Figure shows a schematic
representation of a circular double linked list.
Here, note that the RLINK (right link) of the rightmost node and LLINK (left link) of the leftmost node
contain the address of the header node; again the RLINK and LLINK of the header node contain the address
of the rightmost node and the leftmost node, respectively. An empty circular double linked list is represented
as shown in Figure. In case of an empty list, both LLINK and RLINK of the header node point to itself.



Insertion into a doubly circular linked list:
void dinsert(nodePointer node, nodePointer newnode)
{ /* insertion into a doubly linked list: insert newnode to the right of node */
newnode->llink = node;
newnode->rlink = node->rlink;
node->rlink->llink = newnode;
node->rlink = newnode;
}

Deletion from a doubly circular linked list:


void ddelete(nodePointer node, nodePointer deleted)
{ /* deletion from a doubly linked list; node points to the header node */
if (node == deleted)
printf("Deletion of header node not permitted.\n");
else
{
deleted->llink->rlink = deleted->rlink;
deleted->rlink->llink = deleted->llink;
free(deleted);
}
}


UNIT – 3
Trees :Introduction, Binary Trees, Binary Tree Traversals, Additional Binary Tree Operations, Binary Search
Trees, Counting Binary Trees, Optimal Binary search Trees, AVL Trees. B-Trees: B- Trees, B + Trees.

TREES
TREE:
• This is a finite set of one or more nodes such that
1) There is a specially designated node called root.
2) The remaining nodes are partitioned into disjoint sets T1, T2, ..., Tn, where each of these sets is called a
subtree of the root (Figure 5.2).
Consider the tree shown below

Tree Terminology

Root
✓ The first node from where the tree originates is called as a root node.
✓ In any tree, there must be only one root node.
✓ We can never have multiple root nodes in a tree data structure.

Here, node A is the only root node



Edge
✓ The connecting link between any two nodes is called as an edge.
✓ In a tree with n nodes, there are exactly (n-1) edges.

Example

Parent
✓ The node which has a branch from it to any other node is called as a parent node.
✓ In other words, the node which has one or more children is called as a parent node.
✓ In a tree, a parent node can have any number of child nodes.
Example

Here,
• Node A is the parent of nodes B and C
• Node B is the parent of nodes D, E and F
• Node C is the parent of nodes G and H
• Node E is the parent of nodes I and J
• Node G is the parent of node K
Child
✓ The node which is a descendant of some node is called as a child node.
✓ All the nodes except root node are child nodes.
Example



Here,
• Nodes B and C are the children of node A
• Nodes D, E and F are the children of node B
• Nodes G and H are the children of node C
• Nodes I and J are the children of node E
• Node K is the child of node G

Siblings
✓ Nodes which belong to the same parent are called as siblings.
✓ In other words, nodes with the same parent are sibling nodes.
Example

Here,
• Nodes B and C are siblings
• Nodes D, E and F are siblings
• Nodes G and H are siblings
• Nodes I and J are siblings

Degree
✓ Degree of a node is the total number of children of that node.
✓ Degree of a tree is the highest degree of a node among all the nodes in the tree.

Example

Here,
• Degree of node A = 2
• Degree of node B = 3
• Degree of node C = 2
• Degree of node D = 0



• Degree of node E = 2
• Degree of node F = 0
• Degree of node G = 1
• Degree of node H = 0
• Degree of node I = 0
• Degree of node J = 0
• Degree of node K = 0

Internal Node
✓ The node which has at least one child is called as an internal node.
✓ Internal nodes are also called as non-terminal nodes.
✓ Every non-leaf node is an internal node.
Example

Here, nodes A, B, C, E and G are internal nodes.


Leaf Node
✓ The node which does not have any child is called as a leaf node.
✓ Leaf nodes are also called as external nodes or terminal nodes.

Here, nodes D, I, J, F, K and H are leaf nodes.


Level
✓ In a tree, each step from top to bottom is called as level of a tree.
✓ The level count starts with 0 and increments by 1 at each level or step.
Example



Height
✓ Total number of edges that lies on the longest path from any leaf node to a particular node is called as
height of that node.
✓ Height of a tree is the height of root node.
✓ Height of all leaf nodes = 0
Example

Here
• Height of node A = 3
• Height of node B = 2
• Height of node C = 2
• Height of node D = 0
• Height of node E = 1
• Height of node F = 0
• Height of node G = 1
• Height of node H = 0
• Height of node I = 0
• Height of node J = 0
• Height of node K = 0

Depth

✓ Total number of edges from root node to a particular node is called as depth of that node.
✓ Depth of a tree is the total number of edges from root node to a leaf node in the longest path.
✓ Depth of the root node = 0
✓ The terms “level” and “depth” are used interchangeably.

Example

Here
• Depth of node A = 0
• Depth of node B = 1
• Depth of node C = 1
• Depth of node D = 2
• Depth of node E = 2
• Depth of node F = 2
• Depth of node G = 2
• Depth of node H = 2
• Depth of node I = 3
• Depth of node J = 3
• Depth of node K = 3

Sub-tree
✓ In a tree, each child from a node forms a sub-tree recursively.
✓ Every child node forms a sub-tree on its parent node.
Example

Forest
✓ A forest is a set of disjoint trees.

Advantages of Tree
✓ Tree reflects structural relationships in the data.
✓ It is used to represent hierarchies.
✓ It provides an efficient insertion and searching operations.
Trees are flexible. It allows to move subtrees around with minimum effort.
TERMINOLOGIES USED IN A TREE
• Node contains
✓ item of information &
✓ links to other nodes
• Number of subtrees of a node is called its degree.
For e.g., degree of A=3; degree of C=1
• Nodes with degree = 0 are called terminal (leaf) nodes (For e.g., K, L, F, G, M, I, J), whereas the other nodes are referred to as non-terminals (For e.g., A, B, C, D, E, H).
• The subtrees of a node A are the children of A. A is the parent of its children. For e.g., children of D
are H, I and J. Parent of D is A.
• Children of same parent are called siblings. For e.g., H, I and J are siblings.
• The maximum number of children that is possible for a node is known as the degree of that node.
• Degree of a tree is the maximum of the degrees of the nodes in the tree. Degree of the given tree = 3.
• Ancestors of a node are all nodes along the path from root to that node.
For e.g., ancestors of M are A, D and H
• If a node is at level 'l', then its children are at level 'l+1'.
• Height (or depth) of a tree is defined as the maximum level of any node in the tree. For e.g., height of the
given tree = 4 (along the path A-B-E-K).

REPRESENTATION OF TREES
A tree can be represented in three forms, namely:
1) List representation
2) Left-child right-sibling representation
3) Degree-two tree representation (Binary Tree)
LIST REPRESENTATION
Consider the tree shown below

• The tree can be drawn as a list: (A(B(E(K,L),F),C(G),D(H(M),I,J)))


• The information in the root node comes first, followed by a list of subtrees of that node.
• Each tree-node can be represented by a memory-node that has
➢ Fields for data &
➢ Pointers to children of tree-node (Figure 5.3)



• For a tree of degree ‘k’, we can use the node-structure as shown below (Figure 5.4).

LEFT CHILD-RIGHT SIBLING REPRESENTATION


• Figure 5.5 shows the node-structure used in left child-right sibling representation.
Data
Left child Right sibling
Figure 5.5: Left child-Right sibling node structure
• Left-child field of each node points to its leftmost child (if any),and
right-sibling field points to the closest right sibling(if any). (Figure 5.6)

DEGREE-TWO TREE REPRESENTATION


• To obtain the two-degree representation of a tree, we simply rotate the right-sibling pointers in a left
child-right sibling tree clockwise by 45 degrees.
• In this representation, we refer to 2 children of a node as left & right children (Fig 5.7).
• Left child-right child trees are also known as binary trees.



BINARY TREE
• This is a finite set of nodes that is either empty or consists of
✓ A root &
✓ Two disjoint binary trees called the left subtree and the right subtree
ADT Binary_Tree(abbreviated BinTree) is
objects: a finite set of nodes either empty or consisting of a root node, left Binary_Tree, and right
Binary_Tree.
functions:
for all bt, bt1, bt2 ∈ BinTree, item ∈ element
Bintree Create() ::= creates an empty binary tree
Boolean IsEmpty(bt) ::= if (bt==empty binary tree)
return TRUE else return FALSE
BinTree MakeBT(bt1, item, bt2) ::= return a binary tree whose left subtree is bt1, whose
right subtree is bt2, and whose root node contains the
data item
Bintree Lchild(bt) ::= if (IsEmpty(bt))



return error else return the left subtree of bt.
element Data(bt) ::= if (IsEmpty(bt))
return error else return the data in the root node of bt
Bintree Rchild(bt) ::= if (IsEmpty(bt)) return error else return the right
subtree of bt
• Difference between a binary tree and a tree:
1) There is no tree having zero nodes, but there is an empty binary tree.
2) In a binary tree we distinguish between the order of the children, while in a tree we do not; i.e., in a
binary tree a node may have at most two children (that is, degree at most 2), whereas in a tree a node may
have any number of children.
TYPES OF BINARY TREE
1) Skewed tree is a tree consisting of only left subtrees or only right subtrees (Figure 5.10a).
2) A binary tree is a full binary tree if it contains the maximum possible number of nodes at all levels.
A full binary tree of depth k has 2^k - 1 nodes, k>=0 (Figure 5.11).
3) Complete tree is a binary tree in which every level except possibly the last is completely filled. A binary
tree with n nodes and depth k is complete iff its nodes correspond to the nodes numbered 1 to n in the full
binary tree of depth k (Figure 5.10b).

PROPERTIES OF BINARY TREES


• The maximum number of nodes on level 'i' of a binary tree is 2^(i-1), i>=1. (For e.g., maximum number of
nodes on level 4 = 2^(4-1) = 2^3 = 8.)
• The maximum number of nodes in a binary tree of depth 'k' is 2^k - 1, k>=1. (For e.g., maximum number of
nodes with depth 4 = 2^4 - 1 = 16 - 1 = 15.)
• Relation between number of leaf nodes and degree-2 nodes: For any non-empty binary tree 'T', if n0 is
the number of leaf nodes and n2 the number of nodes of degree 2, then n0 = n2 + 1.
BINARY TREE REPRESENTATIONS
A binary tree can be represented in two forms, namely: 1) Array Representation 2) Linked Representation
ARRAY REPRESENTATION
• We can use a one-dimensional array to store nodes of binary tree (Figure 5.12).
Theorem: If a complete binary tree with 'n' nodes is represented sequentially, then for any node with index i
(1<=i<=n), we have
1) parent(i) is at [i/2] if i != 1.
If i = 1, i is the root and has no parent.
2) leftChild(i) is at 2i if 2i <= n.
If 2i > n, then i has no left child.
3) rightChild(i) is at 2i+1 if 2i+1 <= n.
If 2i+1 > n, then i has no right child.
• Consider the tree shown below

Advantage: For a complete binary tree, array representation is ideal, as no space is wasted.
Disadvantage: For a skewed tree, less than half the array is utilized. In the worst case, a skewed tree of depth k
will require 2^k - 1 spaces. Of these, only k will be used.
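A small sketch of the index arithmetic implied by the theorem above (0 is used here as a "does not exist" marker, which is safe because valid indices start at 1; n is the number of nodes):

/* 1-based index arithmetic for the sequential (array) representation */
int parent(int i)            { return (i == 1) ? 0 : i / 2; }
int leftChild(int i, int n)  { return (2 * i     <= n) ? 2 * i     : 0; }
int rightChild(int i, int n) { return (2 * i + 1 <= n) ? 2 * i + 1 : 0; }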
LINKED REPRESENTATION
• Shortcoming of array representation: Insertion and deletion of nodes from middle of a tree requires
movement of potentially many nodes to reflect the change in level number of these nodes. These
problems can be overcome easily through the use of a linked representation (Figure 5.14).
• Each node has three fields:
1) leftChild,
2) data and
3) rightChild (Figure 5.13).
typedef struct node *treePointer;
typedef struct node
{
    int data;
    treePointer leftChild, rightChild;
} node;
• Root of tree is stored in the data member 'root' of Tree. This data member serves as access-pointer to the
tree.



BINARY TREE TRAVERSALS
• Tree traversal refers to process of visiting all nodes of a tree exactly once (Figure 5.16).
• There are 3 techniques, namely:
1) Inorder traversal(LVR);
2) Preorder traversal(VLR);
3) Postorder traversal(LRV). (Let L= moving left, V= visiting node and R=moving right).
• In postorder, we visit a node after we have traversed its left and right subtrees.
In preorder, the visiting node is done before traversal of its left and right subtrees.
In inorder, firstly node’s left subtrees is traversed, then node is visited and finally node’s
right subtrees is traversed.

INORDER TRAVERSAL (LVR):


• Inorder traversal calls for moving down tree toward left until you can go no farther (Program 5.1).
• Then, you "visit" the node, move one node to the right and continue.



• If you cannot move to the right, go back one more node.
void inorder(tree_pointer ptr)
{
    /* inorder tree traversal */
    if (ptr)
    {
        inorder(ptr->left_child);
        printf("%d", ptr->data);
        inorder(ptr->right_child);
    }
}
Program 5.1: Inorder traversal of binary trees

Figure 5.17: Trace of Program 5.1

Call of inorder  Value in root  Action  |  Call of inorder  Value in root  Action
1                +                      |  11               C
2                *                      |  12               NULL
3                *                      |  11               C              printf
4                /                      |  13               NULL
5                A                      |  2                *              printf
6                NULL                   |  14               D
5                A              printf  |  15               NULL
7                NULL                   |  14               D              printf
4                /              printf  |  16               NULL
8                B                      |  1                +              printf
9                NULL                   |  17               E
8                B              printf  |  18               NULL
10               NULL                   |  17               E              printf
3                *              printf  |  19               NULL

Each step of the trace shows the call of inorder, the value in the root, and whether or not the printf function is invoked (Figure 5.17).
• Since there are 19 nodes in the tree, inorder() is invoked 19 times for the complete traversal. The nodes
of figure 5.16 would be output in an inorder as
A/B*C*D+E
PREORDER TRAVERSAL (VLR):
• Visit a node, traverse left, and continue (Program 5.2).
• When you cannot continue, move right and begin again or move back until you can move right and
resume.
• The nodes of figure 5.16 would be output in preorder as



+**/ABCDE

POSTORDER TRAVERSAL (LRV):


• Visit a node, traverse right, and continue (Program 5.3).
• When you cannot continue, move left and begin again or move back until you can move left and resume.
• The nodes of figure 5.16 would be output in postorder as
AB/C*D*E+
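As sketches (assuming <stdio.h> and the same tree_pointer node type used by the inorder routine above), preorder and postorder can be written in the same recursive style:

void preorder(tree_pointer ptr)
{
    /* preorder: visit (V), then left (L), then right (R) */
    if (ptr)
    {
        printf("%d", ptr->data);
        preorder(ptr->left_child);
        preorder(ptr->right_child);
    }
}

void postorder(tree_pointer ptr)
{
    /* postorder: left (L), then right (R), then visit (V) */
    if (ptr)
    {
        postorder(ptr->left_child);
        postorder(ptr->right_child);
        printf("%d", ptr->data);
    }
}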

ITERATIVE INORDER TRAVERSAL


• With respect to Figure 5.17, a node that has no action indicates that the node is added to the stack, while a node that
has a printf action indicates that the node is removed from the stack (Program 5.4).
• The left nodes are stacked until a null node is reached, the node is then removed from the stack, and the
node's right child is stacked.
• The traversal continues with the left child.
• The traversal is complete when the stack is empty.
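A sketch in that spirit, using a simple array as the stack (MAX_STACK_SIZE is an assumed limit and <stdio.h> is assumed; the tree_pointer type is the one used above):

#define MAX_STACK_SIZE 100

void iter_inorder(tree_pointer node)
{
    tree_pointer stack[MAX_STACK_SIZE];
    int top = -1;                              /* empty stack                          */
    for (;;)
    {
        for (; node; node = node->left_child)
            stack[++top] = node;               /* stack the nodes of the left spine    */
        if (top == -1) break;                  /* stack empty: traversal is complete   */
        node = stack[top--];                   /* pop                                  */
        printf("%d", node->data);              /* visit                                */
        node = node->right_child;              /* continue with the right child        */
    }
}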

LEVEL-ORDER TRAVERSAL
• This traversal uses a queue (Program 5.7).



• We visit the root first, then the root's left child followed by the root's right child.
• We continue in this manner, visiting the nodes at each new level from the leftmost node to the rightmost
node.
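A sketch of level-order traversal using a simple array-based queue (MAX_QUEUE_SIZE is an assumed limit; <stdio.h> and the tree_pointer type are assumed as before):

#define MAX_QUEUE_SIZE 100

void level_order(tree_pointer ptr)
{
    tree_pointer queue[MAX_QUEUE_SIZE];
    int front = 0, rear = 0;                   /* empty queue        */
    if (!ptr) return;                          /* empty tree         */
    queue[rear++] = ptr;                       /* enqueue the root   */
    while (front < rear)
    {
        ptr = queue[front++];                  /* dequeue            */
        printf("%d", ptr->data);               /* visit              */
        if (ptr->left_child)  queue[rear++] = ptr->left_child;
        if (ptr->right_child) queue[rear++] = ptr->right_child;
    }
}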

Example for Traversals:

Here Visit 1 is: VLR (Visit, Left, Right)


Visit 2 is: LVR (Left, Visit, Right)



Visit 3 is: LRV (Left, Right, Visit)
Visit 4 is: RLV (Right, Left, Visit)
Visit 5 is: RVL (Right, Visit, Left)
Visit 6 is: VRL (Visit, Right, Left)

Binary Tree
✓ Binary tree is a special tree data structure in which each node can have at most 2 children.
✓ Thus, in a binary tree, each node has either 0 child or 1 child or 2 children.

Binary Tree Properties


1. Minimum number of nodes in a binary tree of height H = H + 1
Example
To construct a binary tree of height = 4, we need at least 4 + 1 = 5 nodes.

2. Maximum number of nodes in a binary tree of height H = 2^(H+1) – 1

Example
Maximum number of nodes in a binary tree of height 3
= 2^(3+1) – 1
= 16 – 1
= 15 nodes
Thus, in a binary tree of height = 3, the maximum number of nodes that can be inserted = 15.



We cannot insert any more nodes into this binary tree.

3. Total Number of leaf nodes in a Binary Tree = Total Number of nodes with 2 children + 1

Example

Here

Number of leaf nodes = 3


Number of nodes with 2 children = 2

Clearly, number of leaf nodes is one greater than number of nodes with 2 children.
This verifies the above relation.

4. Maximum number of nodes at any level 'L' in a binary tree = 2^L

Example
Maximum number of nodes at level-2 in a binary tree
= 2^2
= 4

Thus, in a binary tree, maximum number of nodes that can be present at level-2 = 4.



Types of Binary Trees
✓ Rooted Binary Tree
✓ Full / Strictly Binary Tree
✓ Complete / Perfect Binary Tree
✓ Almost Complete Binary Tree
✓ Skewed Binary Tree

1. Rooted Binary Tree


A rooted binary tree is a binary tree that satisfies the following 2 properties
✓ It has a root node.
✓ Each node has at most 2 children.
Example:

2. Full / Strictly Binary Tree-


A binary tree in which every node has either 0 or 2 children is called as a full binary tree.
Full binary tree is also called as strictly binary tree.
Example:

Here
• First binary tree is not a full binary tree.
• This is because node C has only 1 child.

3. Complete / Perfect Binary Tree

A complete binary tree is a binary tree that satisfies the following 2 properties
✓ Every internal node has exactly 2 children.
✓ All the leaf nodes are at the same level.
Complete binary tree is also called as Perfect binary tree.



Example:

Here
• First binary tree is not a complete binary tree.
• This is because all the leaf nodes are not at the same level.

4. Almost Complete Binary Tree

An almost complete binary tree is a binary tree that satisfies the following 2 properties
✓ All the levels are completely filled except possibly the last level.
✓ The last level must be strictly filled from left to right.
Example:

5. Skewed Binary Tree


A skewed binary tree is a binary tree that satisfies the following 2 properties
✓ All the nodes except one node have one and only one child.
✓ The remaining node has no child.
OR
✓ A skewed binary tree is a binary tree of n nodes such that its depth is (n-1).
Example:



Binary Tree Representations
A binary tree data structure is represented using two methods. Those methods are as follows...
✓ Array Representation
✓ Linked List Representation

Consider the following binary tree

1. Array Representation of Binary Tree


In array representation of a binary tree, we use one-dimensional array (1-D Array) to represent a binary
tree.

To represent a binary tree of depth 'n' using array representation, we need a one-dimensional array with a
maximum size of 2^(n+1).

2. Linked List Representation of Binary Tree


We use a double linked list to represent a binary tree. In a double linked list, every node consists of three
fields. First field for storing left child address, second for storing actual data and third for storing right child
address.

The above example of the binary tree represented using Linked list representation is

Linked Representation
struct node
{
    int data;
    struct node *left;
    struct node *right;
};

Binary Tree Traversals


Traversal is a process to visit all the nodes of a tree and may print their values too. Because all nodes
are connected via edges (links), we always start from the root (head) node. That is, we cannot randomly access a
node in a tree. There are three ways in which we traverse a tree.

✓ Pre-order Traversal
✓ In-order Traversal
✓ Post-order Traversal

Pre-order Traversal
In this traversal method, the root node is visited first, then the left sub tree and finally the right sub tree.

We start from A, and following pre-order traversal, we first visit A itself and then move to its left sub
tree B. B is also traversed pre-order. The process goes on until all the nodes are visited. The output of pre-order
traversal of this tree will be
A→B→D→E→C→F→G
Steps
✓ Visit the root node
✓ traverse the left sub-tree in pre-order
✓ traverse the right sub-tree in pre-order
Root → Left → Right

Algorithm
Step 1: Repeat Steps 2 to 4 while TREE != NULL



Step 2: Write TREE -> DATA
Step 3: PREORDER (TREE -> LEFT)
Step 4: PREORDER (TREE -> RIGHT)
[END OF LOOP]
Step 5: END

Traverse the entire tree starting from the root node keeping yourself to the left.

Example
Traverse the following binary tree by using pre-order traversal

✓ Since, the traversal scheme, we are using is pre-order traversal, therefore, the first element to be printed
is 18.
✓ Traverse the left sub-tree recursively. The root node of the left sub-tree is 211, print it and move to left.
✓ Left is empty, therefore print the right child and move to the right sub-tree of the root.
✓ 20 is the root of that sub-tree, therefore print it and move to its left. Since the left sub-tree is empty,
move to the right and print the only element present there, i.e. 190.
✓ Therefore, the printing sequence will be 18, 211, 90, 20, and 190.
Applications
✓ Preorder traversal is used to get prefix expression of an expression tree.
✓ Preorder traversal is used to create a copy of the tree.

In order Traversal
In this traversal method, the left sub tree is visited first, then the root and later the right sub-tree. We
should always remember that every node may represent a sub tree itself.
If a binary tree is traversed in-order, the output will produce sorted key values in an ascending order.



We start from A, and following in-order traversal, we move to its left sub tree B. B is also traversed in-
order. The process goes on until all the nodes are visited. The output of in order traversal of this tree will be −
D→B→E→A→F→C→G
Steps
✓ Traverse the left sub-tree in in-order
✓ Visit the root
✓ Traverse the right sub-tree in in-order
Left → Root → Right
Algorithm
Step 1: Repeat Steps 2 to 4 while TREE != NULL
Step 2: INORDER (TREE –> LEFT)
Step 3: Write TREE –> DATA
Step 4: INORDER (TREE –> RIGHT)
[END OF LOOP]
Step 5: END
Example

Traverse the following binary tree by using in-order traversal.

✓ Print the left most node of the left sub-tree i.e. 23.
✓ Print the root of the left sub-tree i.e. 211.
✓ Print the right child i.e. 89.
✓ Print the root node of the tree i.e. 18.



✓ Then, move to the right sub-tree of the binary tree and print the left most node i.e. 10.
✓ Print the root of the right sub-tree i.e. 20.
✓ Print the right child i.e. 32.
✓ Hence, the printing sequence will be 23, 211, 89, 18, 10, 20, and 32.
Application
In order traversal is used to get infix expression of an expression tree.
Post-order Traversal
In this traversal method, the root node is visited last, hence the name. First we traverse the left sub tree,
then the right sub tree and finally the root node.

We start from A, and following Post-order traversal, we first visit the left subtree B. B is also traversed
post-order. The process goes on until all the nodes are visited. The output of post-order traversal of this tree will
be
D→E→B→F→G→C→A
Steps
✓ Traverse the left sub-tree in post-order
✓ Traverse the right sub-tree in post-order
✓ visit the root
Left → Right → Root
Algorithm
Step 1: Repeat Steps 2 to 4 while TREE != NULL
Step 2: POSTORDER (TREE -> LEFT)
Step 3: POSTORDER (TREE -> RIGHT)
Step 4: Write TREE -> DATA
[END OF LOOP]
Step 5: END
Example
Traverse the following tree by using post-order traversal



✓ Print the left child of the left sub-tree of binary tree i.e. 23.
✓ Print the right child of the left sub-tree of binary tree i.e. 89.
✓ Print the root node of the left sub-tree i.e. 211.
✓ Now, before printing the root node, move to right sub-tree and print the left child i.e. 10.
✓ Print 32 i.e. right child.
✓ Print the root node 20.
✓ Now, at the last, print the root of the tree i.e. 18.
✓ The printing sequence will be 23, 89, 211, 10, 32, and 18.
Applications
✓ Post order traversal is used to get postfix expression of an expression tree.
✓ Post order traversal is used to delete the tree.
✓ This is because it deletes the children first and then it deletes the parent.

Breadth First Traversal


✓ Breadth First Traversal of a tree prints all the nodes of a tree level by level.
✓ Breadth First Traversal is also called as Level Order Traversal.
Example

Application
Level order traversal is used to print the data in the same order as stored in the array representation of a
complete binary tree.

Additional Binary Tree Operations

Copying Binary Trees


Using the definition of a binary tree and the recursive version of the traversals, we can easily write other
routines for working with binary trees. For instance, if we want to implement a copy constructor to initialize a
binary tree with an exact copy of another binary tree, we can modify the postorder traversal algorithm only
slightly to get Program
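A C sketch of such a copy routine: a postorder-style recursion that copies both subtrees first and then the root, assuming the treePointer node type declared earlier and malloc from <stdlib.h>.

#include <stdlib.h>

treePointer copyTree(treePointer original)
{
    treePointer temp;
    if (!original) return NULL;
    temp = malloc(sizeof(*temp));          /* new node for the copy       */
    temp->leftChild  = copyTree(original->leftChild);
    temp->rightChild = copyTree(original->rightChild);
    temp->data       = original->data;     /* copy the information field  */
    return temp;
}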

Testing Equality
Another problem that is especially easy to solve using recursion is determining the equivalence of two
binary trees. Binary trees are equivalent if they have the same topology and the information in corresponding
nodes is identical. By the same topology we mean that every branch in one tree corresponds to a branch in the
second in the same order and vice versa.
The function operator==() calls the helper function equal (Program 5.10), which traverses the binary trees
in preorder, though any order could be used.
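A C sketch of such an equal helper (the field names follow the treePointer node type used earlier; this is a sketch, not the textbook's listing): it returns 1 exactly when both trees have the same topology and identical data in corresponding nodes.

int equal(treePointer first, treePointer second)
{
    if (!first && !second) return 1;        /* both trees are empty        */
    if (!first || !second) return 0;        /* only one of them is empty   */
    return first->data == second->data      /* the roots must match        */
        && equal(first->leftChild,  second->leftChild)
        && equal(first->rightChild, second->rightChild);
}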



The Satisfiability Problem
Consider the set of formulas we can construct by taking variables x1, x2, ..., xn and the operators ˄ (and),
˅ (or), and ¬ (not). These variables can hold only one of two possible values, true or false. The set of
expressions that can be formed using these variables and operators is defined by the following rules:

(1) A variable is an expression.


(2) if x and y are expressions, then ¬x, x ˄ y, x˅ y are expressions.
(3) Parentheses can be used to alter the normal order of evaluation, which is ¬ before ˄ before ˅.

This set defines the formulas of the propositional calculus (other operations such as implication can be
expressed using ˅,˄ and ¬).
x1 ˅(x2˄¬x3)

is a formula (read as “x1 or x2 and not x3”). If x1 and x3 are false and x2 is true, then the value of expression is:
false ˅ (true ˄ ¬false)
=false ˅ true
= true
The satisfiability problem for formulas of propositional calculus asks if there is an assignment of values to the
variables that causes the value of the expression to be true.
Again, let us assume that our formula is already in a binary tree, say

(x1˄¬x2)˅(¬x1˄x3)˅¬x3
In the tree of below fig., the inorder traversal of this tree is:
x1˄¬x2˅¬x1˄x3˅¬x3
which is the infix form of the expression. The most obvious algorithm to determine satisfiability is to let (x1, x2,
x3) take on all possible combinations of true and false values and to check the formula for each combination.
For n variables there are 2^n possible combinations of true = t and false = f. For example, for n = 3, the eight
combinations are: (t,t,t), (t,t,f), (t,f,t), (t,f,f), (f,t,t), (f,t,f), (f,f,t), (f,f,f). The algorithm will take O(g*2^n), or
exponential, time, where g is the time needed to substitute values for x1, x2, ..., xn and evaluate the expression.

Propositional formula in a binary tree


To evaluate an expression we can traverse its tree in post order, evaluating sub trees until the entire
expression is reduced to a single value. This corresponds to the postfix evaluation of an arithmetic expression
that we saw earlier. Viewing this from the perspective of the tree representation, for every node we reach, the
values of its arguments (or children) have already been computed. So when we reach the ˅ node on level 2, the
values of x1˄¬x2 and ¬x1˄x3 will already be available to us, and we can apply the rule for or. Notice that a
node containing ¬ has only a right branch, since ¬ is a unary operator. For the purposes of our evaluation
algorithm, we assume each node has four fields:

leftChild | data | value | rightChild

This is the node structure. The left child and right child fields are similar to those used previously. The fields
data holds either the value of a variable or a propositional calculus operator, while value holds either a value of
TRUE and FALSE.

We define node structure in C as:


typedef enum
{
not,
and,
or,
true,
false
} logical;
typedef struct node *treePointer;
typedef struct node
{
    treePointer leftChild;
    logical data;
    short int value;
    treePointer rightChild;
} node;

Also we assume that for leaf nodes, node->data contains the current value (i.e., logical true or logical false)
of the variable represented at this node. The first version of our algorithm for the satisfiability problem is
Program 5.11. In this, n is the number of variables in the formula and formula is the binary tree that represents
the formula.
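A sketch of the postorder evaluation described above, assuming the node structure just defined and encoding TRUE as 1 and FALSE as 0 in the value field:

void postOrderEval(treePointer node)
{
    if (node)
    {
        postOrderEval(node->leftChild);    /* evaluate the left subtree first */
        postOrderEval(node->rightChild);   /* then the right subtree          */
        switch (node->data)
        {
            case not:   node->value = !node->rightChild->value; break;
            case and:   node->value = node->leftChild->value &&
                                       node->rightChild->value; break;
            case or:    node->value = node->leftChild->value ||
                                       node->rightChild->value; break;
            case true:  node->value = 1; break;   /* leaf currently set to true  */
            case false: node->value = 0; break;   /* leaf currently set to false */
        }
    }
}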

Binary Search Trees

✓ Binary Search tree can be defined as a class of binary trees, in which the nodes are arranged in a specific
order. This is also called ordered binary tree.
✓ In a binary search tree, the value of all the nodes in the left sub-tree is less than the value of the root.
✓ Similarly, value of all the nodes in the right sub-tree is greater than or equal to the value of the root.
✓ This rule will be recursively applied to all the left and right sub-trees of the root.



A Binary search tree is shown in the above figure. As the constraint applied on the BST, we can see that
the root node 30 doesn't contain any value greater than or equal to 30 in its left sub-tree and it also doesn't
contain any value less than 30 in its right sub-tree.
Advantages of using binary search tree
✓ Searching becomes very efficient in a binary search tree since, at each step, we get a hint about which
sub-tree contains the desired element.
✓ The binary search tree is considered an efficient data structure in comparison to arrays and linked lists. In
the searching process, it removes half of the sub-tree at every step. Searching for an element in a binary search tree
takes O(log2 n) time on average; in the worst case, the time it takes to search an element is O(n).
✓ It also speeds up the insertion and deletion operations as compared to arrays and linked lists.

Create the binary search tree using the following data elements.
43, 10, 79, 90, 12, 54, 11, 9, 50
✓ Insert 43 into the tree as the root of the tree.
✓ Read the next element; if it is less than the root node element, insert it as the root of the left sub-tree.
✓ Otherwise, insert it as the root of the right sub-tree.



Operations on Binary Search Tree
Searching means finding or locating some specific element or node within a data structure. However,
searching for some specific node in binary search tree is pretty easy due to the fact that, element in BST is
stored in a particular order.

✓ Compare the element with the root of the tree.


✓ If the item is matched then return the location of the node.
✓ Otherwise check if item is less than the element present on root, if so then move to the left sub-tree.
✓ If not, then move to the right sub-tree.
✓ Repeat this procedure recursively until match found.
✓ If element is not found then return NULL.

Algorithm:
Search (ROOT, ITEM)

Step 1: IF ROOT = NULL OR ROOT -> DATA = ITEM
Return ROOT
ELSE
IF ITEM < ROOT -> DATA
Return search (ROOT -> LEFT, ITEM)
ELSE
Return search (ROOT -> RIGHT, ITEM)
[END OF IF]



[END OF IF]
Step 2: END
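A C sketch of the recursive search just described, assuming the treePointer node type with data, leftChild and rightChild fields:

treePointer search(treePointer root, int item)
{
    if (!root || root->data == item)
        return root;                           /* found, or reached an empty subtree */
    if (item < root->data)
        return search(root->leftChild, item);  /* search the left subtree            */
    return search(root->rightChild, item);     /* search the right subtree           */
}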

Example

Insertion
Insert function is used to add a new element in a binary search tree at the appropriate location. The insert
function is to be designed in such a way that it must not violate the property of the binary search tree at any
node.

✓ Allocate the memory for tree.


✓ Set the data part to the value and set the left and right pointers of the node to NULL.
✓ If the item to be inserted, will be the first element of the tree, then the left and right of this node will
point to NULL.
✓ Else, check if the item is less than the root element of the tree, if this is true, then recursively perform
this operation with the left of the root.
✓ If this is false, then perform this operation recursively with the right sub-tree of the root.

Algorithm for Insert (TREE, ITEM)


Step 1: IF TREE = NULL
Allocate memory for TREE
SET TREE -> DATA = ITEM
SET TREE -> LEFT = TREE -> RIGHT = NULL
ELSE
IF ITEM < TREE -> DATA
Insert (TREE -> LEFT, ITEM)
ELSE
Insert (TREE -> RIGHT, ITEM)
[END OF IF]
[END OF IF]
Step 2: END
ITEM=95



1) We must first verify if the tree already contains the node with the same data (Figure 5.29 & Program 5.15).
2) If the search is successful, then the new node cannot be inserted into the binary search tree.
3) If the search is unsuccessful, then we can insert the new node at that point where the search terminated.

void insert_node(tree_pointer *node, int num)
{
    tree_pointer ptr, temp = modified_search(*node, num);
    if (temp || !(*node))
    {
        ptr = (tree_pointer) malloc(sizeof(*ptr));   /* allocate the new node */
        if (IS_FULL(ptr))
        {
            fprintf(stderr, "The memory is full\n");
            exit(1);
        }
        ptr->data = num;
        ptr->left_child = ptr->right_child = NULL;
        if (*node)
        {
            if (num < temp->data)
                temp->left_child = ptr;
            else
                temp->right_child = ptr;
        }
        else
            *node = ptr;                             /* tree was empty: ptr becomes the root */
    }
}
Program: Insertion into a binary search tree
Deletion
Delete function is used to delete the specified node from a binary search tree. However, we must delete a
node from a binary search tree in such a way, that the property of binary search tree doesn't violate. There are
three situations of deleting a node from binary search tree.

1) Deletion of a leaf: To delete 35 from the tree of figure 5.29b, the left-child field of its parent is set to NULL.



2) Deletion of a non-leaf that has only one child: The node containing the dictionary pair to be deleted is freed,
and its single-child takes the place of the freed node. So, to delete the 5 from the tree in figure 5.29a, we simply
change the pointer from the parent node to the single-child node.
3) The pair to be deleted is in a non-leaf node that has two children: The pair to be deleted is replaced by either
the largest pair in its left subtree or the smallest one in its right subtree. For instance, if we wish to delete the
pair with key 30 from the tree in figure 5.29b, then we replace it by key 5 as shown in figure.

The node to be deleted is a leaf node


It is the simplest case: replace the leaf node with NULL and simply free the allocated space.
In the following image, we are deleting node 85; since it is a leaf node, it will be replaced with NULL
and the allocated space will be freed.

The node to be deleted has only one child.


In this case, replace the node with its child and delete the child node, which now contains the value
to be deleted. Simply replace it with NULL and free the allocated space.
In the following example, node 12 is to be deleted. It has only one child. The node will be replaced
with its child node, and the replaced node 12 (which is now a leaf node) will simply be deleted.



The node to be deleted has two children.
It is a bit more complex than the other two cases. The node to be deleted is replaced with its in-order
successor or predecessor recursively, until the node value (to be deleted) is placed at a leaf of the tree. After the
procedure, replace the node with NULL and free the allocated space.

In the following Example, the node 50 is to be deleted which is the root node of the tree. The in-order traversal
of the tree given below
6, 25, 30, 50, 52, 60, 70, 75
Replace 50 with its in-order successor 52. Now, 50 will be moved to the leaf of the tree, which will simply be
deleted.
(In-order: LEFT > VISIT (ROOT) > RIGHT)

Algorithm for Delete (TREE, ITEM)

Step 1: IF TREE = NULL


Write "item not found in the tree" ELSE IF ITEM < TREE -> DATA
Delete (TREE->LEFT, ITEM)
ELSE IF ITEM > TREE -> DATA
Delete (TREE -> RIGHT, ITEM)
ELSE IF TREE -> LEFT AND TREE -> RIGHT
SET TEMP = find Largest Node (TREE -> LEFT)
SET TREE -> DATA = TEMP -> DATA
Delete (TREE -> LEFT, TEMP -> DATA)
ELSE
SET TEMP = TREE
IF TREE -> LEFT = NULL AND TREE -> RIGHT = NULL
SET TREE = NULL
ELSE IF TREE -> LEFT != NULL
SET TREE = TREE -> LEFT
ELSE
SET TREE = TREE -> RIGHT
[END OF IF]
FREE TEMP
[END OF IF]
Step 2: END



JOINING BINARY TREE
• There are two types of join operation on a binary search tree:
1) threeWayJoin(small,mid,big): We simply create a new node and set its data field to mid, its left-
child pointer to small and its right-child pointer to big.
2) twoWayJoin(small,big):
i) If small is empty, then the result is big.
ii) If big is empty, then the result is small.
iii) If both are non-empty, then we have to first delete from 'small' the pair mid with the largest key.
After this, a 3-way join of small, mid and big must be performed.
SPLITTING BINARY TREE
• Splitting a binary search tree will produce three trees: small, mid and big.
1) If key is equal to root->data, then root->llink is small, root->data is mid and root->rlink is big.
2) If key is less than root->data, then the root along with its right subtree goes into big.
3) If key is greater than root->data, then the root along with its left subtree goes into small.

AVL Tree
AVL Tree was invented by G.M. Adelson-Velsky and E.M. Landis in 1962. The tree is named AVL in
honor of its inventors.
AVL Tree can be defined as height balanced binary search tree in which each node is associated with a
balance factor which is calculated by subtracting the height of its right sub-tree from that of its left sub-tree.
Tree is said to be balanced if balance factor of each node is in between -1, 0, 1, otherwise, the tree will
be unbalanced and need to be balanced

Definition:
An empty tree is height balanced. If T is a nonempty binary tree with TL and TR as its left and right subtrees
respectively, then T is height-balanced iff (1) TL and TR are height balanced and (2) |hL - hR| ≤ 1, where hL and
hR are the heights of TL and TR, respectively.
Balance Factor (k) = height (left (k)) - height (right (k))
Definition:
The Balance factor, BF(T), of a node T in a binary tree is defined to be hL-hR, where hL and hR, respectively, are
the heights of the left and right subtrees of T. For any node T in an AVL tree, BF=-1,0, or 1.

✓ If balance factor of any node is 1, it means that the left sub-tree is one level higher than the right sub-
tree.
✓ If balance factor of any node is 0, it means that the left sub-tree and right sub-tree contain equal height.
✓ If balance factor of any node is -1, it means that the left sub-tree is one level lower than the right sub-
tree.
✓ An AVL tree is given in the following figure. We can see that, balance factor associated with each node
is in between -1, 0, and +1.
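Following the definition above, a small sketch of how the height and balance factor of a node could be computed (assuming the treePointer node type; an empty subtree is taken to have height -1, so a leaf gets height 0 as in the earlier terminology):

int height(treePointer t)
{
    int hl, hr;
    if (!t) return -1;                  /* empty subtree */
    hl = height(t->leftChild);
    hr = height(t->rightChild);
    return 1 + (hl > hr ? hl : hr);
}

int balanceFactor(treePointer t)
{
    return height(t->leftChild) - height(t->rightChild);   /* hL - hR */
}

Practical AVL implementations usually store the height or balance factor in each node instead of recomputing it on every call.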



AVL Tree Properties
✓ Maximum possible number of nodes in AVL tree of height H = 2^(H+1) – 1
✓ Minimum number of nodes in AVL Tree of height H is given by a recursive relation- N(H) = N(H-1) +
N(H-2) + 1
✓ Minimum possible height of AVL Tree using N nodes = ⌊log2N⌋
✓ Maximum height of AVL Tree using N nodes is calculated using recursive relation- N(H) = N(H-1) +
N(H-2) + 1
NOTE
• If there are n nodes in an AVL Tree, its maximum height cannot exceed 1.44 log2 n.
• In other words, the worst-case height of an AVL Tree with n nodes = 1.44 log2 n.

Complexity for AVL Tree


Algorithm    Average case    Worst case
Space        O(n)            O(n)
Search       O(log n)        O(log n)
Insert       O(log n)        O(log n)
Delete       O(log n)        O(log n)
Rotations

SN Rotation Description
1 LL Rotation The new node is inserted to the left sub-tree of left sub-tree of critical node.

2 RR Rotation The new node is inserted to the right sub-tree of the right sub-tree of the critical node.

3 LR Rotation The new node is inserted to the right sub-tree of the left sub-tree of the critical node.

4 RL Rotation The new node is inserted to the left sub-tree of the right sub-tree of the critical node.



Single Left Rotation (LL Rotation)
In LL Rotation, every node moves one position to left from the current position. To understand LL
Rotation, let us consider the following insertion operation in AVL Tree...

Single Right Rotation (RR Rotation)


In RR Rotation, every node moves one position to right from the current position. To understand RR
Rotation, let us consider the following insertion operation in AVL Tree...

Left Right Rotation (LR Rotation)


The LR Rotation is a sequence of single left rotation followed by a single right rotation. In LR Rotation,
at first, every node moves one position to the left and one position to right from the current position. To
understand LR Rotation, let us consider the following insertion operation in AVL Tree...

Right Left Rotation (RL Rotation)



The RL Rotation is sequence of single right rotation followed by single left rotation. In RL Rotation, at
first every node moves one position to right and one position to left from the current position. To understand RL
Rotation, let us consider the following insertion operation in AVL Tree...
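All four cases are built from two single rotations; a sketch of those two primitives, assuming the treePointer node type (updating any stored heights or balance factors is left to the caller):

treePointer rotateRight(treePointer y)      /* used for the LL case */
{
    treePointer x = y->leftChild;
    y->leftChild  = x->rightChild;          /* x's right subtree moves under y */
    x->rightChild = y;
    return x;                               /* x becomes the new subtree root  */
}

treePointer rotateLeft(treePointer x)       /* used for the RR case */
{
    treePointer y = x->rightChild;
    x->rightChild = y->leftChild;           /* y's left subtree moves under x  */
    y->leftChild  = x;
    return y;                               /* y becomes the new subtree root  */
}

An LR rotation is then a rotateLeft on the left child followed by a rotateRight on the node itself, and an RL rotation is the mirror image of that.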

Operations on an AVL Tree


Due to the fact that, AVL tree is also a binary search tree therefore, all the operations are performed in the
same way as they are performed in a binary search tree. Searching and traversing do not lead to the violation in
property of AVL tree.
However, insertion and deletion are the operations which can violate this property and therefore, they need
to be revisited.

✓ Searching
✓ Insertion
✓ Deletion
Search Operation in AVL Tree
In an AVL tree, the search operation is performed with O (log n) time complexity. The search operation
in the AVL tree is similar to the search operation in a Binary search tree. We use the following steps to search
an element in AVL tree...
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with the value of root node in the tree.
Step 3 - If both are matched, then display "Given node is found!!!" and terminate the function
Step 4 - If both are not matched, then check whether search element is smaller or larger than that node
value.
Step 5 - If search element is smaller, then continue the search process in left subtree.
Step 6 - If search element is larger, then continue the search process in right subtree.
Step 7 - Repeat the same until we find the exact element or until the search element is compared with the
leaf node.
Step 8 - If we reach to the node having the value equal to the search value, then display "Element is
found" and terminate the function.
Step 9 - If we reach to the leaf node and if it is also not matched with the search element, then display
"Element is not found" and terminate the function.



Insertion Operation in AVL Tree
In an AVL tree, the insertion operation is performed with O(log n) time complexity. In AVL Tree, a new
node is always inserted as a leaf node. The insertion operation is performed as follows...

Step 1 - Insert the new element into the tree using Binary Search Tree insertion logic.
Step 2 - After insertion, check the Balance Factor of every node.
Step 3 - If the Balance Factor of every node is 0 or 1 or -1 then go for next operation.
Step 4 - If the Balance Factor of any node is other than 0 or 1 or -1 then that tree is said to be
imbalanced. In this case, perform suitable Rotation to make it balanced and go for next operation.

Construct an AVL Tree by inserting numbers from 1 to 8

Deletion Operation in AVL Tree
The deletion operation in AVL Tree is similar to deletion operation in BST. But after every deletion
operation, we need to check with the Balance Factor condition. If the tree is balanced after deletion go for next
operation otherwise perform suitable rotation to make the tree Balanced.

B - Tree Data structure


In search trees like binary search tree, AVL Tree, Red-Black tree, etc., every node contains only one
value (key) and a maximum of two children. But there is a special type of search tree called B-Tree in which a
node contains more than one value (key) and more than two children. B-Tree was developed in the year 1972 by
Bayer and McCreight with the name Height Balanced m-way Search Tree. Later it was named as B-Tree.

B-Tree defined as

B-Tree is a self-balanced search tree in which every node contains multiple keys and has more than two
children.
Here, the number of keys in a node and number of children for a node depends on the order of B-Tree.
Every B-Tree has an order

B-Tree of Order ‘m’ properties:


Formal Definition:
A B-Tree of order m is an m-way search tree that either is empty or satisfies the following properties:
1) The root node has at least two children.
2) All nodes other than the root node and external nodes have at least [m/2] children.
3) All external nodes are at the same level.

✓ Property #1 - All leaf nodes must be at same level.


✓ Property #2 - All nodes except root must have at least [m/2]-1 keys and maximum of m-1 keys.
✓ Property #3 - All non-leaf nodes except root (i.e. all internal nodes) must have at least [m/2] children.
✓ Property #4 - If the root node is a non leaf node, then it must have at least 2 children.
✓ Property #5 - A non leaf node with n-1 keys must have n number of children.
✓ Property #6 - All the key values in a node must be in Ascending Order.



Observe that when m=3, all internal nodes of a B-Tree have a degree that is either 2 or 3 and when m=4, the
permissible degree for these nodes are 2, 3 and 4. For this reason, a B-Tree of order 3 is known as a 2-3 tree and
a B-Tree of order 4 is known as a 2-3-4 tree.
For example, B-Tree of Order 4 contains a maximum of 3 key values in a node and maximum of 4
children for a node.
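A sketch of what a node of such a B-Tree might look like in C (the field layout and the compile-time order M are illustrative assumptions, not taken from the notes):

#define M 4                                   /* order of the B-Tree              */

typedef struct bnode
{
    int           numKeys;                    /* keys currently stored (<= M-1)   */
    int           key[M - 1];                 /* keys kept in ascending order     */
    struct bnode *child[M];                   /* child pointers (numKeys + 1 used) */
} bnode;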

Operations on a B-Tree
✓ Search
✓ Insertion
✓ Deletion
Search Operation in B-Tree
The search operation in B-Tree is similar to the search operation in Binary Search Tree. In a Binary
search tree, the search process starts from the root node and we make a 2-way decision every time (we go to
either left sub tree or right sub tree). In B-Tree also search process starts from the root node but here we make
an n-way decision every time. Where 'n' is the total number of children the node has. In a B-Tree, the search
operation is performed with O(log n) time complexity.

Step 1 - Read the search element from the user.


Step 2 - Compare the search element with first key value of root node in the tree.
Step 3 - If both are matched, then display "Given node is found!!!" and terminate the function
Step 4 - If both are not matched, then check whether search element is smaller or larger than that key value.
Step 5 - If search element is smaller, then continue the search process in left subtree.
Step 6 - If search element is larger, then compare the search element with next key value in the same node and
repeated steps 3, 4, 5 and 6 until we find the exact match or until the search element is compared with last key
value in the leaf node.
Step 7 - If the last key value in the leaf node is also not matched then display "Element is not found" and
terminate the function.
Insertion Operation in B-Tree

In a B-Tree, a new element must be added only at the leaf node. That means, the new keyValue is always
attached to the leaf node only.
Step 1 - Check whether tree is Empty.



Step 2 - If tree is Empty, then create a new node with new key value and insert it into the tree as a root
node.
Step 3 - If tree is Not Empty, then find the suitable leaf node to which the new key value is added using
Binary Search Tree logic.
Step 4 - If that leaf node has empty position, add the new key value to that leaf node in ascending order
of key value within the node.
Step 5 - If that leaf node is already full, split that leaf node by sending middle value to its parent node.
Repeat the same until the sending value is fixed into a node.
Step 6 - If the splitting is performed at root node then the middle value becomes new root node for the
tree and the height of the tree is increased by one.
Example
Construct a B-Tree of Order 3 by inserting numbers from 1 to 10.

Deletion:
Example:

B+ - Trees:
A B+ - tree is distinguished from B - trees as follows:
1) In a B+ - tree we have two types of nodes – index and data. The index nodes of B+ - tree correspond to the
internal nodes of a B-tree while the data nodes correspond to external nodes. The index nodes store keys and
pointers and the data nodes store elements (together with their keys but no pointers).
2) The data nodes are linked together, in left to right order, to form a doubly linked list.
Following fig gives an example of a B+ - tree of order 3. The data nodes are shaded while the index nodes
are not. In the figure, each data node can hold 3 elements while each index node can hold 2 keys.

Fig11.11: A B+ - tree of order 3

Definition:
A B+ - tree of order m is a tree that either is empty or satisfies the following properties:
1) All the data nodes are at the same level and are leaves. Data nodes contain elements only.
2) The index nodes define a B-tree of order m; each index node has keys but not elements.
3) Let
n, A0, (K1,A1), (K2,A2), ..., (Kn, An)
where Ai, 0≤i≤n<m, are pointers to subtrees, and the Ki, 1≤i≤n<m, are keys, be the format of some
index node. Let K0=-∞ and Kn+1=∞. All the elements in the subtree Ai have key less than Ki+1 and
greater than or equal to Ki, 0≤i≤n.
Searching a B+ - tree:
B+ - tree support two types of searches –
➢ Exact match and
➢ Range

For example, in the above figure we can search for an element whose key is 32. We begin at the root A, which is an
index node. From the definition of a B+ - tree we know that all elements in the left subtree of A have a key
smaller than 20; those in the subtree with root C have keys ≥20 and <40; and those in the subtree with root D
have keys ≥40. So, the search moves to the index node C. This is the procedure for an exact-match search.
To search for all elements with keys in the range [16, 70], we proceed as in an exact match search for the start,
16, of the range. In our example 4 additional data nodes are examined.



Insertion into a B+ - tree:
In insertion when a data node becomes overfull, half the elements are moved into a new node; the key of the
smallest element so moved together with a pointer to the newly created data node are inserted into the parent
index node using the insertion procedure for a B-tree. The splitting of an index node is identical to splitting of
an internal node of a B-tree.
Consider inserting an element with key 27 into the B+ - tree of above fig. we first search for this key. The search
gets us to the data node that is the left child of C. since this data node contains no element with key 27 and since
this data node is not full, we insert the new element as the third element in this data node.
Next, consider the insertion of an element with key 14. The search for 14 gets us to the second data node, which
is full. Symbolically inserting the new element into this full node results in an overfull node with the key
sequence 12, 14, 16, 18. The overfull node split into two by moving the largest half of elements into a new data
node, which is then inserted into the doubly linked list of data nodes. The smallest key, 16, in this new data
node together with a pointer to the new data node are inserted in the parent index node B to get the
configuration of below fig.
Following fig shows the insertion of elements with keys 14 and 86.



Deletion from a B+ - tree:
The definition of a B+ - tree does not specify a minimum occupancy for a data node. Following the split of an
overfull data node, the original node as well as the new one each have at least [c/2] elements, where c is the
capacity of a data node. So, except when a data node is the root of the B+ - tree, its occupancy is at least [c/2].
We shall say that a non-root data node is deficient iff it has fewer than [c/2] elements; a root data node is deficient iff it is empty.
To delete an element with key 40, there is no problem since the node would not become deficient.
Next, consider deleting an element with key 71; its node becomes deficient. In that case we examine either its
nearest right or nearest left sibling and determine whether the examined sibling has more than the required
minimum number [c/2] of elements. Suppose we check the nearest right sibling, which has the key sequence
80, 82, 84. Since this node has an excess element, we borrow the smallest and update the in-between key in the
parent D from 80 to that of the smallest remaining element in the right sibling, 82, as in fig 11.13(a).
Another fig shows deletion of an element with key 80.



As an example for deletion, consider deleting the element with key 32 from the B+ - tree of fig 11.12(b). This
element is the middle child of C. Following the deletion middle child becomes deficient. Since its nearest
sibling has only [c/2] elements we cannot borrow from it. Instead, we combine the two data nodes deleting one
from its doubly linked list and delete the in between key (30) in the parent. As we can see, the index node C
now has become deficient.
When an index node deficient we examine a nearest sibling. If the examined nearest sibling has excess keys, we
balance the occupancy of the two index nodes; this balancing involves moving some keys and associated
subtrees as well as changing the in-between key in the parent.
For our example, the in-between key 20 is moved from A to C, the rightmost key 16 of B is moved to A, and
the right subtree of B moved to C in fig 11.14(b).



When we delete the element with key 86, the middle child E becomes deficient and is combined with its
sibling; a data node is deleted from the doubly linked list of data nodes and the in-between key 84 in the parent
is also deleted, as in 11.15(a). The deficient index node E combines with its sibling index node D and the in-
between key 80, as in 11.15(b). Finally, the deficient index node F combines with its sibling A and the in-between
key 40 in its parent G. This causes the parent G, which is the root, to become deficient. The deficient root is
discarded and we get the B+ - tree of Fig 11.12(a).



UNIT – 4
Graphs and Hashing
The Graph Abstract Data Type, Elementary Graph Operations, Minimum Cost Spanning Trees, Shortest Paths and
Transitive Closure. Hashing: Introduction to Hash Table, Static Hashing, Dynamic Hashing

Graphs
GRAPH
A graph G consists of 2 sets, V and E.
V is a finite, non-empty set of vertices.
E is a set of pairs of vertices; these pairs are called edges.
V(G) and E(G) represents the set of vertices and edges respectively of graph G (Fig 6.2).
• In an undirected graph, the pair of vertices representing any edge is unordered. Thus, the pairs (u,v) and (v,u)
represent the same edge.
• In a directed graph, each edge is represented by a directed pair <u,v>; u is the tail and v is the head of the
edge. Therefore, <u,v> and <v,u> represent two different edges.

Figure 6.2: Three sample graphs


Following are the restrictions on graphs
1) A graph may not have an edge from a vertex v back to itself. Such edges are known as self loops (Figure
6.3).
2) A graph may not have multiple occurrences of the same edge. If we remove this restriction, we obtain a data
object referred to as multigraph.

Figure 6.3: Examples of graph like structures

 Maximum number of edges in any n-vertex, undirected graph is n(n-1)/2.


 Maximum number of edges in any n-vertex, directed graph is n(n-1).
TERMINOLOGIES USED IN A GRAPH
• A subgraph of G is a graph G' such that V(G') ⊆ V(G) and E(G') ⊆ E(G) (Fig 6.4).
• A path from vertex u to vertex v in graph G is a sequence of vertices u, i1, i2, ..., ik, v such that (u,i1),
(i1,i2), ..., (ik,v) are edges in E(G).
• A simple path is a path in which all vertices except possibly the first and last are distinct.
• A cycle is a simple path in which the first and last vertices are the same.
• An undirected graph is said to be connected iff for every pair of distinct vertices u and v in V(G) there is a
path from u to v in G.



Figure 6.4:Some subgraph
• A connected component H of an undirected graph is a maximal connected subgraph (Figure 6.5)

Figure 6.5:A graph with two connected components


• A tree is a connected acyclic(i.e. has no cycles) graph.
• A directed graph G is said to be strongly connected iff for every pair of distinct vertices u and v in V(G),there
is a directed path from u to v and also from v to u (Figure 6.6).

Figure 6.6:Strongly connected components of G3


• The degree of a vertex is the number of edges incident to that vertex. (Degree of vertex 0 is 3)
• In a directed graph G, in-degree of a vertex v defined as the number of edges for which v is the head. The out-
degree is defined as the number of edges for which v is the tail. (Vertex 1 of G3 has in-degree 1, out-degree 2
and degree 3).
GRAPH ABSTRACT DATA TYPE

ADT 6.1:Abstract data type Graph


GRAPH REPRESENTATIONS
 Three commonly used representations are:
1) Adjacency matrices,
2) Adjacency lists and
3) Adjacency multilists
Adjacency Matrix
• Let G=(V,E) be a graph with n vertices, n>=1.
• The adjacency matrix of G is a two-dimensional n*n array (say a) with the property that a[i][j]=1 iff the edge
(i,j) is in E(G), and a[i][j]=0 if there is no such edge in G (Figure 6.7).
• The space needed to represent a graph using its adjacency matrix is n^2 bits.
• About half this space can be saved in the case of undirected graphs by storing only the upper or lower triangle
of the matrix.

Figure 6.7 Adjacency matrices


Adjacency Lists
• The n rows of the adjacency matrix are represented as n chains.
• There is one chain for each vertex in G.
• The data field of a chain node stores the index of an adjacent vertex (Figure 6.8).
• For an undirected graph with n vertices and e edges, this representation requires an array of size n and 2e chain
nodes.

Figure 6.8 adjacency lists
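A hedged C sketch of the chain-node structure and of edge insertion follows; node, adjList and insert_edge are names assumed here for illustration.

#include <stdlib.h>

#define MAX_VERTICES 50

typedef struct node {
    int vertex;              /* index of an adjacent vertex */
    struct node *link;       /* next node on the chain      */
} node;

node *adjList[MAX_VERTICES]; /* one chain per vertex */

/* insert w at the front of v's chain; for an undirected graph the
   caller also inserts v into w's chain, so each edge uses two chain
   nodes (2e nodes in total)                                          */
void insert_edge(int v, int w)
{
    node *p = malloc(sizeof(node));
    p->vertex = w;
    p->link = adjList[v];
    adjList[v] = p;
}

int main(void)
{
    insert_edge(0, 1);
    insert_edge(1, 0);       /* the two entries for edge (0,1) */
    return 0;
}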



Adjacency Multilists
• Each edge (u,v) is represented by two entries, one on the list for u and the other on the list for v.
• The new node structure is

Figure 6.12:Adjacency multilists for G1

Elementary Graph Operations:


Graph traversal is the problem of visiting all the nodes in a graph in a particular manner, updating and/or
checking their values along the way. Tree traversal is a special case of graph traversal.
Depth First Search:
A depth-first search (DFS) is an algorithm for traversing a finite graph. DFS visits the child nodes before
visiting the sibling nodes; that is, it traverses the depth of any particular path before exploring its breadth. A
stack (often the program’s call stack via recursion) is generally used when implementing the algorithm.
The algorithm begins with a chosen “root” node; it then iteratively transitions from the current node to
an adjacent, unvisited node, until it can no longer find an unexplored node to transition to from its current
location. The algorithm then backtracks along previously visited nodes, until it finds a node connected to yet
more uncharted territory. It will then proceed down the new path as it had before, backtracking as it encounters
dead-ends, and ending only when the algorithm has backtracked past the original “root” node from the very first
step.



We begin the search by visiting the start vertex, v. In this simple application, visiting consists of printing the
node’s vertex field. Next, we select an unvisited vertex, w, from v’s adjacency list and carry out a depth first
search on w. We preserve our current position in v’s adjacency list by placing it on a stack. Eventually our
search reaches a vertex, u, that has no unvisited vertices on its adjacency list. At this point, we remove a vertex
from the stack and continue processing its adjacency list. Previously visited vertices are discarded; unvisited
vertices are visited and placed on the stack. The search terminates when the stack is empty.

Example: We wish to carry out a DFS of Graph G of Fig 6.16. If we initiate this search from vertex v0, then the vertices of G are visited in the following order: v0, v1, v3, v7, v4, v5, v2, v6.
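A minimal recursive C sketch of this depth first search is given below, assuming the adjacency-list representation sketched earlier; visited[] and dfs() are illustrative names, and the visiting order depends on the order of the chain nodes.

#include <stdio.h>
#include <stdlib.h>

#define MAX_VERTICES 50

typedef struct node { int vertex; struct node *link; } node;

node *adjList[MAX_VERTICES];
int visited[MAX_VERTICES];          /* 0 = not yet visited */

void insert_edge(int v, int w)      /* front insertion on v's chain */
{
    node *p = malloc(sizeof(node));
    p->vertex = w; p->link = adjList[v]; adjList[v] = p;
}

void dfs(int v)
{
    visited[v] = 1;
    printf("v%d ", v);                        /* "visiting" = printing */
    for (node *w = adjList[v]; w; w = w->link)
        if (!visited[w->vertex])
            dfs(w->vertex);                   /* recursion = implicit stack */
}

int main(void)
{
    insert_edge(0, 1); insert_edge(1, 0);
    insert_edge(0, 2); insert_edge(2, 0);
    dfs(0);                  /* prints v0 v2 v1 with this insertion order */
    printf("\n");
    return 0;
}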

Breadth First Search:


A breadth-first search (BFS) is another technique for traversing a finite graph. BFS visits the neighbor nodes
before visiting the child nodes, and a queue is used in the search process. This algorithm is often used to find
the shortest path from one node to another.
BFS starts at vertex v and marks it as visited. It then visits each of the vertices on v’s adjacency list. When we
have visited all the vertices on v’s adjacency list, we visit all the unvisited vertices that are adjacent to the first
vertex on v’s adjacency list. To implement this scheme, as we visit each vertex we place the vertex in queue.
When we have exhausted an adjacency list, we remove a vertex from the queue and proceed by examining each
of the vertices on its adjacency list. Unvisited vertices are visited and then placed on the queue; visited vertices
are ignored. We have finished the search when the queue is empty.
To implement BFS, we use a dynamically linked queue. Each queue node contains vertex and link fields.
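A hedged C sketch of breadth first search follows; for brevity it uses a fixed-size array queue rather than the dynamically linked queue mentioned above, and the function names are illustrative.

#include <stdio.h>
#include <stdlib.h>

#define MAX_VERTICES 50

typedef struct node { int vertex; struct node *link; } node;

node *adjList[MAX_VERTICES];
int visited[MAX_VERTICES];

void insert_edge(int v, int w)               /* front insertion on v's chain */
{
    node *p = malloc(sizeof(node));
    p->vertex = w; p->link = adjList[v]; adjList[v] = p;
}

void bfs(int v)
{
    int queue[MAX_VERTICES], front = 0, rear = 0;

    visited[v] = 1;
    printf("v%d ", v);
    queue[rear++] = v;                       /* place the start vertex in the queue */
    while (front < rear) {                   /* finished when the queue is empty    */
        int u = queue[front++];              /* remove a vertex from the queue      */
        for (node *w = adjList[u]; w; w = w->link)
            if (!visited[w->vertex]) {       /* unvisited vertices are visited ...  */
                visited[w->vertex] = 1;
                printf("v%d ", w->vertex);
                queue[rear++] = w->vertex;   /* ... and placed on the queue         */
            }
    }
}

int main(void)
{
    insert_edge(0, 1); insert_edge(1, 0);
    insert_edge(0, 2); insert_edge(2, 0);
    insert_edge(1, 3); insert_edge(3, 1);
    bfs(0);                                  /* visits 0, then its neighbors, then theirs */
    printf("\n");
    return 0;
}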



Connected Components:

The total time needed to generate all the connected components is O(n+e).



Spanning Trees:
A spanning tree is any tree that consists solely of edges in G and that includes all the vertices in G.

When DFS is used, the resulting spanning tree is known as a depth first spanning tree. When BFS is used, the resulting spanning tree is known as a breadth first spanning tree. The following figure shows the results:

1) Now suppose we add a nontree edge, (v,w), into any spanning tree, T. The result is a cycle that consists
of the edge (v,w) and all the edges on the path from w to v in T.
2) A spanning tree is a minimal subgraph, G’, of G such that V(G’)=V(G) and G’ is connected. We define
a minimal subgraph as one with the fewest number of edges. Spanning tree has n-1 edges.

Biconnected Components:
An articulation point is a vertex v of G such that the deletion of v, together with all edges incident on v, produces a graph, G', that has at least two connected components. For example, the connected graph of Fig 6.19 has four
articulation points, 1, 3, 5 and 7.
A biconnected graph is a connected graph that has no articulation points. For example, the graph of Fig 6.16 is
biconnected, while the graph of Fig 6.19 obviously is not. In Fig 6.19(a), vertices represent communication
stations and edges represent communication links. Now suppose one of the stations that is an articulation point
fails. The result is a loss of communication not just to and from that single station, but also between certain
other pairs of stations.



The broken lines in Fig 6.20(b) represent nontree edges. A nontree edge (u,v) is a back edge iff either u is an
ancestor of v or v is an ancestor of u.



Minimum Cost Spanning Trees:
The cost of a spanning tree of weighted undirected graph is the sum of the costs (weights) of the edges
in the spanning tree. Minimum cost spanning tree is a spanning tree of least cost. There are 3 different
algorithms used to obtain a minimum cost spanning tree of a connected undirected graph. They are: Kruskal’s,
Prim’s, and Sollin’s algorithms respectively. All three use an algorithm design strategy called the greedy
method.
In the greedy method, we construct an optimal solution in stages. At each stage we make the decision that looks best at that time. Since a decision cannot be changed later, we make sure that each decision will still result in a feasible solution.
For spanning trees, we use a least cost criterion. Our solution must satisfy the following constraints:
1) We must use only edges within the graph.
2) We must use exactly n-1 edges.
3) We may not use edges that would produce a cycle.

Kruskal’s Algorithm:
Kruskal’s algorithm builds a minimum cost spanning tree T by adding edges to T one at a time. The algorithm
selects the edges for inclusion in T in nondecreasing order of their cost. An edge is added to T if it does not
form a cycle with the edges that are already in T. Since G is connected and has n>0 vertices, exactly n-1 edges will be selected for inclusion in T.
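The following is a hedged C sketch of this edge-selection loop, using a simple union-find (parent tree) test to detect cycles. The edge costs in main are only illustrative, since the costs of Fig 6.22(a) are not reproduced in the text.

#include <stdio.h>
#include <stdlib.h>

#define MAX_VERTICES 50

typedef struct { int u, v, cost; } edge;

int parent[MAX_VERTICES];

int find(int x)                        /* root of the tree containing x */
{
    while (parent[x] != x) x = parent[x];
    return x;
}

static int cmp(const void *a, const void *b)
{
    return ((const edge *)a)->cost - ((const edge *)b)->cost;
}

/* select n-1 edges in nondecreasing cost order, skipping any
   edge whose two endpoints are already in the same tree        */
void kruskal(edge e[], int nedges, int n)
{
    for (int i = 0; i < n; i++) parent[i] = i;
    qsort(e, nedges, sizeof(edge), cmp);

    int chosen = 0;
    for (int i = 0; i < nedges && chosen < n - 1; i++) {
        int ru = find(e[i].u), rv = find(e[i].v);
        if (ru != rv) {                         /* no cycle is formed  */
            printf("(%d,%d) cost %d\n", e[i].u, e[i].v, e[i].cost);
            parent[ru] = rv;                    /* union the two trees */
            chosen++;
        }
    }
}

int main(void)
{
    /* illustrative 7-vertex undirected graph */
    edge e[] = { {0,1,28}, {0,5,10}, {1,2,16}, {1,6,14}, {2,3,12},
                 {3,4,22}, {3,6,18}, {4,5,25}, {4,6,24} };
    kruskal(e, 9, 7);
    return 0;
}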



Prim’s Algorithm:
Prim's Algorithm begins with a tree, T, that contains a single vertex. This may be any of the vertices in the original graph. Next, we add a least cost edge (u,v) to T such that T∪{(u,v)} is also a tree. We repeat this edge
addition step until T contains n-1 edges. To make sure that the added edge does not form a cycle, at each step
we choose the edge(u,v) such that exactly one of u or v is in T. Fig 6.24 shows the progress of Prim’s algorithm
on the graph of Fig 6.22(a).
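Since Fig 6.24 is not reproduced here, the sketch below shows one common O(n²) way to implement Prim's algorithm over a cost adjacency matrix; the array names and the edge costs in main are assumptions made for illustration.

#include <stdio.h>

#define N 7
#define INF 1000000

int cost[N][N];                     /* cost adjacency matrix; INF = no edge  */

void prim(int start)
{
    int inT[N] = {0};               /* inT[v] = 1 once v has been added to T */
    int closest[N];                 /* closest[v] = tree vertex nearest to v */

    for (int v = 0; v < N; v++) closest[v] = start;
    inT[start] = 1;

    for (int k = 0; k < N - 1; k++) {          /* add exactly n-1 edges      */
        int u = -1;
        for (int v = 0; v < N; v++)            /* least cost edge (u,v) with */
            if (!inT[v] && (u < 0 || cost[v][closest[v]] < cost[u][closest[u]]))
                u = v;                         /* exactly one endpoint in T  */
        printf("(%d,%d) cost %d\n", closest[u], u, cost[u][closest[u]]);
        inT[u] = 1;
        for (int v = 0; v < N; v++)            /* u may now be the closest   */
            if (!inT[v] && cost[v][u] < cost[v][closest[v]])
                closest[v] = u;                /* tree vertex for some v     */
    }
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            cost[i][j] = (i == j) ? 0 : INF;
    /* illustrative undirected edges (not the book's Fig 6.22(a) values) */
    cost[0][1] = cost[1][0] = 28;  cost[0][5] = cost[5][0] = 10;
    cost[1][2] = cost[2][1] = 16;  cost[1][6] = cost[6][1] = 14;
    cost[2][3] = cost[3][2] = 12;  cost[3][4] = cost[4][3] = 22;
    cost[3][6] = cost[6][3] = 18;  cost[4][5] = cost[5][4] = 25;
    cost[4][6] = cost[6][4] = 24;
    prim(0);
    return 0;
}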



Sollin’s Algorithm:
This algorithm selects several edges for inclusion in T at each stage. At the start of the stage, the selected edges,
together with all n vertices, form a spanning forest. During a stage we select one edge for each tree in the forest.
This edge is a minimum cost edge that has exactly one vertex in the tree. Since two trees in the forest could select the same edge, we need to eliminate multiple copies of edges. At the start of the first stage the set of selected edges is empty. The algorithm terminates when there is only one tree at the end of a stage or no edges remain for selection.
Fig 6.25 shows Sollin's algorithm applied to the graph of Fig 6.22(a). The initial configuration of zero selected edges is the same as that shown in Fig 6.22(b). Each tree in this forest is a single vertex. At the next stage, we select edges for each of the vertices. The edges selected are (0,5), (1,6), (2,3), (3,2), (4,3), (5,0), (6,1). After eliminating the duplicate edges, we are left with edges (0,5), (1,6), (2,3), and (4,3). We add these edges to the set of selected edges, thereby producing the forest of Fig 6.25(a). In the next stage, the tree with vertex set {0,5} selects the edge (5,4), and the two remaining trees select the edge (1,2).

Shortest Paths and Transitive Closure:


Each edge has a weight representing the distance between two vertices. An edge weight is also referred to as an
edge length or edge cost. The length of a path is now defined to be the sum of the lengths of the edges on that
path, rather than the number of edges. The starting vertex of the path will be referred to as the source and the
last vertex the destination.
Single Source/All Destinations: Nonnegative Edge Costs:
In this problem we are given a directed graph, G=(V,E), a weighting function, w(e), w(e)>0, for the edges of G,
and a source vertex, v0. We wish to determine a shortest path from v0 to each of the remaining vertices of G. As an example, consider the graph of Fig 6.26(a). If v0 is the source vertex, then the shortest path from v0 to v1 is v0,v3,v4,v1. The length of this path is 10+15+20=45. Although there are three edges on this path, it is shorter than the path v0,v1, which has a length of 50. Fig 6.26(b) lists the shortest paths from v0 to v1, v2, v3 and v4 in nondecreasing order of path length; a greedy algorithm may be used to generate them. There is no path from v0 to v5.

Fig 6.26: Graph and Shortest paths from vertex 0 to all destinations

Let S denote the set of vertices, including v0, whose shortest paths have been found. For w not in S, let distance[w] be the length of the shortest path starting from v0, going through vertices only in S, and ending in w. We can observe the following points while generating the paths:
(1) If the next shortest path is to vertex u, then the path from v0 to u goes through only those vertices that are in S. There cannot be any intermediate vertex that is not in S.



(2) Vertex u is chosen so that it has the minimum distance, distance[u], among all the vertices not in S. If there are several vertices not in S with the same distance, then we may select any one of them.
(3) Once we have selected u and generated the shortest path from v0 to u, u becomes a member of S. Adding u to S can change the distance of shortest paths starting at v0, going through vertices only in S, and ending at a vertex, w, that is not currently in S. If the distance changes, we have found a shorter such path from v0 to w; this path goes through u, and its subpath from u to w can be chosen so as to have no intermediate vertices. The length of the shorter path is distance[u] + length(<u,w>).

To implement Dijkstra's algorithm, we assume that the n vertices are numbered from 0 to n-1.

Prog 6.9: Single source shortest paths

Prog 6.10: Choosing the least cost edge
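Programs 6.9 and 6.10 themselves are not reproduced above, so the following is only a hedged C sketch of the same single-source algorithm on a cost adjacency matrix. dist, found and choose mirror the roles described in the text, but the exact code differs from the book's listings, and the graph in main is illustrative (chosen to be consistent with the lengths quoted for Fig 6.26).

#include <stdio.h>

#define N 6
#define INF 1000000

int cost[N][N];       /* cost adjacency matrix; INF marks a missing edge */
int dist[N];          /* dist[w] = length of shortest path found so far  */
int found[N];         /* found[w] = 1 once w has been added to S         */

int choose(void)      /* plays the role of Program 6.10 */
{
    int u = -1;
    for (int w = 0; w < N; w++)
        if (!found[w] && (u < 0 || dist[w] < dist[u]))
            u = w;
    return u;
}

void shortestpath(int v)           /* plays the role of Program 6.9 */
{
    for (int i = 0; i < N; i++) { found[i] = 0; dist[i] = cost[v][i]; }
    found[v] = 1;
    dist[v] = 0;
    for (int i = 0; i < N - 2; i++) {
        int u = choose();          /* minimum dist among vertices not in S */
        found[u] = 1;              /* u becomes a member of S              */
        for (int w = 0; w < N; w++)
            if (!found[w] && cost[u][w] < INF && dist[u] + cost[u][w] < dist[w])
                dist[w] = dist[u] + cost[u][w];   /* shorter path through u */
    }
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            cost[i][j] = (i == j) ? 0 : INF;
    /* illustrative directed edges: v0->v3=10, v3->v4=15, v4->v1=20,
       v0->v1=50, v0->v2=45, v2->v4=30; v5 is unreachable             */
    cost[0][1] = 50; cost[0][2] = 45; cost[0][3] = 10;
    cost[3][4] = 15; cost[4][1] = 20; cost[2][4] = 30;
    shortestpath(0);
    for (int w = 0; w < N; w++) printf("dist[%d] = %d\n", w, dist[w]);
    return 0;
}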

Example: Consider the eight-vertex digraph of Figure 6.27(a) with the adjacency matrix of Figure 6.27(b). Suppose that the source vertex is Boston. The values of dist and the vertex u selected in each iteration of the
outer for loop of program 6.9 are shown in figure 6.28.



Fig 6.27 Digraph for Example 6.4

Single Source/All Destinations: General Weights


Consider the general case when some or all of the edges of the directed graph G may have negative length. When negative edge lengths are permitted, we require that the graph have no cycles of negative length. For example, consider the graph of Fig 6.30. The length of the shortest path from vertex 0 to vertex 2 is -∞, as the length of the path
0,1,0,1,0,1, ..., 0,1,2
can be arbitrarily small. This is so because of the presence of the cycle 0,1,0, which has a length of -1.

Fig: Directed Graph with a cycle of negative length


When there are no cycles of negative length, there is a shortest path between any two vertices of an n-vertex
graph that has at most n-1 edges on it. Elimination of the cycles from the path results in another path with the
same source and destination. This path is cycle-free and has a length that is no more than that of the original
path, as the length of the eliminated cycles was at least zero. This observation can be used to obtain an
algorithm to determine a shortest path from a source vertex to all remaining vertices in the graph.
When there are no cycles of negative length, we can limit our search for shortest paths to paths with at most n-1 edges. Hence, distⁿ⁻¹[u], rather than dist¹[u], is the length of an unrestricted shortest path from v to u.
While computing distⁿ⁻¹[u], we make the following observations:
(1) If the shortest path from v to u with at most k, k>1, edges has no more than k-1 edges, then distᵏ[u]=distᵏ⁻¹[u].
(2) If the shortest path from v to u with at most k, k>1, edges has exactly k edges, then it is comprised of a shortest path from v to some vertex j followed by the edge <j,u>. The path from v to j has k-1 edges, and its length is distᵏ⁻¹[j]. All vertices i such that the edge <i,u> is in the graph are candidates for j. Since we are interested in a shortest path, the i that minimizes distᵏ⁻¹[i]+length[i][u] is the correct value for j.

The overall complexity is O(n³) when adjacency matrices are used and O(ne) when adjacency lists are used.
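A hedged C sketch of this recurrence (the Bellman-Ford approach) is shown below using a cost adjacency matrix, which corresponds to the O(n³) bound above. The function name and the graph in main are illustrative; the example has a negative edge but no negative cycle.

#include <stdio.h>

#define N 3
#define INF 1000000

int length[N][N];    /* edge lengths; INF marks a missing edge            */
int dist[N];         /* dist[u] plays the role of distᵏ[u] in the text    */

void bellman_ford(int v)
{
    for (int i = 0; i < N; i++) dist[i] = length[v][i];   /* dist¹ */
    dist[v] = 0;
    for (int k = 2; k <= N - 1; k++)          /* allow up to k edges per path  */
        for (int u = 0; u < N; u++)
            if (u != v)
                for (int i = 0; i < N; i++)   /* candidate predecessors i of u */
                    if (dist[i] < INF && length[i][u] < INF &&
                        dist[i] + length[i][u] < dist[u])
                        dist[u] = dist[i] + length[i][u];
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            length[i][j] = (i == j) ? 0 : INF;
    length[0][1] = 5; length[1][2] = -2; length[0][2] = 4;
    bellman_ford(0);
    for (int i = 0; i < N; i++) printf("dist[%d] = %d\n", i, dist[i]);
    return 0;
}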

All Pairs Shortest Paths:


Here, we must find the shortest paths between all pairs of vertices, vi and vj, i≠j.
The basic idea in the all-pairs algorithm is to begin with the matrix A⁻¹ and successively generate the matrices A⁰, A¹, A², ..., Aⁿ⁻¹. If we have already generated Aᵏ⁻¹, then we may generate Aᵏ by realizing that for any pair of vertices i, j one of the two rules below applies:
(1) The shortest path from i to j going through no vertex with index greater than k does not go through the vertex with index k, and so its cost is Aᵏ⁻¹[i][j].
(2) The shortest such path does go through vertex k. Such a path consists of a path from i to k followed by one from k to j. Neither of these goes through a vertex with index greater than k-1. Hence, their costs are Aᵏ⁻¹[i][k] and Aᵏ⁻¹[k][j].
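A hedged C sketch of this all-pairs recurrence follows; allcosts and the 3-vertex matrix in main are illustrative (the values of Fig 6.33 are not reproduced in the text).

#include <stdio.h>

#define N 3
#define INF 1000000

int cost[N][N];      /* cost adjacency matrix, i.e. A⁻¹            */
int A[N][N];         /* A[i][j] ends up holding Aⁿ⁻¹[i][j]          */

void allcosts(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = cost[i][j];            /* start from A⁻¹ */
    for (int k = 0; k < N; k++)              /* allow vertex k as an intermediate */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (A[i][k] < INF && A[k][j] < INF &&
                    A[i][k] + A[k][j] < A[i][j])
                    A[i][j] = A[i][k] + A[k][j];   /* rule (2) above */
}

int main(void)
{
    int init[N][N] = { {0, 4, 11}, {6, 0, 2}, {3, INF, 0} };  /* illustrative */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            cost[i][j] = init[i][j];
    allcosts();
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) printf("%8d", A[i][j]);
        printf("\n");
    }
    return 0;
}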

Example:
For the digraph of Fig 6.33(a), the initial A matrix, A⁻¹, plus its value after each of three iterations, A⁰, A¹, A², is given in Fig 6.33.

Fig 6.33: Example for all-pairs shortest paths problem


Transitive Closure:
Assume that we have a directed graph G with unweighted edges. We want to determine if there is a path from i
to j for all values of i and j. There are two cases of interest:
(1) one requires positive path lengths,
(2) the other requires only nonnegative path lengths.
These cases are known as the transitive closure and the reflexive transitive closure of a graph, respectively.
Definition:
The transitive closure matrix, denoted A+, of a directed graph, G, is a matrix such that A+[i][j]=1 if there is a
path of length>0 from i to j; otherwise, A+[i][j]=0.
Definition:
The reflexive transitive closure matrix, denoted A *,of a directed graph, G, is a matrix such that A *[i][j]=1 if
there is a path of length ≥0 from i to j ; otherwise A*[i][j]=0.
Figure 6.34 shows A+ and A* for a digraph. Clearly, the only difference between A* and A+ is in the terms on the diagonal. A+[i][i]=1 iff there is a cycle of length > 0 containing vertex i, whereas A*[i][i] is always one, as there is a path of length 0 from i to i.

Transitive closure of a graph: Given a directed graph, find out whether a vertex j is reachable from another vertex i for all vertex pairs (i,j) in the graph.
Reflexive transitive closure of a graph: Let R ⊆ A×A be a directed graph (relation) defined on a set A. The reflexive transitive closure of R is the relation R* = {(a,b) : a,b ∈ A and there is a path of length ≥ 0 from a to b in R}. For example, if A = {a1,a2,a3,a4} and R is the chain a1→a2→a3→a4, then R* = {(a1,a1),(a1,a2),(a1,a3),(a1,a4),(a2,a2),(a2,a3),(a2,a4),(a3,a3),(a3,a4),(a4,a4)}.



Fig 6.34: Graph G and its adjacency matrix A, A+, A*
The total time is O(n³).
The transitive closure of an undirected graph G can be found very easily from its connected components. From the definition of a connected component, it follows that there is a path between every pair of vertices in the same component and there is no path in G between two vertices that are in different components. Hence, if A is the adjacency matrix of an undirected graph, then its transitive closure A+ may be determined in O(n²) time by first determining the connected components of the graph. A+[i][j]=1 iff there is a path from vertex i to j, so for every pair of vertices in the same component, A+[i][j]=1. On the diagonal, A+[i][i]=1 iff the component containing i has at least two vertices.



Hashing
STATIC HASHING

Hash Tables:
5.6.1 Basic concept:
In a hashed search, the key, through an algorithmic function, determines the location of the data.
We use a hashing algorithm to transform the key into the index that contains the data we need to locate. Another
way to describe hashing is as a key-to-address transformation in which the keys map to addresses in a list.
Hashing is a key-to address mapping process.

The address produced by the hashing algorithm is known as the home address.
The memory that contains all of the home addresses is known as the prime area.
A collision occurs when a hashing algorithm produces an address for an insertion key and that address is
already occupied.

5.7 Hashing Methods

5.7.1 Direct Method


In direct hashing the key is the address without any algorithmic manipulation. The data structure must therefore
contain an element for every possible key. The situations in which you can use direct hashing are limited.



5.7.2 Subtraction Method
Sometimes keys are consecutive but do not start from 1. For example, a company may have only 100
employees, but the employee numbers start from 1001 and go to 1100. In this case we use subtraction hashing, a
very simple hashing function that subtracts 1000 from the key to determine the address.
The direct and subtraction hash functions both guarantee a search effort of one with no collisions. They are one-to-one hashing methods: only one key hashes to each address.

5.7.3 Modulo-division Method


Also known as division remainder, the modulo-division method divides the key by the array size and uses the
remainder for the address. This method gives us the simple hashing algorithm shown below in which listSize is
the number of elements in the array:
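Since the algorithm itself is not reproduced here, the following is a hedged one-function C sketch of modulo-division hashing; the key and list size used in main are only an example, and listSize is normally chosen to be a prime number.

#include <stdio.h>

/* modulo-division: the remainder of key / listSize becomes the address */
int hash_mod(int key, int listSize)
{
    return key % listSize;
}

int main(void)
{
    printf("%d\n", hash_mod(121267, 307));   /* example key and prime list size */
    return 0;
}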



5.7.4 Digit-extraction Method
Using digit extraction selected digits are extracted from the key and used as the address. For example, using
our six-digit employee number to hash to a three digit address (000-999), we could select the first, third, and
fourth digits (from the left) and use them as the address.

5.7.5 Mid square Method


In mid square hashing the key is squared and the address is selected from the middle of the squared number.

We can select the first three digits and then use the mid square method as shown below.



5.7.6 Folding Methods
Two folding methods are used: fold shift and fold boundary.
In fold shift the key value is divided into parts whose size matches the size of the required address. Then the
left and right parts are shifted and added with the middle part.
In fold boundary the left and right numbers are folded on a fixed boundary between them and the center
number.

5.7.7 Rotation Method


It is most useful when keys are assigned serially, since such keys tend to differ only in their last characters. Rotating the last character to the front of the key minimizes this effect.

5.7.8 Pseudorandom Hashing


In pseudorandom hashing the key is used as the seed in a pseudorandom-number generator and the resulting
random number is then scaled into the possible address range using modulo-division. A common random-
number generator is shown below.

To use the pseudorandom-number generator as a hashing method, we set x to the key, multiply it by the
coefficient a, and then add the constant c. The result is then divided by the list size, with the remainder being the
hashed address. For maximum efficiency, the factors a and c should be prime numbers.
To keep the calculation reasonable, we use 17 and 7 for factors a and c, respectively. Also, the list size
in the example is the prime number 307.
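A hedged C sketch of this pseudorandom hashing, using the constants quoted above (a = 17, c = 7, list size 307); the key in main is illustrative.

#include <stdio.h>

/* pseudorandom hashing: y = a*x + c with x = key, then scale by modulo-division */
int hash_pseudorandom(int key)
{
    long y = 17L * (long)key + 7;
    return (int)(y % 307);          /* scale into the address range 0..306 */
}

int main(void)
{
    printf("%d\n", hash_pseudorandom(121267));
    return 0;
}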



5.8 Collision Resolutions
A collision occurs when a hashing algorithm produces an address for an insertion key and that address is
already occupied.

Concepts
a) The load factor of a hashed list is the number of elements in the list divided by the number of physical elements allocated for the list, expressed as a percentage. Traditionally, load factor is assigned the symbol alpha (α). With k the number of filled elements in the list and n the total number of elements allocated to the list, the formula is
α = (k / n) × 100

b) Computer scientists have identified two distinct types of clusters.


(i) Primary clustering occurs when data cluster around a home address. Primary clustering is easy to
identify.
(ii) Secondary clustering occurs when data become grouped along a collision throughout a list. This type of
clustering is not easy to identify.

5.8.1 Open Addressing


The first collision resolution method, open addressing, resolves collisions in the prime area, that is, the area that contains all of the home addresses. When a collision occurs, the prime area addresses are searched for an open, unoccupied element where the new data can be placed.
5.8.1.1 Linear Probe
In a linear probe, which is the simplest, when data cannot be stored in the home address we resolve the collision by adding 1 to the current address.
Linear probes have two advantages. First, they are quite simple to implement. Second, data tend to remain near their home address.
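A hedged C sketch of open addressing with a linear probe is given below. The table size 11 and the key list reuse values that appear in later examples of this chapter, and insert_linear is an illustrative name.

#include <stdio.h>

#define SIZE 11
#define EMPTY -1

int table[SIZE];

/* linear probe: on a collision, add 1 (mod SIZE) and try again */
int insert_linear(int key)
{
    int home = key % SIZE, addr = home;
    while (table[addr] != EMPTY) {
        addr = (addr + 1) % SIZE;
        if (addr == home) return -1;     /* table is full */
    }
    table[addr] = key;
    return addr;
}

int main(void)
{
    int keys[] = {27, 18, 29, 28, 39, 13, 16}, i;
    for (i = 0; i < SIZE; i++) table[i] = EMPTY;
    for (i = 0; i < 7; i++)
        printf("%d -> slot %d\n", keys[i], insert_linear(keys[i]));
    return 0;
}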

5.8.1.2 Quadratic Probe


In the quadratic probe, the increment is the collision probe number squared. Thus for the first collision probe we add 1², for the second collision probe we add 2², for the third collision probe we add 3², and so forth until we either find an empty element or we exhaust the possible elements.

5.8.1.3 Pseudorandom Collision Resolution


Pseudorandom collision resolution uses a pseudorandom number to resolve the collision. We now use it as a collision resolution method. In this case, rather than use the key as a factor in the random-number calculation,
we use the collision address.
We now resolve the collision using the following pseudorandom-number generator, where a is 3 and c is 5:



5.8.1.4 Key Offset
Key offset is a double hashing method that produces different collision paths for different keys. Whereas the
pseudorandom-number generator produces a new address as a function of the previous address, key offset
calculates the new address as a function of the old address and the key.
One of the simplest versions simply adds the quotient of the key divided by the list size to the address to
determine the next collision resolution address, as shown in the formula below.

Example:

5.8.2 Linked list Collision Resolution


A linked list is ordered collection of data in which each element contains the location of next element.
It uses two storage areas: the prime area and the overflow area.
Each element in the prime area contains an additional field, a link head pointer to a linked list of overflow data in
the overflow area. When a collision occurs, one element is stored in the prime area and chained to its
corresponding linked list in the overflow area.



Although the overflow area can be any data structure, it is typically implemented as a linked list in dynamic
memory.
The linked list data can be stored in any order, but a last in-first out (LIFO) sequence-or a key sequence is the
most common.
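A hedged C sketch of linked list (chained) collision resolution follows; prime, onode and insert_chained are illustrative names, and the overflow chain is kept in LIFO order as described above.

#include <stdio.h>
#include <stdlib.h>

#define SIZE 11

typedef struct onode {          /* overflow-area node */
    int key;
    struct onode *link;
} onode;

typedef struct {                /* prime-area element */
    int key;                    /* -1 while unoccupied */
    onode *head;                /* link head pointer into the overflow area */
} pelem;

pelem prime[SIZE];

void insert_chained(int key)
{
    int addr = key % SIZE;
    if (prime[addr].key == -1) {          /* home address is free */
        prime[addr].key = key;
    } else {                              /* collision: push onto the chain (LIFO) */
        onode *p = malloc(sizeof(onode));
        p->key = key;
        p->link = prime[addr].head;
        prime[addr].head = p;
    }
}

int main(void)
{
    int keys[] = {27, 18, 29, 28, 39, 13, 16}, i;
    for (i = 0; i < SIZE; i++) { prime[i].key = -1; prime[i].head = NULL; }
    for (i = 0; i < 7; i++) insert_chained(keys[i]);
    for (i = 0; i < SIZE; i++) {
        printf("%2d: %d", i, prime[i].key);
        for (onode *p = prime[i].head; p; p = p->link) printf(" -> %d", p->key);
        printf("\n");
    }
    return 0;
}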

5.8.3 Bucket Hashing


Another approach to handling the collision problems is bucket hashing, in which keys are hashed to buckets,
nodes that accommodate multiple data occurrences. Because a bucket can hold multiple data, collisions are
postponed until the bucket is full.

DYNAMIC HASHING:
Motivation for Dynamic Hashing:
Dynamic hashing, also known as extendible hashing, aims to reduce the rebuild time by ensuring that each
rebuild changes the home bucket for the entries in only 1 bucket. The objective of the dynamic hashing is to
provide acceptable hash table performance on a per operation basis.
We consider two forms of dynamic hashing – one uses a directory and the other does not. For both forms, we
use hash function h that maps keys into non-negative integers. The range of h is assumed to be sufficiently large
and we use h(k,p) to denote the integer formed by the p least significant bits of h(k).
For our example here, we use a hash function h(k) that transforms keys into 6-bit non-negative integers. Our example keys will be two characters each, and h transforms letters such as A, B and C into the bit sequences 100, 101, and 110 respectively. Digits 0 through 7 are transformed into their 3-bit representation. For our example hash
function, h(A0,1)=0, h(A1,3)=1, h(B1,4)=1001=9, and h(C1,6)=110001=49.



k h(k)
A0 100 000
A1 100 001
B0 101 000
B1 101 001
C1 110 001
C2 110 010
C3 110 011
C5 110 101
Fig 8.7: An example of Hash Function
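A hedged C sketch of h(k,p), the integer formed by the p least significant bits of h(k), is given below; h_bits is an illustrative name, and the example in main reuses h(C1) = 110001 = 49 from Fig 8.7.

#include <stdio.h>

/* h(k,p): mask off all but the p low-order bits of h(k) */
unsigned h_bits(unsigned hk, unsigned p)
{
    return hk & ((1u << p) - 1);
}

int main(void)
{
    /* h(C1) = 110001 = 49: h(C1,3) = 001 = 1, h(C1,6) = 110001 = 49 */
    printf("%u %u\n", h_bits(49, 3), h_bits(49, 6));   /* prints 1 49 */
    return 0;
}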
Dynamic Hashing using Directories:
We employ a directory, d, of pointers to buckets. The size of the directory depends on the number of bits of h(k) used to index into the directory. When indexing is done using, say, h(k,2), the directory size is 2²=4; when indexing is done using, say, h(k,5), the directory size is 2⁵=32. The number of bits of h(k) used to index the directory is called the directory depth. The size of the directory is 2ᵗ, where t is the directory depth, and the number of buckets is at most equal to the directory size. Figure 8.8(a) shows a dynamic hash table that contains the keys A0, B0, A1, B1, C2, and C3. This hash table uses a directory whose depth is 2 and uses buckets that have 2 slots each. In the figure, the directory is shaded while the buckets are not. In practice, the bucket size is often chosen to match some physical characteristic of the storage media.

Fig 8.8: Dynamic Hash table with Directories



To search for a key k, we merely examine the bucket pointed to by d[h(k,t)], where t is the directory depth.
Suppose we insert C5 into the hash table of Figure 8.8(a). Since, h(C5,2)=01, we follow the pointer, d[01], in
position 01 of the directory. This gets us to bucket with A1, B1. This bucket is full and we get a bucket
overflow. To resolve the overflow, we determine the least u such that h(k,u) is not the same for all keys in the
overflow bucket. In case the least u is greater than the directory depth, we increase the directory depth to this
least u value. This requires us to increase the directory size but not number of buckets. A quadrupling of the
directory size may be handled as two doublings and so on. For our example, the least u for which h(k,u) is not
the same for A1,B1 and C5 is 3. So, the directory is expanded to have depth 3 and size 8.
Following the resizing of the directory, we split the overflowed bucket using h(k,u). In our case, the overflowed
bucket is split using h(k,3). For A1 and B1, h(k,3)=001 and for C5, h(k,3)=101. Figure 8.8(b) shows the result.
As a final insert example, consider inserting C1 into Fig 8.8(b). h(C1,3)=001. This time, bucket d[001]
overflows. The minimum u is 4 and so it is necessary to double the directory size and increase the directory
depth to 4. When the directory is doubled, we replicate the pointers in the first half into second half. Next we
split the overflowed bucket using h(k,4). Since h(k,4)=0001 for A1 and C1 and 1001 for B1, we create a new
bucket with B1 and put C1 into the slot previously occupied by B1. A pointer to the new bucket is placed in
d[1001]. Fig 8.8(c) shows the result.
Deletion from a dynamic hash table with a directory is similar to insertion. Although dynamic hashing employs
array doubling, the time for this is considerably less than that for the array doubling used in the static hashing.

Directoryless Dynamic Hashing:


Directoryless dynamic hashing, also known as linear dynamic hashing, dispenses with the directory, d. In this method an array, ht, of buckets is used. We assume that this array is as large as possible and so there is no possibility of increasing its size dynamically. To avoid initializing such a large array, we use two variables q and r, 0 ≤ q < 2ʳ, to keep track of the active buckets. At any time, only buckets 0 through 2ʳ+q-1 are active. Each active bucket is the start of a chain of buckets. The remaining buckets on a chain are called overflow buckets.



Example Figure is shown below:



Sequential File Organization

We’ll study data structures appropriate for organizing


1. Large quantities of records.
2. Information that may have special space or performance requirements.
3. Information that needs to be processed in different ways for different applications.
4. Information that enables new tasks to be accomplished.
Three primary file organizations
1. Sequential
2. Direct
3. Indexed Sequential

There is a distinction between how we store or organize information and how we process or access it.
Organization Access

Sequential Sequential
Indexed sequential Sequential and direct
Direct Direct

Sequential access refers to accessing multiple records, often an entire file, whereas direct access refers to locating a single record.
We have records which contain fields.

An example employee record:

• The primary key is a field, or a composite of several fields, which uniquely distinguishes a record from all others;
• All the remaining fields are the secondary keys or attributes.
• A file consists of records of the same format.
The sizes of the fields within a record may vary
• For fixed-length records, all the records within a file have the same length.
• With variable-length records, records within a file do not need to be the same length.
Organization

• Sequential organization is well suited for sequential access in which many or all the records will be processed.
• Can be processed easily with a loop. A sequential search processes the records of a file in their order of occurrence until it either locates the desired record or processes all the records.
Binary Search
• In a file of n records, n/2 probes are needed to locate the record of interest, on the average.

• For an unsuccessful search we need to probe the entire file of n records.


• Appropriate when n is small.
• To improve performance, we can sort the records in the file. Then only n/2 records need to be examined on average, because once we go beyond the position where the record would appear, we can end the search.
• Example: Assume, we are searching for the record with key 39.
• The complexity of both successful and unsuccessful binary search is O(log n). It is also referred to as "logarithmic search" or "bisection".
• As n becomes large, log n becomes significant. We want to be able to retrieve a record in a single access.
• The preprocessing cost of ordering a file and the continuing cost of maintaining the order must also be considered in binary search.

• Another example: retrieve a record with the key 17:


Interpolation Search

• Consider finding a phone number for Phyllis Bishop. Do you begin your search at the middle of the book? Most
likely not.
• You try to approximate its relative position.
• You would probably begin searching Bishop near the front of the book.
• An interpolation search chooses the next position for a comparison based on the estimated position of the sought key
relative to the remainder of the file to be searched.
• Instead of computing a MIDDLE position, it calculates the next position as follows:

NEXT = LOWER + ⌈ (UPPER - LOWER) × (key[sought] - key[LOWER]) / (key[UPPER] - key[LOWER]) ⌉

where ⌈ ⌉ denotes "ceiling".
• Although the worst case computational complexity is O(n), the average complexity is O(log log n).

• It improves as the distribution of the keys becomes more uniform

• A binary search is preferable when the data is stored in primary memory because the additional
calculations needed for the interpolation search cancel any savings.
• When the data is stored in auxiliary memory and the key distribution is uniform, an interpolation search is
preferable.
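A hedged C sketch of interpolation search using the ceiling formula above; interpolation_search and the array in main are illustrative.

#include <stdio.h>

/* interpolation search over a sorted array of keys */
int interpolation_search(const int key[], int n, int sought)
{
    int lower = 0, upper = n - 1;
    while (lower <= upper && sought >= key[lower] && sought <= key[upper]) {
        long span = key[upper] - key[lower];
        int next;
        if (span == 0)
            next = lower;
        else {
            long num = (long)(upper - lower) * (sought - key[lower]);
            next = lower + (int)((num + span - 1) / span);   /* ceiling */
        }
        if (key[next] == sought) return next;
        if (key[next] < sought) lower = next + 1;
        else                    upper = next - 1;
    }
    return -1;                                               /* not found */
}

int main(void)
{
    int a[] = {3, 8, 17, 25, 39, 52, 61, 88};
    printf("%d\n", interpolation_search(a, 8, 17));   /* prints 2 */
    return 0;
}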
Self Organizing Sequential Search
• In the previous searches, we assumed that all records in the file would be accessed equally often. If this
assumption is not true, we would desire the most frequently retrieved records to appear at the beginning of the
file.
• Three popular algorithms:
(1) Move_to_front,(2) Transpose, (3) Count.
1)Move_to_front
When the sought record is located, it is moved to the front position of the file and all the intervening records are moved back one position. This is an extensive amount of repositioning, so a linked implementation is preferable even though it takes more storage. It is a mistake if a record is accessed, moved to the front, and rarely if ever accessed again.
Locality means that a record that has recently been accessed is more likely to be accessed again in the near future. All the self-organizing methods assume some degree of locality of accesses. Move_to_front is appropriate when space is not limited and locality of access is important.

An example: The file consists of 26 records containing the letters of the alphabet. The records are accessed in the order of "file organization".
2) Transpose
This algorithm interchanges the sought record with its immediate predecessor unless the sought record is in the first position. It converges to a near steady state more slowly than the Move_to_front algorithm, but it is more stable: it does not make big mistakes. It does not need the additional space required for a linked implementation, so it should be used when space is at a premium.
3) Count
This algorithm keeps a count of the number of accesses of each record. The record is moved in the file to a position in front of all the records with fewer accesses. The file is then always ordered in decreasing order of frequency of access. Disadvantages: it requires extra storage to keep the count and does not handle the locality of access phenomenon well. Because of its storage requirements, use it only when the counts are needed for another purpose.
Locating Information
Ways to organize a file for direct access:
The key is a unique address.
The key converts to a unique address.
The key converts to a probable address (hashing).
The Key is a unique address
We want to go directly to the address where the record is stored. This is possible if the key were also an address. If the employee's nine-digit social security number were the record's primary key, we may have one probe per retrieval using a table of size 10⁹.
The disadvantage is that a great deal of space must be reserved: one location for each possible social security number. And after employees began leaving the company, gaps would begin to appear.
The Key converts to a unique address
An algorithm to convert the primary key field (possibly a composite) into a unique storage address.
An example of conversion:
Location of A[i] = α + m × s
where A is an array, i a typical element, α the location of the first element in the array, m the number of elements preceding the ith element, and s the size of an element.
• Another example of conversion:


Flights are numbered from 1 to 999.
Days are numbered from 1 to 366.
Flight number and day of the year could be concatenated to determine the location.
Location = flight number || day_of_the_year
The addresses would range from 001001 to 999366.
We need only about one-third as much storage if:
Location = day_of_the_year || flight number
It is preferable to place the wider range last so that we do not have large gaps in the key values.
Since the airline probably would not have 999 flights, many potential numbers would still go unused.
The Key converts to a Probable Address(HASHING)
• If we remove most of the empty spaces in the address space, we have a key space larger than the address space.
• We need a function for mapping. Such functions are referred to as hashing functions.
Hash(key) → probable address
• This initial probable address is known as the home address for the record.
A good hashing function:
• Evenly distributes the keys among the addresses, to reduce the number of collisions.
• Executes efficiently, to keep the retrieval time to a minimum.
A collision occurs when two distinct keys map to the same address. Hashing is composed of two aspects:
(1) the function, and
(2) the collision resolution method.
Key mod N (where N is the table size).
Key mod P (where P is the smallest prime number ≥ N)
Truncation (or sub stringing): If we have social security numbers as the primary keys such as 123-45-6789, the digits
may come from any part of the key.
Truncation: Would it be reasonable to truncate the social security number key after the first three digits?
Depends on the application. If we are applying this function to employees who work at a single plant, it would not be a
good choice. Numbers are assigned by geographic regions, most of the workers would have the same leftmost two or three
digits.
A better choice would be to use the three rightmost digits.
Folding: Two kinds of folding exist:
(1) Folding by boundary, (2) Folding by shifting.
Suppose we have a nine-digit key. In folding by boundary we fold the number on the dashed boundaries into three parts; the digits are then superimposed and the codes added together (without carry) to obtain the probable address of 654.
Folding by shifting slides the parts one over another until all are superimposed, which would yield the different address of 258.

• We could also have added with carry. The results would be (1)764 and (1)368 respectively.
• Squaring (squaring a key and then substringing or truncating a portion of the result) is another way.
• Radix Conversion (the key is considered to be in a base other than 10 and is then converted into a number in base 10. For example the key 1234 in base 11 would become 1610 in base 10. Substringing or truncation could then be used).
Polynomial Hashing
The function
f(information area) → cyclic check bytes
is in fact a hashing function. The key is divided by a polynomial.
Alphabetic Keys
Alphabetic or alphanumeric key values can be input to a hashing function if the values are interpreted as
integers. All information is represented internally in a computer as bits.
Variant records or “unions” may be used.
• One hashing function may distribute the keys more evenly than another. One strategy for finding such a hashing function
would be to combine simpler functions.
• A hashing function that has a large number of collisions is said to exhibit primary clustering.
• Secondary memory is slower. It is better to have a slightly more expensive hashing function if it reduces the number of
collisions.
One method for reducing collisions is to change hashing functions. Another method is to reduce the packing factor.
Packing factor = number of records / total number of storage locations
Other names for it: packing density and load factor. As the packing factor increases, the likelihood of a collision increases. The disadvantage of decreasing the packing factor to reduce the number of collisions is that a lower packing factor takes more space.
Changing the hashing function or decreasing the packing factor may reduce the number of collisions but not eliminate them. We need a procedure to position a synonym when a collision occurs. The method requires a minimum number of additional probes from its home address.
One way is to point to the location of the synonym record.
We form a chain of synonym records. The NULL pointer indicates the end of the synonym chain.
A disadvantage is that storage is needed for the link fields.
• We need a mechanism that does not involve the use of actual links.
• We can use implied links by applying a convention, or set of rules. We compute the next search address by applying the set of
rules.
• A simple convention is to look at the next location. If it is occupied, we then examine the following location.
• The disadvantage is that we may need to make more probes.

There are several mechanisms for resolving collisions, grouped according to the mechanism used to locate the next address to search and the attribute of whether a record once stored can be relocated:
• Collision resolution with links
• Collision resolution without links
• Static positioning of records
• Dynamic positioning of records
• Collision resolution with pseudo links
• Uses pointers to connect the elements of a synonym chain.
• The insertion of a synonym is the same as the insertion into a linked list.
• On retrieval, the elements of the synonym chain are searched until the desired record is located or until the end
indicated by NULL is reached.
• “Coalescing” occurs when we attempt to insert a record with a home address that is already occupied by a record from
a chain with a different home address.
• The two chains with records having different home addresses coalesce or grow together.
• Coalesced hashing is also referred to as direct chaining.

A, B, C, D form one set of synonyms and X, Y form another


In (a), when we attempt to insert the record X, it collides with the record B which is not a synonym of X since B has a home
address of r and X has a home address of s.
The two chains coalesce; the bar on the right indicates the portion of the chain in which coalescing has occurred, with separate markers distinguishing insertions on the synonym chain with r as its home address from those with s as its home address. This will result in more probes than would be required for non-coalesced chains.
When X is inserted, it is inserted at the end of the chain. Instead of one, three probes are needed for X.
• In (a), 19 probes in total is needed to retrieve each record once, in (b) where two synonym chains were separated,
13 probes.
• Insertion of a new record is made at the bottommost (highest address) empty location. An available space
pointer is continually decremented until either an empty location is found or table is full.
• The available space pointer is initially set to a location one record length beyond the last storage location in the table.
For the collision resolution examples of this chapter we use
Hash(key) = key mod 11
Where 11 is the prime number table size.
For comparison purposes, we use the same initial subset of data with the keys of
27, 18, 29, 28, 39, 13, and 16
When we insert 29, there is a collision.
The available space pointer, R is decremented from the bottom to check if location 10 is available.

We set the link field in location 7 to point to location 10.


• If we add two additional records with keys of 42 and 17 and home addresses of 9 and 6 respectively, we will have coalescing of the chains with these two home addresses.
• If we add records to both of the probe chains, the performance of each chain would be degraded.
• One method of reducing the coalescing is to reduce the packing factor. The lower it is, the less chance that there will
be a collision from one chain with that of another.
• Also there are variants of coalesced hashing to reduce coalescing.
• An unsuccessful search takes more time since we need to examine more records.
• The number of retrieval probes is a function of the order of insertion. A record inserted early in the
process will be placed near the beginning.
• So if the frequencies are known, it is good to place the most frequently accessed records early in the insertion
process.
• We cannot merely remove the record as if we were deleting an element from a singly linked list.
• We might lose the remainder of a probe chain that had coalesced as a result of being its home address.

• A simple deletion procedure is to move a record later in the chain into the position of the deleted record. The
relocated record should be the last element on the chain having the home address of the location of the
deleted record.
• This relocation process is repeated at the vacated location until no other records need to be moved.
• Then, prior-to-moving predecessor of the most recently moved record is re linked to the moved record’s prior-
to-moving successor.
• A more complex scheme could be used to reposition records in the coalesced chains to reduce the amount of
coalescing.
• Determining if coalescing has occurred in a probe chain may be time-consuming.
• If we know that a coalescing has not occurred, we just remove the deleted record from a singly linked list.
• Deleted record is reinitialized to null, the available space pointer, R, is reset to the maximum of (1) its current address and
(2) the vacated address plus one.
To delete the record with key 39 requires that

• Because coalescing has occurred, we move the last element in the chain with a home address of 9 (record 42) into location 9 and adjust the links and table entries.
• R is set to location 5 (the vacated address plus one).

To reduce coalescing, the variants may be classified in three ways:
• The table organization (whether or not a separate overflow area is used)
• The manner of linking a colliding item into a chain
• The manner of choosing an unoccupied location

• In coalesced hashing, storage is needed for link fields. When this storage is not available, we need a convention for
where to search next.
• Progressive overflow (linear probing) is one convention. If a location is occupied, we look at the next location to
see if it is empty. Table is circular. We continue until we find an empty slot or we encounter the home address of
the record a second time (table is full).
• For retrieval, we follow the same process.
• Performance is poor for an unsuccessful search.
Insert 27, 18, 29, 28, 39, 13, 16

Hash(key) = key mod 11


• Secondary clustering is the consequence of two or more records following the same sequence or probe addresses. It
results in a bunching of records within the table.
• Contrasts with primary clustering which occurs when a large number of records have the same home address.
• In the example, the secondary clustering was caused by records with different home addresses.
Progressive Overflow - Deletion
• To delete, an indicator called a tombstone is put in the location of the deleted record.
• It tells us that additional records may follow and to keep on searching on a retrieval, or that the location may be filled on a subsequent insertion.
• On the left, the record (18) is deleted.

Use of Buckets
Bucket (block or page) is a storage unit of multiple records at one file address.
Also the unit in which information is accessed and transferred between storage devices.
The number of records that may be stored in a bucket is called the blocking factor. As the blocking factor increases, the number of
auxiliary storage accesses decreases.
Within a bucket, we need a means of separating the individual records. We can achieve this by knowing the record length for fixed-
length records, or by placing a special delimiter.
Buckets are used to improve any collision resolution method.
Use of Buckets - Example

Blocking factor 2. Hash(key) = key mod 11 The keys are 27, 18, 29, 28, 39, 13, 16
For collision resolution we can use a variable increment instead of a constant increment of one to reduce secondary clustering. Here the increment is a function of the key being inserted, which may be viewed as another hashing function. So this is referred to as double hashing: H1 gives the home address and H2 gives the increment.
• Possibilities for H2:
– H2 = Quotient(Key / P) mod P
– H2' = (Key mod (P - 2)) + 1
• H2 requires two divide operations; H2' requires only a single divide operation, but H2' is more difficult for people to compute, so we use H2 in the example.
• The home address for a record will then be determined by the remainder of the key divided by the table size, and the increment for collision resolution by the quotient of the same operation.
If A, B, and C are synonyms at r as illustrated, B and C will usually have different increments, yielding different probe chains. The method requires a prime number table size, for otherwise searching could cycle through a subset of the table several times.
In the example, locations 0, 2 and 4 were occupied. If we attempted to insert a record with a home address of 0 and an increment of 2, we would cycle through the three occupied locations twice.

H1(key) = key mod 11

Keys = 27, 18, 29, 28, 39, 13, 16

Alternatively: New address = (current address + increment) mod table_size
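A hedged C sketch of linear quotient (double hashing) using H1(key) = key mod P and H2(key) = Quotient(key/P) mod P as above. The guard against a zero increment is an added assumption, and the keys in main are those of the running example.

#include <stdio.h>

#define P 11                 /* prime table size */
#define EMPTY -1

int table[P];

/* linear quotient: H1 gives the home address, H2 the increment */
int H1(int key) { return key % P; }
int H2(int key)
{
    int q = (key / P) % P;
    return q ? q : 1;        /* guard against a zero increment (assumption) */
}

int insert_lq(int key)
{
    int addr = H1(key), inc = H2(key), probes = 0;
    while (table[addr] != EMPTY) {
        addr = (addr + inc) % P;         /* new = (current + increment) mod P */
        if (++probes == P) return -1;    /* table is full */
    }
    table[addr] = key;
    return addr;
}

int main(void)
{
    int keys[] = {27, 18, 29, 28, 39, 13, 16}, i;
    for (i = 0; i < P; i++) table[i] = EMPTY;
    for (i = 0; i < 7; i++)
        printf("%d -> slot %d\n", keys[i], insert_lq(keys[i]));
    return 0;
}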


• The mechanism for determining an unsuccessful search is the same as that for progressive overflow. When we
encounter an empty probe location, we terminate the search unsuccessfully.
• But an unsuccessful search requires fewer probes with linear quotient since we are more likely to encounter an
empty location to terminate the search as a result of eliminating secondary clusters.
• Deleting a record from a table requires the use of a tombstone.
• We can improve on linear quotient by observing that the number of retrieval probes is dependent on the placement of the
records.
• E.g. if we insert a record with a key of 67, it has a home address of 1 which is already filled with 39. We then try locations 7, 2, 8 and finally 3. Then it requires five retrieval probes to find 67. What if 39 had not been stored at
location 1?
• The next method moves an item already inserted in the table if the move reduces the average number of probes required
to retrieve all the records.
Brent's Method
We examined static methods. We now examine several dynamic methods where an item once stored may be moved.
Requires additional processing when inserting a record but reduce the number of probes needed for retrieval (we
insert once but retrieve many times).
• The primary probe chain of a record is the sequence of locations visited during the insertion or retrieval of the
record

The primary probe chain for 39 is shown on the left. p1 is the home address.
Three positions had to be visited.

• What if its home address were empty? 39 could have been inserted at its home address, which requires only 1 probe for retrieval. We could make the home address available by moving what is stored there.
• Move 28 to a location such that it could still be retrieved (we still want to use the linear quotient method).
• The sequence of positions visited when attempting to move a record from the primary probe chain is called the secondary probe chain.

• 28 will require one more probe for its retrieval but 39 will require two fewer probes for a net reduction of one probe
achieved by moving 28.

• The solid vertical line represents the primary probe chain (the addresses that would be considered in storing an item using linear quotient).
• The horizontal lines represent the secondary probe chains (the addresses that would be searched in attempting to move an item from a position along the primary probe chain).
The q value along the primary probe chain is the increment for the item being inserted. The q values along the secondary probe chains represent the increments associated with the item being moved.
The subscript i gives the number of probes needed to retrieve the item being inserted along its primary probe chain. The subscript j gives the number of additional probes needed to retrieve the item being moved along its secondary probe chain. To minimize the number of retrieval probes, we want to minimize (i + j).
In the case where i = j, we will arbitrarily choose to minimize on i.
Let s be the number of probes required to retrieve an item if nothing is moved.
We try all combinations of (i + j) < s such that we minimize (i + j).
On equality, since there would be no reduction in the number of probes, no movement would occur.
Hash(key) = key mod 11

And the increment function is i(key) = Quotient(Key / 11) mod 11

28 is moved from its original location to location 8.

Note that we use the increment associated with the item being moved and
not with the record being inserted.

27 is moved to location 0.
Some probes overall are saved by making the moves.
Binary Tree

This method carries the concept in Brent's method one step further and moves items from secondary and subsequent probe chains. It needs fewer retrieval probes than Brent's method.
There are two choices at each probable storage address: continue to the next address along the probe chain of the item being inserted, or move the item stored at that address to the next position on its probe chain.
A left branch signifies the continue option, a right branch the move option.

• The decision tree is generated in a breadth-first fashion from the top down, left to right.
• It is used only as a control mechanism in deciding where to store an item and is not used for storing records.
A different binary tree is constructed for each insertion. Encountering either an empty leaf node in the binary tree or a full table
terminates the process.
By moving items from secondary and subsequent probe chains, we are achieving a placement of records that will further
reduce the average number of retrieval probes (we insert once but retrieve often).
Like Brent’s method, processing is only for insertion, retrieval is done with linear quotient.
Keys to insert: 27, 18, 29, 28, 39, 13, 16, 41, 17, 19
Hash(key) = key mod 11
i(key) = Quotient(key / 11) mod 11

• 27 and 18 are inserted without difficulty.

• 29 causes a collision

• So we generate a binary tree to determine which, if any, records to move.

• Location 9 is empty.

• The tree appears on the right.

• The bold numbers represent locations in the table and the values in parentheses are the keys stored at those locations. The underlined node is the root node.

• The tree tells us to attempt to move 29 into location 7.

• That location is occupied so we continue along the primary probe chain until we
reach location 9. The table is shown on the right.
The path formed by the leftmost branch at each level of the tree is equivalent to the primary probe chain of the linear quotient method and of Brent's method.


The binary tree used in inserting items is a complete binary tree. The n nodes of a complete binary tree correspond to the first n nodes of a full binary tree numbered top down, left to right.
An advantage of this is that it can be implemented as a sequential structure:
lchild(i) = 2 * i
rchild(i) = 2 * i + 1
parent(i) = i / 2
The depth of the binary tree is O(log n). After the tree gets to be a certain size, secondary storage can be used (for example, for the bottom two levels of the tree).
How do we check for a full table? We could keep a counter of the depth of the tree; when it exceeds log n, the table is full. But this requires the generation of a possibly enormous tree.
A better solution is to keep a counter of how many records have already been inserted into the table, and check that number before generating the tree.
Computed Chaining
The methods that use a link field provide better performance but require more storage.
• Those that don’t use a link field require less space but provide poorer performance.
• Consider a third class in between. Instead of storing an actual address in the link, they store a pseudolink (which
requires additional processing before it yields an actual address)
• If storage is limited, information needed can be computed rather than stored.
• The performance of the pseudolink methods are better than the nonlink methods. The number of offsets or locations
which would need to be searched to locate the successor item are stored in the pseudolink so that we can compute the
successor’s actual address.
The intermediate locations do not have to be searched so performance is improved.
It takes fewer bits (less storage) to store the number of offsets rather than a full address.
On the negative side, do not perform as well as the coalesced hashing methods and they require more storage than the
nonlink methods.
• Follows the process of coalesced hashing in that the first item on a chain points to its immediate successor which points to its
immediate successor.
• To eliminate the coalescing of chains, computed chaining also moves a record stored at another record’s home address.
• There is no coalescing (improves performance).
• Unlike the nonlink methods, computed chaining uses the key of the record stored at a probe address to locate the next probe address and
not the key of the record being inserted or retrieved.

• Using a function of the item stored at a location ensures that only one actual probe will be needed to locate the successor record.
The probe function computes the address of the successor element given the key of the record stored at the current location and the pseudolink of the current location. The incrementing scheme of linear quotient is used.
Computed Chaining Example
Keys: 27, 18, 29, 28, 39, 13, 16, 38, 53
Hash(key) = key mod 11
i(key) = Quotient(Key / 11) mod 11

“nof” represents the number of offsets, i.e. the pseudolink value.

• The increment is changed only when we reach the successor element on a probe chain.
• The current increment is always the one associated with the most recently visited record on the current probe chain.
• Which pseudolink field do we need to set?
• The one associated with 16.
• Since we had to look at two additional locations, the
pseudolink value is two.

• We finally add the record with key 53.


• We observe a move process. We move 16 plus all its successors. Otherwise we would no longer be able to retrieve
them.
• The record with key 53 may be inserted directly at location 9.
• We now reinsert the two records that were displaced by the insertion.
• We need to keep a pointer or remember location 9 so that we can find the end of the chain that those two records
were on previously.
• We need to identify the immediate predecessor to the item that was removed from location 9 so that we can relink it
to the new location for the removed record.
• We know where to begin the search by hashing 16, which yields location 5. We search from that location until we find
a pseudolink to location 9. We find that immediately at location 5.
In general, the search for the immediate predecessor of a removed item may be represented as

• We begin the search at the home address which contains a record and continue the search until we find a record that
points to the location of the removed item which is indicated with the temporary pointer.
• We now know the location of the pseudolink field that needs to be modified in the reinsertion process.
• In the example, we then need to use that increment associated with 27 until we can find another empty space to insert
the record with key 16.

• Did we degrade the performance for retrieving 16 by moving it?
• No, we essentially have a linked list, so the actual physical location of a record is not important.

Another approach is to use multiple probe chains instead of a single probe chain for organizing records.
Each chain will be smaller and the searching will be faster.
• In addition to the hashing function, Hash(key), to obtain the home address for a record and the incrementing function,
i(key), to obtain an increment, we introduce a third hashing function, g(key), to tell us which probe chain to insert into
or to search. This is an example of triple hashing.
g(key) ∈ {0, 1, ..., R – 1}
where R is the number of subgroupings. A simple function for g(key) is
g(key) = key mod R

• Keys: 27, 18, 29, 28, 39, 13, 16, 38, 53
• Hash(key) = key mod 11
• i(key) = Quotient(key / 11) mod 11
• g(key) = key mod 2

• If a key is even, it will go on the zeroth chain; if it is odd, it will go on the first chain. The records are inserted as
before, with the only difference being the placement of the pseudolink values: in chain zero for even key values and in
chain one for odd key values.
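The three functions of this example are simple enough to write down directly. The C sketch below mirrors the definitions above (the function names are illustrative):

#define M 11     /* table size             */
#define R 2      /* number of probe chains */

int hash_home(int key)  { return key % M; }        /* Hash(key)                               */
int hash_incr(int key)  { return (key / M) % M; }  /* i(key)                                  */
int hash_chain(int key) { return key % R; }        /* g(key): 0 for even keys, 1 for odd keys */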
Perfect Hashing
hash(key) → unique address
Both primary and secondary clustering are eliminated.
With perfect hashing, we need only a single probe to retrieve a record.
Perfect hashing is applicable to files with only a relatively small number of records, because the time required to compute
the hashing function is large.
A perfect hashing function maps a key into a unique address. If the range of potential addresses is the same as the
number of keys, the function is a minimal (in space) perfect hashing function.
A separate hashing function needs to be devised for each set of keys. If one or more of the keys change, a new hashing function
must be constructed.
• In the minicycle algorithm, for a table of size N, a perfect hashing function may be characterized as:

p.hash(key) = (h0(key) + g[h1(key)] + g[h2(key)]) mod N

• What needs to be decided are the functions h0, h1, h2 and g.
• These functions should be efficient.
• The worst case of the minicycle algorithm is O(r^6), where r is the number of records in the set. An upper bound for r seems
to be about 512.
• A special case of the minicycle algorithm is Cichelli’s algorithm, which is not as efficient as the minicycle algorithm, is appropriate only for r
up to about 60, and has other disadvantages, but it is straightforward to understand.
• For larger amounts, the minicycle algorithm is appropriate.
Cichelli’s Algorithm

h0 = length(key)
h1 = first_character(key)
h2 = last_character(key)
g = T(x)
• where T is a table of values associated with individual characters x which may appear in a key. The time consuming
part is determining T.
• In the table, a value may be assigned to more than one character.
Let’s apply the algorithm to the keyword begin.
p.hash(begin) = 5 + T(h1(begin)) + T(h2(begin))
             = 5 + T(b) + T(n)
             = 5 + 15 + 13
             = 33
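The computation above can be expressed as a small C function. The sketch below assumes a table T indexed by character, with only the two entries used in the worked example filled in (T('b') = 15, T('n') = 13); everything else is an illustrative placeholder:

#include <string.h>

static int T[256];                    /* value assigned to each character */

/* Cichelli's hashing function: length(key) + T[first char] + T[last char] */
int cichelli_hash(const char *key) {
    size_t len = strlen(key);
    return (int)len + T[(unsigned char)key[0]] + T[(unsigned char)key[len - 1]];
}

/* With T['b'] = 15 and T['n'] = 13, cichelli_hash("begin") = 5 + 15 + 13 = 33. */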
The algorithm involves ordering the keys based on the frequencies of occurrence of the first and last characters in the keys.
Assignment of values is made to the characters in the first and last positions of the keys from the top of the ordering to the
bottom.
The algorithm uses an exhaustive search with backtracking.
Keys: cat, ant, dog, gnat, chimp, rat, toad
• We assume that the maximum value that may be assigned to a character is 4. If we cannot find a solution using 4, we
would try a larger value. If this maximum value is too small, we will either have no solution or a great amount of
backtracking. If the value is too large we won’t obtain a minimal solution.
• The frequencies of occurrence of the first and last characters are
a=1 c=2 d=2 g=2 p=1 r=1 t=5

We can compute the sums of the frequencies of occurrence of the first and last characters of each key.
• We next order the keys in descending order based on the sum of the frequencies, obtaining the ordering shown on the right.
• We arbitrarily chose the descending order; a random ordering would have been equally acceptable.

• Then we check the ordering to see if any keys exist that have both their first and last characters appearing in previous keys.
• Notice that the d and g of “dog” have appeared previously, so “dog” is moved to the position following “gnat”.
• The resulting ordering is shown on the right.


We begin assigning values to the characters.
t = 0   d = 0
p.hash(toad) = 4

t = 0   d = 0   g = 1
p.hash(toad) = 4
p.hash(gnat) = 5
• In “dog” we have a conflict. So, backtrack:
t = 0   d = 0   g = 2
p.hash(toad) = 4
p.hash(gnat) = 6

t = 0   d = 0   g = 2
p.hash(toad) = 4
p.hash(gnat) = 6
p.hash(dog) = 5

t = 0   d = 0   g = 2   c = 0
p.hash(toad) = 4
p.hash(gnat) = 6
p.hash(dog) = 5
p.hash(cat) = 3

t = 0   d = 0   g = 2   c = 0   r = 4
p.hash(toad) = 4
p.hash(gnat) = 6
p.hash(dog) = 5
p.hash(cat) = 3
p.hash(rat) = 7

t = 0   d = 0   g = 3   c = 0   r = 2
p.hash(toad) = 4
p.hash(gnat) = 7
p.hash(dog) = 6
p.hash(cat) = 3
p.hash(rat) = 5
We are not able to assign a value to a without a collision. So, we backtracked.

• The problem still existed and we backtracked again, arriving at:
t = 0   d = 0   g = 4   c = 0   r = 2   a = 3
p.hash(toad) = 4
p.hash(gnat) = 8
p.hash(dog) = 7
p.hash(cat) = 3
p.hash(rat) = 5
p.hash(ant) = 6
t = 0   d = 0   g = 4   c = 0   r = 2   a = 3   p = 4
p.hash(toad) = 4
p.hash(gnat) = 8
p.hash(dog) = 7
p.hash(cat) = 3
p.hash(rat) = 5
p.hash(ant) = 6
p.hash(chimp) = 9

Since this solution maps the seven keys into seven consecutive locations (3 through 9), it is a minimal perfect hashing function.
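As a sanity check of the example, the final assignment can be plugged into a short C program; it reproduces the seven hash values 3 through 9 listed above (the program itself is only an illustration):

#include <stdio.h>
#include <string.h>

int main(void) {
    int T[256] = {0};
    T['t'] = 0; T['d'] = 0; T['c'] = 0;      /* values found by the backtracking search */
    T['g'] = 4; T['r'] = 2; T['a'] = 3; T['p'] = 4;

    const char *keys[] = { "cat", "ant", "dog", "gnat", "chimp", "rat", "toad" };
    for (int k = 0; k < 7; k++) {
        size_t len = strlen(keys[k]);
        int h = (int)len + T[(unsigned char)keys[k][0]]
                         + T[(unsigned char)keys[k][len - 1]];
        printf("p.hash(%s) = %d\n", keys[k], h);   /* prints 3, 6, 7, 8, 9, 5, 4 */
    }
    return 0;
}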
Indexed Sequential File Organization
Indexed sequential file organization is a hybrid of sequential and direct file organization.
It is suitable for accessing data both directly and sequentially. A purely sequential search is slow.

• To accelerate the search, we order the information and put tabs or an index to groups.

• We search the tabs until we find the related group, then search exhaustively only those records in that group
• We apply this grouping process a second time and organize the current groups into larger ones, e.g. file drawers.
• We search a much smaller number of higher-level tabs, then search the tabs of the information in that group, then
search to locate the record.
• The tab or index structure is, in effect, a tree structure.

• For insertions we associate an overflow area with each original or primary storage area for a group of records; so at most
we only have to move the number of records in such a group to open a place for the record being inserted.
• In computers with block addressable disks, we use tracks as the lowest level of grouping information.

• The next higher level is a grouping by cylinders. Then we have additional levels of master indexes.
• An indexed sequentially structured file is referred to as an ISAM (for indexed sequential access method) file.
The cylinder index contains pointers to several cylinders. A pair of entries (key, ptr) is used for each cylinder:
key is the highest key on that cylinder and ptr is a pointer to the track index for that cylinder.

Each track has two pairs of entries in the track index. One pair contains information on the primary storage area and the other pair
has information on the overflow records associated with the track.
The key in the first pair provides the highest key on that track in the primary area, and the key in the second pair provides the
highest key in the overflow area.
The primary pointer indicates the track containing the primary records, and the overflow pointer indicates the first overflow
record.
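The index entries just described can be modelled with a pair of C structures. This is only an illustrative in-memory sketch (the field names are assumptions, not a standard ISAM layout):

/* One entry per cylinder in the cylinder index. The x-y pointer notation
 * of the text becomes a (cylinder, track) pair. */
typedef struct {
    int highest_key;          /* highest key stored on this cylinder              */
    int index_cylinder;       /* x: cylinder holding this cylinder's track index  */
    int index_track;          /* y: track holding this cylinder's track index     */
} CylinderIndexEntry;

/* Two pairs of entries per track in the track index: one pair for the
 * primary area and one pair for the overflow records of that track. */
typedef struct {
    int primary_highest_key;  /* highest key in the track's primary area          */
    int primary_track;        /* track containing the primary records             */
    int overflow_highest_key; /* highest key among the track's overflow records   */
    int overflow_track;       /* z: track of the first overflow record            */
    int overflow_record;      /* w: record number; -1 plays the role of NULL      */
} TrackIndexEntry;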
The highest key on cylinder 1 is 350.
Pointer entry notation is x-y, where x gives the cylinder number and y gives the track number where the track index for that cylinder
is stored.
The track index would not normally require an entire track of storage; so the remaining area could be used for additional primary or
overflow storage.
For simplicity, we assume that the records in primary storage have a blocking factor (bucket size) of one and that all primary storage is
filled.

• The NULL in the overflow pointer indicates that no overflow records currently exist.
• We can have overflow records associated with each track. We never have to move more than the number of records on a
single track.
• An available space list would link the overflow space together as records are added and deleted from the file.
• The overflow pointers follow the format z-w, where z gives the track number and w gives the record number.

Here we insert 8

When shifting 13, we update the entries for the highest key in the primary storage area, which becomes 10, and the overflow pointer, which becomes 9-1.
Here we insert 99
The next record to insert has a key of 14. The track to insert into is track 2. We move all of the records stored in the primary area of the
track.
Wouldn’t it be better to insert it onto the end of track 1? In that way it would be placed in the overflow area and no records would
need to be moved in the primary area of track 2.
But this adds complexity to the algorithm: it requires that we search two tracks on each insertion and choose between them, with no
gain on average for retrieval.
The key of the final insertion record is 11.
• 11 < 13. To maintain order, it should be placed into the overflow area.
• All we need to do is to place 11 into its proper position in the chain of overflow records.
• What we are doing here, then, is inserting into a linked list.

Data Access

• Sequentially processing the data in an indexed sequential file entails stepping through the indexes to locate track after track of
primary and overflow records.
• There may be intervening tracks containing overflow records or other info from a different file.
• We also need to use the index when we move from one cylinder to another since the cylinders may not be contiguous.
• Sequential processing of an indexed sequential file is faster than for a directly organized file but slower than for a sequentially
organized file.
• For direct access, we search the cylinder index, then search the primary and overflow entries of the track index, following the
link field of the overflow entry when necessary (a sketch of this two-level lookup appears at the end of this subsection).
• We need to search either the primary or the overflow area but not both, since each track has two pairs of entries
associated with it.

• Which takes more time: a successful search or an unsuccessful one?
• In the case of a sequentially organized file and a directly organized file, an unsuccessful search takes more time.
• But with an indexed sequential file, both require the same amount of time. Since the records in an indexed
sequential file are ordered, we can end a search when we move beyond the related position.
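The two-level lookup referred to above might look roughly like the following C sketch. It assumes the index is available as simple arrays of highest keys (one per cylinder, and per track a primary and an overflow highest key); the names and layout are illustrative, not taken from the text:

/* Returns 1 and sets *cyl, *trk and *in_overflow if an index group that could
 * contain the key is found, 0 otherwise. */
int isam_locate(const int cyl_high[], int n_cyl,
                const int *prim_high[], const int *ovfl_high[], int n_trk,
                int key, int *cyl, int *trk, int *in_overflow) {
    for (int c = 0; c < n_cyl; c++) {
        if (key > cyl_high[c])
            continue;                              /* key belongs on a later cylinder       */
        for (int t = 0; t < n_trk; t++) {
            int high = ovfl_high[c][t] > prim_high[c][t]
                     ? ovfl_high[c][t] : prim_high[c][t];
            if (key <= high) {                     /* first track whose range covers the key */
                *cyl = c;
                *trk = t;
                *in_overflow = (key > prim_high[c][t]);  /* search primary OR overflow, not both */
                return 1;
            }
        }
        return 0;                                  /* should not happen if the index is consistent */
    }
    return 0;                                      /* key is larger than any indexed key    */
}

The selected track's primary area (or the overflow chain starting at its z-w pointer) would then be read and searched for the record itself.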
Performance
• Having an overflow blocking factor of one simplifies the insertion algorithm, but it can reduce the efficiency of search. Consider a track with a long overflow chain:
• 4 overflow records mean 4 separate disk accesses; we are adding to the depth of the subtree. Such a long chain would
degrade performance for both insertions and retrievals.
• To correct the imbalance, it is advisable to reorganize the file periodically.
• Reorganization takes time and is usually done at nonpeak times.
• Another technique to reduce the effect of long overflow chains is to include unused space or dummy records in the
primary area which records inserted later can occupy.

• If the record is in the primary area, it is merely removed from the file.


• Removing a record may be accomplished by replacing it with a dummy record or marking its storage location as being
available (tombstone). It is not worthwhile to move records since an insertion would soon occur.
• If the record is on an overflow chain, the chain is relinked and the space occupied by the record is returned to the available space list.
• Overflow may be implemented in either one of two ways:
(1) cylinder overflow, (2) independent overflow.
• With cylinder overflow, the overflow area is on the same cylinder as the primary storage area.
• With an independent overflow area, several cylinders of primary area share a separate cylinder for overflow.
• The advantage of the cylinder overflow is that disk head does not need to be repositioned to another cylinder to access an
overflow record.

In time-sharing environments, the advantages of a cylinder overflow may be lost. To lessen such an effect, some operating
systems schedule input/output to minimize seek time.

The advantage of the independent overflow area is that primary cylinders may share a single overflow region so that the
amount of overflow space that needs to be reserved per cylinder can be lessened.
• We don’t need to reserve as many overflow tracks per cylinder, because we can share storage with other
cylinders.
• Understanding the basic structure of an indexed sequential file allows us to use it more effectively.
• It helps if we sort the records into descending order before insertion. Consider the following keys of records
(B is A sorted into descending order):

A:  39  47  72  17  22  68  36
B:  72  68  47  39  36  22  17

• When the primary area is full, the indexed sequential insertion procedure performs an insertion sort on the items
in an overflow chain of a particular group.
• That means stepping through the records already in the chain to locate the proper position for the insertion.
• Each element encountered on the chain requires an access of auxiliary storage, plus we need to do this processing
for each insertion.

• What is the advantage, then, of descending order?
• The new record that we are inserting is always at the top of the chain for the group of records being inserted. That eliminates
accessing a record multiple times during the insertion process for a group of records (see the sketch below).
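To see concretely why descending order helps, consider a minimal in-memory model of an overflow chain kept in ascending key order (a sketch only; on disk each next pointer would be a z-w entry and each step along the chain would be an extra access):

#include <stdlib.h>

typedef struct OvRec {
    int key;
    struct OvRec *next;      /* on disk: a z-w pointer to the next overflow record */
} OvRec;

/* Ordered insertion into the chain. If the incoming keys arrive in
 * DESCENDING order, the new key is always smaller than the current head,
 * so the while loop never runs and only the head of the chain is touched. */
OvRec *insert_ordered(OvRec *head, int key) {
    OvRec *node = malloc(sizeof *node);
    if (node == NULL)
        return head;                          /* out of memory: leave the chain unchanged */
    node->key = key;
    if (head == NULL || key < head->key) {    /* new smallest key: becomes the new head */
        node->next = head;
        return node;
    }
    OvRec *cur = head;
    while (cur->next != NULL && cur->next->key < key)
        cur = cur->next;                      /* each step would cost one more disk access */
    node->next = cur->next;
    cur->next = node;
    return head;
}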
• To increase the number of transactions, one technique is to place the cylinder index (or top-level master index) in primary
memory, since all accesses of an indexed sequential file need to use it.
