Data Structure Using C
A data structure is a data organization, management, and storage format that enables efficient access
and modification. A data structure is a collection of data values, the relationships among them, and the
functions or operations that can be applied to the data.
Introduction
A data structure can be defined as a group of data elements which provides an efficient way of
storing and organizing data in the computer so that it can be used efficiently. Some examples of data
structures are arrays, linked lists, stacks, queues, etc. Data structures are widely used in almost every aspect
of computer science, i.e., operating systems, compiler design, artificial intelligence, graphics, and many
more.
Data structures are the main part of many computer science algorithms, as they enable
programmers to handle data in an efficient way. They play a vital role in enhancing the performance of
software or a program, as the main function of the software is to store and retrieve the user's data as fast as
possible.
Basic Terminology
Data structures are the building blocks of any program or software. Choosing the appropriate data
structure for a program is the most difficult task for a programmer. The following terminology is used as far as
data structures are concerned:
Group Items: Data items which have subordinate data items are called group items.
For example, the name of a student can have a first name and a last name.
Attribute and Entity: An entity represents the class of certain objects. It contains various attributes. Each
attribute represents a particular property of that entity.
Field: A field is a single elementary unit of information representing an attribute of an entity.
Processor speed: To handle very large amounts of data, high-speed processing is required, but as the data
grows day by day to billions of files per entity, the processor may fail to deal with that much data.
Data Search: Consider an inventory of 10^6 items in a store. If our application needs to search for a
particular item, it needs to traverse all 10^6 items every time, which slows down the search process.
Multiple requests: If thousands of users are searching the data simultaneously on a web server, then there
are chances that even a very large server can fail during that process. In order to solve the above
problems, data structures are used. Data is organized to form a data structure in such a way that not all items
need to be searched and the required data can be found quickly.
Reusability: Data structures are reusable, i.e. once we have implemented a particular data structure, we can
use it at any other place. Implementation of data structures can be compiled into libraries which can be used
by different clients.
Abstraction: Data structure is specified by the ADT which provides a level of abstraction. The client
program uses the data structure through interface only, without getting into the implementation details.
Arrays: An array is a collection of similar type of data items and each data item is called an element of the
array. The data type of the element may be any valid data type like char, int, float or double.
The elements of an array share the same variable name, but each one carries a different index number
known as a subscript. An array can be one-dimensional, two-dimensional, or multidimensional.
Linked List: Linked list is a linear data structure which is used to maintain a list in the memory. It can be
seen as the collection of nodes stored at non-contiguous memory locations. Each node of the list contains a
pointer to its adjacent node.
Stack: A stack is a linear list in which insertions and deletions are allowed only at one end, called the top.
A stack is an abstract data type (ADT) that can be implemented in most programming languages. It
is named a stack because it behaves like a real-world stack,
for example, a pile of plates or a deck of cards.
Queue: A queue is a linear list in which elements can be inserted only at one end, called the rear, and deleted only
at the other end, called the front.
It is an abstract data structure, similar to a stack. A queue is open at both ends and therefore follows the First-In-
First-Out (FIFO) methodology for storing data items.
Non Linear Data Structures: This data structure does not form a sequence i.e. each item or element is
connected with two or more other items in a non-linear arrangement. The data elements are not arranged in
sequential structure.
Trees: Trees are multilevel data structures with a hierarchical relationship among their elements, known as
nodes. The bottommost nodes in the hierarchy are called leaf nodes, while the topmost node is called the root
node. Each node contains pointers to its adjacent nodes.
The tree data structure is based on the parent-child relationship among the nodes. Each node in the tree can have
more than one child, except the leaf nodes, whereas each node can have at most one parent, except the root
node. Trees can be classified into many categories.
1) Traversing: Every data structure contains the set of data elements. Traversing the data structure means
visiting each element of the data structure in order to perform some specific operation like searching or
sorting.
Example: If we need to calculate the average of the marks obtained by a student in 6 different subject,
we need to traverse the complete array of marks and calculate the total sum, then we will divide that sum by
the number of subjects i.e. 6, in order to find the average.
2) Insertion: Insertion can be defined as the process of adding elements to the data structure at any
location.
If the size of the data structure is n, then it can hold at most n elements; attempting to insert into a full data
structure results in overflow.
3) Deletion: The process of removing an element from the data structure is called Deletion. We can delete an
element from the data structure at any random location.
If we try to delete an element from an empty data structure then underflow occurs.
4) Searching: The process of finding the location of an element within the data structure is called Searching.
There are two algorithms to perform searching, Linear Search and Binary Search.
5) Sorting: The process of arranging the data structure in a specific order is known as Sorting. There are
many algorithms that can be used to perform sorting, for example, insertion sort, selection sort, bubble sort,
etc.
6) Merging: When two lists, List A and List B, of sizes M and N respectively and containing elements of a similar
type, are clubbed or joined to produce a third list, List C, of size (M+N), this process is called merging.
Algorithm
An algorithm is a procedure having well-defined steps for solving a particular problem. An algorithm is a
finite set of logic or instructions, written in order to accomplish a certain predefined task. It is not the
complete program or code; it is just the solution (logic) of a problem, which can be represented either as an
informal description, using a flowchart, or as pseudocode.
Time complexity: It is a way of representing the amount of time needed by a program to run to the
completion.
Space complexity: It is the amount of memory space required by an algorithm during the course of its
execution. Space complexity matters in situations where limited memory is available and in multi-user
systems.
Each algorithm must have:
Specification: Description of the computational procedure.
Pre-conditions: The condition(s) on input.
Body of the Algorithm: A sequence of clear and unambiguous instructions.
Post-conditions: The condition(s) on output.
Example: Design an algorithm to multiply the two numbers x and y and display the result in z.
Step 1: START
Step 2: declare three integers x, y & z
Step 3: define values of x & y
Step 4: multiply values of x & y
Step 5: store the output of step 4 in z
Step 6: print z
Step 7: STOP
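As a small illustration, the steps above map directly into C (the values 6 and 7 are arbitrary sample inputs chosen here, not part of the original algorithm):

#include <stdio.h>

int main(void)
{
    int x, y, z;            /* Step 2: declare three integers x, y and z */
    x = 6;                  /* Step 3: define values of x and y */
    y = 7;
    z = x * y;              /* Steps 4-5: multiply x and y, store the result in z */
    printf("%d\n", z);      /* Step 6: print z */
    return 0;               /* Step 7: stop */
}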
Characteristics of an Algorithm
An algorithm must satisfy certain characteristics. In particular, there must be something or someone capable of
understanding and following the instructions given; we call this a "computer".
Pseudocode Conventions
An algorithm can be described in many ways: in English, specifying the algorithm step by step, or with a
flowchart for a graphical representation.
But these two ways work well only if the algorithm is small or simple. For a large algorithm we
use pseudocode.
Pseudo code is a kind of structured English for describing algorithms.
It allows the designer to focus on the logic of the algorithm without being distracted (diverted) by
details of language syntax.
At the same time, the pseudo code needs to be complete. It describes the entire logic of the algorithm
so that implementation becomes a rote mechanical task of translating line by line into source code.
ALGORITHM Sum(num1,num2)
Input: read num1,num2
Output: write sum of num1&num2
{
Read num1,num2;
Sum = num1+num2;
Write sum;
}
Pseudocode conventions:
1. "//" is used to provide a comment line.
2. Matching braces "{ }" are used to form blocks. A compound statement (i.e., a collection of simple
statements) can be represented as a block.
3. An identifier begins with a letter. Data types of variables are not explicitly declared. But types will be
clear from the context whether a variable is global or local to a procedure.
For example:
node=record
{
datatype_1 data_1;
.
.
datatype_n data_n;
node *link;
}
4. Assignment of values to variables is done using assignment statement.
<variable> := <expression>;
5. There are two Boolean values, true and false. The
logical operators "and", "or", and "not" are used instead of "&&", "||", and "!" respectively, and the
relational operators are "<, ≤, =, ≠, ≥, >".
6. Input and output are done using the instructions read and write. No format is used to specify the size
of input or output quantities.
7. There is only one type of procedure: Algorithm. An algorithm consists of a heading and a body. The
heading takes the form Algorithm Name(<parameter list>).
A very elegant solution to the Tower of Hanoi problem uses recursion.
Algorithm TowersOfHanoi(n, x, y, z)
// Move the top n disks from tower x to tower y, using tower z as auxiliary.
{
    if (n ≥ 1) then
    {
        TowersOfHanoi(n-1, x, z, y);
        write("Move top disk from tower x to top of tower y");
        TowersOfHanoi(n-1, z, y, x);
    }
}
For example:
Input: number of disks n = 2
Output: total number of moves = 2^n - 1
2^2 - 1 = 3 moves, as shown below
S-1: Disk 1 moved from A to B
S-2: Disk 2 moved from A to C
S-3: Disk 1 moved from B to C
Similarly for number of disks =3
Input : 3
Output :
S-1: Disk 1 moved from A to C
S-2: Disk 2 moved from A to B
S-3: Disk 1 moved from C to B
S-4: Disk 3 moved from A to C
S-5: Disk 1 moved from B to A
S-6: Disk 2 moved from B to C
S-7: Disk 1 moved from A to C
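A direct C translation of the recursive algorithm might look like the sketch below; the tower names are passed as characters and the call in main (3 disks, moving from A to C via B) reproduces the 7-move sequence above. This is an illustrative version, not the notes' original listing.

#include <stdio.h>

void towersOfHanoi(int n, char x, char y, char z)
{
    /* move the top n disks from tower x to tower y, using tower z as auxiliary */
    if (n >= 1)
    {
        towersOfHanoi(n - 1, x, z, y);
        printf("Move disk %d from tower %c to tower %c\n", n, x, y);
        towersOfHanoi(n - 1, z, y, x);
    }
}

int main(void)
{
    towersOfHanoi(3, 'A', 'C', 'B');   /* 2^3 - 1 = 7 moves */
    return 0;
}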
Performance Analysis
Introduction:
Analyze Algorithms or performance analysis:
If an algorithm is executed, it uses the
computer's CPU to perform the operations and
memory (both RAM and ROM) to hold the program and data.
Analysis of algorithms is the task of determining how much computing time and storage memory an
algorithm requires.
There are many criteria upon which we can judge an algorithm to say it performs well:
1) Does it do what we want it to do?
2) Does it work correctly according to the original specifications of the task?
3) Is there documentation that describes how to use it and how it works?
4) Are procedures created in such a way that they perform logical sub functions?
5) Is the code readable?
The space complexity S(P) of an algorithm P has two components:
A fixed part that is independent of the characteristics (e.g., number and size) of the inputs and outputs.
This part includes the instruction space (space for the code), space for simple variables and fixed-size
component variables, space for constants, and so on.
A variable part that consists of the space needed by component variables whose size depends
on the particular problem instance being solved, the space needed by referenced variables, and the
recursion stack space.
Thus S(P) = c + SP, where c is a constant and SP depends on the instance characteristics.
Example: Find the space complexity of the iterative and the recursive algorithm for the sum of 'n' numbers.

Iterative algorithm:
Algorithm Sum(a, n)
// a is an array of size n
{
    s := 0;
    if (n ≤ 0) then return 0;
    for i := 1 to n do
        s := s + a[i];
    return s;
}
Space complexity S(P): s needs 1 word, i needs 1 word, n needs 1 word, and a needs n words, so S(P) ≥ n + 3.
(A word is a string of bits stored in computer memory; its size is typically 4 to 64 bits.)

Recursive algorithm:
Algorithm RSum(a, n)
// a is an array of size n
{
    if (n ≤ 0) then return 0;
    else return a[n] + RSum(a, n-1);
}
Space complexity: The recursion stack space includes space for the formal parameters, the local variables,
and the return address. The return address requires 1 word of memory. Each call to RSum requires at least
3 words (space for the value of n, the return address, and a pointer to a[]). The depth of recursion is n + 1,
so the recursion stack space needed is ≥ 3(n + 1).
Definition:
Time complexity of an algorithm is the amount of computer time it needs to run to completion.
Here RUN means Compile + Execution.
Time Complexity
T(P) = tc + tp
We neglect tc because the compile time does not depend on the instance characteristics, and the
compiled program will be run several times without recompilation.
So T(P) = tp, where tp depends on the instance characteristics.
For example:
The program P performs some operations like ADD, SUB, MUL, etc.
If we knew the characteristics of the compiler to be used, we could proceed to determine the number of
additions, subtractions, multiplications, divisions, compares, loads, stores, and so on,
and obtain an expression for tp(n) of the form:
tp(n) = Ca ADD(n) + Cs SUB(n) + Cm MUL(n) + Cd DIV(n) + ...
where n denotes the instance characteristics; Ca, Cs, Cm, Cd, ... denote the time needed for an addition,
subtraction, multiplication, division, etc.; and ADD, SUB, MUL, DIV, ... are functions whose values are the
numbers of additions, subtractions, multiplications, divisions, etc. that are performed when the code for P is
used on an instance with characteristics n.
Performance Measurement
Program step: A program step is a syntactically or semantically meaningful segment of a program that
has an execution time independent of the instance characteristics.
Examples:
A comment // counts as zero steps.
An assignment statement (which does not involve any calls to other algorithms)
counts as one step.
For iterative statements such as for, while, and repeat-until, we consider the step counts
only for the control part of the statement.
For a while loop "while (<expr>) do", the step count for the control part is the
number of steps assignable to <expr>.
For a for loop "for i := <expr> to <expr1> do", the step count of the control part is the
sum of the counts of <expr> and <expr1> for the first execution, and one for each remaining execution of the for statement.
We determine the number of steps needed by a program to solve a particular problem instance in one of two ways.
1) The first method is to introduce a new global variable, count, into the program; statements that increment
count are inserted so that each time a statement of the original program is executed, count is increased by the
step count of that statement.
2) The second method is to build a table in which we list the total number of steps contributed by each
statement.
Example: Find the step counts of the iterative and the recursive algorithm for the sum of 'n' numbers
(count is global and initially zero).

Iterative algorithm with counting statements:
Algorithm Sum(a, n)
{
    s := 0;
    count := count + 1;        // for the assignment
    for i := 1 to n do
    {
        count := count + 1;    // for the "for" loop
        s := s + a[i];
        count := count + 1;    // for the assignment
    }
    count := count + 1;        // for the last execution of the for loop
    count := count + 1;        // for the return statement
    return s;
}
Total step count: 2n + 3.

Recursive algorithm with counting statements:
Algorithm RSum(a, n)
{
    count := count + 1;        // for the if condition
    if (n ≤ 0) then
    {
        count := count + 1;    // for the return statement
        return 0;
    }
    else
    {
        count := count + 1;    // for the addition, function invocation and return
        return a[n] + RSum(a, n-1);
    }
}
Step count: tRSum(n) = 2 + x, where x = tRSum(n-1), and tRSum(0) = 2, giving a total of 2n + 2 steps.
In this example the recursive algorithm has a slightly smaller step count (2n + 2) than the iterative algorithm (2n + 3).
Asymptotic Notation:
A problem may have numerous (many) algorithmic solutions. In order to choose the best algorithm
for a particular task, you need to be able to judge how long a particular solution will take to run.
Asymptotic notation is used to judge the best algorithm among numerous algorithms for a particular
problem.
Asymptotic complexity is a way of expressing the main component of algorithms like
Cost
Time complexity
Space complexity
Some Asymptotic notations are
1. Big Oh: O
2. Big Omega: Ω
3. Big Theta: Θ
4. Little oh: o
5. Little omega: ω
Big - Oh notation is used to define the upper bound of an algorithm in terms of Time Complexity.
That means Big - Oh notation always indicates the maximum time required by an algorithm for all input
values. That means Big - Oh notation describes the worst case of an algorithm time complexity.
The function f(n) = O(g(n)) (read as "f of n is big oh of g of n") iff (if and only if) there exist positive
constants c and n0 such that f(n) <= c*g(n) for all n >= n0.
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and time
required is on Y-Axis
In above graph after a particular input value n0, always C g(n) is greater than f(n) which indicates the
algorithm's upper bound.
Example
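For instance, f(n) = 3n + 2 is O(n), because 3n + 2 <= 4n for all n >= 2 (take c = 4 and n0 = 2).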
Big - Omega notation is used to define the lower bound of an algorithm in terms of Time Complexity.
That means Big - Omega notation always indicates the minimum time required by an algorithm for all input
values. That means Big - Omega notation describes the best case of an algorithm time complexity.
The function f(n) = Ω(g(n)) (read as "f of n is omega of g of n") iff (if and only if) there exist positive
constants c and n0 such that f(n) >= c*g(n) for all n >= n0.
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and time
required is on Y-Axis
In above graph after a particular input value n0, always C x g(n) is less than f(n) which indicates the
algorithm's lower bound.
Example
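For instance, f(n) = 3n + 2 is Ω(n), because 3n + 2 >= 3n for all n >= 1 (take c = 3 and n0 = 1).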
Big - Theta notation is used to define the average bound of an algorithm in terms of Time Complexity.
That means Big - Theta notation always indicates the average time required by an algorithm for all input
values. That means Big - Theta notation describes the average case of an algorithm time complexity.
The function f(n) = Θ(g(n)) (read as "f of n is theta of g of n") iff (if and only if) there exist positive
constants c1, c2, and n0 such that c1*g(n) <= f(n) <= c2*g(n) for all n >= n0.
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and time
required is on Y-Axis
In above graph after a particular input value n0, always C1 g(n) is less than f(n) and C2 g(n) is greater
than f(n) which indicates the algorithm's average bound.
Example
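For instance, f(n) = 3n + 2 is Θ(n), because 3n <= 3n + 2 <= 4n for all n >= 2 (take c1 = 3, c2 = 4, and n0 = 2).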
1. Little oh: o
The function f(n) = o(g(n)) (read as "f of n is little oh of g of n") iff lim (n→∞) f(n)/g(n) = 0.
Example: 3n + 2 = o(n²), since lim (n→∞) (3n + 2)/n² = 0.
2. Little omega: ω
The function f(n) = ω(g(n)) (read as "f of n is little omega of g of n") iff lim (n→∞) g(n)/f(n) = 0.
Example: n² = ω(3n + 2), since lim (n→∞) (3n + 2)/n² = 0.
Performance Measurement
Order of Growth
The simplest and most obvious way to search a table is to use Linear search i.e. examine the 1st, 2nd,
3rd,...entries until the entry with the required key is found or the end of the table is reached without the entry
being found. The body of the search function could be written as follows if Linear search is used:
int found;
int i;
found = FALSE;
i = 0;
while (!found && i<n)
{
if (!strcmp(word, table[i].key)) // strcmp returns zero
// if strings are same
{
found = TRUE;
index = i;
}
else i++;
}
return found;
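Wrapped into a self-contained program, a linear search over a small table of key strings might look like the sketch below. The entry type, the test table, and the searched word are assumptions made for illustration, not part of the original notes.

#include <stdio.h>
#include <string.h>

#define FALSE 0
#define TRUE  1

struct entry { char key[20]; };            /* assumed table entry type */

int main(void)
{
    struct entry table[] = { {"cat"}, {"dog"}, {"bird"} };
    int n = 3;
    char word[] = "dog";                   /* key being searched for */
    int found = FALSE, index = -1, i = 0;

    while (!found && i < n)
    {
        if (!strcmp(word, table[i].key))   /* strcmp returns zero if the strings are the same */
        {
            found = TRUE;
            index = i;
        }
        else i++;
    }
    if (found)
        printf("Found at index %d\n", index);
    else
        printf("Not found\n");
    return 0;
}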
The efficiency of Linear search is now considered. In assessing the efficiency of an algorithm it is
usual to count the number of occurrences of some typical operation as a function of the size of the
problem.
In searching the major operation is the number of comparisons of the searched-for key against the
table entries and the problem size is taken to be the number of entries in the table. Hence in Linear
search of a table with n entries we have:
Best Case:
Is the minimum number of steps that can be achieved for given parameters.
Worst Case
Is the maximum number of steps that can be achieved for given parameters
Average Case
Average case is the situation in which it is not either best case or worst case
C_avg(n) = [1*(p/n) + 2*(p/n) + 3*(p/n) + ... + n*(p/n)] + n*(1 - p)
         = (p/n)*[1 + 2 + 3 + ... + n] + n*(1 - p)
         = (p/n)*[n(n + 1)/2] + n*(1 - p)
         = p(n + 1)/2 + n*(1 - p)
where p is the probability that the searched-for key is present in the table. If the key is always present
(p = 1), this reduces to (n + 1)/2, i.e., on average about half the table is examined, which is still of order n.
Hence Linear Search is an order(n) process - i.e. it takes a time proportional to the number of entries. For
example if n was doubled then, on average, the search would take twice as long.
Performance measurement is the process of executing a correct program on different data sets to
measure the time and space that it takes to compute the results. Complexity of a program is generally
some function of the instance characteristics
The ultimate test is performed to ensure that the program developed on the basis of the designed
algorithm, runs satisfactorily. Testing a program involves two phases viz., debugging and profiling.
However, it is pointed out that “debugging can only indicate the presence of errors but not their
absence” i.e., a program that yields unsatisfactory results on a sample data set is definitely faulty, but
on the other hand a program producing the desirable results on one/more data sets need not be correct.
In order to actually prove that a program is perfect, a process of “proving” is necessary wherein the
program is analytically proved to be correct and in such cases, it is bound to yield perfect results for
all possible sets of data.
On the other hand, profiling or performance measurement is the process of executing a correct
program on different data sets to measure the time and space that it takes to compute the results. Note
that several different programs may do a given job satisfactorily.
This is the final stage of algorithm evaluation. A question to be answered when the program is ready
for execution, (after the algorithm has been devised, made a priori analysis of that, coded into a
program debugged and compiled) is how do we actually evaluate the time taken by the program?
Obviously, the time required to read the input data or give the output should not be taken into
account. If somebody is keying in the input data through the keyboard or if data is being read from an
input device, the speed of operation is dependent on the speed of the device, but not on the speed of
the algorithm. So, we have to exclude that time while evaluating the programs
The entire procedure explained above is called “profiling”. However, unfortunately, the times
provided by the system clock are not always dependable. Most often, they are only indicative in
nature and should not be taken as an accurate measurement. Especially when the time durations
involved are of the order of 1-2 milliseconds, the figures tend to vary often between one run and the
other, even with the same program and all same input values .
constant − O(1)
logarithmic − O(log n)
linear − O(n)
n log n − O(n log n)
quadratic − O(n²)
cubic − O(n³)
polynomial − n^O(1)
exponential − 2^O(n)
ARRAYS
An array is a container which can hold a fixed number of items, and these items should be of the same type. Most
of the data structures make use of arrays to implement their algorithms. Following are the important terms to
understand the concept of an array.
Array Representation
Arrays can be declared in various ways in different languages. For illustration, let's take C array declaration.
As per the above illustration, following are the important points to be considered.
Index starts with 0.
Each element can be accessed via its index. For example, we can fetch an element at index 6 as 9.
Basic Operations
Traverse − print all the array elements one by one.
Insertion − Adds an element at the given index.
Deletion − Deletes an element at the given index.
Search − Searches an element using the given index or by the value.
Merging -- two arrays
Sorting an array in ascending order or descending order
In C, when an array with static storage is declared with a size (or is only partially initialized), the remaining
elements are assigned default values as follows:
bool      false
char      '\0'
int       0
float     0.0
double    0.0
wchar_t   0
Step 1. Start
Step 2. [INITIALIZATION] Set I = lower_bound
Step 3. Repeat Steps 4 and 5 while I <= upper_bound
Step 4. Apply PROCESS to A[I]
Step 5. Set I = I + 1
[End of loop]
Step 6. Exit
Example Program to read and display n numbers using an array
#include<stdio.h>
#include<conio.h>
int main( )
{
int i, n, arr[20];
clrscr( );
printf("\n enter the number of elements in the array: ");
scanf("%d", &n);
for(i=0; i<n; i++)
{
printf("\n arr[%d] = ", i);
scanf("%d", &arr[i]);
}
printf("\n the array elements are ");
for(i=0; i<n; i++)
printf("\t %d", arr[i]);
return 0;
}
Output
Enter the number of elements in the array:5
Arr[0] =1
Arr[1]=2
Arr[2]=3
Arr[3]=4
Arr[4]=5
The elements in the array are 1 2 3 4 5
Step 1. Start
Step 2. Set upper_bound = upper_bound+1
Step 3. Set A[upper_bound] = val
Step 4. Stop
Example
Data [ ]={12, 23, 34, 45, 56, 67, 78, 89, 90,100};
Solution:
Index Data[0] Data[1] Data[2] Data[3] Data[4] Data[5] Data[6] Data[7] Data[8] Data[9]
Data value 12 23 34 45 56 67 78 89 90 100
Index Data[0] Data[1] Data[2] Data[3] Data[4] Data[5] Data[6] Data[7] Data[8] Data[9] Data[10]
Data value 12 23 34 45 56 67 75 78 89 90 100
Step 1. Start
Step 2. [INITIALIZATION] set I = N
Step 3. Repeat Steps 4 and 5 While I > = pos
Step 4. Set A[ I+1 ] = A[ I ]
Step 5. Set I = I – 1
[End of the loop]
Step 6. Set N = N + 1
Step 7. Set A[pos] = value
Step 8. Stop
Calling Insert (Data, 3, 100) will lead to the following in the array
#include <stdio.h>
#include <conio.h>
int main()
{
int i, n, num, pos, arr[10] ;
clrscr();
printf("\n Enter the number of elements in the array:" );
scanf("%d", &n);
for(i=0;i<n;i++)
{
printf("\n Arr[%d] = :",i);
scanf("%d", &arr [i]);
}
printf("\n Enter the number to be inserted:");
scanf("%d",&num);
printf("\n Enter the position at which the number has to be added:");
scanf("%d", &pos);
for(i=n;i>=pos;i--)
{
arr[i+1] = arr[i];
}
arr[pos] = num;
n++;
printf("\n The array after insertion is: ",&num);
for(i=0;i<n;i++)
{
printf (" \n Arr [%d]=%d",i,arr[i]);
}
getch();
return 0;
}
1. Start
2. Set upper_bound = upper_bound -1
3. Stop
#include <stdio.h>
#define MAX_SIZE 100
int main()
{
int arr[MAX_SIZE];
int i, size,pos;
printf("Enter size of the array : ");
scanf("%d",&size);
printf("Enter elements in array : ");
for(i=0; i<size; i++)
{
scanf("%d",&arr[i]);
}
printf("Enter the element position to delete : ");
scanf("%d",&pos);
if(pos<0||pos> size)
{
printf("Invalid position! Please enter position between 1 to %d", size);
}
else
{
for(i=pos-1; i<size-1; i++)
{
arr[i]=arr[i +1];
}
size--;
}
printf("\nElements of array after delete are : ");
for(i=0; i<size; i++)
{
printf("%d\t",arr[i]);
}
return 0;
}
Output
Enter size of the array: 5
Enter elements in array: 10 20 30 40 50
Enter the element position to delete: 2
Elements of array after delete are: 10 30 40 50
Array 1 90 56 89 77 69
Array 2 45 88 76 99 12 58 81
Array 3 90 56 89 77 69 45 88 76 99 12 58 81
Merging of two unsorted arrays
If we have two sorted arrays and the resultant merged array also needs to be sorted, then the
task of merging the arrays becomes a little more involved.
Array 1 20 30 40 50 60
Array 2 15 22 31 45 56 62 78
Array 3 15 20 22 30 31 40 45 50 56 60 62 78
Merging of two sorted arrays
The merged array is formed using two sorted arrays. Here we first compare the 1st element of
array 1 with the 1st element of array 2, and put the smaller element in the merged array. Since 20 >
15, we put 15 as the first element in the merged array. Then we compare the 2nd element of the second array
with the 1st element of the first array; since 20 < 22, 20 is stored as the second element of the merged array,
and the process continues until the merged array is complete.
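A minimal C sketch of merging two sorted arrays into a third sorted array is given below; the array sizes and values are taken from the illustration above and are only sample data.

#include <stdio.h>

int main(void)
{
    int a[5]  = {20, 30, 40, 50, 60};
    int b[7]  = {15, 22, 31, 45, 56, 62, 78};
    int c[12];
    int i = 0, j = 0, k = 0;

    while (i < 5 && j < 7)            /* pick the smaller front element each time */
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < 5)                     /* copy whatever is left in a */
        c[k++] = a[i++];
    while (j < 7)                     /* copy whatever is left in b */
        c[k++] = b[j++];

    for (k = 0; k < 12; k++)
        printf("%d ", c[k]);
    return 0;
}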
Time Complexity
Algorithm     Average Case    Worst Case
Access        O(1)            O(1)
Search        O(n)            O(n)
Insertion     O(n)            O(n)
Deletion      O(n)            O(n)
Space Complexity
In array, space complexity for worst case is O(n).
Advantages of Array
An array provides a single name for a group of variables of the same type; therefore, it is easy to
remember the names of all the elements of an array.
Traversing an array is a very simple process; we just need to increment the base address of the array
in order to visit each element one by one.
Any element in the array can be directly accessed by using the index.
0 (zero - based indexing) : The first element of the array will be arr[0].
1 (one - based indexing) : The first element of the array will be arr[1].
n (n - based indexing) : The first element of the array can reside at any random index number.
we have shown the memory allocation of an array arr of size 5. The array follows 0-based indexing approach.
The base address of the array is 100th byte. This will be the address of arr[0]. Here, the size of int is 4 bytes
therefore each element will take 4 bytes in the memory.
In an array, A[-10 ..... +2 ], Base address (BA) = 999, size of an element = 2 bytes,
find the location of A[-1].
L(A[-1]) = 999 + [(-1) - (-10)] x 2
= 999 + 18
= 1017
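In general, for a one-dimensional array with lower bound LB, base address BA, and element size w, the address of A[i] is given by Address(A[i]) = BA + (i - LB) * w, which is exactly the formula used in the calculation above.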
Memory Allocation Process
Global variables, static variables and program instructions get their memory in permanent storage
area whereas local variables are stored in a memory area called Stack.
The memory space between these two regions is known as Heap memory. This region is used for
dynamic memory allocation during execution of the program. The size of heap keeps changing.
malloc() – allocates the requested number of bytes and returns a void pointer pointing to the first byte of the
allocated space.
calloc() – allocates space for an array of elements, initializes them to zero, and then returns a void pointer to
the memory.
Let's understand the difference between static memory allocation and dynamic memory allocation: with static
allocation, memory cannot be increased while the program is executing; with dynamic allocation, memory can be
increased while the program is executing.
Malloc () function
The malloc() function allocates single block of requested memory.
It doesn't initialize memory at execution time, so it has garbage value initially.
It returns NULL if memory is not sufficient.
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
int main()
{
int n,i,*ptr,sum=0;
clrscr();
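/* The listing above is truncated in the notes; the remainder below is an
   assumed completion showing a typical malloc()-based sum of n numbers. */
printf("Enter number of elements: ");
scanf("%d", &n);
ptr = (int *)malloc(n * sizeof(int));    /* allocate space for n integers */
if (ptr == NULL)
{
    printf("Memory allocation failed");
    return 1;
}
printf("Enter the elements: ");
for (i = 0; i < n; i++)
{
    scanf("%d", ptr + i);
    sum = sum + *(ptr + i);
}
printf("Sum = %d", sum);
free(ptr);                               /* release the allocated block */
return 0;
}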
realloc() function
If memory is not sufficient for malloc() or calloc(), you can reallocate the memory by realloc()
function. In short we can say that, it changes the memory size.
Difference between calloc() and malloc():
1. calloc() initializes the allocated memory with 0; malloc() initializes the allocated memory with garbage values.
2. calloc() takes 2 arguments; malloc() takes 1 argument.
Example:
#include <stdio.h>
int summation(int[]);
void main ()
{
int arr[5] = {0,1,2,3,4};
int sum = summation(arr);
printf("%d",sum);
}
int summation (int arr[])
{
int sum=0,i;
for (i = 0; i<5; i++)
{
sum = sum + arr[i];
}
return sum;
}
The above program defines a function named as summation which accepts an array as an argument.
The function calculates the sum of all the elements of the array and returns it.
However, we can store the value stored in any particular cell of a 2D array to some variable x by using the
following syntax.
int x = a[i][j];
where i and j is the row and column number of the cell respectively.
Example:
for ( int i=0; i<n ;i++)
{
for (int j=0; j<n; j++)
{
a[i][j] = 0;
}
}
We know that, when we declare and initialize one dimensional array in C programming simultaneously, we
don't need to specify the size of the array. However this will not work with 2D arrays. We will have to define
at least the second dimension of the array.
#include <stdio.h>
#include <conio.h>
void main ()
{
int arr[3][3],i,j;
clrscr();
for (i=0;i<3;i++)
{
for (j=0;j<3;j++)
{
printf("Enter a[%d][%d]: ",i,j);
scanf("%d",&arr[i][j]);
}
}
printf("\n printing the elements ....\n");
for(i=0;i<3;i++)
{
printf("\n");
for (j=0;j<3;j++)
{
printf("%d\t",arr[i][j]);
getch();
}
}
}
There are two main techniques of storing 2D array elements into memory
1. Row-Major Order: First, the 1st row of the array is stored into the memory completely, then the 2nd row of the
array is stored into the memory completely, and so on till the last row.
2. Column-Major Order: First, the 1st column of the array is stored into the memory completely, then the 2nd column
of the array is stored into the memory completely, and so on till the last column of the array.
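For a two-dimensional array a[m][n] with base address B and element size w (0-based indices), the address of a[i][j] under row-major order is B + (i * n + j) * w, and under column-major order it is B + (j * m + i) * w.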
APPLICATION OF ARRAYS
Arrays are widely used to implement mathematical vectors, matrices, and other kinds of rectangular
tables.
Many databases include one-dimensional arrays whose elements are records.
Arrays are also used to implement other data structures such as strings, stacks, queues, heaps, and
hash tables. We will read about these data structures in the subsequent chapters.
Arrays can be used for sorting elements in ascending or descending order.
Structure
A structure is a composite data type that defines a grouped list of variables that are to be placed under
one name in a block of memory. It allows different variables to be accessed by using a single pointer to the
structure. The general syntax is:
struct structure_name
{
data_type member;
...
};
Let's see the example to define a structure for an entity employee.
struct employee
{
int id;
char name[10];
float salary;
};
Here, struct is the keyword; employee is the name of the structure; id, name, and salary are the members or
fields of the structure.
Advantages
It can hold variables of different data types.
We can create objects containing different types of attributes.
It allows us to re-use the data layout across programs.
It is used to implement other data structures like linked lists, stacks, queues, trees, graphs etc.
Union example
#include <stdio.h>
#include <string.h>
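/* The union example in the notes stops after the #include lines; the code
   below is an assumed completion illustrating how the members of a union
   share the same block of memory. */
union Data
{
    int i;
    float f;
    char str[20];
};

int main(void)
{
    union Data data;
    data.i = 10;
    printf("data.i : %d\n", data.i);
    data.f = 220.5f;
    printf("data.f : %.1f\n", data.f);
    strcpy(data.str, "C Programming");        /* overwrites the earlier members */
    printf("data.str : %s\n", data.str);
    printf("Size of union : %u bytes\n", (unsigned)sizeof(data));
    return 0;
}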
Sorting
Sorting refers to the operation or technique of arranging and rearranging sets of data in some specific
order. A collection of records is called a list, where every record has one or more fields. A field which
contains a unique value for each record is termed the key field.
For example, a phone number directory can be thought of as a list where each record has three fields:
the 'name' of the person, the 'address' of that person, and their 'phone number'. Being unique, the phone number can
work as a key to locate any record in the list.
Sorting is the operation performed to arrange the records of a table or list in some order according to
some specific ordering criterion. Sorting is performed according to some key value of each record.
The records are sorted either numerically or alphanumerically; the records are then arranged in
ascending or descending order depending on the numerical value of the key. An example is the
sorting of a list of marks obtained by the students in a particular subject of a class.
Categories of Sorting
Internal Sorting
External Sorting
Internal Sorting: If all the data that is to be sorted can be adjusted at a time in the main memory, the internal
sorting method is being performed.
External Sorting: When the data that is to be sorted cannot be accommodated in the memory at the same
time and some has to be kept in auxiliary memory such as hard disk, floppy disk, magnetic tapes etc, then
external sorting methods are performed.
Best case
Worst case
Average case
The complexity of a sorting algorithm is analysed for these cases; the result is often a formula giving the average
time required for a particular sort of size 'n'. Most of the sort methods have time requirements that range from
O(n log n) to O(n²).
Types of Sorting Techniques
Bubble Sort
Selection Sort
Insertion Sort
Merge Sort
Quick Sort
Heap Sort
Radix Sort
Quick Sort Algorithm
Quick sort is a fast sorting algorithm used to sort a list of elements. The quick sort algorithm was invented
by C. A. R. Hoare.
The quick sort algorithm separates the list of elements into two parts and then sorts each
part recursively; that is, it uses a divide-and-conquer strategy. In quick sort, the partition of the list is
performed using an element called the pivot, which is one of the elements in the list.
The list is divided into two partitions such that "all elements to the left of the pivot are smaller than the
pivot and all elements to the right of the pivot are greater than or equal to the pivot".
Step 1 - Consider the first element of the list as pivot (i.e., Element at first position in the list).
Step 2 - Define two variables i and j. Set i and j to first and last elements of the list respectively.
Step 3 - Increment i until list[i] > pivot then stop.
Step 4 - Decrement j until list[j] < pivot then stop.
Step 5 - If i < j then exchange list[i] and list[j].
Step 6 - Repeat steps 3,4 & 5 until i > j.
Step 7 - Exchange the pivot element with list[j] element.
To sort an unsorted list with 'n' elements, we need to make ((n-1) + (n-2) + (n-3) + ... + 1) = n(n-1)/2
comparisons in the worst case; with the first element chosen as the pivot, a list that is already sorted exhibits
exactly this worst-case behaviour.
#include<stdio.h>
#include<conio.h>
void quickSort(int [10],int,int);
void main()
{
int list[20],size,i;
printf("Enter size of the list: ");
scanf("%d",&size);
printf("Enter %d integer values: ",size);
for(i = 0; i < size; i++)
scanf("%d",&list[i]);
quickSort(list,0,size-1);
printf("List after sorting is: ");
for(i = 0; i < size; i++)
printf(" %d",list[i]);
getch();
}
void quickSort(int list[10],int first,int last){
int pivot,i,j,temp;
if(first < last)
{
pivot = first;
i = first;
j = last;
while(i < j)
{
while(list[i] <= list[pivot] && i < last)
i++;
while(list[j] > list[pivot])
j--;
if(i <j)
{
temp = list[i];
list[i] = list[j];
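            /* The listing is truncated in the notes; the remainder below is an
               assumed completion of the standard partition-and-recurse logic. */
            list[j] = temp;            /* finish swapping list[i] and list[j] */
        }
    }
    temp = list[pivot];                /* place the pivot in its final position */
    list[pivot] = list[j];
    list[j] = temp;
    quickSort(list, first, j - 1);     /* sort the left partition */
    quickSort(list, j + 1, last);      /* sort the right partition */
  }
}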
In Merge Sort, the given unsorted array with n elements is divided into n sub arrays, each having one
element, because a single element is always sorted in itself. Then, it repeatedly merges these sub arrays, to
produce new sorted sub arrays, and in the end, one complete sorted array is produced.
But breaking the original array into 2 smaller sub arrays is not helping us in sorting the array.
So we will break these sub arrays into even smaller sub arrays, until we have multiple sub arrays with
single element in them. Now, the idea here is that an array with a single element is already sorted, so once we
break the original array into sub arrays which has only a single element, we have successfully broken down
our problem into base problems.
And then we have to merge all these sorted sub arrays, step by step to form one single sorted array.
Let's consider an array with values {14, 7, 3, 12, 9, 11, 6, 12}.
We take a variable p and store the starting index of our array in this. And we take another variable r
and store the last index of array in it.
Then we find the middle of the array using the formula (p + r)/2 and mark the middle index as q, and
break the array into two sub arrays, from p to q and from q + 1 to r index.
Then we divide these 2 sub arrays again, just like we divided our main array and this continues.
Once we have divided the main array into sub arrays with single elements, then we start merging the
sub arrays.
Example program
#include<stdio.h>
#include<conio.h>
void mergeSort(int[],int,int);
void merge(int[],int,int,int);
void main ()
{
int a[10]= {10, 9, 7, 101, 23, 44, 12, 78, 34, 23};
int i;
clrscr();
mergeSort(a,0,9);
printf("printing the sorted elements");
for(i=0;i<10;i++)
{
printf("\n%d\n",a[i]);
}
getch();
}
void mergeSort(int a[], int beg, int end)
{
int mid;
if(beg<end)
{
mid = (beg+end)/2;
mergeSort(a,beg,mid);
mergeSort(a,mid+1,end);
merge(a,beg,mid,end);
}
}
void merge(int a[], int beg, int mid, int end)
{
int i=beg,j=mid+1,k,index = beg;
int temp[10];
while(i<=mid && j<=end)
{
if(a[i]<a[j])
{
temp[index] = a[i];
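            /* The listing is truncated in the notes; the remainder below is an
               assumed completion of the usual merge step. */
            index++;
            i++;
        }
        else
        {
            temp[index] = a[j];
            index++;
            j++;
        }
    }
    while (i <= mid)              /* copy leftovers of the left half */
    {
        temp[index] = a[i];
        index++;
        i++;
    }
    while (j <= end)              /* copy leftovers of the right half */
    {
        temp[index] = a[j];
        index++;
        j++;
    }
    for (k = beg; k < index; k++) /* copy merged data back into a[] */
        a[k] = temp[k];
}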
Heap Sort
To sort an unsorted list with 'n' elements using heap sort, the time complexity is O(n log n) in the best, average,
and worst cases.
Example program
#include<stdio.h>
int temp;
void heapify(int arr[], int size, int i)
{
int largest = i;
int left = 2*i + 1;
int right = 2*i + 2;
if (left < size && arr[left] >arr[largest])
largest = left;
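    /* The listing is truncated in the notes; the remainder below is an assumed
       completion of heapify. A heapSort() driver that builds the heap and then
       repeatedly swaps arr[0] with the last element would follow it. */
    if (right < size && arr[right] > arr[largest])
        largest = right;
    if (largest != i)              /* root is not the largest: swap and recurse */
    {
        temp = arr[i];
        arr[i] = arr[largest];
        arr[largest] = temp;
        heapify(arr, size, largest);
    }
}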
INTRODUCTION
Stack is an important data structure which stores its elements in an ordered manner. We will explain the
concept of stacks using an analogy. You must have seen a pile of plates where one plate is placed on top of
another as shown in Fig. Now, when you want to remove a plate, you remove the topmost plate first. Hence,
you can add and remove an element (i.e., a plate) only at/from one position which is the topmost position.
A stack is a linear data structure which uses the same principle, i.e., the elements in a stack are
added and removed only from one end, which is called the TOP. Hence, a stack is called a LIFO (Last-
In-First-Out) data structure, as the element that was inserted last is the first one to be taken out.
Now the question is where do we need stacks in computer science? The answer is in function calls.
Consider an example where we are executing function A. In the course of its execution, function A calls
another function B. Function B in turn calls another function C, which calls function D, and function D in turn
calls function E.
In order to keep track of the returning point of each active function, a special stack called the system
stack or call stack is used. Whenever a function calls another function, the calling function is pushed onto the
top of the stack. This is because after the called function gets executed, the control is passed back to the
calling function.
Now when function E is executed, function D will be removed from the top of the stack and
executed. Once function D gets completely executed, function C will be removed from the stack for
execution. The whole procedure will be repeated until all the functions get executed. Let us look at the stack
after each function is executed.
The system stack ensures a proper execution order of functions. Therefore, stacks are frequently used
in situations where the order of processing is very important, especially when the processing needs to be
postponed until other conditions are fulfilled.
The stack in Fig. shows that TOP = 4, so insertions and deletions will be done at this position.In the above
stack, five more elements can still be stored.
OPERATIONS ON A STACK
A stack supports three basic operations: push, pop, and peek. The push operation adds an element to
the top of the stack and the pop operation removes the element from the top of the stack. The peek operation
returns the value of the topmost element of the stack.
Push Operation
The push operation is used to insert an element into the stack. The new element is added at the topmost
position of the stack. However, before inserting the value, we must first check if TOP = MAX – 1, because if
that is the case, then the stack is full and no more insertions can be done. If an attempt is made to insert a
value into a stack that is already full, an OVERFLOW message is printed.
To insert an element with value 6, we first check if TOP = MAX – 1. If the condition is false, then we
increment the value of TOP and store the new element at the position given by stack[TOP].
Pop Operation
The pop operation is used to delete the topmost element from the stack. However, before deleting the
value, we must first check if TOP = NULL, because if that is the case, then the stack is empty and no
more deletions can be done. If an attempt is made to delete a value from a stack that is already empty, an
UNDERFLOW message is printed.
To delete the topmost element, we first check if TOP = NULL. If the condition is false, then we decrement the
value pointed to by TOP.
To delete an element from a stack, we first check for the UNDERFLOW condition; the value of the location
in the stack pointed to by TOP is stored in VAL, and TOP is decremented.
Peek/Peep Operation
Peek is an operation that returns the value of the topmost element of the stack without deleting it
from the stack. However, the peek operation first checks if the stack is empty, i.e., if TOP = NULL; if so, an
appropriate message is printed, else the value is returned.
Here, the Peek operation will return 5, as it is the value of the topmost element of the stack.
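A minimal array-based sketch of these three operations is shown below; MAX, the stack array, and TOP are assumed names chosen to match the description above, and the small driver in main is only for illustration.

#include <stdio.h>
#define MAX 10

int stack[MAX];
int TOP = -1;                        /* -1 means the stack is empty */

void push(int val)
{
    if (TOP == MAX - 1)
        printf("OVERFLOW\n");
    else
        stack[++TOP] = val;          /* increment TOP, then store the value */
}

int pop(void)
{
    if (TOP == -1) { printf("UNDERFLOW\n"); return -1; }
    return stack[TOP--];             /* return the top value, then decrement TOP */
}

int peek(void)
{
    if (TOP == -1) { printf("STACK EMPTY\n"); return -1; }
    return stack[TOP];               /* topmost value, stack unchanged */
}

int main(void)
{
    push(1); push(5);
    printf("Peek: %d\n", peek());    /* prints 5 */
    printf("Pop: %d\n", pop());      /* prints 5 */
    return 0;
}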
MULTIPLE STACKS
While implementing a stack using an array, we had seen that the size of the array must be known in
advance. If the stack is allocated less space, then frequent OVERFLOW conditions will be encountered.
To deal with this problem, the code will have to be modified to reallocate more space for the array. In case
we allocate a large amount of space for the stack, it may result in sheer wastage of memory. Thus, there lies a
trade-off between the frequency of overflows and the space allocated. So, a better solution to deal with this
problem is to have multiple stacks or to have more than one stack in the same array of sufficient size.
Assume that we have ‘n’ stacks, we can divide the available memory into ‘n’ segments (Fig: 3.18).
Let ‘i’ = stack number of one of n stacks.
Let boundary[i] (0<=i<MAX_STACKS) points to position immediately to left of bottom element of
stack i, while top[i] (0<=i<MAX_STACKS) points to top element.
Stack i is empty iff boundary[i]=top[i]
The relevant declarations are:
#define MEMORY_SIZE 100 /* size of memory */
#define MAX_STACKS 10 /* max number of stacks plus 1 */
element memory[MEMORY_SIZE];
int top[MAX_STACKS];
int boundary[MAX_STACKS];
int n; /* number of stacks entered by the user */
To divide the array into equal segment, we use the following code:
top[0]=boundary[0]= -1;
for(j=1;j<n; j++)
top[j]=boundary[j]=(MEMORY_SIZE/n)*j;
boundary[n]=MEMORY_SIZE-1;
APPLICATIONS OF STACKS
In this section we will discuss typical problems where stacks can be easily applied for a simple and
efficient solution. The topics that will be discussed in this section include the following:
Reversing a list
Parentheses checker
Conversion of an infix expression into a postfix expression
Evaluation of a postfix expression
Conversion of an infix expression into a prefix expression
Evaluation of a prefix expression
Recursion
Tower of Hanoi
Parsing
Browsers
Editors
Tree Traversals
A postfix expression does not follow the rules of operator precedence. The operator which occurs first in
the expression is applied first to its operands. For example, given the postfix notation AB+C*, addition will be
performed prior to multiplication during evaluation.
Thus we see that in a postfix notation, operators are applied to the operands that are immediately to their
left. In the example AB+C*, + is applied to A and B, then * is applied to the result of the addition and C.
Although a prefix notation is also evaluated from left to right, the only difference between a postfix
notation and a prefix notation is that in a prefix notation, the operator is placed before the operands. For
example, if A+B is an expression in infix notation, then the corresponding expression in prefix notation is
given by +AB.
While evaluating a prefix expression, the operators are applied to the operands that are present
immediately on the right of the operator. Like postfix, prefix expressions also do not follow the rules of
operator precedence and associativity, and even brackets cannot alter the order of evaluation.
Solution
(i) (A–B) * (C+D)
[AB–] * [CD+]
AB–CD+*
(ii) (A + B) / (C + D) – (D * E)
[AB+] / [CD+] – [DE*]
[AB+CD+/] – [DE*]
AB+CD+/DE*–
Let I be an algebraic expression written in infix notation. I may contain parentheses, operands, and
operators. For simplicity of the algorithm we will use only the +, –, *, /, and % operators. The precedence of these
operators can be given as follows:
Higher priority: *, /, %
Lower priority: +, –
No doubt, the order of evaluation of these operators can be changed by making use of parentheses.
For example, if we have an expression A + B * C, then first B * C will be done and the result will be added
to A. But the same expression if written as, (A + B) * C, will evaluate A + B first and then the result will be
multiplied with C.
The algorithm given below transforms an infix expression into postfix expression, as shown in Fig.
The algorithm accepts an infix expression that may contain operators, operands, and parentheses. For
simplicity, we assume that the infix operation contains only modulus (%), multiplication (*), division (/),
addition (+), and subtraction (–) operators and that operators with the same precedence are performed from left-
to-right.
The algorithm uses a stack to temporarily hold operators. The postfix expression is obtained from
left-to-right using the operands from the infix expression and the operators which are removed from the
stack. The first step in this algorithm is to push a left parenthesis on the stack and to add a corresponding
right parenthesis at the end of the infix expression. The algorithm is repeated until the stack is empty.
Convert the following infix expression into postfix expression using the algorithm
A – (B / C + (D % E * F) / G)* H
(A – (B / C + (D % E * F) / G)* H)
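Working through the algorithm symbol by symbol, the resulting postfix expression is ABC/DE%F*G/+H*–.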
Let us now take an example that makes use of this algorithm. Consider the infix expression given as
9 – ((3 * 4) + 8) / 4. Evaluate the expression.
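Converting to postfix gives 9 3 4 * 8 + 4 / –. Evaluating with a stack: 3 * 4 = 12, 12 + 8 = 20, 20 / 4 = 5, and finally 9 – 5 = 4, so the expression evaluates to 4.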
Queues
A queue is an important data structure which is extensively used in computer applications. In this chapter we
will study the operations that can be performed on a queue, discuss the implementation of a queue
using both arrays and linked lists, illustrate different types of queues such as multiple queues,
double-ended queues, circular queues, and priority queues, and list some real-world applications of
queues.
INTRODUCTION
Let us explain the concept of queues using the analogies given below.
People moving on an escalator. The people who got on the escalator first will be the first one to step
out of it.
People waiting for a bus. The first person standing in the line will be the first one to get into the bus.
People standing outside the ticketing window of a cinema hall. The first person in the line will get the
ticket first and thus will be the first one to move out of it.
Luggage kept on conveyor belts. The bag which was placed first will be the first to come out at the
other end.
Cars lined at a toll bridge. The first car to reach the bridge will be the first to leave.
A queue is a FIFO (First-In, First-Out) data structure in which the element that is inserted first is the first
one to be taken out. The elements in a queue are added at one end called the REAR and removed from the
other end called the FRONT. Queues can be implemented by using either arrays or linked lists.
In above Fig FRONT = 0 and REAR = 5. Suppose we want to add another element with value 45, then
REAR would be incremented by 1 and the value would be stored at the position pointed by REAR.
Here, FRONT = 0 and REAR = 6. Every time a new element has to be added, we repeat the same procedure.
If we want to delete an element from the queue, then the value of FRONT will be incremented. Deletions are
done from only this end of the queue. Here, FRONT = 1 and REAR = 6.
The algorithm to insert an element in a queue, In Step 1, we first check for the overflow condition. In
Step 2, we check if the queue is empty. In case the queue is empty, then both FRONT and REAR are set to
zero, so that the new value can be stored at the 0th location. Otherwise, if the queue already has some values,
then REAR is incremented so that it points to the next location in the array. In Step 3, the value is stored in
the queue at the location pointed by REAR.
Similarly, before deleting an element from a queue, we must check for underflow conditions. An underflow
condition occurs when we try to delete an element from a queue that is already empty. If FRONT = –1 and
REAR = –1, it means there is no element in the queue.
The algorithm to delete an element from a queue, In Step 1, we check for underflow condition. An underflow
occurs if FRONT = –1 or FRONT > REAR. However, if queue has some values, then FRONT is
incremented so that it now points to the next value in the queue.
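A minimal array-based sketch of these two operations is given below; MAX, queue, FRONT, and REAR are assumed names chosen to match the description above, and the function names are illustrative only.

#include <stdio.h>
#define MAX 10

int queue[MAX];
int FRONT = -1, REAR = -1;               /* -1 means the queue is empty */

void insert(int val)
{
    if (REAR == MAX - 1)                 /* Step 1: overflow check */
        printf("OVERFLOW\n");
    else if (FRONT == -1)                /* Step 2: first element in an empty queue */
    {
        FRONT = REAR = 0;
        queue[REAR] = val;
    }
    else
        queue[++REAR] = val;             /* Step 3: store the value at REAR */
}

int delete_front(void)
{
    if (FRONT == -1 || FRONT > REAR)     /* underflow check */
    {
        printf("UNDERFLOW\n");
        return -1;
    }
    return queue[FRONT++];               /* return the value and advance FRONT */
}

int main(void)
{
    insert(12); insert(9);
    printf("%d\n", delete_front());      /* prints 12 */
    return 0;
}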
TYPES OF QUEUES
A queue data structure can be classified into the following types:
1. Circular Queue
2. Deque
3. Priority Queue
4. Multiple Queues
Circular Queues
In the linear queues discussed so far, insertions can be done only at one end, called the REAR,
and deletions are always done from the other end, called the FRONT.
Even though there is space available, the overflow condition still exists because the condition rear =
MAX – 1 still holds true. This is a major drawback of a linear queue.
To resolve this problem, we have two solutions. First, shift the elements to the left so that the vacant
space can be occupied and utilized efficiently. But this can be very time-consuming, especially when the
queue is quite large. The second option is to use a circular queue. In a circular queue, the first index comes
right after the last index.
The circular queue will be full only when front = 0 and rear = Max – 1. A circular queue is
implemented in the same manner as a linear queue is implemented. The only difference will be in the code
that performs insertion and deletion operations. For insertion, we now have to check for the following three
conditions:
If front != 0 and rear = MAX – 1, then it means that the queue is not full. So, set rear = 0 and insert
the new element there.
The algorithm to insert an element in a circular queue, In Step 1, we check for the overflow condition.
In Step 2, we make two checks. First to see if the queue is empty, and second to see if the REAR end has
already reached the maximum capacity while there are certain free locations before the FRONT end. In Step
3, the value is stored in the queue at the location pointed by REAR.
After seeing how a new element is added in a circular queue, let us now discuss how deletions are
performed in this case. To delete an element, again we check for three conditions.
Which shows the algorithm to delete an element from a circular queue, In Step 1, we check for the
underflow condition. In Step 2, the value of the queue at the location pointed by FRONT is stored in VAL. In
Step 3,we make two checks. First to see if the queue has become empty after deletion and second to see if
FRONT has reached the maximum capacity of the queue. The value of FRONT is then updated based on the
outcome of these checks.
CIRCULAR QUEUES USING DYNAMICALLY ALLOCATED ARRAYS
Figure 3.8: Doubling queue capacity
• To get a proper circular queue configuration, we must slide the elements in the right segment to the
right end of the array (Figure 3.8d).
• The array doubling and the slide to the right together copy at most 2*capacity - 2 elements.
• The number of elements copied can be limited to capacity - 1 by customizing the array doubling code
so as to obtain the configuration of Figure 3.8e. This configuration may be obtained as follows:
1) Create a new array newQueue of twice the capacity.
2) Copy the second segment to positions in newQueue beginning at 0.
3) Copy the first segment to positions in newQueue beginning at capacity - front - 1.
Multiple Queues
When we implement a queue using an array, the size of the array must be known in advance. If the queue is
allocated less space, then frequent overflow conditions will be encountered. To deal with this problem, the
code will have to be modified to reallocate more space for the array.
In case we allocate a large amount of space for the queue, it will result in sheer wastage of the memory.
Thus, there lies a tradeoff between the frequency of overflows and the space allocated.
So a better solution to deal with this problem is to have multiple queues or to have more than one queue in
the same array of sufficient size. Figure 8.31 illustrates this concept.
In the figure, an array Queue[n] is used to represent two queues, Queue A and Queue B. The value of n is
such that the combined size of both the queues will never exceed n. While operating on these queues, it is
important to note one thing queue A will grow from left to right, whereas queue B will grow from right to
left at the same time.
Extending the concept to multiple queues, a queue can also be used to represent n number of queues in the
same array. That is, if we have a QUEUE[n],then each queue I will be allocated an equal amount of space
bounded by indices b[i] and e[i].
APPLICATIONS OF QUEUES
Queues are widely used as waiting lists for a single shared resource like printer, disk, CPU.
Queues are used to transfer data asynchronously (data not necessarily received at same rateas sent)
between two processes (IO buffers), e.g., pipes, file IO, sockets.
LINKED LISTS
A linked list is an ordered collection of a finite number of homogeneous data elements called nodes, where the
linear order is maintained by means of links or pointers.
A linked list is a data structure consisting of a group of nodes which together represent a sequence. Under
the simplest form, each node is composed of data and a reference (in other words, a link) to the next node in
the sequence; more complex variants add additional links. This structure allows for efficient insertion or
removal of elements from any position in the sequence.
START = 1

Index   DATA   NEXT
 1       H      4
 2
 3
 4       E      7
 5
 6
 7       L      8
 8       L     10
 9
10       O     -1
Let us see how a linked list is maintained in the memory, in order to form a linked list, we need a
structure called node which has two fields, DATA and NEXT. Data will store the information part and next
will store the address of the node in sequence. We see that the variable START is used to store the address of
the first node.
Representing Chains in C
Types of Linked Lists
Linked lists are a way to store data with structures so that the programmer can automatically create a new
place to store data whenever necessary. Specifically, the programmer writes a struct definition that contains
variables holding information about something and that has a pointer to a struct of its same type (it has to be
a pointer--otherwise, every time an element was created, it would create a new element, infinitely). Each of
these individual structs or classes in the list is commonly known as a node or element of the list.
The entry point into a linked list is called the head of the list. It should be noted that head is not a
separate node, but the reference to the first node. If the list is empty then the head is a null reference.
Linked list is a dynamic data structure.
Node is represented as
struct node
{
    int data;
    struct node *next;
};
Linked List Operations
Display/ Traverse− Displays the complete list.
Search − Searches an element using the given key.
Insertion − Adds an element at the beginning of the list.
Deletion − Deletes an element at the beginning of the list.
Traversing/Display
Start with the head and access each node until you reach null. Do not change the head reference
Algorithm for Traversing a Linked List
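Since the traversal routine itself is not reproduced here, the following is a minimal C sketch of it, using the struct node declared above (the function name display is an assumption):

#include <stdio.h>

void display(struct node *head)
{   /* start with the head and visit each node until NULL is reached */
    struct node *ptr = head;
    while (ptr != NULL)
    {
        printf("%d ", ptr->data);   /* process the DATA part */
        ptr = ptr->next;            /* follow the NEXT link */
    }
}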
Suppose we want to add a new node with the data 9 as the first node of the list. Then the following changes are made in the linked list. Allocate memory for the new node and initialize its DATA part to 9. Add the new node as the first node of the list by making the NEXT part of the new node contain the address currently held in START. Finally, make START point to the new first node of the list.
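A minimal sketch of this insertion at the front, again using struct node (START is passed by address so that it can be updated; the names are assumptions):

#include <stdlib.h>

void insertFront(struct node **start, int value)
{
    struct node *newNode = malloc(sizeof(struct node)); /* allocate memory for the new node */
    newNode->data = value;     /* initialize its DATA part, e.g. 9 */
    newNode->next = *start;    /* NEXT of the new node holds the address of the old first node */
    *start = newNode;          /* START now points to the new first node */
}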
Algorithm for GETNODE(NODE) to get a pointer of a memory block which suits the type NODE:
Example:
INSERTING A NODE AT ANY POSITION:
Figure: (a) Linked stacks — each node has data and link fields, and top points to the top node. (b) Linked queues — each node has data and link fields, with front and rear pointers.
Directions of links for both the stack and queue are such as to facilitate easy insertion and deletion of nodes.
In the case of figure (a), one can easily add a node at the top or delete one from the top. In figure (b), one can
easily add a node at the rear and both addition and deletion can be performed at the front, though for a queue
we normally would not wish to add nodes at the front.
We assume initial condition for the stack is:
Top[i]=NULL, 0<=i<MAX_STACKS
And boundary condition is:
Top[i]=NULL iff the ith stack is empty
Functions push and pop add and delete items to/from a stack. The code for each is straightforward. Function push creates a new node, temp, and places item in the data field and top in the link field. The variable top is then changed to point to temp. A typical function call to add an element to the ith stack would be push(i, item). Function pop returns the top element and changes top to point to the address contained in its link field. The removed node is then returned to system memory. A typical function call to delete an element from the ith stack would be item = pop(i);
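A minimal sketch of push and pop for the ith linked stack, following the description above (element, stackPointer, the MALLOC macro and the stackEmpty handler follow the same textbook convention as the addq code below and are assumptions here):

void push(int i, element item)
{   /* add item to the top of the ith stack */
    stackPointer temp;
    MALLOC(temp, sizeof(*temp));
    temp->data = item;
    temp->link = top[i];
    top[i] = temp;
}

element pop(int i)
{   /* delete and return the top element of the ith stack */
    stackPointer temp = top[i];
    element item;
    if (!temp)
        return stackEmpty();    /* stack is empty: handle the error */
    item = temp->data;
    top[i] = temp->link;
    free(temp);                 /* return the removed node to system memory */
    return item;
}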
void addq(int i, element item)
{   /* add item to the rear of queue i */
    queuePointer temp;
    MALLOC(temp, sizeof(*temp));
    temp->data = item;
    temp->link = NULL;
    if (front[i])
        rear[i]->link = temp;
    else
        front[i] = temp;
    rear[i] = temp;
}
Additional List Operations
Operations for Chains
Inverting a chain is another useful operation.
typedef struct listNode *listPointer;
typedef struct listNode
{
    char data;
    listPointer link;
} listNode;
For a list of length >= 1 nodes, the while loop is executed length times, so the computing time is linear, i.e. O(length). The function invert(listPointer lead) performs this inversion; a sketch is given below.
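A sketch of the complete function, using the listPointer declarations above and the standard three-pointer technique (the names middle and trail are assumptions):

listPointer invert(listPointer lead)
{   /* invert the chain pointed to by lead */
    listPointer middle = NULL, trail;
    while (lead)
    {
        trail = middle;        /* trail follows middle */
        middle = lead;         /* middle follows lead */
        lead = lead->link;     /* lead moves ahead */
        middle->link = trail;  /* reverse the link of the current node */
    }
    return middle;             /* middle now points to the new first node */
}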
Another function is one that concatenates two chains, ptr1 and ptr2. The complexity of this function is
O(length of list ptr1).
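A sketch of such a concatenation under the same declarations (the list pointed to by ptr1 is changed permanently):

listPointer concatenate(listPointer ptr1, listPointer ptr2)
{   /* attach the chain ptr2 to the end of the chain ptr1 */
    listPointer temp;
    if (!ptr1) return ptr2;
    if (ptr2)
    {
        for (temp = ptr1; temp->link; temp = temp->link)
            ;                  /* walk to the last node of ptr1: O(length of ptr1) */
        temp->link = ptr2;
    }
    return ptr1;
}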
By keeping a pointer last to the last node in the list rather than to the first, we are able to insert an element at
both the front and end with ease. Had we kept a pointer to the first node instead of the last node, inserting at
the front would require us to move down the entire length of the list until we find the last node so that
we can change the pointer in the last node to point to the new first node.
From the figure, it can be noticed that two links, viz. RLINK and LLINK, point to the nodes on the right side
and left side of the node, respectively. Thus, every node, except the header node and the last node, points to
its immediate predecessor and immediate successor.
Operations on a Double Linked List
In this section, only the insertion and deletion operations are discussed.
Inserting a node into a double linked list
Figure shows a schematic representation of various cases of inserting a node into a double linked list. Let us
consider the algorithms of various cases of insertion.
Inserting a node in the front
The algorithm InsertFront_DL is used to define the insertion operation in a double linked list.
TREES
TREE:
• This is a finite set of one or more nodes such that
1) There is a specially designated node called root.
2) The remaining nodes are partitioned into disjoint sets T1, T2, ..., Tn, where each of these is called a subtree of the root (Figure 5.2).
Consider the tree shown below
Tree Terminology
Root
✓ The first node from where the tree originates is called as a root node.
✓ In any tree, there must be only one root node.
✓ We can never have multiple root nodes in a tree data structure.
Example
Parent
✓ The node which has a branch from it to any other node is called as a parent node.
✓ In other words, the node which has one or more children is called as a parent node.
✓ In a tree, a parent node can have any number of child nodes.
Example
Here,
• Node A is the parent of nodes B and C
• Node B is the parent of nodes D, E and F
• Node C is the parent of nodes G and H
• Node E is the parent of nodes I and J
• Node G is the parent of node K
Child
✓ The node which is a descendant of some node is called as a child node.
✓ All the nodes except root node are child nodes.
Example
Siblings
✓ Nodes which belong to the same parent are called as siblings.
✓ In other words, nodes with the same parent are sibling nodes.
Example
Here,
• Nodes B and C are siblings
• Nodes D, E and F are siblings
• Nodes G and H are siblings
• Nodes I and J are siblings
Degree
✓ Degree of a node is the total number of children of that node.
✓ Degree of a tree is the highest degree of a node among all the nodes in the tree.
Example
Here,
• Degree of node A = 2
• Degree of node B = 3
• Degree of node C = 2
• Degree of node D = 0
Internal Node
✓ The node which has at least one child is called as an internal node.
✓ Internal nodes are also called as non-terminal nodes.
✓ Every non-leaf node is an internal node.
Example
Height
✓ Total number of edges on the longest path from a particular node down to a leaf node is called as the height of that node.
✓ Height of a tree is the height of the root node.
✓ Height of all leaf nodes = 0
Example
Here
• Height of node A = 3
• Height of node B = 2
• Height of node C = 2
• Height of node D = 0
• Height of node E = 1
• Height of node F = 0
• Height of node G = 1
• Height of node H = 0
• Height of node I = 0
• Height of node J = 0
• Height of node K = 0
Depth
✓ Total number of edges from root node to a particular node is called as depth of that node.
✓ Depth of a tree is the total number of edges from root node to a leaf node in the longest path.
✓ Depth of the root node = 0
✓ The terms “level” and “depth” are used interchangeably.
Example
Here
• Depth of node A = 0
• Depth of node B = 1
• Depth of node C = 1
• Depth of node D = 2
• Depth of node E = 2
• Depth of node F = 2
• Depth of node G = 2
• Depth of node H = 2
• Depth of node I = 3
• Depth of node J = 3
• Depth of node K = 3
Sub-tree
✓ In a tree, each child from a node forms a sub-tree recursively.
✓ Every child node forms a sub-tree on its parent node.
Example
Forest
✓ A forest is a set of disjoint trees.
Advantages of Tree
✓ Tree reflects structural relationships in the data.
✓ It is used to represent hierarchies.
✓ It provides efficient insertion and searching operations.
✓ Trees are flexible. They allow subtrees to be moved around with minimum effort.
TERMINOLOGIES USED IN A TREE
• Node contains
✓ item of information &
✓ links to other nodes
• Number of subtrees of a node is called its degree.
For e.g., degree of A=3; degree of C=1
• Nodes with degree = 0 are called terminal (leaf) nodes (e.g., K, L, F, G, M, I, J), whereas the other nodes are called non-terminal nodes.
REPRESENTATION OF TREES
A tree can be represented in three forms, namely:
1) List representation
2) Left-child right-sibling representation
3) Degree-two tree representation (Binary Tree)
LIST REPRESENTATION
Consider the tree shown below
Advantage: For complete binary tree, array representation is ideal, as no space is wasted.
Disadvantage: For a skewed tree, less than half the array is utilized. In the worst case, a skewed tree of depth k will require 2^k - 1 spaces. Of these, only k will be used.
LINKED REPRESENTATION
• Shortcoming of array representation: Insertion and deletion of nodes from middle of a tree requires
movement of potentially many nodes to reflect the change in level number of these nodes. These
problems can be overcome easily through the use of a linked representation (Figure 5.14).
• Each node has three fields:
1) leftChild,
2) data and
3) rightChild (Figure 5.13).
typedef struct node *treePointer;
typedef struct node
{
    int data;
    treePointer leftChild, rightChild;
} node;
• Root of tree is stored in the data member 'root' of Tree. This data member serves as access-pointer to the
tree.
LEVEL-ORDER TRAVERSAL
• This traversal uses a queue (Program 5.7).
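A minimal sketch of this traversal with a simple array-based FIFO queue of node pointers, using the treePointer declaration above (the fixed queue size and helper names are assumptions; the text's Program 5.7 uses the circular queue developed earlier):

#include <stdio.h>
#define MAX_QUEUE_SIZE 100

void levelOrder(treePointer root)
{   /* visit the nodes level by level, left to right */
    treePointer queue[MAX_QUEUE_SIZE];
    int front = 0, rear = 0;
    if (!root) return;
    queue[rear++] = root;                    /* enqueue the root */
    while (front < rear)
    {
        treePointer ptr = queue[front++];    /* dequeue the next node */
        printf("%d ", ptr->data);
        if (ptr->leftChild)  queue[rear++] = ptr->leftChild;
        if (ptr->rightChild) queue[rear++] = ptr->rightChild;
    }
}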
Binary Tree
✓ Binary tree is a special tree data structure in which each node can have at most 2 children.
✓ Thus, in a binary tree, each node has either 0 child or 1 child or 2 children.
Example
Maximum number of nodes in a binary tree of height 3
= 2^(3+1) - 1
= 16 - 1
= 15 nodes
Thus, in a binary tree of height = 3, the maximum number of nodes that can be inserted = 15.
3. Total Number of leaf nodes in a Binary Tree = Total Number of nodes with 2 children + 1
Example
Here
Clearly, number of leaf nodes is one greater than number of nodes with 2 children.
This verifies the above relation.
Example
Maximum number of nodes at level-2 in a binary tree
= 2^2
= 4
Thus, in a binary tree, the maximum number of nodes that can be present at level-2 = 4.
Here
• First binary tree is not a full binary tree.
• This is because node C has only 1 child.
A complete binary tree is a binary tree that satisfies the following 2 properties
✓ Every internal node has exactly 2 children.
✓ All the leaf nodes are at the same level.
Complete binary tree is also called as Perfect binary tree.
Here
• First binary tree is not a complete binary tree.
• This is because all the leaf nodes are not at the same level.
An almost complete binary tree is a binary tree that satisfies the following 2 properties
✓ All the levels are completely filled except possibly the last level.
✓ The last level must be strictly filled from left to right.
Example:
To represent a binary tree of depth 'n' using the array representation, we need a one dimensional array with a maximum size of 2^(n+1) - 1.
The above example of the binary tree can also be represented using the linked list representation.
Linked Representation
In the linked representation, each node is declared as a struct containing a data field and pointers to its left and right children.
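A sketch of that declaration, consistent with the treePointer declaration given earlier (field names are assumptions):

struct node
{
    int data;                 /* information stored at this node */
    struct node *leftChild;   /* pointer to the left sub-tree, NULL if absent */
    struct node *rightChild;  /* pointer to the right sub-tree, NULL if absent */
};

A binary tree can be traversed in three ways: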
✓ Pre-order Traversal
✓ In-order Traversal
✓ Post-order Traversal
Pre-order Traversal
In this traversal method, the root node is visited first, then the left sub tree and finally the right sub tree.
We start from A, and following pre-order traversal, we first visit A itself and then move to its left sub
tree B. B is also traversed pre-order. The process goes on until all the nodes are visited. The output of pre-order
traversal of this tree will be
A→B→D→E→C→F→G
Steps
✓ Visit the root node
✓ traverse the left sub-tree in pre-order
✓ traverse the right sub-tree in pre-order
Root → Left → Right
Algorithm
Step 1: Repeat Steps 2 to 4 while TREE != NULL
Step 2: Write TREE -> DATA
Step 3: PREORDER (TREE -> LEFT)
Step 4: PREORDER (TREE -> RIGHT)
[END OF LOOP]
Step 5: END
Traverse the entire tree starting from the root node, keeping yourself to the left.
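A recursive C sketch of pre-order traversal, using the treePointer node declaration from the linked representation above:

#include <stdio.h>

void preorder(treePointer tree)
{
    if (tree)
    {
        printf("%d ", tree->data);    /* visit the root */
        preorder(tree->leftChild);    /* traverse the left sub-tree in pre-order */
        preorder(tree->rightChild);   /* traverse the right sub-tree in pre-order */
    }
}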
Example
Traverse the following binary tree by using pre-order traversal
✓ Since, the traversal scheme, we are using is pre-order traversal, therefore, the first element to be printed
is 18.
✓ Traverse the left sub-tree recursively. The root node of the left sub-tree is 211, print it and move to left.
✓ The left sub-tree of 211 is empty, therefore print its right child (90) and move to the right sub-tree of the root.
✓ 20 is the root of the right sub-tree; therefore, print it and move to its left. Since the left sub-tree is empty, move to the right and print the only element present there, i.e. 190.
✓ Therefore, the printing sequence will be 18, 211, 90, 20, and 190.
Applications
✓ Preorder traversal is used to get prefix expression of an expression tree.
✓ Preorder traversal is used to create a copy of the tree.
In order Traversal
In this traversal method, the left sub tree is visited first, then the root and later the right sub-tree. We
should always remember that every node may represent a sub tree itself.
If a binary tree is traversed in-order, the output will produce sorted key values in an ascending order.
✓ Print the left most node of the left sub-tree i.e. 23.
✓ Print the root of the left sub-tree i.e. 211.
✓ Print the right child i.e. 89.
✓ Print the root node of the tree i.e. 18.
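A recursive sketch of in-order traversal (Left → Root → Right), using the same declarations as the pre-order sketch:

void inorder(treePointer tree)
{
    if (tree)
    {
        inorder(tree->leftChild);     /* traverse the left sub-tree in in-order */
        printf("%d ", tree->data);    /* visit the root */
        inorder(tree->rightChild);    /* traverse the right sub-tree in in-order */
    }
}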
Post-order Traversal
We start from A, and following post-order traversal, we first visit the left subtree B. B is also traversed
post-order. The process goes on until all the nodes are visited. The output of post-order traversal of this tree will
be
D→E→B→F→G→C→A
Steps
✓ Traverse the left sub-tree in post-order
✓ Traverse the right sub-tree in post-order
✓ visit the root
Left → Right → Root
Algorithm
Step 1: Repeat Steps 2 to 4 while TREE != NULL
Step 2: POSTORDER (TREE -> LEFT)
Step 3: POSTORDER (TREE -> RIGHT)
Step 4: Write TREE -> DATA
[END OF LOOP]
Step 5: END
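A recursive sketch of post-order traversal, using the same declarations as the earlier traversal sketches:

void postorder(treePointer tree)
{
    if (tree)
    {
        postorder(tree->leftChild);   /* traverse the left sub-tree in post-order */
        postorder(tree->rightChild);  /* traverse the right sub-tree in post-order */
        printf("%d ", tree->data);    /* visit the root */
    }
}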
Example
Traverse the following tree by using post-order traversal
Application
Level order traversal is used to print the data in the same order as stored in the array representation of a
complete binary tree.
Testing Equality
Another problem that is especially easy to solve using recursion is determining the equivalence of two
binary trees. Binary trees are equivalent if they have the same topology and the information in corresponding
nodes is identical. By the same topology we mean that every branch in one tree corresponds to a branch in the
second in the same order and vice versa.
The equality-testing function calls the helper function equal (Program 5.10), which traverses the binary trees in preorder, though any order could be used.
This set defines the formulas of the propositional calculus (other operations such as implication can be expressed using ˅, ˄ and ¬).
x1 ˅ (x2 ˄ ¬x3)
is a formula (read as "x1 or (x2 and not x3)"). If x1 and x3 are false and x2 is true, then the value of the expression is:
false ˅ (true ˄ ¬false)
=false ˅ true
= true
The satisfiability problem for formulas of propositional calculus asks if there is an assignment of values to the
variables that causes the value of the expression to be true.
Again, let us assume that our formula is already in a binary tree, say
(x1˄¬x2)˅(¬x1˄x3)˅¬x3
For the tree in the figure below, the inorder traversal is:
x1˄¬x2˅¬x1˄x3˅¬x3
which is the infix form of the expression. The most obvious algorithm to determine satisfiability is to let (x1, x2, x3) take on all possible combinations of true and false values and to check the formula for each combination. For n variables there are 2^n possible combinations of true = t and false = f. For example, for n = 3, the eight combinations are: (t,t,t), (t,t,f), (t,f,t), (t,f,f), (f,t,t), (f,t,f), (f,f,t), (f,f,f). The algorithm will take O(g·2^n), or exponential, time, where g is the time needed to substitute values for x1, x2, ..., xn and evaluate the expression.
This is the node structure. The left child and right child fields are similar to those used previously. The field data holds either the value of a variable or a propositional calculus operator, while value holds either TRUE or FALSE.
Also we assume that for leaf nodes, node →data contains the current value (i.e., Logical True or Logical False)
of the variable represented at this node. The first version of our algorithm for the satisfiability problem is
Program 5.11. In this, n is the number of variables in the formula and formula is the binary tree that represents
the formula.
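A minimal sketch of this node structure together with a post-order evaluation pass over the formula tree; the enum and field names are assumptions, and the full satisfiability routine of Program 5.11 would call such an evaluation once for each of the 2^n truth assignments:

typedef enum { LOGICAL_NOT, LOGICAL_AND, LOGICAL_OR, LOGICAL_TRUE, LOGICAL_FALSE } logical;

typedef struct satNode
{
    struct satNode *leftChild;
    logical data;              /* variable value or propositional operator */
    short int value;           /* result of evaluating the subtree rooted here */
    struct satNode *rightChild;
} satNode;

void postOrderEval(satNode *node)
{   /* evaluate the formula tree rooted at node */
    if (!node) return;
    postOrderEval(node->leftChild);
    postOrderEval(node->rightChild);
    switch (node->data)
    {
    case LOGICAL_NOT:   node->value = !node->rightChild->value;                          break;
    case LOGICAL_AND:   node->value = node->leftChild->value && node->rightChild->value; break;
    case LOGICAL_OR:    node->value = node->leftChild->value || node->rightChild->value; break;
    case LOGICAL_TRUE:  node->value = 1;                                                 break;
    case LOGICAL_FALSE: node->value = 0;                                                 break;
    }
}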
✓ Binary Search tree can be defined as a class of binary trees, in which the nodes are arranged in a specific
order. This is also called ordered binary tree.
✓ In a binary search tree, the value of all the nodes in the left sub-tree is less than the value of the root.
✓ Similarly, value of all the nodes in the right sub-tree is greater than or equal to the value of the root.
✓ This rule will be recursively applied to all the left and right sub-trees of the root.
Create the binary search tree using the following data elements.
43, 10, 79, 90, 12, 54, 11, 9, 50
✓ Insert 43 into the tree as the root of the tree.
✓ Read the next element; if it is less than the root node element, insert it as the root of the left sub-tree.
✓ Otherwise, insert it as the root of the right sub-tree.
Algorithm:
Search (ROOT, ITEM)
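A recursive C sketch of this search, assuming the treePointer/node declaration used earlier:

treePointer search(treePointer root, int item)
{   /* return a pointer to the node containing item, or NULL if it is absent */
    if (!root) return NULL;
    if (item == root->data) return root;
    if (item < root->data)
        return search(root->leftChild, item);   /* smaller keys lie in the left sub-tree */
    return search(root->rightChild, item);      /* larger or equal keys lie in the right sub-tree */
}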
Example
Insertion
The insert function is used to add a new element to a binary search tree at the appropriate location. The insert function is to be designed in such a way that it must not violate the property of the binary search tree at any node.
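A recursive sketch of such an insert, following the rule above that smaller keys go to the left sub-tree and larger-or-equal keys to the right (declarations as in the search sketch):

#include <stdlib.h>

treePointer insert(treePointer root, int item)
{   /* insert item and return the (possibly new) root of this sub-tree */
    if (!root)
    {
        root = malloc(sizeof(*root));
        root->data = item;
        root->leftChild = root->rightChild = NULL;
        return root;
    }
    if (item < root->data)
        root->leftChild = insert(root->leftChild, item);
    else
        root->rightChild = insert(root->rightChild, item);
    return root;
}

Starting from root = NULL, repeated calls root = insert(root, key) for the data 43, 10, 79, 90, 12, 54, 11, 9, 50 build the binary search tree described above.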
1) Deletion of a leaf: To delete 35 from the tree of figure 5.29b, the left-child field of its parent is set to NULL.
In the following Example, the node 50 is to be deleted which is the root node of the tree. The in-order traversal
of the tree given below
6, 25, 30, 50, 52, 60, 70, 75
Replace 50 with its in-order successor 52. Now, 50 will be moved to the leaf of the tree, which will simply be
deleted.
AVL Tree
The AVL tree was invented by G.M. Adelson-Velsky and E.M. Landis in 1962. The tree is named AVL in honor of its inventors.
AVL Tree can be defined as height balanced binary search tree in which each node is associated with a
balance factor which is calculated by subtracting the height of its right sub-tree from that of its left sub-tree.
A tree is said to be balanced if the balance factor of each node is -1, 0, or 1; otherwise, the tree is unbalanced and needs to be balanced.
Definition:
An empty tree is height balanced. If T is a nonempty binary tree with TL and TR as its left and right subtrees respectively, then T is height-balanced iff (1) TL and TR are height balanced and (2) |hL - hR| ≤ 1, where hL and hR are the heights of TL and TR, respectively.
Balance Factor (k) = height (left (k)) - height (right (k))
Definition:
The Balance factor, BF(T), of a node T in a binary tree is defined to be hL-hR, where hL and hR, respectively, are
the heights of the left and right subtrees of T. For any node T in an AVL tree, BF=-1,0, or 1.
✓ If balance factor of any node is 1, it means that the left sub-tree is one level higher than the right sub-
tree.
✓ If balance factor of any node is 0, it means that the left sub-tree and right sub-tree contain equal height.
✓ If balance factor of any node is -1, it means that the left sub-tree is one level lower than the right sub-
tree.
✓ An AVL tree is given in the following figure. We can see that, balance factor associated with each node
is in between -1, 0, and +1.
SN Rotation Description
1 LL Rotation The new node is inserted to the left sub-tree of left sub-tree of critical node.
2 RR Rotation The new node is inserted to the right sub-tree of the right sub-tree of the critical node.
3 LR Rotation The new node is inserted to the right sub-tree of the left sub-tree of the critical node.
4 RL Rotation The new node is inserted to the left sub-tree of the right sub-tree of the critical node.
Operations on AVL Tree
✓ Searching
✓ Insertion
✓ Deletion
Search Operation in AVL Tree
In an AVL tree, the search operation is performed with O (log n) time complexity. The search operation
in the AVL tree is similar to the search operation in a Binary search tree. We use the following steps to search
an element in AVL tree...
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with the value of root node in the tree.
Step 3 - If both are matched, then display "Given node is found!!!" and terminate the function
Step 4 - If both are not matched, then check whether search element is smaller or larger than that node
value.
Step 5 - If search element is smaller, then continue the search process in left subtree.
Step 6 - If search element is larger, then continue the search process in right subtree.
Step 7 - Repeat the same until we find the exact element or until the search element is compared with the
leaf node.
Step 8 - If we reach the node having the value equal to the search value, then display "Element is found" and terminate the function.
Step 9 - If we reach the leaf node and it is also not matched with the search element, then display "Element is not found" and terminate the function.
Insertion Operation in AVL Tree
Step 1 - Insert the new element into the tree using Binary Search Tree insertion logic.
Step 2 - After insertion, check the Balance Factor of every node.
Step 3 - If the Balance Factor of every node is 0 or 1 or -1 then go for next operation.
Step 4 - If the Balance Factor of any node is other than 0 or 1 or -1 then that tree is said to be
imbalanced. In this case, perform suitable Rotation to make it balanced and go for next operation.
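A minimal structural sketch of one such rotation, the LL case (a single right rotation about the critical node); the balance-factor and height bookkeeping that a full AVL implementation performs is omitted, and the names are assumptions:

typedef struct avlNode { int data; struct avlNode *left, *right; } avlNode;

avlNode *rotateRight(avlNode *k)
{   /* k is the critical (unbalanced) node of the LL case */
    avlNode *child = k->left;    /* left child becomes the new root of this sub-tree */
    k->left = child->right;      /* child's right sub-tree moves under k */
    child->right = k;
    return child;                /* caller re-links the rotated sub-tree into the tree */
}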
B-Tree is defined as follows:
B-Tree is a self-balanced search tree in which every node contains multiple keys and has more than two
children.
Here, the number of keys in a node and the number of children for a node depend on the order of the B-Tree. Every B-Tree has an order.
Operations on a B-Tree
✓ Search
✓ Insertion
✓ Deletion
Search Operation in B-Tree
The search operation in B-Tree is similar to the search operation in Binary Search Tree. In a Binary
search tree, the search process starts from the root node and we make a 2-way decision every time (we go to
either left sub tree or right sub tree). In B-Tree also search process starts from the root node but here we make
an n-way decision every time, where 'n' is the total number of children the node has. In a B-Tree, the search
operation is performed with O(log n) time complexity.
In a B-Tree, a new element must be added only at the leaf node. That means, the new keyValue is always
attached to the leaf node only.
Step 1 - Check whether tree is Empty.
Definition:
A B+ - tree of order m is a tree that either is empty or satisfies the following properties:
1) All the data nodes are at the same level and are leaves. Data nodes contain elements only.
2) The index nodes define a B-tree of order m; each index node has keys but not elements.
3) Let
n, A0,(K1,A1),(K2,A2), …. (Kn, An)
where Ai, 0 ≤ i ≤ n ≤ m, are pointers to subtrees, and the Ki, 1 ≤ i ≤ n ≤ m, are keys, be the format of some index node. Let K0 = -∞ and Kn+1 = ∞. All the elements in the subtree Ai have key less than Ki+1 and greater than or equal to Ki, 0 ≤ i ≤ n.
Searching a B+ - tree:
B+ - tree support two types of searches –
➢ Exact match and
➢ Range
For example, in the above figure, to search for an element whose key is 32, we begin at the root A, which is an
index node. From the definition of B+ - tree we know that all elements in the left subtree of A have a key
smaller than 20; those in the subtree with root C have keys ≥20 and ≤40; and those in the subtree with root D
have keys ≥40. So, the search moves to the index node C. This is the procedure for an exact-match search.
To search for all elements with keys in the range [16, 70], we proceed as in an exact match search for the start,
16, of the range. In our example 4 additional data nodes are examined.
Graphs
GRAPH
A graph G consists of 2 sets, V and E.
V is a finite, nonempty set of vertices.
E is a set of pairs of vertices, these pairs are called edges.
V(G) and E(G) represents the set of vertices and edges respectively of graph G (Fig 6.2).
• In an undirected graph, the pair of vertices representing any edge is unordered. Thus, the pairs (u,v) and (v,u)
represent the same edge.
• In a directed graph, each edge is represented by a directed pair <u,v>; u is the tail and v is the head of the
edge. Therefore, <u,v> and <v,u> represent two different edges.
Example: We wish to carry out a DFS of graph G of Fig 6.16. If we initiate this search from vertex v0, then the vertices of G are visited in the following order: v0, v1, v3, v7, v4, v5, v2, v6.
1) Now suppose we add a nontree edge, (v,w), into any spanning tree, T. The result is a cycle that consists
of the edge (v,w) and all the edges on the path from w to v in T.
2) A spanning tree is a minimal subgraph, G’, of G such that V(G’)=V(G) and G’ is connected. We define
a minimal subgraph as one with the fewest number of edges. Spanning tree has n-1 edges.
Biconnected Components:
An articulation point is a vertex v of G such that the deletion of v, together with all edges incident on v, produces a graph, G', that has at least two connected components. For example, the connected graph of Fig 6.19 has four
articulation points, 1, 3, 5 and 7.
A biconnected graph is a connected graph that has no articulation points. For example, the graph of Fig 6.16 is
biconnected, while the graph of Fig 6.19 obviously is not. In Fig 6.19(a), vertices represent communication
stations and edges represent communication links. Now suppose one of the stations that is an articulation point
fails. The result is a loss of communication not just to and from that single station, but also between certain
other pairs of stations.
Kruskal’s Algorithm:
Kruskal’s algorithm builds a minimum cost spanning tree T by adding edges to T one at a time. The algorithm
selects the edges for inclusion in T in nondecreasing order of their cost. An edge is added to T if it does not
form a cycle with the edges that are already in T. Since G is connected and has n > 0 vertices, exactly n-1 edges
will be selected for inclusion in T.
Fig 6.26: Graph and Shortest paths from vertex 0 to all destinations
Let S denote the set of vertices, including v0, whose shortest paths have been found. For w not in S, let distance[w] be the length of the shortest path starting from v0, going through vertices only in S, and ending in w. We can observe the following points while generating paths:
(1) If the next shortest path is to vertex u, then the path from v0 to u goes through only those vertices that are in
S. There cannot be any intermediate vertex that is not in S.
To implement Dijkstra’s algorithm, we assume that the n vertices are numbered from 0 to n-1.
Example: Consider the eight-vertex digraph of Figure 6.27(a) with its adjacency matrix as in Figure 6.27(b).
Suppose that the source vertex is Boston. The values of dist and the vertex u selected in each iteration of the
outer for loop of program 6.9 are shown in figure 6.28.
The overall complexity is O(n^3) when adjacency matrices are used and O(ne) when adjacency lists are used.
Example:
For the digraph of Figure 6.33(a), the initial A matrix, A^(-1), plus its value after each of the three iterations, A^0, A^1, A^2, is also shown.
Transitive closure of a graph: Given a directed graph, find out whether a vertex j is reachable from another vertex i for all vertex pairs (i, j) in the graph.
Reflexive transitive closure of a graph: Let R ⊆ A×A be a directed graph (relation) defined on a set A. The reflexive transitive closure of R is the relation R* = {(a,b) : a,b ∈ A and there is a path from a to b in R}. For example, one such reflexive transitive closure is R* = {(a1,a1), (a1,a2), (a1,a3), (a1,a4), (a2,a2), (a2,a3), (a2,a4), (a3,a3), (a3,a4), (a4,a4)}.
Hash Tables:
5.6.1 Basic concept:
In a hashed search, the key, through an algorithmic function, determines the location of the data.
We use a hashing algorithm to transform the key into the index that contains the data we need to locate. Another
way to describe hashing is as a key-to-address transformation in which the keys map to addresses in a list.
Hashing is a key-to address mapping process.
The memory that contains all of the home addresses is known as the prime area.
The address produced by the hashing algorithm is known as the home address.
A collision occurs when a hashing algorithm produces an address for an insertion key and that address is
already occupied.
We can select the first three digits and then use the mid square method as shown below.
To use the pseudorandom-number generator as a hashing method, we set x to the key, multiply it by the
coefficient a, and then add the constant c. The result is then divided by the list size, with the remainder being the
hashed address. For maximum efficiency, the factors a and c should be prime numbers.
To keep the calculation reasonable, we use 17 and 7 for factors a and c, respectively. Also, the list size
in the example is the prime number 307.
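A one-line C sketch of this pseudorandom hashing method with the factors mentioned above (a = 17, c = 7, list size 307):

int pseudorandomHash(long key)
{   /* home address = (a * key + c) mod listSize */
    const long a = 17, c = 7, listSize = 307;
    return (int)((a * key + c) % listSize);
}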
Concepts
a) The load factor of a hashed list is the number of elements in the list divided by the number of physical elements allocated for the list, expressed as a percentage. Traditionally, load factor is assigned the symbol alpha (α). The formula, in which k represents the number of filled elements in the list and n represents the total number of elements allocated to the list, is
α = (k / n) × 100
Example:
DYNAMIC HASHING:
Motivation for Dynamic Hashing:
Dynamic hashing, also known as extendible hashing, aims to reduce the rebuild time by ensuring that each rebuild changes the home bucket for the entries in only one bucket. The objective of dynamic hashing is to provide acceptable hash table performance on a per-operation basis.
We consider two forms of dynamic hashing – one uses a directory and the other does not. For both forms, we
use hash function h that maps keys into non-negative integers. The range of h is assumed to be sufficiently large
and we use h(k,p) to denote the integer formed by the p least significant bits of h(k).
For example, here we use a hash function h(k) that transforms keys into 6-bit non-negative integers. Our example keys will be two characters each, and h transforms letters such as A, B and C into the bit sequences 100, 101, and 110, respectively. Digits 0 through 7 are transformed into their 3-bit representation. For our example hash function, h(A0,1) = 0, h(A1,3) = 1, h(B1,4) = 1001 = 9, and h(C1,6) = 110001 = 49.
There is a distinction between how we store or organize information and how we process or access it.
Organization Access
Sequential Sequential
Indexed sequential Sequential and direct
Direct Direct
Sequential access refers to accessing multiple records, often an entire file, whereas direct access refers to locating a single record.
We have records which contain fields.
• The primary key is a field, or a composite of several fields, which uniquely distinguishes a record from all others;
• All the remaining fields are the secondary keys or attributes.
• A file consists of records of the same format.
The sizes of the fields within a record may vary
• For fixed-length records, all the records within a file have the same length.
• With variable-length records, records within a file do not need to be the same length.
Organization
• Sequential organization is well suited for sequential access in which many or all the records will be processed.
• Can be processed easily with a loop. A sequential search processes the records of a file in their order of occurrence until it either locates the desired record or processes all the records.
Binary Search
• With a sequential search of a file of n records, n/2 probes are needed on average to locate the record of interest; a binary search reduces this to about log2 n probes.
• Consider finding a phone number for Phyllis Bishop. Do you begin your search at the middle of the book? Most
likely not.
• You try to approximate its relative position.
• You would probably begin searching Bishop near the front of the book.
• An interpolation search chooses the next position for a comparison based on the estimated position of the sought key
relative to the remainder of the file to be searched.
• Instead of computing a MIDDLE position, it calculates the next position to probe as:
NEXT = LOWER + ⌈ (UPPER – LOWER) × (key[sought] – key[LOWER]) / (key[UPPER] – key[LOWER]) ⌉
where ⌈ ⌉ denotes “ceiling”.
• Although the worst case computational complexity is O(n), the average complexity is O(log log n).
• A binary search is preferable when the data is stored in primary memory because the additional
calculations needed for the interpolation search cancel any savings.
• When the data is stored in auxiliary memory and the key distribution is uniform, an interpolation search is
preferable.
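A minimal C sketch of interpolation search over a sorted array, following the NEXT formula above (the bounds handling and the use of ceil are assumptions):

#include <math.h>

int interpolationSearch(const int key[], int n, int sought)
{   /* return the index of sought, or -1 if it is not present */
    int lower = 0, upper = n - 1;
    while (lower <= upper && sought >= key[lower] && sought <= key[upper])
    {
        int next;
        if (key[upper] == key[lower])
            next = lower;                          /* avoid division by zero */
        else
            next = lower + (int)ceil((double)(upper - lower) *
                          (sought - key[lower]) / (key[upper] - key[lower]));
        if (key[next] == sought) return next;
        if (key[next] < sought)  lower = next + 1; /* estimate was too low */
        else                     upper = next - 1; /* estimate was too high */
    }
    return -1;
}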
Self Organizing Sequential Search
• In the previous searches, we assumed that all records in the file would be accessed equally often. If this
assumption is not true, we would desire the most frequently retrieved records to appear at the beginning of the
file.
• Three popular algorithms:
(1) Move_to_front,(2) Transpose, (3) Count.
1)Move_to_front
When the sought record is located, it is moved to the front position of the file and all the intervening records are moved back one position. This is an extensive amount of repositioning, so a linked implementation is preferable even though it takes more storage. It is a mistake if a record is accessed, moved to the front, and rarely if ever accessed again.
Locality means that a record that has recently been accessed is more likely to be accessed again in the near future. All the self-organizing methods assume some degree of locality of accesses. Move_to_front is appropriate when space is not limited and locality of access is important.
An example: The file consists of 26 records containing the letters of the alphabet. The records are accessed in the order
of “file organization”.
2)Transpose
This algorithm interchanges the sought record with its immediate predecessor unless the sought record is in the first position. It converges to a near steady state more slowly than the Move_to_front algorithm, but it is more stable: it does not make big mistakes. It does not need the additional space required for a linked implementation, so it should be used when space is at a premium.
3)Count
Keeps a count of the number of accesses of each record. The record is moved in the file to a position in front of all the records with fewer accesses. The file is then always ordered in decreasing order of frequency of access. Disadvantages: it requires extra storage to keep the count and does not handle the locality-of-access phenomenon well. Because of its storage requirements, use it only when the counts are needed for another purpose.
Locating Information
Ways to organize a file for direct access:
The key is a unique address.
The key converts to a unique address.
The key converts to a probable address (hashing).
The Key is a unique address
We want to go directly to the address where the record is stored. This is possible if the key were also an address. If the employee’s nine-digit social security number were the record’s primary key, we may have one probe per retrieval using a table of size 10^9.
The disadvantage is that a great deal of space must be reserved: one location for each possible social security number.
But after employees began leaving the company, gaps would begin to appear.
The Key converts to a unique address
An algorithm to convert the primary key field (possibly a composite) into a unique storage address.
An example of conversion:
Location of A[i] = α + m × s
where A = an array, i = a typical element, α = the location of the first element in the array, m = the number of elements preceding the ith element, and s = the size of an element.
Folding
Fold the number on the dashed boundaries into three parts. The digits would then be superimposed as shown in the figure.
There are several mechanisms for resolving collisions grouped according to the mechanism used to locate the next address to search
and the attribute of whether a record once stored can be relocated:
Collision resolution with links
Collision resolution without links
Static positioning of records.
Dynamic positioning of records
Collision resolution with pseudo links
• Uses pointers to connect the elements of a synonym chain.
• The insertion of a synonym is the same as the insertion into a linked list.
• On retrieval, the elements of the synonym chain are searched until the desired record is located or until the end
indicated by NULL is reached.
• “Coalescing” occurs when we attempt to insert a record with a home address that is already occupied by a record from
a chain with a different home address.
• The two chains with records having different home addresses coalesce or grow together.
• Coalesced hashing is also referred to as direct chaining.
• A simple deletion procedure is to move a record later in the chain into the position of the deleted record. The
relocated record should be the last element on the chain having the home address of the location of the
deleted record.
• This relocation process is repeated at the vacated location until no other records need to be moved.
• Then, the prior-to-moving predecessor of the most recently moved record is relinked to the moved record’s prior-to-moving successor.
• A more complex scheme could be used to reposition records in the coalesced chains to reduce the amount of
coalescing.
• Determining if coalescing has occurred in a probe chain may be time-consuming.
• If we know that a coalescing has not occurred, we just remove the deleted record from a singly linked list.
• Deleted record is reinitialized to null, the available space pointer, R, is reset to the maximum of (1) its current address and
(2) the vacated address plus one.
To delete the record with key 39 requires that
• Because coalescing has occurred, we move the last element in the chain with a
home address of 9 (record 42) into location 9 and adjust the links and table
entries.
• In coalesced hashing, storage is needed for link fields. When this storage is not available, we need a convention for
where to search next.
• Progressive overflow (linear probing) is one convention. If a location is occupied, we look at the next location to
see if it is empty. Table is circular. We continue until we find an empty slot or we encounter the home address of
the record a second time (table is full).
• For retrieval, we follow the same process.
• Performance is poor for an unsuccessful search.
Insert 27, 18, 29, 28, 39, 13, 16
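A minimal sketch of progressive overflow (linear probing) insertion for keys like those listed above, with table size 11, hash(key) = key mod 11, and -1 marking an empty slot (all names are assumptions):

#define TABLE_SIZE 11
#define EMPTY (-1)

int table[TABLE_SIZE] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
                          EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

int insertLinear(int key)
{   /* return the slot used, or -1 if the table is full */
    int home = key % TABLE_SIZE, i = home;
    do
    {
        if (table[i] == EMPTY) { table[i] = key; return i; }
        i = (i + 1) % TABLE_SIZE;     /* look at the next location; the table is circular */
    } while (i != home);              /* stop if we return to the home address */
    return -1;
}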
Use of Buckets
Bucket (block or page) is a storage unit of multiple records at one file address.
Also the unit in which information is accessed and transferred between storage devices.
The number of records that may be stored in a bucket is called the blocking factor. As the blocking factor increases, the number of
auxiliary storage accesses decreases.
Within a bucket, we need a means of separating the individual records. We can achieve this by knowing the record length for fixed-
length records, or by placing a special delimiter.
Buckets can be used to improve any of the collision resolution methods.
Use of Buckets - Example
Blocking factor = 2. Hash(key) = key mod 11. The keys are 27, 18, 29, 28, 39, 13, 16.
Here we use a variable increment instead of a constant increment of one to reduce secondary clustering.
Here, the increment is a function of the key being inserted which may be viewed as another hashing function.
So this is referred to as double hashing: H1 is used to get the home address and H2 to get the increment.
• Possibilities for H2:
– H2 = Quotient(Key / P) mod P
• H2 requires two divide operations; H2’ requires only a single divide operation, but H2’ is more difficult for people to compute. So we use H2 in the example.
• The home address for a record will then be determined by the remainder of the key divided by the table size and the
increment for collision resolution by the quotient of the same operation.
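A minimal sketch of this linear quotient (double hashing) probing, where H1(key) = key mod P gives the home address and H2(key) = Quotient(key / P) mod P gives the increment; replacing a zero increment by 1 so that the probe sequence always advances is an assumption:

#define P 11              /* prime table size (assumed) */
#define EMPTY (-1)

int insertDouble(int table[], int key)
{   /* return the slot used, or -1 if no empty slot is found */
    int home = key % P;
    int inc  = (key / P) % P;
    int i = home, probes = 0;
    if (inc == 0) inc = 1;            /* keep the probe sequence moving */
    while (probes < P)
    {
        if (table[i] == EMPTY) { table[i] = key; return i; }
        i = (i + inc) % P;            /* step by the increment derived from the key itself */
        probes++;
    }
    return -1;
}

Because P is prime, an increment in the range 1..P-1 visits every table location, so probing cannot cycle through only a subset of the table.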
If A, B, and C are synonyms at r as illustrated, B and C will
usually have different increments, yielding different probe chains.
The method requires a prime-number table size, for otherwise searching could cycle
through a subset of the table several times.
The primary probe chain for 39 is shown on the left. p1 is the home address.
Three positions had to be visited.
• What if its home address were empty? 39 could have been inserted at its
home address which requires only 1 probe for retrieval. We could make home
address available by moving what is stored there
• Move 28 to a location such that it could still be retrieved (we still want to use the linear quotient method).
• 28 will require one more probe for its retrieval but 39 will require two fewer probes for a net reduction of one probe
achieved by moving 28.
Note that we use the increment associated with the item being moved and
not with the record being inserted.
27 is moved to location 0.
Some probes overall are saved by making the moves.
Binary Tree
This method carries the concept of Brent’s method one step further and moves items from secondary and subsequent probe chains.
Needs fewer retrieval probes than Brent’s method.
Two choices at each probable storage address—continue to the next address along the probe chain of the item being inserted or move the
item stored at that address to the next position on its probe chain.
• 29 causes a collision
• Location 9 is empty.
• The bold numbers represent locations in the table and the values in parentheses are the keys stored at those locations. The underlined node is the root node.
• That location is occupied so we continue along the primary probe chain until we
reach location 9. The table is shown on the right.
The path formed by the leftmost branch at each level of the tree is equivalent to the primary probe chain.
• We begin the search at the home address which contains a record and continue the search until we find a record that
points to the location of the removed item which is indicated with the temporary pointer.
• We now know the location of the pseudolink field that needs to be modified in the reinsertion process.
• In the example, we then need to use that increment associated with 27 until we can find another empty space to insert
the record with key 16.
The concept of having multiple probe chains instead of a single probe chain for organizing records.
Each chain will be smaller and the searching will be faster.
• In addition to the hashing function, Hash(key), to obtain the home address for a record and the incrementing function,
i(key), to obtain an increment, we introduce a third hashing function, g(key), to tell us which probe chain to insert into
or to search. This is an example of triple hashing.
g(key) ∈ {0, 1, ..., R – 1}
where R is the number of subgroupings. A simple function for g(key) is
g(key) = key mod R
• If a key is even, it will go on the zeroth chain, if it is odd, it will go on the first chain. The records are inserted as
before with the only difference being the placement of the pseudolink values: in chain zero for even key values and in
chain one for odd key values.
Perfect Hashing
hash(key) → unique address
Both primary and secondary clustering are eliminated.
With perfect hashing, we need only a single probe to retrieve a record.
Perfect hashing is applicable to files with only a relatively small number of records because the computing time needed to construct the function is large.
A perfect hashing function maps a key into a unique address. If the range of potential addresses is the same as the
number of keys, the function is a minimal (in space) perfect hashing function.
A separate hashing function needs to be devised for each set of keys. If one or more of the keys change, a new hashing function
must be constructed.
• In the minicycle algorithm, for a table of size N, a perfect hashing function may be characterized as:
h0 = length(key)
h1 = first_character(key)
h2 = last_character(key)
g = T(x)
• where T is a table of values associated with individual characters x which may appear in a key. The time consuming
part is determining T.
• In the table, a value may be assigned to more than one character.
Let’s apply the algorithm to the keyword begin:
p.hash(begin) = 5 + T(h1(key)) + T(h2(key))
             = 5 + T(b) + T(n)
             = 5 + 15 + 13
             = 33
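A tiny C sketch of this perfect hashing function, p.hash(key) = length(key) + T[first character] + T[last character]; the table T of character values is assumed to have been constructed by the algorithm described below:

#include <string.h>

int T[256];                           /* value assigned to each character */

int p_hash(const char *key)
{
    size_t len = strlen(key);
    return (int)len + T[(unsigned char)key[0]] + T[(unsigned char)key[len - 1]];
}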
The algorithm involves ordering the keys based on the frequencies of occurrence of the first and last characters in the keys.
Assignment of values are made to the charac- ters in the first and last positions of the keys from the top of the ordering to the
bottom.
The algorithm uses an exhaustive search with backtracking.
Keys: cat, ant, dog, gnat, chimp, rat, toad
• We assume that the maximum value that may be assigned to a character is 4. If we cannot find a solution using 4, we
would try a larger value. If this maximum value is too small, we will either have no solution or a great amount of
backtracking. If the value is too large we won’t obtain a minimal solution.
• The frequencies of occurrence of the first and last characters are
a=1 c=2 d=2 g=2 p=1 r=1 t=5
• Then we check the ordering to see if any keys exist that have both
their first and last characters appearing in previous keys.
p.hash(toad) = 4
p.hash(gnat) = 8
p.hash(dog) = 7
p.hash(cat) = 3
p.hash(rat) = 5
p.hash(ant) = 6
• A problem still existed with an earlier character assignment, and we backtracked.
• With the values t=0, d=0, g=4, c=0, r=2, a=3, p=4:
p.hash(toad) = 4
p.hash(gnat) = 8
p.hash(dog) = 7
p.hash(cat) = 3
p.hash(rat) = 5
p.hash(ant) = 6
p.hash(chimp) = 9
Since this solution maps the seven keys into seven consecutive locations (3 through 9), it is a minimal perfect hashing function.
Indexed Sequential File Organization
Indexed sequential file organization is a hybrid of sequential and direct file organization.
It is suitable for accessing data both directly and sequentially; a purely sequential search is slow.
• To accelerate the search, we order the information and put tabs or an index to groups.
• We search the tabs until we find the related group, then search exhaustively only those records in that group
• We apply this grouping process a second time and organize the current groups into larger ones, e.g. file drawers.
• We search a much smaller number of higher-level tabs, then search the tabs of the information in that group, then
search to locate the record.
• The tab or index structure is, in effect, a tree structure.
• For insertions we associate an overflow area with each original or primary storage area for a group of records; so at most we only have to move the number of records in such a group to open a place for the record to be inserted.
• In computers with block addressable disks, we use tracks as the lowest level of grouping information.
• The next higher level is a grouping by cylinders. Then we have additional levels of master indexes.
• An indexed sequentially structured file is referred to as an ISAM (for indexed sequential access method) file.
The cylinder index contains pointers to several cylinders. A pair of entries is used for each cylinder: key is the highest key on that cylinder and ptr is a pointer to the track index for that cylinder.
Each track has two pairs of entries in the track index. One pair contains information on the primary storage area and the other pair
has information on the overflow records associated with the track.
The key in the first pair provides the highest key on that track in the primary area, and the key in the second pair provides the
highest key in the overflow area.
The primary pointer indicates the track containing the primary records, and the overflow pointer indicates the first overflow
record.
The highest key on cylinder 1 is 350.
Pointer entry notation is x-y, where x gives the cylinder number and y gives the track number where the track index for that cylinder
is stored.
The track index would not normally require an entire track of storage; so the remaining area could be used for additional primary or
overflow storage.
For simplicity, we assume that the records in primary storage have a blocking factor (bucket size) of one and that all primary storage is
filled.
• The NULL in the overflow pointer indicates that no overflow records currently exist.
• We can have overflow records associated with each track. We never have to move more than the number of records on a
single track.
• An available space list would link the overflow space together as records are added and deleted from the file.
• The overflow pointers follow the format z-w, where z gives the track number and w gives the record number.
Here we insert 8
Data Access
• Sequentially processing the data in an indexed sequential file entails stepping through the indexes to locate track after track of
primary and overflow records.
• There may be intervening tracks containing overflow records or other info from a different file.
• We also need to use the index when we move from one cylinder to another since the cylinders may not be contiguous.
• Sequential processing of an indexed sequential file is faster than for a directly organized file but slower than for a sequentially
organized file.
• For direct access, we search the cylinder index, then search the primary and overflow entries of the track index. We follow the
link field of the overflow entry.
• We need to search either the primary or overflow area but not both, since each track has two pairs of entries
associated with it.
In time-sharing environments, the advantages of a cylinder overflow may be lost. To lessen such an effect, some operating
systems schedule input/output to minimize seek time.
The advantage of the independent overflow area is that primary cylinders may share a single overflow region so that the
amount of overflow space that needs to be reserved per cylinder can be lessened.
• We don’t need to reserve more overflow tracks per cylinder, because we can share storage with other
cylinders.
• Understanding its basic structures allows us to use it more effectively.
• It helps if we sort the records into descending order before insertion. Consider the following keys of records
(B is sorted)
A: 39, 47, 72, 17, 22, 68, 36
B (sorted): 72, 68, 47, 39, 36, 22, 17
• When the primary area is full, the indexed sequential insertion procedure performs an insertion sort on the items
in an overflow chain of a particular group.
• That means stepping through the records already in the chain to locate the proper position for the insertion.
• Each element encountered on the chain requires an access of auxiliary storage, plus we need to do this processing
for each insertion.