Data Structure Module 1
System Life Cycle, Algorithms, Performance Analysis, Space Complexity, Time Complexity, Asymptotic Notation, Complexity Calculation of Simple Algorithms
Polynomial representation using Arrays, Sparse matrix, Stacks, Queues, Circular Queues, Priority Queues, Double Ended Queues, Evaluation of Expressions, Linear Search and Binary Search
Self Referential Structures, Dynamic Memory Allocation, Singly Linked List and Operations on Linked List, Doubly Linked List, Circular Linked List, Stacks and Queues using Linked List, Polynomial representation using Linked List, Memory allocation and de-allocation: First-fit, Best-fit and Worst-fit allocation schemes
Trees, Binary Trees and Tree Operations, Binary Tree Representation, Tree Traversals, Binary Search Trees and Binary Search Tree Operations, Graphs, Representation of Graphs, Depth First Search and Breadth First Search on Graphs, Applications of Graphs
Sorting Techniques – Selection Sort, Insertion Sort, Quick Sort, Merge Sort and Heap Sort
Hashing – Hashing Techniques, Collision Resolution, Overflow Handling, Hashing Functions (Mid-square, Division, Folding, Digit Analysis)
Text Book: 1. Ellis Horowitz, Sartaj Sahni and Susan Anderson-Freed, Fundamentals of Data Structures in C, Universities Press.
Module 1 : Basic Concepts of Data Structures
Algorithms
An algorithm is a finite set of instructions that, if followed, accomplishes a particular task. There can be any number of ways a specific set of instructions can be defined to perform the same task. Every algorithm must satisfy the following criteria:
● input: there are zero or more quantities which are externally supplied;
● output: at least one quantity is produced;
● definiteness: each instruction is clear and unambiguous;
● finiteness: if we trace out the instructions of an algorithm, then for all cases the algorithm will terminate after a finite number of steps;
● effectiveness: every instruction must be basic enough that it can, in principle, be carried out by a person using only pencil and paper.
Performance Analysis
Analogy: If we want to go from city "A" to city "B", there can be many ways of doing this. We can
go by flight, by bus, by train and also by bicycle. Depending on the availability and convenience,
we choose the one which suits us. Similarly, in computer science, there are multiple algorithms to
solve a problem. When we have more than one algorithm to solve a problem, we need to select
the best one. Performance analysis helps us to select the best algorithm from multiple algorithms
to solve a problem.
Generally, the performance of an algorithm depends on the following factors:
1. Does the algorithm provide the exact solution to the problem?
2. Is it easy to understand?
3. Is it easy to implement?
4. How much space (memory) does it require to solve the problem?
5. How much time does it take to solve the problem?
When we want to analyse an algorithm, we consider only the space and time required by that
particular algorithm and we ignore all the remaining elements.
“Performance analysis of an algorithm is the process of calculating space and time
required by that algorithm.”
Space Complexity
The Space Complexity of an algorithm is the amount of memory it needs to run to completion, i.e. from the start of execution to its termination. The space needed by any algorithm is the sum of the following components:
1. Fixed Component: This is independent of the characteristics of the inputs and outputs. This part includes the instruction space, the space for simple variables, fixed-size component variables, and constants.
2. Variable Component: This consists of the space needed by component variables whose size depends on the particular problem instance (inputs/outputs) being solved, the space needed by referenced variables, and the recursion stack space, which is one of the most prominent components. This also includes data structure components like linked lists, heaps, trees, graphs, etc.
Therefore the total space requirement of any algorithm 'A' can be expressed as S(A) = c + Sp(I), where c is the fixed component and Sp(I) is the variable component for a problem instance I. As an example, consider an algorithm that sums a list of numbers held in an array 'number' of length 'size', accumulating the total in 'result' using a counter 'count'.
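As a concrete illustration, here is a minimal C sketch of such a list-summing algorithm. The variable names result, count, size and number match the discussion below; the sketch itself is illustrative, not taken from the textbook.

```c
/* Sums the first 'size' entries of the array 'number'.
 * Fixed component: the variables result, count and size (3 words).
 * Variable component: the array 'number', whose length depends on
 * the value stored in 'size' (n words for size == n). */
int sum_list(int number[], int size)
{
    int result = 0;
    int count;
    for (count = 0; count < size; count++)
        result = result + number[count];
    return result;
}
```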
The fixed component consists of the 'result', 'count' and 'size' variables, so the fixed space required is three (3) words.
The variable component is determined by the value stored in the 'size' variable (say the value stored in 'size' is 'n'), because this decides the length of the 'number' list and also drives the for loop. Therefore, if the space used by 'size' is one word, the total space required by the 'number' variable is 'n' words (the value stored in 'size').
Time Complexity
The Time Complexity of an algorithm (essentially, of the program it is converted to) is the amount of computer time it needs to run to completion. The time taken by a program is the sum of the compile time and the run/execution time.
Ex:
int i, j, n = 8;
for (i = 1; i <= n; i++)
{
for (j = 1; j <= n; j++)
{
printf("Complexity");
}
}
In the above snippet, the outer loop executes n times, and for each of those iterations the inner loop also executes n times. So the time complexity amounts to n*n = O(n^2).
There exists a relation between the input data size (n) and the number of operations performed (N) with respect to time. This relation is denoted as the order of growth of the time complexity and is written O(n), where O denotes the order of growth and n is the length of the input. It is also called 'Big O Notation'.
Example 1: Activity
Assume that someone in your class has stolen your favourite chocolate. You need to find that
chocolate. Below are some ways to find it.
● By asking each and every person in the class: if there are n students, you need to ask n persons. Hence, the complexity is O(n).
● By asking each person not only whether they have the chocolate, but also about each of the remaining students: for each of the n persons you make up to n enquiries, so the work grows as n × n. Hence, the complexity is O(n^2). The above two ways seem to be tedious.
● So instead, divide the entire class into two halves and ask whether the chocolate is on the right side or the left side. If it is on the right side, divide the people on the right side into two again and repeat the same process. Here, the number of remaining candidates halves with every enquiry. Hence, the complexity is O(log n).
You would need the O(n^2) search if only one student knew which student had the chocolate; the O(n) search if one student had the chocolate and only they knew it; and the O(log n) search if all the students knew, but would only tell you whether you had guessed the correct side.
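The halving strategy can be checked with a short sketch: the number of enquiries needed is the number of times n can be halved before only one candidate remains, which is about log2(n). This is a hedged illustration, not part of the activity itself.

```c
/* Counts how many times n can be halved before only one candidate
 * remains -- the number of enquiries in the divide-in-half strategy.
 * For n students this is floor(log2(n)), i.e. O(log n). */
int halving_steps(int n)
{
    int steps = 0;
    while (n > 1) {
        n = n / 2;      /* keep only one half of the class */
        steps++;
    }
    return steps;
}
```

For a class of 64 students, 6 enquiries suffice; even for 1024 students, only 10 are needed.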
Asymptotic Notation
Asymptotic notations are the mathematical notations used to describe the running time of an
algorithm when the input tends towards a particular value or a limiting value.
For example, consider the following time complexities of two algorithms...
● Algorithm 1 : 5n^2 + 2n + 1
● Algorithm 2 : 10n^2 + 8n + 3
Generally, when we analyze an algorithm, we consider the time complexity for larger values of input data (i.e. the 'n' value). In the above two time complexities, for larger values of 'n' the term '2n + 1' in Algorithm 1 is far less significant than the term '5n^2', and the term '8n + 3' in Algorithm 2 is far less significant than the term '10n^2'.
Here, for larger values of 'n', the most significant terms (5n^2 and 10n^2) are much larger than the least significant terms (2n + 1 and 8n + 3). So for larger values of 'n' we ignore the least significant terms when representing the overall time required by an algorithm. In asymptotic notation, we use only the most significant term to represent the time complexity of an algorithm.
There are THREE types of Asymptotic Notations
1. Big - Oh (O)
2. Big - Omega (Ω)
3. Big - Theta (Θ)
f(n) = O(g(n))
Consider the following graph drawn for the values of f(n) and C g(n), with the input (n) value on the X-axis and the time required on the Y-axis.
In the above graph, after a particular input value n0, C g(n) is always greater than f(n), which indicates the algorithm's upper bound.
Example
Consider the following f(n) and g(n)
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as O(g(n)) then it must satisfy f(n) <= C g(n) for all n >= n0, for some constants C > 0 and n0 >= 1. Here 3n + 2 <= 4n for all n >= 2, so f(n) = O(n) with C = 4 and n0 = 2.
f(n) = Ω(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis
and time required is on Y-Axis
In the above graph, after a particular input value n0, C g(n) is always less than f(n), which indicates the algorithm's lower bound.
Example
Consider the following f(n) and g(n)
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Ω(g(n)) then it must satisfy f(n) >= C g(n) for all n >= n0, for some constants C > 0 and n0 >= 1. Here 3n + 2 >= 3n for all n >= 1, so f(n) = Ω(n) with C = 3 and n0 = 1.
f(n) = Θ(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis
and time required is on Y-Axis
In the above graph, after a particular input value n0, C1 g(n) is always less than f(n) and C2 g(n) is always greater than f(n), which indicates the algorithm's tight bound.
Example
Consider the following f(n) and g(n)
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Θ(g(n)) then it must satisfy C1 g(n) <= f(n) <= C2 g(n) for all n >= n0, for some constants C1 > 0, C2 > 0 and n0 >= 1. Here 3n <= 3n + 2 <= 4n for all n >= 2, so f(n) = Θ(n) with C1 = 3, C2 = 4 and n0 = 2.
Example 1: In the first example, we have an integer i and a for loop running from i equals 1 to n. Now the question arises: how many times does the name get printed?
A()
{
    int i;
    for (i = 1; i <= n; i++)
        printf("Edward");
}
Name gets printed ‘n’ times. Time complexity is O(n).
Example 2:
A()
{
    int i, j;
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            printf("Edward");
}
In this case, firstly, the outer loop will run n times, such that for each time, the inner loop will also
run n times. Thus, the time complexity will be O(n2).
Example 3:
{
int sum = 0, i;
for(i = 0; i < n; i++)
sum = sum + A[i];
return sum;
}
Cost is the amount of computer time required for a single execution of the operation on each line. Repetition is the number of times each operation is executed in total. Counting this way, the above code requires 4n + 4 units of computer time to complete the task. The exact time is not fixed: it changes with the value of n, and if we increase n the time required also increases linearly.
In total it takes 4n + 4 units of time to complete its execution, which is linear time complexity.
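One way the 4n + 4 figure can be reached is by charging one unit per assignment, comparison and increment, and two units for the statement sum = sum + A[i] (one addition plus one assignment). The exact counting convention varies between textbooks, so this breakdown is an assumption; the sketch below just tallies the units.

```c
/* Tallies the unit operations executed by the summing loop,
 * assuming one unit per assignment, comparison and increment,
 * and two units for sum = sum + A[i] (add + assign).
 * The total comes to 4n + 4 for an array of length n. */
long count_operations(int n)
{
    long ops = 0;
    ops += 1;               /* sum = 0          : 1 unit    */
    ops += 1;               /* i = 0            : 1 unit    */
    ops += n + 1;           /* i < n tests      : n+1 units */
    ops += n;               /* i++              : n units   */
    ops += 2L * n;          /* sum = sum + A[i] : 2n units  */
    ops += 1;               /* return sum       : 1 unit    */
    return ops;             /* total: 4n + 4 */
}
```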
Searching Algorithms
Linear Search
int linearsearch(int a[], int first, int last, int key)
{
    for (int i = first; i <= last; i++)
    {
        if (key == a[i])
            return i;
    }
    return -1;   /* key not present in the array */
}
If the element is found at the first location of the array, the complexity is O(1). If we have to search the entire array for the element, the complexity is O(n).
Linear Search follows sequential access. The time complexity of Linear Search in the best case is O(1). In the worst case, the time complexity is O(n).
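To make the best and worst cases concrete, here is a hedged variant of the search that also counts comparisons; the comparisons out-parameter is an addition made purely for illustration.

```c
/* Linear search that also reports how many key comparisons were
 * made: best case is the key at index 0 (1 comparison, O(1));
 * worst case is an absent key (n comparisons, O(n)).
 * Returns the index of 'key', or -1 if it is not present. */
int linear_search_counting(int a[], int n, int key, int *comparisons)
{
    *comparisons = 0;
    for (int i = 0; i < n; i++) {
        (*comparisons)++;
        if (a[i] == key)
            return i;       /* found */
    }
    return -1;              /* not found */
}
```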
Binary Search
first = 0;
last = n - 1;
while (first <= last)
{
    middle = (first + last) / 2;
    if (array[middle] < search)
        first = middle + 1;
    else if (array[middle] == search)
    {
        printf("%d found at location %d.\n", search, middle + 1);
        break;
    }
    else
        last = middle - 1;
}
if (first > last)
    printf("%d not found in the array.\n", search);
If we have to run the entire algorithm to find the element, the complexity is O(log2 n).
Binary Search is the faster of the two searching algorithms, but it requires the array to be sorted. However, for smaller arrays, Linear Search does a better job. The time complexity of Binary Search in the best case is O(1). In the worst case, the time complexity is O(log n).
Sorting algorithms
The time complexity of Insertion Sort in the best case is O(n). In the worst case, the time
complexity is O(n^2).
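The best-case and worst-case claims for Insertion Sort can be demonstrated by counting key comparisons; this is a hedged sketch, and the comparison counter is added purely for illustration.

```c
/* Insertion sort that counts key comparisons. On already-sorted
 * input each element is compared once (about n comparisons, the
 * O(n) best case); on reverse-sorted input every pair is compared
 * (about n^2/2 comparisons, the O(n^2) worst case). */
long insertion_sort(int a[], int n)
{
    long comparisons = 0;
    for (int i = 1; i < n; i++) {
        int key = a[i], j = i - 1;
        while (j >= 0) {
            comparisons++;
            if (a[j] <= key) break;   /* key already in place */
            a[j + 1] = a[j];          /* shift larger element right */
            j--;
        }
        a[j + 1] = key;
    }
    return comparisons;
}
```

Sorting an already-sorted array of 4 elements costs 3 comparisons (n - 1), while the reverse-sorted array costs 6 (n(n - 1)/2).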
The time complexity of Bubble Sort in the best case is O(n). In the worst case, the time
complexity is O(n^2).
The time complexity of Quick Sort in the best case is O(n log n). In the worst case, the time complexity is O(n^2). Quick Sort is often considered the fastest of these sorting algorithms in practice due to its O(n log n) performance in the best and average cases.
Merge Sort has the same time complexity for all kinds of cases. The time complexity of Merge Sort in the best case is O(n log n). In the worst case, the time complexity is also O(n log n). This is because Merge Sort performs the same number of splitting and merging steps for all kinds of inputs.
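A minimal top-down Merge Sort sketch illustrates why the work is the same for every input: the split and merge structure does not depend on the input order. The tmp scratch array passed in by the caller is an implementation choice assumed here.

```c
#include <string.h>

/* Merges the two sorted halves a[lo..mid] and a[mid+1..hi],
 * using tmp as scratch space. */
static void merge(int a[], int lo, int mid, int hi, int tmp[])
{
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid) tmp[k++] = a[i++];
    while (j <= hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo + 1) * sizeof(int));
}

/* Top-down merge sort of a[lo..hi]: split, sort each half, merge.
 * Both best and worst cases do Theta(n log n) work, because the
 * split/merge structure is the same regardless of input order. */
void merge_sort(int a[], int lo, int hi, int tmp[])
{
    if (lo >= hi) return;                 /* 0 or 1 element: sorted */
    int mid = lo + (hi - lo) / 2;
    merge_sort(a, lo, mid, tmp);
    merge_sort(a, mid + 1, hi, tmp);
    merge(a, lo, mid, hi, tmp);
}
```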