Data Structure Module 1

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

SYLLABUS- DATA STRUCTURES

Module 1 : Basic Concepts of Data Structures

System Life Cycle, Algorithms, Performance Analysis, Space Complexity, Time Complexity,
Asymptotic Notation, Complexity Calculation of Simple Algorithms

Module 2 : Arrays and Searching

Polynomial representation using Arrays, Sparse matrix, Stacks, Queues-Circular Queues, Priority
Queues, Double Ended Queues, Evaluation of Expressions Linear Search and Binary Search

Module 3 : Linked List and Memory Management

Self Referential Structures, Dynamic Memory Allocation, Singly Linked List-Operations on Linked
List. Doubly Linked List, Circular Linked List, Stacks and Queues using Linked List, Polynomial
representation using Linked List Memory allocation and de-allocation- First-fit, Best-fit and
Worst-fit allocation schemes

Module 4 : Trees and Graphs

Trees, Binary Trees-Tree Operations, Binary Tree Representation, Tree Traversals, Binary Search
Trees- Binary Search Tree Operations Graphs, Representation of Graphs, Depth First Search and
Breadth First Search on Graphs, Applications of Graphs

Module 5 : Sorting and Hashing

Sorting Techniques – Selection Sort, Insertion Sort, Quick Sort, Merge Sort and Heap Sort
Hashing- Hashing Techniques, Collision Resolution, Overflow handling, Hashing functions – Mid
square, Division, Folding, Digit Analysis

Text Book 1. Ellis Horowitz, Sartaj Sahni and Susan Anderson-Freed, Universities Press,
Fundamentals of Data Structures in C
Module 1 : Basic Concepts of Data Structures

System Life Cycle

Algorithms

An Algorithm, in computer programming, is a finite sequence of well-defined instructions,


typically executed in a computer, to solve a class of problems or to perform a common task.

There can be any number of ways; a specific set of instructions can be defined to perform the
same task.

Every algorithm must satisfy the following criteria:

● input: there are zero or more quantities which are externally supplied;

● output: at least one quantity is produced;

● definiteness: each instruction must be clear and unambiguous;

● finiteness: if we trace out the instructions of an algorithm, then for all cases the algorithm
will terminate after a finite number of steps;

● effectiveness: every instruction must be sufficiently basic that it can in principle be


carried out by a person using only pencil and paper. It is not enough that each operation
be definite as in (iii), but it must also be feasible.

Performance Analysis

Analogy: If we want to go from city "A" to city "B", there can be many ways of doing this. We can
go by flight, by bus, by train and also by bicycle. Depending on the availability and convenience,
we choose the one which suits us. Similarly, in computer science, there are multiple algorithms to
solve a problem. When we have more than one algorithm to solve a problem, we need to select
the best one. Performance analysis helps us to select the best algorithm from multiple algorithms
to solve a problem.
Generally, the performance of an algorithm depends on the following elements...

1. Whether that algorithm is providing the exact solution for the problem?
2. Whether it is easy to understand?
3. Whether it is easy to implement?
4. How much space (memory) it requires to solve the problem?
5. How much time it takes to solve the problem? Etc.,

When we want to analyse an algorithm, we consider only the space and time required by that
particular algorithm and we ignore all the remaining elements.
“Performance analysis of an algorithm is the process of calculating space and time
required by that algorithm.”

Space Complexity

Space Complexity of an algorithm is the amount of memory it needs to run to completion i.e.
from start of execution to its termination. Space need by any algorithm is the sum of following
components:
1. Fixed Component: This is independent of the characteristics of the inputs and outputs. This part
includes: Instruction Space, Space of simple variables, fixed size component variables, and
constants variables.
2. Variable Component: This consist of the space needed by component variables whose size is
dependent on the particular problems instances(Inputs/Outputs) being solved, the space needed
by referenced variables and the recursion stack space is one of the most prominent components.
Also this included the data structure components like Linked list, heap, trees, graphs etc.

Therefore the total space requirement of any algorithm 'A' can be provided as

Space(A) = Fixed Components(A) + Variable Components(A)

Fixed components as 'result','count' and 'size' variable there for total space required is three(3)
words.

Variable components is characterized as the value stored in 'size' variable (suppose value store in
variable 'size 'is 'n') because this will decide the size of 'number' list and will also drive the ‘for
loop’. Therefore if the space used by size is one word then the total space required by 'number'
variable will be 'n'(value stored in variable 'size').

Therefore the space complexity can be written as Space(Sum) = 3 + n;

Time Complexity
Time Complexity of an algorithm (basically when converted to program) is the amount of
computer time it needs to run to completion. The time taken by a program is the sum of the
compile time and the run/execution time.

Ex:

int i, j, n = 8;
for (i = 1; i <= n; i++)
{
for (j = 1; j <= n; j++)
{
printf("Complexity");
}
}
In the above snippet, the first & the second for loops get executed n times individually. So the
time complexity accounts to n*n = O(n2)

There exists a relation between the input data size (n) and number of operations performed (N)
with respect to time. This relation is denoted as Order of growth in Time complexity and given
notation O[n] where O is the order of growth and n is the length of input. It is also called as ‘Big O
Notation’.

Example 1: Activity

Assume that someone in your class has stolen your favourite chocolate. You need to find that
chocolate. Below are some ways to find it.

● By asking each and every person in the class, which means if n number of students are
there, then you need to ask n persons. Hence, the complexity is O(n).
● By asking only one of the students in the class and if he/she doesn't have that chocolate
then ask the same person about the remaining students. So, to each and every person you
are asking about the same person and also about the remaining persons. Hence, the
complexity is twice when compared to the previous case. i.e. O(n2). The above two ways
seem to be difficult.
● So, say if you are divide the entire class into two halves and ask whether the chocolate is
on the right side or the left side. If it is on the right side, then divide the number of people
on the right side again into two and repeat the same process. Here, the number of
enquiries is decreasing exponentially. Hence, the complexity is O(log n).

I might need to do the O(n2) search if only one student knows on which student the chocolate is
hidden. I’d use the O(n) if one student had the chocolate and only they knew it. I’d use the O(log
n) search if all the students knew, but would only tell me if I guessed the right side.

Asymptotic Notation

“Asymptotic notation of an algorithm is a mathematical representation of its complexity.”

Asymptotic notations are the mathematical notations used to describe the running time of an
algorithm when the input tends towards a particular value or a limiting value.
For example, consider the following time complexities of two algorithms...

● Algorithm 1 : 5n2 + 2n + 1
● Algorithm 2 : 10n2 + 8n + 3

Generally, when we analyze an algorithm, we consider the time complexity for larger values of
input data (i.e. 'n' value). In above two time complexities, for larger value of 'n' the term '2n +
1' in algorithm 1 has least significance than the term '5n2', and the term '8n + 3' in algorithm 2
has least significance than the term '10n2'.

Here, for larger value of 'n' the value of most significant terms ( 5n2 and 10n2 ) is very larger
than the value of least significant terms ( 2n + 1 and 8n + 3 ). So for larger value of 'n' we ignore
the least significant terms to represent overall time required by an algorithm. In asymptotic
notation, we use only the most significant terms to represent the time complexity of an
algorithm.
There are THREE types of Asymptotic Notations

1. Big - Oh (O)
2. Big - Omega (Ω)
3. Big - Theta (Θ)

Big - Oh Notation (O)


Big - Oh notation is used to define the upper bound of an algorithm in terms of Time Complexity.
That means Big - Oh notation always indicates the maximum time required by an algorithm for
all input values. That means Big - Oh notation describes the worst case of an algorithm time
complexity.
Consider function f(n) as time complexity of an algorithm and g(n) is the most significant term. If f(n) <= C
g(n) for all n >= n0, C > 0 and n0 >= 1. Then we can represent f(n) as O(g(n)).

f(n) = O(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis
and time required is on Y-Axis.

Big – Omega Notation (Ω)


Big - Omega notation is used to define the lower bound of an algorithm in terms of Time
Complexity.
That means Big-Omega notation always indicates the minimum time required by an algorithm
for all input values. That means Big-Omega notation describes the best case of an algorithm time
complexity.
Big - Omega Notation can be defined as follows...
Consider function f(n) as time complexity of an algorithm and g(n) is the most significant term. If f(n) >= C
g(n) for all n >= n0, C > 0 and n0 >= 1. Then we can represent f(n) as Ω(g(n)).

f(n) = Ω(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis
and time required is on Y-Axis

In above graph after a particular input value n0, always C g(n) is less than f(n) which indicates the
algorithm's lower bound.
Example
Consider the following f(n) and g(n)
f(n) = 3n + 2
g(n) = n

If we want to represent f(n) as Ω(g(n)) then it must satisfy f(n) >= C g(n) for all values of C >
0 and n0>= 1

f(n) >= C g(n)


⇒3n + 2 >= C n

Above condition is always TRUE for all values of C = 1 and n >= 1.


By using Big - Omega notation we can represent the time complexity as follows...
3n + 2 = Ω(n)

Big - Theta Notation (Θ)


Big - Theta notation is used to define the average bound of an algorithm in terms of Time
Complexity.
That means Big - Theta notation always indicates the average time required by an algorithm for
all input values. That means Big - Theta notation describes the average case of an algorithm time
complexity.
Big - Theta Notation can be defined as follows...
Consider function f(n) as time complexity of an algorithm and g(n) is the most significant term. If C1 g(n)
<= f(n) <= C2 g(n) for all n >= n0, C1 > 0, C2 > 0 and n0 >= 1. Then we can represent f(n) as Θ(g(n)).

f(n) = Θ(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis
and time required is on Y-Axis

In above graph after a particular input value n0, always C1 g(n) is less than f(n) and C2 g(n) is
greater than f(n) which indicates the algorithm's average bound.
Example
Consider the following f(n) and g(n) f(n) = 3n+ 2
g(n) = n
If we want to represent f(n) as Θ(g(n)) then it must satisfy C1 g(n) <= f(n) <= C2 g(n) for all
values of C1 > 0, C2 > 0 and n0>= 1

C1 g(n) <= f(n) <= C2 g(n)


⇒C1 n <= 3n + 2 <= C2 n

Above condition is always TRUE for all values of C1 = 1, C2 = 4 and n >= 2.


By using Big - Theta notation we can represent the time complexity as follows...
3n + 2 = Θ(n)
Complexity Calculation of Simple Algorithms

Example1: In the first example, we have an integer i and a for loop running from i equals 1 to n.
Now the question arises, how many times does the name get printed?

A()
{
int i;
for (i=1 to n)
printf("Edward");
}
Name gets printed ‘n’ times. Time complexity is O(n).

Example2:
A()
{
int i, j:
for (i=1 to n)
for (j=1 to n)
printf("Edward");
}
In this case, firstly, the outer loop will run n times, such that for each time, the inner loop will also
run n times. Thus, the time complexity will be O(n2).

Example 2

int sum(int A[], int n)

{
int sum = 0, i;
for(i = 0; i < n; i++)
sum = sum + A[i];
return sum;
}

Cost is the amount of computer time required for a single operation in each line.

Repetition is the amount of computer time required by each operation for all its repetitions.

Total is the amount of computer time required by each operation to execute.

So above code requires '4n+4' Units of computer time to complete the task. Here the exact time
is not fixed. And it changes based on the n value. If we increase the n value then the time
required also increases linearly.

Totally it takes '4n+4' units of time to complete its execution and it is Linear Time Complexity.

Searching Algorithms

Linear Search
int linearsearch( int a[], int first, int last, int key)
{
for(int i=first;i<=last;i++)
{
if(key==a[i])
return i;
}
}
Found element from first location of array, then complexity is O(1).

If we have to search the entire array for element, then complexity is O(n).

Linear Search follows the sequential access. The time complexity of Linear Search in the best
case is O(1). In the worst case, the time complexity is O(n).

Binary Search

first = 0;
last = n - 1;
middle = (first+last)/2;
while (first <= last)
{
if (array[middle] < search)
first = middle + 1;
else if (array[middle] == search)
{
printf("%d found at location %d.\n", search, middle+1);
break;
}
else
last = middle - 1;

middle = (first + last)/2;


}
if (first > last)
printf("Not found! %d isn't present in the list.\n", search);

If the element to search is at the middle, then the complexity is O(1).

If we have to run the entire algorithm to find element , then the complexity is O(log2n)

Binary Search is the faster of the two searching algorithms. However, for smaller arrays, linear
search does a better job. The time complexity of Binary Search in the best case is O(1). In the
worst case, the time complexity is O(log n).

Sorting algorithms

Time Complexity of Insertion Sort:

The time complexity of Insertion Sort in the best case is O(n). In the worst case, the time
complexity is O(n^2).

void insert(int a[], int n) /* function to sort an with insertion sort */


{
int i, j, temp;
for (i = 1; i < n; i++)
{
temp = a[i];
j = i - 1;
while(j>=0 && temp <= a[j])
{
a[j+1] = a[j];
j = j-1;
}
a[j+1] = temp;
}
}

Time Complexity of Bubble Sort:

The time complexity of Bubble Sort in the best case is O(n). In the worst case, the time
complexity is O(n^2).

for (c = 0 ; c < n - 1; c++)


{
for (d = 0 ; d < n - c - 1; d++)
{
if (array[d] > array[d+1])
{
swap = array[d];
array[d] = array[d+1];
array[d+1] = swap;
}
}

Time Complexity of Quick Sort:

The time complexity of Quick Sort in the best case is O(nlogn). In the worst case, the time
complexity is O(n^2). Quicksort is considered to be the fastest of the sorting algorithms due to its
performance of O(nlogn) in best and average cases.

Time Complexity of Merge Sort:

This sorting technique has a stable time complexity for all kinds of cases. The time complexity of
Merge Sort in the best case is O(nlogn). In the worst case, the time complexity is O(nlogn). This is
because Merge Sort implements the same number of sorting steps for all kinds of cases.

You might also like