Unit 1
If data is not organized effectively, it is very difficult to perform any task on a large amount of data. If it is organized effectively, then any operation can be performed easily on that data.
If the data is stored in a well-organized way on storage media and in the computer's memory, then it can be accessed quickly for processing, which reduces latency and gives the user a fast response.
A data structure is a particular way of organizing a large amount of data more efficiently in a
computer so that any operation on that data becomes easy.
In other words, Data structure is a way of collecting and organizing data in such a way that we
can perform operations on these data in an effective way.
A data structure is about rendering data elements in terms of some relationship, for
better performance, organization and storage.
The logical or mathematical model for a particular organization of data is termed as a data
structure.
In simple words, data structures are structures programmed to store ordered data, so that various operations can be performed on them easily.
Data Structure is useful in representing the real world data and its operations in the computer
program.
Primitive data structures are the predefined ways of storing data provided by the system. The set of operations that can be performed on these data is also predefined.
Primitive data structures are char, int, double and float.
Most of the programming languages have built-in support for the primitive
data structures.
Non-primitive data structures are more complicated data structures and they
are derived from primitive data structures.
Non-primitive data structures are used to store large and connected data.
Some examples of non-primitive data structures are: Linked List, Tree, Graph, Stack and Queue.
Characteristic | Explanation
Homogeneous | In a homogeneous data structure, all the data items are of the same type.
One should be able to select the best data structures based on which type of
operations are required.
Introduction to Algorithms
The sequence of steps of an algorithm can be stated in human readable English statements
or pseudo-code (meaning - a notation resembling a simplified programming language).
After the analysis stage, producing an algorithm is the first step in arriving at the solution
process.
There may be more than one way to solve a problem, hence there may be more than one algorithm for a given problem.
Before attempting to write an algorithm, one should find out what the
expected inputs and outputs are for the given problem.
The performance of an algorithm is measured on two parameters:
1. Time complexity
2. Space complexity
Time complexity is a way to represent the amount of time required by the program to run till its
completion.
The amount of time taken by a program depends on a lot of factors, such as hardware, operating system, number of processors, processor architecture etc.
We do not consider all these factors when analyzing algorithms; we only consider the number of operations to be executed for the completion of the algorithm.
The reason is simple: we want the algorithm analysis to be system-independent.
Space complexity is the amount of memory space that is required by the algorithm for its
execution.
Space complexity is very useful to identify the best algorithm in situations where limited
memory is available for the program to run.
When we are measuring the space complexity analysis of an algorithm, we consider only the data
space of the algorithm and ignore the instruction space and environmental space.
That is we will calculate only the memory required to store variables, constants and structures.
We will learn more about Time complexity and Space complexity in the later sections.
Space and Time Complexities
Put simply, space complexity is the number of memory cells which an algorithm needs.
A good algorithm keeps this number as small as possible. When writing the code we should
focus on keeping the space complexity to the minimum.
When we design an algorithm to solve a problem, it needs some computer memory to execute.
For any algorithm to execute it requires memory for the following purposes:
An algorithm may require inputs, variables, and constants of different data types.
For calculating the space complexity, we need to know the amount of memory used by variables
of different data types, which generally varies for different operating systems.
The method for calculating the space complexity remains the same and independent of the
operating system.
Let us see the procedure for calculating the space complexity of a code with an example:
int sum(int x, int y) {
    int z = x + y;
    return z;
}
The above code takes two inputs x and y of type int as formal parameters.
In the code, another local variable z of type int is used for storing the sum of x and y.
Assuming the int data type takes 2 bytes of memory, the total space complexity is 3 (number of variables) * 2 (size of each variable) = 6 bytes.
The space requirement of this algorithm is fixed for any input given to it, hence it is called constant space complexity.
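Now consider a version that sums the elements of an array. The original listing is not shown in the notes, so the following is a minimal sketch consistent with the stated space requirement (the function name and parameters are assumptions, and 2-byte ints are assumed as above):

int sum_array(int arr[], int n) {
    int i, total = 0;           /* i and total: 2 bytes each */
    for (i = 0; i < n; i++) {
        total = total + arr[i]; /* arr holds n ints: 2 * n bytes */
    }
    return total;               /* n itself: 2 more bytes */
}

Counting arr (2n bytes) together with n, i and total (2 bytes each) gives 2n + 6 bytes.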
Hence the total space requirement for the above code would be (2n + 6) bytes.
Since the space complexity of the program increases linearly with the size of the array (input) n, it is called linear space complexity.
The total time required by the algorithm to run till its completion depends on the number of
(machine) instructions the algorithm executes during its running time.
The actual time taken differs from machine to machine as it depends on several factors like CPU
speed, memory access speed, OS, processor etc.
So, we will take the number of operations executed in the algorithm as the time complexity.
Thus, the amount of time taken and the number of elementary operations performed by the
algorithm differ by at most a constant factor.
Time Complexity Comparison Example
Let us consider the problem statement of the algorithm is "Find the sum of
first n natural numbers".
Here we are considering two algorithms for the same problem statement
(there can be many more) and then calculate the time complexity of both
algorithms and choose the best algorithm.
Algorithm - 1 :
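The code for this solution is reconstructed below from the operation table that follows (the function name is an assumption):

int sum(int n) {
    int i, total = 0;
    for (i = 1; i <= n; i++) {
        total = total + i;  /* add the current number to the running total */
    }
    return total;
}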
In the above solution we are looping from 1 to n and adding each number to total. After the end of the loop, total will be equal to 1 + 2 + 3 + .... + n.
Let us calculate the time complexity of the algorithm by counting all the
elementary operations.
Code | No. of Elementary Operations | No. of Times Executed | Total Operations
int i, total = 0; | 1 (one assignment operation) | 1 | 1 * 1 = 1
for (i = 1; i <= n; i++) | 1 (one assignment operation, i = 1) | 1 | 1 * 1 = 1
for (i = 1; i <= n; i++) | 1 (one comparison operation, i <= n) | n + 1 (n times it evaluates to true, 1 time it evaluates to false) | 1 * (n + 1) = n + 1
for (i = 1; i <= n; i++) | 1 (one increment operation, i++) | n (i is incremented n times) | 1 * n = n
total = total + i; | 2 (one addition, one assignment operation) | n (the loop body executes n times) | 2 * n = 2n
return total; | 1 (one return operation) | 1 | 1 * 1 = 1
As the value of n increases the time taken will also increase linearly. Thus
this algorithm has a linear time complexity.
Algorithm - 2 :
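The code is not shown in the notes; a minimal sketch of such a solution (the function name is an assumption) is:

int sum(int n) {
    /* one addition, one multiplication, one division, one return */
    return n * (n + 1) / 2;
}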
In the above code, we are calculating the sum of n natural numbers using a
formula n * (n + 1) / 2.
Let us calculate the time complexity of the algorithm by counting all the
elementary operations.
Code | No. of Elementary Operations | No. of Times Executed | Total Operations
return n * (n + 1) / 2; | 4 (one addition, one multiplication, one division, one return) | 1 | 4 * 1 = 4
For the above code, the time complexity is constant, because it never depends on the value of n; it will always give the result in 4 elementary operations.
In the above two simple algorithms, you saw how a single problem can have
many solutions.
While the first solution required a loop which executes n times, the second solution used a mathematical expression to return the result in one line.
Asymptotic Notations
Asymptotic notations are standard notations used to analyze the complexity of any algorithm in terms of time and space.
When we analyze any algorithm, we generally get a formula to represent the amount of time
required for execution or the time required by the computer to run the lines of code of the
algorithm, number of memory accesses, number of comparisons etc. for a given input.
This formula usually contains terms that do not contribute significantly to the running time.
For example, if some algorithm has a time complexity of T(n) = n² + 8n + 2, which is a quadratic expression, then for larger values of n the 8n + 2 part becomes insignificant when compared to n².
For example, an algorithm that takes 200n² time will be faster than some other algorithm that takes n³ time, for any value of n larger than 200.
Since we're only interested in the asymptotic behavior of the growth of the function, the constant
factor can be ignored too.
The word asymptotic means approaching a value / limit or curve arbitrarily closely.
Let us understand the concept of asymptotic notations and its advantages using an example. If
we have two algorithms with the following time complexity T(n):
Algorithm - 1: T(n) = 23n² + 6n - 1
Algorithm - 2: T(n) = n³ + 8n - 2
Now, as per asymptotic notations [avoiding the coefficients and insignificant parts], we should just worry about how the function will grow as the value of n (the input) grows, and that will depend entirely on n² for Algorithm - 1, and on n³ for Algorithm - 2.
Hence, we can clearly say that Algorithm - 2 will grow faster than the other one, simply by looking at the highest power of n and ignoring the coefficients (23 in 23n²) and the insignificant parts of the expressions (6n - 1 and 8n - 2).
The reason we are following asymptotic notations is to make things easy to manage.
All we need to do is first analyze the algorithm to find an expression for its time requirements, and then analyze how that expression will grow as the input (n) grows.
Three types of asymptotic notations are used to represent the growth of any algorithm, as input
increases:
1. Big-Theta (Θ) : denotes "the same as" iterations (represents the average case).
2. Big-Oh (O) : denotes "fewer than or the same as" iterations (represents the worst case)
3. Big-Omega (Ω) : denotes "more than or the same as" iterations (represents the best case)
If the time complexity is represented by the Big-Θ notation, it is like the average value or range
within which the actual time of execution of the algorithm will be.
For example, if for some algorithm the time complexity is represented by the expression 20n² + 8n, and we use the Big-Theta (Θ) notation to represent this, then the time complexity would be Θ(n²), ignoring the constant coefficient and removing the insignificant part, which is 8n.
i.e., 20n² + 8n = Θ(n²)
Here in the above example, if we say f(n) = 20n² + 8n and g(n) = n², it means that there are positive constants c1, c2, and k, such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ k.
The values of c1, c2, and k must be fixed for the function f and must not depend on n.
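For instance, with f(n) = 20n² + 8n and g(n) = n², one valid choice (among many) is c1 = 20, c2 = 28 and k = 1, since 0 ≤ 20n² ≤ 20n² + 8n ≤ 28n² holds for all n ≥ 1.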
Big-O notation tells us that a certain function will never exceed a specified time for any value of input n.
Why do we need this representation when we already have the Big-Θ notation, which represents the tightly bound running time for any algorithm?
Consider a linear search algorithm, in which we traverse the elements of an array one by one, from the first element to the last, to search for a given number.
In the worst case, we need to search till the end of the array to confirm that the element is not present in the array.
But it can also happen, that the element that we are searching for is the first element of the array,
in which case the time complexity will be 1.
Now in this case, saying that the Big-Θ or tight-bound time complexity for linear search is Θ(n) will mean that the time required will always be related to n, as this is the right way to represent the average time complexity.
But when we use the Big-O notation, we mean to say that the time complexity is O(n), which
means that the time complexity will never exceed n, defining the upper bound, hence saying that
it can be less than or equal to n, which is the correct representation.
This is the reason, most of the time you will see Big-O notation being used to represent the time
complexity of any algorithm, because it considers the worst case of the running algorithm.
For example, if for some algorithm the time complexity is represented by the expression 20n² + 8n, and we use the Big-Oh (O) notation to represent this, then the time complexity would be O(n²), ignoring the constant coefficient and removing the insignificant part, which is 8n.
i.e., 20n² + 8n = O(n²)
Here, in the example above, if we say f(n) = 20n² + 8n and g(n) = n², it means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k.
The values of c and k must be fixed for the function f and must not depend on n.
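For instance, with f(n) = 20n² + 8n and g(n) = n², the choice c = 28 and k = 1 works, since 0 ≤ 20n² + 8n ≤ 28n² for all n ≥ 1.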
Big-Omega (Ω) notation is used to define the lower bound of an algorithm; we can also call it the best case of that algorithm.
It always indicates the minimum time required by an algorithm for all input values, which is the best case of that algorithm.
For example, if for some algorithm the time complexity is represented by the expression 20n² + 8n, and we use the Big-Omega (Ω) notation to represent this, then the time complexity would be Ω(n²), ignoring the constant coefficient and removing the insignificant part, which is 8n.
i.e., 20n² + 8n = Ω(n²)
Here, in the example above, if we say f(n) = 20n² + 8n and g(n) = n², it means there are positive constants c and k, such that 0 ≤ cg(n) ≤ f(n) for all n ≥ k.
The values of c and k must be fixed for the function f and must not depend on n.
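For instance, with f(n) = 20n² + 8n and g(n) = n², the choice c = 20 and k = 1 works, since 0 ≤ 20n² ≤ 20n² + 8n for all n ≥ 1.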
Let us raise a question: "So why should we bother about time complexity?"
Suppose the time taken by one operation is 1 microsecond and n = 100; then, for example:

Complexity | Time Taken by Algorithm
O(2ⁿ) | about 4 × 10¹⁶ years
As you can see, a higher time complexity drastically increases the amount of time taken by the algorithm to execute.
We should focus on reducing the time complexity as much as possible while designing and developing algorithms.
Let us see the below code, which calculates the square of a given number, and try to calculate the time complexity:

int calculate_square(int n) {
    int square = 0;
    square = n * n;
    return square;
}
The time complexity can be calculated as:
The running time of the statement will not change in relation to n. So, the time complexity
is O(1) (read as “Big O One”).
Let us see the below code to find the factorial of a given number and try to calculate the time
complexity.
int calculate_factorial(int n) {
    int i, fact = 1;
    for (i = 1; i <= n; i++) {
        fact = fact * i;
    }
    return fact;
}
The time complexity can be calculated as:
Code | No. of Elementary Operations | No. of Times Executed | Total Operations
int i, fact = 1; | 1 (one assignment operation) | 1 | 1 * 1 = 1
for (i = 1; i <= n; i++) | 1 (one assignment operation, i = 1) | 1 | 1 * 1 = 1
for (i = 1; i <= n; i++) | 1 (one comparison operation, i <= n) | n + 1 (n times true, 1 time false) | 1 * (n + 1) = n + 1
for (i = 1; i <= n; i++) | 1 (one increment operation, i++) | n (i is incremented n times) | 1 * n = n
fact = fact * i; | 2 (one multiplication, one assignment operation) | n (loop executed n times) | 2 * n = 2n
return fact; | 1 (one return operation) | 1 | 1 * 1 = 1
As the value of n increases the time taken will also increase linearly. Thus this algorithm has
a linear time complexity.
Removing the insignificant part and the coefficients, the time complexity is T(n) = O(n).
The running time of the single for loop is directly proportional to n. The time complexity
is O(n) (read as “Big O of n”).
Let us consider the below code which consists of a nested for loop.
int sum(int n) {
    int i, j, total = 0;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            total++;
        }
    }
    return total;
}
The time complexity can be calculated as:
Code | No. of Elementary Operations | No. of Times Executed | Total Operations
int i, j, total = 0; | 1 (one assignment operation) | 1 | 1 * 1 = 1
for (i = 0; i < n; i++) | 1 (one assignment operation, i = 0) | 1 | 1 * 1 = 1
for (i = 0; i < n; i++) | 1 (one comparison operation, i < n) | n + 1 (n times it evaluates to true, 1 time it evaluates to false) | 1 * (n + 1) = n + 1
for (i = 0; i < n; i++) | 1 (one increment operation, i++) | n (i is incremented from 0 to n) | 1 * n = n
for (j = 0; j < n; j++) | 1 (one assignment operation, j = 0) | n (this statement is inside a loop which runs n times) | 1 * n = n
for (j = 0; j < n; j++) | 1 (one comparison operation, j < n) | n * (n + 1) (n + 1 comparisons execute n times, as it is in the inner loop) | n * (n + 1) = n² + n
for (j = 0; j < n; j++) | 1 (one increment operation, j++) | n * n (n increments done n times, as it is in the inner loop) | 1 * n * n = n²
total++; | 1 (one increment operation) | n * n (it is a statement inside two loops which run n times each) | 1 * n * n = n²
return total; | 1 (one return operation) | 1 | 1 * 1 = 1
Total Number of Operations | | | 3n² + 4n + 4
The running time of the two loops is proportional to the square of n. When n doubles, the running time increases roughly fourfold. So, the time complexity is O(n²).
Consider searching a word from the dictionary. We know very well that all the words in the
dictionary are alphabetically sorted.
Given this scenario, we can do a more efficient search than the linear search, called binary
search.
In this search, we will divide the complete dictionary into two halves and then select the half that
has the possibility of containing the word.
The above step can be repeated until we find the word in the dictionary.
This algorithm will have a logarithmic time complexity which is O(log n).
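To illustrate, here is a minimal sketch of binary search over a sorted array of integers (the function name and signature are assumptions, not from the notes):

int binary_search(int arr[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* middle of the current range */
        if (arr[mid] == key)
            return mid;                    /* found: return the index */
        else if (arr[mid] < key)
            low = mid + 1;                 /* search the upper half */
        else
            high = mid - 1;                /* search the lower half */
    }
    return -1;                             /* not found */
}

Each iteration halves the remaining search range, so at most about log₂(n) iterations are needed, which is why the time complexity is O(log n).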