Unit 1


Unit 1 - Lesson 1: Introduction to Data Structures

What is a Data Structure?


In the modern world, data and information are essential. Data is simply a collection of facts or a set of values in a particular format, while information is processed data.

If data is not organized effectively, it is very difficult to perform any task on a large amount of it. If it is organized effectively, any operation can be performed on it easily.

If the data is stored in a well-organized way on storage media and in the computer's memory, it can be accessed quickly for processing. This reduces latency and gives the user a fast response.

A data structure is a particular way of organizing a large amount of data in a computer so that any operation on that data becomes easy and efficient.

In other words, a data structure is a way of collecting and organizing data so that we can perform operations on it effectively.

A data structure is about arranging data elements in terms of some relationship, for better performance, organization and storage.

The logical or mathematical model for a particular organization of data is termed a data structure.

In simple words, data structures are structures programmed to store ordered data, so that various operations can be performed on them easily.

A data structure represents how data is to be organized in memory. It should be designed and implemented in such a way that it reduces complexity and increases efficiency.

Data structures are useful for representing real-world data and its operations in a computer program.

Types of Data Structures

Based on the organizing method, data structures are divided into two types:

1. Primitive Data Structures (Built-in Data Structures)
2. Non-primitive Data Structures (User-defined Data Structures)

Primitive data structures are the predefined ways of storing data provided by the system, and the set of operations that can be performed on this data is also predefined. Examples of primitive data structures are char, int, float and double.

Most of the programming languages have built-in support for the primitive
data structures.

Non-primitive data structures are more complicated data structures that are derived from primitive data structures. They are used to store large and connected data. Some examples of non-primitive data structures are: Linked List, Tree, Graph, Stack and Queue.

Non-primitive data structures are subcategorized into two types: linear data structures and non-linear data structures.

- If a data structure organizes the data in sequential order, then it is called a linear data structure. Some examples are Arrays, Linked Lists, Stacks and Queues.
- If a data structure organizes the data in random or hierarchical order, not in sequential order, then it is called a non-linear data structure. Some examples are Trees, Graphs, Dictionaries and Heaps.
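
To make the distinction concrete, here is a small C sketch (the type name Student and its fields are purely illustrative, not from the lesson):

/* Primitive data structures: directly supported by the language. */
int count = 10;
char grade = 'A';
double average = 72.5;

/* A non-primitive (user-defined) data structure built by grouping
 * primitive parts together. */
struct Student {
    char name[32];
    int marks[5];
};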

Data Structure Characteristics

Let us learn the different characteristics of the data structures.

| Characteristic | Explanation |
| --- | --- |
| Linear Data Structures | All the data items are stored in a linear sequence. Example: Arrays |
| Non-Linear Data Structures | The data items are stored in random or hierarchical order, not in a linear sequence. Example: Trees, Dictionaries, Graphs and Heaps |
| Homogeneous Data Structures | All the data items are of the same type. Example: Arrays |
| Non-Homogeneous Data Structures | The data items may or may not be of the same type. Example: Structures |
| Static Data Structures | The size and the structure's associated memory locations are fixed at compile time. Example: Arrays |
| Dynamic Data Structures | Can expand or shrink depending on the program's needs during execution; their associated memory locations can also change during program runtime. Example: Linked Lists, Stacks using Linked Lists, Queues using Linked Lists, Trees, Heaps etc. |
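
As a rough C sketch of the static versus dynamic distinction (the names scores, Node and make_node are our own examples):

#include <stdlib.h>

/* Static: the array's size and memory are fixed at compile time. */
int scores[100];

/* Dynamic: a linked list node allocated at runtime; the list grows or
 * shrinks as nodes are allocated with malloc and released with free. */
struct Node {
    int data;
    struct Node *next;
};

struct Node *make_node(int value) {
    struct Node *n = malloc(sizeof *n);  /* memory obtained at runtime */
    if (n != NULL) {
        n->data = value;
        n->next = NULL;
    }
    return n;
}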

All the data structures allow us to perform different operations on data.

Each data structure has advantages and disadvantages compared to other data structures.

One should be able to select the best data structure based on which types of operations are required.

Introduction to Algorithms

An algorithm is a finite set of instructions or logic, written in order, to accomplish a certain predefined task.

An algorithm is independent of the programming language. An algorithm is the core logic to solve a given problem.

An algorithm is generally expressed as a flow chart or as an informal high-level description called pseudocode. An algorithm can be defined as "a sequence of steps to be performed for getting the desired output for a given input."

The sequence of steps of an algorithm can be stated in human-readable English statements or pseudocode (a notation resembling a simplified programming language).

After the analysis stage, producing an algorithm is the first step in arriving at the solution.

There may be more than one way to solve a problem, hence there may be more than one algorithm for a given problem.

Before attempting to write an algorithm, one should find out what the
expected inputs and outputs are for the given problem.

The properties of an algorithm are (a small example follows the list):

1. Input: An algorithm should accept zero or more externally supplied inputs.
2. Output: An algorithm should generate at least one output.
3. Definiteness: Each step of an algorithm must be precisely defined, meaning the step should perform a clearly defined task without ambiguity.
4. Finiteness: An algorithm must always terminate after a finite number of steps.
5. Effectiveness: The efficiency of the steps and the accuracy of the output determine the effectiveness of the algorithm.
6. Correctness: Each step of the algorithm must generate a correct output.
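
For instance, here is a small C function of our own (not from the lesson) that exhibits these properties: it accepts inputs, produces exactly one output, has precisely defined steps, and always terminates after a finite number of iterations.

/* Returns the largest of the n values in a[]; assumes n >= 1. */
int find_max(const int a[], int n) {
    int max = a[0];                /* input: the array and its size */
    for (int i = 1; i < n; i++) {  /* finiteness: exactly n - 1 iterations */
        if (a[i] > max)            /* definiteness: one clear comparison per step */
            max = a[i];
    }
    return max;                    /* output: exactly one result */
}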

Understanding Efficiency of an Algorithm


An algorithm is said to be efficient and fast if it takes less time to execute and consumes less memory space.

The performance of an algorithm is measured on the basis of the following properties:

- Time complexity
- Space complexity

Time complexity is a way to represent the amount of time required by the program to run till its
completion.

The amount of time taken by a program depends on many factors such as the hardware, operating system, number of processors, processor architecture, etc.

We do not consider these factors when analyzing algorithms; we only consider the number of operations to be executed for the completion of the algorithm.

The reason is simple: we want the algorithm analysis to be system-independent.

Space complexity is the amount of memory space that is required by the algorithm for its
execution.

Space complexity is very useful to identify the best algorithm in situations where limited
memory is available for the program to run.

An algorithm generally requires space for the following:

- Instruction space: Space required to store the executable version (also known as binary code) of the program. This space is fixed and depends on the number of instructions in the algorithm.
- Data space: Space required to store all the program variables (constants, variables, temporary variables etc.).
- Environment space: Space required to store the environment information needed to resume a suspended function.

One heap and one stack are maintained during program execution:

- The heap is the segment where dynamic memory allocation usually takes place.
- The stack is the segment where automatic variables and the function call stack are stored.
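
A tiny C sketch of the two segments (the names local and buf are illustrative):

#include <stdlib.h>

void demo(void) {
    int local = 5;                        /* automatic variable: lives on the stack */
    int *buf = malloc(10 * sizeof *buf);  /* dynamically allocated block: lives on the heap */
    (void)local;                          /* suppress unused-variable warning */
    free(buf);                            /* heap memory must be released explicitly */
}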

When measuring the space complexity of an algorithm, we consider only its data space and ignore the instruction space and environment space.

That is, we calculate only the memory required to store variables, constants and structures.

We will learn more about Time complexity and Space complexity in the later sections.
Space and Time Complexities

Understanding Space Complexity


Space complexity is the amount of memory used by the algorithm (including the input values to
the algorithm) to execute and produce the result.

In other words, Space complexity is the number of memory cells which an algorithm needs.

A good algorithm keeps this number as small as possible. When writing the code we should
focus on keeping the space complexity to the minimum.

When we design an algorithm to solve a problem, it needs some computer memory to execute.

For any algorithm to execute, it requires memory for the following purposes:

1. Memory required to store the program instructions (also called instruction space).
2. Memory required to store constants and variables (also called data space).
3. Memory that is dynamically allocated, and memory required for storing data between function calls (also called environment space).

An algorithm may require inputs, variables, and constants of different data types.

For calculating the space complexity, we need to know the amount of memory used by variables of different data types, which generally varies across compilers and operating systems. The method for calculating the space complexity, however, remains the same and is independent of the operating system.

| Data Type | Size |
| --- | --- |
| bool, char, unsigned char, signed char, __int8 | 1 byte |
| __int16, short, unsigned short, wchar_t, __wchar_t | 2 bytes |
| float, __int32, int, unsigned int, long, unsigned long | 4 bytes |
| double, __int64, long double, long long | 8 bytes |
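
These sizes vary across compilers and platforms, so you can check them on your own system with sizeof, as in this short sketch:

#include <stdio.h>

int main(void) {
    printf("char:   %zu bytes\n", sizeof(char));
    printf("short:  %zu bytes\n", sizeof(short));
    printf("int:    %zu bytes\n", sizeof(int));
    printf("float:  %zu bytes\n", sizeof(float));
    printf("double: %zu bytes\n", sizeof(double));
    return 0;
}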


Calculating a Constant Space Complexity

Let us see the procedure for calculating the space complexity of a code with an example:
int sum(int x, int y) {
    int z = x + y;
    return z;
}
The above code takes two inputs x and y of type int as formal parameters.

In the code, another local variable z of type int is used for storing the sum of x and y.

Taking the size of int as 4 bytes (as per the table above), the total space is 3 (number of variables) × 4 (size of each variable) = 12 bytes.

The space requirement of this algorithm is fixed for any input given to it, hence it is called constant space complexity.

Calculating Linear Space Complexity

Let us understand the calculation of space complexity with one more example:

// Function to calculate the sum of n elements in the array.
// n is the number of elements in the array.
int sum(int a[], int n) {
    int x = 0, i = 0;
    for (i = 0; i < n; i++) {
        x = x + a[i];
    }
    return x;
}
In the above code, 4 × n bytes of memory (taking the size of int as 4 bytes) are required by the array a[ ], and 4 bytes of memory for each of the variables x, n and i.

Hence the total space requirement for the above code is (4 × n + 12) bytes.

Since the space complexity of the program grows linearly with the size of the array (input) n, it is called linear space complexity.

Similarly, when the memory requirement of the algorithm grows quadratically with the given input, it is called quadratic space complexity; and when it grows cubically with the given input, it is called cubic space complexity, and so on. A sketch of the quadratic case follows.
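
For example, an algorithm that allocates an n × n matrix uses memory proportional to n², so its space complexity is quadratic. A minimal sketch (the function name make_matrix is our own):

#include <stdlib.h>

/* Allocates an n x n matrix as one contiguous block of n * n ints,
 * i.e. memory proportional to n². The caller must free() the block. */
int *make_matrix(int n) {
    return malloc((size_t)n * n * sizeof(int));
}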


Understanding Time Complexity


Time complexity is the computational complexity that measures or estimates the time taken to
run an algorithm.

The total time required by the algorithm to run till its completion depends on the number of
(machine) instructions the algorithm executes during its running time.

The actual time taken differs from machine to machine as it depends on several factors like CPU
speed, memory access speed, OS, processor etc.

So, we will take the number of operations executed in the algorithm as the time complexity.

Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform.

Thus, the amount of time taken and the number of elementary operations performed by the algorithm differ by at most a constant factor.

Time Complexity Comparison Example

Let us consider the problem statement "Find the sum of the first n natural numbers".

Here we consider two algorithms for the same problem statement (there can be many more), calculate the time complexity of both, and choose the better algorithm.

Algorithm - 1 :

// Calculating the sum of first n natural numbers
int sum(int n) {
    int i, total = 0;
    for (i = 1; i <= n; i++) {
        total = total + i;
    }
    return total;
}

In the above solution we loop from 1 to n, adding each number to total. After the loop ends, total equals 1 + 2 + 3 + ... + n.

Let us calculate the time complexity of the algorithm by counting all the
elementary operations.

| Code | No. of Elementary Operations | No. of Times Executed | Total Operations |
| --- | --- | --- | --- |
| int i, total = 0; | 1 (one assignment operation) | 1 | 1 × 1 = 1 |
| for (i = 1; i <= n; i++) — initialization | 1 (one assignment operation) | 1 | 1 × 1 = 1 |
| for (i = 1; i <= n; i++) — condition | 1 (one comparison operation) | n + 1 (evaluates to true n times, to false once) | 1 × (n + 1) = n + 1 |
| for (i = 1; i <= n; i++) — increment | 1 (one increment operation) | n (i is incremented n times) | 1 × n = n |
| total = total + i; | 2 (one addition operation, one assignment operation) | n (the loop body executes n times) | 2 × n = 2n |
| return total; | 1 (one return operation) | 1 | 1 × 1 = 1 |
| Total Number of Operations | | | 4n + 4 |

Time complexity of the algorithm: T(n) = 4n + 4 [assuming each operation completes in unit time].

The total number of elementary operations executed in the above algorithm is 4n + 4.

As the value of n increases, the time taken will also increase linearly. Thus this algorithm has a linear time complexity.


Algorithm - 2 :

// Calculating the sum of first n natural numbers using a formula
int sum(int n) {
    return n * (n + 1) / 2;
}

In the above code, we calculate the sum of the first n natural numbers using the formula n * (n + 1) / 2.

Let us calculate the time complexity of the algorithm by counting all the
elementary operations.

| Code | No. of Elementary Operations | No. of Times Executed | Total Operations |
| --- | --- | --- | --- |
| return n * (n + 1) / 2; | 4 (one multiplication, one addition, one division, one return operation) | 1 | 4 |
| Total Number of Operations | | | 4 |

Time complexity of the algorithm: T(n) = 4 [assuming each operation completes in unit time].

For the above code, the time complexity is constant: it never depends on the value of n, and it always produces the result in 4 elementary operations.

Thus the time complexity of Algorithm - 2 is constant. In other words, the running time of the algorithm does not change with the input n.

In the above two simple algorithms, you saw how a single problem can have many solutions.

While the first solution required a loop which executes n times, the second solution used a mathematical expression to return the result in one line.

It is clearly evident that the second solution is the better one.

Asymptotic Notations

Asymptotic notations are standard notations used to analyze the complexity of an algorithm in terms of time and space.

When we analyze any algorithm, we generally get a formula to represent the amount of time
required for execution or the time required by the computer to run the lines of code of the
algorithm, number of memory accesses, number of comparisons etc. for a given input.

This formula often contains insignificant information that does not contribute to the running time.

For example, if some algorithm has a time complexity of T(n) = n² + 8n + 2, which is a quadratic expression, then for larger values of n the 8n + 2 part becomes insignificant compared to n².

For n = 1000, n² = 1000000 and 8n + 2 = 8002; 8002 is very insignificant compared to 1000000.
Also, when we compare the execution times of two algorithms, the constant coefficients of the higher-order terms are neglected.

For example, an algorithm that takes 200n² time will be faster than another algorithm that takes n³ time, for any value of n larger than 200.

Since we're only interested in the asymptotic behavior of the growth of the function, the constant
factor can be ignored too.

The word asymptotic means approaching a value / limit or curve arbitrarily closely.

In the case of asymptotic notations, we use the same model:

- to ignore the constant factors and insignificant parts of an expression
- to devise a better way of representing the complexities of algorithms, with a single representative term

so that comparisons between algorithms can be done easily.

Let us understand the concept of asymptotic notations and its advantages using an example. Suppose we have two algorithms with the following time complexities T(n):

Algorithm - 1: T(n) = 23n² + 6n - 1
Algorithm - 2: T(n) = n³ + 8n - 2

Now, as per asymptotic notation [avoiding the coefficients and insignificant parts], we should just worry about how the function will grow as the value of n (the input) grows, and that will entirely depend on n² for Algorithm - 1, and on n³ for Algorithm - 2.

Hence, we can clearly say that Algorithm - 2 will grow faster than the other one, simply by comparing the highest-order terms and ignoring the constant coefficients (23 in 23n²) and the insignificant parts of the expressions (6n - 1 and 8n - 2).

The reason we are following asymptotic notations is to make things easy to manage.

All we need to do is first analyse the algorithm to find an expression for its time requirements, and then analyse how that expression will grow as the input (n) grows.

Three types of asymptotic notations are used to represent the growth of any algorithm, as input
increases:

1. Big-Theta (Θ): denotes "the same as" iterations (represents the average case).
2. Big-Oh (O): denotes "fewer than or the same as" iterations (represents the worst case).
3. Big-Omega (Ω): denotes "more than or the same as" iterations (represents the best case).
If the time complexity is represented by the Big-Θ notation, it is like the average value, or the range within which the actual execution time of the algorithm will lie.

For example, if for some algorithm the time complexity is represented by the expression 20n² + 8n, and we use the Big-Theta (Θ) notation to represent this, then the time complexity would be Θ(n²), ignoring the constant coefficient and removing the insignificant part, which is 8n. i.e., 20n² + 8n = Θ(n²)
Here in the above example, if we say f(n) = 20n² + 8n and g(n) = n², it means that there are positive constants c1, c2, and k, such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ k.

The values of c1, c2, and k must be fixed for the function f and must not depend on n.
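
As a quick worked check (with constants of our own choosing, not from the lesson): for f(n) = 20n² + 8n and g(n) = n², taking c1 = 20, c2 = 28 and k = 1 satisfies the definition, since 20n² ≤ 20n² + 8n ≤ 28n² for all n ≥ 1 (because 8n ≤ 8n² whenever n ≥ 1).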

Unit 1 - Week 1 - Lesson 2: Asymptotic Notations - Understanding Big-Oh (O) Notation


This notation is known as the upper bound of the algorithm, or a worst case of an algorithm.

It tells us that a certain function will never exceed a specified time for any value of input n.

Why do we need this representation when we already have the Big-Θ notation, which represents the tightly bound running time for any algorithm?

Let us take a small example to understand this.

Consider a linear search algorithm, in which we traverse the array elements one by one, from the first element to the last, to search for a given number.

In the worst case, we need to search till the end of the array to confirm that the element is not present in the array.

But it can also happen that the element we are searching for is the first element of the array, in which case the time complexity will be 1.
Now in this case, saying that the Big-Θ or tight-bound time complexity for linear search is Θ(n) would mean that the time required is always related to n, as this is the right way to represent the average time complexity.

But when we use the Big-O notation, we say that the time complexity is O(n). This means the time complexity will never exceed n, defining the upper bound, and hence can be less than or equal to n, which is the correct representation.

This is the reason you will most often see the Big-O notation used to represent the time complexity of an algorithm: it considers the worst case of the running algorithm.
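
A minimal C sketch of linear search (the function and variable names here are our own illustration):

// Returns the index of target in a[0..n-1], or -1 if it is absent.
// Best case: target is the first element (1 comparison).
// Worst case: target is absent (n comparisons), hence O(n).
int linear_search(const int a[], int n, int target) {
    for (int i = 0; i < n; i++) {
        if (a[i] == target)
            return i;
    }
    return -1;
}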

For example, if for some algorithm the time complexity is represented by the expression 20n² + 8n, and we use the Big-Oh (O) notation to represent this, then the time complexity would be O(n²), ignoring the constant coefficient and removing the insignificant part, which is 8n. i.e., 20n² + 8n = O(n²)
Here, in the example above, if we say f(n) = 20n² + 8n and g(n) = n², it means there are positive constants c and k, such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ k.

The values of c and k must be fixed for the function f and must not depend on n.
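
Again as a quick worked check (our own constants): for f(n) = 20n² + 8n and g(n) = n², choosing c = 28 and k = 1 works, since 20n² + 8n ≤ 28n² for all n ≥ 1.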

Big-Omega (Ω) notation is used to define the lower bound of an algorithm; we can also call it the best case of that algorithm.

It indicates the minimum time required by the algorithm over all input values, which is the best case of that algorithm.

For example, if for some algorithm the time complexity is represented by the expression 20n² + 8n, and we use the Big-Omega (Ω) notation to represent this, then the time complexity would be Ω(n²), ignoring the constant coefficient and removing the insignificant part, which is 8n. i.e., 20n² + 8n = Ω(n²)
Here, in the example above, if we say f(n) = 20n² + 8n and g(n) = n², it means there are positive constants c and k, such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ k.

The values of c and k must be fixed for the function f and must not depend on n.

Let us ask a question: "Why should we bother about time complexity?"

Suppose the time taken by one operation is 1 microsecond and n = 100. Then the times taken for different complexities are:
| Complexity | Time Taken by the Algorithm |
| --- | --- |
| O(1) | 1 microsecond |
| O(n) | 100 microseconds |
| O(n²) | 0.01 seconds |
| O(n⁴) | 100 seconds |
| O(2ⁿ) | about 4 × 10¹⁶ years |

As you can see, as the time complexity grows, the amount of time taken by the algorithm to execute increases drastically.

We should focus on reducing the time complexity as much as possible while designing and developing an algorithm.

The typical growth rates of different time complexities are:

O(1) < O(log₂n) < O(n^(2/3)) < O(n) < O(n·log₂n) < O(n²) < O(n³) < O(2ⁿ)
Let us see the below code to find the square of a given number and try to calculate the time
complexity.

int calculate_square(int n) {
    int square = 0;
    square = n * n;
    return square;
}
The time complexity can be calculated as:

| Code | No. of Elementary Operations | No. of Times Executed | Total Operations |
| --- | --- | --- | --- |
| int square = 0; | 1 (one assignment operation) | 1 | 1 |
| square = n * n; | 2 (one multiplication, one assignment) | 1 | 2 |
| return square; | 1 (one return operation) | 1 | 1 |
| Total Number of Operations | | | 4 |

Time complexity of the algorithm: T(n) = 4 [assuming each operation completes in unit time].

Here the time complexity is constant.

The running time of the statement will not change in relation to n. So, the time complexity
is O(1) (read as “Big O One”).

Let us see the below code to find the factorial of a given number and try to calculate the time
complexity.

int calculate_factorial(int n) {
    int i, fact = 1;
    for (i = 1; i <= n; i++) {
        fact = fact * i;
    }
    return fact;
}
The time complexity can be calculated as:

| Code | No. of Elementary Operations | No. of Times Executed | Total Operations |
| --- | --- | --- | --- |
| int i, fact = 1; | 1 (one assignment operation) | 1 | 1 × 1 = 1 |
| for (i = 1; i <= n; i++) — initialization | 1 (one assignment operation) | 1 | 1 × 1 = 1 |
| for (i = 1; i <= n; i++) — condition | 1 (one comparison operation) | n + 1 (evaluates to true n times, to false once) | 1 × (n + 1) = n + 1 |
| for (i = 1; i <= n; i++) — increment | 1 (one increment operation) | n (i is incremented n times) | 1 × n = n |
| fact = fact * i; | 2 (one multiplication, one assignment) | n (the loop body executes n times) | 2 × n = 2n |
| return fact; | 1 (one return operation) | 1 | 1 × 1 = 1 |
| Total Number of Operations | | | 4n + 4 |

Time complexity of the algorithm: T(n) = 4n + 4 [assuming each operation completes in unit time].

The total number of elementary operations executed in the above algorithm is 4n + 4.

As the value of n increases, the time taken increases linearly. Thus this algorithm has a linear time complexity.

Removing the insignificant part and the coefficients, the time complexity is T(n) = O(n).

The running time of the single for loop is directly proportional to n. The time complexity is O(n) (read as "Big O of n").

Let us consider the below code which consists of a nested for loop.

int sum(int n) {
    int i, j, total = 0;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            total++;
        }
    }
    return total;
}
The time complexity can be calculated as:
| Code | No. of Elementary Operations | No. of Times Executed | Total Operations |
| --- | --- | --- | --- |
| int i, j, total = 0; | 1 (one assignment operation) | 1 | 1 × 1 = 1 |
| for (i = 0; i < n; i++) — initialization | 1 (one assignment operation) | 1 | 1 × 1 = 1 |
| for (i = 0; i < n; i++) — condition | 1 (one comparison operation) | n + 1 (evaluates to true n times, to false once) | 1 × (n + 1) = n + 1 |
| for (i = 0; i < n; i++) — increment | 1 (one increment operation) | n (i increments from 0 to n) | 1 × n = n |
| for (j = 0; j < n; j++) — initialization | 1 (one assignment operation) | n (this statement is inside a loop which runs n times) | 1 × n = n |
| for (j = 0; j < n; j++) — condition | 1 (one comparison operation) | n × (n + 1) (n + 1 comparisons, executed n times as it is in the inner loop) | n × (n + 1) = n² + n |
| for (j = 0; j < n; j++) — increment | 1 (one increment operation) | n × n (n increments, done n times as it is in the inner loop) | 1 × n × n = n² |
| total++; | 1 (one increment operation) | n × n (a statement inside two loops which run n times each) | 1 × n × n = n² |
| return total; | 1 (one return operation) | 1 | 1 × 1 = 1 |
| Total Number of Operations | | | 3n² + 4n + 4 |

Time complexity of the algorithm: T(n) = 3n² + 4n + 4 [assuming each operation completes in unit time].

The time complexity for the above code is quadratic.

The running time of the two nested loops is proportional to the square of n: when n doubles, the running time becomes four times larger. So, the time complexity is O(n²).

Consider searching a word from the dictionary. We know very well that all the words in the
dictionary are alphabetically sorted.

Given this scenario, we can do a more efficient search than the linear search, called binary
search.

In this search, we will divide the complete dictionary into two halves and then select the half that
has the possibility of containing the word.

The above step can be repeated until we find the word in the dictionary.

Binary search can be written in C as follows (a runnable version of the usual pseudocode; the function signature is our own):

// Searches for target in the sorted array list[0..n-1].
// Returns the index of target, or -1 if it is not present.
int binary_search(const int list[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = (low + high) / 2;   // middle of the current range
        if (target < list[mid])
            high = mid - 1;           // continue in the lower half
        else if (target > list[mid])
            low = mid + 1;            // continue in the upper half
        else
            return mid;               // found
    }
    return -1;                        // not found
}
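
For instance (with illustrative values): given int a[] = {3, 7, 12, 19, 25, 31, 42}, the call binary_search(a, 7, 25) returns 4, while binary_search(a, 7, 20) returns -1.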

Each comparison halves the remaining search range, so for n elements at most about log₂n comparisons are needed. This algorithm therefore has a logarithmic time complexity, which is O(log n).
