Unit 1: Intro BDS


Introduction to Data Structures
Introduction
• A data structure is a way of storing and organizing data in a
computer so that it can be used efficiently.
• Some common examples of data structures are arrays, linked lists,
queues, stacks, binary trees, graphs, and hash tables.
• Data structures are widely applied in areas like:
  ✓ Compiler design
  ✓ Operating systems
  ✓ Statistical analysis packages
  ✓ DBMS
  ✓ Numerical analysis
  ✓ Simulation
  ✓ Artificial intelligence
  ✓ Graphics
  ✓ English dictionaries
  ✓ City maps
  ✓ Banking
  ✓ Libraries
  ✓ Stores
Classification of Data Structures
• Primitive data structures are the fundamental data types which are
supported by a programming language.

• Some basic data types are integer, real, and boolean. The terms
‘data type’, ‘basic data type’, and ‘primitive data type’ are often
used interchangeably.

• Non-primitive data structures are those data structures which are
created using primitive data structures. Examples of such data
structures include linked lists, queues, stacks, trees, and graphs.

• Non-primitive data structures can further be classified into two
categories: linear and non-linear data structures.
Classification of Data Structures
A primitive data structure is generally a basic structure that is
usually built into the language, such as an integer or a float.

A non-primitive data structure is built out of primitive data structures
linked together in meaningful ways, such as arrays, linked lists,
stacks, queues, trees, and graphs.

• If the elements of a data structure are stored in a linear or sequential
order, then it is a linear data structure. Examples are arrays, linked
lists, stacks, and queues.
• If the elements of a data structure are not stored in a sequential
order, then it is a non-linear data structure. Examples are trees and
graphs.
Classification of Data Structures
Data Structures (DS)
  • Primitive DS: Integer, Float, Character, Pointer
  • Non-Primitive DS
      – Linear DS: Arrays, Linked Lists, Stacks, Queues
      – Non-Linear DS: Trees, Graphs
Arrays
• An array is a collection of similar data elements.

• The elements of an array are stored in consecutive memory locations
and are referenced by an index (also known as the subscript).

• Arrays are declared using the following syntax:

      datatype name[size];

  For example: int marks[10];

  Here marks[0] refers to the 1st element, marks[1] to the 2nd, and so
  on, up to marks[9], the 10th element.
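A minimal sketch (not part of the slides) using the marks[10] example;
the values stored are made up for illustration:

#include <stdio.h>

int main(void)
{
    int marks[10] = {88, 92, 75, 60, 97, 81, 70, 65, 90, 78};
    int i, sum = 0;

    /* Elements occupy consecutive memory locations and are
       referenced by a zero-based index (subscript). */
    for (i = 0; i < 10; i++)
        sum += marks[i];

    printf("First element: %d\n", marks[0]);   /* 1st element  */
    printf("Last element : %d\n", marks[9]);   /* 10th element */
    printf("Sum of marks : %d\n", sum);
    return 0;
}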
Linked Lists
• A linked list is a very flexible dynamic data structure in which
elements can be added to or deleted from anywhere.

• In a linked list, each element (called a node) is allocated space as
it is added to the list.

• Every node in the list points to the next node in the list. Therefore, in
a linked list every node contains two types of information:

✓ The data stored in the node

✓ A pointer or link to the next node in the list


[Figure: a linked list 1 → 2 → 3 → 4 → 5 → 6 → 7 → X, where X marks
the end of the list]
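A minimal sketch (not from the slides) of a singly linked list node and
its traversal; the names node and head and the three-element list are
illustrative:

#include <stdio.h>
#include <stdlib.h>

/* Each node holds data and a pointer (link) to the next node. */
struct node {
    int data;
    struct node *next;
};

int main(void)
{
    struct node *head, *second, *third, *p;

    /* Build the list 1 -> 2 -> 3; NULL marks the end of the list. */
    head   = malloc(sizeof(struct node));
    second = malloc(sizeof(struct node));
    third  = malloc(sizeof(struct node));

    head->data = 1;   head->next = second;
    second->data = 2; second->next = third;
    third->data = 3;  third->next = NULL;

    /* Traverse: follow the next pointers until NULL is reached. */
    for (p = head; p != NULL; p = p->next)
        printf("%d ", p->data);
    printf("\n");

    free(third);
    free(second);
    free(head);
    return 0;
}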
Stack
• A stack is a last-in, first-out (LIFO) data structure in which insertion
and deletion of elements are done only at one end, known as TOP of
the stack.

• Every stack has a variable TOP associated with it, which is used to
store the address of the topmost element of the stack.

• If TOP = NULL, then it indicates that the stack is empty.

• If TOP = MAX-1, then the stack is full.

[Figure: a stack stored in a 10-slot array. Successive PUSH operations
add A, B, C, D, E, leaving TOP = 4; POP removes from the same end. A
box of stacked CDs illustrates the structure of a stack.]
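A hedged sketch of an array-based stack matching the description above;
here TOP = -1 represents the empty (NULL) case, and the names are
illustrative:

#include <stdio.h>
#define MAX 10

char stack[MAX];
int TOP = -1;    /* -1 plays the role of NULL: the stack is empty */

void push(char c)
{
    if (TOP == MAX - 1)          /* TOP = MAX-1 means the stack is full */
        printf("Overflow\n");
    else
        stack[++TOP] = c;        /* insertion happens only at the TOP end */
}

char pop(void)
{
    if (TOP == -1) {             /* nothing to delete */
        printf("Underflow\n");
        return '\0';
    }
    return stack[TOP--];         /* deletion happens only at the TOP end */
}

int main(void)
{
    push('A'); push('B'); push('C');
    printf("%c\n", pop());       /* prints C: last in, first out */
    return 0;
}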
Queue
• A queue is a FIFO (first-in, first-out) data structure in which the
element that is inserted first is the first one to be taken out.

• The elements in a queue are added at one end, called the REAR, and
removed from the other end, called the FRONT.

• When REAR = MAX – 1, the queue is full.

• If FRONT = NULL and REAR = NULL, there is no element in the queue.

[Figure: a queue 9 7 18 14 36 45 stored in a 10-slot array, with
FRONT = 1 and REAR = 6]
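A hedged sketch of an array-based queue with FRONT and REAR as
described above (using -1 to represent NULL; names are illustrative):

#include <stdio.h>
#define MAX 10

int queue[MAX];
int FRONT = -1, REAR = -1;   /* -1 plays the role of NULL: queue empty */

void enqueue(int x)
{
    if (REAR == MAX - 1) {       /* REAR = MAX-1 means the queue is full */
        printf("Overflow\n");
        return;
    }
    if (FRONT == -1)
        FRONT = 0;               /* first element ever inserted */
    queue[++REAR] = x;           /* elements are added at the REAR end */
}

int dequeue(void)
{
    if (FRONT == -1 || FRONT > REAR) {
        printf("Underflow\n");   /* no element in the queue */
        return -1;
    }
    return queue[FRONT++];       /* elements are removed from the FRONT */
}

int main(void)
{
    enqueue(9); enqueue(7); enqueue(18);
    printf("%d\n", dequeue());   /* prints 9: first in, first out */
    return 0;
}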
Tree
• A tree is a non-linear data structure which consists of a collection
of nodes arranged in a hierarchical order.

• One of the nodes is designated as the root node, and the remaining
nodes can be partitioned into disjoint sets such that each set is the
sub-tree of the root.
• A binary tree is the simplest form of tree; it consists of a root
node and left and right sub-trees.

• The root element is pointed to by a ‘root’ pointer.

• If root = NULL, then it means the tree is empty.

[Figure: a binary tree with root node 1, whose left sub-tree T1 and
right sub-tree T2 contain nodes 2–12]
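A minimal sketch (not part of the slides) of a binary tree node with a
root and left/right sub-trees; the node values follow the figure above:

#include <stdio.h>
#include <stdlib.h>

/* A binary tree node: data plus pointers to left and right sub-trees. */
struct tnode {
    int data;
    struct tnode *left, *right;
};

struct tnode *newnode(int data)
{
    struct tnode *n = malloc(sizeof(struct tnode));
    n->data = data;
    n->left = NULL;
    n->right = NULL;
    return n;
}

int main(void)
{
    struct tnode *root;          /* root = NULL would mean an empty tree */

    root = newnode(1);           /* the root node */
    root->left  = newnode(2);    /* root of the left sub-tree (T1) */
    root->right = newnode(3);    /* root of the right sub-tree (T2) */

    printf("root: %d, left: %d, right: %d\n",
           root->data, root->left->data, root->right->data);

    free(root->left);
    free(root->right);
    free(root);
    return 0;
}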


Graph
• A graph is a non-linear data structure which is a collection of vertices
(also called nodes) and edges that connect these vertices.

• A graph is often viewed as a generalization of the tree structure:
instead of a purely parent-to-child relationship between nodes, any
kind of complex relationship can exist between them.

• Every node in the graph can be connected with any other node.

• When two nodes are connected via an edge, the two nodes are known as
neighbors.

[Figure: a graph with vertices A through I connected by edges]
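A minimal sketch (not from the slides) using an adjacency matrix, one
common way to record which vertices are neighbors; the vertex labels
and edges are illustrative:

#include <stdio.h>
#define V 4   /* vertices A=0, B=1, C=2, D=3 */

int main(void)
{
    /* Adjacency matrix: adj[i][j] = 1 when an edge connects vertex i
       and vertex j, i.e., when the two vertices are neighbors. */
    int adj[V][V] = {0};

    adj[0][1] = adj[1][0] = 1;   /* edge A-B */
    adj[0][2] = adj[2][0] = 1;   /* edge A-C */
    adj[1][3] = adj[3][1] = 1;   /* edge B-D */

    printf("A and B neighbors? %s\n", adj[0][1] ? "yes" : "no");
    printf("A and D neighbors? %s\n", adj[0][3] ? "yes" : "no");
    return 0;
}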
Operations on Data Structures
Traversing: It means to access each data item exactly once so that it can be processed.
For example, to print the names of all the students in a class.
Searching: It is used to find the location of one or more data items that satisfy the given
constraint. Such a data item may or may not be present in the given collection of data
items.
For example, to find the names of all the students who secured 100 marks in
mathematics.
Merging: Two sorted lists of data items can be combined to form a single
sorted list of data items.
Inserting: It is used to add new data items to the given list of data items.
For example, to add the details of a new student who has recently joined the course.
Deleting: It means to remove (delete) a particular data item from the given collection of
data items. For example, to delete the name of a student who has left the course.
Sorting: Data items can be arranged in some order, like ascending or
descending, depending on the type of application.
For example, arranging the names of students in a class in an alphabetical order, or calculating
the top three winners by arranging the participants’ scores in descending order and then
extracting the top three.
Abstract Data Type
• The data type of a variable is the set of values that the variable can
take. The basic data types in C include int, char, float, and double.

• An Abstract Data Type (ADT) is the way in which we look at a data
structure, focusing on what it does and ignoring how it does its job.

• For example, stacks and queues are perfect examples of abstract data
types. We can implement both of these ADTs using either an array or a
linked list. This demonstrates the "abstract" nature of stacks and
queues.
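A minimal sketch (not part of the slides) of this idea in C: main()
below uses the stack only through its operations, so the array
representation could be swapped for a linked list without changing
main(). All names are illustrative.

#include <stdio.h>

/* One possible representation; the user of the ADT never touches it. */
typedef struct { int items[100]; int top; } Stack;

void push(Stack *s, int v) { s->items[++s->top] = v; }
int  pop(Stack *s)         { return s->items[s->top--]; }
int  is_empty(const Stack *s) { return s->top == -1; }

int main(void)
{
    Stack s = { .top = -1 };       /* top = -1 means empty */
    push(&s, 10);
    push(&s, 20);
    while (!is_empty(&s))
        printf("%d\n", pop(&s));   /* prints 20, then 10 */
    return 0;
}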
Abstract Data Type
• The end user is not concerned with the details of how the methods
carry out their tasks. Users are aware only of the methods available to
them, and they care about calling those methods and getting back the
results, not HOW the methods work.
Abstract data types offer several advantages:
• An ADT is reusable and robust, and is based on principles of Object
  Oriented Programming (OOP) and Software Engineering (SE).
• An ADT can be re-used at several places, which reduces coding effort.
• Representation Independence: Most of the program becomes independent
  of the abstract data type's representation, so the representation can
  be improved without breaking the entire program.
• Modularity: With representation independence, the different parts of
  a program become less dependent on other parts and on how those other
  parts are implemented.
• Interchangeability of Parts: Different implementations of an abstract
  data type may have different performance characteristics. With
  abstract data types, it becomes easier for each part of a program to
  use an implementation of its data types that will be more efficient
  for that particular part of the program.
Algorithm
• An "algorithm" is a formally defined procedure for performing some
calculation. It provides a blueprint for writing a program to solve a
particular problem.

• It is a set of rules that precisely defines a sequence of operations.

• It is considered to be an effective procedure for solving a problem
in a finite number of steps. That is, a well-defined algorithm always
provides an answer and is guaranteed to terminate.

• Algorithms are mainly used to achieve software reuse. Once we have an
idea or a blueprint of a solution, we can implement it in any
high-level language like C, C++, or Java.
Algorithm
Different Approaches to Designing an Algorithm:
There are two main approaches to designing an algorithm: the top-down
approach and the bottom-up approach.

In the top-down approach, we start from an abstract design and then at each
step, this design is refined into more concrete levels until a level is reached that
requires no further refinement.
In the bottom-up design, we start with designing the most basic or concrete
modules and then proceed towards designing higher level modules.
Algorithm
CONTROL STRUCTURES USED IN ALGORITHMS:
An algorithm may employ one of the following control structures: (a) sequence, (b) decision,
and (c) repetition.
Sequence: Each step of the algorithm is executed in a specified order.
For example, an algorithm to add two numbers executes its steps one
after the other.

Decision: Decision statements are used when the execution of a process
depends on the outcome of some condition.

Repetition: Repetition involves executing one or more steps a number of
times. It can be implemented using constructs such as while, do–while,
and for loops, which execute one or more steps as long as some
condition holds.
Algorithm
Write an algorithm to find whether a number is even or odd.
Step 1: Input the number as A
Step 2: IF A % 2 = 0
            Print "EVEN"
        ELSE
            Print "ODD"
Step 3: END
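A minimal C sketch of this algorithm (the program itself is not in the
slides), with the steps mapped to code:

#include <stdio.h>

int main(void)
{
    int a;
    printf("Enter the number: ");
    scanf("%d", &a);            /* Step 1: input the number as A */
    if (a % 2 == 0)             /* Step 2: IF A % 2 = 0 */
        printf("EVEN\n");
    else
        printf("ODD\n");
    return 0;                   /* Step 3: END */
}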
Algorithm
Write an algorithm to find the larger of two numbers.
Step 1: Input A, B
Step 2: IF A > B
            Print A
        ELSE
            Print B
Step 3: END

Write an algorithm to find the sum of the first N natural numbers.
Step 1: Input N
Step 2: SET I = 1, SUM = 0
Step 3: Repeat Step 4 while I <= N
Step 4:     SET SUM = SUM + I
            SET I = I + 1
        [END OF LOOP]
Step 5: PRINT SUM
Step 6: END
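A minimal C sketch of the second algorithm (not part of the slides),
with the steps mapped to code:

#include <stdio.h>

int main(void)
{
    int n, i, sum = 0;             /* Step 2: SET I = 1, SUM = 0 */
    printf("Enter N: ");
    scanf("%d", &n);               /* Step 1: input N */
    for (i = 1; i <= n; i++)       /* Step 3: repeat while I <= N */
        sum = sum + i;             /* Step 4: SUM = SUM + I; I = I + 1 */
    printf("SUM = %d\n", sum);     /* Step 5: PRINT SUM */
    return 0;                      /* Step 6: END */
}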
Algorithm
Write an algorithm to print the grade obtained by a student using the following rules.
Time and Space Complexity of Algorithms
• To analyze an algorithm means determining the amount of resources
(such as time and storage) needed to execute it.

• The efficiency or complexity of an algorithm is stated in terms of
time complexity and space complexity.

• The time complexity of an algorithm is basically the running time of
the program as a function of the input size.

• Similarly, the space complexity of an algorithm is the amount of
computer memory required during the program execution, as a function
of the input size.
Time and Space Complexity of Algorithms
Worst-case, Average-case, Best-case Time Complexity
Worst-case running time This denotes the behavior of an algorithm
with respect to the worst possible case of the input instance.
The worst-case running time of an algorithm is an upper bound on the
running time for any input. Therefore, having the knowledge of worst-
case running time gives us an assurance that the algorithm will never
go beyond this time limit.

Average-case running time The average-case running time of an
algorithm is an estimate of the running time for an ‘average’ input. It
specifies the expected behavior of the algorithm when the input is
randomly drawn from a given distribution. Average-case running time
assumes that all inputs of a given size are equally likely.
Time and Space Complexity of Algorithms
Best-case running time The term ‘best-case performance’ is used to
analyze an algorithm under optimal conditions.
For example, the best case for a simple linear search on an array occurs
when the desired element is the first in the list. However, while
developing and choosing an algorithm to solve a problem, we hardly
base our decision on the best-case performance.
It is always recommended to improve the average-case and the worst-case
performance of an algorithm.
Time and Space Complexity of Algorithms
• The time complexity of an algorithm depends on the number of
instructions executed. This number is primarily dependent on the size
of the program's input and the algorithm used.

• The space needed by a program depends on:
  ✓ The fixed part, which includes space needed for storing
    instructions, constants, variables, and structured variables.
  ✓ The variable part, which includes space needed for the recursion
    stack and for structured variables that are allocated space
    dynamically during the run-time of the program.
Time and Space Complexity of Algorithms
Time–Space Trade-off:

The best algorithm to solve a particular problem at hand is, no doubt,
one that requires less memory space and takes less time to complete its
execution. But practically, designing such an ideal algorithm is not a
trivial task. There can be more than one algorithm to solve a particular
problem. One may require less memory space, while the other may require
less CPU time to execute. Thus, it is not uncommon to sacrifice one
thing for the other. Hence, there exists a time–space trade-off among
algorithms.

So, if space is a big constraint, then one might choose a program that
takes less space at the cost of more CPU time. On the contrary, if time
is a major constraint, then one might choose a program that takes
minimum time to execute at the cost of more space.
Calculating Algorithm Efficiency
• The efficiency of an algorithm is expressed in terms of the number of
elements that have to be processed. So, if n is the number of elements,
then the efficiency can be stated as Efficiency = f(n).

• If a function is linear (without any loops or recursion), the
efficiency of the algorithm, or its running time, can be given as the
number of instructions it contains.

• If an algorithm contains loops or recursive functions, then its
efficiency may vary depending on the number of loops and the running
time of each loop in the algorithm.
Calculating Algorithm Efficiency
Linear loops: To calculate the efficiency of an algorithm that has a
single loop, we first determine the number of times the statements in
the loop will be executed, because the number of iterations is directly
proportional to the loop factor: the greater the loop factor, the more
iterations there are.

for(i=0;i<100;i++)
    statement block;

Here, 100 is the loop factor. Since efficiency is directly proportional
to the number of iterations, the general formula for linear loops may
be given as

    f(n) = n

for(i=0;i<100;i+=2)
    statement block;

Here, the number of iterations is half the loop factor, so the
efficiency can be given as

    f(n) = n/2
Calculating Algorithm Efficiency
Logarithmic loops: The loop-controlling variable is either multiplied
or divided during each iteration of the loop. For example, look at the
loops given below:

for(i=1;i<1000;i*=2)
    statement block;

for(i=1000;i>=1;i/=2)
    statement block;

Consider the first for loop, in which the loop-controlling variable i
is multiplied by 2. The loop executes only about 10 times, not 1000
times, because in each iteration the value of i doubles. Now consider
the second loop, in which i is divided by 2. In this case too, the loop
executes about 10 times. Thus, the number of iterations is a function
of the number by which the loop-controlling variable is divided or
multiplied; in these examples, it is 2. That is, when n = 1000, the
number of iterations is log 1000 (base 2), which is approximately 10.
In general, the efficiency of loops in which iterations divide or
multiply the loop-controlling variable can be given as

    f(n) = log n
Nested loops: Loops that contain loops are known as nested loops. To
analyze nested loops, we need to determine the number of iterations
each loop completes. The total is then obtained as the product of the
number of iterations of the inner loop and the number of iterations of
the outer loop. We analyze the efficiency of the algorithm based on
whether it is a linear logarithmic, quadratic, or dependent quadratic
nested loop. (A code sketch of all three patterns follows this list.)

Linear logarithmic loop: The number of iterations of the inner loop is
log 10. The inner loop is controlled by an outer loop which iterates 10
times. Therefore, the number of iterations for this code can be given
as 10 log 10. The generalized formula for a linear logarithmic loop is
f(n) = n log n.

Quadratic loop: In a quadratic loop, the number of iterations of the
inner loop is equal to the number of iterations of the outer loop.
Consider code in which the outer loop executes 10 times and, for each
iteration of the outer loop, the inner loop also executes 10 times; the
efficiency here is 100. The generalized formula for a quadratic loop is
f(n) = n^2.

Dependent quadratic loop: In a dependent quadratic loop, the number of
iterations of the inner loop depends on the outer loop: the inner loop
executes just once in the first iteration, twice in the second, thrice
in the third, and so on. The number of iterations is therefore

    1 + 2 + 3 + ... + 9 + 10 = 55

and the efficiency of such code can be given as f(n) = n(n + 1)/2.
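A hedged sketch (not from the slides) of the three nested-loop patterns
for n = 10; the quadratic and dependent quadratic counts match the 100
and 55 computed above:

#include <stdio.h>

int main(void)
{
    int i, j, count;

    /* Linear logarithmic: outer runs n times, inner about log n times
       -> f(n) = n log n */
    count = 0;
    for (i = 0; i < 10; i++)
        for (j = 1; j < 10; j *= 2)
            count++;
    printf("linear logarithmic: %d iterations\n", count);

    /* Quadratic: inner iterations equal outer iterations -> f(n) = n^2 */
    count = 0;
    for (i = 0; i < 10; i++)
        for (j = 0; j < 10; j++)
            count++;
    printf("quadratic: %d iterations\n", count);           /* 100 */

    /* Dependent quadratic: inner depends on the outer loop counter
       -> 1 + 2 + ... + n = n(n + 1)/2 */
    count = 0;
    for (i = 1; i <= 10; i++)
        for (j = 1; j <= i; j++)
            count++;
    printf("dependent quadratic: %d iterations\n", count); /* 55 */

    return 0;
}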
Big O Notation
• Big-O notation, where the "O" stands for "order of", is concerned
with what happens for very large values of n.

• For example, if a sorting algorithm performs n^2 operations to sort
just n elements, then that algorithm would be described as an O(n^2)
algorithm.

• When expressing complexity using Big O notation, constant multipliers
are ignored. So an O(4n) algorithm is equivalent to O(n), which is how
it should be written.
Big O Notation
• If f(n) and g(n) are functions defined on positive integers n, then
f(n) = O(g(n)), i.e., f of n is big O of g of n, if and only if there
exist positive constants c and n0 such that

    0 ≤ f(n) ≤ c·g(n) for all n ≥ n0

• This means that for large amounts of data, f(n) grows no faster than
a constant multiple of g(n). Hence, g provides an upper bound.
Big O Notation
→ Show that 4n^2 = O(n^3).

We need positive constants c and n0 such that 0 ≤ 4n^2 ≤ c·n^3 for all
n ≥ n0. Taking c = 4 and n0 = 1 works, since 0 ≤ 4n^2 ≤ 4n^3 for all
n ≥ 1.

Limitations of Big O Notation:
• Many algorithms are simply too hard to analyze mathematically.
• There may not be sufficient information to calculate the behavior of
  the algorithm in the average case.
• Big O analysis only tells us how the algorithm grows with the size of
  the problem, not how efficient it is, as it does not consider the
  programming effort.
• It ignores important constants. For example, if one algorithm takes
  O(n^2) time and another takes O(100000·n^2) time, then as per Big O
  both algorithms have equal time complexity. In real-time systems,
  this may be a serious consideration.
Categories of Algorithms
• Constant time algorithms have running time complexity O(1).
• Linear time algorithms have running time complexity O(n).
• Logarithmic time algorithms have running time complexity O(log n).
• Polynomial time algorithms have running time complexity O(n^k),
  where k > 1.
• Exponential time algorithms have running time complexity O(2^n).

   n     O(1)   O(log n)   O(n)   O(n log n)   O(n^2)   O(n^3)
   1      1        1         1         1           1        1
   2      1        1         2         2           4        8
   4      1        2         4         8          16       64
   8      1        3         8        24          64      512
  16      1        4        16        64         256    4,096
Omega Notation
• Omega notation provides a lower bound for f(n). This means that the
function can never do better than the specified value, but it may do
worse.

• If f(n) and g(n) are functions defined on positive integers n, then
f(n) = Ω(g(n)), i.e., f of n is Omega of g of n, if and only if there
exist positive constants c and n0 such that

    0 ≤ c·g(n) ≤ f(n) for all n ≥ n0

• Examples of functions in Ω(n^2) include: n^2, n^2.9, n^2 + n,
540n^2 + 10

• Examples of functions not in Ω(n^3) include: n, n^2.9, n^2
Theta Notation
• Theta notation provides an asymptotically tight bound for f(n).

• If f(n) and g(n) are functions defined on positive integers n, then
f(n) = Θ(g(n)), i.e., f of n is Theta of g of n, if and only if there
exist positive constants c1, c2, and n0 such that

    0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0

• Hence, we can say that Θ(g(n)) comprises the set of all functions
f(n) that lie between c1·g(n) and c2·g(n) for all values of n ≥ n0.

• To summarize:
  ✓ The best case in Θ notation is not used.
  ✓ Worst-case Θ describes asymptotic bounds for the worst-case
    combination of input values.
  ✓ If we simply write Θ, it means the same as worst-case Θ.
Recursion
• A recursive function is a function that calls itself to solve a
smaller version of its task until a final call is made which does not
require a call to itself.
• Every recursive solution has two major cases:
  ✓ The base case, in which the problem is simple enough to be solved
    directly without making any further calls to the same function.
  ✓ The recursive case, in which the problem at hand is first divided
    into simpler subparts; second, the function calls itself on the
    subparts obtained in the first step; third, the result is obtained
    by combining the solutions of the simpler subparts.
Types of Recursion
• Any recursive function can be characterized based on:

  ▪ whether the function calls itself directly or indirectly (direct
    or indirect recursion),

  ▪ whether any operation is pending at each recursive call
    (tail-recursive or not), and

  ▪ the structure of the calling pattern (linear or tree-recursive).

Types of recursion: Direct, Indirect, Linear, Tree, Tail
Direct Recursion
• A function is said to be directly recursive if it explicitly calls itself.
• For example, consider the function given below.
int Func(int n)
{
    if(n==0)
        return n;
    return (Func(n-1));
}
• Write a program to calculate the factorial of a given number.

#include <stdio.h>
int Fact(int);    /* FUNCTION DECLARATION */
int main()
{
    int num, val;
    printf("\n Enter the number: ");
    scanf("%d", &num);
    val = Fact(num);
    printf("\n Factorial of %d = %d", num, val);
    return 0;
}
int Fact(int n)
{
    if(n <= 1)    /* base case; <= 1 also covers n = 0 safely */
        return 1;
    else
        return (n * Fact(n-1));
}

Output
Enter the number : 5
Factorial of 5 = 120
Indirect Recursion
• A function is said to be indirectly recursive if it contains a call to
another function which ultimately calls it.
• Look at the functions given below. These two functions are
indirectly recursive as they both call each other.

int Func1(int n)
{
    if(n==0)
        return n;
    return Func2(n);
}

int Func2(int x)
{
    return Func1(x-1);
}
Fibonacci Series
• The Fibonacci series can be given as:
0 1 1 2 3 5 8 13 21 34 55……
• That is, the third term of the series is the sum of the first and second
terms. Similarly, fourth term is the sum of second and third terms,
so on and so forth.
• A recursive solution to find the nth term of the Fibonacci series can
be given as:

    FIB(n) = 1,                          if n <= 2
    FIB(n) = FIB(n - 1) + FIB(n - 2),    otherwise
[Figure: the recursion tree for FIB(7). FIB(7) calls FIB(6) and FIB(5),
FIB(6) calls FIB(5) and FIB(4), and so on; small terms such as FIB(3)
and FIB(2) are computed many times.]
• Write a program to print the Fibonacci series using recursion.

#include <stdio.h>
int Fibonacci(int);
int main()
{
    int n, i = 0, res;
    printf("Enter the number of terms\n");
    scanf("%d", &n);
    printf("Fibonacci series\n");
    for(i = 0; i < n; i++)
    {
        res = Fibonacci(i);
        printf("%d\t", res);
    }
    return 0;
}
int Fibonacci(int n)
{
    if(n == 0)
        return 0;
    else if(n == 1)
        return 1;
    else
        return (Fibonacci(n-1) + Fibonacci(n-2));
}

Output
Enter the number of terms
7
Fibonacci series
0   1   1   2   3   5   8
Iteration Vs Recursion
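To set up this comparison, here is a minimal iterative version of the
Fibonacci function above (a sketch, not taken from the slides). The loop
computes each term exactly once, avoiding the repeated sub-computations
visible in the FIB(7) recursion tree:

#include <stdio.h>

/* Iterative Fibonacci: a loop replaces the recursive calls. */
int Fibonacci_iter(int n)
{
    int a = 0, b = 1, next, i;
    if (n == 0)
        return 0;
    for (i = 2; i <= n; i++) {
        next = a + b;    /* FIB(i) = FIB(i-1) + FIB(i-2) */
        a = b;
        b = next;
    }
    return b;
}

int main(void)
{
    int i;
    for (i = 0; i < 7; i++)
        printf("%d\t", Fibonacci_iter(i));
    printf("\n");        /* prints: 0 1 1 2 3 5 8 */
    return 0;
}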
