Data Structures & Algorithms
Data is a collection of facts from which conclusion may be drawn.
Data is the quantities, characters or symbols on which operations are performed by
a computer, which may be stored and transmitted in the form of electrical signals
and recorded on magnetic, optical or mechanical recording media.
Application/Programs read data, store data temporarily, process it and finally
output results.
Types of Data: Textual, Numeric, Audio, Video
Data Type:
Defines a certain domain of values.
Defines Operations allowed on these values.
Compiler signals an error if wrong operation is performed on data of a certain
type.
User defined data types:- Structure, union and enumeration.
Abstraction is anything that hides details and provides only the essentials.
Abstract Data Types(ADTs) are like UDDTs which defines operations on values using
functions without specifying what is there inside the function and how the
operations are preformed.
There are multiple ways to implement an ADT.
An ADTs designer has to deal with two types of questions:
1. What values are in the domain? What operations can be performed on the values of
a particular data type?
ADTs specification answers the 'What' questions(it's is written first).
2. How is the data type represented? How are the operations implemented?
ADTs implementation answers the 'How' questions(done after specification).
The program which uses data structure is called a client program.
Users of an ADT need only know the specification(interface)(no implementation
details).
The program which implements the data structure is known as the implementation.
Programmer(Implementer) who implements ADT is concerned with specification,
representation, implementation.
Advantage
Let say, if someone wants to use the stack in the program, then he can simply use
push and pop operations without knowing its implementation.
Also, if in future, the implementation of stack is changed from array to linked
list, then the client program will work in the same way without being affected.
Data Structure is a systematic way of management, storing and organizing data in a
computer so that it can be modified and accessed efficiently.
It is the logical or mathematical model of a particular organization of data.
A data structure is used to implement an ADT.
ADT tells us what is to be done and data structures tells us how to do it.
It is a group of data elements grouped together under one name.
Each data structure allowed data to be stored in a specific manner.
Specific data structure allows to work on a specific problems.
It allows efficient data:
Navigating->accessing each data element exactly once, so that certain items in the
data may be processed.
Searching->finding the location of the data element(key) in the structure.
Insertion->adding a new data element to the structure.
Deletion->removing a data element from the structure.
Sorting->arrange the data elements in a logical order(ascending/descending)
Merging->combining data elements from two or more data structures into one
Advantages
Efficiency:- proper choice of data structures make program efficient in terms of
space and time.
Reusability:- one implementation can be used by multiple client programs.
Abstraction:- DS is specified by an ADT which provides a level of abstraction. The
client program doesn't have to worry about the implementation details.
It allows to manage large amount of data such as large database and indexing
service.
Classification of Data Structures
Primitive Data Structures
They are basic structures and directly operated upon by the machine instructions.
In general, there are d/t representation on d/t computers.
Int, Floating-point number, Char constants, Str constants, Pointers etc.
Non-Primitive Data Structures
They are more sophisticated data structures.
They are derived from the primitive ones.
They emphasize on structuring of a group of homogenous(same type) or heterogeneous
(different type) data items.
Linear DS
A DS is linear when all the elements are arranged in a linear(sequential) order.
When each element of the data structure has only one predecessor and one successor.
Array, Linked List, Stack, Queue
Non-Linear DS
A DS is non linear when all the elements are not arranged in a linear(sequential)
order.
When each element of the data structure has not only one predecessor and one
successor.
Graph and Trees
Static DS
In these type of DSs, the memory is allocated at compile time.
Therefore, maximum size is fixed.
Fast access, but Slower insertion and deletion.
eg Array
Dynamic DS
In these type of DSs, the memory is allocated at run time.
Therefore, maximum size is flexible.
Fast insertion and deletion, but Slower access.
eg. Linked List
An Algorithm is a finite set of instructions which accomplish a particular task.
It is a method or process to solve a problem.
It transforms input of a problem to output.
Algorithm = input + process + output
Good algorithm:
It must be correct.
It must be finite(time and size).
It must terminate.
It must be unambiguous(must be clear that which step comes next).
It must be space and time efficient.
A program is an instance of an algorithm, written in some specific programming
language.
Basic Algorithm Development
Clearly Identify:
What output is required?
What is the input?
What steps are required to transform input into output?
The most crucial bit
Needs problem solving skills
A problem can be solved in many d/t ways
Which solution, amongst the d/t possible solutions is optimal?
How to express an algorithm?
Algorithm is a sequence of steps to solve a problem.
1. Natural Language(NL) is an obvious choice, but not a good choice, because it's
notoriously ambiguous(unclear).
2. Programming Language(PL) is another choice, but again not a good choice, because
the algorithm should be PL independent.
We need some balance:
PL independence
clarity
Pseudo-code provides the right balance.
Pseudo-code is a short hand way of describing a computer program.
It is a mixture of NL and PL expressions, in a systematic way.
It is easier for a non-programmer to understand the general workings of the
program.
It is an artificial and informal language that helps programmers develop
algorithms.
Refine the algorithm successively to get step by step detailed algorithm that is
very close to a computer language.
The Flowchart is a graphical representation of the sequence of operations in an
information system or program.
Information system flowcharts show how data flows from source documents through the
computer to final distribution to users.
Program flowcharts show the sequence of instructions in a single program or
subroutine. D/t symbols are used to draw each type of flowchart.
Flowchart Symbols:
Terminal Symbol ->
Process Symbol ->
Input-output Symbol ->
Disk Storage I/O Symbol ->
Printer Output Symbol ->
Selection Symbol ->
Off-page connector ->
On-page connector ->
Flow lines ->
Basic Analysis on Algorithm
Considerations in Algorithm Analysis:
It is a study to provide theoretical estimation for the required resources of an
algorithm to solve a specific computational problem.
Analysis of algorithms focuses on computation of space and time complexity.
Generally,
The efficiency an algorithm is related to the input length, known as time
complexity.
The efficiency an algorithm is related to the volume of memory, known as space
complexity.
Time Complexity is the computer time an algorithm might require for its execution
which usually depends on the size of the algorithm and input.
Space complexity is defined in terms of space required to store the instructions
and data. It depends upon input size and nature of input.
Types of Time Complexities:
a. Best Case Time Complexity: Big()
It is a measure of the minimum time that the algorithm will require for an input of
size 'n'.
b. Worst Case Time Complexity: Big(O)
It is a measure of the maximum time that the algorithm will require for an input of
size 'n'.
c. Average Case Time Complexity: Big(Θ)
The time that an algorithm will require to execute a typical input data of size 'n'
is known as average case time complexity.
Basics of Asymptotic Analysis
Efficiency of data structures is always measured in terms of TIME and SPACE.
An ideal data structure could be the one that takes the least possible time for all
its operations and consumes the least memory space.
Time Complexity or Running Time
Examining the exact running time is not the best solution to calculate the time
complexity.
The running time generally depends on the size of the input.
Generally time grows with size of input, so running time of an algorithm is usually
measured as functions of input size.
Running time is measured in terms of number of steps/primitive operations
performed.(Steps or elementary operations like: +,-,*,<,=,A[i], etc.
Therefore, if the size of the input is n, then f(n) is a function of n denotes the
time complexity.
In other words, f(n) represents the number of instructions executed for the input
value n.
We can compare two data structures for a particular operation by comparing their
f(n) values.
We are interested in growth rate of f(n) with respect to n because it might be
possible that for smaller input size, one data structure may seem better than the
other but for larger input size it may not.
For this purpose, we plug in d/t values of n into the function and plot a graph to
see how it grows.
This concept is applicable in comparing the two algorithms as well.
This approximate measure of time complexity is called Asymptotic Complexity.
Big O Notation
Big O notation is used to measure the perforamnce of any algorithm by providing the
order of growth of the function.
It gives the least upper bound on a function by which we can make sure that the
function will never grow faster than this upper bound.
It gives the approximate time of complexity.
That least upper bound tells how worst an algorithm can perform.
If f(n) and g(n) are the two functions, then
f(n) = O(g(n))
If there exists constants c and no such that for all n >= no
f(n) <= c.g(n)
for all sufficiently large n.
This simply means that f(n) doesn not grow faster than g(n).
The function c.g(n) always dominates f(n) to the right of no.
Big O notation helps us in finding the growth rate of the function without plugging
in d/t values of n.
Big O notation is simplifying the task. It eliminates all the unnecessary terms
from the function which are not contributing much in the overall running time.
Ex:
f(n) = 5n^2 + 4 g(n) = n^2
Is f(n) = O(g(n))? Let's take c = 6 f(n) <= c.g(n)
f(n) <= c.g(n) 5n^2 + 4 <= 6.n^2 for all n >= 2
n^2 >= 4 Where c = 6 and no = 2
n >= 2
f(n) = O(n^2)
Growth Rate of Standard Functions
g(n)
n log2n n nlog2n n^2 n^3 2^n
1 0 1 0 1 1 2
2 1 2 2 4 8 4
4 2 4 8 16 64 16
8 3 8 24 64 512 256
16 4 16 64 256 4096 65536
32 5 32 160 1024 32768 429x10^7
Guidelines for Asymptotic Analysis
Loops Loop executes n times
for(i = 1; i<=n; i++)
//statement
Total time = O(n) Linear Growth
Nested Loops
for(i = 1; i<=n; i++)
for(j = 1; j<=n; j++) {
//statement
}
}
Total time = n x n = O(n^2)
Consecutive Statements
int x = 2;
int i; Total time = 3 units
x = x + 1;
for(i = 1; i<=n; i++) Loop executes n times
//statement
for(i = 1; i<=n; i++)
for(j = 1; j<=n; j++) { Total time = n^2 units
//statement
}
}
Total time = n^2 + n + 3 = O(n^2)
f(n) = n^2 + n + 3 Let g(n) = n^2
Is f(n) = O(g(n))? f(n) <= c.g(n)
f(n) <= c.g(n) For all n >= no
Take c = 2 Where c = 2 and no = 3
n^2 + n + 3 <= 2n^2 f(n) = O(n^2) Quadratic Growth
n^2 - n >= 3
n(n - 1) >= 3
n >= 3 or n >= 4
If then Else Statement
scanf("%d", &n);
if (n == 0) { Worst case running time = Test + if part or
//statement else part (whichever is larger)
}
else {
for(i=1; i<=n; i++)
//statement
}
Logarithmic Complexity
Logarithmic time complexity is achieved when the problem size is cut down by a
fraction.
for(i=1; i<=n; )
{
//statement
i = i*2;
}
i = n = 2^(k-1)
for(i=n; i>=1; )
{
//statement
i = i/2;
}
i = n/2^(k-1) = 1
n = 2^(k-1)
k-1 = log2n => k = log2n + 1 O(log2n)
O(1)
O(n^2 log2n)
Iter 1 i = n/2 + 0
Iter 2 i = n/2 + 1
Iter 3 i = n/2 + 2
Iter 4 i = n/2 + 3
.
.
.
Iter k i = n/2 + k-1 = n
n-n/2 = k - 1
n/2 + 1 = k
k = n/2
O(n)
Iter 1 j = 1 = 2^0
Iter 2 j = 2 = 2^1
Iter 3 j = 4 = 2^2
Iter 4 j = 8 = 2^3
.
.
.
Iter k j = 2^(k-1) = n
n = 2^(k-1)
log2n = k - 1
k = log2n + 1
O(log2n)
Iter 1 k = 1
Iter 2 k = 2
Iter 3 k = 3
Iter 4 k = 4
.
.
.
Iter p k = p = n
p = n
O(n)
O(1)
O(n)
Void pointer is a pointer which has no associated data type with it.
It can point to any data of any data type and can be typecasted to any type.
We cannot dereference a void pointer.
malloc and calloc are used to allocate memory at runtime
NULL pointer is a pointer that does not point to any memory location.
It represents an invalid memory location.
When a NULL value is assigned to a pointer, then the pointer is considered as NULL
pointer.
Dangling pointer is a pointer which points to some non-existing memory location.
Wild pointers(uninitialized pointers) usually point to some arbitrary memory
location and may cause a program to crash or misbehave.
Static memory is memory allocated during compile time.
It is fixed and cannot be increased or decreased during run time.
Dynamic memory is memory allocated at the time of execution.
Heap is the segment of memory where dynamic memory allocation takes place.
Unlike stack where memory is allocated or deallocated in a defined order, heap is
an area of memory where memory is allocated or deallocated without any order or
randomly.
Allocated memory can only be accessed through pointers.
<stdlib.h>
(void* )malloc(size_t size)
int *ptr = (int* )malloc(4)
int *ptr = (int* )malloc(n*sizeof(int));
void *calloc(size_t n, size_t size);
void *realloc(void *ptr, size_t newSize)
void free(ptr)
Structures and Functions
A structure is a user defined type that can be used to group elements of d/t types
into a single type.
Passing structure member as argument
Passing structure variable as argument
Passing pointers to structures as argument
Using (->) operator access the members
Returning a structure variable from the function
Returning a pointer to a structure from the function
struct point {
int x;
int y;
};
struct point* fun(int a, int b) {
struct point *ptr = (struct point *)malloc(sizeof(struct point));
ptr->x = a;
ptr->y = b + 5;
return ptr;
}
void print(struct point *ptr) {
printf("%d %d\n", ptr->x, ptr->y);
}
int main() {
struct point *ptr1, *ptr2;
ptr1 = fun(2, 3);
ptr2 = fun(6, 9);
print(ptr1); print(ptr2);
free(ptr1);
free(ptr2);
return 0;
}
Passing array of structures as argument
struct point arr[2] = {{1,'A'}, {2, 'B'}};
Self referential structures(Linked List)
struct self {
int p;
struct self *ptr;
};
struct code {
int i;
char c;
struct code *ptr;
};
int main() {
struct code var1;
struct code var2;
var1.i = 65;
var1.c = 'A';
var1.ptr = NULL;
var2.i = 66;
var2.c = 'B';
var2.ptr = NULL;
var1.ptr = &var2;
printf("%d %c", var1.ptr->i, var1.ptr->c
}