UNIT-1
INTRODUCTION TO DATA
STRUCTURES
Abhishek Chaudhary
INTRODUCTION
Data Structures are the programmatic way of storing
data so that data can be used efficiently.
Almost every enterprise application uses various types
of data structures in one or the other way.
In simple language, data structures are structures
programmed to store ordered data, so that various
operations can be performed on them easily.
A data structure represents how data is to be
organized in memory.
It should be designed and implemented in such a way
that it reduces complexity and increases
efficiency.
CHARACTERISTICS OF A DATA
STRUCTURE
Correctness − Data structure
implementation should implement its
interface correctly.
Time Complexity − Running time or the
execution time of operations of data
structure must be as small as possible.
Space Complexity − Memory usage of a
data structure operation should be as little as
possible.
NEED FOR DATA STRUCTURE
As applications are getting complex and data rich, there are
three common problems that applications face now-a-days.
Data Search − Consider an inventory of 1 million (10⁶) items in a
store. If the application has to search for an item, it must scan through
all 1 million (10⁶) items every time, slowing down the search. As data
grows, searching becomes slower.
Processor speed − Although processor speeds are very high, they
become a limiting factor as data grows to billions of records.
Multiple requests − As thousands of users can search data
simultaneously on a web server, even a fast server can fail while
searching the data.
To solve the above-mentioned problems, data structures come
to rescue.
Data can be organized in a data structure in such a way that
the required data can be searched almost instantly.
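The claim above can be made concrete: if the inventory is kept sorted, a binary search halves the remaining range on every comparison, so even a million items need only about 20 comparisons instead of up to a million. A minimal Python sketch, where the function names and the inventory data are illustrative, not from the source:

```python
from bisect import bisect_left

def linear_search(items, target):
    """Scan every element: up to n comparisons."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """Halve the search range each step: about log2(n) comparisons."""
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

# 500,000 sorted item IDs standing in for a store inventory
inventory = list(range(0, 1_000_000, 2))
print(binary_search(inventory, 999_998))  # → 499999, found in ~19 halvings
```

`bisect_left` from the standard library locates the insertion point in a sorted list, which is exactly the binary-search step; organizing the data (keeping it sorted) is what makes the fast search possible.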
WHAT IS AN ALGORITHM?
Algorithm is a step-by-step procedure, which
defines a set of instructions to be executed in
a certain order to get the desired output.
Algorithms are generally created
independent of underlying languages, i.e. an
algorithm can be implemented in more than
one programming language.
WHAT IS THE DIFFERENCE BETWEEN ALGORITHM & PROGRAM?
Algorithm | Program
Step-by-step procedure for a given task (design) | Implementation of the given task (implement)
The person writing it should have domain knowledge | The person writing it should be a programmer
Can be written in any language | Must be written in a programming language
Hardware- and software-independent | Hardware- and software-dependent
Algorithms are analyzed | Programs are tested
CHARACTERISTICS OF AN ALGORITHM
Not all procedures can be called an algorithm. An
algorithm should have the following characteristics −
Unambiguous − Algorithm should be clear and
unambiguous. Each of its steps (or phases), and their
inputs/outputs should be clear and must lead to only one
meaning.
Input − An algorithm should have 0 or more well-defined
inputs.
Output − An algorithm should have 1 or more well-defined
outputs, and should match the desired output.
Finiteness − Algorithms must terminate after a finite
number of steps.
Feasibility − Should be feasible with the available resources.
Independent − An algorithm should have step-by-step
directions, which should be independent of any programming
code.
ALGORITHM ANALYSIS
Efficiency of an algorithm can be analyzed at two
different stages, before implementation and
after implementation. They are the following −
A Priori Analysis − This is a theoretical analysis of
an algorithm. Efficiency of an algorithm is measured
by assuming that all other factors, for example,
processor speed, are constant and have no effect on
the implementation.
A Posteriori Analysis − This is an empirical analysis
of an algorithm. The selected algorithm is
implemented in a programming language and
then executed on the target machine. In this
analysis, actual statistics, such as running time and
space required, are collected.
CRITERIA FOR ANALYZING AN
ALGORITHM
Following are the criteria used to analyze an
algorithm −
Time (Time function)
Space
Network Consumption (for applications based on the Web / Internet / Cloud)
Power Consumption
CPU Registers
ALGORITHM COMPLEXITY
Suppose X is an algorithm and n is the size of input
data, the time and space used by the algorithm X are
the two main factors, which decide the efficiency of X.
Time Factor − Time is measured by counting the number
of key operations such as comparisons in the sorting
algorithm.
Space Factor − Space is measured by counting the
maximum memory space required by the algorithm.
The time complexity of an algorithm, f(n), gives the
running time, and
the space complexity, S(n), gives the
storage space required by the algorithm
(both in terms of n, the size of the input data).
TIME ANALYSIS
Time function is written as f(n)
Every (non-nested) statement in an algorithm
takes 1 unit of time.
Eg-
p=5*a+6*b ------- 1 (constant)
Detail analysis
x= 5*a ------ 1
y= 6*b ------ 1
z= x+y ------ 1
p=z ------ 1
f(n)= 4 (constant)
Constant is written as O(1) [order of 1]
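The four-step breakdown above can be mirrored directly in code; `compute` is a hypothetical name chosen here for illustration:

```python
def compute(a, b):
    """Evaluates p = 5*a + 6*b as four unit-cost steps."""
    x = 5 * a   # 1 unit
    y = 6 * b   # 1 unit
    z = x + y   # 1 unit
    p = z       # 1 unit
    return p    # f(n) = 4, a constant

# The running time does not depend on the values of a and b,
# so the time function is constant: O(1).
```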
SPACE ANALYSIS
Space function is written as S(n)
Space is dependent on number of variables
used.
Eg-
p=5*a+6*b
p ----- 1
a ----- 1
b ----- 1
Total = 3 variables
S(n) = 3 ----- constant
Constant is written as O(1) [order of 1]
FREQUENCY COUNT METHOD FOR
TIME COMPLEXITY
Eg- Algorithm to find sum of all elements in an
array
Algorithm Sum(A, n)
{
    S = 0;                    -------- 1
    for (i = 0; i < n; i++)   -------- n + 1
    {
        S = S + A[i];         -------- n
    }
    return S;                 -------- 1
}
f(n) = 1 + (n+1) + n + 1
f(n) = 2n + 3 (degree of polynomial = 1)
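The frequency count can be verified by instrumenting the algorithm. This Python sketch (the counter bookkeeping is added for illustration) counts exactly the statements the analysis above counts:

```python
def sum_with_count(A):
    """Sum an array while counting executed statements,
    mirroring the frequency-count analysis f(n) = 2n + 3."""
    n = len(A)
    count = 0
    s = 0;  count += 1            # S = 0            -> 1
    i = 0
    while True:
        count += 1                # loop test i < n  -> n + 1
        if not (i < n):
            break
        s += A[i];  count += 1    # S = S + A[i]     -> n
        i += 1
    count += 1                    # return S         -> 1
    return s, count

total, steps = sum_with_count([4, 1, 3])
# For n = 3: steps == 2*3 + 3 == 9
```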
TYPES OF TIME FUNCTIONS
O(1) − Constant
O(n) − Linear
O(n²) − Quadratic
O(n³) − Cubic
O(log n) − Logarithmic
O(2ⁿ) − Exponential
O(3ⁿ) − Exponential
SEQUENCE OF FUNCTIONS
1 < log n < n < n log n < n² < n³ < …. < 2ⁿ < 3ⁿ <
…. < nⁿ
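The ordering can be seen numerically by tabulating a few of these functions as n doubles; a short Python sketch:

```python
import math

# Show how each time function grows as n doubles; the ordering
# log n < n < n log n < n^2 < 2^n emerges even for small n.
for n in (8, 16, 32):
    print(f"n={n:>2}  log n={math.log2(n):4.0f}  "
          f"n log n={n * math.log2(n):6.0f}  "
          f"n^2={n * n:5}  2^n={2 ** n:12}")
```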
ASYMPTOTIC ANALYSIS
In Asymptotic Analysis, we evaluate the
performance of an algorithm in terms of input size
(we don’t measure the actual running time).
We calculate how the time (or space) taken
by an algorithm increases with the input size.
Asymptotic analysis is not perfect, but it is the
best method available for analyzing algorithms.
Let's take an example of linear search and analyze
it using asymptotic analysis.
We can have three cases to analyze an algorithm:
1) Worst Case
2) Average Case
3) Best Case
WORST CASE ANALYSIS (USUALLY
DONE)
In the worst case analysis, we calculate
upper bound on running time of an
algorithm.
We must know the case that causes
maximum number of operations to be
executed.
For Linear Search, the worst case happens
when the element to be searched (say x ) is
not present in the array.
When x is not present, the search() function
compares it with all the elements of arr[] one
by one.
Therefore, the worst case time complexity of
linear search would be Θ(n).
AVERAGE CASE ANALYSIS
(SOMETIMES DONE)
In average case analysis, we take all possible
inputs and calculate computing time for all of
the inputs.
Sum all the calculated values and divide the
sum by total number of inputs.
We must know (or predict) distribution of
cases.
BEST CASE ANALYSIS
In the best case analysis, we calculate lower
bound on running time of an algorithm.
We must know the case that causes minimum
number of operations to be executed. In the
linear search problem, the best case occurs when
x is present at the first location.
The number of operations in the best case is
constant (not dependent on n). So time
complexity in the best case would be Θ(1).
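The best- and worst-case comparison counts for linear search can be checked directly. A Python sketch (the function name and the counter are illustrative additions):

```python
def linear_search_count(arr, x):
    """Left-to-right scan; returns (index, comparisons made)."""
    comparisons = 0
    for i, value in enumerate(arr):
        comparisons += 1
        if value == x:
            return i, comparisons
    return -1, comparisons

arr = [7, 2, 9, 4]
# Best case: x at the first position -> 1 comparison, Θ(1).
# Worst case: x absent -> n comparisons, Θ(n).
```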
ASYMPTOTIC NOTATIONS
The main idea of asymptotic analysis is to have a measure
of efficiency of algorithms that doesn’t depend on machine
specific constants, and doesn’t require algorithms to be
implemented and time taken by programs to be compared.
The following 3 asymptotic notations are mostly used to
represent time complexity of algorithms.
1. Big OH (Upper bound) – denoted as “O”
2. Big Omega (Lower Bound) – denoted as “Ω”
3. Theta (Tight Bound) – denoted as “θ”
Any function can be represented using any of the above
three notations
Most useful − Theta notation (since it gives a tight bound)
If we are unable to specify the exact order (complexity) of a
function, we specify it using an upper or lower bound instead
CONTD..
1 < log n < n < n log n < n² < n³ < …. < 2ⁿ < 3ⁿ <
…. < nⁿ
Time complexity of any function may be-
One among the above series
May be multiple of above series
If not even multiple then we may not be able to find
Theta (θ) Notation. In that case we use Big Oh (O) or
Big Omega (Ω) notation.
BIG OH NOTATION (UPPER BOUND)
Definition-
The function f(n) = O(g(n)) iff there exist
positive constants ‘c’ and ‘n0’ such that
f(n) <= c*g(n) ∀ n >= n0
Example-
f(n) = 2n+3
2n+3<= 10n …. Assumption
(Single term, must have some
coefficient)
f(n) = O(n)
BIG OMEGA NOTATION (LOWER
BOUND)
Definition-
The function f(n) = Ω(g(n)) iff there exist
positive constants ‘c’ and ‘n0’ such that
f(n) >= c*g(n) ∀ n >= n0
Example-
f(n) = 2n+3
2n+3>= 1n …. Assumption
(Single term, must have some
coefficient)
f(n) = Ω(n)
THETA NOTATION (TIGHT BOUND)
Definition-
The function f(n) = θ(g(n)) iff there exist
positive constants ‘c1’, ‘c2’, and ‘n0’ such
that
c1*g(n) <= f(n) <= c2*g(n) ∀ n >= n0
Example-
f(n) = 2n+3
1n <= 2n+3 <= 5n …. Assumption
(Single term, must have some
coefficient)
f(n) = θ(n)
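The constants assumed in the three examples above can be verified mechanically for f(n) = 2n + 3; a short Python check:

```python
# Verify the bounding constants used in the examples:
#   Big O:     2n + 3 <= 10n        for all n >= 1  (c = 10)
#   Big Omega: 2n + 3 >= 1*n        for all n >= 1  (c = 1)
#   Theta:     1*n <= 2n + 3 <= 5n  for all n >= 1  (c1 = 1, c2 = 5)
def f(n):
    return 2 * n + 3

for n in range(1, 1000):
    assert f(n) <= 10 * n           # O(n) witness holds
    assert f(n) >= 1 * n            # Ω(n) witness holds
    assert 1 * n <= f(n) <= 5 * n   # Θ(n) witnesses hold
```

Checking a finite range does not prove the inequalities for all n, but each one reduces to a simple algebraic fact (e.g. 2n + 3 <= 5n iff 3 <= 3n, true for n >= 1).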