1.1. Introduction to Data Structures
Given a problem, the first step in solving it is to obtain one's own abstract view, or model, of the problem. This process of modeling is called abstraction.
The model defines an abstract view of the problem. This implies that the model focuses only on problem-related aspects and that the programmer tries to define the properties of the problem.
These properties include:
the data that are affected, and
the operations that are involved in the problem.
With abstraction you create a well-defined
entity that can be properly handled. These
entities define the data structure of the
program.
An entity with the properties just described is
called an abstract data type (ADT).
Data Structures and Algorithms Analysis
1. Introduction to Data Structures and Algorithms Analysis
A program is written in order to solve a problem. A
solution to a problem actually consists of two things:
A way to organize the data
Sequence of steps to solve the problem
The way data are organized in a computer's memory is called a data structure, and the sequence of computational steps used to solve a problem is called an algorithm. Therefore, a program is nothing but data structures plus algorithms.
1.1.1. Abstract Data Types
An ADT consists of an abstract data
structure and operations. Put in other
terms, an ADT is an abstraction of a
data structure.
The ADT specifies:
What can be stored in the Abstract
Data Type
What operations can be done on/by
the Abstract Data Type.
A data structure is a language
construct that the programmer has
defined in order to implement an
abstract data type.
There are many formalized and standard abstract data types, such as stacks, queues, trees, etc.
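As an illustration, a stack ADT can be written down as a C++ class: the public member functions are the ADT's operations, while the chosen data structure (here a std::vector) stays hidden from users. This is a sketch of our own, not from the text; the class and method names are illustrative.

```cpp
#include <stdexcept>
#include <vector>

// A stack ADT sketch: the public interface states WHAT can be stored
// (ints) and WHAT operations exist; the std::vector underneath is one
// possible data structure chosen to implement the ADT.
class IntStack {
public:
    void push(int value) { items.push_back(value); }  // add on top
    int pop() {                                       // remove from top
        if (items.empty())
            throw std::out_of_range("pop on empty stack");
        int top = items.back();
        items.pop_back();
        return top;
    }
    bool empty() const { return items.empty(); }
private:
    std::vector<int> items;  // implementation detail, hidden from users
};
```

The same interface could equally be implemented with a linked list; that freedom of implementation is exactly what the ADT/data-structure distinction captures.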
1.1.2. Abstraction
Abstraction is a process of
classifying characteristics as relevant
and irrelevant for the particular
purpose at hand and ignoring the
irrelevant ones.
1.2. Algorithms
An algorithm is a well-defined computational
procedure that takes some value or a set of
values as input and produces some value or a set
of values as output. Data structures model the
static part of the world. They are unchanging
while the world is changing. In order to model
the dynamic part of the world we need to work
with algorithms. Algorithms are the dynamic
part of a program’s world model.
An algorithm transforms data structures
from one state to another state in two ways:
An algorithm may change the value held by
a data structure
An algorithm may change the data structure
itself.
1.2.1. Properties of an algorithm
Finiteness: An algorithm must terminate after a finite number of steps.
Definiteness: Each step must be clearly
defined, having one and only one
interpretation. At each point in
computation, one should be able to tell
exactly what happens next.
Sequence: Each step must have a unique
defined preceding and succeeding step. The
first step (start step) and last step (halt
step) must be clearly noted.
Feasibility: It must be possible
to perform each instruction.
Correctness: It must compute the correct answer for all possible legal inputs.
Language Independence: It must
not depend on any one programming
language.
Completeness: It must solve the
problem completely.
Effectiveness: It must be possible to
perform each step exactly and in a
finite amount of time.
Efficiency: It must solve with the least
amount of computational resources such
as time and space.
Generality: Algorithm should be valid
on all possible inputs.
Input/Output: There must be a
specified number of input values, and
one or more result values.
1.2.2. Algorithm Analysis Concepts
Algorithm analysis refers to the
process of determining how much
computing time and storage that
algorithms will require. In other
words, it’s a process of predicting the
resource requirement of algorithms in
a given environment.
In order to solve a problem, there are
many possible algorithms. One has to
be able to choose the best algorithm
for the problem at hand using some
scientific method. To classify some
data structures and algorithms as
good, we need precise ways of
analyzing them in terms of resource
requirement.
The main resources are:
Running Time
Memory Usage
Communication Bandwidth
Running time is usually treated as the
most important since computational
time is the most precious resource in
most problem domains.
There are two approaches to measure
the efficiency of algorithms:
Empirical: Programming competing
algorithms and trying them on
different instances.
Theoretical: Determining mathematically the quantity of resources (execution time, memory space, etc.) needed by each algorithm.
However, it is difficult to use actual
clock-time as a consistent measure of
an algorithm’s efficiency, because
clock-time can vary based on many
things. For example,
Specific processor speed
Current processor load
Specific data for a particular run of
the program
◦Input Size
◦Input Properties
Operating Environment
1.2.3. Complexity Analysis
Complexity Analysis is the
systematic study of the cost of
computation, measured either in time
units or in operations performed, or
in the amount of storage space
required.
There are two things to consider:
Time Complexity: Determine the
approximate number of operations
required to solve a problem of size n.
Space Complexity: Determine the
approximate memory required to
solve a problem of size n.
Complexity analysis involves two
distinct phases:
Algorithm Analysis: Analysis of the
algorithm or data structure to
produce a function T (n) that
describes the algorithm in terms of
the operations performed in order to
measure the complexity of the
algorithm.
Order of Magnitude Analysis:
Analysis of the function T (n) to
determine the general complexity
category to which it belongs.
There is no generally accepted set
of rules for algorithm analysis.
However, an exact count of
operations is commonly used.
1.2.3.1. Analysis Rules
1. We assume an arbitrary time unit.
2. Execution of one of the following operations takes 1 time unit:
Assignment Operation
Single Input/Output Operation
Single Boolean Operations
Single Arithmetic Operations
Function Return
3. Running time of a selection statement (if, switch) is the time for the condition evaluation plus the maximum of the running times for the individual clauses in the selection.
4. Loops: Running time for a loop is equal to the running time for the statements inside the loop multiplied by the number of iterations.
The total running time of a statement
inside a group of nested loops is the
running time of the statements multiplied
by the product of the sizes of all the
loops.
For nested loops, analyze inside out.
Always assume that the loop executes the
maximum number of iterations possible.
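The nested-loop rule can be verified by direct counting: for two nested loops that each run n times, the innermost statement executes n * n times. A small sketch of ours:

```cpp
// Counts how many times the innermost statement executes for two
// nested loops of n iterations each; by the loop rule this is n * n.
long long innerExecutions(int n) {
    long long count = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            count++;  // the "statement inside the nested loops"
    return count;
}
```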
5. Running time of a function call is 1 for setup plus the time for any parameter calculations plus the time required for the execution of the function body.
Examples:
1.
int count() {
    int k = 0, n, i;
    cout << "Enter an integer";
    cin >> n;
    for (i = 0; i < n; i++)
        k = k + 1;
    return 0;
}
Time Units to Compute
1 for the assignment statement: int k=0
1 for the output statement.
1 for the input statement.
In the for loop:
1 assignment, n+1 tests, and n increments.
n loops of 2 units for an assignment, and an
addition.
1 for the return statement.
-------------------------------------------------------------------
T(n) = 1+1+1+(1+n+1+n)+2n+1 = 4n+6 = O(n)
Example 2
int total(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum = sum + 1;
    return sum;
}
Time Units to Compute
1 for the assignment statement: int sum=0
In the for loop:
1 assignment, n+1 tests, and n increments.
n loops of 2 units for an assignment, and an addition.
1 for the return statement.
-------------------------------------------------------------------
T (n)= 1+ (1+n+1+n)+2n+1 = 4n+4 = O(n)
Example 3
void func()
{
    int n;
    int x = 0;
    int i = 0;
    int j = 1;
    cout << "Enter an Integer value";
    cin >> n;
    while (i < n) {
        x++;
        i++;
    }
    while (j < n) {
        j++;
    }
}
Time Units to Compute
1 for the first assignment statement: x=0;
1 for the second assignment statement: i=0;
1 for the third assignment statement: j=1;
1 for the output statement.
1 for the input statement.
In the first while loop:
n+1 tests
n loops of 2 units for the two increment (addition)
operations
In the second while loop:
n tests
n-1 increments
-------------------------------------------------------------------
T (n)= 1+1+1+1+1+n+1+2n+n+n-1 = 5n+5 = O(n)
Example 4
int sum (int n)
{
int partial_sum = 0;
for (int i = 1; i <= n; i++)
partial_sum = partial_sum +(i * i * i);
return partial_sum;
}
Time Units to Compute
1 for the assignment.
1 assignment, n+1 tests, and n increments.
n loops of 4 units for an assignment, an addition, and
two multiplications.
1 for the return statement.
---------------------------------------------------------
T (n)= 1+(1+n+1+n)+4n+1 = 6n+4 = O(n)
Exercise
Suppose we have hardware capable of executing 10^6 instructions per second. How long would it take to execute an algorithm whose complexity function is T(n) = 2n^2 on an input of size n = 10^8?
The total number of operations to be performed would be T(10^8).
Solution
T(10^8) = 2*(10^8)^2 = 2*10^16
The required number of seconds would be given by T(10^8)/10^6, so:
Running time = 2*10^16/10^6 = 2*10^10 seconds.
1.3. Measures of Times
In order to determine the running time of an algorithm, we can define three functions T_best(n), T_avg(n), and T_worst(n) as the best-, average-, and worst-case running times of the algorithm, respectively.
Average Case (T_avg): The amount of time the algorithm takes on an "average" set of inputs.
Worst Case (T_worst): The amount of time the algorithm takes on the worst possible set of inputs.
Best Case (T_best): The amount of time the algorithm takes on the best possible set of inputs.
We are interested in the worst-case time, since it
provides a bound for all input – this is called the
“Big-Oh” estimate.
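A classic illustration of these cases (our example, not from the text) is linear search: in the best case the key is at index 0 (one comparison), while in the worst case the key is absent and all n elements are examined.

```cpp
#include <cstddef>
#include <vector>

// Linear search: returns the index of key in v, or -1 if absent.
// Best case  (T_best):  key at index 0       -> 1 comparison.
// Worst case (T_worst): key absent or at end -> n comparisons.
int linearSearch(const std::vector<int>& v, int key) {
    for (std::size_t i = 0; i < v.size(); i++)
        if (v[i] == key) return static_cast<int>(i);
    return -1;
}
```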
1.4. Asymptotic Analysis
Asymptotic analysis is concerned
with how the running time of an
algorithm increases with the size of
the input in the limit, as the size of
the input increases without bound.
There are five notations used to
describe a running time function
These are:
Big-Oh Notation (O)
Big-Omega Notation (Ω)
Theta Notation (Θ)
Little-o Notation (o)
Little-Omega Notation (ω)
1.4.1. The Big-Oh Notation (O)
Big-Oh notation is a way of comparing algorithms and is used for computing the complexity of algorithms, i.e., the amount of time that it takes for a computer program to run. It is only concerned with what happens for very large values of n.
Therefore only the largest term in the
expression (function) is needed.
For example, if the number of operations in an algorithm is n^2 - n, then n is insignificant compared to n^2 for large values of n, so the n term is ignored. Big-Oh is mainly concerned with large values of n.
Formal Definition:
f(n) = O(g(n)) if there exist c, k ∊ ℛ+ such that f(n) <= c·g(n) for all n >= k.
Examples:
The following facts can be used for Big-Oh problems:
1 <= n for all n >= 1
n <= n^2 for all n >= 1
2^n <= n! for all n >= 4
log2(n) <= n for all n >= 2
n <= n*log2(n) for all n >= 2
Example 2
f(n) = 10n+5 and g(n) = n. Show that f(n) is O(g(n)).
To show that f(n) is O(g(n)), we must show that there exist constants c and k such that f(n) <= c·g(n) for all n >= k, i.e., 10n+5 <= c·n for all n >= k.
Try c = 15. Then we need to show that 10n+5 <= 15n.
Solving for n we get: 5 <= 5n, or 1 <= n.
So f(n) = 10n+5 <= 15·g(n) for all n >= 1 (c = 15, k = 1).
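The chosen constants can be spot-checked numerically (a sanity check of ours, not a proof):

```cpp
// Spot-check f(n) = 10n + 5 <= c*g(n) = c*n for n = k..upTo.
// A loop is not a proof, but it catches wrong constants quickly:
// bigOhHolds(15, 1, ...) should hold, bigOhHolds(10, 1, ...) should not.
bool bigOhHolds(long long c, long long k, long long upTo) {
    for (long long n = k; n <= upTo; n++)
        if (10 * n + 5 > c * n) return false;
    return true;
}
```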
1.4.2. Big-Omega Notation (Ω)
Ω notation provides an asymptotic lower bound.
Formal Definition: A function f(n) is Ω(g(n)) if there exist constants c and k ∊ ℛ+ such that f(n) >= c·g(n) for all n >= k.
f(n) = Ω(g(n)) means that f(n) is greater than or equal to some constant multiple of g(n) for all values of n greater than or equal to some k.
Example:
If f(n) = n^2, then f(n) = Ω(n).
In simple terms, f(n) = Ω(g(n)) means that the growth rate of f(n) is greater than or equal to that of g(n).
Example:
f(n) = 5n+10, g(n) = 2n. Show that f(n) = Ω(g(n)).
We need f(n) >= c·g(n), i.e., 5n+10 >= c·(2n). Try c = 1:
5n+10 >= 2n, which holds for all n >= 1 (c = 1, k = 1).
1.4.3. Theta Notation (Θ)
f(n) = Θ(g(n)) iff c1·g(n) <= f(n) <= c2·g(n), i.e., two conditions hold:
Condition 1: f(n) >= c1·g(n) => f(n) = Ω(g(n))
Condition 2: f(n) <= c2·g(n) => f(n) = O(g(n))
A function f(n) belongs to the set Θ(g(n)) if there exist positive constants c1 and c2 such that it can be sandwiched between c1·g(n) and c2·g(n) for sufficiently large values of n.
Formal Definition:
f(n) is Θ(g(n)) if it is both O(g(n)) and Ω(g(n)). In other words, there exist constants c1, c2, and k > 0 such that c1·g(n) <= f(n) <= c2·g(n) for all n >= k.
If f(n) = Θ(g(n)), then g(n) is an asymptotically tight bound for f(n). In simple terms, f(n) = Θ(g(n)) means that f(n) and g(n) have the same rate of growth.
Example
f(n) = n^2+n, g(n) = 5n^2. Show that f(n) = Θ(g(n)).
Upper bound: n^2+n <= c2·5n^2 holds with c2 = 1, since 5n^2 >= n^2+n, i.e., 4n^2 >= n, which is true for n >= 1.
Lower bound: n^2+n >= c1·5n^2 holds with c1 = 1/5, since n^2+n >= n^2.
Example
If f(n) = 2n+1, then f(n) = Θ(n).
If f(n) = 2n^2, then:
f(n) = O(n^4)
f(n) = O(n^3)
f(n) = O(n^2)
1.4.4. Little-o Notation (o)
Big-Oh notation may or may not be asymptotically tight; for example:
2n^2 = O(n^2) is tight, but 2n^2 = O(n^3) is not.
f(n) = o(g(n)) means that for all c > 0 there exists some k > 0 such that f(n) < c·g(n) for all n >= k. Informally, f(n) = o(g(n)) means f(n) becomes insignificant relative to g(n) as n approaches infinity.
Example:
f(n) = 3n+4 is o(n^2).
In simple terms, f(n) = o(g(n)) means f(n) has a lower growth rate than g(n).
If g(n) = 2n^2, then g(n) = o(n^3) and g(n) = O(n^2), but g(n) is not o(n^2).
1.4.5. Little-Omega Notation (ω)
Little-omega (ω) notation is to big-omega (Ω) notation as little-o notation is to Big-Oh notation. We use ω notation to denote a lower bound that is not asymptotically tight.
Formal Definition
f(n) = ω(g(n)) if for all c > 0 there exists a constant k > 0 such that 0 <= c·g(n) < f(n) for all n >= k.
Example: 2n^2 = ω(n), but 2n^2 is not ω(n^2).
1.5. Review of Elementary Data Structures
Heaps
A Heap is a special Tree-based Data
Structure that has the following properties.
It is a complete binary tree.
It either follows max heap or min heap
property.
Max-Heap: The value of the root node must be the greatest among all its descendant nodes, and the same property must hold recursively for the left and right sub-trees.
Min-Heap: The value of the root node must be the smallest among all its descendant nodes, and the same property must hold recursively for the left and right sub-trees.
Operations on Heaps
Insertion:
Insert the new element at the end of the heap (maintaining completeness), then sift it up until the heap property is restored.
Deletion (typically root deletion):
Replace the root with the last element in the heap, then sift it down until the heap property is restored.
Peek:
Access the root element (max in a max-heap or min in a min-heap) without removing it.
Heapify:
Convert an arbitrary array into a heap.
Heap Complexities
Insertion: O(log n)
Deletion: O(log n)
Peek: O(1)
Heapify (building a heap from an arbitrary array): O(n)