(ITec2083)
Improve ability to solve problems abstractly.
Improve ability to analyze algorithm.
2010E.C
To exactly know, what is data structure? We
must know:
What is a computer program?
Some mysterious
processing
Input Output
-Computer program development process required
us to represent data in effective way.
4
Data structure is structural representation of the logical
relationship existing between individual elements of data.
In other words, a data structure is a way of organizing all
data items that considers not only the elements stored but
also their relationship to each other.
It is a representation of data and the operations allowed on
that data.
It is a way to store and organize data in order to facilitate the
access and modifications.
Data Structure is the method of representing of logical
relationships between individual data elements related to
the solution of a given problem.
Data structure affects the design of both structural &
functional aspects of a program.
Program = algorithm + Data Structure
Data Structure = Operation + Organized Data
Storage structure is the representation of particular DS in
the memory of the computer.
You know that a algorithm is a step by step procedure to
solve a particular function.
That means, algorithm is a set of instruction written to
carry out certain tasks & the data structure is the way of
organizing the data with their logical relationship
retained.
To develop a program of an algorithm, we should select
an appropriate data structure for that algorithm.
To convert a formal algorithm to a program we use step
wise Refinement Techniques (mathematical model,
pseudo code, DS, C/C++ program etc.)
Data structure are normally divided into two broad
categories:
Primitive Data Structure
Non-Primitive Data Structure
Data structure
Primitive DS Non-Primitive DS
Integer Float Character Pointer
Non-Primitive DS
Linear List Non-Linear List
Array Queue Graph Trees
Link List Stack
array
Linked list
queue
tree stack
PDSs are basic structures and directly operated upon
by the machine instructions.
In general, there are different representation on
different computers.
Integer, Floating-point number, Character constants,
string constants, pointers etc, fall in this category.
NPDSs are more sophisticated data structures.
These are derived from the primitive data structures.
The non-primitive data structures emphasize on
structuring of a group of homogeneous (same type) or
heterogeneous (different type) data items.
The design of an efficient data structure must take
operations to be performed on the data structure.
The most commonly used operation on data
structure are broadly categorized into following
types:
Create
Selection
Updating
Searching
Sorting
Merging
Destroy or Delete
Linear: In Linear data structure, values are arrange in
linear fashion.
Array: Fixed-size
Linked-list: Variable-size
Stack: Add to top and remove from top
Queue: Add to back and remove from front
Priority queue: Add anywhere, remove the highest
priority
Non-Linear: The data values in this structure are not
arranged in order.
Hash tables: Unordered lists which use a ‘hash
function’ to insert and search
Tree: Data is organized in branches.
Graph: A more general branching structure, with less
strict connection conditions than for a tree
A primitive data structure is generally a basic
structure that is usually built into the language, such
as an integer, a float.
A non-primitive data structure is built out of
primitive data structures linked together in
meaningful ways, such as a or a linked-list, binary
search tree, AVL Tree, graph etc.
The choice of particular data model depends on two
consideration:
It must be rich enough in structure to represent the
relationship between data elements
The structure should be simple enough that one can
effectively process the data when necessary
Definition:- ADT
In Object Oriented Programming data and the operations that manipulate
that data are grouped together in classes
Abstract Data Types (ADTs) stores data and allow various operations on
the data to access and change it.
A mathematical model, together with various operations defined on the
model
An ADT is a collection of data and associated operations for manipulating
that data
Data Structures
Physical implementation of an ADT
data structures used in implementations are provided in a language
(primitive or built-in) or are built from the language constructs (user-
defined)
Each operation associated with the ADT is implemented by one or more
subroutines in the implementation
ADTs support abstraction, encapsulation, and
information hiding.
Abstraction is the structuring of a problem into well-
defined entities by defining their data and operations.
The principle of hiding the used data structure and to
only provide a well-defined interface is known as
encapsulation.
Every Collection ADT should provide a way to:
add an item
remove an item
find an item
retrieve an item or
access an item
when implementing an ADT the operations and
behaviors are already specified
Implementer’s first choice is what to use as the internal
storage container for the concrete data type
the internal storage container is used to hold the items in
the collection
often an implementation of an ADT
22
No single data structure works well for all
purposes, and so it is important to know the
strengths and limitations of several of them
Collection with access only to the last element inserted
Last in first out
insert/push
remove/pop Data4 Top
top Data3
make empty
Data2
Data1
Collection with access only to the item that has been
present the longest
Last in last out or first in first out
enqueue, dequeue, front
priority queues and dequeue
Front Back
Data1 Data2 Data3 Data4
Items have a position in this Collection
Random access or not?
Array Lists
internal storage container is native array
26
A Flexible structure, because can grow and shrink on
demand.
Elements can be:
Inserted
Accessed
Deleted
At any position
last
first
A Tree is a collection of elements called nodes.
One of the node is distinguished as a root, along with
a relation (“parenthood”) that places a hierarchical
structure on the nodes. Root
Prepared
CENG 213by Alemu
Data Ginjo
Structures 29
ALGORITHM AND
ALGORITHM ANALYSIS
CENG 213 Data Structures 30
Algorithm
An algorithm is a clearly specified set of instructions to
be followed to solve a problem.
There can be more than one solution (more than one
algorithm) to solve a given problem.
It can be implemented using different programming
languages on different platforms.
An algorithm must be correct. It should correctly solve
the problem.
e.g. For sorting, this means even if (1) the input is
already sorted, or (2) it contains repeated elements.
Once we have a correct algorithm for a problem, we
have to determine the efficiency of that algorithm.
Prepared by Alemu Ginjo
CENG 213 Data Structures 31
Problems
Problem: is a task to be performed.
• Best thought of as inputs and matching outputs.
• Problem definition should include constraints on the
resources that may be consumed by any acceptable
solution.
Prepared by Alemu Ginjo
CENG 213 Data Structures
Algorithm Properties
An algorithm possesses the following properties:
It must be correct.
It must be composed of a series of concrete steps.
There can be no ambiguity as to which step will be
performed next.
It must be composed of a finite number of steps.
It must terminate.
A computer program is an instance, or concrete
representation, for an algorithm in some programming
language.
32 Prepared by Alemu Ginjo
Prepared
CENG 213by Alemu
Data Ginjo
Structures 33
Algorithmic Performance
There are two aspects of algorithmic performance:
Time
• Instructions take time.
• How fast does the algorithm perform?
• What affects its runtime?
=> computer, compiler, algorithm used & input to the algorithm
Space
• Data structures take space
• What kind of data structures can be used?
• How does choice of data structure affect the runtime?
We will focus on time:
How to estimate the time required for an algorithm
How to reduce the time required
Prepared
CENG 213by Alemu
Data Ginjo
Structures 34
Analysis of Algorithms
Algorithm analysis refers to the process of determining
the amount of computing time and storage space required
by different algorithms.
In other words, it’s a process of predicting the resource
requirement of algorithms in a given environment.
In order to solve a problem, there are many possible
algorithms.
One has to be able to choose the best algorithm for the
problem at hand using some scientific method
CENG 213by
Prepared Data Structures
Alemu Ginjo 35
Analysis of Algorithms(cont.)
- To classify some data structures and algorithms as
good, we need precise ways of analyzing them in terms
of resource requirement.
- The main resources are:
Running Time
Memory Usage
Communication Bandwidth
Prepared
CENG 213by Alemu
Data Ginjo
Structures 36
Analysis of Algorithms(cont.)
Running time is usually treated as the most important
since computational time is the most precious resource
in most problem domains.
There are two approaches to measure the efficiency of
algorithms:
• Empirical: Programming competing algorithms
and trying them on different instances.
• Theoretical: Determining the quantity of resources
required mathematically (Execution time, memory
space, etc.) needed by each algorithm.
Prepared
CENG 213by Alemu
Data Ginjo
Structures 37
Analysis of Algorithms(cont.)
However, it is difficult to use actual clock-time as a
consistent measure of an algorithm’s efficiency, because
clock-time can vary based on many things. For example,
• Specific processor speed
• Current processor load
• Specific data for a particular run of the program
Input size
Input properties
• Operating Environment
Prepared
CENG 213by Alemu
Data Ginjo
Structures 38
Analysis of Algorithms(cont.)
Accordingly, we can analyze an algorithm according to
the number of operations required, rather than
according to an absolute amount of time involved.
This can show how an algorithm’s efficiently changes
according to the size of the input.
CENG 213by
Prepared Data Structures
Alemu Ginjo 39
Complexity Analysis
• Complexity Analysis is the systematic study of the cost of
computation, measured either in time units or in operations
performed, or in the amount of storage space required.
• The goal is to have a meaningful measure that permits
comparison of algorithms independent of operating
platform.
I.Time Complexity: Determine the approximate number of
operations required to solve a problem of size n.
II.Space Complexity: Determine the approximate memory
required to solve a problem of size n.
Prepared
CENG 213by Alemu
Data Ginjo
Structures 40
Complexity Analysis
Complexity analysis involves two distinct phases:
1.Algorithm Analysis: Analysis of the algorithm or data
structure to produce a function T (n) that describes the
algorithm in terms of the operations performed in order to
measure the complexity of the algorithm.
2.Order of Magnitude Analysis: Analysis of the function T
(n) to determine the general complexity category to which it
belongs.
There is no generally accepted set of rules for algorithm
analysis.
However, an exact count of operations is commonly used.
Prepared
CENG 213by Alemu
Data Ginjo
Structures 41
Analysis Rules
Analysis Rules:
1. We assume an arbitrary time unit.
2. Execution of one of the following operations takes time 1:
• Assignment operation, Single Input/output operation.
• Single Boolean Operations. ex: AND, OR, NOR, NAND,
XOR, etc.
• Single Arithmetic Operations. ex: +, *, /, etc. and Function
Return.
3. Running time of a selection statement (if, switch) is the
time for the condition evaluation + the maximum of the
running times for the individual clauses in the selection.
Prepared
CENG 213by Alemu
Data Ginjo
Structures 42
Analysis Rules(cont.)
4. Loops: Running time for a loop is equal to the running
time for the statements inside the loop * number of iteration.
The total running time of a statement inside a group of
nested loops is the running time of the statements
multiplied by the product of the sizes of all the loops.
For nested loops, analyze inside out.
Always assume that the loop executes the maximum
number of iterations possible.
5. Running time of a function call is 1 for setup + the time
for any parameter calculations + the time required for the
execution of the function body.
CENG 213by
Prepared Data Structures
Alemu Ginjo 43
Analysis Rules(cont.)
Generally:-
1 for the assignment statement.
1 for the output statement.
1 for the input statement.
In the for loop:
1 assignment, n+1 tests, and n increments.
n loops of 2 units for an assignment, and an addition.
1 for the return statement.
CENG 213by
Prepared Data Structures
Alemu Ginjo 44
Examples: 1
int count( ){
int k=0; 1
cout<<”Enter an integer”; 1
cin>>n; 1
for ( i=0; i<n; i++) 1+n+1+n
k=k+1; 2n
return 0; 1
}
T(n) = 1+1+1+(1+n+1+n)+2n+1 = 4n+6 = O(n)
CENG 213by
Prepared Data Structures
Alemu Ginjo 45
2.
int total(int n )
{
int sum=0;
for (int i=1; i<=n; i++)
sum=sum+1;
return sum;
}
T(n) = 1+(1+n+1+n)+2n+1 = 4n+4 = O(n)
CENG 213by
Prepared Data Structures
Alemu Ginjo 46
3. void func() Time Units to Compute
{ -------------------------------------------------
int x=0; 1 for the first assignment statement: x=0;
int i=0; 1 for the second assignment statement: i=0;
int j=1; 1 for the third assignment statement: j=1;
cout<< “Enter an Integer value”; 1 for the output statement.
cin>>n;
while (i<n){
1 for the input statement.
x++; In the first while loop:
i++; n+1 tests
} n loops of 2 units for the two increment (addition) operations
while (j<n) In the second while loop:
{ n tests
j++; n-1 increments
} -------------------------------------------------------------------
}
T (n)= 1+1+1+1+1+n+1+2n+n+n-1 = 5n+5 = O(n)
CENG 213by
Prepared Data Structures
Alemu Ginjo 47
4. int sum (int n)
{
int partial_sum = 0;
for (int i = 1; i <= n; i++)
partial_sum = partial_sum +(i * i * i);
return partial_sum;
}
Time Units to Compute
-------------------------------------------------
1 for the assignment.
1 assignment, n+1 tests, and n increments.
n loops of 4 units for an assignment, an addition, and two multiplications.
1 for the return statement.
-------------------------------------------------------------------
T (n)= 1+(1+n+1+n)+4n+1 = 6n+4 = O(n)
Prepared
CENG 213by Alemu
Data Ginjo
Structures 48
Analysis of Algorithms(cont.)
• Analysis of Algorithms is the area of computer science
that provides tools to analyze the efficiency of different
methods of solutions.
• It predicts resources that an algorithm requires.
• How do we compare the time efficiency of two
algorithms that solve the same problem?
Naïve Approach: implement these algorithms in a
programming language (C++), and run them to compare
their time requirements.
Prepared
CENG 213by Alemu
Data Ginjo
Structures 49
Analysis of Algorithms(cont.)
Comparing the programs(instead of algorithms) has difficulties.
• How are the algorithms coded?
Comparing running times means comparing the
implementations.
We should not compare implementations, because they are
sensitive to programming style that may cloud the issue of
which algorithm is inherently more efficient.
• What computer should we use?
We should compare the efficiency of the algorithms
independently of a particular computer.
• What data should the program use?
Any analysis must be independent of specific data.
Prepared
CENG 213by Alemu
Data Ginjo
Structures 50
Analysis of Algorithms(cont.)
• When we analyze algorithms, we should employ
mathematical techniques that analyze algorithms
independently of specific implementations, computers,
or data.
• To analyze algorithms:
First, we start to count the number of significant
operations in a particular solution to assess its
efficiency.
Then, we will express the efficiency of algorithms
using growth functions.
Prepared
CENG 213by Alemu
Data Ginjo
Structures 51
The Execution Time of Algorithms
Each operation in an algorithm (or a program) has a cost.
Each operation takes a certain of time.
count = count + 1; take a certain amount of time, but it is constant
A sequence of operations:
count = count + 1; Cost: c1
sum = sum + count; Cost: c2
Total Cost = c1 + c2
Prepared
CENG 213by Alemu
Data Ginjo
Structures 52
The Execution Time of
Algorithms(cont.)
Example: Simple If-Statement
Cost Times
if (n < 0) c1 1
absval = -n c2 1
else
absval = n; c3 1
Total Cost <= c1 + max(c2,c3)
Prepared
CENG 213by Alemu
Data Ginjo
Structures 53
The Execution Time of
Algorithms(cont.)
Example: Simple Loop
Cost Times
i = 1; c1 1
sum = 0; c2 1
while (i <= n) { c3 n+1
i = i + 1; c4 n
sum = sum + i; c5 n
}
Total Cost = c1 + c2 + (n+1)*c3 + n*c4 + n*c5
The time required for this algorithm is proportional to n
Prepared
CENG 213by Alemu
Data Ginjo
Structures 54
The Execution Time of
Algorithms(cont.)
Example: Nested Loop
Cost Times
i=1; c1 1
sum = 0; c2 1
while (i <= n) { c3 n+1
j=1; c4 n
while (j <= n) { c5 n*(n+1)
sum = sum + i; c6 n*n
j = j + 1; c7 n*n
}
i = i +1; c8 n
}
Total Cost = c1 + c2 + (n+1)*c3 + n*c4 + n*(n+1)*c5+n*n*c6+n*n*c7+n*c8
The time required for this algorithm is proportional to n2
Prepared
CENG 213by Alemu
Data Ginjo
Structures 55
General Rules for Estimation
• Loops: The running time of a loop is at most the
running time of the statements inside of that loop times
the number of iterations.
• Nested Loops: Running time of a nested loop
containing a statement in the inner most loop is the
running time of statement multiplied by the product of
the sized of all loops.
• Consecutive Statements: Just add the running times of
those consecutive statements.
• If/Else: Never more than the running time of the test
plus the larger of running times of S1 and S2.
CENG 213by
Prepared Data Structures
Alemu Ginjo 56
Formal Approach to Analysis
In the above examples we have seen that analysis is a bit
complex.
However, it can be simplified by using some formal approach in
which case we can ignore initializations, loop control, and book
keeping.
1. For Loops: Formally
• In general, a for loop translates to a summation. The index and
bounds of the summation are the same as the index and bounds
of the for loop.
N
f
or(
i
nti
= 1
;i<=N;
i++
){
s
um= s
um+
i;
1N
}
i1
Suppose we count the number of additions that are done. There
is 1 addition per iteration of the loop, hence N additions in total.
CENG 213by
Prepared Data Structures
Alemu Ginjo 57
2. Nested Loops: Formally
Nested for loops translate into multiple summations, one for each
for loop.
Again, count the number of additions. The outer summation is for
the outer for loop.
CENG 213by
Prepared Data Structures
Alemu Ginjo 58
3. Consecutive Statements: Formally
Add the running times of the separate blocks of your code
CENG 213by
Prepared Data Structures
Alemu Ginjo 59
4. Conditionals: Formally
If (test) s1 else s2: Compute the maximum of the running time
for s1 and s2.
CENG 213by
Prepared Data Structures
Alemu Ginjo 60
Example: Suppose we have hardware capable of executing 106
instructions per second. How long would it take to execute an
algorithm whose complexity function was:
T (n) = 2n2 on an input size of n=108?
The total number of operations to be performed would be T (108):
T(108) = 2*(108)2 =2*1016
The required number of seconds
required would be given by
T(108)/106 so:
Running time =2*1016/106 = 2*1010
•The number of seconds per day is 86,400 so this is about
231,480 days (634 years).
Prepared
CENG 213by Alemu
Data Ginjo
Structures 61
What to Analyze
An algorithm can require different times to solve
different problems of the same size.
E.g. Searching an item in a list of n elements using
sequential search. Cost: 1,2,...,n
Worst-Case Analysis –The maximum amount of time
that an algorithm require to solve a problem of size n.
This gives an upper bound for the time complexity of
an algorithm.
Normally, we try to find worst-case behavior of an
algorithm.
Prepared
CENG 213by Alemu
Data Ginjo
Structures 62
What to Analyze (cont.)
Best-Case Analysis –The minimum amount of time that an
algorithm require to solve a problem of size n.
The best case behavior of an algorithm is NOT so useful.
Average-Case Analysis –The average amount of time that an
algorithm require to solve a problem of size n.
Sometimes, it is difficult to find the average-case behavior
of an algorithm.
We have to look at all possible data organizations of a
given size n, and their distribution probabilities of
these organizations.
Worst-case analysis is more common than average-case
analysis.
Prepared
CENG 213by Alemu
Data Ginjo
Structures 63
Measures of Times
• In order to determine the running time of an algorithm it is
possible to define three functions Tbest(n), Tavg(n) and
Tworst(n), the average and the worst case running time of the
algorithm respectively.
1.Average Case (Tavg): The amount of time the algorithm
takes on an “average” set of inputs.
Worst Case (Tworst): The amount of time the algorithm takes
on the worst possible set of inputs.
That is the longest running time for any input of size n.
2.Best Case (Tbest): The amount of time the algorithm takes
on the smallest possible set of inputs.
Prepared
CENG 213by Alemu
Data Ginjo
Structures 64
Asymptotic Analysis
Asymptotic analysis is concerned with how the running
time of an algorithm increases with the size of the input in
the limit, as the size of the input increases without bound.
There are five notations used to describe a running time
function. These are:
Big-Oh Notation (O)
Big-Omega Notation ()
Theta Notation ()
Little-o Notation (o)
Little-Omega Notation ()
CENG 213by
Prepared Data Structures
Alemu Ginjo 65
The Big-Oh Notation
Big-Oh notation is a way of comparing algorithms and is used for
computing the complexity of algorithms; i.e., the amount of time
that it takes for computer program to run.
It’s only concerned with what happens for very a large value of n.
Therefore only the largest term in the expression (function) is
needed.
For example, if the number of operations in an algorithm is n2 – n,
n is insignificant compared to n2 for large values of n.
Hence the n term is ignored. Of course, for small values of n, it may
be important.
However, Big-Oh is mainly concerned with large values of n.
CENG 213by
Prepared Data Structures
Alemu Ginjo 66
Formal Definition: f (n)= O (g (n)) if there exist c, k ∊ ℛ+
such that for all n ≥ k, f (n) ≤ c.g (n).
Examples: The following points are facts that you can use
for Big-Oh problems:
1<=n for all n>=1
n<=n2 for all n>=1
2n <=n! for all n>=4
log2n<=n for all n>=2
n<=nlog2n for all n>=2
CENG 213by
Prepared Data Structures
Alemu Ginjo 67
1. f(n)=10n+5 and g(n)=n. Show that f(n) is O(g(n)).
To show that f(n) is O(g(n)) we must show that
constants c and k such that
f(n) <=c.g(n) for all n>=k
Or 10n+5<=c.n for all n>=k
Try c=15. Then we need to show that 10n+5<=15n
Solving for n we get: 5<5n or 1<=n.
So f(n) =10n+5 <=15.g(n) for all n>=1.
(c=15,k=1).
CENG 213by
Prepared Data Structures
Alemu Ginjo 68
2. f(n) = 3n2 +4n+1. Show that f(n)=O(n2).
4n <=4n2 for all n>=1 and 1<=n2 for all n>=1
3n2 +4n+1<=3n2+4n2+n2 for all n>=1
<=8n2 for all n>=1
So we have shown that f(n)<=8n2 for all n>=1
Therefore, f (n) is O(n2) (c=8,k=1)
CENG 213by
Prepared Data Structures
Alemu Ginjo 69
Typical Orders
Here is a table of some typical cases. This uses logarithms to base 2, but these are simply
proportional to logarithms in other base.
N O(1) O(log n) O(n) O(n log n) O(n2) O(n3)
1 1 1 1 1 1 1
2 1 1 2 2 4 8
4 1 2 4 8 16 64
8 1 3 8 24 64 512
16 1 4 16 64 256 4,096
1024 1 10 1,024 10,240 1,048,576 1,073,741,824
CENG 213by
Prepared Data Structures
Alemu Ginjo 70
Demonstrating that a function f(n) is big-O of a function g(n)
requires that we find specific constants c and k for which the
inequality holds (and show that the inequality does in fact hold).
Big-O expresses an upper bound on the growth rate of a function,
for sufficiently large values of n.
An upper bound is the best algorithmic solution that has been
found for a problem.
“ What is the best that we know we can do?”
Exercise:
f(n) = (3/2)n2+(5/2)n-3
Show that f(n)= O(n2)
In simple words, f (n) =O(g(n)) means that the growth rate of f(n)
is less than or equal to g(n).
Prepared
CENG 213by Alemu
Data Ginjo
Structures 71
End of chapter 1