
Data Structure and Algorithm
(PMSCS 623P)

Lecture 1
Data Structures
 Data
• Data are simply values or sets of values, the raw materials used as inputs.
• A data item refers to a single unit of values.
• Data items that can be divided into sub-items are called group items; those that
cannot are called elementary items. For example, a student’s name may be
divided into three sub-items (first name, middle name and last name), but
the ID of a student would normally be treated as a single item.

Student
  ID
  Name: First, Middle, Last
  Address: Street, Area
  Age
  Gender


Data Structures
What is Data Structure?

• A data structure is defined as a particular way of storing and
organizing data in our devices so that the data can be used efficiently
and effectively. The main idea behind using data structures is to
minimize the time and space complexities. An efficient data
structure takes minimum memory space and requires
minimum time to process the data.
Classification of Data Structure
Linear Data structure:
• A data structure in which data elements are arranged
sequentially or linearly, where each element is
attached to its previous and next adjacent elements,
is called a linear data structure.
Example: Array, Stack, Queue, Linked List, etc.
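The linear structures named above can be sketched with the C++ standard library (a minimal illustration, not part of the slides; the helper names and element values are made up):

```cpp
#include <cassert>
#include <stack>
#include <queue>
#include <vector>

// Stack: last in, first out (LIFO). Returns the element that would be served first.
int stackFirstOut(const std::vector<int>& xs) {
    std::stack<int> s;
    for (int x : xs) s.push(x);
    return s.top();     // the most recently pushed element
}

// Queue: first in, first out (FIFO). Returns the element that would be served first.
int queueFirstOut(const std::vector<int>& xs) {
    std::queue<int> q;
    for (int x : xs) q.push(x);
    return q.front();   // the earliest pushed element
}
```

Both adaptors keep elements in a linear order; they differ only in which end of that order is served first.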
Classification of Data Structure
Non-Linear Data Structure:
• Data structures where data elements are NOT placed
sequentially or linearly are called non-linear data
structures.
• This structure is mainly used to represent data
containing a hierarchical relationship between elements.
Examples: Trees and Graphs.
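A hierarchical relationship can be sketched as a binary tree in C++ (an illustrative sketch; the Node type and countNodes helper are not from the slides):

```cpp
#include <cassert>
#include <cstddef>

// A binary tree node: each element links to children rather than to a single
// "next" element, so the arrangement is hierarchical, not sequential.
struct Node {
    int value;
    Node* left;
    Node* right;
    explicit Node(int v) : value(v), left(nullptr), right(nullptr) {}
};

// Count the nodes reachable from root: a typical recursive walk over the hierarchy.
int countNodes(const Node* root) {
    if (root == nullptr) return 0;
    return 1 + countNodes(root->left) + countNodes(root->right);
}
```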
What is an algorithm?
• A computational procedure that takes
 some value, or set of values, as input
 produces some value, or set of values, as output
 may be specified:
o in English
o as pseudocode
o as a computer program
• A sequence of computational steps that transform
the input into the output.
What is an algorithm?
• An algorithm is a systematic, ordered, logical
approach: a well-defined, step-by-step procedure
that allows a computer to solve a problem.
• An algorithm must
o produce the correct result
o finish in some finite time
• Algorithms are the ideas behind computer programs.
• An algorithm is the thing that stays the same whether the
program is in Pascal running on Windows or in Java
running on a Macintosh!
Algorithm
• Example: the linear search algorithm
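As a concrete example, linear search can be rendered in C++ roughly as follows (a sketch; the function name is illustrative):

```cpp
#include <cassert>

// Linear search: scan the array left to right until the key is found.
// Returns the index of the first match, or -1 if the key is not present.
int linearSearch(const int A[], int n, int key) {
    for (int i = 0; i < n; i++) {
        if (A[i] == key) return i;   // found at index i
    }
    return -1;                       // scanned all n elements without a match
}
```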
Pseudocode
• High-level description of an algorithm
• More structured than English prose
• Less detailed than a program
• Preferred notation for describing algorithms
• Hides program design issues

Example: find the max element of an array

Algorithm arrayMax(A, n)
  Input: array A of n integers
  Output: maximum element of A
  currentMax ← A[0]
  for i ← 1 to n − 1 do
    if A[i] > currentMax then
      currentMax ← A[i]
  return currentMax
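The arrayMax pseudocode translates almost line by line into C++ (a sketch assuming n ≥ 1 and a 0-indexed array):

```cpp
#include <cassert>

// C++ rendering of Algorithm arrayMax(A, n): returns the maximum of A[0..n-1].
// Assumes n >= 1, as the pseudocode does implicitly by reading A[0].
int arrayMax(const int A[], int n) {
    int currentMax = A[0];
    for (int i = 1; i <= n - 1; i++) {   // mirrors "for i <- 1 to n-1 do"
        if (A[i] > currentMax)
            currentMax = A[i];
    }
    return currentMax;
}
```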
Pseudocode
• It is one of the methods which can be used to represent an
algorithm for a program.
• Many times, algorithms are presented using pseudocode,
since it can be read and understood by programmers
who are familiar with different programming languages.
• Pseudocode allows us to include several control structures
such as while, if-then-else, repeat-until, for and case,
which are present in many high-level languages.
• Note: Pseudocode is not an actual programming language.

Program
• A program is a set of ordered instructions for
the computer to follow.
• The machine can’t read a program directly,
because it only understands machine code.
But we can write the program in a computer
language, and then a compiler or interpreter
can make it understandable to the computer.
Algorithm vs Pseudocode vs Program
• Algorithm: a well-defined sequence of steps that provides a
solution for a given problem. Algorithms are generally written
in a natural language or plain English.
• Pseudocode: one of the methods that can be used to represent
an algorithm. Pseudocode is written in a format that is similar
to the structure of a high-level programming language.
• Program: allows us to write the code in a particular
programming language.
How to express algorithms?

  English  →  Pseudocode  →  Real programming languages

(Precision increases from left to right; ease of expression
increases from right to left.)
The Problem of Sorting

Input: sequence ⟨a1, a2, …, an⟩ of numbers.
Output: permutation ⟨a′1, a′2, …, a′n⟩ such
that a′1 ≤ a′2 ≤ … ≤ a′n.

Example:
Input: 8 2 4 9 3 6
Output: 2 3 4 6 8 9
Insertion Sort
 Commonly used by card players: as each card is
picked up, it is placed into the proper sequence in
their hand.
 Divide the list into a sorted sublist and an
unsorted sublist.
 In each pass, one or more pieces of data are
removed from the unsorted sublist and inserted
into their correct position in the sorted sublist.
(In the pseudocode that follows, A[0] is unused; the
valid elements are A[1] … A[n].)

Insertion Sort Pseudocode

The algorithm name with its parameters works like a C function header;
⊳ marks a comment; indentation/spacing determines where the
algorithm/loop/if/else body ends.

INSERTION-SORT (A, n)                  ⊳ A[1 . . n]
  for j ← 2 to n
    do ⊳ Insert A[j] into the sorted subarray A[1..j-1]
       ⊳ in such a position that A[1..j] becomes sorted
       key ← A[j]
       i ← j − 1
       while i > 0 and A[i] > key
         do A[i+1] ← A[i]
            i ← i − 1
       A[i+1] ← key

Equivalent C++ function:

void insertionSort(int A[], int n)
{ // here A[0..n] is an int array; A[0] is unused
    int i, j, key;
    for (j = 2; j <= n; j++) {
        key = A[j];
        i = j - 1;
        while (i > 0 && A[i] > key) {
            A[i+1] = A[i];
            i = i - 1;
        } // while
        A[i+1] = key;
    } // for
}
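For reference, here is the same routine in the 0-indexed style usual in C++ code (a sketch; the pseudocode's 1-indexed A[1..n] becomes A[0..n-1]):

```cpp
#include <cassert>

// 0-indexed insertion sort of A[0..n-1], equivalent to INSERTION-SORT(A, n).
void insertionSort0(int A[], int n) {
    for (int j = 1; j < n; j++) {        // A[0..j-1] is already sorted
        int key = A[j];
        int i = j - 1;
        while (i >= 0 && A[i] > key) {   // shift larger elements one slot right
            A[i + 1] = A[i];
            i = i - 1;
        }
        A[i + 1] = key;                  // drop key into its correct position
    }
}
```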
Insertion Sort Simulation

A = [4, 7, 9, 5, 1] (indices 1..5), running INSERTION-SORT (A, n):

j = 2, key = 7:  4 7 9 5 1   (7 already in place)
j = 3, key = 9:  4 7 9 5 1   (9 already in place)
j = 4, key = 5:  4 5 7 9 1   (5 inserted between 4 and 7)
j = 5, key = 1:  1 4 5 7 9   (1 shifted all the way to the front)

Final sorted array: 1 4 5 7 9
Analysis of Algorithms
• What does it mean to analyze an algorithm?
– To have an estimate about how much time an algorithm may take to
finish, or in other words, to analyze its running time (a.k.a. time
complexity)
– Sometimes, instead of running time, we are interested in how much
memory/space the algorithm may consume while it runs (space
complexity)
– It enables us to compare between two algorithms
• What do we mean by running time analysis?
– Also referred to as time-complexity analysis
– To determine how running time increases as the size of the problem
increases.
Analysis of Algorithms
• Input size (number of elements in the input)
– size of an array
– # of elements in a matrix
– # of bits in the binary representation of the input
– vertices and edges in a graph


Why study algorithms and performance?
• Algorithms help us to understand scalability.
• Performance often draws the line between
what is feasible and what is impossible.
• Algorithmic mathematics provides a language
for talking about program behavior.
• Performance is the currency of computing.
Measure Actual Running Time?
• We can measure the actual running time of a program
– Use a stopwatch or insert timing code into the program
• However, actual running time is not meaningful
when comparing two algorithms
– Coded in different languages
– Using different data sets
– Running on different computers
Counting Operations
• What can we control?
 Count operations instead of time: identify the
operations and count them.
 Larger input, more operations; smaller input, fewer operations.
 Focus on how performance scales (how the number of operations
grows as the input grows).
 Go beyond input size
(how the algorithm performs in all sorts of situations, e.g.,
depending on the internal structure of the input).
Example: Counting Operations
How many operations are required?

for (int i = 1; i <= n; i++) {
    perform 100 operations;            // A
    for (int j = 1; j <= n; j++) {
        perform 2 operations;          // B
    }
}

Total Ops = A + B = Σ_{i=1..n} 100 + Σ_{i=1..n} Σ_{j=1..n} 2
          = 100n + 2n² = 2n² + 100n
Example: Counting Operations
 Knowing the number of operations required
by the algorithm, we can state that
 Algorithm X takes 2n² + 100n operations to solve
a problem of size n
 If the time t needed for one operation is
known, then we can state
 Algorithm X takes (2n² + 100n)·t time units
Example: Counting Operations
 However, time t is directly dependent on
the factors mentioned earlier
 e.g. different languages, compilers and
computers
 Instead of tying the analysis to actual
time t, we can state
 Algorithm X takes time that is proportional
to 2n² + 100n for solving a problem of size n
Approximation of Analysis Results
 Suppose the time complexity of
 Algorithm A is 3n² + 2n + log n + 1/(4n)
 Algorithm B is 0.39n³ + n
 Intuitively, we know Algorithm A will outperform B
 when solving larger problems, i.e. larger n
 The dominating terms 3n² and 0.39n³ tell us
approximately how the algorithms perform
 The terms n² and n³ are even simpler and preferred
 These terms can be obtained through asymptotic
analysis
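This intuition can be checked numerically by evaluating both cost formulas (a sketch; the base of the log term is taken as 2, an assumption since the slide does not fix it):

```cpp
#include <cassert>
#include <cmath>

// Cost formulas for Algorithms A and B from the slide.
double costA(double n) { return 3*n*n + 2*n + std::log2(n) + 1.0 / (4*n); }
double costB(double n) { return 0.39*n*n*n + n; }
```

For small n the cubic Algorithm B is actually cheaper; its n³ term only takes over once n grows past the crossover point.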
Asymptotic Analysis
 Asymptotic analysis is an analysis of
algorithms that focuses on
 analyzing problems of large input size
 considering only the leading term of the formula
 ignoring the coefficient of the leading term
Why Choose Leading Term?
• Growth rates of functions:
– Linear ≈ n
– Quadratic ≈ n²
– Cubic ≈ n³
• In a log-log chart, the slope of the line
corresponds to the growth rate of the function
[Figure: log-log plot of T(n) against n for linear, quadratic and cubic functions]
Why Choose Leading Term?
• The growth rate is not affected by
– constant factors or
– lower-order terms
• Examples
– 10²n + 10⁵ is a linear function
– 10⁵n² + 10⁸n is a quadratic function
[Figure: log-log plot showing that each pair of functions has the same slope]
Why Choose Leading Term?
• Lower-order terms contribute less to the
overall cost as the input grows larger

        n | f(n) = n³       | f(n) = n³ + n²   | f(n) = n³ + n² + 20n
        1 | 1               | 2                | 22
       10 | 1,000           | 1,100            | 1,300
    1,000 | 1,000,000,000   | 1,001,000,000    | 1,001,020,000
1,000,000 | 1.0 × 10¹⁸      | 1.000001 × 10¹⁸  | 1.000001 × 10¹⁸
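The table's trend can be reproduced in code: the ratio f(n)/n³ approaches 1 as n grows (a sketch; the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// Ratio of the full cost f(n) = n^3 + n^2 + 20n to its leading term n^3.
// Equals 1 + 1/n + 20/n^2, so it tends to 1 as n grows.
double leadingTermRatio(double n) {
    return (n*n*n + n*n + 20.0*n) / (n*n*n);
}
```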
Why ignore the coefficient of the leading term
How Do We Analyze Running Time?
• We need to define an objective measure.
– Count the number of statements executed: associate a "cost" with each
statement and find the "total cost" by finding the total number of times
each statement is executed.
– Not good: the number of statements varies with the programming language as
well as the style of the individual programmer; therefore it can’t be an
objective measure of running time. For example, the following two algorithms
do the same task using different programming styles.

Algorithm 1                    Time (micro sec)
  arr[0] = 0;                  c1
  arr[1] = 0;                  c1
  arr[2] = 0;                  c1
  ...
  arr[N-1] = 0;                c1
  ----------------------
  f(N) = c1 + c1 + ... + c1 = c1·N

Algorithm 2                    Time (micro sec)
  for (i = 0; i < N; i++)      c1 + c2(N+1) + c3·N
      arr[i] = 0;              c1·N
  -----------------------------
  g(N) = c1 + c2(N+1) + c3N + c1N = (c1+c2+c3)N + (c1+c2)

Observation: for very large values of N, the execution time of both programs is
similar. Thus we can say that both of them have roughly the same running time.
Ideal Solution to Express Running Time
• Express runtime as a function of the input size n (i.e., f(n)) in order to understand
how f(n) grows with n,
• and count only the most significant term of f(n), ignoring everything else
(because those won’t affect the running time much for very large values of n).
• Thus the running times (also called time complexities) of the programs on the
previous slide become:
• f(N) = c1·N ≈ N × (some constant)
• g(N) = (c1+c2+c3)N + (c1+c2) ≈ N × (some constant)
• Thus both these functions grow linearly with N, and as such both have the same
running time (specifically, linear running time). We say that both of them are
O(N) functions, which means that the running time of each of these algorithms is
a constant multiple of N (we ignore the value of the constant).
• We compare the running times of different functions in an asymptotic manner (i.e.,
we check whether f(n) > g(n) for very large values of n). That is why the task of
computing the time complexity of an algorithm is also called asymptotic analysis.
Intuition behind Asymptotic Analysis
• Consider the example of buying elephants and goldfish:
Cost: cost_of_elephants + cost_of_goldfish
Cost ~ cost_of_elephants (approximation)
• The low-order terms, as well as constants, in a function are
relatively insignificant for large n
f(n) = 6n + 4 ≈ n
g(n) = n⁴ + 100n² + 10n + 50 ≈ n⁴
i.e., we say that n⁴ + 100n² + 10n + 50 and n⁴ have the same
rate of growth. However, f(n) and g(n) have different rates of
growth.
The Purpose of Asymptotic Analysis
 To estimate how long a program will run.
 To estimate the largest input that can reasonably
be given to the program.
 To compare the efficiency of different
algorithms.
 To help focus on the parts of code that are
executed the largest number of times.
 To choose an algorithm for an application.
Asymptotic Notations
• O notation: asymptotic “upper bound”
• Ω notation: asymptotic “lower bound”
• Θ notation: asymptotic “tight bound”
Asymptotic Notations
• O-notation (Big Oh)
O(g(n)) is the set of functions
with smaller or same order of
growth as g(n)

Examples:
T(n) = 3n² + 10n lg n + 8 is O(n²), and also O(n² lg n), O(n³), O(n⁴), …
T′(n) = 52n² + 3n² lg n + 8 is O(n² lg n), and also O(n³), O(n⁴), …
(the latter memberships are loose upper bounds)
Asymptotic Notations
• Ω-notation (Big Omega)
Ω(g(n)) is the set of functions
with larger or same order of
growth as g(n)
Examples:
T(n) = 3n² + 10n lg n + 8 is Ω(n²), Ω(n lg n), Ω(n), Ω(lg n), Ω(1)
T′(n) = 52n² + 3n² lg n + 8 is Ω(n² lg n), Ω(n²), Ω(n), …
Asymptotic Notations
• Θ-notation (Big Theta)
Θ(g(n)) is the set of functions with the same
order of growth as g(n)

* f(n) is both O(g(n)) and Ω(g(n)) ↔ f(n) is Θ(g(n))

Examples:
T(n) = 3n² + 10n lg n + 8 is Θ(n²)
T′(n) = 52n² + 3n² lg n + 8 is Θ(n² lg n)
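The Θ(n²) claim for T(n) can be illustrated numerically: T(n)/n² settles toward the constant 3 as n grows, so n² bounds T(n) from above and below within constant multiples (a sketch; the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// Ratio T(n)/n^2 for T(n) = 3n^2 + 10 n lg n + 8.
// Equals 3 + 10 lg(n)/n + 8/n^2, which tends to the constant 3 as n grows;
// this is the sense in which T(n) is Theta(n^2).
double thetaRatio(double n) {
    return (3*n*n + 10*n*std::log2(n) + 8) / (n*n);
}
```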
Big O
• Informally, O(g(n)) is the set of all functions
with a smaller or same order of growth as g(n),
within a constant multiple
• If we say f(n) is in O(g(n)), it means that g(n) is
an asymptotic upper bound of f(n)
– Intuitively, it is like f(n) ≤ g(n)
• What is O(n²)?
– The set of all functions that grow more slowly than,
or in the same order as, n²
Examples
• Show that 30n + 8 is O(n).
– Show ∃c, n0: 30n + 8 ≤ cn, ∀n ≥ n0.
– Let c = 31, n0 = 8. Then for n ≥ 8:
cn = 31n = 30n + n ≥ 30n + 8,
– so 30n + 8 ≤ cn.
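The witnesses c = 31 and n0 = 8 can be sanity-checked directly (a quick check for particular values of n, not a proof):

```cpp
#include <cassert>

// Check the Big-O witness inequality 30n + 8 <= 31n for a given n.
// It holds exactly when n >= 8, matching the choice n0 = 8 in the proof.
bool boundHolds(long long n) {
    return 30*n + 8 <= 31*n;
}
```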
Big-O example, graphically
• Note 30n + 8 isn’t less than n
anywhere (n > 0).
• It isn’t even less than 31n
everywhere.
• But it is less than 31n everywhere to
the right of n = 8 (n > n0 = 8).
[Figure: plot of the functions 30n + 8 and cn = 31n against increasing n;
31n overtakes 30n + 8 at n = 8]

You might also like