
Faculty of Computer Engineering Informatics and

Communications

Department of Computer Engineering

Module: Parallel and Distributed Computing Code: HCE/HHE/HSE 311

Parallel Algorithm Design


Divide and Conquer

Consider the divide-and-conquer strategy employed in many sequential algorithms. With this
strategy, a problem is solved by splitting it into subproblems, solving them independently, and
merging their solutions into a solution for the whole problem. The subproblems can be solved
directly, or they can in turn be solved using the same divide-and-conquer strategy, leading to an
overall recursive program structure. The potential concurrency in this strategy is not hard to see:
Since the subproblems are solved independently, their solutions can be computed concurrently,
leading to a parallel program that is very similar to its sequential counterpart. The following figure
illustrates the strategy and the potential concurrency.

HCE/HHE/HSE 311 by Ignatius Chikukuza


The divide-and-conquer strategy can be more formally described in terms of the following
functions (where N is a constant):

 Solution solve(Problem P): Solve a problem (returns its solution).
 Problem[] split(Problem P): Split a problem into N subproblems, each strictly smaller than the original problem (returns the subproblems).
 Solution merge(Solution[] subS): Merge N subsolutions into a solution (returns the merged solution).
 boolean baseCase(Problem P): Decide whether a problem is a "base case" that can be solved without further splitting (returns TRUE if base case, FALSE if not).
 Solution baseSolve(Problem P): Solve a base-case problem (returns its solution).

The strategy then leads to the following top-level program structure:

Solution solve(Problem P) {
    if (baseCase(P))
        return baseSolve(P);
    else {
        Problem subProblems[N];
        Solution subSolutions[N];
        subProblems = split(P);
        for (int i = 0; i < N; i++)
            subSolutions[i] = solve(subProblems[i]);
        return merge(subSolutions);
    }
}

If the subproblems of a given problem can be solved independently, then we can solve them in any
order we like, including concurrently. This means that we can produce a parallel application by
replacing the for loop in the above program with a parallel-for construct, so that the subproblems
will be solved concurrently rather than in sequence. It is worth noting at this point that program
correctness is independent of whether the subproblems are solved sequentially or concurrently, so
we can even design a hybrid program that sometimes solves them sequentially and sometimes
concurrently, based on which approach is likely to be more efficient. This will be discussed further
in the Implementation section.
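To make the idea concrete, here is a minimal runnable sketch (in Python, our choice of language; the summation problem and the one-thread-per-subproblem scheduling are illustrative assumptions, not part of the pattern itself):

```python
import threading

def solve(problem):
    """Divide-and-conquer sum of a list, solving subproblems concurrently."""
    # baseCase / baseSolve: a single element needs no further splitting
    if len(problem) <= 1:
        return problem[0] if problem else 0
    # split: two strictly smaller subproblems
    mid = len(problem) // 2
    subproblems = [problem[:mid], problem[mid:]]
    subsolutions = [None] * len(subproblems)

    def worker(i):
        subsolutions[i] = solve(subproblems[i])

    # "parallel-for": solve each subproblem in its own thread
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(subproblems))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # merge: combine the subsolutions
    return subsolutions[0] + subsolutions[1]

print(solve(list(range(100))))  # → 4950
```

Spawning a thread per subproblem is deliberately naive; it shows only that the sequential loop can be replaced by concurrent execution without changing the result.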

We can map this strategy onto a design in terms of tasks by defining one task for each invocation
of the solve function, as illustrated in the following figure (rectangular boxes correspond to
tasks):

Note the recursive nature of the design, with each task in effect generating and then absorbing a
subtask for each subproblem.

Note also that either the split or the merge phase can be essentially absent:



No split phase is needed if all the base-case problems can be derived directly from the whole
problem (without recursive splitting). In this case, the overall design will look like the bottom half
of the figures above.

No merge phase is needed if the problem can be considered solved when all of the base-case
problems have been identified and solved. In this case, the overall design will look like the top
half of the figures above.

Applicability:

Use the DivideAndConquer pattern when:

 The problem can be solved using the divide-and-conquer strategy, with subproblems being
solved independently.

The pattern is particularly effective when:

 The amount of work required to solve the base case is large compared to the amount of
work required for the recursive splits and merges.
 The split produces subproblems of roughly equal size.

Structure:

Implementations of this pattern include the following key elements:

 Definitions of the functions described in the Motivation section above (solve, split, merge, baseCase, and baseSolve).
 A way of scheduling tasks that efficiently exploits the available concurrency (subproblems can be solved concurrently).

Usage:

This pattern is typically used to provide high-level structure for an application; that is, the
application is typically structured as an instance of this pattern.

Consequences:

 The traditional divide-and-conquer strategy is a widely useful approach to algorithm design. The parallel DivideAndConquer pattern shares this characteristic. It has the additional advantage that a sequential program developed using the divide-and-conquer strategy is almost trivial to parallelize using the DivideAndConquer pattern.
 A downside to this pattern is that, as the first figure in the Motivation section suggests, the amount of exploitable concurrency varies over the life of the program; at the outermost level of the recursion (initial split and final merge) there is no exploitable concurrency, while at the innermost level (base-case solves) the number of concurrently executable tasks is the number of base-case problems (which is often the same as the problem size). Ideally, we would like to always have at least as many concurrently executable tasks as processors; this pattern clearly falls short in that respect, and the shortfall only worsens as the number of processors increases, so in general the pattern does not scale well. Effective use of this pattern depends on reducing the fraction of the program's lifespan during which there are fewer concurrently executable tasks than processors. Several factors contribute to this goal:
o Having a problem whose split and merge operations are computationally trivial
compared to its base-case solve.
o Having a problem size that is large compared to the maximum number of
processors available on the target environment.
o Reducing the number of levels of recursion required to arrive at the base-case solve
by splitting each problem into more subproblems. This generally requires some
algorithmic cleverness but can be quite effective, especially in the limiting case of
"one-deep divide-and-conquer", in which the initial split is into P subproblems,
where P is the number of available processors. See the Related Patterns section for
more discussion of this strategy.
 As in sequential divide-and-conquer, this pattern is more efficient when the subproblems
into which each problem is split are roughly equal in size / computational complexity.

Implementation:

Key issues.

Definitions of functions.

It is usually straightforward to produce a program structure that defines the required functions:
What is required is almost the same as the equivalent sequential program, except for code to
schedule tasks, as described in the next section.

Scheduling the tasks.

Where a parallel divide-and-conquer program differs from its sequential counterpart is that the
parallel version is also responsible for scheduling the tasks in a way that exploits the potential
concurrency (subproblems can be solved concurrently) efficiently.

The simplest approach is to simply replace the sequential for loop over subproblems with a
parallel-for construct, allowing the corresponding tasks to execute concurrently. (Thus, in the
second figure in the Motivation section, the two lower-level splits execute concurrently, the four
base-case solves execute concurrently, and the two lower-level merges execute concurrently.) To
improve efficiency (as discussed later in "Efficiency considerations"), we can also use a
combination of parallel-for constructs and sequential for loops, typically using parallel-for at the
top levels of the recursion and sequential for at the more deeply nested levels. In effect, this
approach combines parallel divide-and-conquer with sequential divide-and-conquer.
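The depth-cutoff version of this hybrid can be sketched as follows (a Python sketch; the cutoff constant PAR_DEPTH and the summation example are our illustrative assumptions):

```python
import threading

PAR_DEPTH = 2  # use the parallel-for only for this many outer levels (illustrative)

def hybrid_solve(data, depth=0):
    """Sum a list by divide-and-conquer: parallel at the top, sequential below."""
    if len(data) <= 1:                        # base case
        return data[0] if data else 0
    mid = len(data) // 2
    parts = [data[:mid], data[mid:]]          # split
    results = [None, None]
    if depth < PAR_DEPTH:
        # parallel-for at the outer recursion levels
        def worker(i):
            results[i] = hybrid_solve(parts[i], depth + 1)
        threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        # sequential for at the more deeply nested levels
        for i in range(2):
            results[i] = hybrid_solve(parts[i], depth + 1)
    return results[0] + results[1]            # merge

print(hybrid_solve(list(range(50))))  # → 1225
```

With PAR_DEPTH = 2 the program creates at most 4 concurrent tasks, regardless of how deep the recursion goes; in practice the cutoff would be tuned to the number of available processors.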

Correctness issues.

 Most of the correctness issues in implementing this pattern are the same ones involved in sequential divide-and-conquer. Considering first the sequential divide-and-conquer strategy expressed in our earlier pseudocode for solve, we can guarantee that solve(P) returns a correct solution of P if the other functions meet the following specifications, expressed in terms of preconditions and postconditions. (As before, N is an integer constant.)

o Solution baseSolve(Problem P):
 Precondition: baseCase(P) = TRUE.
 Postcondition: the returned value is a solution of P.

o Problem[] split(Problem P):
 Precondition: baseCase(P) = FALSE.
 Postcondition: the returned value is an array of N subproblems, each strictly smaller than P, whose solutions can be combined to give a solution of P. Here, "strictly smaller" means smaller with respect to some integer measure that, when small enough, indicates a base-case problem. (This ensures that the recursion is finite.)

o Solution merge(Solution[] subS):
 Precondition: subS is an array of N solutions such that for some problem P and array of subproblems subP = split(P), subS[i] is a solution of subP[i], for all i from 1 through N.
 Postcondition: the returned value is a solution of P.

 One additional restriction is required to make the concurrency work: Solutions of subproblems must be computed independently. That is, for two distinct subproblems subP[i] and subP[j] of P, solve(subP[i]) and solve(subP[j]) must be computed independently. This will be true if neither call to solve modifies variables shared with the other call.

Efficiency issues.

 If problem size is large compared to the number of available processors, at some point in
the computation the number of concurrently-executable tasks will exceed the number of
processors. If we take the simple approach of always using the parallel-for construct to
schedule the tasks corresponding to subproblems, this approach produces a situation in
which at some point the number of units of execution exceeds the number of available
processors. If such a situation would be inefficient in the target environment (i.e., if
context-switching among UEs is expensive), or if there is significant overhead associated
with the parallel-for construct, it will probably be more efficient to use the parallel-for
construct only for the outer levels of the recursion, switching to a sequential loop when the
total number of subproblems (number of subproblems per split multiplied by recursion
level) exceeds number of available processors.
Examples:

Mergesort.

Mergesort is a well-known sorting algorithm based on the divide-and-conquer strategy, applied as follows to sort an array of N elements:

 The base case is an array of size 1, which is already sorted and can thus be returned without
further processing.
 In the split phase, the array is split by simply partitioning it into two contiguous subarrays,
each of size N/2 (or (N+1)/2 and (N-1)/2, if N is odd).
 In the solve-subproblems phase, the two subarrays are sorted (by applying the mergesort
procedure recursively).
 In the merge phase, the two (sorted) subarrays are recombined into a single sorted array in
the obvious way.

This algorithm is readily parallelized by performing the two recursive mergesorts in parallel. Code
for this example may be added later.
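A sketch of such a parallelization (in Python, our choice of language; the one-level split that hands each half to its own worker is an illustrative scheduling choice):

```python
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def mergesort(a):
    """Sequential mergesort (the recursive solve)."""
    if len(a) <= 1:              # base case: already sorted
        return a
    mid = len(a) // 2            # split into two contiguous halves
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))

def parallel_mergesort(a):
    """One-level parallel mergesort: sort the two halves concurrently."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    with ThreadPoolExecutor(max_workers=2) as ex:
        left = ex.submit(mergesort, a[:mid])
        right = ex.submit(mergesort, a[mid:])
        return merge(left.result(), right.result())

print(parallel_mergesort([5, 3, 8, 1, 9, 2]))  # → [1, 2, 3, 5, 8, 9]
```

Parallelizing only the top-level split keeps the scheduling overhead at a single pair of tasks while still overlapping the bulk of the sorting work.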

Matrix diagonalization.

[Dongarra87] describes a parallel algorithm for diagonalizing (computing the eigenvectors and
eigenvalues of) a symmetric tridiagonal matrix T. The problem is to find a matrix Q such that
Q^T · T · Q is diagonal; the divide-and-conquer strategy goes as follows (omitting the
mathematical details):

 The base case is a 1-by-1 matrix, which is already diagonal and can be returned without further processing.
 The split phase consists of finding a matrix T' and vectors u, v such that T = T' + u·v^T, and T' has the block-diagonal form

T' = [ T1  0  ]
     [ 0   T2 ]

where T1 and T2 are symmetric tridiagonal matrices (which can be diagonalized by recursive calls to the same procedure).

 The merge phase recombines the diagonalizations of T1 and T2 into a diagonalization of T.



Dynamic Programming

Dynamic Programming is also used in optimization problems. Like the divide-and-conquer method, Dynamic Programming solves problems by combining the solutions of subproblems. Moreover, a Dynamic Programming algorithm solves each subproblem just once and then saves its answer in a table, thereby avoiding the work of re-computing the answer every time. Two main properties of a problem suggest that it can be solved using Dynamic Programming: overlapping subproblems and optimal substructure.

Overlapping Sub-Problems

Similar to the Divide-and-Conquer approach, Dynamic Programming also combines solutions to subproblems. It is mainly used where the solution of one subproblem is needed repeatedly. The computed solutions are stored in a table, so that they do not have to be re-computed. Hence, this technique is needed where overlapping subproblems exist.

For example, Binary Search does not have overlapping subproblems, whereas the recursive program for Fibonacci numbers has many overlapping subproblems.
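The contrast is easy to see with Fibonacci: a naive recursive version recomputes the same subproblems exponentially many times, while storing each answer in a table (a Python sketch; the function name is ours) removes all the repeated work:

```python
def fib_memo(n, table=None):
    """Fibonacci with a table of already-computed answers (memoization)."""
    if table is None:
        table = {}
    if n in table:          # overlapping subproblem: reuse the stored answer
        return table[n]
    if n < 2:
        result = n
    else:
        result = fib_memo(n - 1, table) + fib_memo(n - 2, table)
    table[n] = result       # save the answer so it is computed only once
    return result

print(fib_memo(10))  # → 55
```

With the table, fib_memo(n) performs O(n) additions instead of the O(2^n) calls made by the plain recursive definition.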

Optimal Sub-Structure

A given problem has Optimal Substructure Property, if the optimal solution of the given problem
can be obtained using optimal solutions of its sub-problems.

For example, the Shortest Path problem has the following optimal substructure property −

If a node x lies in the shortest path from a source node u to destination node v, then the shortest
path from u to v is the combination of the shortest path from u to x, and the shortest path from
x to v.

Standard shortest-path algorithms such as Floyd-Warshall (all-pairs) and Bellman-Ford (single-source) are typical examples of Dynamic Programming.
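Floyd-Warshall applies the optimal-substructure property directly: it repeatedly improves dist[i][j] by routing through an intermediate node k. A minimal sketch (assuming an adjacency-matrix representation with float('inf') for missing edges):

```python
INF = float('inf')

def floyd_warshall(dist):
    """All-pairs shortest paths. dist is an n x n matrix of edge weights
    (INF where there is no edge); it is updated in place to shortest-path
    distances and returned."""
    n = len(dist)
    for k in range(n):            # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                # optimal substructure: the best path i -> j through k is
                # (best path i -> k) + (best path k -> j)
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

graph = [[0,   4,   11],
         [6,   0,   2],
         [3,   INF, 0]]
print(floyd_warshall(graph))  # → [[0, 4, 6], [5, 0, 2], [3, 7, 0]]
```

Each dist[i][j] entry is a subproblem solved once per value of k and stored in the table, which is exactly the overlapping-subproblem structure described above.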



Steps of Dynamic Programming Approach

Dynamic Programming algorithm is designed using the following four steps −

1. Characterize the structure of an optimal solution.

2. Recursively define the value of an optimal solution.

3. Compute the value of an optimal solution, typically in a bottom-up fashion.

4. Construct an optimal solution from the computed information.

Difference between Dynamic programming and Divide and Conquer Technique

Divide & Conquer algorithms partition the problem into disjoint subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem; this is typically a top-down approach.

Dynamic Programming is used when the subproblems are not independent, e.g. when they share the same sub-subproblems. In this case, divide and conquer may do more work than necessary, because it solves the same subproblem multiple times. Dynamic Programming solves each subproblem just once and stores the result in a table so that it can be retrieved whenever it is needed again.

Dynamic Programming is a bottom-up approach: we solve all possible small problems and then combine their solutions to obtain solutions for bigger problems.

Dynamic Programming is a paradigm of algorithm design in which an optimization problem is solved by a combination of caching subproblem solutions and appealing to the "principle of optimality".

Divide & Conquer Method:

o It involves three steps at each level of recursion:
  1. Divide the problem into a number of subproblems.
  2. Conquer the subproblems by solving them recursively.
  3. Combine the solutions to the subproblems into the solution for the original problem.
o It is recursive.
o It does more work on subproblems and hence has more time consumption.
o It is a top-down approach.
o Subproblems are independent of each other.
o Examples: Merge Sort, Binary Search.

Dynamic Programming:

o It involves a sequence of four steps:
  1. Characterize the structure of optimal solutions.
  2. Recursively define the values of optimal solutions.
  3. Compute the value of optimal solutions in a bottom-up manner.
  4. Construct an optimal solution from computed information.
o It is non-recursive.
o It solves subproblems only once and then stores the results in a table.
o It is a bottom-up approach.
o Subproblems are interdependent.
o Example: Matrix Chain Multiplication.

Elements of Dynamic Programming

There are basically three elements that characterize a dynamic programming algorithm:

1. Substructure: Decompose the given problem into smaller subproblems. Express the solution of the original problem in terms of the solutions of the smaller problems.
2. Table Structure: After solving the subproblems, store their results in a table. This is done because subproblem solutions are reused many times, and we do not want to solve the same problem over and over again.
3. Bottom-up Computation: Using the table, combine the solutions of smaller subproblems to solve larger subproblems, and eventually arrive at a solution to the complete problem.



Components of Dynamic programming

1. Stages: The problem can be divided into several subproblems, which are called stages. A stage is a small portion of a given problem. For example, in the shortest path problem, the stages were defined by the structure of the graph.
2. States: Each stage has several states associated with it. The states for the shortest path problem were the nodes reached.
3. Decision: At each stage, there can be multiple choices, out of which the best decision should be taken. The decision taken at every stage should be optimal; this is called a stage decision.
4. Optimal policy: It is a rule which determines the decision at each stage; a policy is called an optimal policy if it is globally optimal. This is known as the Bellman principle of optimality.
5. Given the current state, the optimal choices for each of the remaining states do not depend on the previous states or decisions. In the shortest path problem, it was not necessary to know how we got to a node, only that we did.
6. There exists a recursive relationship that identifies the optimal decisions for stage j, given that stage j+1 has already been solved.
7. The final stage must be solved by itself.

Applications of dynamic programming

1. 0/1 knapsack problem

0/1 Knapsack Problem: Dynamic Programming Approach

A knapsack is basically a bag of given capacity. We want to pack n items into our luggage.

o The i-th item is worth vi dollars and weighs wi pounds.
o We want to take as valuable a load as possible, but the total weight cannot exceed W pounds.
o vi, wi, and W are integers.

The goal, stated formally:

1. Total weight ≤ capacity W
2. Value ← Max

Let us consider that the capacity of the knapsack is W = 25 and the items are as shown in the
following table.

Item     A    B    C    D
Profit   24   18   18   10
Weight   24   10   10   7

Without considering the profit per unit weight (pi/wi), if we apply a greedy approach (always selecting the remaining item of maximum profit) to solve this problem, item A will be selected first, as it contributes the maximum profit among all the items.

After selecting item A, no more items can be selected. Hence, for this given set of items the total profit is 24. The optimal solution, however, is achieved by selecting items B and C, where the total profit is 18 + 18 = 36.
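A table-filling dynamic-programming sketch (in Python; the function name is ours) recovers the optimal profit of 36 for the items above:

```python
def knapsack(values, weights, W):
    """0/1 knapsack: table[i][w] holds the best profit achievable using the
    first i items within capacity w. Each subproblem is solved exactly once."""
    n = len(values)
    table = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            table[i][w] = table[i - 1][w]                  # option 1: skip item i
            if weights[i - 1] <= w:                        # option 2: take item i
                take = table[i - 1][w - weights[i - 1]] + values[i - 1]
                table[i][w] = max(table[i][w], take)
    return table[n][W]

# items A, B, C, D from the table above, capacity W = 25
print(knapsack([24, 18, 18, 10], [24, 10, 10, 7], 25))  # → 36
```

The table has (n+1)·(W+1) entries and each is filled in constant time, giving O(n·W) time, in contrast to the exponential cost of trying all 2^n subsets.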



Analysis of Parallel Algorithms

A generic algorithm is mainly analysed on the basis of the following parameters: the time
complexity (execution time) and the space complexity (amount of space required).

Usually we give much more importance to time complexity than to space complexity. The
subsequent sections highlight the criteria for analysing the complexity of parallel algorithms.
The fundamental parameters required for the analysis of parallel algorithms are as follows:

o Time Complexity
o The Total Number of Processors Required
o The Cost Involved.

Time Complexity

As it happens, most people who implement algorithms want to know how much of a particular
resource (such as time or storage) a given algorithm requires. Parallel architectures have been
designed to improve the computation power available to the various algorithms. Thus, the major
concern in evaluating an algorithm is determining the amount of time required to execute it.
Usually, the time complexity is calculated on the basis of the total number of steps executed to
accomplish the desired output.

Parallel algorithms usually divide the problem into more symmetrical or asymmetrical
subproblems, pass them to many processors, and put the results back together at the end. The
resource consumption of parallel algorithms includes both the processor cycles on each processor
and the communication overhead between the processors.

Thus, first in the computation step, the local processor performs an arithmetic and logic operation.
Thereafter, the various processors communicate with each other for exchanging messages and/or
data. Hence, the time complexity can be calculated on the basis of computational cost and
communication cost involved.

The time complexity of an algorithm varies depending upon the instance of the input for a given
problem. For example, the already sorted list (10, 17, 19, 21, 22, 33) will consume less time than
the reverse-ordered list (33, 22, 21, 19, 17, 10). The time complexity of an algorithm has been
categorised into three forms, viz:

i) Best Case Complexity;

ii) Average Case Complexity; and

iii) Worst Case Complexity.

The best case complexity is the least amount of time required by the algorithm for a given input.
The average case complexity is the average running time required by the algorithm for a given
input. Similarly, the worst case complexity can be defined as the maximum amount of time
required by the algorithm for a given input.

Therefore, the main factors involved in analysing the time complexity are the algorithm, the
parallel computer model, and the specific set of inputs. Mostly, the time complexity of an
algorithm is a function of the size of the input. The generic notation for describing the time
complexity of any algorithm is discussed in the subsequent sections.

Asymptotic Notations

These notations are used for analysing functions. Suppose we have two functions f(n) and g(n)
defined on the real numbers.

i) Theta (Θ) Notation: The set Θ(g(n)) consists of all functions f(n) for which there exist positive
constants c1, c2 such that f(n) is sandwiched between c1·g(n) and c2·g(n) for sufficiently large
values of n. In other words, Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such
that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0 }.

ii) Big O Notation: The set O(g(n)) consists of all functions f(n) for which there exists a positive
constant c such that for sufficiently large values of n we have 0 ≤ f(n) ≤ c·g(n). In other words,
O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all
n ≥ n0 }.

iii) Omega (Ω) Notation: The function f(n) belongs to the set Ω(g(n)) if there exists a positive
constant c such that for sufficiently large values of n we have 0 ≤ c·g(n) ≤ f(n). In other words,
Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all
n ≥ n0 }.

Suppose we have a function f(n) = 4n² + n; then the order of the function is O(n²). The asymptotic
notations provide information about the lower and upper bounds on the complexity of an algorithm
with the help of the Ω and O notations. For example, for comparison-based sorting the lower bound
is Ω(n ln n) and the upper bound is O(n ln n). However, problems like matrix multiplication have
complexities ranging from O(n³) down to O(n^2.38). Algorithms whose upper and lower bounds
match are known as optimal algorithms. Therefore, some sorting algorithms are optimal, while
matrix-multiplication-based algorithms are not.

The performance of a parallel algorithm can also be determined by calculating a parameter called
"speedup". Speedup is defined as the ratio of the worst case time complexity of the fastest known
sequential algorithm to the worst case running time of the parallel algorithm. Basically, speedup
measures the performance improvement of a parallel algorithm over a sequential algorithm.



Number of Processors

One of the other factors that assist in the analysis of parallel algorithms is the total number of
processors required to deliver a solution to a given problem. Thus, for a given input of size, say,
n, the number of processors required by the parallel algorithm is a function of n, usually denoted
by P(n).

Overall Cost

Finally, the total cost of the algorithm is a product of time complexity of the parallel algorithm and
the total number of processors required for computation.

Cost = Time Complexity * Total Number of Processors

The other way of defining the cost is that it specifies the total number of steps executed
collectively by all the processors, i.e., the summation of steps. Another term related to the
analysis of parallel algorithms is the efficiency of the algorithm. It is defined as the ratio of the
worst case running time of the best sequential algorithm to the cost of the parallel algorithm. The
efficiency is normally less than or equal to 1. If the efficiency is much less than 1, it means that
the cost of the parallel algorithm greatly exceeds the running time of the sequential algorithm,
i.e., much of the processors' time is being wasted.
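These definitions reduce to simple arithmetic. A worked sketch (the running times below are made-up numbers for illustration only):

```python
def speedup(t_seq, t_par):
    """Speedup = best sequential running time / parallel running time."""
    return t_seq / t_par

def cost(t_par, processors):
    """Cost = parallel running time * total number of processors."""
    return t_par * processors

def efficiency(t_seq, t_par, processors):
    """Efficiency = best sequential running time / cost of the parallel algorithm."""
    return t_seq / cost(t_par, processors)

# hypothetical figures: the best sequential algorithm takes 64 s,
# and the parallel algorithm takes 10 s on 8 processors
print(speedup(64, 10))        # → 6.4
print(cost(10, 8))            # → 80
print(efficiency(64, 10, 8))  # → 0.8
```

Here the 8 processors deliver a 6.4x speedup rather than the ideal 8x, so the efficiency is 6.4/8 = 0.8, i.e., 20% of the processors' collective time is overhead.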
