Parallel Programming Concepts
Parallel Algorithms
Peter Tröger
Sources:
• Ian Foster. Designing and Building Parallel Programs. Addison-Wesley. 1995.
• Mattson, Timothy G.; Sanders, Beverly A.; Massingill, Berna L.: Patterns for Parallel
Programming (Software Patterns Series). 1st ed. Addison-Wesley Professional, 2004.
• Breshears, Clay: The Art of Concurrency: A Thread Monkey's Guide to Writing
Parallel Applications. O'Reilly Media, Inc., 2009.
Why Parallel ?
• P is the portion of the program that benefits from parallelization
• Amdahl's Law (1967)
• Maximum speedup $s_{\mathrm{Amdahl}}$ with N processors: $s_{\mathrm{Amdahl}} = \dfrac{1}{(1-P) + \frac{P}{N}}$
• Largest impact of parallelization with small N and / or small (1-P)
• Speedup gained by increasing N is limited
• Gustafson's Law (1988)
• Maximum speedup $s_{\mathrm{Gustafson}}$ with N processors: $s_{\mathrm{Gustafson}} = \dfrac{(1-P)\,N + N \cdot P\,N}{(1-P)\,N + P\,N} = (1-P) + P \cdot N$
• Assumption: the problem size grows with N, so the inherently serial portion becomes a smaller proportion of the overall problem
• Neglecting the parallelization overhead, the speedup can grow linearly with N (see the sketch below)
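To make the two bounds concrete, here is a minimal C sketch (illustrative values only, not taken from the slides) that evaluates both formulas for an assumed parallel portion of P = 0.9:

```c
#include <stdio.h>

/* Illustrative only: evaluates both speedup formulas for an assumed
   parallel portion P and several processor counts N. */
static double amdahl(double p, double n)    { return 1.0 / ((1.0 - p) + p / n); }
static double gustafson(double p, double n) { return (1.0 - p) + p * n; }

int main(void) {
    double p = 0.90;                      /* assumed parallel portion */
    double n[] = {4, 16, 64, 1024};
    for (int i = 0; i < 4; i++)
        printf("N=%6.0f  Amdahl=%6.2f  Gustafson=%8.2f\n",
               n[i], amdahl(p, n[i]), gustafson(p, n[i]));
    return 0;   /* Amdahl saturates near 1/(1-P) = 10, Gustafson keeps growing with N */
}
```

For P = 90%, Amdahl's bound never exceeds 10 regardless of N, while Gustafson's scaled speedup grows roughly linearly with N.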
Amdahl's Law
[Figure: speedup vs. number of processors (1 to 10^4) for parallel portions P = 10%, 25%, 50%, 75%, 90%, 95%]
Parallel Algorithms and Design Patterns
• Vast body of knowledge in books and scientific publications
• Typically discussion based on abstract machine model (e.g. PRAM),
to allow theoretical complexity analysis
• Rule of thumb: Somebody else is smarter than you - reuse !!
• Jaja, Joseph: An Introduction to Parallel Algorithms. Redwood City, CA, USA: Addison-Wesley Longman Publishing Co., Inc., 1992. ISBN 0-201-54856-9
• Herlihy, Maurice; Shavit, Nir: The Art of Multiprocessor Programming. Morgan Kaufmann, 2008. ISBN 978-0123705914
• ParaPLoP - Workshop on Parallel Programming Patterns
• 'Our Pattern Language' (http://parlab.eecs.berkeley.edu/wiki/patterns/)
• Programming language support libraries
Distributed Algorithms [Lynch]
• Originally only for concurrent algorithms across geographically
distributed processors
• Attributes
• IPC method (shared memory, point-to-point, broadcast, RPC)
• Timing model (synchronous, partially synchronous, asynchronous)
• Fault model
• Problem domain
• Have to deal with uncertainties
• Unknown number of processors, unknown network topology, inputs at
different locations, non-synchronized code execution, processor
nondeterminism, uncertain message delivery times, unknown message
ordering, processor and communication failures, ...
Designing Parallel Algorithms [Breshears]
• Parallel solution must keep sequential consistency property
• "Mentally simulate" the execution of parallel streams on the candidate parts of the
sequential application
• Amount of computation per parallel task must offset the overhead that is always
introduced by moving from serial to parallel code
• Granularity: Amount of computation done before synchronization is needed
• Trade-off: overhead of fine-grained granularity vs.
limited concurrency of coarse-grained granularity
• Iterative approach of finding the right granularity
• Decision might be correct only for the execution host under test
• Execution order dependency vs. data dependency
Designing Parallel Algorithms [Foster]
• Translate problem specification into an algorithm achieving concurrency,
scalability, and locality
• Best parallel solution typically differs massively from the sequential version
• Four distinct stages of a methodological approach
• Search for concurrency and scalability:
• 1) Partitioning - decompose computation and data into small tasks
• 2) Communication - define necessary coordination of task execution
• Search for locality and other performance-related issues:
• 3) Agglomeration - consider performance and implementation costs
• 4) Mapping - maximize processor utilization, minimize communication
• Might require backtracking or parallel investigation of steps
Partitioning Step
• Expose opportunities for parallel execution - fine-grained decomposition
• Good partition keeps computation and data together
• Domain / data decomposition - deal first with partitioning the data
• Functional / task decomposition - deal first with partitioning the computation
• Complementary approaches that can lead to different algorithm versions
• Complementary views on the problem can reveal hidden structures of the algorithm
with parallelization potential
• Avoid replication of either computation or data; this can be revised later to reduce
communication overhead
• Step results in multiple candidate solutions
Partitioning - Decomposition Types
• Domain Decomposition
• Define small data fragments, then specify
computation for them
• Different phases of computation on the
same data are handled separately
• Rule of thumb: First focus on large or
frequently used data structures
• Functional Decomposition
• Split up computation into disjoint tasks,
ignore the data accessed for the moment
• Example: Producer / consumer (see the sketch after this list)
• With significant data overlap, domain
decomposition is more appropriate
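As a minimal sketch of the producer / consumer example above (hypothetical names; a single-slot buffer is used only for brevity), the computation is split into two disjoint tasks that share nothing but a small buffer:

```c
#include <pthread.h>
#include <stdio.h>

/* Functional decomposition sketch: one producer task, one consumer task,
   coordinated via a single-slot buffer protected by a mutex + condition. */
static int buffer, items = 0, done = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= 5; i++) {
        pthread_mutex_lock(&lock);
        while (items == 1) pthread_cond_wait(&cond, &lock); /* wait for free slot */
        buffer = i; items = 1;                              /* produce a value    */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = 1;                                               /* no more items      */
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (items == 0 && !done) pthread_cond_wait(&cond, &lock);
        if (items == 0 && done) { pthread_mutex_unlock(&lock); break; }
        printf("consumed %d\n", buffer); items = 0;         /* consume the value  */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

The two tasks exchange only small values; with significant data overlap between tasks, domain decomposition would be the better fit, as noted above.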
Partitioning Strategies [Breshears]
• Loop parallelization (see the sketch after this list)
• Reason about the code's behavior if the loop were executed backwards -
unchanged results are a strong indicator for independent iterations
• Produce at least as many tasks as there will be threads / cores
• But: It might be more effective to use only a fraction of the cores (granularity)
• The computation part must pay off with respect to the parallelization overhead
• Avoid synchronization, since its overhead adds to the serial execution time
• Patterns for data decomposition: by element, by row, by column group,
by block
• Influenced by surface-to-volume ratio
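A minimal OpenMP sketch of loop parallelization (array contents and sizes are made up): the iterations are independent, so running the loop backwards would give the same result, and the static schedule keeps the granularity coarse (one contiguous chunk per thread):

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    for (int i = 0; i < N; i++) b[i] = i;

    #pragma omp parallel for schedule(static)   /* one chunk per thread: coarse granularity */
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + 1.0;                /* no dependency between iterations */

    printf("a[42] = %.1f (using up to %d threads)\n", a[42], omp_get_max_threads());
    return 0;
}
```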
Partitioning - Checklist
• Checklist for resulting partitioning scheme
• Order of magnitude more tasks than processors ?
-> Keeps flexibility for next steps
• Avoidance of redundant computation and storage requirements ?
-> Scalability for large problem sizes
• Tasks of comparable size ?
-> Goal to allocate equal work to processors
• Does the number of tasks scale with the problem size ?
-> The algorithm should be able to solve larger problems with more processors
• Resolve bad partitioning by estimating performance behavior,
and, if necessary, reformulating the problem
Communication Step
• Specify links between data consumers and data producers
• Specify kind and number of messages on these links
• Domain decomposition problems might have tricky communication
infrastructures, due to data dependencies
• Communication in functional decomposition problems can easily be modeled
from the data flow between the tasks
• Categorization of communication patterns
• Local communication (few neighbors) vs. global communication
• Structured communication (e.g. tree) vs. unstructured communication
• Static vs. dynamic communication structure
• Synchronous vs. asynchronous communication
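As an illustration of local, structured, static communication, here is a minimal MPI sketch (an assumed setup, not from the slides) in which every task talks only to its ring neighbours:

```c
#include <stdio.h>
#include <mpi.h>

/* Sketch of local communication: each task passes one value to its right
   neighbour in a ring and receives one from its left neighbour. */
int main(int argc, char **argv) {
    int rank, size, send, recv;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    send = rank;                                 /* value to pass along   */
    int right = (rank + 1) % size;               /* static ring topology  */
    int left  = (rank + size - 1) % size;

    /* Combined send+receive avoids the deadlock a naive ordering of
       blocking MPI_Send / MPI_Recv calls could cause. */
    MPI_Sendrecv(&send, 1, MPI_INT, right, 0,
                 &recv, 1, MPI_INT, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("task %d received %d from task %d\n", rank, recv, left);
    MPI_Finalize();
    return 0;
}
```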
Communication - Hints
• Distribute computation and communication, don't centralize the algorithm
• Bad example: A central manager for parallel reduction (see the sketch after this list)
• Divide-and-conquer helps as mental model to identify concurrency
• Unstructured communication is hard to agglomerate, better avoid it
• Checklist for communication design
• Do all tasks perform the same amount of communication ?
-> Distribute or replicate communication hot spots
• Does each task perform only local communication ?
• Can communication happen concurrently ?
• Can computation happen concurrently ?
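A sketch of the distributed alternative to a central manager (array values are made up): with OpenMP's reduction clause, every thread accumulates a private partial sum and the runtime combines the partial results at the end, instead of funnelling every value through one task:

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0;

    double sum = 0.0;
    /* Each thread accumulates into a private copy of sum; the copies are
       combined by the runtime when the parallel loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.0f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```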
Ghost Cells
• Domain decomposition might lead to chunks that demand data
from each other for their computation
• Solution 1: Copy the necessary portion of data ('ghost cells'), as sketched after this list
• Feasible if no synchronization is needed after the update
• Data amount and update frequency influence the resulting overhead and efficiency
• Additional memory consumption
• Solution 2: Access relevant data 'remotely' as needed
• Delays thread coordination until the data is really needed
• Correctness ("old" data vs. "new" data) must be considered as the parallel computation progresses
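A minimal C sketch of Solution 1 (chunk size, values, and function names are assumptions): each chunk keeps one ghost cell per side holding a copy of its neighbour's boundary value, so the subsequent stencil update needs no further coordination:

```c
#include <stdio.h>

#define CHUNK 8   /* interior points per chunk (illustrative size) */

/* Each chunk stores its interior plus one ghost cell on each side,
   holding a copy of the neighbouring chunk's boundary value. */
typedef struct { double cell[CHUNK + 2]; } chunk_t;

/* Copy the boundary values of the neighbours into the ghost cells. */
static void exchange_ghosts(chunk_t *left, chunk_t *right) {
    left->cell[CHUNK + 1] = right->cell[1];     /* right neighbour's first interior point */
    right->cell[0]        = left->cell[CHUNK];  /* left neighbour's last interior point   */
}

/* 3-point stencil: every interior update reads only local data + ghosts. */
static void smooth(const chunk_t *c, chunk_t *out) {
    for (int i = 1; i <= CHUNK; i++)
        out->cell[i] = (c->cell[i - 1] + c->cell[i] + c->cell[i + 1]) / 3.0;
}

int main(void) {
    chunk_t a = {{0}}, b = {{0}}, a2, b2;
    for (int i = 1; i <= CHUNK; i++) { a.cell[i] = i; b.cell[i] = CHUNK + i; }

    exchange_ghosts(&a, &b);   /* Solution 1: copy the needed border data   */
    smooth(&a, &a2);           /* afterwards both chunks can be updated     */
    smooth(&b, &b2);           /* independently, e.g. by different threads  */

    printf("a2[%d] = %f\n", CHUNK, a2.cell[CHUNK]);
    return 0;
}
```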
Agglomeration Step
• The algorithm so far is correct, but not yet specialized for a particular execution environment
• Revisit the partitioning and communication decisions
• Agglomerate tasks for more efficient execution on a given machine
• Replicate data and / or computation for efficiency reasons
• Resulting number of tasks can still be greater than the number of processors
• Three conflicting guiding decisions
• Reduce communication costs by coarser granularity of computation
and communication
• Preserve flexibility with respect to later mapping decisions
• Reduce software engineering costs (serial -> parallel version)
Agglomeration [Foster]
[Figure from Foster: agglomeration examples]
Agglomeration - Granularity vs. Flexibility
• Reduce communication costs by coarser granularity
• Sending less data
• Sending fewer messages (per-message initialization costs)
• Agglomerate tasks, especially if they cannot run concurrently anyway
• Also reduces task creation costs
• Replicate computation to avoid communication (helps also with reliability)
• Preserve flexibility
• A flexible, large number of tasks remains a prerequisite for scalability
• Define granularity as a compile-time or run-time parameter (see the sketch below)
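A small sketch of the last point (default values and names are made up): the chunk size handed to OpenMP's dynamic schedule is read at run time, so the granularity can be tuned per execution host without touching the code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 20)

int main(int argc, char **argv) {
    int chunk = (argc > 1) ? atoi(argv[1]) : 1024;  /* run-time granularity parameter */
    static double a[N];

    #pragma omp parallel for schedule(dynamic, chunk)  /* chunk iterations per work unit */
    for (int i = 0; i < N; i++)
        a[i] = (double)i * 0.5;

    printf("chunk=%d, a[N-1]=%f\n", chunk, a[N - 1]);
    return 0;
}
```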
Agglomeration - Checklist
• Communication costs reduced by increasing locality ?
• Does the benefit of replicated computation outweigh its costs in all cases ?
• Does data replication restrict the range of problem sizes / processor counts ?
• Do the larger tasks still have similar computation / communication costs ?
• Do the larger tasks still provide sufficient concurrency ?
• Does the number of tasks still scale with the problem size ?
• How much can the task count decrease, without disturbing load balancing,
scalability, or engineering costs ?
• Is the transition to parallel code worth the engineering costs ?
Mapping Step
• Only relevant for distributed systems, since shared memory systems typically
perform automatic task scheduling
• Minimize execution time by
• Placing concurrent tasks on different nodes
• Placing tasks with heavy communication on the same node
• Conflicting strategies, additionally restricted by resource limits
• In general, NP-complete bin packing problem
• Set of sophisticated (dynamic) heuristics for load balancing
• Preference for local algorithms that do not need global scheduling state
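A sketch of one simple static load-balancing heuristic (not taken from the slides): tasks are assigned, heaviest first, to the currently least-loaded node, a greedy approximation to the bin-packing view above; the task costs are made-up example values:

```c
#include <stdio.h>

#define TASKS 8
#define NODES 3

int main(void) {
    double cost[TASKS] = {9, 7, 6, 5, 4, 3, 2, 1};   /* example costs, sorted descending */
    double load[NODES] = {0};

    for (int t = 0; t < TASKS; t++) {
        int best = 0;
        for (int n = 1; n < NODES; n++)              /* find the least-loaded node */
            if (load[n] < load[best]) best = n;
        load[best] += cost[t];                       /* map task t onto that node  */
        printf("task %d (cost %.0f) -> node %d\n", t, cost[t], best);
    }
    for (int n = 0; n < NODES; n++)
        printf("node %d load: %.0f\n", n, load[n]);
    return 0;
}
```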
Surface-To-Volume Effect [Foster, Breshears]
• Communication requirements of a task are proportional to the surface of the
data part it operates upon - the amount of 'borders' on the data
• Computational requirements of a task are proportional to the volume of the
data part it operates upon - granularity of decomposition
• Communication / computation ratio decreases for increasing data size per task
• Better to have coarse granularity by agglomerating tasks in all dimensions
• For a given volume (computation), the surface area (communication) then goes down -> good
Surface-to-Volume Effect [Foster]
• Computation on 8x8 grid
• (a): 64 tasks, one point each
• 64 x 4 = 256 communications, 256 data values transferred
• (b): 4 tasks, 16 points each
• 4 x 4 = 16 communications, 16 x 4 = 64 data values transferred
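The arithmetic of this example can be reproduced with a short sketch (assuming, as the slide does, four neighbours per task and ignoring grid-boundary effects):

```c
#include <stdio.h>

/* Split a size x size grid into tasks of block x block points, each
   exchanging one value per border point with 4 neighbours. */
static void surface_to_volume(int size, int block) {
    int tasks    = (size / block) * (size / block);
    int messages = tasks * 4;                 /* 4 messages per task            */
    int values   = tasks * 4 * block;         /* block values per message       */
    int compute  = block * block;             /* "volume" (work) per task       */
    printf("%dx%d blocks: %2d tasks, %3d messages, %3d values, "
           "comm/comp per task = %.2f\n",
           block, block, tasks, messages, values, (4.0 * block) / compute);
}

int main(void) {
    surface_to_volume(8, 1);   /* (a): 64 tasks, 256 messages, 256 values */
    surface_to_volume(8, 4);   /* (b):  4 tasks,  16 messages,  64 values */
    return 0;
}
```

Increasing the block size from 1 to 4 cuts the transferred data from 256 to 64 values, while the total computation stays the same.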