Patterns for Parallel Programming
Timothy Mattson, Beverly Sanders, Berna Massingill,
Patterns for Parallel Programming, Addison-Wesley Professional, 2004
ISBN-13: 978-0321228116
© Gethin Williams 2010
Sticking Plaster Pitfall
“Could you just tweak my serial code to
make it run in parallel?”
Why Bother With This Book?
● Recipe based
● Recipes guide our thinking
● Help us not to forget
● Introduces recurrent themes and terminology
● e.g. (memory) latency, “loop parallelism”
● Emphasises design
● Amdahl's law highlights the pitfalls of looking for
sticking-plaster speed-ups in serial programs –
design for concurrency
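To make that concrete, here is Amdahl's law in its standard form, with an illustrative worked example of my choosing: if a fraction P of the runtime can be parallelised over N processors, the best possible speed-up is

    S(N) = 1 / ((1 - P) + P/N)

With P = 0.9 and N = 16 this gives S = 1 / (0.1 + 0.05625) = 6.4, and no number of processors can push it past 1 / (1 - P) = 10. Hence: design for concurrency rather than bolting parallelism onto a serial code.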
Familiar Mantras - only more so
Flexibility: environments will be more heterogeneous.
Efficiency: we're going parallel for a speed-up, right? But there are more pitfalls (latency, thread overheads, etc.).
Simplicity: parallel codes will be more complicated - all the more reason to strive for maintainable, understandable programs.
Four Design Spaces
Finding Concurrency
Algorithm Structure
Supporting Structures
Implementation Mechanisms
Finding Concurrency
● Decomposition: Task Decomposition, Data Decomposition
● Dependency Analysis: Group Tasks, Order Tasks, Data Sharing
● Design Evaluation
Examples
● HPC: A Climate Model
● Embedded Systems: A Speech Recogniser
● The Cloud: Document Search
Highlights the fact that parallel programming is
emerging everywhere..
Task vs. Data Decomposition
Decomposition splits into Task Decomposition and Data Decomposition.
Data Decomposition (trad. HPC):
A Climate Model
Data Parallel over grid cells
Task Decomposition (Embedded):
A Speech Recogniser
Acoustic Analysis: concurrency in stages and components
Pattern Matching: search over many possible word matches
Finding Relationships between
Concurrent Tasks
Dependency Analysis
Group Tasks
Order Tasks
Data Sharing
Dependency Analysis
Worked example: pattern matching in speech recognition (figure: a small search graph of partial matches with associated costs).
● group tasks: the pattern-matching searches over possible word matches
● order tasks: branch & bound, ordered by match 'cost'
● data sharing: the queue of partial paths, the legal branches, the acoustic input
Algorithm Structure
Organise by Tasks:
● linear → Task Parallelism
● recursive → Divide and Conquer
Organise by Data Decomposition:
● linear → Geometric Decomposition
● recursive → Recursive Data
Organise by Flow of Data:
● regular → Pipeline
● irregular → Event-Based Coordination
Organise by Tasks -
Some Considerations
● No dependencies between tasks
● massively parallel (vs. embarrassingly serial!)
● Dependencies between tasks
● Temporal (e.g. speech: real-time constraints)
● Separable – 'reductions' (we'll see later)
● Cost of setting up a task vs. the amount of work it does
● Set thresholds for switching to serial work
(we'll see this in e.g. quicksort)
Organise by Tasks -
Task Parallelism
Figure: workers pop partial paths from a shared queue, extend & evaluate them, and push new partial paths back.
Shared data:
● Queue of partial paths
● Bounding criterion
Branch & bound implemented with a shared queue
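A minimal sketch of this pattern in C with OpenMP (the book leaves the implementation open; the Path struct, the two-way expansion rule, PATH_LEN and the cost model below are all invented for illustration). The shared queue, the pending-work counter and the bounding criterion are only ever touched inside a named critical section:

#include <omp.h>
#include <stdio.h>

#define MAX_PATHS 1024
#define PATH_LEN  10                 /* a path is 'complete' after 10 steps (invented) */

typedef struct { int len; double cost; } Path;   /* hypothetical partial path */

static Path   queue[MAX_PATHS];      /* shared queue (here a simple stack)     */
static int    top       = 0;         /* number of queued partial paths         */
static int    pending   = 0;         /* paths queued or still being extended   */
static double best_cost = 1e30;      /* bounding criterion: best complete path */

int main(void)
{
    queue[top++] = (Path){0, 0.0};   /* seed the search */
    pending = 1;

    #pragma omp parallel
    {
        for (;;) {
            Path p;
            int have = 0, done = 0;

            /* pop: one-at-a-time access to the shared queue */
            #pragma omp critical(bb)
            {
                if (top > 0)           { p = queue[--top]; have = 1; }
                else if (pending == 0) { done = 1; }
            }
            if (done)  break;        /* queue empty and nobody is extending    */
            if (!have) continue;     /* queue empty, but work may still appear */

            for (int b = 1; b <= 2; b++) {        /* extend & evaluate         */
                Path child = { p.len + 1, p.cost + b };
                #pragma omp critical(bb)
                {
                    if (child.len == PATH_LEN) {          /* complete path     */
                        if (child.cost < best_cost) best_cost = child.cost;
                    } else if (child.cost < best_cost) {  /* bound: prune      */
                        queue[top++] = child;
                        pending++;
                    }
                }
            }

            #pragma omp critical(bb)
            pending--;               /* this partial path is fully extended    */
        }
    }
    printf("best cost: %.1f\n", best_cost);
    return 0;
}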
Organise by Tasks -
Divide and Conquer
Figure: recursively split the problem, solve the pieces, then merge the results.
e.g. FFT for speech recognition
Sorting algorithms
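A minimal sketch of divide and conquer in C with OpenMP tasks, using quicksort as the example (the FFT case has the same split/solve/merge shape). Note the illustrative CUTOFF that switches to serial recursion once sub-problems are small - the threshold idea mentioned under 'Organise by Tasks':

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define CUTOFF 1000   /* below this size, task overhead outweighs the work (illustrative) */

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* split around a pivot, then conquer the two halves as independent tasks */
static void quicksort(int *v, int lo, int hi)
{
    if (lo >= hi) return;

    int pivot = v[hi], i = lo;                /* simple Lomuto partition */
    for (int j = lo; j < hi; j++)
        if (v[j] < pivot) swap(&v[i++], &v[j]);
    swap(&v[i], &v[hi]);

    /* only spawn tasks while the halves are big enough to be worth it */
    #pragma omp task shared(v) if (i - 1 - lo > CUTOFF)
    quicksort(v, lo, i - 1);
    #pragma omp task shared(v) if (hi - (i + 1) > CUTOFF)
    quicksort(v, i + 1, hi);
    #pragma omp taskwait
}

int main(void)
{
    enum { N = 1000000 };
    int *v = malloc(N * sizeof *v);
    for (int i = 0; i < N; i++) v[i] = rand();

    #pragma omp parallel     /* create the team of threads...        */
    #pragma omp single       /* ...one of which starts the recursion */
    quicksort(v, 0, N - 1);

    printf("v[0] = %d, v[N-1] = %d\n", v[0], v[N - 1]);
    free(v);
    return 0;
}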
Organise by Data Decomposition -
Geometric Decomposition
Grid Cells
Organise by Data Decomposition -
Geometric Decomposition
Cells assigned to a single processor.
Exchange local data with neighbours.
Organise by Data Decomposition -
Geometric Decomposition
Halo exchange
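A minimal sketch of a halo exchange in C with MPI, assuming (for brevity) a 1D block decomposition of a 1D domain with NLOCAL cells per rank; a climate model would do the same thing on 2D/3D blocks, typically with periodic neighbours east-west:

#include <mpi.h>
#include <stdio.h>

#define NLOCAL 100                        /* cells owned by each rank (illustrative) */

int main(int argc, char **argv)
{
    int rank, size;
    double u[NLOCAL + 2];                 /* u[0] and u[NLOCAL+1] are halo cells */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 1; i <= NLOCAL; i++) u[i] = rank;   /* some local data       */
    u[0] = u[NLOCAL + 1] = -1.0;                     /* boundary/default halo */

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    /* send my first owned cell left, receive my right halo from the right... */
    MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  0,
                 &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* ...and send my last owned cell right, receiving my left halo from the left */
    MPI_Sendrecv(&u[NLOCAL],     1, MPI_DOUBLE, right, 1,
                 &u[0],          1, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d: left halo %.0f, right halo %.0f\n", rank, u[0], u[NLOCAL + 1]);
    MPI_Finalize();
    return 0;
}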
Organise by Data Decomposition -
Geometric Decomposition
Benefits of halo exchange:
1. Can overlap communication &
computation
2. Computation scales with volume, but communication scales only with surface area
Q. What's wrong with the number
of grid cells here?
Bonus Q. Any issues with the grid
on a globe? Any solutions?
Pipeline
time →     1    2    3    4    5    6
stage1     t1   t2   t3   t4
stage2          t1   t2   t3   t4
stage3               t1   t2   t3   t4
(once the pipeline has filled, all stages work concurrently on different items)
Speech recognition:
1. Discrete Fourier Transform (DFT)
2. manipulation e.g. log
3. Inverse DFT
4. Truncate the resulting 'cepstrum' ..
Event-Based Coordination
Irregular events, ordering constraints (queues can be handy)
Supporting Structures
Program Structures: SPMD, Master/Worker, Loop Parallelism, Fork/Join
Data Structures: Shared Data, Shared Queue, Distributed Array
Useful idioms rather than unique implementations
Single Program Multiple Data
The same program runs on every process; behaviour branches on the rank:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) printf("MASTER\n");   /* only rank 0 takes this branch */
    else           printf("OTHER\n");    /* ranks 1, 2, ...               */
    MPI_Finalize();
    return 0;
}
• Only one program to manage
• Conditionals based on thread or process IDs
• Load balance predictable (implicit in branching)
• Plenty of examples and practice when we look at MPI
Master/Worker
Use when load balance is not predictable
and work cannot be distributed using loops
● PEs may have different capabilities
● A bag of independent tasks is ideal
● Workers take from bag, process, then take
another
Load is automatically balanced in this way
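A minimal sketch of master/worker in C with MPI (assumes at least two processes; the tags, NTASKS and the 'square the index' task are placeholders). Rank 0 holds the bag of task indices and replies to each returned result with either more work or a stop message, so faster workers naturally take more tasks:

#include <mpi.h>
#include <stdio.h>

#define NTASKS   100
#define TAG_WORK 1
#define TAG_STOP 2

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                              /* master: owns the bag of tasks */
        int next = 0, result;
        MPI_Status st;
        for (int w = 1; w < size; w++) {          /* prime every worker            */
            int tag = (next < NTASKS) ? TAG_WORK : TAG_STOP;
            MPI_Send(&next, 1, MPI_INT, w, tag, MPI_COMM_WORLD);
            if (tag == TAG_WORK) next++;
        }
        for (int done = 0; done < NTASKS; done++) {   /* reply to each result      */
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            int tag = (next < NTASKS) ? TAG_WORK : TAG_STOP;
            MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, tag, MPI_COMM_WORLD);
            if (tag == TAG_WORK) next++;
        }
    } else {                                      /* worker: take, process, repeat */
        int task, result;
        MPI_Status st;
        for (;;) {
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            result = task * task;                 /* placeholder for real work     */
            MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}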
Master/Worker (Cloud):
MapReduce
Big Data:
Google processed 20 PB/day in 2008 using MapReduce
Also used by Yahoo, Facebook, eBay, etc.
Loop Parallelism
Use if computational expense is concentrated in
loops (common in scientific code)
1. Profile code to find 'hot-spots'
2. Eliminate dependencies between iterations
(e.g. private copies & reductions)
3. Parallelise loops (easy in OpenMP)
4. Tune the performance, e.g. via scheduling
We'll get plenty of practice with OpenMP
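A minimal sketch of steps 2-3 in C with OpenMP, assuming the hot-spot is a simple accumulation loop: the temporary is private to each iteration and the sum becomes a reduction, removing the dependency between iterations; the schedule clause is one of the step-4 tuning knobs:

#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double x[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) x[i] = 1.0 / (i + 1);

    /* each thread gets a private 'tmp' and a private partial 'sum',
       combined at the end by the reduction clause                    */
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < N; i++) {
        double tmp = x[i] * x[i];     /* private temporary            */
        sum += tmp;
    }

    printf("sum = %f\n", sum);
    return 0;
}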
Fork/Join
Use if the number of concurrent tasks varies,
e.g. if tasks are created recursively
● Beware the overhead of creating new UEs
(Units of Execution, e.g. threads or processes)
● Direct vs. indirect mappings from tasks to UEs
● Sorting algorithms are an example
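A minimal sketch of fork/join in C with Pthreads, using a direct one-task-per-UE mapping and a placeholder task; the cost of pthread_create per task is exactly the overhead the bullet above warns about:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

/* placeholder task: each UE just squares its argument */
static void *work(void *arg)
{
    long id = (long)arg;
    printf("thread %ld: result %ld\n", id, id * id);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    for (long i = 0; i < NTHREADS; i++)            /* fork */
        pthread_create(&tid[i], NULL, work, (void *)i);

    for (int i = 0; i < NTHREADS; i++)             /* join */
        pthread_join(tid[i], NULL);

    return 0;
}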
Shared Data
● Try to avoid, as can limit scalability
● Use a concurrency-controlled (e.g. 'thread-
safe') data type:
● One-at-a-time: critical region/'mutex'
● Look for non-interfering operations e.g. readers vs.
writers
● If pushed, use finer-grained critical regions, but this will
increase complexity & hence the chance of a bug
● 'Shared Queue' is an instance of 'Shared Data'
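A minimal sketch of the one-at-a-time approach in C with Pthreads: a shared queue whose push and pop each take a mutex, making it safe (though not necessarily scalable) to share between UEs. The fixed-size circular buffer and the int payload are illustrative choices:

#include <pthread.h>
#include <stdio.h>

#define QSIZE 128

typedef struct {
    int buf[QSIZE];
    int head, tail, count;
    pthread_mutex_t lock;              /* one-at-a-time access ('mutex') */
} SharedQueue;

static void q_init(SharedQueue *q)
{
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* returns 0 if the queue was full */
static int q_push(SharedQueue *q, int item)
{
    int ok = 0;
    pthread_mutex_lock(&q->lock);      /* critical region                */
    if (q->count < QSIZE) {
        q->buf[q->tail] = item;
        q->tail = (q->tail + 1) % QSIZE;
        q->count++;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* returns 0 if the queue was empty */
static int q_pop(SharedQueue *q, int *item)
{
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *item = q->buf[q->head];
        q->head = (q->head + 1) % QSIZE;
        q->count--;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

int main(void)
{
    SharedQueue q;
    int x;
    q_init(&q);
    q_push(&q, 42);
    if (q_pop(&q, &x)) printf("popped %d\n", x);
    return 0;
}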
Distributed Arrays
In a nutshell: partition data and distribute so
that data is close to computation.
● Why? Memory access (esp. over a network) is
slow relative to computation.
● Simple concept but the devil is in the details
● Some terminology:
● 1D block, 2D block and block cyclic distribution
Libraries: e.g. ScaLAPACK
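A minimal sketch of the index arithmetic behind a 1D block distribution in C (block_range is a hypothetical helper; rank/size follow MPI naming). Block-cyclic distributions, as used by ScaLAPACK, deal out fixed-size blocks round-robin instead of one contiguous range per rank:

#include <stdio.h>

/* 1D block distribution: rank owns global indices [lo, hi) */
static void block_range(int N, int size, int rank, int *lo, int *hi)
{
    *lo = (int)((long)N *  rank      / size);
    *hi = (int)((long)N * (rank + 1) / size);
}

int main(void)
{
    int N = 10, size = 4;
    for (int rank = 0; rank < size; rank++) {
        int lo, hi;
        block_range(N, size, rank, &lo, &hi);
        printf("rank %d owns [%d, %d) - %d elements\n", rank, lo, hi, hi - lo);
    }
    return 0;
}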
Recap of Key Points
Design..
● for massively parallel systems
● because if not today they will be tomorrow
● and in all areas of computing
Design Patterns..
● provide useful, recurring solutions
● & structure to the process
Implementation Mechanisms
OpenMP & Pthreads
MPI
OpenCL