
Revision

Lecture 12
February 14, 2024
Broadcast Algorithms in MPICH
• Short messages
• < MPIR_CVAR_BCAST_SHORT_MSG_SIZE
• Binomial
• Medium messages
• Scatter + Allgather (Recursive doubling)
• Large messages
• > MPIR_CVAR_BCAST_LONG_MSG_SIZE
• Scatter + Allgather (Ring)
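
A minimal usage sketch (mine, not from the slides): the algorithm selection above happens inside the library, so the calling code is identical for all message sizes; the cutoffs named above are MPICH CVARs, typically settable as environment variables of the same name.

/* Every rank calls MPI_Bcast with the same arguments; rank 0 is the root. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, data[4] = {0, 0, 0, 0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) { data[0] = 1; data[1] = 4; data[2] = 10; data[3] = 7; }
    MPI_Bcast(data, 4, MPI_INT, 0, MPI_COMM_WORLD);   /* all ranks now hold rank 0's data */
    printf("rank %d: %d %d %d %d\n", rank, data[0], data[1], data[2], data[3]);
    MPI_Finalize();
    return 0;
}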

2
Old vs. New MPI_Bcast

Van de Geijn

3
Reduce on 64 nodes

4
Allgather – Ring Algorithm
• Every process sends to and receives from everyone else
• Assume p processes and total n bytes
• Every process sends and receives n/p bytes
• Time
• (p – 1) * (L + n/p*(1/B))
• How can we improve?
[Figure: 4 processes (0–3), each starting with an n/p-size block, circulating blocks around the ring until every process holds 1 4 10 1 7 3 9 15]
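
A sketch of the ring exchange (my illustration, assuming each rank contributes a single int, so n/p is one element): in step s each rank forwards the block it received in the previous step to its right neighbour and receives a new block from its left neighbour; after p – 1 steps every rank holds all p blocks, which gives the (p – 1) * (L + n/p*(1/B)) cost above.

/* Ring allgather sketch: 'all' has room for p ints, 'myval' is this rank's block. */
void ring_allgather(int myval, int *all, int p, int rank, MPI_Comm comm) {
    int left  = (rank - 1 + p) % p;
    int right = (rank + 1) % p;
    all[rank] = myval;
    for (int s = 1; s < p; s++) {
        int send_idx = (rank - s + 1 + p) % p;   /* block received in the previous step (own block when s = 1) */
        int recv_idx = (rank - s + p) % p;       /* block now arriving from the left neighbour */
        MPI_Sendrecv(&all[send_idx], 1, MPI_INT, right, 0,
                     &all[recv_idx], 1, MPI_INT, left,  0,
                     comm, MPI_STATUS_IGNORE);
    }
}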

5
Non-blocking Point-to-Point

• MPI_Isend (buf, count, datatype, dest, tag, comm, request)


• MPI_Irecv (buf, count, datatype, source, tag, comm, request)

• MPI_Wait (request, status)


• MPI_Waitall (count, request, status)
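
A minimal sketch of these calls (mine), assuming exactly two ranks exchanging one int: both requests are posted up front, independent work can run in between, and MPI_Waitall completes them.

/* Post the receive and the send, then complete both requests together. */
int sendbuf = rank, recvbuf = -1;
int partner = 1 - rank;                     /* assumes exactly 2 ranks */
MPI_Request req[2];

MPI_Irecv(&recvbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &req[0]);
MPI_Isend(&sendbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &req[1]);
/* ... independent computation can run here ... */
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);   /* buffers are safe to reuse only after this */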

6
Many-to-one Non-blocking P2P

7
Non-blocking Performance
• Standard does not require overlapping communication and
computation
• Implementation may use a thread to move data in parallel
• Implementation can delay the initiation of data transfer until “Wait”
• MPI_Test – non-blocking, tests completion, starts progress
• MPIR_CVAR_ASYNC_PROGRESS (MPICH)
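
A sketch of driving progress by hand (my example; bigbuf, n, dest, nchunks and do_local_work are placeholders): calling MPI_Test between chunks of computation gives the library a chance to move data even when no asynchronous progress thread is running.

MPI_Request req;
int done = 0;
MPI_Isend(bigbuf, n, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req);
for (int chunk = 0; chunk < nchunks; chunk++) {
    do_local_work(chunk);                      /* placeholder compute routine */
    MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* non-blocking completion check; also makes progress */
}
if (!done) MPI_Wait(&req, MPI_STATUS_IGNORE);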

8
Non-blocking Point-to-Point Safety
• MPI_Isend (buf, count, datatype, dest, tag, comm, request)
• MPI_Irecv (buf, count, datatype, source, tag, comm, request)
• MPI_Wait (request, status)

    Rank 0        Rank 1
    MPI_Isend     MPI_Isend
    MPI_Recv      MPI_Recv      → Safe
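
The same pattern in code (my sketch, two ranks exchanging one int): because the send is non-blocking, neither rank blocks before posting its receive, so the exchange cannot deadlock the way two large blocking MPI_Send calls might.

int partner = 1 - rank;                      /* assumes exactly 2 ranks */
MPI_Request req;
MPI_Isend(&sendbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &req);
MPI_Recv (&recvbuf, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Wait (&req, MPI_STATUS_IGNORE);          /* sendbuf may be reused only after the wait */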

9
Mesh Interconnect

• Diameter 2(√p – 1)
• Bisection width √p
• Cost 2(p – √p)
10
Torus Interconnect

• Diameter 2(√p/2)
• Bisection width 2√p
• Cost 2p
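
A worked check of the two topologies (my arithmetic, taking p = 64 so that √p = 8):

$\text{Mesh: } 2(\sqrt{p}-1) = 14,\quad \sqrt{p} = 8,\quad 2(p-\sqrt{p}) = 112$
$\text{Torus: } 2(\sqrt{p}/2) = 8,\quad 2\sqrt{p} = 16,\quad 2p = 128$

i.e. the torus halves the diameter and doubles the bisection width at the price of 2√p extra wraparound links.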
11
Parallelization
Parallelization Steps
1. Decomposition of computation into tasks
• Identifying portions of the work that can be performed concurrently
2. Assignment of tasks to processes
• Assigning concurrent pieces of work onto multiple processes running in parallel
3. Orchestration of data access, communication and synchronization among processes
• Distributing the data associated with the program
• Managing access to data shared by multiple processes
• Synchronizing at various stages of the parallel program execution
4. Mapping of processes to processors
• Placement of processes in the physical processor topology

13
Illustration of Parallelization Steps
Expose enough concurrency

Source: Culler et al. book


14
Performance Goals
• Expose concurrency
• Reduce inter-process communications
• Load-balance
• Reduce synchronization
• Reduce idling
• Reduce management overhead
• Preserve data locality
• Exploit network topology

15
Matrix Vector Multiplication – Decomposition
P=3?
Decomposition: identifying portions of the work that can be performed concurrently
Assignment
[Figure: the matrix rows partitioned into three row blocks, one per process P1, P2, P3]

16
Matrix Vector Multiplication – Orchestration
P=3

Decomposition
Assignment
Orchestration
• Allgather/Bcast
• Scatter
• Gather
• Initial communication
• Distribute (read by process 0) or parallel reads
• Final communication
[Figure: row blocks P1, P2, P3 as in the decomposition slide]
17
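
A minimal sketch of this orchestration (my code; assumes n is divisible by p, A and y significant on rank 0, x an n-element buffer on every rank, and <mpi.h> plus <stdlib.h> included):

/* Row-wise y = A*x: scatter row blocks of A, broadcast x, compute locally, gather y. */
void matvec_rowwise(const double *A, double *x, double *y, int n, int p) {
    int rows = n / p;
    double *Aloc = malloc((size_t)rows * n * sizeof(double));
    double *yloc = malloc((size_t)rows * sizeof(double));

    MPI_Scatter(A, rows * n, MPI_DOUBLE, Aloc, rows * n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(x, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);   /* every rank needs the whole vector */

    for (int i = 0; i < rows; i++) {
        yloc[i] = 0.0;
        for (int j = 0; j < n; j++)
            yloc[i] += Aloc[(size_t)i * n + j] * x[j];
    }
    MPI_Gather(yloc, rows, MPI_DOUBLE, y, rows, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    free(Aloc); free(yloc);
}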
Distribute using Bcast vs. Allgather

18
Bcast vs. Allgather

19
Bcast vs. Allgather

20
Matrix Vector Multiplication – Column-wise Decomposition

Decomposition
Assignment
Orchestration
• Reduce
[Figure: the matrix columns partitioned into three column blocks P1, P2, P3]

Row-wise vs. column-wise partitioning?
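
A sketch of the column-wise orchestration (my code; each rank is assumed to already hold its cols = n/p columns as Acol, stored n x cols row-major, plus the matching slice xloc): every rank produces a full-length partial result and MPI_Reduce sums the partials into y on rank 0.

void matvec_colwise(const double *Acol, const double *xloc,
                    double *y, int n, int cols) {
    double *ypart = calloc(n, sizeof(double));         /* full-length partial result */
    for (int j = 0; j < cols; j++)
        for (int i = 0; i < n; i++)
            ypart[i] += Acol[(size_t)i * cols + j] * xloc[j];
    MPI_Reduce(ypart, y, n, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    free(ypart);
}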


21
1D Domain Decomposition

N grid points
P processes
N/P points per process
Nearest neighbor communications: 2 sends(), 2 recvs() per process
#Communications? 2
#Computations? N/P
Communication to computation ratio = 2P/N
[Figure: a 1D line of grid points partitioned among processes P1, P2, P3, P4]
22
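
A sketch of the 2 sends / 2 recvs per process (my code; u holds the local points in u[1..local_n] with ghost cells u[0] and u[local_n+1], and rank, p, local_n are assumed already set up):

int left  = (rank > 0)     ? rank - 1 : MPI_PROC_NULL;   /* boundary ranks talk to MPI_PROC_NULL (a no-op) */
int right = (rank < p - 1) ? rank + 1 : MPI_PROC_NULL;

MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,   /* send first owned point to the left */
             &u[local_n + 1], 1, MPI_DOUBLE, right, 0,   /* receive the right ghost cell       */
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Sendrecv(&u[local_n],     1, MPI_DOUBLE, right, 1,   /* send last owned point to the right */
             &u[0],           1, MPI_DOUBLE, left,  1,   /* receive the left ghost cell        */
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);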
1D Domain Decomposition
N grid points
P processes
N/P points per process
#Communications? 2√N (assuming square grid)
#Computations? N/P (assuming square grid)
Communication to computation ratio = ?
[Figure: a 2D domain of grid points split into horizontal strips, one per process]
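
Filling in the ratio asked above (my derivation): each strip process communicates 2√N points and computes on N/P points, so

$\dfrac{\text{communication}}{\text{computation}} = \dfrac{2\sqrt{N}}{N/P} = \dfrac{2P}{\sqrt{N}}$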
23
2D Domain decomposition

N grid points (√N x √N grid)
P processes (√P x √P grid)
N/P points per process

+ Several parallel communications


+ Lower communication volume/process
24
2D Domain decomposition

2 Sends(), 2 Recvs() per process

#Communications?
2√N/√P (assuming square grid)
#Computations?
N/P (assuming square grid)

Communication to computation ratio=?
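
My worked answer to the question above: two boundaries of √N/√P points each against N/P local points gives

$\dfrac{\text{communication}}{\text{computation}} = \dfrac{2\sqrt{N}/\sqrt{P}}{N/P} = \dfrac{2\sqrt{P}}{\sqrt{N}}$

which is a factor of √P smaller than the 2P/√N ratio of the 1D (strip) decomposition.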


25
Stencils

Five-point stencil
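
One sweep of the five-point stencil in code (my sketch, Jacobi-style; u and unew are local (nx+2) x (ny+2) row-major arrays whose ghost layers have already been filled by the halo exchange):

#define IDX(i, j) ((i) * (ny + 2) + (j))
for (int i = 1; i <= nx; i++)
    for (int j = 1; j <= ny; j++)
        unew[IDX(i, j)] = 0.25 * (u[IDX(i-1, j)] + u[IDX(i+1, j)] +
                                  u[IDX(i, j-1)] + u[IDX(i, j+1)]);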

26
2D Domain decomposition

N grid points (√N x √N grid)
P processes (√P x √P grid)
N/P points per process
4 Sends(), 4 Recvs() per process

#Communications?
4√N/√P (assuming square grid)
#Computations?
N/P (assuming square grid)

Communication to computation ratio=?
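
My worked answer for the five-point stencil case: four boundaries of √N/√P points each, so

$\dfrac{\text{communication}}{\text{computation}} = \dfrac{4\sqrt{N}/\sqrt{P}}{N/P} = \dfrac{4\sqrt{P}}{\sqrt{N}}$

still shrinking as the local block N/P grows.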

27
Send / Recv

MPI_Send → MPI_Recv
[Figure: data items 0 1 2 3 and 4 5 6 7 sent with MPI_Send and received with MPI_Recv into one contiguous buffer 0 1 2 3 4 5 6 7]

28
Send / Recv

Sender: MPI_Pack (buf), then MPI_Send (buf)
Receiver: MPI_Recv (buf), then MPI_Unpack (buf)
[Figure: data items 0 1 2 3 and 4 5 6 7 packed into one buffer, sent, and unpacked into a contiguous buffer 0 1 2 3 4 5 6 7]

29
MPI_Pack

int MPI_Pack (const void *inbuf, int incount, MPI_Datatype datatype,
              void *outbuf, int outsize, int *position, MPI_Comm comm)
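
A minimal sketch of the Pack/Send and Recv/Unpack sequence from the previous slide (my code; dest and source are placeholder ranks, and a robust version would size the buffers with MPI_Pack_size):

/* Sender: pack two non-contiguous 4-int rows into one buffer, send as MPI_PACKED. */
int rowA[4] = {0, 1, 2, 3}, rowB[4] = {4, 5, 6, 7};
char buf[64];
int pos = 0;
MPI_Pack(rowA, 4, MPI_INT, buf, (int)sizeof(buf), &pos, MPI_COMM_WORLD);
MPI_Pack(rowB, 4, MPI_INT, buf, (int)sizeof(buf), &pos, MPI_COMM_WORLD);
MPI_Send(buf, pos, MPI_PACKED, dest, 0, MPI_COMM_WORLD);

/* Receiver: receive the packed buffer, then unpack into one contiguous array. */
char rbuf[64];
int out[8], rpos = 0;
MPI_Recv(rbuf, (int)sizeof(rbuf), MPI_PACKED, source, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Unpack(rbuf, (int)sizeof(rbuf), &rpos, out, 8, MPI_INT, MPI_COMM_WORLD);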
