Distributed Memory Programming Using MPI
01 Introduction
02 Blocking and Non-Blocking Communication
03 Deadlocks
04 Splitting & Collective Communication
Programming (Computing)
Sequential programming vs. parallel programming
Serial program
1. A problem is broken into a discrete series of instructions.
2. Instructions are executed sequentially, one after another, on a single processor; only one instruction executes at any moment.
[Diagram: a problem broken into sub-problems 1–4, executed one after another on a single processor]
Parallel program
1. The use of multiple computers, processors, or cores working together on a common task.
2. Each processor works on a section of the problem.
3. Processors are allowed to exchange information (data in local memory) with other processors.
[Diagram: the problem is divided among multiple processors that work on their sections in parallel]
MPI datatypes (examples)
MPI datatype    C type
MPI_FLOAT       float
MPI_DOUBLE      double
MPI_INT         int
Process
The instance of a computer program that is being executed. It contains the program code and its state.
Group
1. A group of processes.
2. Each group is associated with a communicator.
3. The base group contains all processes and is associated with MPI_COMM_WORLD.
Communicator
An object that lets processes of the same group communicate with each other.
Rank
A unique ID for each process within a certain group.
Minimal set of routines
// initializes the MPI environment
1. int MPI_Init(int *argc, char ***argv)
• The MPI processes may run on different machines.
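A minimal sketch of how the initialization routines fit together (the hello-world printf is illustrative, not from the slides):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                       // initialize the MPI environment

    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);   // number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);   // rank of this process

    printf("Hello from process %d of %d\n", world_rank, world_size);

    MPI_Finalize();                               // clean up the MPI environment
    return 0;
}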
02
Point to Point Communication
There are two types of communication: point-to-point (which we will call P2P from now on) and collective. P2P communication is divided into two operations: Send and Receive.
The most basic form of P2P communication is blocking communication. The process sending a message waits until the receiving process has finished receiving all the information. This is the easiest form of communication, but not necessarily the fastest.
Send/receive
First, process A decides a message needs to be sent to process B. Process A then packs up all of its necessary data
into a buffer for process B. These buffers are often referred to as envelopes since the data is being packed into a
single message before transmission (similar to how letters are packed into envelopes before transmission to the post
office). After the data is packed into a buffer, the communication device (which is often a network) is responsible
for routing the message to the proper location. The location of the message is defined by the process’s rank.
Even though the message is routed to B, process B still has to acknowledge that it wants to receive A's data. Once it does this, the data has been transmitted. Process A receives acknowledgement that the data has been transmitted and may go back to work.
Example

Syntax:
MPI_Send(void* data, int count, MPI_Datatype datatype, int destination, int tag, MPI_Comm communicator)
MPI_Recv(void* data, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm communicator, MPI_Status* status)

Code:
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int number;
if (world_rank == 0) {
    number = -1;
    MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
    MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Process 1 received number %d from process 0\n", number);
}

Output: Process 1 received number -1 from process 0
MPI_Status
Blocking communication modes: Synchronous, Buffered, Ready, Standard.
Synchronous Blocking (MPI_Ssend)
Message transfer must be preceded by a sender-receiver "handshake". When the blocking synchronous send MPI_Ssend is executed, the sending task sends the receiving task a "ready to send" message. When the receiver executes the receive call, it sends a "ready to receive" message. Once both processes have successfully received the other's "ready" message, the actual data transfer can begin.
Synchronization overhead
▪ Synchronous send has more waiting, because a handshake must arrive before the send can
occur.
▪ If the receiver posts a blocking receive first, then the synchronization delay will occur on
the receiving side. Given large numbers of independent processes on disparate systems,
keeping them in sync is a challenging task and separate processes can rapidly get out of
sync causing fairly major synchronization overhead. MPI_Barrier() can be used to try to
keep nodes in sync, but probably doesn’t reduce actual overhead. MPI_Barrier() blocks
processing until all tasks have checked in (i.e., synced up), so repeatedly calling Barrier
tends to just shift synchronization overhead out of MPI_Send/Recv and into the Barrier
calls.
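A minimal sketch of the synchronous mode described above (assumes world_rank from MPI_Comm_rank as in the earlier example; the payload and tag are illustrative):

int data = 42;   // illustrative payload
if (world_rank == 0) {
    // Returns only once rank 1 has started the matching receive (handshake done).
    MPI_Ssend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
    MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}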
Buffered Blocking (MPI_Bsend)
▪ In buffered mode, the message is first copied into a user-supplied buffer and the send returns as soon as that copy completes. The sending task can then proceed with calculations that modify the original message buffer, knowing that these modifications will not be reflected in the data actually sent.
▪ Buffered mode incurs extra system overhead, because of the additional copy from the message
buffer to the user-supplied buffer. On the other hand, synchronization overhead is eliminated on the
sending task—the timing of the receive is now irrelevant to the sender. Synchronization overhead
can still be incurred by the receiving task, though, because it must block until the send has been
completed.
▪ In buffered mode, the programmer is responsible for allocating and managing the data buffer (one
per process), by using calls to MPI_Buffer_attach and MPI_Buffer_detach. This has the advantage
of providing increased control over the system, but also requires the programmer to safely manage
this space. If a buffered-mode send requires more buffer space than is available, an error will be
generated, and (by default) the program will exit.
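A minimal sketch of buffered mode (the buffer size and payload are illustrative; assumes rank 1 posts a matching MPI_Recv):

#include <stdlib.h>

int payload = 7;                                   // illustrative payload
int bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;    // room for one message plus MPI bookkeeping
char *buf = malloc(bufsize);

MPI_Buffer_attach(buf, bufsize);                   // programmer-managed buffer (one per process)
MPI_Bsend(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  // returns after copying into buf
// ... payload may be modified here without affecting the message in flight ...
MPI_Buffer_detach(&buf, &bufsize);                 // blocks until buffered messages are delivered
free(buf);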
Advantages / Disadvantages

Synchronous Blocking
  Advantages:
    ▪ Safest, therefore most portable
    ▪ No need for extra buffer space
    ▪ SEND/RECV order not critical
  Disadvantages:
    ▪ Can incur substantial synchronization overhead

Buffered Blocking
  Advantages:
    ▪ Timing of the corresponding receive is irrelevant
    ▪ Synchronization overhead on the sender is eliminated
  Disadvantages:
    ▪ Copying to the buffer incurs additional system overhead
Ready Blocking (MPI_Rsend())
• Ready mode expects a ready destination: if the corresponding Recv has not yet been posted, the message that will be received might be ill-defined, so you have to make sure the receiving process has posted a Recv before calling the ready-mode send (see the sketch below).
• It is the programmer's responsibility to make sure that the data sent has a waiting receive.
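A sketch of one way to guarantee the receive is already posted before MPI_Rsend; the barrier-based ordering is an illustrative pattern (assumes world_rank as before):

int value = 3;                        // illustrative payload
MPI_Request req;
if (world_rank == 1) {
    // Post the receive first, so it is guaranteed to exist before the ready send.
    MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
}
MPI_Barrier(MPI_COMM_WORLD);          // no one proceeds until the Irecv has been posted
if (world_rank == 0) {
    MPI_Rsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   // safe: the matching receive exists
} else if (world_rank == 1) {
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}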
Standard Blocking (MPI_Send)
● This is the standard send mode (the default, "no particular mode" mode): MPI is allowed to buffer, either on the sender or receiver side, or to wait for the matching receive.
● MPI decides which scenario is the best in terms of performance, memory, and so on. This might be
heavily dependent on the implementation.
● In any case, the data can be safely modified after the function returns. You can also reuse the buffer.
Standard blocking (Open MPI case)
▪ In Open MPI, the observed behavior is that short messages are automatically buffered on the receiver side (the eager protocol), while long messages are sent in a mode somewhat close to the synchronous mode (the rendezvous protocol).
Non-blocking Communication
Non-blocking send modes:
  Buffered: MPI_Ibsend
  Ready: MPI_Irsend
  Standard: MPI_Isend
Non-blocking receive: MPI_Irecv (one receive call matches all send modes)
Non-blocking communication
• Syntax and example: see the sketch below.
• Each non-blocking call returns an MPI_Request. Once this request has been prepared, it is necessary to complete it. There are two ways of completing a request: wait and test.
• Non-blocking does not mean asynchronous: the function returns while the data exchange is still running in the background. This means you cannot reuse the buffer until MPI_Wait or MPI_Test reports completion.
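A minimal sketch of the standard non-blocking pair (signatures as in the MPI standard; the payload is illustrative and world_rank is assumed from earlier):

// int MPI_Isend(const void *buf, int count, MPI_Datatype datatype, int dest,
//               int tag, MPI_Comm comm, MPI_Request *request);
// int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source,
//               int tag, MPI_Comm comm, MPI_Request *request);

int number = -1;                          // illustrative payload
MPI_Request req;
if (world_rank == 0) {
    MPI_Isend(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    // ... do unrelated work here, but do NOT touch `number` yet ...
    MPI_Wait(&req, MPI_STATUS_IGNORE);    // now the buffer may be reused
} else if (world_rank == 1) {
    MPI_Irecv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);    // `number` is valid only after this
}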
Non-blocking communication
• MPI_Wait:
int MPI_Wait(MPI_Request *request, MPI_Status *status);
int MPI_Waitany(int count, MPI_Request array_of_requests[], int *index, MPI_Status *status);
▪ Waiting forces the process to go into "blocking mode". The sending process will simply wait for the request to finish. If your process waits right after MPI_Isend, the send is equivalent to calling MPI_Send. There are two ways to wait: MPI_Wait and MPI_Waitany.
▪ The former, MPI_Wait just waits for the completion of the given request. As soon as the request is
complete an instance of MPI_Status is returned in status.
▪ The latter, MPI_Waitany waits for the first completed request in an array of requests to continue. As
soon as a request completes, the value of index is set to store the index of the completed request of
array_of_requests. The call also stores the status of the completed request.
Non-blocking communication
• MPI_Test:
int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);
int MPI_Testany(int count, MPI_Request array_of_requests[], int *index, int *flag, MPI_Status *status);
▪ Testing is a little bit different. As we've seen right before, waiting blocks the process until the request
(or a request) is fulfilled. Testing checks if the request can be completed. If it can, the request is
automatically completed and the data transferred. As with wait, there are two ways of testing: MPI_Test and MPI_Testany.
▪ As for MPI_Wait, the parameters request and status hold no mystery. Now remember that testing is non-
blocking, so in any case the process continues execution after the call. The variable flag is there to tell
you if the request was completed during the test or not. If flag != 0 that means the request has been
completed.
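A sketch of overlapping computation with communication by polling MPI_Test (the do_some_work() helper is hypothetical; `number` and the source rank are illustrative):

MPI_Request req;
int flag = 0;
int number = 0;                                    // illustrative receive buffer
MPI_Irecv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);

while (!flag) {
    do_some_work();                                // hypothetical: useful computation while waiting
    MPI_Test(&req, &flag, MPI_STATUS_IGNORE);      // non-blocking check; sets flag != 0 on completion
}
// Here the request is complete and `number` holds the received value.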
Race condition
Message exchange in one call
1. MPI_Sendrecv 🡪 uses two buffers, one for the outgoing message and one for the incoming message.
2. MPI_Sendrecv_replace 🡪 uses only one buffer; the received data overwrites the data that was sent.
Benefit 🡪 we use them in shift operations.
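A minimal sketch of a ring shift with MPI_Sendrecv_replace (one call both sends and receives into the same buffer; the tag and payload are illustrative, world_rank/world_size as before):

int right = (world_rank + 1) % world_size;               // neighbor to send to
int left  = (world_rank - 1 + world_size) % world_size;  // neighbor to receive from
int token = world_rank;                                   // illustrative payload

// Send `token` to the right neighbor and replace it with the value from the left neighbor.
MPI_Sendrecv_replace(&token, 1, MPI_INT, right, 0, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);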
When to use which?
Still working
Deadlock with MPI_Send/Recv
(Blocking-Comm.)
int a[10], b[10], myrank;
MPI_Status s1, s2;
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
}
else if (myrank == 1) {
    MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &s1);
    MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &s2);
}
...
Solution for the previous case
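A sketch of one standard fix: make rank 1 receive the messages in the same order rank 0 sends them, so the first send always has a matching receive:

else if (myrank == 1) {
    // Receive tag 1 first, matching the order of the sends on rank 0.
    MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &s2);
    MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &s1);
}

Non-blocking calls (MPI_Irecv posted before the sends) or MPI_Sendrecv are other common ways to break this kind of cyclic wait.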
Splitting communicators (MPI_Comm_split)
MPI_Comm_split, the splitting function, is a blocking call. That means that all the processes in the original communicator have to call the function for it to return. The splitting function in MPI is prototyped as follows:
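int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm);

The usage line below is an illustrative sketch (the even/odd color choice and the name row_comm are assumptions):

// Split MPI_COMM_WORLD into "even" and "odd" sub-communicators,
// keeping the original rank order inside each new communicator.
MPI_Comm row_comm;
MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &row_comm);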
Collective Communication

Data transfer:
- Movement of data
- Reduction of data

Why do we use them?
- Programs become easier to understand
- Every process does roughly the same thing
- No "strange" communication patterns

Types of collective operations:
▪ Synchronization
▪ Data movement
▪ Collective computation
Synchronization
● MPI_Barrier( comm )
● Make sure that all the processes call it, or your program will idle: the barrier blocks until every process in the communicator has reached it.
One to all:
- Broadcast
- Scatter (personalized)
All to all:
- Allgather
- All to all (personalized)
Tree-Based Communication
A tree-based broadcast reaches all N processes in O(log N) stages.
[Diagram: stage 1, stage 2, ... of the broadcast tree]
Bcast SYNTAX
MPI_Bcast(
    void* data,              // the buffer with data
    int count,
    MPI_Datatype datatype,
    int root,                // the source
    MPI_Comm communicator)   // the communicator containing the receiving processes
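A minimal usage sketch (the payload value and root rank 0 are illustrative; world_rank as before):

int value = 0;
if (world_rank == 0) {
    value = 100;                 // only the root has the data initially
}
// Every process in the communicator must make this exact call.
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
// After the call, `value` == 100 on every process.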
COMMON COLLECTIVES
- MPI_Scatter
- MPI_Gather
- MPI_Reduce
- MPI_Alltoall
MPI_Scatter
MPI_Scatter(
    void* send_data,              // array of data that resides on the root process
    int send_count,               // number of elements sent to each process
    MPI_Datatype send_datatype,   // datatype (MPI_INT for example)
    void* recv_data,              // buffer that will hold received data
    int recv_count,               // number of elements to be received
    MPI_Datatype recv_datatype,   // datatype of elements to be received
    int root,                     // the root process that is scattering the array
    MPI_Comm communicator         // the communicator in which the root process resides
)
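A usage sketch, assuming 4 processes and an illustrative 8-element array (so each process receives 2 elements):

int send_buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};   // significant only on the root process
int recv_buf[2];

MPI_Scatter(send_buf, 2, MPI_INT,             // send_count is per receiving process
            recv_buf, 2, MPI_INT,
            0, MPI_COMM_WORLD);
// Rank r now holds {2r + 1, 2r + 2} in recv_buf.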
Scatter vs. Broadcast
Broadcast sends the same data to every process in the communicator; Scatter splits an array into chunks and sends a different chunk to each process.
MPI_Gather
MPI_Gather(
    void* send_data, int send_count, MPI_Datatype send_datatype,
    void* recv_data, int recv_count, MPI_Datatype recv_datatype,
    int root, MPI_Comm communicator
)

MPI_Allgather
MPI_Allgather(
    void* send_data, int send_count, MPI_Datatype send_datatype,
    void* recv_data, int recv_count, MPI_Datatype recv_datatype,
    MPI_Comm communicator
)
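A minimal sketch contrasting the two (each process contributes one int; the 4-process assumption and values are illustrative):

int mine = world_rank * 10;           // illustrative per-process value
int all[4];                           // assumes 4 processes

// Gather: only rank 0 ends up with {0, 10, 20, 30} in `all`.
MPI_Gather(&mine, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

// Allgather: every process ends up with {0, 10, 20, 30} in `all`.
MPI_Allgather(&mine, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);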
[Diagram: Allgather — each process contributes one data block (A0, A1, ...); afterwards every process holds all the blocks]
MPI_Reduce(
    void* send_data,        // array of elements of type datatype
    void* recv_data,        // only significant on the process of rank root; it contains the reduced result
    int count,              // number of elements in send_data (and in the result)
    MPI_Datatype datatype,
    MPI_Op op,              // the operation you wish to apply to your data
    int root,               // the root process that will hold the result
    MPI_Comm communicator
)
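A minimal usage sketch summing one int per process with the predefined MPI_SUM operation (local values are illustrative; world_rank and the stdio include as before):

int local = world_rank + 1;   // illustrative local value
int total = 0;

// Combine the `local` values from every process; only rank 0 receives the result.
MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (world_rank == 0) {
    printf("Sum over all ranks = %d\n", total);   // with 4 processes: 1 + 2 + 3 + 4 = 10
}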
MPI_Op examples: MPI_SUM, MPI_PROD, MPI_MAX, MPI_MIN, MPI_LAND, MPI_LOR.
MPI_Alltoall

Before: each process starts with its own set of blocks, one destined for each process (rows = processes, columns = data blocks):
A0 A1 A2 A3 A4 A5
B0 B1 B2 B3 B4 B5
C0 C1 C2 C3 C4 C5
D0 D1 D2 D3 D4 D5
E0 E1 E2 E3 E4 E5
F0 F1 F2 F3 F4 F5

After: each process finishes with all the blocks destined for itself:
A0 B0 C0 D0 E0 F0
A1 B1 C1 D1 E1 F1
A2 B2 C2 D2 E2 F2
A3 B3 C3 D3 E3 F3
A4 B4 C4 D4 E4 F4
A5 B5 C5 D5 E5 F5
05
Command line
To install MPI on Visual Studio, refer to this link.
Run the Hello World program from the Command Prompt:
mpiexec -np <number of processes to run> <.exe file>
Common Errors
Doing Things before MPI_Init()
✔ MPI_Init() isn't like fork(); it doesn't create processes.
✔ The processes are already created when you run the command (refer to slide 108).
✔ The processes already exist, but they just don't know about each other, so they can't communicate.
✔ MPI_Init() just initializes the MPI library (i.e., makes the processes in the communicator know about each other).
✔ The MPI Standard says little about the situation before MPI_Init() and after MPI_Finalize().
✔ So you have to write whatever you want enclosed between them.
✔ Writing outside them leads to unexpected & implementation-dependent results.
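A minimal sketch of the safe structure (the printf line is illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    // Risky zone: before MPI_Init() the MPI standard guarantees almost nothing.
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: all MPI calls and program logic go here\n", rank);

    MPI_Finalize();
    // Risky zone again: avoid MPI-related work after MPI_Finalize().
    return 0;
}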
Common Errors
Expecting argc & argv to be passed to all
processes
The first implementation doesn't generate errors, but some implementations don't pass these arguments to all processes, so you have to pass them explicitly using the second implementation.
Common Errors
Matching MPI_Bcast() with MPI_Recv()
If the root calls MPI_Bcast() while the other processes call MPI_Recv(), the program gets STUCK; when every process calls MPI_Bcast(), it WORKS.
REMEMBER: MPI_Bcast() must be called in all processes, with the same arguments.
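A sketch of the two patterns (the variable and values are illustrative; world_rank as before):

int value = 0;
if (world_rank == 0) value = 42;

// WRONG (hangs): broadcast on the root, point-to-point receives elsewhere.
// if (world_rank == 0) MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
// else                 MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

// RIGHT: every process makes the same MPI_Bcast call.
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);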
Common Errors
Assuming your MPI implementation is thread safe