*
You can create an OpenMPI + OpenMP
cluster out of ordinary x86 or x64 machines
using this same procedure as well.
*
Get the hardware:
Raspberry Pis (as many as you like); to keep
things simple we are going to use only 3. At
least one Pi (the master) should have a
keyboard, mouse and monitor.
A network router with at least 4 ports (3 Pis on 3
ports, and an extra port if you want to extend the
cluster by adding more switches or hubs).
Power cables for the Raspberry Pis.
*
Set up the first Pi by installing the Raspbian image.
(http://www.raspberrypi.org/documentation/installation/installing-images/)
1. Boot the image and log in to the Pi. Type sudo raspi-config . This starts the configuration screen. Go into advanced
options and set the hostname to VOLTAIRE-1. This is going to be our
master node.
Install OpenMPI by typing this into the terminal:
sudo apt-get install openmpi-bin libopenmpi-dev
Video walkthroughs on YouTube:
https://www.youtube.com/watch?v=LTP9FUIt0Eg or https://www.youtube.com/watch?v=h3cE9iXIx9c
Cloning
Now that you have set up everything that was needed, let's see if it runs.
2. Go to the terminal and type in mpiexec -n 1 hostname
It should display the system's hostname, which is VOLTAIRE-1 .
3. Create a directory on the Desktop and name it Parallel. Create a new file
called mpi1.cpp and write your MPI code in it.
See the last slide for the code in our cpp file.
4. Clone the memory cards and change their hostnames to VOLTAIRE-2
and VOLTAIRE-3 .
For cloning you can use the dd command on Linux or Win32DiskImager
on Windows.
Copy image on Linux: sudo dd bs=4M if=/dev/mmcblk0 of=~/Desktop/voltair-1.img
Write image on Linux: sudo dd if=~/Desktop/voltair-1.img of=/dev/mmcblk0 bs=4M
5. A command will be executed using mpiexec and will run on all the
nodes. The nodes will run mpi1.out, the compiled executable, so it must
be present in the same location on all the nodes. We are going to compile
the source code over SSH by logging in to VOLTAIRE-2 and
VOLTAIRE-3 from VOLTAIRE-1.
*
6. When the master invokes the two slaves over the Ethernet network, it needs to log in to
the remote slaves to execute the mpiexec command. Therefore we must create a way for the
master to access the slaves without a login prompt.
Log in to the router and find all the attached devices and the IPs given to them by the router's
DHCP service. You can also assign static IPs if you prefer.
Type this into the terminal of the master node to allow passwordless login from the master:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub | ssh pi@192.168.2.4 "mkdir -p .ssh; cat >> .ssh/authorized_keys"
You may have to type yes; if it then asks for a password, the default for the Raspbian image is
raspberry for the login pi.
*
7. On the master, open a terminal:
cd Desktop/Parallel
mpic++ -o mpi1.out mpi1.cpp
8. On the master, open a terminal to access each slave and do the same (compile):
ssh pi@192.168.2.4
9. Let's run the program from the master; in the terminal type:
mpiexec -n 2 -host 192.168.2.3,192.168.2.4 ./mpi1.out
(-n sets the number of processes, -host lists the master and slave IPs, and mpi1.out is the executable.)
*
11. If all goes well you will see the code running:
In our code we have used SEND and RECEIVE. The two slaves send the master their hostnames
as arrays of characters of size 100. The master receives them and displays them. Simple. See our code on
the last slide for the details.
The replies are not synchronized, but they are all delivered.
The master closes the program when MPI::Finalize() is called.
Changing the code and making sure it is on all nodes is a tedious task if you have, say, 64 nodes.
Therefore you can use NFS, or FTP with a script plus another MPI program, to fetch the source and compile it.
NFS share executable: http://stackoverflow.com/questions/25829684/how-to-avoid-copying-executable-from-master-node-to-slaves-in-mpilibs
*
We suggest you also look at OpenMP on the slides below, as it allows proper utilization of all
the cores on the target slaves.
[Diagram: the master and slaves 1-3 each contain four cores sharing that node's memory. OpenMP (MP) parallelizes across the cores within a node; MPI communicates between the nodes over the network.]
*
What is OpenMP (Open Multi-Processing)?
It is a de facto standard API for writing shared-memory parallel applications in C, C++ and Fortran.
OpenMP is managed by the nonprofit technology consortium OpenMP Architecture Review
Board (OpenMP ARB), and is jointly defined by a group of major computer hardware and software
vendors, including AMD, IBM, Intel, Cray, HP, Fujitsu, Nvidia, NEC, Red Hat, Texas
Instruments, Oracle Corporation, and more.[1]
OpenMP uses a portable, scalable model that gives programmers a simple and flexible interface for
developing parallel applications for platforms ranging from the standard desktop computer to
the supercomputer.
OpenMP consists of three components:
Compiler directives: a directive pragma is a language construct that specifies how
a compiler (or assembler or interpreter) should process its input.
Runtime subroutines
Environment variables
*
Older generation processors: only one processor was used, and it had one
core attached to the memory.
Current generation processors: today, multiple cores are present on the same
processor and share the memory.
*
Sequential Program
Programs were written sequentially and utilized only one core, even if
multiple cores were available. But we want to use all of the cores.
[Diagram: a sequential instruction stream runs on the single core of an old processor, and still uses only one of the four cores of a modern processor.]
Every program consists of two parts:
Sequential part
Parallel part
*
[Diagram: fork-join model. A single instruction stream forks into thread 0 (master) through thread 3, then joins back into one.]
OpenMP programs start with a single thread: the master thread.
At the start of a parallel region, the master creates a team of parallel worker threads (FORK).
Statements in the parallel block are executed in parallel by every thread.
At the end of the parallel region, all threads synchronize and join the master thread (JOIN).
What are threads and cores, and how do they relate?
A thread is an independent sequence of execution of program code: a block of code with one
entry and one exit. OpenMP threads are mapped onto physical cores. It is possible to map more
than one thread onto a core.
*
The compiler is available for free from
http://openmp.org/wp/openmp-compilers/
The GNU compilers (C, C++, etc.) include the libgomp library, which implements
the OpenMP v4.0 specification (July 2013).
The manual can be found at:
https://gcc.gnu.org/onlinedocs/libgomp/
*
#include <iostream>
#include <omp.h>
// inclusion of the OpenMP header file
using namespace std;
int main()
{
    #pragma omp parallel
    { // start of parallel region
        cout << "Hello World" << endl;
    } // end of parallel region
    return 0;
}
OpenMP Compiler Directives
*
OpenMP can control the number of threads used.
It can be set using the following:
Environment variable OMP_NUM_THREADS
Runtime function omp_set_num_threads(n)
#include <iostream>
#include <omp.h>
using namespace std;
int main() {
    #pragma omp parallel
    {
        int threads = omp_get_num_threads();
        int id = omp_get_thread_num();
        cout << "Viewing Thread Number: " << id << " of " << threads << endl;
    }
    return 0;
}
To activate the OpenMP extensions for C/C++, the compile-time flag
-fopenmp must be specified.
Example: g++ -fopenmp -o hello.x hello.cpp
(g++ invokes the compiler, -fopenmp is the flag, hello.x is the executable, hello.cpp is the source filename.)
To get information about threads:
Runtime function omp_get_num_threads()
Returns the number of threads in the parallel region
Returns 1 if called outside a parallel region
Runtime function omp_get_thread_num()
Returns the id of the thread in the team
Value in [0, n-1], where n = number of threads
The master thread always has id 0
* OpenMPI C++ Test Code
#include <iostream>
#include <unistd.h>   // for gethostname()
#include <mpi.h>
using namespace std;

int main() {   // run with at least 3 processes: rank 0 plus ranks 1 and 2
    MPI::Init();
    int process = MPI::COMM_WORLD.Get_size();
    int rank = MPI::COMM_WORLD.Get_rank();
    char host[100];
    char displayhost[100];
    gethostname(host, 100);   // every rank looks up its own hostname first
    if (rank == 1)
    {
        MPI::COMM_WORLD.Send(&host, 100, MPI::CHAR, 0, 0);
    }
    if (rank == 2)
    {
        MPI::COMM_WORLD.Send(&host, 100, MPI::CHAR, 0, 0);
    }
    if (rank == 0)
    {
        MPI::COMM_WORLD.Recv(&displayhost, 100, MPI::CHAR, 1, 0); // receive from rank 1
        cout << "Received Hostname: " << displayhost << endl;
        cout << "--------------- " << endl;
        MPI::COMM_WORLD.Recv(&displayhost, 100, MPI::CHAR, 2, 0); // receive from rank 2
        cout << "Received Hostname: " << displayhost << endl;
        cout << "--------------- " << endl;
        cout << "Hostname: " << host << endl;
        cout << "Process : " << process << endl;
        cout << "Rank    : " << rank << endl;
    }
    MPI::Finalize();
    return 0;
}