DST4030A Lecture Notes Week 4
Dr Mike Asiyo
3 OpenMP
All processes see, and have equal access to, shared memory, which often
simplifies program development.
Implementations
On stand-alone shared memory machines, native operating systems,
compilers and/or hardware provide support for shared memory
programming. For example, the POSIX standard provides an API
for using shared memory, and UNIX provides shared memory
segments (shmget, shmat, shmctl, etc.).
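As an illustration (not part of the original notes), here is a minimal sketch of the UNIX shared memory segment calls mentioned above; the segment size, permissions and message string are arbitrary choices.

#include <cstdio>
#include <cstring>
#include <sys/ipc.h>
#include <sys/shm.h>

int main()
{
    // Create a private 4 KiB shared memory segment.
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid < 0) { std::perror("shmget"); return 1; }

    // Attach the segment to this process's address space.
    char *mem = static_cast<char *>(shmat(shmid, nullptr, 0));
    if (mem == reinterpret_cast<char *>(-1)) { std::perror("shmat"); return 1; }

    std::strcpy(mem, "visible to any process that attaches this segment");
    std::printf("%s\n", mem);

    shmdt(mem);                        // detach the segment
    shmctl(shmid, IPC_RMID, nullptr);  // mark the segment for removal
    return 0;
}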
On distributed memory machines, memory is physically distributed
across a network of machines, but made global through specialized
hardware and software.
For example:
The main program a.out is scheduled to run by the native operating system.
a.out loads and acquires all of the necessary system and user resources to run.
This is the "heavy weight" process.
a.out performs some serial work, and then creates a number of tasks (threads)
that can be scheduled and run by the operating system concurrently.
Each thread has local data, but also shares the entire resources of a.out. This
saves the overhead associated with replicating a program's resources for each
thread ("light weight"). Each thread also benefits from a global memory view
because it shares the memory space of a.out.
A thread’s work may best be described as a subroutine within the main program.
Any thread can execute any subroutine at the same time as other threads.
Threads communicate with each other through global memory (updating
address locations). This requires synchronization constructs to ensure that no
two threads update the same global address at the same time.
Threads can come and go, but a.out remains present to provide the necessary
shared resources until the application has completed.
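A minimal sketch (not from the notes) of the model just described: the main program holds shared data, spawns a team of threads, and uses a synchronization construct so that no two threads update the same address at once. The shared counter and variable names are illustrative only.

#include <iostream>
#include <omp.h>

int main()
{
    int shared_counter = 0;                    // lives in the shared memory of the process

    #pragma omp parallel                       // spawn a team of "light weight" threads
    {
        int local_id = omp_get_thread_num();   // each thread also has local data

        #pragma omp critical                   // only one thread may update the
        {                                      // shared address at any time
            shared_counter += local_id;
        }
    }                                          // threads join; the process lives on

    std::cout << "shared_counter = " << shared_counter << std::endl;
    return 0;
}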
Implementations
From a programming perspective, threads implementations
commonly comprise:
A library of subroutines that are called from within parallel source
code
A set of compiler directives embedded in either serial or parallel
source code
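For example, the sketch below (an assumption-free illustration, not code from the notes) combines both kinds of facility: the #pragma omp parallel compiler directive and the runtime library routines omp_get_thread_num() and omp_get_num_threads().

#include <cstdio>
#include <omp.h>

int main()
{
    // Compiler directive: marks the following block as a parallel region.
    #pragma omp parallel
    {
        // Library routines: called from within the parallel source code.
        int id      = omp_get_thread_num();
        int nthread = omp_get_num_threads();
        std::printf("thread %d of %d\n", id, nthread);
    }
    return 0;
}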
OpenMP
Industry standard, jointly defined and endorsed by a group of
major computer hardware and software vendors, organizations and
individuals.
Compiler directive based
Portable / multi-platform, including Unix and Windows platforms
Available in C/C++ and Fortran implementations
Can be very easy and simple to use - provides for "incremental
parallelism". Can begin with serial code.
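A sketch of what "incremental parallelism" can look like in practice: the loop below is ordinary serial C++, and the single directive is the only addition. The array names, sizes and arithmetic are arbitrary; if the code is compiled without OpenMP support the directive is ignored and the loop runs serially.

#include <vector>

int main()
{
    std::vector<double> x(1000000, 1.0), y(1000000, 2.0);

    // The directive below is the only change made to the serial loop.
    #pragma omp parallel for
    for (long i = 0; i < (long)x.size(); i++)
        y[i] = 2.0 * x[i] + y[i];

    return 0;
}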
Implementations:
In 1992, the MPI Forum was formed with the primary goal of establishing a
standard interface for message passing implementations.
Part 1 of the Message Passing Interface (MPI) was released in 1994. Part 2
(MPI-2) was released in 1996 and MPI-3 in 2012. All MPI specifications are
available on the web at http://www.mpi-forum.org/docs/.
MPI is the "de facto" industry standard for message passing, replacing virtually
all other message passing implementations used for production work. MPI
implementations exist for virtually all popular parallel computing platforms.
Not all implementations include everything in MPI-1, MPI-2 or MPI-3.
The data parallel model may also be referred to as the Partitioned Global Address Space (PGAS) model.
On shared memory architectures, all tasks may have access to the data structure
through global memory.
On distributed memory architectures, the global data structure can be split up
logically and/or physically across tasks.
Implementations:
Currently, there are several parallel programming implementations in various
stages of development, based on the Data Parallel / PGAS model.
Coarray Fortran: a small set of extensions to Fortran 95 for SPMD parallel
programming. Compiler dependent. More information:
https://en.wikipedia.org/wiki/Coarray_Fortran
OpenMP
OpenMP is:
An Application Program Interface (API) that may be used to explicitly direct
multi-threaded, shared memory parallelism
OpenMP is not:
Necessarily implemented identically by all vendors
Guaranteed to make the most efficient use of shared memory
Required to check for data dependencies, data conflicts, race
conditions, or deadlocks
Required to check for code sequences that cause a program to be
classified as non-conforming
Designed to guarantee that input or output to the same file is
synchronous when executed in parallel. The programmer is
responsible for synchronizing input and output.
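To illustrate the last points, the sketch below (my own example, not from the notes) contains a potential data race that OpenMP will not detect for you; here it is avoided with a reduction clause, but choosing and applying the fix is the programmer's responsibility.

#include <iostream>
#include <omp.h>

int main()
{
    const int N = 1000000;
    long long sum = 0;

    // Without the reduction clause, every thread would update 'sum'
    // concurrently, a data race OpenMP does NOT check for.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += i;

    std::cout << "sum = " << sum << std::endl;   // expected N*(N-1)/2
    return 0;
}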
Goals of OpenMP:
1 Standardization:
Provide a standard among a variety of shared memory
architectures/platforms
Jointly defined and endorsed by a group of major computer hardware
and software vendors
2 Lean and Mean:
Establish a simple and limited set of directives for programming
shared memory machines.
Significant parallelism can be implemented by using just 3 or 4
directives.
3 Ease of Use:
Provide the capability to incrementally parallelize a serial program,
unlike message passing libraries, which typically require an
all-or-nothing approach
Provide the capability to implement both coarse-grain and fine-grain
parallelism
4 Portability:
The API is specified for C/C++ and Fortran
Public forum for API and membership
Implemented on most major platforms, including Unix/Linux
and Windows
Explicit Parallelism:
OpenMP is an explicit (not automatic) programming model, offering
the programmer full control over parallelization.
Parallelization can be as simple as taking a serial program and
inserting compiler directives...
Or as complex as inserting subroutines to set multiple levels of
parallelism, locks and even nested locks.
It is possible to parallelize many sequential programs without using most of the API.
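As an example of the more complex end of the spectrum, the hedged sketch below uses the OpenMP runtime lock routines (omp_init_lock, omp_set_lock, omp_unset_lock, omp_destroy_lock); the accumulated quantity is arbitrary, and nested locks (omp_nest_lock_t) follow the same pattern.

#include <iostream>
#include <omp.h>

int main()
{
    omp_lock_t lock;
    omp_init_lock(&lock);            // explicit, programmer-controlled lock
    int total = 0;

    #pragma omp parallel
    {
        omp_set_lock(&lock);         // acquire the lock before touching 'total'
        total += omp_get_thread_num();
        omp_unset_lock(&lock);       // release it for the other threads
    }

    omp_destroy_lock(&lock);
    std::cout << "total = " << total << std::endl;
    return 0;
}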
Used for:
Defining parallel regions / spawning threads
Distributing loop iterations or sections of code between threads
Serializing sections of code (e.g. for access to I/O or shared
variables)
Synchronizing threads
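The short sketch below (illustrative only; the array and its size are arbitrary) touches each of these uses in one program: a parallel region, distributed loop iterations, a serialized output section and an explicit barrier.

#include <iostream>
#include <omp.h>

int main()
{
    const int N = 8;
    int data[N];

    #pragma omp parallel                    // define a parallel region / spawn threads
    {
        #pragma omp for                     // distribute loop iterations between threads
        for (int i = 0; i < N; i++)
            data[i] = i * i;

        #pragma omp barrier                 // synchronize threads (redundant here,
                                            // since 'for' already ends with a barrier)

        #pragma omp critical                // serialize access to I/O
        std::cout << "thread " << omp_get_thread_num() << " done" << std::endl;
    }

    std::cout << "data[7] = " << data[N - 1] << std::endl;
    return 0;
}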
Environment variables are used to store configurations needed for running the
program. In OpenMP, they are used for setting, e.g., the number of threads per
team (OMP_NUM_THREADS), the maximum number of threads
(OMP_THREAD_LIMIT) or the scheduling policy (OMP_SCHEDULE).
While most of these settings can also be made using clauses in the compiler
directives or runtime library routines, environment variables give the user an
easy way to change these crucial settings without the need for an additional
config file (parsed by your program) or rewriting/recompiling the
OpenMP-enhanced program.
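For example, the sketch below (not from the notes) reports the thread settings the runtime picked up; running it after, e.g., export OMP_NUM_THREADS=8 in a bash shell shows the environment variable taking effect without recompiling.

#include <iostream>
#include <omp.h>

int main()
{
    // Reflects OMP_NUM_THREADS (or the implementation default).
    std::cout << "max threads: " << omp_get_max_threads() << std::endl;

    #pragma omp parallel
    {
        #pragma omp single    // one thread reports the actual team size
        std::cout << "team size: " << omp_get_num_threads() << std::endl;
    }
    return 0;
}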
Listing 1: Helloworld.cpp
#include <iostream>
#include <omp.h>

using namespace std;

int main()
{
    // Each thread in the team executes the body of the parallel region.
    #pragma omp parallel
    {
        cout << "Hello World" << endl;
    }
    return 0;
}
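With GCC, for example, Listing 1 can be compiled as g++ -fopenmp Helloworld.cpp; without the -fopenmp flag the directive is ignored and the program runs serially. Note that the output lines of different threads may interleave, since I/O is not synchronized for you.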
Listing 2: arrayex.cpp
#include <iostream>
#include <algorithm>
#include <omp.h>

#define ARRAY_SIZE 100000000
#define ARRAY_VALUE 1231

int main()
{
    omp_set_num_threads(4);                    // request a team of 4 threads
    int *arr = new int[ARRAY_SIZE];
    std::fill_n(arr, ARRAY_SIZE, ARRAY_VALUE); // initialise every element
Listing 3: arrayex.cpp (continued)
    // Loop iterations are divided among the threads of the team.
    #pragma omp parallel for
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        arr[i] = arr[i] / arr[i] + arr[i];
    }

    delete[] arr;   // release the allocated array
    return 0;
}
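Listings 2 and 3 form one program and can be compiled the same way (e.g. g++ -fopenmp arrayex.cpp with GCC). Because the code calls omp_set_num_threads(4), it always runs with four threads regardless of OMP_NUM_THREADS; removing that call lets the environment variable control the team size instead.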