OpenMP: Martin Kruliš, Jiří Dokulil
OpenMP
OpenMP Architecture Review Board
Compaq, HP, Intel, IBM, KAI, SGI, SUN, U.S. Department of Energy, ...
http://www.openmp.org
specifications (freely available)
1.0 C/C++ and FORTRAN versions
2.0 C/C++ and FORTRAN versions
2.5 combined C/C++ and FORTRAN
3.0 combined C/C++ and FORTRAN
4.0 combined C/C++ and FORTRAN (July 2013)
OpenMP Threading Model
Basics
pragmas
#pragma omp
simple to use
only a few constructs
programs should run without OpenMP
possible but not enforced
compilers ignore unknown pragmas
#ifdef _OPENMP
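A minimal sketch of how the `_OPENMP` guard can keep a program compilable without OpenMP. The helper name `available_threads` is hypothetical and not from the slides; `omp_get_max_threads` is a standard OpenMP runtime routine declared in `omp.h`.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>   // only included when the compiler has OpenMP enabled
#endif

// Hypothetical helper: upper bound on the number of threads a parallel
// region may use. Falls back to 1 when compiled without OpenMP.
int available_threads()
{
#ifdef _OPENMP
    int n = omp_get_max_threads();
#else
    int n = 1;
#endif
    assert(n >= 1);   // sanity check: there is always at least one thread
    return n;
}
```

The same source compiles with or without OpenMP support; only the reported thread count differs.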
Simple example
#define N (1024*1024)
int* data=new int[N];
for(int i=0; i<N; ++i)
{
data[i]=i;
}
Simple example cont.
#define N (1024*1024)
int* data=new int[N];
#pragma omp parallel for
for(int i=0; i<N; ++i)
{
data[i]=i;
}
Another example
int sum=0;
#pragma omp parallel for
for(int i=0; i<N; ++i) // WRONG: all threads update the shared variable sum without synchronization (data race)
{
sum+=data[i];
}
Variable scope
shared
one instance for all threads
private
one instance for each thread
reduction
special variant for reduction operations
valid within lexical extent
no effect in called functions
Variable scope private
default for the loop control variable
only for the parallelized loop
control variables of nested loops should (probably always) be made private explicitly
all loops in Fortran
all variables declared within the parallelized block
all non-static variables in called functions
allocated on the stack, so private for each thread
values are uninitialized
at the start of the block and after the block
except for classes
the default constructor is used (must be accessible)
instances may not be shared among the threads
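The rules above can be sketched as follows. The names `scale` and `fill_scaled` are hypothetical and not part of the original slides. The sketch only illustrates which variables end up private without any explicit clause.

```cpp
#include <cassert>

// Hypothetical helper: its local variable lives on the calling thread's
// stack, so every thread gets its own instance.
int scale(int x, int factor)
{
    int result = x * factor;
    return result;
}

// Hypothetical function: writes i * factor into out[i] for 0 <= i < n.
void fill_scaled(int* out, int n, int factor)
{
    assert(out != nullptr || n <= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)       // i: private (control variable of the parallelized loop)
    {
        int tmp = scale(i, factor);   // tmp: private (declared within the block)
        out[i] = tmp;                 // out, n, factor: shared; each iteration writes a distinct element
    }
}
```

Because every iteration writes a different element of `out`, no synchronization is needed here.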
Variable scope private
int j;
#pragma omp parallel for private(j)
for(int i=0; i<N/2; ++i)
{
j=i*2;
data[j]=i;
data[j+1]=i;
}
Variable scope reduction
performing e.g. a sum of an array
a private variable alone is not enough
a shared variable requires explicit synchronization
combining the two is possible and (relatively) efficient
but unnecessarily complex
each thread works on a private copy
initialized to a default value (0 for +, 1 for *, ...)
final results are joined and available to the master thread
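For comparison, here is a sketch of the manual combination mentioned above: a private partial sum per thread, merged into a shared variable under explicit synchronization. The function name `sum_manual` is hypothetical, and `#pragma omp critical` is used here as one possible synchronization mechanism; the slides do not say which one the authors had in mind.

```cpp
#include <cassert>

// Hypothetical function: sums n integers using a private partial sum per
// thread plus one synchronized update of the shared total.
long long sum_manual(const int* data, int n)
{
    assert(data != nullptr || n <= 0);
    long long sum = 0;                // shared among all threads
    #pragma omp parallel
    {
        long long partial = 0;        // private: declared within the block
        #pragma omp for
        for (int i = 0; i < n; ++i)
        {
            partial += data[i];       // no synchronization needed here
        }
        #pragma omp critical
        sum += partial;               // one synchronized update per thread
    }
    return sum;
}
```

This gives the same result as a `reduction(+:sum)` clause, but the private copy, its initial value, and the final merge all have to be written by hand.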
Variable scope reduction
long long sum=0;
#pragma omp parallel for reduction(+:sum)
for(int i=0; i<N; ++i)
{
sum+=data[i];
}
Variable scope firstprivate and lastprivate
values of private variables are undefined at the start of the block and after the end of the block
firstprivate
each private copy is initialized to the value the variable has in the master thread
lastprivate
after the parallelized block, the variable is set to the value from the last iteration (last in the serial version)
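A minimal sketch of both clauses on one loop. The function `last_shifted` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Hypothetical function: returns data[n-1] + offset, computed inside a
// parallelized loop to illustrate firstprivate and lastprivate.
int last_shifted(const int* data, int n, int offset)
{
    assert(data != nullptr && n > 0);
    int last = 0;
    // firstprivate(offset): every thread's copy starts with the caller's value.
    // lastprivate(last): after the loop, last holds the value assigned in
    // iteration n-1, the same iteration that would run last serially.
    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; ++i)
    {
        last = data[i] + offset;
    }
    return last;
}
```

With a plain `private(last)` clause, the sketch would not rely on the value of `last` after the loop.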