
Programming with OpenMP*

Intel Software College

Objectives

At the completion of this module you will be able to:

• Thread serial code with basic OpenMP pragmas
• Use OpenMP synchronization pragmas to coordinate thread execution and memory access

Copyright © 2006, Intel Corporation. All rights reserved.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
Agenda

What is OpenMP?
Parallel Regions
Worksharing Construct
Data Scoping to Protect Data
Explicit Synchronization
Scheduling Clauses
Other Helpful Constructs and Clauses


What Is OpenMP*?

Compiler directives for multithreaded programming
• Creating teams of threads
• Sharing work among threads
• Synchronizing the threads

Library routines for setting and querying thread attributes
Environment variables for controlling run-time behavior of the parallel program

Easy to create threaded Fortran and C/C++ codes
• Supports the data-parallelism model
• Incremental parallelism
• Combines serial and parallel code in a single source

What Is OpenMP*?

A sampling of OpenMP directives, routines, and environment variables (shown on the slide as a collage):

C$OMP FLUSH                      #pragma omp critical
C$OMP THREADPRIVATE(/ABC/)       CALL OMP_SET_NUM_THREADS(10)
call omp_test_lock(jlok)         C$OMP parallel do shared(a, b, c)
C$OMP MASTER                     call OMP_INIT_LOCK(ilok)
C$OMP SINGLE PRIVATE(X)          setenv OMP_SCHEDULE "dynamic"
C$OMP PARALLEL DO ORDERED PRIVATE(A, B, C)
C$OMP ORDERED                    C$OMP PARALLEL REDUCTION(+: A, B)
C$OMP SECTIONS                   #pragma omp parallel for private(A, B)
!$OMP BARRIER                    C$OMP PARALLEL COPYIN(/blk/)
C$OMP DO lastprivate(XX)         Nthrds = OMP_GET_NUM_PROCS()
omp_set_lock(lck)

http://www.openmp.org
Current spec is OpenMP 2.5: 250 pages (combined C/C++ and Fortran)


OpenMP* Architecture

Fork-join model
Work-sharing constructs
Data environment constructs
Synchronization constructs
Extensive Application Program Interface (API) for finer control

Programming Model

Fork-join parallelism:
• Master thread spawns a team of threads as needed
• Parallelism is added incrementally: the sequential program evolves into a parallel program

(Diagram: the master thread forks a team of threads at each parallel region and joins them afterward.)

OpenMP* Pragma Syntax

Most constructs in OpenMP* are compiler directives or pragmas.


• For C and C++, the pragmas take the form:

#pragma omp construct [clause [clause]…]

Parallel Regions

#pragma omp parallel

• Defines a parallel region over a structured block of code
• Threads are created as the ‘parallel’ pragma is crossed
• Threads block at the end of the region
• Data is shared among threads unless specified otherwise

(Diagram: the region is executed concurrently by Threads 1, 2, and 3.)

C/C++:
#pragma omp parallel
{
  block
}


How Many Threads?

Set environment variable for number of threads

set OMP_NUM_THREADS=4

There is no standard default for this variable
• On many systems:
  • # of threads = # of processors
  • Intel® compilers use this default

Activity 1: Hello Worlds

Modify the “Hello, Worlds” serial code to run multithreaded using OpenMP*
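A minimal multithreaded "Hello, Worlds" might look like the sketch below. The function name hello_threads and the _OPENMP fallback are illustrative choices, not part of the activity's starter code; build with OpenMP enabled (e.g. -fopenmp on GCC, -openmp on older Intel compilers). Without OpenMP the pragma is ignored and the code stays serially correct.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Each thread in the team prints its own greeting. Returns the team
   size (1 when compiled without OpenMP, since the pragma is ignored). */
int hello_threads(void) {
    int nthreads = 1;
    #pragma omp parallel
    {
#ifdef _OPENMP
        printf("Hello, World from thread %d\n", omp_get_thread_num());
        #pragma omp single
        nthreads = omp_get_num_threads();
#else
        printf("Hello, World (serial build)\n");
#endif
    }
    return nthreads;
}
```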


Work-sharing Construct

#pragma omp parallel


#pragma omp for
for (i=0; i<N; i++){
Do_Work(i);
}

Splits loop iterations among threads
• Must be inside a parallel region
• Must precede the loop
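As a runnable sketch of this construct (using a simple vector add; the names vector_add, a, b, c are illustrative), the omp for pragma splits the iterations among the team. Built without OpenMP, the pragmas are ignored and the loop runs serially, with the same result.

```c
#include <assert.h>

/* The omp for pragma inside the parallel region divides the
   iterations of the vector add among the team's threads. */
void vector_add(const int *a, const int *b, int *c, int n) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }
}
```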

Work-sharing Construct

#pragma omp parallel
#pragma omp for
for(i = 0; i < 12; i++)
    c[i] = a[i] + b[i];

Threads are assigned an independent set of iterations, e.g.:
• Thread 1: i = 0, 1, 2, 3
• Thread 2: i = 4, 5, 6, 7
• Thread 3: i = 8, 9, 10, 11

Threads must wait at the end of the work-sharing construct (implicit barrier)


Combining pragmas

These two code segments are equivalent

#pragma omp parallel


{
#pragma omp for
for (i=0; i< MAX; i++) {
res[i] = huge();
}
}

#pragma omp parallel for


for (i=0; i< MAX; i++) {
res[i] = huge();
}

Restrictions on Loop Threading in Ver 2.5

• The loop variable must be a signed integer
• The comparison operation must be of the form
  loop_variable op loop_invariant_integer
  where op is <, <=, >, or >=
• The third expression (the increment portion) must be integer addition or subtraction by a loop-invariant value
• If the comparison operation is < or <=, the loop variable must increment on every iteration. If it is > or >=, the loop variable must decrement on every iteration.
• The loop must be single-entry and single-exit. No jumps into or out of the loop are permitted: any goto or break must jump within the loop, and exceptions must be handled inside the loop.


Challenges in Threading Loops

• Loop threading is effectively a reordering transformation of the loop
• Valid only if the loop carries no dependence
• Conditions for data dependence
  • Statement S2 is data dependent on statement S1 if
    1. S1 and S2 both reference memory location L on some execution path
    2. Execution of S1 occurs before S2
• Flow dependence
  • S1 writes L and L is later read by S2
• Output dependence
  • Both S1 and S2 write L
• Anti-dependence
  • S1 reads L before S2 writes L
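These three kinds of dependence can be sketched as small serial loops (the function and array names are illustrative):

```c
/* Loop-carried flow dependence: iteration i reads a[i-1],
   which the previous iteration wrote. */
void flow_dep(int *a, int n) {
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + 1;
}

/* Loop-carried anti-dependence: iteration i reads b[i+1]
   before the next iteration overwrites it. */
void anti_dep(int *b, int n) {
    for (int i = 0; i < n - 1; i++)
        b[i] = b[i+1] + 1;
}

/* Output dependence: every iteration writes the same location;
   the final value depends on iteration order. */
void output_dep(int *c, int n) {
    for (int i = 0; i < n; i++)
        c[0] = i;
}
```

Threading any of these loops naively would reorder the conflicting accesses and change the results.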

Loop-carried and loop-independent
dependence
• Loop-carried
• S1 references L on one iteration; S2 references it on a
subsequent iteration
• Loop-independent
  • S1 and S2 reference L on the same loop iteration, but S1 executes before S2


Examples of loop-carried dependence

• Loop-carried flow dependence
  S1: Write L
  S2: Read L

• Loop-carried anti-dependence
  S1: Read L
  S2: Write L

• Loop-carried output dependence
  S1: Write L
  S2: Write L

Loop-carried dependence example

x[0] =0;
y[0] =1;
#pragma omp parallel for private(k)
for (k=1; k<100; k++) {
x[k] = y[k-1] +1; //s1
y[k] = x[k] +2; //s2
}


Loop-carried dependence example

x[0] =0;
y[0] =1;
#pragma omp parallel for private(k)
for (k=1; k<100; k++) {
x[k] = y[k-1] +1; //s1 anti-dependence
y[k] = x[k] +2; //s2 flow dependence
}

What happens?

• OpenMP will thread the loop
• The threaded code will fail
• What to do: remove the loop-carried dependence
• Two approaches
  • Divide the loop into 2 nested loops
  • Use parallel sections
• Either way, x[49] and y[49] must be predetermined!


Nested loops

x[0] = 0;  x[49] = 74;
y[0] = 1;  y[49] = 74;
#pragma omp parallel for private(m, k)
for (m = 0; m < 2; m++) {
    for (k = m*49 + 1; k < m*50 + 50; k++) {
        x[k] = y[k-1] + 1; //s1 anti-dependence
        y[k] = x[k-1] + 2; //s2 flow dependence
    }
}
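Assuming the x[k-1] form of the recurrence shown on this slide, the split can be checked against a serial reference: the precomputed boundary values x[49] = y[49] = 74 let the second half start independently. The helper names below are illustrative.

```c
#define LEN 100

/* Serial reference for the slide's recurrence. */
void serial_ref(int *x, int *y) {
    x[0] = 0; y[0] = 1;
    for (int k = 1; k < LEN; k++) {
        x[k] = y[k-1] + 1;   /* s1 */
        y[k] = x[k-1] + 2;   /* s2 */
    }
}

/* The two-way split: each half starts from known boundary values,
   so the m loop's iterations are independent of each other. */
void split_halves(int *x, int *y) {
    int m, k;
    x[0] = 0;   y[0] = 1;
    x[49] = 74; y[49] = 74;              /* predetermined boundary values */
    #pragma omp parallel for private(k)
    for (m = 0; m < 2; m++) {            /* each half can run on its own thread */
        for (k = m*49 + 1; k < m*50 + 50; k++) {
            x[k] = y[k-1] + 1;
            y[k] = x[k-1] + 2;
        }
    }
}

/* Returns 1 if the split reproduces the serial result exactly. */
int halves_match_serial(void) {
    int xs[LEN], ys[LEN], xp[LEN], yp[LEN];
    serial_ref(xs, ys);
    split_halves(xp, yp);
    for (int k = 0; k < LEN; k++)
        if (xs[k] != xp[k] || ys[k] != yp[k]) return 0;
    return 1;
}
```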

Parallel Sections

Independent sections of code can execute concurrently

#pragma omp parallel sections
{
    #pragma omp section
    phase1();
    #pragma omp section
    phase2();
    #pragma omp section
    phase3();
}

(Diagram: the three phases run one after another in serial, side by side in parallel.)


Parallel Sections

#pragma omp parallel sections private(k)
{
    #pragma omp section
    {
        x[0] = 0; y[0] = 1;
        for (k = 1; k < 50; k++) {
            x[k] = y[k-1] + 1; //s1 anti-dependence
            y[k] = x[k-1] + 2; //s2 flow dependence
        }
    }
    #pragma omp section
    {
        x[49] = 74; y[49] = 74;
        for (k = 50; k < 100; k++) {
            x[k] = y[k-1] + 1; //s1 anti-dependence
            y[k] = x[k-1] + 2; //s2 flow dependence
        }
    }
}

Data Environment

OpenMP uses a shared-memory programming model

• Most variables are shared by default.

• Global variables are shared among threads


• C/C++: File scope variables, static


Data Environment

But, not everything is shared...


• Stack variables in functions called from parallel regions are
PRIVATE

• Automatic variables within a statement block are PRIVATE

• Loop index variables are private (with exceptions)


• C/C++: The first loop index variable in nested loops following a
  #pragma omp for

Data Scope Attributes

The default status can be modified with


default (shared | none)
Scoping attribute clauses

shared(varname,…)

private(varname,…)


The Private Clause

Reproduces the variable for each thread


• Variables are un-initialized; C++ object is default constructed
• Any value external to the parallel region is undefined

void work(float* a, float* b, float* c, int N) {
    float x, y; int i;
    #pragma omp parallel for private(x,y)
    for(i=0; i<N; i++) {
        x = a[i]; y = b[i];
        c[i] = x + y;
    }
}

Example: Dot Product

float dot_prod(float* a, float* b, int N)


{
float sum = 0.0;
#pragma omp parallel for shared(sum)
for(int i=0; i<N; i++) {
sum += a[i] * b[i];
}
return sum;
}

What is Wrong?

Protect Shared Data

Must protect access to shared, modifiable data

float dot_prod(float* a, float* b, int N)


{
float sum = 0.0;
#pragma omp parallel for shared(sum)
for(int i=0; i<N; i++) {
#pragma omp critical
sum += a[i] * b[i];
}
return sum;
}

OpenMP* Critical Construct

#pragma omp critical [(lock_name)]

Defines a critical region on a structured block

float R1, R2;
#pragma omp parallel
{
    float A, B;
    #pragma omp for
    for(int i=0; i<niters; i++){
        B = big_job(i);
        #pragma omp critical (R1_lock)
        consum (B, &R1);
        A = bigger_job(i);
        #pragma omp critical (R2_lock)
        consum (A, &R2);
    }
}

Threads wait their turn – only one at a time calls consum(), thereby protecting R1 and R2 from race conditions. Naming the critical constructs is optional, but may increase performance.
Programming with OpenMP*

31
Copyright © 2006, Intel Corporation. All rights reserved.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

OpenMP* Reduction Clause

reduction (op : list)

The variables in “list” must be shared in the enclosing parallel region

Inside a parallel or work-sharing construct:
• A PRIVATE copy of each list variable is created and initialized depending on the “op”
• These copies are updated locally by threads
• At the end of the construct, the local copies are combined through “op” into a single value and combined with the value in the original SHARED variable

Reduction Example

#pragma omp parallel for reduction(+:sum)
for(i=0; i<N; i++) {
    sum += a[i] * b[i];
}

A local copy of sum is made for each thread
All local copies of sum are added together and stored in the “global” variable
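A self-contained version of this reduction (function names are illustrative): each thread accumulates into its private copy of sum, and the copies are combined at the end. Without OpenMP the pragma is ignored and the result is the same.

```c
/* Dot product using a + reduction on sum. */
float dot_prod(const float *a, const float *b, int n) {
    float sum = 0.0f;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Dot product of two all-ones vectors of length n: equals n exactly
   in float for the small n used here. */
float ones_dot(int n) {
    float a[256], b[256];
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 1.0f; }
    return dot_prod(a, b, n);
}
```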


C/C++ Reduction Operations

A range of associative and commutative operators can be used with reduction
Initial values are the ones that make sense for each operator

Operator   Initial Value        Operator   Initial Value
+          0                    &          ~0
*          1                    |          0
-          0                    &&         1
^          0                    ||         0
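As a sketch of one of the less common operators: a && reduction starts each private copy at 1 (true), per the table, so a single failing element makes the combined result 0. The function names are illustrative.

```c
/* Returns 1 iff every element of v is positive, using a && reduction. */
int all_positive(const int *v, int n) {
    int ok = 1;                          /* && reduction: private copies start at 1 */
    #pragma omp parallel for reduction(&&:ok)
    for (int i = 0; i < n; i++)
        ok = ok && (v[i] > 0);
    return ok;
}

/* Small wrapper for checking three values at once. */
int check3(int a, int b, int c) {
    int v[3] = {a, b, c};
    return all_positive(v, 3);
}
```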

Numerical Integration Example

∫₀¹ 4.0 / (1 + x²) dx = π

static long num_steps = 100000;
double step, pi;

void main()
{
    int i;
    double x, sum = 0.0;

    step = 1.0/(double) num_steps;
    for (i=0; i < num_steps; i++){
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0 + x*x);
    }
    pi = step * sum;
    printf("Pi = %f\n", pi);
}

Activity 2 - Computing Pi

Parallelize the numerical integration code using OpenMP

static long num_steps = 100000;
double step, pi;

void main()
{
    int i;
    double x, sum = 0.0;

    step = 1.0/(double) num_steps;
    for (i=0; i < num_steps; i++){
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0 + x*x);
    }
    pi = step * sum;
    printf("Pi = %f\n", pi);
}

• What variables can be shared?
• What variables need to be private?
• What variables should be set up for reductions?
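One possible answer, offered as a sketch rather than the official lab solution: step and num_steps are shared read-only, i and x are private (here made so by declaring them inside the loop), and sum is a + reduction.

```c
static long num_steps = 100000;

/* Midpoint-rule integration of 4/(1+x^2) over [0,1], parallelized
   with a reduction; correct serially when OpenMP is disabled. */
double compute_pi(void) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;   /* private: declared inside the loop */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```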

Assigning Iterations
The schedule clause affects how loop iterations are mapped
onto threads

schedule(static [,chunk])
• Blocks of iterations of size “chunk” are assigned to threads
• Round-robin distribution

schedule(dynamic[,chunk])
• Threads grab “chunk” iterations
• When done with iterations, thread requests next set

schedule(guided[,chunk])
• Dynamic schedule starting with large block
• Size of the blocks shrink; no smaller than “chunk”

Scheduling defaults

• If the schedule clause is missing, an implementation-dependent schedule is selected. The GNU OpenMP (GOMP) default is the static schedule, where iterations are distributed approximately evenly among threads
• Static scheduling has low overhead and provides better data locality, since iterations generally touch memory sequentially
• Dynamic and guided scheduling may provide better load balancing
• Dynamic scheduling hands out chunks on a first-come, first-served basis with a default chunk size of 1

Guided Scheduling

• Allocates decreasingly large chunks of iterations to each thread until the size reaches C: a variant of dynamic scheduling in which the chunk size decreases exponentially
• Algorithm for chunk size
  • N is the number of threads
  • B0 is the number of loop iterations
  • Ck is the size of the kth chunk
  • Bk is the number of loop iterations remaining when calculating chunk size Ck
    Ck = ceil( Bk / 2N )
  • When the chunk size Ck gets too small, it is set to the C specified in the schedule clause (default 1)

• Example: B0 = 800, N = 2, C = 80
  Partition is 200, 150, 113, 85, 80, 80, 80, 12
• Guided scheduling performs better than dynamic due to less overhead
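The chunk-size rule above can be sketched in code to reproduce the partition (the function guided_chunks is illustrative, not an actual runtime API; note the chunks sum to B0 = 800).

```c
/* Applies Ck = ceil(Bk / 2N), clamped below by the chunk parameter c
   and above by the iterations remaining. Fills out[] with the chunk
   sizes and returns how many chunks were produced. */
int guided_chunks(int b0, int nthreads, int c, int *out) {
    int remaining = b0, count = 0;
    while (remaining > 0) {
        int ck = (remaining + 2*nthreads - 1) / (2*nthreads);  /* ceil */
        if (ck < c) ck = c;                 /* never smaller than C */
        if (ck > remaining) ck = remaining; /* final partial chunk */
        out[count++] = ck;
        remaining -= ck;
    }
    return count;
}
```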


Static Scheduling: Doing It By Hand

Must know:
• Number of threads (Nthrds)
• Each thread ID number (id)

Compute start and end iterations:

#pragma omp parallel
{
    int i, istart, iend;
    int id = omp_get_thread_num();
    int Nthrds = omp_get_num_threads();
    istart = id * N / Nthrds;
    iend = (id+1) * N / Nthrds;
    for(i=istart; i<iend; i++){
        c[i] = a[i] + b[i];
    }
}
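The start/end arithmetic can be factored out and checked: the blocks are contiguous and cover every index in [0, n) exactly once, even when n does not divide evenly. The helper names below are illustrative.

```c
/* Thread id's share of [0, n) under the manual static partition:
   the half-open range [*istart, *iend). */
void block_range(int id, int nthrds, int n, int *istart, int *iend) {
    *istart = id * n / nthrds;
    *iend = (id + 1) * n / nthrds;
}

/* The ranges telescope (thread id's end is thread id+1's start),
   so checking the total length proves full, non-overlapping coverage. */
int covers_all(int nthrds, int n) {
    int total = 0;
    for (int id = 0; id < nthrds; id++) {
        int s, e;
        block_range(id, nthrds, n, &s, &e);
        total += e - s;
    }
    return total == n;
}
```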

Which Schedule to Use

Schedule Clause   When To Use

STATIC            Predictable and similar work per iteration
DYNAMIC           Unpredictable, highly variable work per iteration
GUIDED            Special case of dynamic to reduce scheduling overhead


Clause – schedule(type, size)

schedule(type, size)
• schedule(static)
  • Allocates ceiling(N/t) contiguous iterations to each thread, where N is the number of iterations and t is the number of threads
• schedule(static, C)
  • Allocates C contiguous iterations to each thread
• schedule(dynamic)
  • Allocates 1 iteration at a time, dynamically
• schedule(dynamic, C)
  • Allocates C iterations at a time, dynamically. When a thread is ready to receive new work, it is assigned the next pending chunk of size C
• schedule(guided, C)
  • Allocates decreasingly large chunks of iterations to each thread until the size reaches C: a variant of dynamic scheduling in which the chunk size decreases exponentially down to C. The initial chunk size is ceiling(N/t)
• schedule(guided)
  • Same as (guided, C), with C = 1
• schedule(runtime)
  • Indicates that the schedule type and chunk are specified by the environment variable OMP_SCHEDULE
  • Example of run-time specified scheduling:
    setenv OMP_SCHEDULE "dynamic,2"
Schedule Clause Example

#pragma omp parallel for schedule (static, 8)


for( int i = start; i <= end; i += 2 )
{
if ( TestForPrime(i) ) gPrimesFound++;
}

Iterations are divided into chunks of 8


• If start = 3, then first chunk is i={3,5,7,9,11,13,15,17}


Single Construct

Denotes block of code to be executed by only one thread


• Thread chosen is implementation dependent

Implicit barrier at end

#pragma omp parallel


{
DoManyThings();
#pragma omp single
{
ExchangeBoundaries();
} // threads wait here for single
DoManyMoreThings();
}

Master Construct

Denotes block of code to be executed only by the master


thread
No implicit barrier at end

#pragma omp parallel


{
DoManyThings();
#pragma omp master
{ // if not master skip to next stmt
ExchangeBoundaries();
}
DoManyMoreThings();
}


Implicit Barriers

Several OpenMP* constructs have implicit barriers


• parallel
• for
• single

Unnecessary barriers hurt performance


• Waiting threads accomplish no work!

Suppress implicit barriers, when safe, with the nowait clause

Nowait Clause

#pragma omp for nowait        #pragma omp single nowait
for(...) { [...] }            { [...] }

Use when threads would otherwise wait between independent computations

#pragma omp for schedule(dynamic,1) nowait
for(int i=0; i<n; i++)
    a[i] = bigFunc1(i);

#pragma omp for schedule(dynamic,1)
for(int j=0; j<m; j++)
    b[j] = bigFunc2(j);


Barrier Construct

Explicit barrier synchronization


Each thread waits until all threads arrive

#pragma omp parallel shared (A, B, C)


{
DoSomeWork(A,B);
printf(“Processed A into B\n”);
#pragma omp barrier
DoSomeWork(B,C);
printf(“Processed B into C\n”);
}

Atomic Construct

Special case of a critical section


Applies only to simple update of memory location

#pragma omp parallel for shared(x, y, index, n)


for (i = 0; i < n; i++) {
#pragma omp atomic
x[index[i]] += work1(i);
y[i] += work2(i);
}


OpenMP* API

Get the thread number within a team
    int omp_get_thread_num(void);

Get the number of threads in a team
    int omp_get_num_threads(void);

Usually not needed for OpenMP codes
• Can lead to code not being serially consistent
• Does have specific uses (debugging)
• Must include a header file:
    #include <omp.h>
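A guarded sketch of these calls (the serial fallback values and wrapper names are illustrative conventions, not part of the API): outside any parallel region, the thread number is 0 and the team size is 1, with or without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Thread number of the calling thread; 0 in a serial build. */
int my_thread_num(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Size of the current team; 1 in a serial build, and also 1
   when called outside a parallel region with OpenMP enabled. */
int team_size(void) {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}
```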

Points to note on reductions

1. The value of the reduction variable is undefined from the time the first thread reaches the region/loop with the reduction clause, and remains so until the reduction is completed
2. If the loop has a nowait clause, the reduction variable remains undefined until a barrier synchronization is performed
3. The order in which the local values are combined is undefined, so the answer may differ from the serial one due to rounding effects


Monte Carlo Pi

Area of circle = πr²; area of the enclosing square = (2r)² = 4r², so

    # of darts hitting circle / # of darts in square = πr² / 4r² = π/4

    π = 4 × (# of darts hitting circle) / (# of darts in square)

loop 1 to MAX
    x.coor = (random#)
    y.coor = (random#)
    dist = sqrt(x^2 + y^2)
    if (dist <= 1)
        hits = hits + 1
pi = 4 * hits/MAX
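A C sketch of the pseudocode above. The choice of POSIX rand_r and the firstprivate copy of the seed are illustrative; giving each thread its own independent seed is exactly the challenge the next slide raises.

```c
#include <stdlib.h>

/* Dart-throwing estimate of pi over the unit quarter circle.
   Each thread gets a private copy of seed via firstprivate and
   advances it with rand_r, so there are no shared-state races
   (though the per-thread streams are not statistically independent). */
double monte_carlo_pi(int max, unsigned int seed) {
    int hits = 0;
    #pragma omp parallel for reduction(+:hits) firstprivate(seed)
    for (int i = 0; i < max; i++) {
        double x = (double)rand_r(&seed) / RAND_MAX;  /* dart in [0,1]^2 */
        double y = (double)rand_r(&seed) / RAND_MAX;
        if (x*x + y*y <= 1.0)                         /* inside quarter circle */
            hits++;
    }
    return 4.0 * (double)hits / (double)max;
}
```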

Making Monte Carlo’s Parallel

hits = 0
call SEED48(1)
DO I = 1, max
x = DRAND48()
y = DRAND48()
IF (SQRT(x*x + y*y) .LT. 1) THEN
hits = hits+1
ENDIF
END DO
pi = REAL(hits)/REAL(max) * 4.0

What is the challenge here?

Activity 3: Computing Pi
Use the Intel® Math Kernel Library (Intel® MKL) VSL:
• Intel MKL’s VSL (Vector Statistics Libraries)

• VSL creates an array, rather than a single random number

• VSL can have multiple seeds (one for each thread)

Objective:
• Use basic OpenMP* syntax to make Pi parallel

• Choose the best code to divide the task up

• Categorize properly all variables

Programming with OpenMP
What’s Been Covered

OpenMP* is:
• A simple approach to parallel programming for shared-memory machines

We explored basic OpenMP coding and how to:
• Make code regions parallel (omp parallel)
• Split up work (omp for)
• Categorize variables (omp private…)
• Synchronize (omp critical…)

We reinforced fundamental OpenMP concepts through several labs
