Unit 4
Ms Muskan Kumari,
Assistant Professor,
CSE department, PIET, PU
Module Topics
Synchronization: Scheduling, Job Allocation, Job Partitioning,
Dependency Analysis, Mapping Parallel Algorithms onto
Parallel Architectures, Thread Basics, The POSIX Thread API,
Thread Basics: Creation and Termination, Synchronization
Primitives in Pthreads, Controlling Thread and
Synchronization Attributes, Thread Cancellation, Composite
Synchronization Constructs, Tips for Designing Asynchronous
Programs, OpenMP: a Standard for Directive Based Parallel
Programming
Parallel Processing Fundamental Design Issues
• Synchronization
• Scheduling
• Job allocation
• Job partitioning
• Dependency analysis
• Mapping parallel algorithms onto parallel architectures
• Performance Analysis of Parallel Algorithms
Synchronization
• Parallel programs need to manage the sequence of work and
tasks.
• It is one of the factors affecting program performance.
• It often requires “serialization” of segments of the program.
• Types of Synchronization
–Barrier
–Lock/Semaphore
–Synchronous communication operation
Barrier
• All tasks are involved.
• Each task performs its work until it reaches the
barrier, after which it “blocks”.
• When the last task reaches the barrier, all tasks are
synchronized; the actual work starts from here.
• Often a serial section of work must be done here.
• In many cases, the tasks are then automatically
released to continue their work.
Lock/Semaphore
• Can involve any number of tasks.
• Used to protect access to global data.
• Only one task may hold the lock/semaphore at a time.
• The first task to acquire the lock sets it and can
access the protected data.
• Other tasks wait until the task that owns the lock
releases it.
Synchronous communication operation
• Involves only the communicating tasks.
• Coordination between them is needed.
• Example: before a task can perform a send
operation, it must first receive an
acknowledgement from the receiving task that it is
OK to send.
Job scheduling
• In a parallel system, the system selects the order of job execution to
minimize turnaround time.
• Each job is mapped to a subset of the processors.
• Partition of the machine: the set of processors dedicated to a certain job.
• Users submit jobs to the scheduler; jobs are queued and considered for
allocation.
• The main aim is to increase processor utilization, but because of a lack of
knowledge about future jobs and their execution times, this sometimes
fails.
• The scheduler allocates compute nodes and other resources to a job by
matching the requirements of the job to compute nodes according to the
available data.
Job scheduling policies
• FCFS:
–allocates jobs according to the order of their arrival.
–in addition to the number of nodes and the number of tasks per
node, a job may specify computational processor power, disk
storage, memory space, system architecture, and high-speed
network resources.
• Look-ahead optimizing scheduler
• Gang scheduling
Dependency analysis
• Dependence: a relation between two computations that places
constraints on their execution order.
• Dependency analysis: identifies these constraints.
• Dependencies are of two types: control dependency and data dependency.
Dependency analysis
• Data dependency:
–Two memory accesses are involved.
–Exists if the two accesses may refer to the same memory
location and one of the references is a write.
–It can exist either between two distinct program statements
or between two different dynamic executions of the same
program statement.
–Ensures data is produced and consumed in the right order.
Dependency analysis
• Data dependencies are of the following types:
• FLOW DEPENDENCY:
–One instruction depends on the final outcome of a
previous instruction.
–Also known as a true or read-after-write (RAW) dependency.
–E.g. I1: Add R1, R2
I2: Mov R4, R1
Dependency analysis
• ANTI-DEPENDENCY:
–One instruction depends on data that could be destroyed
by another instruction.
–Also known as a write-after-read (WAR) dependency.
–E.g. I1: A = B + C
I2: C = D + E
–It occurs between an instruction that reads a register and a
subsequent instruction that writes a new value to the same location.
–Two instructions have an anti-dependency if swapping their
order would result in a true dependence.
Dependency analysis
• OUTPUT DEPENDENCY:
–Occurs when two instructions both write a result to the same location.
–Also known as a write-after-write (WAW) dependency.
–Exists due to the limited number of architectural registers.
Dependency analysis
• I/O DEPENDENCE
–Exists if both I/O statements try to use the same file for their
operations.
• The stack corresponding to the function call is generally treated as being local to
the thread for liveness reasons.
• This implies a logical machine model with both global memory (default) and
local memory (stacks).
• It is important to note that such a flat model may result in very poor performance
since memory is physically distributed in typical machines.
Thread Basics
• The concepts discussed here are largely independent of the API and can
be used for programming with other thread APIs (NT threads, Solaris
threads, Java threads, etc.) as well.
Thread Basics: Creation and Termination
• Pthreads provides two basic functions for specifying concurrency in a
program:
#include <pthread.h>
int pthread_create (
pthread_t *thread_handle, const pthread_attr_t *attribute,
void * (*thread_function)(void *),
void *arg);
int pthread_join (
pthread_t thread,
void **ptr);
• The function pthread_create invokes thread_function as a new thread of
execution, passing it arg; pthread_join waits for the specified thread to
terminate.
Thread Basics: Creation and Termination
#include <pthread.h>
#include <stdlib.h>
#define MAX_THREADS 512
void *compute_pi (void *);
....
main() {
    ...
    pthread_t p_threads[MAX_THREADS];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    /* seed each thread with its index and start it */
    for (i = 0; i < num_threads; i++) {
        hits[i] = i;
        pthread_create(&p_threads[i], &attr, compute_pi,
                       (void *) &hits[i]);
    }
    /* wait for each thread and accumulate its result */
    for (i = 0; i < num_threads; i++) {
        pthread_join(p_threads[i], NULL);
        total_hits += hits[i];
    }
    ...
}
Thread Basics: Creation and Termination
void *compute_pi (void *s) {
    int seed, i, *hit_pointer;
    double rand_no_x, rand_no_y;
    int local_hits;

    hit_pointer = (int *) s;
    seed = *hit_pointer;          /* per-thread seed passed in via hits[i] */
    local_hits = 0;
    for (i = 0; i < sample_points_per_thread; i++) {
        /* draw a random point in the unit square */
        rand_no_x = (double)(rand_r(&seed)) / (double)((2<<14) - 1);
        rand_no_y = (double)(rand_r(&seed)) / (double)((2<<14) - 1);
        /* count it if it lies inside the circle of radius 0.5 */
        if (((rand_no_x - 0.5) * (rand_no_x - 0.5) +
             (rand_no_y - 0.5) * (rand_no_y - 0.5)) < 0.25)
            local_hits++;
        seed *= i;
    }
    *hit_pointer = local_hits;    /* return the count through the argument */
    pthread_exit(0);
}
Synchronization Primitives in Pthreads
• When multiple threads attempt to manipulate the same data item, the
results can often be incoherent if proper care is not taken to synchronize
them.
• Consider:
/* each thread tries to update variable best_cost as follows */
if (my_cost < best_cost)
    best_cost = my_cost;
• Assume that there are two threads, the initial value of best_cost is 100,
and the values of my_cost are 50 and 75 at threads t1 and t2.
• Depending on the schedule of the threads, the value of best_cost could be
50 or 75!
• The value 75 does not correspond to any serialization of the threads.
Controlling Thread and Synchronization Attributes
• The Pthreads API allows a programmer to change the default
attributes of entities using attributes objects.
• An attributes object is a data structure that describes the properties of
an entity (thread, mutex, condition variable).
• Once these properties are set, the attributes object can be passed to
the method initializing the entity.
• Enhances modularity, readability, and ease of modification.
Attributes Objects for Threads
• Use pthread_attr_init to create an attributes object.
• Individual properties associated with the attributes object can be changed
using the following functions:
pthread_attr_setdetachstate,
pthread_attr_setguardsize_np,
pthread_attr_setstacksize,
pthread_attr_setinheritsched,
pthread_attr_setschedpolicy, and
pthread_attr_setschedparam
Attributes Objects for Mutexes
• Initialize the attributes object using the function
pthread_mutexattr_init.
• The function pthread_mutexattr_settype_np can be used for setting
the type of mutex specified by the mutex attributes object.
pthread_mutexattr_settype_np (
pthread_mutexattr_t *attr,
int type);
• Here, type specifies the type of the mutex and can take one of:
– PTHREAD_MUTEX_NORMAL_NP
– PTHREAD_MUTEX_RECURSIVE_NP
– PTHREAD_MUTEX_ERRORCHECK_NP
Composite Synchronization Constructs
• By design, Pthreads provides support for a basic set of operations.
• Higher level constructs can be built using basic synchronization
constructs.
• We discuss two such constructs - read-write locks and barriers.
Tips for Designing Asynchronous Programs
• Never rely on scheduling assumptions when exchanging data.
• Never rely on liveness of data resulting from assumptions on
scheduling.
• Do not rely on scheduling as a means of synchronization.
• Where possible, define and use group synchronizations and data
replication.
OpenMP: a Standard for Directive Based Parallel Programming