Parallel and Distributed Computing Module I
PARALLEL COMPUTING
Lesson 1 Parallelism Fundamentals
Lesson 2 Parallelism Decomposition
Lesson 3 Communication
Lesson 4 Parallel Architectures
MODULE I
PARALLEL COMPUTING
INTRODUCTION
OBJECTIVES
There are four lessons in the module. Read each lesson carefully, then
answer the exercises/activities to find out how much you have benefited from
it. Work on these exercises carefully and submit your output to your instructor
or to the College of Computer Science office.
In case you encounter difficulty, discuss it with your instructor during
the face-to-face meeting. Otherwise, contact your instructor at the College of
Computer Science office.
Lesson 1
Parallelism Fundamentals
We could say that the complexity decreases when there are two queues and
two cashiers giving tickets to two persons simultaneously. This is an
example of parallel computing.
In serial computing, a single processor works through one calculation at a
time. It does this in a rapid-fire fashion, over and over and over again.
Table 1.1 shows the main differences between Serial and Parallel
Computing.
Listed below are the main benefits of parallel computing:
• Saves time and money. Using more resources in parallel shortens the
task completion time, with potential cost savings. Parallel clusters can
also be constructed from cheap, commodity components.
• Solves larger problems. Many problems are so large and complex that
they are impractical to solve on a single computer, especially one with
limited memory.
• Supports non-local resources. Computer resources across a network can
be utilized when local resources are scarce.
• Parallel computing models the real world. Things don’t happen one
at a time, waiting for one event to finish before the next one starts.
To crunch numbers on data points in weather, traffic, finance,
industry, agriculture, oceans, ice caps, and healthcare, we need
parallel computers.
Parallel computing also has its overheads. Listed below are the current
limitations of parallel computing:
EXERCISE
Demonstrate a real-world task that can be solved by serial
computing (e.g., sorting a deck of cards). Then, demonstrate how
to solve the same task using parallel computing. Indicate how
many seconds are consumed in each demonstration. Finally,
identify which computing paradigm has the least execution
time.
THINK!
Give at least three (3) complex problems that cannot be
solved by serial computing.
Answer:
Lesson 2
Parallelism Decomposition
Decomposition

Decomposition is the process of dividing a computation into smaller parts,
some or all of which can then be executed in parallel.
Decomposition Techniques
Figure 1.4 Partitioning of input and output matrices into 2x2 submatrices
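A rough sketch of this decomposition in code is given below. It is a hypothetical C example (the matrix size, names, and values are assumptions, not taken from the module): the matrices are divided into 2x2 grids of submatrices, and each submatrix of the product can be computed as an independent task that reads one block-row of A and one block-column of B.

#include <stdio.h>

#define N 4                 /* matrix dimension (assumed for the example)  */
#define BLOCKS 2            /* 2x2 grid of submatrices, each (N/2) x (N/2) */

/* Compute the (bi, bj) submatrix of C = A * B.  In a parallel program,
   each of the four (bi, bj) blocks could be assigned to a different task. */
void compute_block(double A[N][N], double B[N][N], double C[N][N],
                   int bi, int bj)
{
    int half = N / BLOCKS;
    for (int i = bi * half; i < (bi + 1) * half; i++)
        for (int j = bj * half; j < (bj + 1) * half; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

int main(void)
{
    double A[N][N] = {{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}};
    double B[N][N] = {{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}};  /* identity */
    double C[N][N];

    /* Done serially here; the point is that the four calls are independent. */
    for (int bi = 0; bi < BLOCKS; bi++)
        for (int bj = 0; bj < BLOCKS; bj++)
            compute_block(A, B, C, bi, bj);

    printf("C[2][3] = %.1f\n", C[2][3]);   /* 12.0, since B is the identity */
    return 0;
}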
EXERCISE
In this exercise, the goal is to add a series of 16 random numbers.
Addition can only be performed between two numbers at any
given time, and it is assumed that it takes the same amount of time
to add any two numbers. Thus, one addition operation takes one
time step. Each number is written down on a separate note card.
For further clarification, take note of the following sample
scenarios:
Serial case:
One student is given the set of 16 random numbers and
is asked to add them up. To add 16 numbers, it takes a total of
15 addition operations.
Questions:
1. What is the total number of time steps for four
students?
2. What is the total number of time steps for eight
students?
Lesson 3
Communication
Parallel tasks typically need to exchange data so that they stay in sync
concerning changes in data values. Processors rely on software to
communicate with each other. Each processor operates normally, performing
operations in parallel as instructed and pulling data from the computer’s
memory. Provided all the processors remain in sync with one another, the
software fits all the data pieces together at the end of the task.
Figure 3.1 The different types or modes of communication among processors: (a) one to
one, (b) one to many, (c) broadcast (one to all), and (d) gather and reduce.
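The module does not name a particular communication library, but the modes in Figure 3.1 map naturally onto message-passing primitives. The sketch below uses MPI (an assumption, chosen only for illustration) to show a broadcast, where one process sends a value to all others, and a reduce, where values from all processes are gathered and combined on one process.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    /* Broadcast (one to all): process 0 sends one value to every process. */
    if (rank == 0)
        value = 100;
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Gather and reduce (all to one): each process contributes its rank,
       and process 0 receives the sum of all contributions.               */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("broadcast value = %d, sum of ranks = %d\n", value, sum);

    MPI_Finalize();
    return 0;
}

Compiled with an MPI wrapper such as mpicc and launched with mpirun -np 4, every process ends up holding value = 100, and process 0 prints the sum of the ranks (0 + 1 + 2 + 3 = 6).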
Coordination
Race Condition
For example, let us assume that two threads each increment the value
of a global integer variable by 1. Typically, the sequence of
operations shown in Figure 3.2 would take place.
In this case, the final value is 1 instead of the correct result of 2. This
occurs because the increment operations are not mutually exclusive.
Mutually exclusive operations are those that cannot be interrupted while
accessing some resource such as a memory location.
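As a concrete sketch (not part of the module; POSIX threads are assumed purely for illustration), the program below makes the increment mutually exclusive with a lock. Removing the lock and unlock calls reintroduces the race described above, and the final value may then be 1 instead of 2.

#include <stdio.h>
#include <pthread.h>

int counter = 0;                                  /* shared global variable         */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* protects the read-modify-write */

void *increment(void *arg)
{
    pthread_mutex_lock(&lock);    /* enter the mutually exclusive section */
    counter = counter + 1;        /* read the value, add 1, write it back */
    pthread_mutex_unlock(&lock);  /* leave the mutually exclusive section */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("final value = %d\n", counter);   /* always 2 with the lock in place */
    return 0;
}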
Deadlocks
For example, suppose process P1 acquires lock x and then requests lock y,
while process P2 acquires lock y and then requests lock x. Each ends up
waiting for a lock that the other holds:

P1                      P2
acquire x_lock: ok      acquire y_lock: ok
acquire y_lock: wait    acquire x_lock: wait
wait                    wait
wait                    wait
wait                    wait
...                     ...
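A small POSIX-threads sketch of this trace (hypothetical, not from the module) is shown below: p1 takes x_lock and then y_lock, while p2 takes them in the opposite order, so with unlucky timing each thread waits forever for the lock the other one holds.

#include <stdio.h>
#include <pthread.h>

pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t y_lock = PTHREAD_MUTEX_INITIALIZER;

void *p1(void *arg)
{
    pthread_mutex_lock(&x_lock);    /* acquire x_lock: ok                */
    pthread_mutex_lock(&y_lock);    /* acquire y_lock: may wait forever  */
    pthread_mutex_unlock(&y_lock);
    pthread_mutex_unlock(&x_lock);
    return NULL;
}

void *p2(void *arg)
{
    pthread_mutex_lock(&y_lock);    /* acquire y_lock: ok                */
    pthread_mutex_lock(&x_lock);    /* acquire x_lock: may wait forever  */
    pthread_mutex_unlock(&x_lock);
    pthread_mutex_unlock(&y_lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, p1, NULL);
    pthread_create(&t2, NULL, p2, NULL);
    pthread_join(t1, NULL);    /* if the deadlock occurs, this never returns */
    pthread_join(t2, NULL);
    return 0;
}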
Generally, deadlock can occur in any system that satisfies all four of the
conditions below:
• Mutual exclusion: at least one resource is held in a non-shareable mode,
so only one process can use it at a time.
• Hold and wait: a process holds at least one resource while waiting to
acquire additional resources held by other processes.
• No preemption: a resource can be released only voluntarily by the
process holding it.
• Circular wait: a circular chain of processes exists in which each process
waits for a resource held by the next process in the chain.
The classic dining philosophers problem illustrates how circular waiting
leads to deadlock: five philosophers sit around a table with one chopstick
between each pair, and each philosopher needs both the left and the right
chopstick to eat. In the pseudocode below, chopstick[i] is a binary
semaphore guarding one chopstick:

void philosopher(int x)    // x is the philosopher's number, 0 to 4
{
    while (1)
    {
        // Pick up the left and then the right chopstick (each wait may block)
        wait(chopstick[x]);
        wait(chopstick[(x + 1) % 5]);

        // Section where the philosopher is eating (both chopsticks held)

        // Put both chopsticks back down
        signal(chopstick[x]);
        signal(chopstick[(x + 1) % 5]);

        // Section where the philosopher is thinking
    }
}

If every philosopher picks up the left chopstick at the same time, each one
then waits forever for the right chopstick held by a neighbor, and the
system deadlocks.
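One common way to remove the deadlock, sketched below rather than taken from the module, is to break the circular-wait condition: the last philosopher picks up the chopsticks in the opposite order, so the five philosophers can never end up each holding one chopstick while waiting for the next.

void philosopher(int x)
{
    while (1)
    {
        if (x == 4) {                    // last philosopher: right chopstick first
            wait(chopstick[(x + 1) % 5]);
            wait(chopstick[x]);
        } else {                         // everyone else: left chopstick first
            wait(chopstick[x]);
            wait(chopstick[(x + 1) % 5]);
        }

        // Eating section

        signal(chopstick[x]);
        signal(chopstick[(x + 1) % 5]);

        // Thinking section
    }
}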
Deadlock Detection Using a Resource Allocation Graph
Even though a cycle is present in the graph above, all the processes
can be executed successfully because of the extra resource instance of R2.
Once P3 finishes its task, P1 can access R2. Once P1 finishes its task,
P2 can finally access R1 and also finish its task. Thus, no deadlock is present
in the system.
EXERCISE
Lesson 4
Parallel Architectures
Shared memory models rely on shared-memory multiprocessors. Shared-memory
parallel programming models communicate by sharing data objects in a global
address space. These models assume that all parallel activities can access
all of memory. Communication between parallel activities is through shared
mutable state that must be carefully managed to ensure correctness.
Synchronization primitives such as locks are used to enforce this
management, as the short sketch following this paragraph illustrates.
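The sketch below uses OpenMP purely as an assumed example; the module does not prescribe an API. Every thread reads the same shared array in the global address space, and the reduction clause manages the otherwise racy concurrent updates to the shared sum.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    double a[1000], sum = 0.0;

    for (int i = 0; i < 1000; i++)      /* shared data in the global address space */
        a[i] = 1.0;

    /* Each thread works on part of the loop; the reduction clause handles
       the concurrent updates to the shared variable sum.                   */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000; i++)
        sum += a[i];

    printf("sum = %.1f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}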
Shared memory models can be further divided into two main classes
based upon memory access time: Uniform Memory Access (UMA) and
Non-Uniform Memory Access (NUMA).
Figure 4.1 shows the Uniform Memory Access model. In this model, a
single memory is used and accessed by all the processors with the help of
an interconnection network. Each processor has the same memory access time
(latency) and access speed.
Figure 4.2 shows the Non-Uniform Memory Access model. This is also a
multiprocessor model, but each processor is connected to its own
dedicated memory. These small parts of memory combine to
make a single address space. Unlike UMA, the access time depends on the
distance between the processor and the memory location, which means
memory access times vary. Any memory location can still be accessed by
using its physical address.
Table 4.1 below presents the comparison between UMA and NUMA:
Advantages of the shared memory model include:
1. Data sharing between tasks is both fast and uniform due to the
proximity of memory to CPUs.
2. It avoids redundant data copies by managing shared data in main
memory and in caches.
3. There is no need to write explicit code for processor interaction
and communication.
Flynn’s Taxonomy

Flynn’s taxonomy classifies computer architectures according to the number
of concurrent instruction streams and data streams they support, giving four
classes: SISD (single instruction, single data), SIMD (single instruction,
multiple data), MISD (multiple instruction, single data), and MIMD (multiple
instruction, multiple data).
EXERCISE
Question:
If you were to implement a program that requires several
processors to frequently exchange data, what parallel computing
architecture would you utilize? Is it the shared memory or the
distributed memory model? Justify your answer in at most ten
sentences.
Answer:
MODULE SUMMARY
Congratulations! You have just studied Module I. Now you are ready to
evaluate how much you have benefited from your reading by answering the
summative test. Good Luck!!!
SUMMATIVE TEST
Directions: Answer the following questions concisely. If possible, cite your
references to support your answer.
1. What are the two primary reasons for using parallel computing?