Parallel and Distributed
Computing
Muhammad Awais
Lecturer, Computer Science
CS Department, AIU
Course Learning Outcomes (CLOs)
Learn about parallel and distributed computers.
Write portable programs for parallel or distributed architectures using the
Message Passing Interface (MPI) library.
Perform analytical modelling and performance analysis of parallel programs.
Analyze complex problems using shared-memory programming with OpenMP.
Course Outline
Week 1: Introduction to P & D Computing, asynchronous/synchronous computation/communication
Week 2: Concurrency control, fault tolerance
Week 3: GPU architecture and programming, heterogeneity
Week 4: Interconnection topologies, load balancing
Week 5: Memory consistency model, memory hierarchies
Week 6: Message Passing Interface (MPI), MIMD/SIMD, multithreaded programming
Weeks 7-8: Parallel algorithms & architectures, parallel I/O, performance analysis and tuning, power
Weeks 9-10: Programming models (data parallel, task parallel, process-centric, shared/distributed memory)
Week 11: Scalability and performance studies, scheduling, storage systems
Week 12: Synchronization and tools
Week 13: Tools (CUDA, Swift, Globus, Condor)
Week 14: Tools (Amazon AWS, OpenStack, Cilk)
Week 15: gdb, threads, MPICH
Week 16:
Recommended books and material
Distributed Systems: Principles and Paradigms, A. S. Tanenbaum
and M. V. Steen, Prentice Hall, 2nd Edition, 2007
Distributed and Cloud Computing: Clusters, Grids, Clouds, and the Future
Internet, K. Hwang, J. Dongarra and G. C. Fox, Elsevier, 1st Edition.
Assessment criteria
Assessment Item (Weightage):
Assignment: 10%
Quizzes: 10%
Project + C.P: 10%
Mid-Term Examination: 30%
Final Examination: 40%
Total: 100%
Parallel Computing (Intro)
Parallel Computing:
• In parallel computing, multiple processors or cores work on different parts of a
problem simultaneously. It’s typically used within a single system.
• Examples: Multi-core processors, GPUs (Graphics Processing Units).
Models:
• Shared Memory, Distributed Memory, SIMD (Single Instruction, Multiple Data),
MIMD (Multiple Instruction, Multiple Data).
Distributed Computing:
• Distributed computing spreads tasks across multiple computers connected by a
network, each working on a part of the task.
• Examples: Cloud computing, grid computing, Hadoop clusters.
Models:
• Client-server, peer-to-peer, actor model.
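As a concrete illustration of the shared-memory model listed above (this sketch is not from the slides; the array size and the squaring operation are assumptions), the OpenMP loop below lets the cores of a single machine share one array and split the iterations among themselves. Compile with OpenMP enabled, e.g. gcc -fopenmp.

    #include <stdio.h>
    #include <omp.h>

    #define N 8

    int main(void) {
        double a[N];

        /* Shared memory: every thread sees the same array `a`;
           the loop iterations are divided among the cores. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = (double)i * i;

        printf("a[%d] = %.1f, up to %d threads available\n",
               N - 1, a[N - 1], omp_get_max_threads());
        return 0;
    }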
What Is Parallel Computing?
Serial Computing
Traditionally, software has been written for serial computation:
• A problem is broken into a discrete series of instructions
• Instructions are executed sequentially one after another
• Executed on a single processor
• Only one instruction may execute at any moment in time
What Is Parallel Computing?
Serial Computing
Example
• A payroll program, for example, consists of many instructions that are all
executed, one after another, by a single processor in the system (see the sketch below)
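A minimal serial sketch of this payroll idea (the record fields and the pay rule are invented for illustration): one instruction stream processes one employee record at a time on a single processor.

    #include <stdio.h>

    struct employee { const char *name; double hours; double rate; };

    int main(void) {
        /* Illustrative data only. */
        struct employee staff[] = {
            { "emp1", 160.0, 20.0 },
            { "emp2", 152.0, 25.0 },
            { "emp3", 168.0, 18.5 },
        };
        int n = sizeof staff / sizeof staff[0];

        /* Serial computation: instructions execute sequentially,
           one record after another, on one processor. */
        for (int i = 0; i < n; i++) {
            double pay = staff[i].hours * staff[i].rate;
            printf("%s: %.2f\n", staff[i].name, pay);
        }
        return 0;
    }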
What Is Parallel Computing?
Parallel Computing
• The simultaneous use of multiple compute resources to solve a
computational problem
• A problem is broken into discrete parts that can be solved concurrently
• Each part is further broken down to a series of instructions
• Instructions from each part execute simultaneously on different processors
• An overall control/coordination mechanism is employed
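One way to see these bullets in code (a sketch under assumed sizes and a fixed thread count, not the only possible decomposition) is a POSIX-threads sum: the problem is broken into parts, each part runs as its own thread, and the main thread acts as the coordination mechanism that waits for and combines the partial results. Link with -lpthread.

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NTHREADS 4

    static double a[N];
    static double partial[NTHREADS];

    /* Each worker sums one contiguous part of the array. */
    static void *worker(void *arg) {
        int id = *(int *)arg;
        int chunk = N / NTHREADS;
        int lo = id * chunk;
        int hi = (id == NTHREADS - 1) ? N : lo + chunk;
        double s = 0.0;
        for (int i = lo; i < hi; i++)
            s += a[i];
        partial[id] = s;
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        int ids[NTHREADS];

        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        /* Launch one thread per part: instructions from each part
           execute simultaneously on different cores. */
        for (int t = 0; t < NTHREADS; t++) {
            ids[t] = t;
            pthread_create(&tid[t], NULL, worker, &ids[t]);
        }

        /* Coordination mechanism: wait for all parts, then combine. */
        double sum = 0.0;
        for (int t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            sum += partial[t];
        }
        printf("sum = %f\n", sum);
        return 0;
    }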
What Is Parallel Computing?
•Parallel programming uses parallel hardware to execute a computation more
quickly; efficiency is its main concern.
What Is Parallel Computing?
•The computational problem should be able to:
• Be broken apart into discrete pieces of work that can be solved
simultaneously
• Execute multiple program instructions at any moment in time
• Be solved in less time with multiple compute resources than with a
single compute resource
•The compute resources are typically:
• A single computer with multiple processors/cores
• An arbitrary number of such computers connected by a network
What Is Parallel Computing?
•Virtually all stand-alone computers today are parallel from a hardware
perspective:
• Multiple functional units
• L1 cache, L2 cache, branch, pre-fetch, decode, floating-point,
graphics processing (GPU), integer, etc.
• Multiple execution units/cores
• Multiple hardware threads
What Is Parallel Computing?
• For example, the schematic below shows a typical LLNL parallel computer
cluster:
• Each compute node is a multi-processor parallel computer in itself
• Multiple compute nodes are networked together with an InfiniBand
network
• Special purpose nodes, also multi-processor, are used for other purposes
Example of a typical parallel computer cluster
Types of Parallel Computing
• Bit-level parallelism
• Instruction-level parallelism
• Task-level parallelism
Approaches to Parallel Computing
Flynn's Taxonomy:
• Single Instruction, Single Data (SISD)
• Single Instruction, Multiple Data (SIMD)
• Multiple Instruction, Single Data (MISD)
• Multiple Instruction, Multiple Data (MIMD)
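As a rough, hedged illustration of the SIMD category (the array size and the arithmetic are assumptions), the loop below applies the same operation to many data elements; a vectorizing compiler can map such a loop to SIMD instructions, whereas separate threads running different instruction streams on different data would correspond to MIMD.

    #include <stdio.h>

    #define N 1024

    int main(void) {
        float x[N], y[N], z[N];

        for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 2.0f * i; }

        /* SIMD-style data parallelism: the same multiply-add is applied
           to every element, so several elements can be handled by one
           vector (SIMD) instruction. */
        for (int i = 0; i < N; i++)
            z[i] = 2.0f * x[i] + y[i];

        printf("z[0]=%f z[%d]=%f\n", z[0], N - 1, z[N - 1]);
        return 0;
    }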
Distributed System
A distributed system is a collection of independent computers that appears to
its users as a single coherent system
• Multiple computers are connected via a local or wide area network.
• Acting together, they form a single, far more powerful computer that can
perform computations no single computer within the network could perform
on its own.
• Distributed computers offer two key advantages:
•Easy scalability.
•Redundancy.
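Since the course later uses MPI for distributed-memory programming, the sketch below shows several independent processes (possibly on different networked machines) cooperating as one program; it is a generic MPI hello-world, not taken from the slides. Compile with mpicc and launch with mpirun or mpiexec (details vary by installation).

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id        */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
        MPI_Get_processor_name(name, &len);    /* machine this process runs on */

        /* Each independent computer/process does part of the work,
           yet the program appears to the user as a single system. */
        printf("process %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }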
Four networked computers and three applications:
• Application B is distributed across computers 2 and 3.
• Each application is offered the same interface.
• The common interface (middleware) provides the means for components of a single
distributed application to communicate with each other, and also lets different applications communicate.
A distributed system organized as middleware. The middleware layer
extends over multiple machines, and offers each application the same
interface.
Transparency Goals of a Distributed System
Asynchronous and Synchronous
Computation
•Synchronous:
• Tasks wait for one another to finish before moving forward.
• Communication requires the sender to wait for a response.
•Asynchronous:
• Tasks continue without waiting for others to complete.
• Communication allows the sender to proceed without waiting for an
acknowledgment.
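One way to make this distinction concrete with MPI (an illustrative sketch, run with at least two processes; the value, tag, and ranks are assumptions): a blocking MPI_Recv is synchronous, the receiver waits for the message, while MPI_Irecv posts the receive, lets the receiver keep working, and only waits when the data is actually needed.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Synchronous style (blocking): the caller would wait here.
               MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                        MPI_STATUS_IGNORE); */

            /* Asynchronous style: post the receive, keep working, wait later. */
            MPI_Request req;
            MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            /* ... other useful work can overlap with the communication ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }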
Asynchronous and Synchronous
Computation
•Synchronous Computation
•Key Concept:
•Tasks or threads must complete a step before moving on to the next one.
•Example:
•Parallel Reduction: summing an array across multiple cores, where all cores
synchronize before combining results (see the sketch below).
•Benefit:
•Guarantees consistency, but can cause delays if one thread is slower.
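A hedged OpenMP sketch of this parallel-reduction example (array contents and size are assumptions): each thread sums its share of the array, the implicit barrier at the end of the worksharing loop is the point where all cores synchronize, and only then are the partial sums combined.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        #pragma omp parallel
        {
            double local = 0.0;

            /* Each core sums its share of the array; the implicit barrier
               at the end of this loop makes every thread wait until all
               shares are done (a slow thread delays the others). */
            #pragma omp for
            for (int i = 0; i < N; i++)
                local += a[i];

            /* Combine the partial sums one thread at a time. */
            #pragma omp critical
            sum += local;
        }
        printf("sum = %f\n", sum);
        return 0;
    }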
• Asynchronous Computation
• Key Concept:
• Threads perform tasks independently, without waiting for others.
• Example:
• Asynchronous I/O: one thread loads data from memory while another
processes already-loaded data (e.g., matrix multiplication); a small sketch follows below.
• Benefit:
• Faster processing, as threads remain busy and avoid idle time.
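A small POSIX-threads sketch of this overlap (the "load" is simulated and the double-buffer scheme is an illustrative assumption, not the slides' design): one thread fills the next buffer while the main thread processes the buffer that is already in memory, and the join happens only when the new data is actually needed.

    #include <pthread.h>
    #include <stdio.h>

    #define BLOCK 4

    struct load_job { double *dst; int base; };

    /* Simulated asynchronous load: fills a buffer with the next block. */
    static void *load_block(void *arg) {
        struct load_job *job = arg;
        for (int i = 0; i < BLOCK; i++)
            job->dst[i] = job->base + i;
        return NULL;
    }

    int main(void) {
        double buf0[BLOCK], buf1[BLOCK];
        struct load_job job = { buf1, BLOCK };
        pthread_t loader;

        for (int i = 0; i < BLOCK; i++)   /* first block already in memory */
            buf0[i] = i;

        /* Start loading the next block asynchronously... */
        pthread_create(&loader, NULL, load_block, &job);

        /* ...while this thread processes the block already loaded. */
        double sum = 0.0;
        for (int i = 0; i < BLOCK; i++)
            sum += buf0[i] * 2.0;

        pthread_join(loader, NULL);       /* wait only when the data is needed */
        for (int i = 0; i < BLOCK; i++)
            sum += buf1[i] * 2.0;

        printf("sum = %f\n", sum);
        return 0;
    }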
Parallel Computing