Slides
Module 1
Introduction to Parallel and Distributed Computing.
Weekly Learning Outcomes
1. What is parallel or distributed computing and why is it necessary?
Required Reading
Chapter 1, Introduction to Parallel and Distributed Computing: (A. Grama, A. Gupta, G.
Karypis, V. Kumar. Introduction to Parallel Computing (2nd ed.). Addison Wesley, 2003.)
Recommended Reading:
Introduction to parallel and distributed processing:
https://www.youtube.com/watch?v=RNVIcm8-6RE
Implicit parallelism https://www.youtube.com/watch?v=9_uX7L7-7ME
https://www.cs.rochester.edu/users/faculty/sandhya/csc258/
What is Parallel Computing:
Parallel computing is the process of running an application or computation on multiple
processors simultaneously. In general, it refers to a type of computing architecture in
which a large problem is divided into separate, smaller, typically related parts that can
be processed at the same time. Multiple CPUs work together on the problem, exchanging
information through shared memory, and the partial results are then combined. Parallel
computing facilitates the execution of complex computations by distributing an enormous
problem among multiple processors.
In distributed computing, a user sees a single system that is actually made up of several
autonomous computers. There is no shared memory in distributed systems; machines
communicate with one another by exchanging messages. A single task is split up among
several computers in distributed computing.
Types of parallel computing
1. Bit-level parallelism: The form of parallel computing based on increasing the processor
word size. When operating on large data items, a wider word reduces the number of
instructions the processor must execute; otherwise the operation must be split into a series
of instructions. For example, suppose an 8-bit processor must add two 16-bit numbers. It
must first add the 8 lower-order bits and then the 8 higher-order bits, so two instructions
are needed to perform the operation, whereas a 16-bit processor can perform it with one
instruction (see the sketch after this list).
2. Instruction-level parallelism: The processor decides, within a single clock cycle, how
many instructions can be issued at the same time. Independent instructions can be reordered
and grouped so that several of them execute concurrently without changing the result of the
program.
3. Task parallelism: The form of parallelism in which a computation is decomposed into
subtasks, each subtask is assigned to a processor, and the subtasks are executed
concurrently.
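The following is a minimal sketch of the 8-bit example above: a 16-bit addition is emulated with two 8-bit additions plus a carry, mirroring the two instructions an 8-bit processor would need. The function name and the C emulation itself are illustrative assumptions, not hardware code.

/* Illustrative sketch: add two 16-bit numbers using only 8-bit operations,
 * the way an 8-bit processor would (two add steps instead of one). */
#include <stdint.h>
#include <stdio.h>

uint16_t add16_with_8bit_ops(uint16_t x, uint16_t y) {
    uint8_t lo = (uint8_t)(x & 0xFF) + (uint8_t)(y & 0xFF);     /* step 1: low bytes   */
    uint8_t carry = (lo < (uint8_t)(x & 0xFF)) ? 1 : 0;         /* carry out of step 1 */
    uint8_t hi = (uint8_t)(x >> 8) + (uint8_t)(y >> 8) + carry; /* step 2: high bytes  */
    return ((uint16_t)hi << 8) | lo;
}

int main(void) {
    uint16_t a = 0x12F0, b = 0x0345;
    printf("0x%04X\n", add16_with_8bit_ops(a, b));  /* prints 0x1635, same as a + b */
    return 0;
}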
Parallel computing
Observation
High-performance distributed computing started with parallel computing
Multiprocessor and multicore versus multicomputer
Moore’s law (attributed to Gordon Moore, co-founder of Intel): the number of transistors on a chip doubles roughly every 2 years
Leveraging Moore’s Law
More transistors – opportunities for exploiting parallelism
Implicit parallelism
– Pipelining
– Superscalar execution
Explicit parallelism
– Streaming and multimedia processor extensions, e.g., MMX, AltiVec
– Very long instruction words (VLIW)
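As an illustration of such explicit streaming/multimedia extensions, the sketch below uses x86 SSE2 intrinsics (a later successor to MMX; the specific header and intrinsics are assumptions about the reader's platform, not something the slides prescribe) to add eight 16-bit values with a single vector instruction.

/* Minimal sketch of explicit SIMD parallelism with SSE2 intrinsics.
 * Compile with a compiler that supports <emmintrin.h>, e.g. gcc -msse2 simd_add.c */
#include <emmintrin.h>
#include <stdio.h>

int main(void) {
    short a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    short b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    short c[8];

    /* Load eight 16-bit values into 128-bit registers and add them
     * with one vector instruction (eight additions in parallel). */
    __m128i va = _mm_loadu_si128((__m128i const *)a);
    __m128i vb = _mm_loadu_si128((__m128i const *)b);
    __m128i vc = _mm_add_epi16(va, vb);
    _mm_storeu_si128((__m128i *)c, vc);

    for (int i = 0; i < 8; i++)
        printf("%d ", c[i]);    /* prints: 11 22 33 44 55 66 77 88 */
    printf("\n");
    return 0;
}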
Uniprocessor Limits
The power problem!
http://www.tomshardware.com/2005/11/21/the_mother_of_all_cpu_charts_2005
S.No.  Parallel Computing                              Distributed Computing
1      Many operations are performed simultaneously    System components are located at different locations
2      A single computer is required                   Multiple computers are used
– Distributed consensus
– Replication, caching consistency
– Security and trust
Scope of Parallelism
Conventional architectures coarsely comprise a processor, a memory system, and the
datapath. Each of these components presents significant performance bottlenecks.
Parallelism addresses each of these components in significant ways. Different applications
utilize different aspects of parallelism – e.g., data-intensive applications utilize high
aggregate throughput, server applications utilize high aggregate network bandwidth, and
scientific applications typically utilize high processing and memory system performance.
It is important to understand each of these performance bottlenecks.
Implicit Parallelism:
Microprocessor clock speeds have posted impressive gains over the past two decades
(two to three orders of magnitude). Higher levels of device integration have made
available a large number of transistors.
– The question of how best to utilize these resources is an important one.
– Current processors use these resources in multiple functional units and execute multiple instructions in the same cycle.
– The precise manner in which these instructions are selected and executed provides impressive diversity in architectures.
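As a small illustration of executing multiple instructions in the same cycle, consider the hypothetical loop below (the slides do not reference this code): the four partial sums are mutually independent, so a superscalar processor with several functional units can overlap the additions, whereas a single dependence chain would force them to execute one after another.

/* Hypothetical sketch of instruction-level parallelism: the four partial sums
 * have no dependences among them, so the hardware may issue several of the
 * additions in the same cycle. */
#include <stdio.h>

int main(void) {
    double a[1000];
    for (int i = 0; i < 1000; i++)
        a[i] = i * 0.5;

    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < 1000; i += 4) {
        s0 += a[i];       /* independent of the three additions below, */
        s1 += a[i + 1];   /* so a superscalar core can overlap them    */
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    printf("%f\n", s0 + s1 + s2 + s3);  /* prints 249750.000000 */
    return 0;
}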
Very Long Instruction Word (VLIW) Processors
The hardware cost and complexity of the superscalar scheduler is a major consideration in
processor design.
• To address these issues, VLIW processors rely on compile-time analysis to identify and bundle together instructions that can be executed concurrently.
• These instructions are packed and dispatched together, hence the name very long instruction word.
• This concept was used with some commercial success in the Multiflow Trace machine (circa 1984).
• Variants of this concept are employed in the Intel IA64 processors.
Dichotomy of Parallel Computing Platforms:
• An explicitly parallel program must specify concurrency and
interaction between concurrent subtasks.
• The former is sometimes also referred to as the control structure
and the latter as the communication model.
Control Structure of Parallel Programs
• Parallelism can be expressed at various levels of granularity – from instruction level to processes.
• Between these extremes exists a range of models, along with corresponding architectural support.
• Processing units in parallel computers either operate under the centralized control of a single control unit or work independently.
• If there is a single control unit that dispatches the same instruction to various processors (that work on different data), the model is referred to as single instruction stream, multiple data stream (SIMD).
• If each processor has its own control unit, each processor can execute different instructions on different data items. This model is called multiple instruction stream, multiple data stream (MIMD).
SIMD and MIMD Processors
SIMD:
• Some of the earliest parallel computers, such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1, belonged to this class of machines.
• Variants of this concept have found use in co-processing units such as the MMX units in Intel processors and DSP chips such as the Sharc.
• SIMD relies on the regular structure of computations (such as those in image processing).
• It is often necessary to selectively turn off operations on certain data items. For this reason, most SIMD programming paradigms allow for an “activity mask”, which determines whether a processor should participate in a computation or not.
MIMD:
• In contrast to SIMD processors, MIMD processors can execute different programs on different processors.
• A variant of this, called single program multiple data streams (SPMD), executes the same program on different processors.
• It is easy to see that SPMD and MIMD are closely related in terms of programming flexibility and underlying architectural support.
• Examples of such platforms include current generation Sun Ultra Servers, SGI Origin Servers, multiprocessor PCs, workstation clusters, and the IBM SP.
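A scalar sketch of the activity-mask idea follows (purely illustrative; the mask array and names are assumptions): each element of the mask decides whether the corresponding data item participates in the operation, which is what SIMD hardware does in lock-step across all lanes.

/* Illustrative emulation of a SIMD "activity mask": the operation is applied
 * only where the mask is set, mimicking lanes that are switched off. */
#include <stdio.h>

int main(void) {
    int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int mask[8] = {1, 0, 1, 0, 1, 1, 0, 1};   /* 1 = lane participates */

    for (int i = 0; i < 8; i++) {
        if (mask[i])          /* masked-off lanes keep their old value */
            data[i] *= 10;
    }

    for (int i = 0; i < 8; i++)
        printf("%d ", data[i]);
    printf("\n");             /* prints: 10 2 30 4 50 60 7 80 */
    return 0;
}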
Communication Model of Parallel Platforms
Shared-Memory Parallel Computer
Pros:
Easier to program (probably)
Cons:
Performance may suffer if the memory is located on distant machines
Limited scalability
Distributed-Memory Parallel Computer
Distributed Memory
Programming through processes
Explicit message passing
Networking
Pros:
Tighter control on message passing
Cons:
Harder to program
Modern supercomputers are hybrids!
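Below is a minimal sketch of explicit message passing between two processes. It uses MPI as one common realization of this model; MPI itself is an assumption here, since the slides only say “explicit message passing”.

/* Two processes exchange an integer by message passing; no shared memory.
 * Typical build/run: mpicc send_recv.c -o send_recv && mpirun -np 2 ./send_recv */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Process 0 sends one integer to process 1. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Process 1 receives it over the network/interconnect. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}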
Multicore Resource Management
Problem:
• Read-write access to shared data must be coordinated (this will be discussed in greater detail when we talk about threads programming).
• Caches in such machines require coordinated access to multiple copies. This leads to the cache coherence problem.
• A weaker model of these machines provides an address map, but not coordinated access. These models are called non-cache-coherent shared-address-space machines.
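A minimal sketch of coordinating read-write access to shared data with a mutex follows, using POSIX threads; pthreads and the counter example are assumptions, since the slides only state that such access must be coordinated.

/* Four threads increment a shared counter; the mutex coordinates the
 * read-modify-write so no updates are lost. Compile: gcc counter.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                  /* shared data */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);        /* coordinate access */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);   /* 400000 with coordination */
    return 0;
}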
Shared-Address-Space vs. Shared Memory Machines
• It is important to note the difference between the terms shared address space and shared memory.