Talking about…
• Multi-core processors/chips
• CMPs (chip multiprocessors)
• MPSoCs (multiprocessor systems-on-chip)
Why parallel architectures
• Concurrency in applications
• Limits to frequency:
– The memory wall
– The ILP wall
– The power wall
• Multicore can reduce power
Example
• 1× Core A: execution time = 10 s, P = 10 W, E = 100 J
• 2× Core A (full frequency): execution time = 5 s, P = 2 × 10 W = 20 W, E = 100 J
• 2× Core A (half frequency each): execution time = 10 s, P = 2 × 5 W = 10 W, E = 100 J (no saving)
Example
• 1× Core B: execution time = 15 s, P = 5 W, E = 75 J
• 2× Core B: execution time = 7.5 s, P = 2 × 5 W = 10 W, E = 75 J
• Compared with 1× Core A: same 10 W power budget, but less time (7.5 s vs 10 s) and less energy (75 J vs 100 J)
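The energy figures are just E = P · t applied to each configuration; a quick worked check of the slide values (reading the 2 × 5 W Core A case as two cores at half frequency is my interpretation):

```latex
\begin{aligned}
1\times\text{Core A}:&\quad E = 10\,\mathrm{W}\cdot 10\,\mathrm{s} = 100\,\mathrm{J}\\
2\times\text{Core A (full }F\text{)}:&\quad E = 20\,\mathrm{W}\cdot 5\,\mathrm{s} = 100\,\mathrm{J}\\
2\times\text{Core A (}F/2\text{)}:&\quad E = 10\,\mathrm{W}\cdot 10\,\mathrm{s} = 100\,\mathrm{J}\\
1\times\text{Core B}:&\quad E = 5\,\mathrm{W}\cdot 15\,\mathrm{s} = 75\,\mathrm{J}\\
2\times\text{Core B}:&\quad E = 10\,\mathrm{W}\cdot 7.5\,\mathrm{s} = 75\,\mathrm{J}
\end{aligned}
```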
Why (multiple) slower cores are power-efficient
• P_switch = Activity × Frequency × V² (rule of thumb: V² ∝ F², since supply voltage scales roughly with F; see the sketch below)
• Fewer pipeline stages are needed (fewer flip-flops)
• Synthesizing with tighter frequency constraints leads to more power-hungry netlists
• In a multi-core design, cores can be specialized
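A short sketch of the arithmetic behind the rule of thumb, assuming dynamic (switching) power only and that supply voltage scales roughly linearly with frequency, V ≈ kF:

```latex
P_{\mathrm{switch}} = A \cdot F \cdot V^{2}, \qquad V \approx kF
\;\Longrightarrow\; P_{\mathrm{switch}} \propto F^{3}.
\qquad
\text{Two cores at } F/2 \text{ (and } V/2\text{):}\quad
2 \cdot A \cdot \frac{F}{2} \cdot \left(\frac{V}{2}\right)^{2} = \frac{A F V^{2}}{4}
```

Under these assumptions, two half-speed cores deliver the same aggregate throughput on parallel work for roughly a quarter of the switching power, which is the effect the Core B example illustrates.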
Amdahl's law
• Amdahl's Law states that potential program speedup is defined by the fraction of code (P)
that can be parallelized:
speedup = 1 / (1 - P)
• Introducing the number of processors performing the parallel fraction of work, the
relationship can be modeled by:
speedup = 1 / (S + P/N)
where
P = parallel fraction,
N = number of processors and
S = serial fraction.
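As a small illustration (a stand-alone sketch, not part of the original slides), the program below evaluates Amdahl's formula for a program that is 95% parallel and shows how the serial fraction caps the achievable speedup:

```c
#include <stdio.h>

/* Amdahl's law: speedup = 1 / (S + P/N), with S = 1 - P */
static double amdahl_speedup(double parallel_fraction, int n_processors)
{
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / n_processors);
}

int main(void)
{
    const double p = 0.95;   /* fraction of the work that can be parallelized */

    for (int n = 2; n <= 1024; n *= 2)
        printf("N = %4d   speedup = %6.2f\n", n, amdahl_speedup(p, n));

    /* no matter how many processors, the speedup never exceeds 1/S */
    printf("limit (N -> infinity) = %.2f\n", 1.0 / (1.0 - p));
    return 0;
}
```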
Limitation and costs
• Complexity
• Portability
• Resource requirement
• Scalability
MPSoCs, a stack view
[Layered stack diagram: Application at the top, accessed through APIs; the OS and communication services below it (also exposed through APIs); then the HAL; the cores (Core A, Core B, Core C, Core D); and, at the bottom, the interconnect, memory subsystem and I/O.]
Serial computation
Traditionally, software has been written for serial computation:
- run on a single computer
- instructions are executed one after another
- only one instruction is executed at any moment in time
Parallel Computing
Simultaneous use of multiple compute resources to solve a single
problem
Concepts and Terminology
Flynn’s Taxonomy (1966)
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data
SISD
Serial computer
Deterministic execution
Examples: older generation mainframes, workstations, PCs
SIMD
A type of parallel computer
All processing units execute the same instruction at any given clock cycle
Each processing unit can operate on a different data element
Two varieties: Processor Arrays and Vector Pipelines
Most modern computers, particularly those with graphics processor units
(GPUs) employ SIMD instructions and execution units.
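As a concrete illustration (my own minimal sketch, assuming an x86 machine with SSE), the loop below adds two float arrays four elements at a time: one instruction operates on multiple data elements, which is exactly the SIMD idea:

```c
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics */

/* c[i] = a[i] + b[i], processing 4 floats per instruction (n must be a multiple of 4) */
static void add_simd(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);            /* load 4 floats */
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));   /* one SIMD add does 4 additions */
    }
}

int main(void)
{
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    add_simd(a, b, c, 8);
    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);                      /* prints: 9 9 9 9 9 9 9 9 */
    printf("\n");
    return 0;
}
```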
MISD
A single data stream is fed into multiple processing units.
Each processing unit operates on the data independently via independent
instruction streams.
Few actual examples: Carnegie-Mellon C.mmp computer (1971).
MIMD
Currently, most common type of parallel computer
Every processor may be executing a different instruction stream
Every processor may be working with a different data stream
Execution can be synchronous or asynchronous, deterministic or non-deterministic
Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs.
Note: many MIMD architectures also include SIMD execution sub-components
Parallel Computer Architectures
Shared memory:
all processors can access the
same memory
Uniform memory access (UMA):
- identical processors
- equal access and access
times to memory
Non-uniform memory access (NUMA)
Not all processors have equal access to all memories
Memory access across link is slower
Advantages:
- user-friendly programming perspective to memory
- fast and uniform data sharing due to the proximity of memory to CPUs
Disadvantages:
- lack of scalability between memory and CPUs.
- Programmer responsible for ensuring "correct" access of global memory
- Expense
Distributed memory
Distributed memory systems require a communication network to
connect inter-processor memory.
Advantages:
- Memory is scalable with number of processors.
- No memory interference and no overhead from trying to maintain cache coherency.
- Cost effective
Disadvantages:
- programmer responsible for data communication between processors.
- difficult to map existing data structures to this memory organization.
Hybrid distributed-shared memory
Generally used for today's largest and fastest computers
Has a mixture of previously mentioned advantages and
disadvantages
Parallel programming models
Shared memory
Threads
Message Passing
Data Parallel
Hybrid
All of these can be implemented on any architecture.
Shared memory
Tasks share a common address space, which they read and write asynchronously.
Various mechanisms such as locks / semaphores may be used to
control access to the shared memory.
Advantage: no need to explicitly communicate data between
tasks -> simplified programming
Disadvantages:
• Need to take care when managing memory to avoid synchronization conflicts
• Harder to control data locality
Threads
A thread can be considered as a subroutine within the main program.
Threads communicate with each other through global memory.
This model is commonly associated with shared memory architectures and operating systems.
POSIX Threads (pthreads)
OpenMP
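A minimal pthreads sketch (my own example, assuming a POSIX system; compile with -pthread): each thread runs a function much like a subroutine of the main program and communicates its result through a global array:

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static double results[NTHREADS];   /* global memory shared by all threads */

static void *worker(void *arg)
{
    long id = (long)arg;
    results[id] = id * id;         /* each thread writes only its own slot */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);        /* wait for all threads to finish */

    for (int i = 0; i < NTHREADS; i++)
        printf("results[%d] = %.0f\n", i, results[i]);
    return 0;
}
```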
Message Passing
A set of tasks that use their
own local memory during
computation.
Data exchange through
sending and receiving
messages.
Data transfer usually requires
cooperative operations to be
performed by each process.
For example, a send operation
must have a matching receive operation.
MPI (released in 1994)
MPI-2 (released in 1996)
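A minimal MPI sketch (my own example): rank 0 sends an integer to rank 1 with a matching send/receive pair, as described above. It would be launched with two processes, e.g. something like mpirun -np 2 (exact commands depend on the MPI installation):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* point-to-point send: must be matched by a receive on rank 1 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```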
Data Parallel
The data parallel model demonstrates the following characteristics:
• Most of the parallel work
performs operations on a data
set, organized into a common
structure, such as an array
• A set of tasks works collectively
on the same data structure,
with each task working on a
different partition
• Tasks perform the same operation
on their partition
On shared memory architectures, all tasks may have access to the
data structure through global memory. On distributed memory
architectures the data structure is split up and resides as "chunks"
in the local memory of each task.
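A small data-parallel sketch (my own example, using OpenMP on a shared memory machine; compile with -fopenmp): every thread applies the same operation to its own partition of a shared array, and a reduction combines the partial results:

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

static double a[N];   /* shared data structure, partitioned across threads */

int main(void)
{
    /* each thread gets a chunk of the index range and performs
       the same operation on its partition */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    /* a reduction combines per-thread partial sums into one result */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.0f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```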
Other Models
Hybrid
- combines various models, e.g. MPI/OpenMP
Single Program Multiple Data (SPMD)
- A single program is executed by all tasks simultaneously
Multiple Program Multiple Data (MPMD)
- An MPMD application has multiple executables. Each task can
execute the same or different program as other tasks.
Designing Parallel Programs
Examine problem:
- Can the problem be parallelized?
- Are there data dependencies?
- Where is most of the work done?
- Identify bottlenecks (e.g. I/O)
Partitioning
- How should the data be decomposed? (various partitionings are possible)
- How should the algorithm be decomposed?
Communications
Types of communication:
- point-to-point
- collective
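To contrast the two types (another sketch of mine): the earlier MPI example is point-to-point (one sender, one receiver), while a collective such as MPI_Reduce involves every rank in the communicator at once:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* collective communication: every rank contributes its value,
       and rank 0 receives the global sum */
    int local = rank, total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", size - 1, total);

    MPI_Finalize();
    return 0;
}
```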
Synchronization types
Barrier
• Each task performs its work until it reaches the barrier. It then
stops, or "blocks".
• When the last task reaches the barrier, all tasks are
synchronized.
Lock / semaphore
• The first task to acquire the lock "sets" it. This task can then
safely (serially) access the protected data or code.
• Other tasks can attempt to acquire the lock but must wait until
the task that owns the lock releases it.
• Can be blocking or non-blocking
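A sketch of both mechanisms with pthreads (my own example, assuming POSIX barrier support; compile with -pthread): a mutex serializes updates to a shared counter, and a barrier blocks every thread until the last one arrives:

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_mutex_t   lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t barrier;
static int counter = 0;              /* shared data protected by the lock */

static void *worker(void *arg)
{
    long id = (long)arg;

    pthread_mutex_lock(&lock);       /* only one thread at a time in here */
    counter++;
    pthread_mutex_unlock(&lock);

    pthread_barrier_wait(&barrier);  /* block until all threads have arrived */

    if (id == 0)                     /* after the barrier the counter is final */
        printf("counter = %d\n", counter);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```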
Load balancing
Keep all tasks busy all of the time. Minimize idle time.
The slowest task will determine the overall performance.
Granularity
In parallel computing, granularity is a qualitative
measure of the ratio of computation to
communication.
Fine-grain Parallelism:
- Low computation to communication ratio
- Facilitates load balancing
- Implies high communication overhead and less
opportunity for performance enhancement
Coarse-grain Parallelism:
- High computation to communication ratio
- Implies more opportunity for performance
increase
- Harder to load balance efficiently