Real-Time Systems
Lecture Topic – Review of Basic Concurrent/Parallel Programming
Dr. Sam Siewert
Electrical, Computer and Energy Engineering
Embedded Systems Engineering Program
Copyright © 2019 University of Colorado
Flynn's Taxonomy – Parallel Systems

Flynn's taxonomy (2x2 grid):
– Single Instruction, Single Data – SISD (traditional uni-processor: single core, no vector instructions)
– Multiple Instruction, Single Data – MISD
– Single Instruction, Multiple Data – SIMD (e.g. SSE 4.2 vector processing); ideal for large bitwise, integer, and floating-point vector math
– Multiple Instruction, Multiple Data – MIMD, including SPMD (Single Program, Multiple Data – e.g. GP-GPU) and MPMD (Multi-threaded Program, Multi-Data)

Notes:
– R-Pi 3b+/4 – MIMD: multi-core, with NEON vector instructions
– MIMD and SPMD architectures often leverage GP-GPU co-processors
– DSP – VLIW (SIMD) or MIMD (e.g. BeagleBone AI)
Parallel Programming for Speed-up

Demonstrations:
1. erast.c
2. sharpen_grid.c

Both are threaded, but erast.c has semaphore locks and sharpen does not (compare sharpen on a single core to sharpen with a thread grid).

Questions:
– Without locks, do we risk data corruption?
– Is the test-and-set indivisible?
– Concurrent reader and writer?
– Can we just run lockless?
– Speed-up is? – Linear? Better? Worse?

Can use Shared Memory with POSIX Threads – but may need locking!
– Locking will serialize and slow down code if sequential sections are too long
– erast.c vs. erastsimp.c is a good example
Scaling and Bottlenecks

1 - Compiler optimization – simple and effective: turn on compiler optimization (~3x)
– Turn on higher levels of optimization
– Level 3 optimization: -O3 for gcc or g++
– Levels above -O3 (e.g. -O4) add nothing by themselves; further gains require feedback (profile-guided) optimization

2 - SIMD vector instructions – simple and sometimes effective: turn on NEON SIMD (~1.f x, i.e. a factor between 1 and 2)
– Turn on SIMD (NEON) instruction generation on ARM A-Series
– Flynn's taxonomy: SIMD

3 - Using multiple cores – harder and mostly effective: grid to Map and Reduce (~3.2x)
– Shared Memory POSIX Threads on Linux SMP
– Combine #1, #2, and #3

4 - Co-processing – hardest and highly effective: grid programming on 128 SPs (~70x)
– With advanced platforms like the Jetson Nano with CUDA
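Steps #1 and #2 are just build-flag changes. A hedged sketch of the gcc invocations (the exact -mcpu/-mfpu values depend on your toolchain and OS image; sharpen.c stands in for whichever demo source you are building):

```shell
# Illustrative build lines, not from the course materials.

# 1 - higher optimization level only:
gcc -O3 sharpen.c -o sharpen -lpthread

# 2 - also request NEON SIMD code generation (ARMv8 core such as the
#     R-Pi 4's Cortex-A72, running a 32-bit OS):
gcc -O3 -mcpu=cortex-a72 -mfpu=neon-fp-armv8 -mfloat-abi=hard \
    sharpen.c -o sharpen -lpthread

# On a 64-bit (AArch64) OS, NEON is part of the base ISA, so -O3 with
# the auto-vectorizer it enables is sufficient.
```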
Theoretical Speed-Up – Linear at Best

[Figure: speed-up curve, sub-linear at best]
– Speed-up is < linear
– Due to the sequential section (Mapping – Split) compared to the parallel section (Gridded – Apply)
– …and due to the final step (Combine)
Parallel Processing Speed-up

Grid Data Processing Speed-up
1. Multi-Core, Multi-threaded, Macro-blocks/Frames
2. SIMD, Vector Instructions Operating over Large Words (Many Times Instruction Set Size)
3. Co-Processor Operates in Parallel to CPU(s)

SPMD – GPU or GP-GPU Co-Processor
– PCI-Express Bus Interfaces
– Transfer Program and Data to Co-Processor
– Threads and Blocks to Transform Data Concurrently

Image Data Processing – Few Data Dependencies
– Good Speed-up by Amdahl's Law
– P = Parallel Portion
– (1 - P) = Sequential Portion
– S = # of Cores (Concurrency)
– Overhead for Co-Processor
– IO for Co-Processing

Multicore_Speed_Up = 1 / ((1 - P) + P/S)

Max_Speed_Up = 1 / ((1 - P) + 0) = 1 / (1 - P)   (S is infinite here)
Conceptual View of Hardware Resources

Three-Space View of CPU-bound HPC vs. RT or Fair Utilization

Requirements:
– CPU Margin?
– IO Latency (and Bandwidth) Margin?
– Memory Capacity (and Latency) Margin?

[Figure: three axes – CPU-Use, IO-Use, Memory-Use; the origin is the high-margin corner and the upper right front corner is low-margin; regions mark CPU-bound, I/O-bound, memory-bound, and CPU+I/O+Mem-bound operation]

Goal is to fully use all resources to scale!
CPU + I/O + Memory Bound?! – Bad day!