HPC_introduction_Lecture_3
Uploaded by Shehzad Ahmed

Independent University, Bangladesh

Department of Computer Science and Engineering


Course Title: Introduction to High Performance Computing
Course Code: Autumn-2024-CSC471

SECTION 1: (T) 06:30 PM - 09:30 PM

Presented by
Dr. Rubaiyat Islam
Adjunct Faculty, IUB.
Omdena Bangladesh Chapter Lead
Crypto-economist Consultant
Sifchain Finance, USA.
WHAT IS APPLICATION
TIMING?

• Analysis of a program’s behavior using information gathered as the program runs

• Why do it?
• Good way to improve the efficiency of scripts
• Identify performance problems
• Often required for allocation requests
HOW WOULD YOU DO IT?

• Measure the execution time of an entire program or simply a code snippet
• Loops
• Timing functions within programs
• Python, Fortran, C++, and R have functions that allow you to measure the execution time of small code snippets
• Can also do this with the Linux “time” command
• Can make changes to the code to improve efficiency
• Or… it’s just informational
THE LINUX TIME UTILITY

• The first place to start when profiling your program

time mpirun -np 4 ./prog.mpi

real 0m17.801s (wall-clock time)
user 0m58.125s (CPU time summed across all threads/processes)
sys  0m0.081s  (system overhead)
FINE-GRAINED TIMING

• Often useful to time portions of a program


• Good idea when developing your own code
• Tough when it’s 3rd-party software
• Useful functions:
• Fortran: system_clock
• C++: clock()
• Python: time.time()
• R: Sys.time()
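As a minimal sketch of fine-grained timing in Python, using time.time() as listed above (the workload function here is a made-up placeholder):

```python
import time

def work(n):
    """Placeholder workload: sum the first n integers with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total

start = time.time()            # wall-clock timestamp before the snippet
result = work(1_000_000)
elapsed = time.time() - start  # seconds spent in the snippet
print(f"work() took {elapsed:.4f} s (result={result})")
```

For very short snippets, time.perf_counter() gives a higher-resolution clock than time.time().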
SERIAL VS. PARALLEL
PROCESSING

• Serial processing
• A problem is broken into a set of discrete instructions
• These instructions are carried out sequentially on a single
processor
• Only one instruction is executed at a tifme

• Parallel processing
• Idea where many instructions are carried out simultaneously
across a computing system
• Can divide a large problem up into many smaller problems

WHY PARALLELIZE?

• Single core too slow for solving the problem in a “reasonable” time
• “Reasonable” time: overnight, over lunch, the duration of a PhD thesis

• Memory requirements
• Larger problem
• More physics
• More particles
• Larger images
• Larger neural networks
BASIC COMPUTER
ARCHITECTURE

• Old computers – one unit to execute instructions
• New computers have 4 or more CPU cores

SERIAL PROCESSING –
THOUGHT EXPERIMENT

• Let’s say you own a lawn service company


• You have one hundred clients who each want their lawn mowed, with patterns
• Each of them wants their lawn mowed by the end of the week
• A serial process would be for you to mow all one hundred lawns yourself
• You cannot mow lawn 2 until you mow lawn 1, etc.
• Let’s say doing this takes you the full 7 days to complete, working 16-hour days

SERIAL PROCESSING

• Instructions are executed on one core
• The other cores sit idle
• If a task is running, Task 2 waits for Task 1 to complete, etc.
• Wasting resources
• Want to instead parallelize and
use all cores

SERIAL VS. PARALLEL
PROCESSING

PARALLEL PROCESSING –
THOUGHT EXPERIMENT

• Let’s say that you decide that 100 lawns is too many for one
person to mow in a week
• Or you want to finish it faster
• Therefore you hire one additional person to help you
• How long (in theory) should it take you to finish the lawns?
• Either 3.5 days working 16-hour days, or 7 days working 8-hour days
• You could accomplish this either by both working on one lawn
together or each of you working on a different lawn at the
same time (more on this later)

PARALLEL PROCESSING –
THOUGHT EXPERIMENT

• Similarly, you could hire three more people


• Now five total
• How long should it take you to finish?
• In theory, five times faster
• However, it doesn’t actually work out this way. Why?
• Overhead
• Communication
• Who is mowing which lawn?
• If you split a lawn, who mows which parts?
• How do you make sure the patterns match up?

PARALLEL PROCESSING – THOUGHT EXPERIMENT (CONT.)

• However, it doesn’t actually work out this way. Why?


• Resource contention
• Fights over who gets to use the best lawn mower
• So maybe instead of five times as fast it’s four times as fast
• Still faster
• More people?
• Too many people slows down the process too much to make it
worthwhile
• Diminishing return
• 100 might be too many
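The diminishing return can be illustrated with a toy cost model (an illustrative assumption, not from the slides): total time is the work divided among n workers, plus a per-worker coordination overhead that grows with n.

```python
# Toy model of diminishing returns (assumed for illustration):
# time = work / n_workers + overhead * n_workers
def mowing_time(total_hours, n_workers, overhead_per_worker=0.5):
    """Hypothetical cost model: dividing work helps, coordinating hurts."""
    return total_hours / n_workers + overhead_per_worker * n_workers

work = 112.0  # 7 days x 16 hours for one person
for n in (1, 2, 5, 20, 100):
    t = mowing_time(work, n)
    print(f"{n:3d} workers: {t:6.1f} hours, speedup {work / t:4.1f}x")
```

With these made-up numbers, 5 workers give roughly a 4.5x speedup rather than 5x, and 100 workers are slower than 20 — overhead eventually dominates.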

PARALLEL OVERHEAD

• Should you convert your serial code to parallel?


• Usually do it to speed up
• But need to consider things like overhead
• Overhead because of
• Startup time
• Synchronizations
• Communication
• Overhead by libraries, compilers
• Termination time

https://computing.llnl.gov/tutorials/parallel_comp/#ModelsShared

PROGRAMMING TO USE
PARALLELISM

• Parallelism across processors/threads within a node - OpenMP
• Parallelism across multiple nodes - MPI

www.scan.co.uk
PARALLEL PROCESSING MUSTS
AND TRICKS

• Need to be able to break the problem up into parts that can work independently of each other
• Can’t have the results from one CPU depend on another at each time step

• Do loops are a great place to start looking for bottlenecks in your code

MEMORY MODELS

• There are three common kinds of parallel memory models:
• Shared
• Distributed
• Hybrid
SHARED MEMORY MODEL

SHARED MEMORY MODEL

• All cores share the same pool of memory
• HPC architecture – we talked about the memory available on one node
• Any memory changes are seen by all processors
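A minimal sketch of the shared-memory idea using Python threads, which all live in one process and see the same data (the list, names, and lock are illustrative, not from the slides):

```python
import threading

# All threads share this one list ("the lawn"): shared memory.
counts = [0] * 4
lock = threading.Lock()

def worker(idx, n):
    """Each thread updates its slot of the shared list n times."""
    for _ in range(n):
        with lock:  # synchronization: one source of parallel overhead
            counts[idx] += 1

threads = [threading.Thread(target=worker, args=(i, 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counts)  # every thread's changes are visible to all: [1000, 1000, 1000, 1000]
```

The lock is the code-level analogue of memory contention: when more workers fight over the same shared resource, waiting time grows.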

THOUGHT EXPERIMENT

• Let’s go back to our lawn mowing example


• From the serial vs. parallel processing
• In this example, the lawns are the memory
• The workers are the cores
• When all the workers are working on one lawn, they are sharing the memory
• Every “core” is impacted by changes to “memory”

BENEFITS AND DRAWBACK

• Benefit:
• Data sharing is fast

• Drawback:
• Adding more processors may lead to performance issues when accessing the same
shared memory resource (memory contention)
DISTRIBUTED MEMORY MODEL

DISTRIBUTED MEMORY MODEL


• In a distributed memory model, each core has its own memory
• Processors share data only through a network connection and/or a communication protocol (e.g., MPI)
• Changes to the local memory associated with one processor have no impact on other processors
• Remote-memory access must be explicitly managed by the programmer

THOUGHT EXPERIMENT

• Let’s go back to our lawn mowing example


• From the serial vs. parallel processing
• In this example, the lawns are the memory
• The workers are the cores
• When each worker is working on a different lawn, it is distributed memory

BENEFITS AND DRAWBACKS

• Biggest benefit is scalability


• Adding more processors doesn’t result in resource contention as far as memory is
concerned

• Biggest Drawback
• Can be tedious to program for distributed memory models
• All data relocation must be programmed by hand
HYBRID MEMORY MODEL

HYBRID MEMORY MODEL

• As the name implies, the hybrid memory model is a combination of the shared
and distributed memory models
• Most large, fast clusters today use a hybrid memory model
• A certain number of cores share the memory on one node, but are connected to
the cores sharing memory on other nodes through a network

THOUGHT EXPERIMENT

• Let’s go back to our lawn mowing example


• From the serial vs. parallel processing
• In this example, the lawns are the memory
• The workers are the cores
• The idea is that you have several workers working on one lawn
• Or, better, several workers working on sections of a lawn, who have to communicate to make it work
• Patterns
BENEFITS AND DRAWBACKS

• Benefit:
• Scalability

• Drawback
• Must know how to program communication between
nodes (e.g., MPI)
DATA AND TASK PARALLELISM

• Earlier discussed data parallel memory methods


• One of them was distributed memory, wherein different memory pools are accessed by different processors
• Data and task parallelism are a similar concept
• Data parallelism
• Distribute the data across processors
• Task parallelism
• Distribute the compute tasks across processors
DATA PARALLELISM

• Different parts of a dataset are distributed across nodes

array1 = [a, b, c, d]

NODE 1: a, b    NODE 2: c, d
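The split sketched above can be written out directly; this is a minimal illustration in Python (the chunking helper is a made-up name, and "nodes" are just labels here):

```python
# Data parallelism sketch: divide one dataset into contiguous chunks,
# one chunk per node/worker.
array1 = ["a", "b", "c", "d"]

def split(data, n_nodes):
    """Divide a dataset into equal contiguous chunks, one per node."""
    k = len(data) // n_nodes
    return [data[i * k:(i + 1) * k] for i in range(n_nodes)]

chunks = split(array1, 2)
print(chunks)  # [['a', 'b'], ['c', 'd']]
```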
TASK PARALLELISM

• Each processor executes a different task on the same dataset
• Tasks (code, instructions) are spread out among the cores
• Might be the same instructions/code or different

• Distributed programming
• Example: calculating wind speed from vector components across a geographic area; divide the vector calculation among processors
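The wind example can be sketched as two different tasks run over the same dataset of (u, v) components (shown sequentially here for clarity; the direction is the plain vector angle, not the meteorological convention):

```python
import math

# Task parallelism sketch: two distinct tasks, one shared dataset.
u = [3.0, 0.0]  # eastward wind components
v = [4.0, 5.0]  # northward wind components

def speed_task(u, v):
    """Task 1: wind speed = sqrt(u^2 + v^2) for each grid point."""
    return [math.hypot(a, b) for a, b in zip(u, v)]

def direction_task(u, v):
    """Task 2: vector angle of the wind, in degrees."""
    return [math.degrees(math.atan2(b, a)) for a, b in zip(u, v)]

print(speed_task(u, v))      # [5.0, 5.0]
print(direction_task(u, v))  # second entry is 90.0
```

In a task-parallel run, one processor would run speed_task while another runs direction_task over the same data.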
DATA PARALLELISM - SIMD

• Two types of data parallelism we’ll discuss here


• SIMD – Single Instruction, Multiple Data
• SPMD – Single Program, Multiple Data
• SIMD
• Carry out the same instruction simultaneously multiple times across
different elements of a dataset
• Vector operation
• Addition, subtraction, multiplication, division
• Have to prepare your data to be vectorized
VECTORIZATION

• Simply put, performing multiple math operations at once

Non-vectorized:
a = rand(1,4)
b = rand(1,4)
for i = 1:length(a)
    c(i) = a(i) + b(i)
end

Vectorized:
a = rand(1,4)
b = rand(1,4)
c = a + b

• Interpreted languages (Python, R, etc.): use built-in vectorized operations
• Compiled languages – the compiler can handle it
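The same contrast in Python, assuming NumPy is available (NumPy is not named on the slide; it is the standard way to vectorize array math in Python):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(4)
b = rng.random(4)

# Non-vectorized: explicit Python loop over elements
c_loop = np.empty(4)
for i in range(4):
    c_loop[i] = a[i] + b[i]

# Vectorized: one array operation replaces the loop (SIMD-friendly)
c_vec = a + b

print(np.allclose(c_loop, c_vec))  # True
```

For large arrays the vectorized form is typically much faster, since the loop runs in compiled code instead of the Python interpreter.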
DATA PARALLELISM - SPMD

• SPMD
• Carry out the same program multiple times on different elements of a dataset
• Calculate the wind direction from wind components

    (a1, b1)      (a2, b2)
       |             |
    Program       Program
       |             |
      c1            c2
WHY DO THIS?

• Cleaner code
• Faster execution time
• Eliminating loops!
• Usually not too challenging
• Many languages have functions that make this easy to perform

HIGH THROUGHPUT COMPUTING

• Thus far: High Performance Computing (HPC)


• Typical HPC: employ multiple processors to
• Solve a problem faster
• Solve larger problems
• Today: High Throughput Computing (HTC)
• Typical HTC:
• Multiple small jobs spread across many processors

HIGH THROUGHPUT COMPUTING

• HTC useful when have many small jobs that require little computational power
or memory
• Jobs are typically serial, and not parallel
• HTC advantage:
• Small serial jobs can fill in the “gaps” left by large parallel jobs
• E.g., Open Science Grid
• Effectively parallel: batch of jobs completes faster when spread across multiple cores
• Example: Image analysis

ADVANTAGES AND DISADVANTAGES

• Advantages
• Simplicity
• Much easier to match one task to one CPU than many at once
• Doesn’t require knowledge of parallelization from the programmer
• Disadvantage
• Your HPC center might not be set up ideally for HTC
• Might not allow for node sharing
• No batch submission system to manage multiple small jobs
• Possibly requires heavy scripting to manage workflow

HTC MECHANICS

• No real tricks
• Break down your problem and then submit a lot of smaller jobs
• For example, if you are analyzing 1 million images, rather than submitting
one job to analyze all 1 million images, submit one thousand jobs that
analyze 1000 images each
• The resource manager (e.g., Slurm) should take care of the rest

• Is HTC appropriate? Problem dependent!


• Is the serial execution time reasonable?
• Can the problem fit into one core’s worth of memory?
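A common way to submit many small HTC jobs under Slurm is a job array. This is a hedged sketch of such a submission script; the script name, resource limits, and analyze.py with its flags are made-up placeholders, not from the slides:

```shell
#!/bin/bash
#SBATCH --job-name=image-analysis
#SBATCH --array=0-999            # 1000 independent jobs, one per chunk
#SBATCH --ntasks=1               # each job is serial: one core
#SBATCH --mem=2G
#SBATCH --time=00:30:00

# Each array task analyzes its own chunk of 1000 images out of 1 million.
# analyze.py and its --chunk/--chunk-size flags are hypothetical.
python analyze.py --chunk "${SLURM_ARRAY_TASK_ID}" --chunk-size 1000
```

Slurm sets SLURM_ARRAY_TASK_ID (0-999 here) in each job, so one script covers the whole batch and the scheduler fills idle cores as they free up.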
THANK YOU
