HPC_introduction_Lecture_3
Presented by
Dr. Rubaiyat Islam
Adjunct Faculty, IUB.
Omdena Bangladesh Chapter Lead
Crypto-economist Consultant
Sifchain Finance, USA.
WHAT IS APPLICATION TIMING?
• Why do it?
• Good way to improve efficiency of scripts
• Identify performance problems
• Often required for allocation requests
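A minimal sketch of timing a piece of Python code, assuming the work to be measured can be wrapped in a function. The names `time_call` and `slow_sum` are hypothetical and exist only for illustration; `time.perf_counter` is the standard-library wall-clock timer.

```python
import time


def time_call(func, *args, repeats=3):
    """Run func(*args) several times; return (last result, best wall-clock seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return result, best


def slow_sum(n):
    """Hypothetical workload: sum the integers 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


result, seconds = time_call(slow_sum, 100_000)
print(f"slow_sum returned {result} in {seconds:.6f} s (best of 3)")
```

Taking the best of several runs reduces noise from other activity on the machine; the measured time itself will differ from system to system.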
HOW WOULD YOU DO IT?
• Serial processing
• A problem is broken into a set of discrete instructions
• These instructions are carried out sequentially on a single
processor
• Only one instruction is executed at a time
• Parallel processing
• Many instructions are carried out simultaneously across a computing system
• Can divide a large problem up into many smaller problems
WHY PARALLELIZE?
• Memory requirements
• Larger problem
• More physics
• More particles
• Larger images
• Larger neural networks
BASIC COMPUTER ARCHITECTURE
SERIAL PROCESSING – THOUGHT EXPERIMENT
SERIAL PROCESSING
SERIAL VS. PARALLEL PROCESSING
PARALLEL PROCESSING – THOUGHT EXPERIMENT
• Let’s say that you decide that 100 lawns is too many for one
person to mow in a week
• Or you want to finish it faster
• Therefore you hire one additional person to help you
• How long (in theory) should it take you to finish the lawns?
• Either 3.5 days working 16 hours each day, or 7 days working 8-hour days
• You could accomplish this either by both working on one lawn
together or each of you working on a different lawn at the
same time (more on this later)
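The arithmetic behind this estimate can be sketched as follows. Both options above come to 56 hours per person, so the sketch assumes a total workload of 112 person-hours; that figure is inferred from the numbers on the slide, and the function name `ideal_days` is hypothetical. The sketch also assumes the work divides perfectly, with no coordination overhead.

```python
def ideal_days(total_hours, workers, hours_per_day):
    """Days needed if the work divides perfectly among the workers (no overhead)."""
    return total_hours / (workers * hours_per_day)


TOTAL_HOURS = 112  # assumption: 2 people x 56 hours each, inferred from the slide

print(ideal_days(TOTAL_HOURS, workers=1, hours_per_day=16))  # 7.0
print(ideal_days(TOTAL_HOURS, workers=2, hours_per_day=16))  # 3.5
print(ideal_days(TOTAL_HOURS, workers=2, hours_per_day=8))   # 7.0
```

Doubling the workers halves the ideal time; real speedups are smaller once overhead is included.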
PARALLEL OVERHEAD
https://computing.llnl.gov/tutorials/parallel_comp/#ModelsShared
PROGRAMMING TO USE PARALLELISM
• Parallelism across processors/threads: OpenMP
www.scan.co.uk
PARALLEL PROCESSING MUSTS AND TRICKS
MEMORY MODELS
THOUGHT EXPERIMENT
• Benefit:
• Data sharing is fast
• Drawback:
• Adding more processors may lead to performance issues when accessing the same
shared memory resource (memory contention)
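A minimal sketch of the shared memory idea, using Python threads as a stand-in for processors that share one memory space. The names `counter`, `counter_lock`, and `add_many` are hypothetical. Every thread reads and writes the same variable directly, which is why sharing is fast, and the lock marks the point where threads must wait for one another, which is where contention appears.

```python
import threading

counter = 0                      # shared data, visible to every thread
counter_lock = threading.Lock()  # serializes access to the shared counter


def add_many(n):
    """Increment the shared counter n times, taking the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

This illustrates the memory model rather than a speedup: in standard CPython, threads do not execute Python bytecode in parallel, so compiled code (for example with OpenMP) is the usual route to real shared-memory performance.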
DISTRIBUTED MEMORY MODEL
THOUGHT EXPERIMENT
• Biggest drawback:
• Can be tedious to program for distributed memory models
• All data relocation must be programmed by hand
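What "programmed by hand" means can be sketched with explicit send and receive steps. This is only a simulation under stated assumptions: threads and `queue.Queue` objects stand in for separate nodes and the network between them, and the names `worker`, `inboxes`, and `results` are hypothetical. On a real cluster the same pattern would be written with a message-passing library such as MPI.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive a chunk of numbers, sum it locally, and send the partial sum back."""
    chunk = inbox.get()      # explicit "receive"
    outbox.put(sum(chunk))   # explicit "send"


data = list(range(1, 101))        # held only by the coordinating worker
chunks = [data[:50], data[50:]]   # hand-written data decomposition
results = queue.Queue()
inboxes = [queue.Queue() for _ in chunks]

workers = [threading.Thread(target=worker, args=(inbox, results)) for inbox in inboxes]
for w in workers:
    w.start()
for inbox, chunk in zip(inboxes, chunks):
    inbox.put(chunk)              # relocate each chunk by hand
partial_sums = [results.get() for _ in workers]
for w in workers:
    w.join()

total = sum(partial_sums)
print(total)  # 5050
```

Every movement of data (splitting, sending, collecting) appears as a line of code, which is the tedium the slide refers to.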
HYBRID MEMORY MODEL
• As the name implies, the hybrid memory model is a combination of the shared
and distributed memory models
• Most large, fast clusters today use a hybrid memory model
• A certain number of cores share the memory on one node, but are connected to
the cores sharing memory on other nodes through a network
THOUGHT EXPERIMENT
• Benefit:
• Scalability
• Drawback:
• Must know how to program communication between
nodes (e.g., MPI)
DATA AND TASK PARALLELISM
array1 = [a b c d]
NODE 1: a b    NODE 2: c d
TASK PARALLELISM
• Distributed programming
• Example: calculating wind speed from vector components across a
geographic area; divide the vector calculation among processors
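The wind speed example can be sketched as below, assuming each grid point has horizontal components (u, v) and speed is sqrt(u^2 + v^2). The component values and the helper names `split` and `wind_speed_chunk` are hypothetical. A thread pool stands in for the processors; on a cluster the chunks would go to separate processes or nodes.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def wind_speed_chunk(chunk):
    """Wind speed sqrt(u^2 + v^2) for each (u, v) pair in one chunk."""
    return [math.hypot(u, v) for u, v in chunk]


def split(items, parts):
    """Split items into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, math.ceil(len(items) / parts))
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical (u, v) wind components in m/s at four grid points.
components = [(3.0, 4.0), (6.0, 8.0), (0.0, 5.0), (5.0, 12.0)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Each worker computes the speeds for its own chunk of the area.
    chunk_results = list(pool.map(wind_speed_chunk, split(components, 2)))

# Stitch the per-chunk results back together in the original order.
speeds = [s for chunk in chunk_results for s in chunk]
print(speeds)
```

`pool.map` returns results in the order the chunks were submitted, so the combined list lines up with the input grid points.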
DATA PARALLELISM - SIMD
Non-vectorized
a=rand(1,4)
b=rand(1,4)
for i=1:4, c(i)=a(i)+b(i); end
Vectorized
a=rand(1,4)
b=rand(1,4)
c=a+b
Python, R, etc.
Compiled languages – compiler can handle it
DATA PARALLELISM - SPMD
• SPMD
• Carry out the same program multiple times on different elements of a dataset
• Calculate the wind direction from wind components
a1 a2 b1 b2 -> Program -> c1 c2
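A minimal sketch of the SPMD pattern for the wind direction example. The same function `program` is run once per rank, and each instance picks its own elements from the rank number. The names `program`, `rank`, and `size` are hypothetical, the ranks are run one after another here purely to emulate separate instances, and the direction formula assumes the meteorological convention (the direction the wind blows from, in degrees clockwise from north).

```python
import math


def wind_direction(u, v):
    """Direction the wind blows from, in degrees (assumed meteorological convention)."""
    return math.degrees(math.atan2(-u, -v)) % 360.0


def program(rank, size, u_all, v_all):
    """The single program: instance `rank` handles elements rank, rank+size, ..."""
    return {i: wind_direction(u_all[i], v_all[i])
            for i in range(rank, len(u_all), size)}


# Hypothetical wind components (eastward u, northward v) at four points.
u = [0.0, -1.0, 0.0, 1.0]
v = [-1.0, 0.0, 1.0, 0.0]

size = 2  # number of program instances
directions = {}
for rank in range(size):  # emulated sequentially; real instances would run at once
    directions.update(program(rank, size, u, v))

print([round(directions[i], 1) for i in sorted(directions)])
```

Because every instance runs identical code and differs only in its rank, the number of instances can change without touching the program itself.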
WHY DO THIS?
• Cleaner code
• Faster execution time
• Eliminating loops!
• Usually not too challenging
• Many languages have functions that make this easy to perform
• HTC (high-throughput computing) is useful when you have many small jobs that
require little computational power or memory
• Jobs are typically serial rather than parallel
• HTC advantage:
• Small serial jobs can fill in the “gaps” left by large parallel jobs
• E.g., Open Science Grid
• Effectively parallel: batch of jobs completes faster when spread across multiple cores
• Example: Image analysis
• Advantages
• Simplicity
• Much easier to match one task to one CPU than to many at
once
• Does not require the programmer to know parallelization
• Disadvantage
• Your HPC center might not be set up ideally for HTC
• Might not allow for node sharing
• No batch submission system to manage multiple small jobs
• May require heavy scripting to manage the workflow
HTC MECHANICS
• No real tricks
• Break down your problem and then submit a lot of smaller jobs
• For example, if you are analyzing 1 million images, rather than submitting
one job to analyze all 1 million images, submit 1,000 jobs that
analyze 1,000 images each
• The resource manager (e.g., Slurm) should take care of the rest
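One way the split could look in practice, sketched under the assumption that the jobs are submitted as a Slurm job array, so each job can read its index from the `SLURM_ARRAY_TASK_ID` environment variable. The function name `image_range` and the idea of numbering images from 0 are hypothetical; the actual analysis step is left out.

```python
import os

TOTAL_IMAGES = 1_000_000
IMAGES_PER_JOB = 1_000


def image_range(task_id, per_job=IMAGES_PER_JOB, total=TOTAL_IMAGES):
    """Half-open range [start, stop) of image indices handled by one job."""
    start = task_id * per_job
    stop = min(start + per_job, total)
    return start, stop


# Slurm sets SLURM_ARRAY_TASK_ID inside job-array tasks; default to 0 elsewhere.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
start, stop = image_range(task_id)
print(f"task {task_id}: images {start} to {stop - 1}")
```

Submitting with an array specification such as `sbatch --array=0-999` would launch 1,000 such tasks, each handling its own 1,000 images.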