Intro - HPC Cluster Computing v2 PDF

This document introduces high performance computing (HPC) and cluster computing. It discusses how HPC is needed to solve problems that require large amounts of computational resources. It then provides an overview of parallel computing and how cluster computing uses multiple interconnected computers to work in parallel. Examples of parallel processing and I/O are also presented.


Introduction to
High Performance Computing:
Cluster Computing

Heru Suhartanto,
Fakultas Ilmu Komputer UI,
heru@cs.ui.ac.id

http://bit.ly/hs-wshop-hpc
Outline
 Why high performance computing (HPC) is needed: problems that demand large computational resources
 Introduction to HPC - parallel computing
 Cluster Computing

Resource Hungry Applications
• Solving grand challenge applications using computer modeling, simulation and analysis: see the tumor detection and in-silico drug design examples in separate slides

(Figure: application domains - Aerospace, Internet & E-commerce, Life Sciences, CAD/CAM, Digital Biology, Military Applications)
Examples of parallel processing (figures only)

Parallel I/O (figures only; a minimal MPI-IO sketch follows below)
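The slides above present parallel I/O only as figures. As a minimal sketch (assuming an MPI implementation with MPI-IO, such as MPICH or Open MPI, is available on the cluster; the file name and block size are made up for illustration), each process writes its own block of one shared file, so the writes proceed in parallel:

/* parallel_io.c - each MPI rank writes its own block of one shared file. */
#include <mpi.h>

#define BLOCK 1024

int main(int argc, char **argv)
{
    int rank, i;
    int buf[BLOCK];
    MPI_File fh;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < BLOCK; i++)        /* fill the local block with rank-specific data */
        buf[i] = rank;

    MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    offset = (MPI_Offset)rank * BLOCK * sizeof(int);   /* each rank owns its own region */
    MPI_File_write_at(fh, offset, buf, BLOCK, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run with, for example, mpirun -np 4 ./a.out, this would produce one file containing four rank-ordered blocks.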
The Area/research topics
 At applications / problems to be solved
 At cluster computing problems

Windows of Opportunities
 Parallel Processing
 Use multiple processors to build MPP/DSM-like systems for
parallel computing
 Network RAM
 Use memory associated with each workstation as aggregate
DRAM cache
 Software RAID
 Redundant array of inexpensive disks
 Use the arrays of workstation disks to provide cheap, highly available, & scalable file storage
 Possible to provide parallel I/O support to applications
 Multipath Communication
 Use multiple networks for parallel data transfer between nodes

How to Run Applications Faster?

There are three ways to improve performance:
 Work harder
 Work smarter
 Get help

Computer analogy:
 Work harder - use faster hardware
 Work smarter - use optimized algorithms and techniques to solve computational tasks
 Get help - use multiple computers to solve a particular task
Era of Computing

 Rapid technical advances
 the recent advances in VLSI technology
 software technology
 OS, PL, development methodologies, & tools
 grand challenge applications have become the main driving force
 Parallel computing
 one of the best ways to overcome the speed bottleneck of a single processor
 good price/performance ratio of a small cluster-based parallel computer
In Summary
 Need more computing power
 Improve the operating speed of processors &
other components
 constrained by the speed of light,
thermodynamic laws, & the high financial costs
for processor fabrication

 Connect multiple processors together & coordinate their computational efforts
 parallel computers
 allow the sharing of a computational task among multiple processors
Technology Trends...

 Performance of PC/workstation components has almost reached the performance of those used in supercomputers…
 Microprocessors (50% to 100% per year);
 Networks (Gigabit SANs);
 Operating systems (Linux, ...);
 Programming environments (MPI, ...);
 Applications (.edu, .com, .org, .net, .shop, .bank);
 The rate of performance improvement of commodity systems is much more rapid than that of specialized systems.
Technology Trends (figure)
Trend
 [Traditional usage] Workstations with Unix for science & industry vs PC-based machines for administrative work & word processing
 [Trend] A rapid convergence in processor performance and kernel-level functionality of Unix workstations and PC-based machines
Rise and Fall of Computer
Architectures
 Vector Computers (VC) - proprietary systems:
 provided the breakthrough needed for the emergence of computational science, but they were only a partial answer.
 Massively Parallel Processors (MPP) - proprietary systems:
 high cost and a low performance/price ratio.
 Symmetric Multiprocessors (SMP):
 suffer from limited scalability.
 Distributed Systems:
 difficult to use and hard to extract parallel performance.
 Clusters - gaining popularity:
 High Performance Computing - commodity supercomputing
 High Availability Computing - mission-critical applications
The Dead Supercomputer Society
http://www.paralogos.com/DeadSuper/
 ACRI
 Alliant
 American Supercomputer
 Ametek
 Applied Dynamics
 Astronautics
 BBN
 CDC
 Convex (C4600)
 Cray Computer
 Cray Research (SGI? Tera)
 Culler-Harris
 Culler Scientific
 Cydrome
 Dana/Ardent/Stellar
 Elxsi
 ETA Systems
 Evans & Sutherland Computer Division
 Floating Point Systems
 Galaxy YH-1
 Goodyear Aerospace MPP
 Gould NPL
 Guiltech
 Intel Scientific Computers
 Intl. Parallel Machines
 KSR
 MasPar
 Meiko
 Myrias
 Saxpy
 Scientific Computer Systems (SCS)
 Soviet Supercomputers
 Suprenum
 Thinking Machines
Computer Food Chain: Causing the Demise of Specialized Systems

• Demise of mainframes, supercomputers, & MPPs

Towards Clusters

The promise of supercomputing to the average PC user?
Towards Commodity Parallel
Computing
 linking together two or more computers to jointly solve computational problems
 since the early 1990s, an increasing trend to move away from expensive and specialized proprietary parallel supercomputers towards clusters of workstations
 hard to find money to buy expensive systems
 the rapid improvement in the availability of commodity high-performance components for workstations and networks
 low-cost commodity supercomputing
 from specialized traditional supercomputing platforms to cheaper, general-purpose systems consisting of loosely coupled components built up from single or multiprocessor PCs or workstations
History: Clustering of Computers for Collective Computing

(Figure: timeline of clustering for collective computing, from 1960 through the 1980s, 1990, 1995+ and 2000+, ending with clusters of PDAs.)
Why PC/WS Clustering Now?
 Individual PCs/workstations are becoming increasingly powerful
 Commodity network bandwidth is increasing and latency is decreasing
 PC/workstation clusters are easier to integrate into existing networks
 Typical low user utilization of PCs/WSs
 Development tools for PCs/WSs are more mature
 PC/WS clusters are cheap and readily available
 Clusters can be easily grown

What is a Cluster?
 A cluster is a type of parallel or distributed processing system which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.
 A node
 a single or multiprocessor system with memory, I/O facilities, & OS
 generally 2 or more computers (nodes) connected together
 in a single cabinet, or physically separated & connected via a LAN
 appear as a single system to users and applications
 provide a cost-effective way to gain features and benefits
Cluster Architecture

(Figure: layered cluster architecture. Sequential and parallel applications run on top of a parallel programming environment and cluster middleware that provide a single system image and availability infrastructure; each PC/workstation node runs communications software over its network interface hardware, and the nodes are linked by a cluster interconnection network/switch.)
Cluster Components

Prominent Components of
Cluster Computers (I)
 Multiple high-performance computers
 PCs
 Workstations
 SMPs (CLUMPS)
 Distributed HPC systems leading to Grid Computing

System CPUs
 Processors
 Intel x86 Processors
 Pentium Pro and Pentium Xeon
 AMD x86, Cyrix x86, etc.
 Digital Alpha
 Alpha 21364 processor integrates processing, memory controller, and network interface into a single chip
 IBM PowerPC
 Sun SPARC
 SGI MIPS
 HP PA
System Disk
 Disk and I/O
 Overall improvement in disk access time has been less than 10% per year
 Amdahl's law
 Speed-up obtained from faster processors is limited by the slowest system component (see the worked example below)
 Parallel I/O
 Carry out I/O operations in parallel, supported by a parallel file system based on hardware or software RAID
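A worked example of Amdahl's law (the standard formula, added here for clarity; it is not spelled out on the slide): if a fraction s of a program is inherently serial, the speed-up on N processors is at most

    Speedup(N) = 1 / (s + (1 - s) / N)

With s = 0.05 and N = 64, Speedup = 1 / (0.05 + 0.95/64), which is roughly 15.4, and even as N grows without bound the speed-up stays below 1/s = 20. The same bound applies when the serial fraction is dominated by disk access, which is why parallel I/O matters.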
Commodity Components for
Clusters (II): Operating Systems
 Operating Systems
 2 fundamental services for users
 make the computer hardware easier to use
 create a virtual machine that differs markedly from the real machine
 share hardware resources among users
 Processor - multitasking
 The new concept in OS services
 support multiple threads of control in a process itself
 parallelism within a process
 multithreading
 POSIX thread interface is a standard programming environment
 Trend
 Modularity - MS Windows, IBM OS/2
 Microkernel - provides only essential OS services
 high-level abstraction of OS, portability
Prominent Components of
Cluster Computers
 State-of-the-art operating systems
 Linux (MOSIX, Beowulf, and many more)
 Microsoft NT (Illinois HPVM, Cornell Velocity)
 SUN Solaris (Berkeley NOW, C-DAC PARAM)
 IBM AIX (IBM SP2)
 HP UX (Illinois PANDA)
 Mach (microkernel-based OS) (CMU)
 Cluster operating systems (Solaris MC, SCO Unixware, MOSIX (academic project))
 OS gluing layers (Berkeley GLUnix)
Commodity Components for
Clusters
 Operating Systems
 Linux
 Unix-like OS
 Runs on cheap x86 platforms, yet offers the power and flexibility of Unix
 Readily available on the Internet and can be downloaded without cost
 Easy to fix bugs and improve system performance
 Users can develop or fine-tune hardware drivers which can easily be made available to other users
 Features such as preemptive multitasking, demand-paged virtual memory, multiuser, and multiprocessor support
Commodity Components for
Clusters
 Operating Systems
 Solaris
 UNIX-based multithreading and multiuser OS
 supports Intel x86 & SPARC-based platforms
 Real-time scheduling feature critical for multimedia applications
 Supports two kinds of threads
 Light Weight Processes (LWPs)
 User-level threads
 Supports both BSD and several non-BSD file systems
 CacheFS
 AutoClient
 TmpFS: uses main memory to contain a file system
 Proc file system
 Volume file system
 Supports distributed computing & is able to store & retrieve distributed information
 OpenWindows allows applications to be run on remote systems
Commodity Components for
Clusters
 Operating Systems
 Microsoft Windows NT (New Technology)
 Preemptive, multitasking, multiuser, 32-bit OS
 Object-based security model and special file system (NTFS) that allows permissions to be set on a file and directory basis
 Supports multiple CPUs and provides multitasking using symmetrical multiprocessing
 Supports different CPUs and multiprocessor machines with threads
 Has the network protocols & services integrated with the base OS
 several built-in networking protocols (IPX/SPX, TCP/IP, NetBEUI) & APIs (NetBIOS, DCE RPC, Window Sockets (Winsock))
Prominent Components of
Cluster Computers (III)
 High Performance Networks/Switches
 Ethernet (10 Mbps)
 Fast Ethernet (100 Mbps)
 Gigabit Ethernet (1 Gbps)
 SCI (Scalable Coherent Interface; 12 µs MPI latency)
 ATM (Asynchronous Transfer Mode)
 Myrinet (1.2 Gbps)
 QsNet (Quadrics Supercomputing World, 5 µs latency for MPI messages)
 Digital Memory Channel
 FDDI (Fiber Distributed Data Interface)
 InfiniBand
Prominent Components of
Cluster Computers (IV)
 Fast Communication Protocols
and Services (User Level
Communication):
 Active Messages (Berkeley)
 Fast Messages (Illinois)
 U-net (Cornell)
 XTP (Virginia)
 Virtual Interface Architecture (VIA)

Cluster Interconnects: Comparison (created in 2000)

Bandwidth (MBytes/s): Myrinet 140 (33 MHz) / 215 (66 MHz); QSnet 208; Giganet ~105; ServerNet2 165; SCI ~80; Gigabit Ethernet 30-50
MPI latency (µs): Myrinet 16.5 (33 MHz) / 11 (66 MHz); QSnet 5; Giganet ~20-40; ServerNet2 20.2; SCI 6; Gigabit Ethernet 100-200
List price/port: Myrinet $1.5K; QSnet $6.5K; Giganet ~$1.5K; ServerNet2 $1.5K; SCI ~$1.5K; Gigabit Ethernet ~$1.5K
Hardware availability: Now for all, except ServerNet2 (Q2 '00)
Linux support: Now for all, except ServerNet2 (Q2 '00)
Maximum #nodes: 1000's for all, except SCI (64K)
Protocol implementation: firmware on adapter for Myrinet, QSnet, Giganet and SCI; implemented in hardware for ServerNet2 and Gigabit Ethernet
VIA support: Myrinet soon; QSnet none; Giganet NT/Linux; ServerNet2 done in hardware; SCI software; Gigabit Ethernet NT/Linux (TCP/IP, VIA)
MPI support: Myrinet 3rd party; QSnet Quadrics/Compaq; Giganet 3rd party; ServerNet2 Compaq/3rd party; SCI 3rd party; Gigabit Ethernet MPICH - TCP/IP
Commodity Components for
Clusters
 Cluster Interconnects
 Communicate over high-speed networks using a standard networking protocol such as TCP/IP or a low-level protocol such as AM
 Standard Ethernet
 10 Mbps
 cheap, easy way to provide file and printer sharing
 bandwidth & latency are not balanced with the computational power
 Ethernet, Fast Ethernet, and Gigabit Ethernet
 Fast Ethernet - 100 Mbps
 Gigabit Ethernet
 preserves Ethernet's simplicity
 delivers a very high bandwidth to aggregate multiple Fast Ethernet segments
Commodity Components for
Clusters
 Cluster Interconnects
 Myrinet
 1.28 Gbps full-duplex interconnection network
 Uses low-latency cut-through routing switches, which are able to offer fault tolerance by automatic mapping of the network configuration
 Supports both Linux & NT
 Advantages
 Very low latency (5 µs, one-way point-to-point)
 Very high throughput
 Programmable on-board processor for greater flexibility
 Disadvantages
 Expensive: $1500 per host
 Complicated scaling: switches with more than 16 ports are unavailable
Prominent Components of
Cluster Computers (V)
 Cluster Middleware
 Single System Image (SSI)
 System Availability (SA) Infrastructure
 Hardware
 DEC Memory Channel, DSM (Alewife, DASH), SMP techniques
 Operating System Kernel/Gluing Layers
 Solaris MC, Unixware, GLUnix, MOSIX
 Applications and Subsystems
 Applications (system management and electronic forms)
 Runtime systems (software DSM, PFS, etc.)
 Resource management and scheduling (RMS) software
 SGE (Sun Grid Engine), LSF, PBS, Libra: Economy Cluster Scheduler, NQS, etc.
Advanced Network Services/
Communication SW
 Communication infrastructure support protocol for
 Bulk-data transport
 Streaming data
 Group communications
 Communication services provide the cluster with important QoS parameters
 Latency
 Bandwidth
 Reliability
 Fault-tolerance
 Jitter control
 Network services are designed as a hierarchical stack of protocols with relatively low-level communication APIs, providing means to implement a wide range of communication methodologies
 RPC
 DSM
 Stream-based and message-passing interfaces (e.g., MPI, PVM)
Prominent Components of
Cluster Computers (VI)
 Parallel Programming Environments and Tools
 Threads (PCs, SMPs, NOW, ...)
 POSIX Threads
 Java Threads
 MPI (Message Passing Interface)
 Linux, NT, on many supercomputers
 PVM (Parallel Virtual Machine)
 Parametric Programming
 Software DSMs (Shmem)
 Compilers
 C/C++/Java
 Parallel programming with C++ (MIT Press book)
 RAD (rapid application development) tools
 GUI-based tools for PP modeling
 Debuggers
 Performance Analysis Tools
 Visualization Tools
Prominent Components of
Cluster Computers (VII)
 Applications
 Sequential
 Parallel / Distributed (cluster-aware applications)
 Grand challenge applications
 Weather forecasting
 Quantum chemistry
 Molecular biology modeling
 Engineering analysis (CAD/CAM)
 …
 PDBs, web servers, data mining
Key Operational Benefits of
Clustering
 High Performance
 Expandability and Scalability
 High Throughput
 High Availability

Clusters Classification (I)

Application Target
 High Performance (HP) clusters
 Grand challenge applications
 High Availability (HA) clusters
 Mission-critical applications
Clusters Classification (II)

Node Ownership
 Dedicated clusters
 Non-dedicated clusters
 Adaptive parallel computing
 Communal multiprocessing
Clusters Classification (III)

Node Hardware
 Clusters of PCs (CoPs)
 Piles of PCs (PoPs)
 Clusters of Workstations (COWs)
 Clusters of SMPs (CLUMPs)
Clusters Classification (IV)
 Node Operating System
 Linux clusters (e.g., Beowulf)
 Solaris clusters (e.g., Berkeley NOW)
 NT clusters (e.g., HPVM)
 AIX clusters (e.g., IBM SP2)
 SCO/Compaq clusters (Unixware)
 Digital VMS clusters
 HP-UX clusters
 Microsoft Wolfpack clusters
Clusters Classification (V)

Node Configuration
 Homogeneous clusters
 All nodes have similar architectures and run the same OS
 Heterogeneous clusters
 Nodes have different architectures and run different OSs
Clusters Classification (VI)
 Levels of Clustering
 Group clusters (#nodes: 2-99)
 Nodes are connected by a SAN like Myrinet
 Departmental clusters (#nodes: 10s to 100s)
 Organizational clusters (#nodes: many 100s)
 National metacomputers (WAN/Internet-based)
 International metacomputers (Internet-based, #nodes: 1000s to many millions)
 Grid Computing
 Web-based Computing
 Peer-to-Peer Computing
Cluster Programming

Levels of Parallelism

(Figure: code granularity, from coarse to fine)
 Large grain (task level): whole tasks i-1, i, i+1 - programs distributed with PVM/MPI
 Medium grain (control level): functions func1(), func2(), func3() - executed as threads
 Fine grain (data level): loop iterations such as a(0)=.., a(1)=.., b(0)=.. - handled by compilers
 Very fine grain (multiple issue): individual instructions (load, add, ...) - exploited by hardware

A minimal sketch of these levels in code follows below.
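A minimal, hypothetical C sketch of these grain sizes (all names and the compile line are illustrative assumptions, not from the slides): MPI processes provide task-level parallelism, a POSIX thread provides control-level parallelism, and an OpenMP-style parallel loop stands in for compiler-managed data-level parallelism.

/* granularity.c - one file illustrating three grain sizes.
   Assumed build: mpicc -fopenmp -pthread granularity.c */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define N 1000000

static double a[N];

/* medium grain (control level): a function executed as a thread */
static void *func1(void *arg)
{
    (void)arg;
    puts("func1 running in its own thread");
    return NULL;
}

int main(int argc, char **argv)
{
    int rank, i;
    pthread_t t;

    /* large grain (task level): each MPI process is an independent task */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("task %d started\n", rank);

    pthread_create(&t, NULL, func1, NULL);
    pthread_join(t, NULL);

    /* fine grain (data level): loop iterations divided among cores */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = i * 0.5;

    /* very fine grain (multiple issue) is exploited inside the CPU itself */
    MPI_Finalize();
    return 0;
}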
Cluster Programming
Environments
 Shared memory based
 DSM
 Threads/OpenMP (enabled for clusters)
 Java threads (IBM cJVM)
 Message passing based
 PVM
 MPI
 Parametric computations
 Nimrod-G and Gridbus Data Grid Broker
 Automatic parallelising compilers
 Parallel libraries & computational kernels (e.g., NetSolve)
Programming Environments and
Tools (I)
 Threads (PCs, SMPs, NOW, ...)
 In multiprocessor systems
 Used to simultaneously utilize all the available processors
 In uniprocessor systems
 Used to utilize the system resources effectively
 Multithreaded applications offer quicker response to user input and run faster
 Potentially portable, as there exists an IEEE standard for the POSIX threads interface (pthreads); see the sketch below
 Extensively used in developing both application and system software
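A minimal POSIX threads sketch in C (illustrative only; the array, sizes and function names are assumptions): two threads each sum half of an array, which is how a multithreaded application keeps both processors of a 2-way SMP node busy.

/* pthreads_sum.c - two POSIX threads each sum half of an array.
   Assumed build: cc -std=c99 -pthread pthreads_sum.c */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 2

static double data[N];
static double partial[NTHREADS];

static void *sum_part(void *arg)
{
    long id = (long)arg;                /* thread index: 0 or 1 */
    long lo = id * (N / NTHREADS);
    long hi = lo + (N / NTHREADS);
    double s = 0.0;

    for (long i = lo; i < hi; i++)
        s += data[i];
    partial[id] = s;                    /* each thread writes its own slot: no lock needed */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    double total = 0.0;

    for (long i = 0; i < N; i++)
        data[i] = 1.0;

    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, sum_part, (void *)t);
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];
    }

    printf("total = %f\n", total);      /* expect 1000000.000000 */
    return 0;
}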
Programming Environments and
Tools (II)
 Message Passing Systems (MPI and PVM)
 Allow efficient parallel programs to be written for distributed memory systems
 2 most popular high-level message-passing systems: PVM & MPI
 PVM
 both an environment & a message-passing library
 MPI
 a message-passing specification, designed to be a standard for distributed memory parallel computing using explicit message passing
 attempts to establish a practical, portable, efficient, & flexible standard for message passing
 generally, application developers prefer MPI, as it is fast becoming the de facto standard for message passing (a minimal example follows below)
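A minimal MPI example in C (a sketch that should work with any MPI implementation, e.g. MPICH or Open MPI; the message text is arbitrary): rank 0 sends a message to rank 1 using explicit message passing.

/* mpi_hello.c - explicit message passing between two ranks.
   Assumed build/run: mpicc mpi_hello.c && mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size;
    char msg[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        strcpy(msg, "hello from rank 0");
        MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(msg, sizeof(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 of %d received: %s\n", size, msg);
    }

    MPI_Finalize();
    return 0;
}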
Programming Environments and
Tools (III)
 Distributed Shared Memory (DSM) Systems
 Message passing
 the most efficient, widely used programming paradigm on distributed memory systems
 complex & difficult to program
 Shared memory systems
 offer a simple and general programming model
 but suffer from scalability limitations
 DSM on distributed memory systems
 an alternative, cost-effective solution
 Software DSM
 usually built as a separate layer on top of the communication interface
 takes full advantage of the application characteristics: virtual pages, objects, & language types are units of sharing
 TreadMarks, Linda
 Hardware DSM
 better performance, no burden on user & SW layers, fine granularity of sharing, extensions of the cache coherence scheme, & increased HW complexity
 DASH, Merlin
Programming Environments and
Tools (IV)
 Parallel Debuggers and Profilers
 Debuggers
 Very limited
 HPDF (High Performance Debugging Forum) started as a Parallel Tools Consortium project in 1996
 Developed a HPD version specification, which defines the functionality, semantics, and syntax for a command-line parallel debugger
 TotalView
 A commercial product from Dolphin Interconnect Solutions
 The only widely available GUI-based parallel debugger that supports multiple HPC platforms
 Only used in homogeneous environments, where each process of the parallel application being debugged must be running under the same version of the OS
Functionality of Parallel Debuggers
 Managing multiple processes and multiple threads within a process
 Displaying each process in its own window
 Displaying source code, stack trace, and stack frame for one or more processes
 Diving into objects, subroutines, and functions
 Setting both source-level and machine-level breakpoints
 Sharing breakpoints between groups of processes
 Defining watch and evaluation points
 Displaying arrays and their slices
 Manipulating code variables and constants
Programming Environments and
Tools (V)
 Performance Analysis Tools
 Help a programmer understand the performance characteristics of an application
 Analyze & locate parts of an application that exhibit poor performance and create program bottlenecks
 Major components
 A means of inserting instrumentation calls to the performance monitoring routines into the user's applications (see the sketch below)
 A run-time performance library that consists of a set of monitoring routines
 A set of tools for processing and displaying the performance data
 Issue with performance monitoring tools
 Intrusiveness of the tracing calls and their impact on the application performance
 Instrumentation affects the performance characteristics of the parallel application and thus provides a false view of its performance behavior
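A hypothetical sketch of the simplest kind of instrumentation (manual timing calls; this is not how the tools on the next slide are implemented, just the idea they automate): MPI_Wtime() timestamps are inserted around the region of interest, and even these calls perturb the measured code slightly.

/* timing.c - manual instrumentation of a compute region with MPI_Wtime().
   Assumed build: mpicc -std=c99 timing.c */
#include <mpi.h>
#include <stdio.h>

#define N 10000000

int main(int argc, char **argv)
{
    int rank;
    double t0, t1, local = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t0 = MPI_Wtime();                  /* instrumentation call: start timestamp */
    for (long i = 0; i < N; i++)       /* region being measured */
        local += (double)i * 1e-7;
    t1 = MPI_Wtime();                  /* instrumentation call: end timestamp */

    printf("rank %d: region took %.6f s (result %.3f)\n", rank, t1 - t0, local);

    MPI_Finalize();
    return 0;
}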
Performance Analysis and Visualization Tools

AIMS: instrumentation, monitoring library, analysis. http://science.nas.nasa.gov/Software/AIMS
MPE: logging library and snapshot performance visualization. http://www.mcs.anl.gov/mpi/mpich
Pablo: monitoring library and analysis. http://www-pablo.cs.uiuc.edu/Projects/Pablo/
Paradyn: dynamic instrumentation and running analysis. http://www.cs.wisc.edu/paradyn
SvPablo: integrated instrumentor, monitoring library and analysis. http://www-pablo.cs.uiuc.edu/Projects/Pablo/
Vampir: monitoring library, performance visualization. http://www.pallas.de/pages/vampir.htm
Dimemas: performance prediction for message-passing programs. http://www.pallas.com/pages/dimemas.htm
Paraver: program visualization and analysis. http://www.cepba.upc.es/paraver
Programming Environments and
Tools (VI)
 Cluster Administration Tools
 Berkeley NOW
 Gathers & stores data in a relational DB
 Uses a Java applet to allow users to monitor a system
 SMILE (Scalable Multicomputer Implementation using Low-cost Equipment)
 Called K-CAP
 Consists of compute nodes, a management node, & a client that can control and monitor the cluster
 K-CAP uses a Java applet to connect to the management node through a predefined URL address in the cluster
 PARMON
 A comprehensive environment for monitoring large clusters
 Uses client-server techniques to provide transparent access to all nodes to be monitored
 parmon-server & parmon-client
Cluster Applications

Cluster Applications
 Numerous scientific & engineering applications
 Business applications:
 E-commerce applications (Amazon, eBay);
 Database applications (Oracle on clusters).
 Internet applications:
 ASPs (Application Service Providers);
 Computing portals;
 E-commerce and E-business.
 Mission-critical applications:
 command control systems, banks, nuclear reactor control, star-wars, and handling life-threatening situations.
Some Cluster Systems: Comparison (project - platform - communications - OS - other)

Beowulf - PCs - multiple Ethernet with TCP/IP - Linux and Grendel - MPI/PVM, sockets and HPF
Berkeley NOW - Solaris-based PCs and workstations - Myrinet and Active Messages - Solaris + GLUnix + xFS - AM, PVM, MPI, HPF, Split-C
HPVM - PCs - Myrinet with Fast Messages - NT or Linux connection and global resource manager + LSF - Java-fronted, FM, Sockets, Global Arrays, SHMEM and MPI
Solaris MC - Solaris-based PCs and workstations - Solaris-supported - Solaris + Globalization layer - C++ and CORBA
Cluster of SMPs (CLUMPS)
 Clusters of multiprocessors (CLUMPS)
 To be the supercomputers of the future
 Multiple SMPs with several network interfaces can be connected using high-performance networks
 2 advantages
 Benefit from the high-performance, easy-to-use-and-program SMP systems with a small number of CPUs
 Clusters can be set up with moderate effort, resulting in easier administration and better support for data locality inside a node
Many types of Clusters
 High Performance Clusters
 Linux cluster; 1000 nodes; parallel programs; MPI
 Load-leveling Clusters
 Move processes around to borrow cycles (e.g., Mosix)
 Web-Service Clusters
 LVS/Piranha; load-level TCP connections; replicate data
 Storage Clusters
 GFS; parallel filesystems; same view of data from each node
 Database Clusters
 Oracle Parallel Server
 High Availability Clusters
 ServiceGuard, Lifekeeper, Failsafe, heartbeat, failover clusters
Cluster Design Issues

• Enhanced Performance (performance @ low cost)
• Enhanced Availability (failure management)
• Single System Image (look-and-feel of one system)
• Size Scalability (physical & application)
• Fast Communication (networks & protocols)
• Load Balancing (CPU, Net, Memory, Disk)
• Security and Encryption (clusters of clusters)
• Distributed Environment (social issues)
• Manageability (administration and control)
• Programmability (simple API if required)
• Applicability (cluster-aware and non-aware applications)
Summary: Cluster Advantage
 Price/performance ratio is low compared with a dedicated parallel supercomputer.
 Incremental growth that often matches demand patterns.
 The provision of a multipurpose system
 Scientific, commercial, and Internet applications
 Clusters have become mainstream enterprise computing systems:
 In the 2003 Top 500 Supercomputers list, over 50% of the systems are cluster-based, and many of them are deployed in industry.
References
 Hi
HighhPPerformance
f Cl
Cluster
t C Computing:
ti
Architectures and Systems, Book
Editor: Rajkumar Buyya,
Buyya Slides: Hai
Jin and Raj Buyya
 http://www.buyya.com/cluster
http://www buyya com/cluster
 Bahan kuliah Topik Dalam Komputasi
Paralel Fasilkom UI
Paralel,

http:// bit.ly/hs-wshop-hpc 73
