Clusters With GPUs Under Linux and Windows HPC
[Diagram: host server connected to a Tesla S1070 GPU computing system]
Linux for GPU clusters
Deploying CUDA on Linux clusters
Several cluster management systems are now CUDA-enabled
(Rocks, Platform Computing, Clustervision, Scyld Clusterware)
• Thermal Monitoring:
GPU temperatures, chassis inlet/outlet temperatures
• System Information:
Unit serial number, firmware revision, configuration info
• System Status:
System fan states (e.g. failure), GPU faults
Power system faults, cable faults
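Data like the above can also be read programmatically on each node. The following is a minimal sketch, assuming the NVML library (nvml.h, libnvidia-ml) shipped with recent NVIDIA drivers is available on the node; it is an illustration, not part of the toolkit described here.

/* Minimal NVML sketch: query per-GPU temperature and fan speed.
   Assumes a driver that ships NVML; build with: gcc mon.c -lnvidia-ml */
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    unsigned int count, i;

    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed\n");
        return 1;
    }
    nvmlDeviceGetCount(&count);

    for (i = 0; i < count; i++) {
        nvmlDevice_t dev;
        unsigned int temp = 0, fan = 0;

        nvmlDeviceGetHandleByIndex(i, &dev);
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);
        nvmlDeviceGetFanSpeed(dev, &fan);   /* not all boards report a fan */
        printf("GPU %u: %u C, fan %u%%\n", i, temp, fan);
    }

    nvmlShutdown();
    return 0;
}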
Exclusive access mode
nvidia-smi can set up access policies for the GPUs; the -s flag shows the current compute-mode rule for a given GPU (here 0x1, i.e. exclusive access):
# nvidia-smi -g 1 -s
Compute-mode rules for GPU=0x1: 0x1
# nvidia-smi -g 0 -s
Compute-mode rules for GPU=0x0: 0x1
Current limitation:
Requires an NVIDIA GPU for the display (S1070 + GHIC) or
a host system graphics chipset with a WDDM driver
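From the application side, the effect of the compute-mode rules set above can be checked through the CUDA runtime: cudaDeviceProp carries a computeMode field. The sketch below is an illustration (not from the original material) that prints the mode of every device; under the exclusive rule, only one context at a time can be created on a GPU.

/* Minimal sketch: report each device's compute mode as set via nvidia-smi.
   Build with: nvcc computemode.cu -o computemode */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int n = 0, i;
    cudaGetDeviceCount(&n);

    for (i = 0; i < n; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);

        const char *mode =
            (prop.computeMode == cudaComputeModeExclusive)  ? "exclusive"  :
            (prop.computeMode == cudaComputeModeProhibited) ? "prohibited" :
                                                              "default";
        printf("GPU %d (%s): compute mode %s\n", i, prop.name, mode);
    }
    return 0;
}

In exclusive mode, a second process attempting to create a context on the same GPU fails, which is how a batch scheduler can guarantee one job per GPU.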
What is Windows HPC Server?
• Windows HPC Server consists of:
– A Windows Server x64 OS installation
• An inexpensive SKU called “HPC Edition” can be volume-licensed for clusters dedicated to HPC applications
– The HPC Pack, which provides services, tools, and runtime environment support for HPC applications (an MPI example follows this list)
• Management
• Job Scheduling
• Diagnostics
• MPI Stack
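To illustrate the MPI stack item, a minimal MPI program looks the same under MS-MPI as under any other implementation. The sketch below is a plain hello-world (not taken from the HPC Pack documentation), typically built against the MS-MPI headers and launched through the HPC job scheduler with mpiexec.

/* Minimal MPI sketch: each rank reports itself; runs unchanged under MS-MPI. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    printf("rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}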
Deployment Means…
1. Getting the OS Image on the machines. Options?
– Manual Installation
– 3rd-Party Windows Deployment Tools
• Includes some solutions for mixed Linux/Windows clusters
– PXE Boot from Head Node
2. Configuring the HPC Pack
– Step-by-step wizard for interactive installation
– XML-based configuration for automated, reproducible deployments
Deployment Process for Network Boot
Overall Cluster Setup
[Diagram: overall cluster setup across head node and compute nodes — steps include the ToDo List and Configuration Wizard, selecting network (PXE) boot, creating images, templates and drivers, and adding several built-in functional and perf tests.]