Hera Quickstart


HERA QUICK START FOR

THEIA/JET USERS

August 21, 2019
❏ System basics
❏ Modules
❏ Compiling and running MPI codes
❏ Using the batch system
❏ File systems
❏ Panasas differences
❏ /scratch1 & /scratch2
❏ /scratch3 & /scratch4 from Theia (temporary only)
❏ Data migration from Theia to Hera
❏ Reporting problems
❏ Upcoming training

OVERVIEW
❏ Hera is the Queen of the Gods and is the wife and sister of Zeus in the Olympian pantheon. She is known for being the Goddess of Marriage &
Birth. Despite being the Goddess of Marriage, she was known to be jealous and vengeful towards the many lovers and offspring of her husband
Zeus. (Juno, the Roman equivalent, is the TDS)

❏ Based on the Intel Skylake processor


❏ 20 cores/socket, 2.4 GHz, 2 sockets/node, 96 GB/node
❏ 12 Front-ends (192GB)
❏ 8 Big memory (384GB) nodes
❏ Mellanox HDR InfiniBand (a partially blocking fat-tree, as opposed to Theia's
non-blocking fat-tree)
❏ File systems: /scratch1 & /scratch2
❏ Lustre, approximately 16.9PB, >196GB/s (about 1.5x of /scratch3 + /scratch4)
❏ /scratch3 and /scratch4 will be available during transition only!!
❏ Please do not run jobs from /scratch3 or /scratch4; use them only for data
migration! (explained later)

SYSTEM OVERVIEW
❏ Currently only half of Hera is up!
❏ “Group 1” is currently up, and will switch to “Group 2” after 2 weeks so that
all the nodes are exercised, but this part should be transparent to users
❏ Currently all of Theia is up, but at some point part of Theia will be shut down
to bring all of Hera up
❏ That will be followed by a complete shutdown of Theia
❏ FGA (nodes with GPUs) have not yet been moved, and continue to be part of
Theia
❏ Currently scheduled to be moved to Hera the 2nd week of September (during the
downtime for facilities work)
❏ The Theia file systems /scratch3 and /scratch4 are not expected to be
available after Sep 30!

Please Note!!!!!
                        Hera                            Theia
CPU                     Intel Skylake 6148              Intel Haswell E5-2690 v3
Cores per Socket / Node 20 / 40 (2.4 GHz)               12 / 24 (2.6 GHz)
Memory                  96 GB / 384 GB                  64 GB / 256 GB
Number of Nodes         1336 (1316 + 8, + 12 spares)    1198 (1184 + 14)

(subject to change)

Note: The FGE system has not changed significantly. It will simply be
moved over from Theia to Hera, and its network will be upgraded from
32 Gb/s to 54 Gb/s.
Hera Theia Comparison - Processor
                 Hera                       Theia
Hardware         Mellanox HDR 100           Intel TruScale QDR
Theoretical BW   197 Gbps bi-directional    64 Gbps bi-directional

Hera Theia Comparison - IB Network


                    Hera                         Theia
/home               Shared between the two       Shared between the two
FS Hardware         Lustre w/ SSD and PFL        Panasas
Peak Performance    196 GB/s                     125 GB/s
Directories         /scratch1, /scratch2         /scratch3, /scratch4
Transition Period   /scratch3, /scratch4 avail

Hera Theia Comparison - File Systems


❏ Hera is set up to be as close as possible to Theia
❏ They are so similar, in fact, that the Hera documentation was created by
substituting “hera” for “theia” and then making the appropriate changes
❏ So it will be a very familiar environment; this presentation therefore
emphasizes the differences

Transitioning to Hera
❏ 12 Login nodes, hfe[01-12]
❏ CAC log in
❏ sshg3 bastion-hera.princeton.rdhpcs.noaa.gov
❏ sshg3 bastion-hera.boulder.rdhpcs.noaa.gov
❏ RSA log in
❏ ssh hera-rsa.princeton.rdhpcs.noaa.gov
❏ ssh hera-rsa.boulder.rdhpcs.noaa.gov

Note: The output of “hostname” is different from the conventional “hfe01”, etc.

So when you log in, you may be confused by the prompt! (explained on the next slide)

Hera Logging in
❏ Generally users don’t need to know this! If you are used to
seeing t001-t128, etc, it may seem confusing!
❏ Some of these are subject to change

❏ All the nodes are named “h<rack><a|m|c><node>”


❏ h - For “Hera”
❏ a - For “access” or login node
❏ m - For “bigmem”
❏ c - For “compute”
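As a hedged illustration (the rack and node numbers below are invented), the output of “hostname” on a compute node might look like:

  h12c45        # h = Hera, rack 12, “c” = compute, node 45 within that rack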

Node Name Convention


❏ Based on LMOD
❏ Same as on Theia and Jet
❏ Designed to prevent loading incompatible modules
❏ Not all modules appear with module avail
❏ Use module spider to see everything that is available
❏ Compatible with environment modules (used on Gaea, PPAN)
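A minimal sketch of typical LMOD usage (the netcdf module name is only an illustration and may differ on Hera; the compiler/MPI versions are the ones listed on the next slide):

  module avail                          # lists only modules compatible with what is currently loaded
  module spider netcdf                  # searches everything, including modules hidden by the hierarchy
  module load intel/18.0.5.274 impi/2018.0.4
  module list                           # confirm what is loaded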

MODULES
❏ Intel compilers and Intel MPI are available
❏ Compilers: intel/18.0.5.274, intel/19.0.4.243
❏ MPI: impi/2018.0.4, impi/2019.0.4
❏ PGI: 19.4 is available
❏ Looking into other open source MPI stacks
❏ Allinea tools
❏ “Forge” and “Performance Reports” for profiling
❏ “DDT” for debugging (replaces TotalView)

COMPILING AND RUNNING MPI CODES and Tools


❏ The Skylake processor has 2 AVX-512 units, each capable of 8 DP (double-precision) operations
❏ It also has fused multiply-add capability
❏ If your code can use these instructions, it will perform very well
❏ Intel recommended options

❏ -g -O3 -ftz -traceback -fpe0 -xHost


❏ -g -O3 -ftz -traceback -fpe0 -xHOST -axcore-avx512
❏ -g -O3 -ftz -traceback -fpe0 -xHOST -axcore-avx512 -qno-opt-dynamic-align
❏ PGI recommended options
❏ -g -O2 -tp skylake
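As a minimal sketch using the Intel options above (the source file name is a placeholder; mpiifort is the standard Intel MPI wrapper for ifort):

  module load intel/18.0.5.274 impi/2018.0.4
  mpiifort -g -O3 -ftz -traceback -fpe0 -xHOST -axcore-avx512 my_mpi_code.f90 -o my_mpi_code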

COMPILING AND RUNNING MPI CODES


❏ The batch system should feel no different
❏ If it does, file a help report
❏ The batch system will be isolated from the Theia batch system
❏ Use the projects you normally use
❏ All projects will have the same allocation during transition
❏ This may change as management dictates
❏ Portfolio managers will have revised allocations (expect updates to allocations and new projects)
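A hedged sketch of a batch job script (assuming the same Slurm-style directives used on Theia; the account name, resource numbers, and executable are placeholders to replace with your own):

  #!/bin/bash
  #SBATCH --account=<your_project>      # use the project you normally use
  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=40          # Hera compute nodes have 40 cores
  #SBATCH --time=00:30:00
  #SBATCH --job-name=my_mpi_job

  srun ./my_mpi_code                    # launcher details may differ for your code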

USING THE BATCH SYSTEM


❏ Lustre is different from Panasas
❏ Metadata performance is divided across 2 separate metadata units per file system
❏ This means that metadata (MD) operations are scoped to approximately ¼ of the projects
❏ Quotas are tracked by “Project ID” (usually the same as group ID and directory name)
❏ The Project ID is assigned to top-level dirs and will be inherited for all new subdirs.
❏ Tracking and enforcement includes maximum file count, not just capacity.
❏ (This is different from Jet’s Lustre, which uses “group” quotas. Jet will likely change in the
future.)
❏ Please DO NOT run “find”, “du”, or anything similar to produce reports
❏ Your current allocation is approximately 0.85x your current Theia allocation
❏ If yours is different from this and you don’t think it should be, let us know via the help system
❏ As usual, small files are not desirable (but for different reasons)

FILE SYSTEMS
❏ Finding your new Project directory:
/bin/ls -ld /scratch[12]/*/<project>

❏ For example:
/bin/ls -ld /scratch[12]/*/nesccmgmt
drwxrwsr-x 5 Leslie.B.Hart nesccmgmt 4096 Aug 20 07:37 /scratch1/SYSADMIN/nesccmgmt

Where is my Project Space?


❏ Finding your quota and usage
❏ The saccount_params command is not yet ready
❏ Quotas are “project” based
❏ Run the “id” command to get your project ID number (not the name)
❏ lfs quota -p <project ID number> /scratch1
❏ User and Group usage (capacity and file count) is tracked but not limited
❏ You can also find your usage across all projects:
❏ lfs quota -u <User.Name> /scratch1
❏ lfs quota -g <groupname> /scratch1
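A hedged example using the nesccmgmt project from the previous slide (your project name will differ, and the numeric ID shown is a placeholder):

  id                                    # the number shown for your project group is the project ID
  getent group nesccmgmt                # another way to look up a project's numeric ID
  lfs quota -p 12345 /scratch1          # 12345 is a placeholder project ID number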

What is my Quota and Usage


❏ The new Lustre file systems include solid state disks (SSDs)
❏ Only ~1.7% of total capacity
❏ Uses “Progressive File Layout” or PFL
❏ The first part of each file (up to 256KB) is stored on SSD
❏ This is similar to how Panasas /scratch3,4 operated
❏ As a file grows bigger, it overflows to spinning disk and is striped across more
and more disks
❏ Up to 32 MB - on HDD, single stripe
❏ Up to 1 GB - on HDD, 4-way stripe
❏ Up to 32 GB - on HDD, 8-way stripe
❏ > 32 GB - on HDD, 32-way stripe, larger object size
❏ So small files reside on SSDs, big files get striped “progressively” wider!
❏ Do not attempt to set striping!! If you think the default is not working for
you, please let us know by submitting a help ticket.
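You should not set striping, but if you are curious you can inspect (read-only) the PFL layout chosen for an existing file; the path below is a placeholder:

  lfs getstripe /scratch1/<portfolio>/<project>/some_large_file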

A Word About Lustre and PFL


❏ Simple rules of thumb
❏ Keep source code and critical configuration files on /home, and back up critical
data to HPSS. (/scratch is not backed up!)
❏ Tar up old small files (or delete them) to free up space on the SSD pool and
stay under your file count quota
❏ Large files are still optimal for HPC batch job performance.
❏ Do not open files with O_APPEND unless you really need it.
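A hedged sketch of the “tar up old small files” advice above (the directory and HPSS path are placeholders; htar assumes you have an HPSS allocation):

  tar -czf old_run.tar.gz old_run/                          # bundle many small files into one
  tar -tzf old_run.tar.gz > /dev/null && rm -rf old_run/    # verify the archive, then remove the originals
  htar -cvf /HPSS/path/old_run.tar old_run/                 # or archive the directory directly to HPSS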

FILE SYSTEMS
❏ Having done this a few times, we have some advice based on previous experience!
❏ Keep the following link handy; we will talk about a few of these topics:
❏ https://heradocs.rdhpcs.noaa.gov/wiki/index.php/Migrating_data_between_local_filesystems

Data Migration from Theia to Hera


❏ Remove unneeded data!
❏ Plan ahead! Everyone is trying to do the same thing that you are!
❏ Use compute nodes for all transfers (and not login nodes)
❏ The volume of data being moved is large, so there will likely be failures!! You
need reliable ways of transferring data; “cp” and “mv” are likely to fail.
❏ Be sure to use the tools on “static” directories; Do not start rsync on a
directory that is still being updated!
❏ Check your running jobs and crontab entries to see where your outputs
are going
❏ Make sure the person doing the transfers has adequate permissions on both
systems.

Data Migration from Theia to Hera: Guidelines


❏ Reminder: Use compute nodes for your data transfers! Submit a batch job to do the transfers
❏ Use rsync for small data sets
❏ Use pan_sitesync for bigger directories and files
❏ Sample job can be found here:
❏ https://heradocs.rdhpcs.noaa.gov/wiki/index.php/Migrating_data_between_local_filesystems#pan_sitesync
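A minimal rsync sketch for a small, static directory (paths are placeholders; remember to run it from a compute node via a batch job, not a login node):

  rsync -av --partial /scratch3/<portfolio>/<project>/mydata/ /scratch1/<portfolio>/<project>/mydata/
  # re-running the same command after a failure only re-copies what is missing or incomplete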

Data Migration from Theia to Hera: Tools


❏ Hera has its own Data Transfer Nodes
❏ Only Hera file systems “/scratch1” and “/scratch2” are mounted
❏ /home is not mounted (the same as with all the other DTNs)
❏ As you may have guessed, you would use the following host name to do the data transfers:
dtn-hera.fairmont.rdhpcs.noaa.gov
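As a sketch only (both paths are placeholders, and this assumes the usual unattended-transfer setup for DTNs is in place), a transfer from another system through the Hera DTN might look like:

  rsync -av ./mydata/ dtn-hera.fairmont.rdhpcs.noaa.gov:/scratch1/<portfolio>/<project>/mydata/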

Data Transfer Nodes (DTNs)


❏ This is not a problem when only one of Theia or Hera is in service!
❏ While they are both active, we have to do the following:

❏ Example: the local port assigned on both Hera and Theia is 45537

❏ Need to distinguish between the Theia and Hera port tunnels
❏ PuTTY/Tectia config for Theia: Source Port: 45537, dest port 45537
❏ PuTTY/Tectia config for Hera: Source Port: 45538, dest port 45537

❏ WinSCP or x2go for Theia: host: localhost; Port: 45537
❏ WinSCP or x2go for Hera: host: localhost; Port: 45538
❏ The Hera source port is arbitrary; the example above simply uses the next port (45538)
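For users who set up tunnels with command-line OpenSSH instead of PuTTY/Tectia, a hedged equivalent of the Hera configuration above (the RSA login host is taken from the earlier slide; adjust to however you normally reach Hera):

  ssh -L 45538:localhost:45537 hera-rsa.princeton.rdhpcs.noaa.gov
  # forwards local port 45538 to port 45537 on the remote side;
  # point WinSCP or x2go at host localhost, port 45538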

Misc Topics - x2go and Port tunnelling


❏ Half of Hera is up and you need to begin transitioning!
❏ Hera “looks” and “feels” similar to Theia, so hopefully the transition will be smooth
❏ You need to get started on Data Migration off of /scratch[3,4] as soon as possible

Summary
❏ Preliminary documentation at:

❏ https://heradocs.rdhpcs.noaa.gov/wiki/index.php/Main_Page

❏ Hera-docs is being set up

❏ Help: rdhpcs.hera.help@noaa.gov

Documentation/Reporting Problems
❏ Wednesday, August 28th and Thursday, August 29th in Boulder (GB124), 8 AM-5 PM
❏ Compilers, Libraries, VTune
❏ Will also be available via telecon and GoToMeeting
❏ ARM/Allinea training planned for September 4th
❏ Forge and DDT
❏ Hosted in Boulder; also available via telecon and WebEx/GoToMeeting

Upcoming Training from Intel and ARM/Allinea


Thanks!

Further Questions?
