Agastya: HPC User Manual


HPC User Manual

Agastya
Super Computer Center (3280 core)

Indian Institute of Technology Jammu

Contributions
Team HPC@IIT Jammu
Version – 1.4.4
Date – 17-Feb-2022
Foreword

Journey
IIT Jammu was established in 2016 as an Institute of National Importance to promote higher
education and cutting-edge R&D. Since then, the idea of setting up a large-scale computational
facility kept evolving. The challenge was to integrate diverse user requirements, campus
networking, and IT services into a single infrastructure for optimized usage and performance.
After three years of consistent effort, we now have "AGASTYA*" with us, a proud asset for IIT
Jammu and for this region. [* Name credits: Prof. Dinesh Pandya]

Challenges
Primarily, it is the diverse research community that propelled the HPC requirement at IIT
Jammu. We have been carrying out research across a broad range of disciplines such as
astrophysics, computational biology, plasma physics, material modeling, computational
chemistry, AI, machine learning, and deep learning.

The challenge was to build a single platform that could serve CPU-intensive, GPU-intensive, and
large-memory computational needs. Keeping in mind the accelerated rate of institute growth, we
were also expected to keep open the possibility of a future scale-up of the facility.

We are in an eco-friendly zone, and our aim is to keep the natural environment intact while
building the campus. Designing a data center with the lowest hardware footprint yet the highest
deliverable performance was a challenge. During execution, we came up with optimized
containment designs which benefited us in two ways:

1. The data center containment provides us future scalability.

2. It increases our power usage effectiveness.

This approach addressed most of our concerns and challenges.

Design
1. A scalable, low-latency interconnect that currently runs at 100 Gbps and can be scaled to
200 Gbps in the future.

2. An efficient, up-to-date compute HPC infrastructure with 256 Teraflops of peak performance
from CPUs and approximately 56 Teraflops from GPUs.

3. A scalable 500 TB+ parallel file system (PFS) with sustained performance of more than 30 GBps.

4. A unified cluster management utility, which eases overall administration, as an integral
component of our design.

Target
This document is designed as a reference for end users on technical and usage details. It aims to
provide a clear theoretical picture and a practical approach to using the cluster effectively.

Dr. Sanat Kumar Tiwari


Assistant Professor, Physics
Indian Institute of Technology Jammu

ACKNOWLEDGEMENT
Dear HPC users, to help us document the scientific productivity of the Agastya Cluster, we
request you to add the following text to the acknowledgment section of your articles/reports.

“Author(s) acknowledge the use of Agastya HPC for present studies.”

We would appreciate it if you could send a copy of any publication in which Agastya usage is
acknowledged to hpc.support@iitjammu.ac.in.

Table of Contents
Glossary
Chapter – 1
AGASTYA-HPC
1. Introduction to HPC
1.1 What is HPC?
1.2 What is the AGASTYA-HPC?
1.3 What is not the AGASTYA-HPC?
1.4 AGASTYA-HPC Architecture
1.5 Technical Stuff of AGASTYA-HPC
1.5.1 Batch or interactive mode?
1.5.2 What are parallel and sequential programs?
1.5.3 What are Core, Processor, and Node?
1.5.4 HPC Stack
1.5.5 Libraries
1.5.6 Compilers
1.5.7 Scheduler
1.5.8 MPI
1.5.9 In "AGASTYA" HPC what programming language did we use?
1.5.10 In "AGASTYA" HPC what operating systems did we use?
Chapter – 2
How to start work on AGASTYA-HPC
2. How to get an account on "AGASTYA" HPC?
2.1 How do we connect to the AGASTYA-HPC?
2.1.1 Windows Access
2.1.2 UNIX Access
2.2 How to change the PEM file (Private Key)
Chapter – 3
Data Access on AGASTYA-HPC
3 How to download and copy files to the HPC
3.1 Using "MobaXterm" tools
3.2 Using SCP
3.2.1 Transfer file from HPC to Local Machine
3.2.2 Transfer file from Local machine to HPC
Chapter – 4
Jobs run on AGASTYA-HPC
4 How to run Jobs on AGASTYA-HPC
4.1 Job Scheduler used in AGASTYA-HPC
4.2 Module
4.2.1 How to check available module list
4.2.2 How to load module
4.2.3 How to check loaded module list
4.2.4 How to unload loaded module
4.2.5 How to unload all modules at once
4.2.6 How to get the details of module
4.2.7 How to get the module environment details
4.3 Queue in AGASTYA-HPC
4.4 JOB scheduler commands
4.5 Get information on your Cluster
4.6 PBS Environment Variables
4.7 Job Execution Step
Chapter – 5
Troubleshooting
5.1 Lammps_Installation_with_gcc
5.2 HPC Job Example
5.2.1 MATLAB
 Example 1: Sample MATLAB (matlabdemo.m)
 Example of MATLAB with PBS
5.2.2 Example of GAUSSIAN with PBS
5.2.3 Example of ANSYS with PBS
5.2.4 Example of ABAQUS with PBS
5.2.5 Example of CST studio
5.2.6 Example of Ludwig
Chapter – 6
HPC Usage Policy
6.1 Storage Quota Policy
6.2 User Account Deletion Policy
Chapter – 7
Appendix A
7.1 Useful Linux commands for HPC
7.2 File and Directory Permissions in Linux
7.3 Data backup and restore
7.3.1 Backup and Restore using TAR, filtering the archive through gzip (-z)
7.4 Compiling parallel code
7.5 Execute a parallel job

Glossary
1. Cluster

Cluster is a group of compute nodes.

2. FLOPS

FLOPS is short for “Floating-point Operations Per second”, i.e., the number of (floating
point) computations that a processor can perform per second.

3. FTP

FTP (File Transfer Protocol) is a standard communication protocol used to copy
files from one host to another (over a network).

4. HPC

High-Performance Computing; large-scale, multi-task computing on a supercomputer.

5. InfiniBand

A high-speed, switched-fabric computer network interconnect used in HPC.

6. LAN

Local Area Network

7. Linux

Linux is an operating system, similar to UNIX.

8. Master node

On an HPC cluster, the master node is the node used to compile software and submit jobs.

9. Compute Node

On an HPC cluster, the compute nodes are where CPU jobs run. Jobs are submitted by the user
from the master node, and compute nodes are assigned by the job scheduler. There are
72 compute nodes in AGASTYA-HPC.

10. GPU Compute Node

The GPU compute nodes are where GPU jobs run. Jobs are submitted from the
master node, and GPU compute nodes are assigned by the job scheduler. There are 8 GPU
compute nodes in AGASTYA-HPC.

11. CPU Core

Each node in the AGASTYA-HPC has two CPU sockets, and each socket has 20
cores, so a total of 40 cores is available in each node.

12. MPI

MPI (Message Passing Interface) is a technology for running compute jobs on more
than one node. It is designed for situations where parts of a job can run on independent
nodes, with the results being transferred to other nodes for the next part of the job to be
run.

13. Module

On the HPC, modules let each user work with different versions of software without risking
version conflicts.

14. Job Scheduler

The job scheduler monitors the jobs currently running on the HPC cluster and assigns
nodes to the new jobs based on recent HPC cluster usage, job resource requirements, and
nodes available to the submitter. In summary, the job scheduler determines when and
where a job should run. The job scheduler that we use is called PBS Pro.

15. SCP

SCP (Secure Copy Protocol) is used for securely transferring files between two
hosts; it is based on the Secure Shell (SSH) protocol.

Chapter – 1

AGASTYA-HPC

1. Introduction to HPC
1.1 What is HPC?

HPC (High-Performance Computing) is like a supercomputer in which two or more computer
servers are connected via high-speed network media (optical fibre cables and InfiniBand
switches); each computer is called a node. High-Performance Computing is a collection of many
individual servers (computers) that can be viewed as a single system. It is a system with modern
processing capacity, in particular a very high calculation speed and a large amount of accessible
memory. In HPC, a large number of dedicated processors are used to solve intensive
computational tasks.

High-performance computing (HPC) uses supercomputers and computer clusters to solve
advanced computational problems. Today, computer systems approaching the teraflops region are
counted as HPC computers. HPC integrates systems administration and parallel programming
into a multidisciplinary field that combines digital electronics, computer architecture, system
software, programming languages, algorithms, and computational techniques. HPC technologies
are the tools and systems used to implement and create high-performance computing systems.

An HPC system, also called a computer cluster, refers to a group of servers working together as one
system to provide higher performance and availability than a single computer, while typically being
more cost-effective than single computers of comparable speed or availability.

HPC plays an important role in the field of computational science and is a powerful
technique used for a wide range of computationally intensive tasks in various fields,
including bio-science (biological macromolecules), weather forecasting, chemical
engineering (computing the structures and properties of chemical molecules), molecular design,
oil and gas exploration, mechanical design (2D and 3D design verification), and physical
simulations (such as simulations of the early moments of the Universe, the detonation of
nuclear weapons, and nuclear fusion).

1.2 What is the AGASTYA-HPC?


A typical HPC cluster employs two or more servers, all working together to execute high-
performance jobs. The system is based on Intel and/or AMD CPUs running a Linux platform.
Each server has a fairly plain architecture and its own hardware: network, memory,
storage, and processing CPUs; the servers are interconnected with copper and fibre cables.

The "AGASTYA" HPC can deliver 256 Teraflops of operations using CPU-based parallel
computing. This cluster (of computing nodes) also has 8 GPUs (Tesla V100), each of them
capable of 7 Teraflops of operations per second. The facility comes with 800 TB of storage
space. The cluster supports data transfer at a speed of 100 Gbps. With a
dedicated UPS of 210 kVA on a three-phase power supply, we are committed to providing the
institute with uninterrupted computing capability. In summary, AGASTYA stands among the 20
best computing facilities available in the country.

The AGASTYA-HPC relies on parallel-processing technology to offer AGASTYA researchers
an extremely fast solution for all their data processing needs.

AGASTYA-HPC consists of a set of different compute node types. An up-to-date list of all node
types and their hardware is given below:

Cluster Name   # Nodes   Processor architecture / node                                          Usable memory / node   Local diskspace / node
Master Node    2         Intel® Xeon® Gold 6248 (20-Cores, 2.50GHz, 27.50MB Cache, 150W) * 2    32 GB * 12             24 TB
CPU Node       72        Intel® Xeon® Gold 6248 (20-Cores, 2.50GHz, 27.50MB Cache, 150W) * 2    32 GB * 12             480 GB
GPU Node       8         Intel® Xeon® Gold 6248 (20-Cores, 2.50GHz, 27.50MB Cache, 150W) * 2    32 GB * 12             480 GB

Storage        Size
/iitjfs        804 TB

1.3 What is not the AGASTYA-HPC?

1. Personal Desktop: Agastya is NOT a replacement for the personal desktop, even if it can
do most of a desktop's jobs. It is a cluster of computer servers intended for
computational jobs that a usual desktop either cannot execute or would take too
long to execute to be practically useful.
2. It does not develop your applications for you.
3. It is not for playing games.
4. It does not answer your queries.
5. It does not think like you.
6. It does not simply run your PC applications much faster for bigger problems.

1.4 AGASTYA-HPC Architecture

1.5 Technical Stuff of AGASTYA-HPC


Before learning how the HPC works, we first have to know some technical details about HPC.

1.5.1 Batch or interactive mode?


An HPC system has the ability to run multiple programs in parallel without any user
interaction; this is called batch mode.

When a program running on the HPC needs user interaction, for example entering input
data or pressing a key, this is called interactive mode. In that case user
interaction is needed and the computer will wait for user input, so the allocated compute
resources are not used while it waits.

Interactive mode is normally useful for data-visualization jobs.

1.5.2 What are parallel and sequential programs?


A parallel program is a form of computation in which many processes execute simultaneously.
In parallel execution, a large program is divided into small sub-tasks, which are
executed concurrently. Parallel computing is the simultaneous use of multiple processing units to
solve a computational problem.

Parallel programs can run on a multicore computer with multiple processing units
within a single machine, or across multiple computers working on the same task in a cluster.
Parallel computing plays an important role in the HPC cluster in the form of multicore processors.

A sequential program executes one instruction at a time on a single processor.
It does not do calculations in parallel, i.e., it only uses one core of a single node, and it
does not become faster just by providing more cores to it.

It is, however, possible to run multiple instances of a sequential program with different input
parameters on the HPC; this is just an additional way of executing sequential programs in parallel.

1.5.3 What are Core, Processor, and Node?


In HPC, a server is referred to as a node. Each node (server) contains one or more sockets,
and each socket contains a multicore processor, which consists of multiple CPUs or cores that are
used to run HPC programs.

1.5.4 HPC Stack


The HPC stack is the cluster's component stack; it describes the different components used
in the HPC cluster implementation and their dependencies on one another.

1.5.5 Libraries
A library is a collection of resources used to develop software. These may include prewritten code
and subroutines, classes, values, or type specifications. Libraries contain code and data that
provide services to independent programs. This allows the sharing and changing of code and data
in a modular fashion. Some executables are both standalone programs and libraries, but most
libraries are not executable. Executables and libraries make references, known as links, to each
other through the process known as linking, which is typically done by a linker. Here in the HPC
stack, libraries mean development libraries, both serial and parallel, which are associated with
compilers and other HPC programs to run jobs. Originally, only static libraries existed. A static
library, also known as an archive, consists of a set of routines which are copied into a target
application by the compiler, linker, or binder, producing object files and a standalone executable
file. This process, and the stand-alone executable file, is known as a static build of the target
application. Actual addresses for jumps and other routine calls are stored in a relative or
symbolic form which cannot be resolved until all code and libraries are assigned final static
addresses. Dynamic linking involves loading the subroutines of a library into an application
program at load time or run time, rather than linking them in at compile time. Only a minimum
amount of work is done at compile time by the linker; it only records what library routines the
program needs and the index names or numbers of the routines in the library.

Dynamic libraries almost always offer some form of sharing, allowing the same library to be
used by multiple programs at the same time. Static libraries, by definition, cannot be shared.
With CentOS 7.8 on the cluster, glibc is installed for OS and development activities. The GNU C
Library, commonly known as glibc, is the C standard library released by the GNU Project.
Originally written by the Free Software Foundation (FSF) for the GNU operating system, the
library's development has been overseen by a committee since 2001, with Ulrich Drepper from
Red Hat as the lead contributor and maintainer.

1.5.6 Compilers
A compiler is a computer program (or set of programs) that transforms source code written in a
programming language (the source language) into another computer language (the target
language, often having a binary form known as object code). The most common reason for
wanting to transform source code is to create an executable program. The GNU Compiler
Collection (GCC) is a compiler system produced by the GNU Project supporting various
programming languages. GCC is a key component of the GNU tool chain. As well as being the
official compiler of the unfinished GNU operating system, GCC has been adopted as the
standard compiler by most other modern Unix-like computer operating systems, including Linux,
the BSD family and Mac OS X. Originally named the GNU C Compiler, because it only handled
the C programming language, GCC 1.0 was released in 1987, and the compiler was extended to
compile C++ in December of that year. Front ends were later developed for FORTRAN, Pascal,
Objective-C, Java, and Ada, among others.

1.5.7 Scheduler
A job scheduler is a software application that is in charge of unattended background executions,
commonly known for historical reasons as batch processing. Synonyms are batch system,
Distributed Resource Management System (DRMS), and Distributed Resource Manager (DRM).
Today's job schedulers typically provide a graphical user interface and a single point of control
for definition and monitoring of background executions in a distributed network of computers.

The resource manager for this cluster is PBS Pro, an open-source HPC job scheduler. PBS Pro
is a fault-tolerant and highly scalable cluster resource-management and job-scheduling system
for large and small Linux clusters. PBS Pro requires no kernel modifications for its operation
and is relatively self-contained.

1.5.8 MPI
Message Passing Interface (MPI) is an API specification that allows processes to communicate
with one another by sending and receiving messages.

MPI is a language-independent communications protocol used to program parallel computers.


Both point-to-point and collective communication are supported. MPI "is a message-passing
application programmer interface, together with protocol and semantic specifications for how its
features must behave in any implementation." MPI's goals are high performance, scalability, and
portability. MPI remains the dominant model used in high-performance computing today.

We are also using Intel MPI. The Intel® MPI Library is a multi-fabric message-passing library that
implements the Message Passing Interface, version 3.1 (MPI-3.1) specifications. The library is
used to develop applications that can run on multiple cluster interconnects.

15
1.5.9 In “AGASTYA” HPC what programming language did we use?
On the HPC you can use any programming language, library, or software package that
runs on Linux, specifically on the version of Linux installed on the compute nodes, CentOS
7.8.

For the most common programming languages, a compiler is available on CentOS 7.8.

Supported and common programming languages on the HPC are C/C++, FORTRAN, Java, Perl,
Python, MATLAB, etc. Supported and commonly used compilers are GCC and Intel.

Note: Additional software can be installed "on demand". Please contact the "AGASTYA"
HPC support with your specific requirements.

“AGASTYA” HPC Support Mail ID – “hpc.support@iitjammu.ac.in”

1.5.10 In “AGASTYA” HPC what operating systems did we use?


All nodes in the "AGASTYA" HPC cluster run CentOS 7.8, a Linux distribution derived from
Red Hat Enterprise Linux. This means that all programs (executables) should be compiled for
CentOS 7.8.

Users can connect from any computer in the "AGASTYA" network to the HPC, irrespective of
the operating system that they are using on their personal computer. Users can use any of the
common operating systems (such as Windows, macOS, or any version of Linux/Unix/BSD) to
run and control their programs on the HPC.

Chapter – 2

How to start work on AGASTYA-HPC

2. How to get an account on “AGASTYA” HPC?
Users who want access to the "AGASTYA" HPC should follow these steps:

 First, request a new user account through the dedicated portal on the IIT
Jammu intranet:
 Link - https://intranet.iitjammu.ac.in

 Open the link, fill in the required details, and click the Submit button to submit the
request.

 After the account-creation request is submitted, it is automatically sent to the final
authority for approval.
 After final approval, you will receive a user ID with a private key (.pem file) on your mail ID.
 Download the private key and put it on the local system from which you will
access the HPC.
 Without the private key, you will not be able to access AGASTYA-HPC.

Note: In the AGASTYA-HPC cluster we use public/private key pairs for user authentication
(rather than passwords).

2.1 How do we connect to the AGASTYA-HPC?


Prerequisites: the user needs their own private key (downloaded from the mail sent by HPC
support to the local system) and a system connected to the Internet.

How a user connects to the HPC depends on the user's operating system
(UNIX/Windows), so there are two ways to connect to the AGASTYA-HPC:

2.1.1 Windows Access


If you use a Windows-based operating system from outside the campus, you have to use an SSH
client tool (PuTTY, MobaXterm, SecureCRT, mRemoteNG). Here we connect to the HPC
through "MobaXterm", following these steps:

 First, download the "MobaXterm" tool from the Internet to your local system and
then run the exe.
 After running the exe, the "MobaXterm" screen will open.

 Click the Session button; the session screen will open (the Session button and the Shell
option are on the MobaXterm toolbar).

 Click the Shell button; the shell window will open.

 Select a terminal shell (Bash is selected by default) and click the OK button; a Bash
command-line terminal will open.

 Then type the following command, with the path to your pem file, on the terminal and press
Enter.

Syntax - #ssh -i pemfilePath username@agastya.iitjammu.ac.in

# ssh -i /home/Vimal29/Desktop/student/vimal.pem vimal@agastya.iitjammu.ac.in

Note: In the example above the pem file path is "/home/Vimal29/Desktop/student/vimal.pem"; in
your case it will be different.

 You will then be logged in to AGASTYA-HPC successfully.

2.1.2 UNIX Access


If you are working on a UNIX-based operating system, follow the steps below.

 First, log in on your Linux system and download the pem file (provided by
hpc.support) to a local directory.
 After downloading the pem file, open a terminal and change the file's permission
from 644 to 600 using the following command:
 # chmod 600 hmaster1.pem

 Then, from the same location, run the following command for SSH login.

Syntax - #ssh -i pemfile user_name@agastya.iitjammu.ac.in

# ssh -i vimal.pem vimal@agastya.iitjammu.ac.in

Note: in place of the username, write your own user name provided by hpc.support.

 After pressing Enter, the first time the system will ask whether to continue connecting;
type yes and you will be logged in successfully.

Note: Users can change their own private key (pem file) after receiving it; the
procedure is given below.

2.2 How to change the PEM file (Private Key)

To change your own PEM file, you need to recreate the RSA key pair and add it to the
authorized_keys file using the following steps:

 After logging in, change into the .ssh directory.

$cd .ssh/

 Then type the following command:

$ ssh-keygen -t rsa

 Then recreate the authorized_keys file by copying the contents of the public key into it
using the following command:

$ cat id_rsa.pub > authorized_keys

 Now download the private key (id_rsa) to your own system.


 After logging out, you can log in to the system with the new key.

Chapter – 3

Data Access on AGASTYA-HPC

3 How to download and copy files to the HPC

Before running a job on the HPC, users first have to copy their files from their desktop/laptop to
the HPC cluster, and at the end of the job they have to download some files back to their own
system. Both GUI and command-line tools can be used for this purpose.

3.1 Using “MobaXterm” tools


MobaXterm provides the facility to upload files to and download files from AGASTYA-HPC, but
this tool works only for Windows users. The steps are as follows:
 First, open the "MobaXterm" tool on the local system.

 Then click the Session button for a new login session and click SSH in the window
that appears next.

 Then enter the HPC IP 10.10.119.10 in the "Remote host" field, select the private key, and
then click the OK button.
 Then enter the user name on the next screen and press Enter; you will be
successfully logged in to the HPC.

 In the left panel there are two buttons, Upload and Download, which the user can use to
upload and download files in the current working directory.

3.2 Using SCP
SCP is a secure-copy command-line tool on UNIX-based systems. A user logged in from a
UNIX system can use it to transfer files between the local host (the user's computer) and the
remote host (the HPC cluster); it is based on the Secure Shell protocol.

3.2.1 Transfer file from HPC to Local Machine.


If the user wants to transfer a file from the HPC to the local computer, they should use the
following command:

# scp -i (pem file name) user@HPC_IP:(SourcePath) (DestinationPath)      (run this command on the local system)

Example -:

#scp -i vimal.pem vimal@10.10.119.10:/home/vimal/hello /root/

In the screenshot below, I am currently logged in to the HPC and want to transfer a file named
"hello" from the HPC to my local machine, so I open a new terminal on my local machine and
type the above command.

Now we check on the local system and find that the file has been transferred successfully.

3.2.2 Transfer file from Local machine to HPC


If the user wants to transfer a file from the local machine to the HPC, they should use the
following command:

#scp -i (pemfileName) (Source file Path) user@HPC_IP:/(Destination Path)

The above command is run on the local machine only, i.e., the user copies the file from the local
machine to the HPC without logging in to the HPC.

In the screenshot below, I am copying a file from the current working directory of my local
computer to the home directory of the HPC user "vimal".

So we use the command below:

#scp -i vimal.pem TestDemo vimal@10.10.119.10:/home/vimal

The file has now been transferred successfully to the HPC cluster, and we can check it after logging in to the HPC.
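To copy a whole directory instead of a single file, the recursive option -r of scp can be added; for example (the directory name MyProject is just an example):

#scp -i vimal.pem -r MyProject vimal@10.10.119.10:/home/vimal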

Chapter – 4

Jobs run on AGASTYA-HPC

4 How to run Jobs on AGASTYA-HPC
To run jobs on the HPC, the user needs a job scheduler with a resource manager so that the user
can make maximum use of the HPC resources and get results in minimum time. First, we define
some technical details needed while running jobs on the HPC.

4.1 Job Scheduler used in AGASTYA-HPC


On an HPC system, access to the compute nodes of the cluster goes through the job system. On
AGASTYA-HPC, access to all nodes of the cluster is managed by "Portable Batch System
Professional (PBS Pro) version 19.1.3". To use the job scheduler, we first have to write a
PBS job script in which the user requests HPC resources such as the number of processors,
number of nodes, memory, wall time, etc.

A sample PBS job script follows:

How to use job scheduler

-------------------------------------------------------------------------------------

Sample job script:-

-------------------------------------------------------------------------------------

#!/bin/bash

#PBS -N test_job            (test_job is a sample job name)

#PBS -l nodes=1:ppn=1       (requests 1 processor on 1 node)

#PBS -q <queue name>

#PBS -l mem=5gb             (requests 5 GB of memory in total)

#PBS -o outtest.log         (output log file)

#PBS -e Error.log           (error log file)

cd $PBS_O_WORKDIR

./a.out

NOTE: After receiving their account details, users can find a sample PBS job script in their
own home directory for reference.

4.2 Module
On the HPC, using modules, each user has control over their environment, and with modules
users can work with different versions of software without risking version conflicts.

On the HPC all software packages are activated and deactivated through modules, so users first
have to know the following module commands.

4.2.1 How to check available module list


To check the available module list on AGASTYA-HPC, use the following command:

#module avail

Output

4.2.2 How to load module
To load a module on AGASTYA-HPC, use the following command:

#module load module-name

Output

4.2.3 How to check loaded module list


To check the loaded module list on AGASTYA-HPC, use the following command:

#module list

Output

4.2.4 How to unload loaded module


To unload a loaded module on AGASTYA-HPC, use the following command:

#module unload module-name

Output

4.2.5 How to unload all modules at once
To unload all loaded modules on AGASTYA-HPC, use the following command:

#module purge

Output

4.2.6 How to get the details of module


To get a detailed list of all possible module commands, use the following command:

#module help

Or

#module help module-name  (help required for particular module)

4.2.7 How to get the module environment details
To see the module environment details, use the following command:

#module show module-name

Output

4.3 Queue in AGASTYA-HPC


The following queues are available in AGASTYA-HPC for job execution; the user specifies one of
these queues in the PBS job script parameters.

S.No   Queue Name     Maximum Jobs run / User   Maximum Nodes / Job   Maximum Wall time / Job
1      GPU-EXT        2                         2                     7 Days
2      cpu-debug      2                         4                     8 Days
3      cpu-parallel   2                         8                     4 Days
4      cpu-test       2                         1                     1 Day
5      cpu-serial     40                        1                     20 Days

The queue policy of Agastya HPC defines the available resources, i.e., what kinds of hardware are
available for the various needs of its users, what types of jobs users can run, and how many
resources they can use; all of this is captured in the queue details. The details of each queue
are as follows.

1. cpu-test

In the cpu-test queue the user can use a maximum of 40 CPU cores per job on one node,
i.e., users can test their code on one node only. Each user can run a maximum of 2 jobs at
a time, with a wall-time limit of 1 day.

2. gpu-ext

In the "gpu-ext" queue the user can run GPU parallel jobs with a minimum of 40 CPU cores per
node, so the user has to request at least 40 processors per node in the PBS job script. The HPC
committee recommends not using more than 2 GPU nodes per job, i.e., up to 2 GPU nodes
(80 CPU cores) per job. Each user can run a maximum of 2 jobs at a time, with a wall-time
limit of 7 days.

3. cpu-parallel

In the cpu-parallel queue the user can run parallel jobs with a minimum of 40 CPU cores per
node. The HPC committee recommends not using more than 320 CPU cores per job, i.e., a
maximum of 8 compute nodes (320 CPU cores) per job. Each user can run a maximum of 2
jobs at a time, with a wall-time limit of 4 days.

4. cpu-serial

In the cpu-serial queue the user can run only serial jobs, each using 1 CPU core, i.e., only one
processor of a compute node. In the "cpu-serial" queue each user can run a maximum of 40
jobs at a time, with a wall-time limit of 20 days.

5. cpu-debug

In the cpu-debug queue the user can run application code for debugging and compilation
purposes only. The HPC committee recommends not using more than 160 CPU cores per job,
i.e., jobs can be submitted with a maximum of up to 4 nodes (160 CPU cores). Each user can
run 2 jobs at a time, with a wall-time limit of 8 days.

4.4 JOB scheduler commands


1. Submit the job to the scheduler: $qsub <script_name>
2. Check the job status: $qstat
3. Check where the jobs are running: $qstat -n
4. Check full information on a job: $qstat -f <JOB_ID>
5. Delete a job from the queue: $qdel <JOB_ID> (it may take 5 to 10 seconds)
6. Check the queue information: $qstat -Q
7. List all jobs and their state: $qstat -a
8. List all running jobs: $qstat -r

4.5 Get information on your Cluster


1. List offline and down nodes in the cluster: pbsnodes -l
2. List information on every node in the cluster: pbsnodes -a

4.6 PBS Environment Variables


Environment Variable Description

PBS_JOBNAME User specified job name

PBS_ARRAYID Job array index for this job

PBS_GPUFILE List of GPUs allocated to the job, listed one per line: <host>-gpu<number>

PBS_O_WORKDIR Job's submission directory

PBS_TASKNUM Number of tasks requested

PBS_O_HOME Home directory of submitting user

PBS_JOBID Unique pbs job id

PBS_NUM_NODES Number of nodes allocated to the job

PBS_NUM_PPN Number of processor per node allocated to the job

PBS_O_HOST Host on which job script is currently running

PBS_QUEUE Job queue

PBS_NODEFILE File containing a line-delimited list of nodes allocated to the job

PBS_O_PATH Path variable used to locate executable within job script
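
As an illustration of how these variables are typically used inside a job script, here is a short sketch (the module name and executable are placeholders, not a prescribed Agastya configuration):

#!/bin/bash
#PBS -N env_demo
#PBS -l nodes=1:ppn=40
#PBS -q cpu-test

# Move to the directory the job was submitted from
cd $PBS_O_WORKDIR

# Count the processors that PBS allocated to this job
ncpus=`cat $PBS_NODEFILE | wc -l`
echo "Job $PBS_JOBID running on $ncpus cores in queue $PBS_QUEUE"

# Run the executable and tag its output with the job ID
./a.out > out.$PBS_JOBID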

4.7 Job Execution Step


 Copy the input file from the local system to the HPC (procedure given in Chapter 3 above).
 Write a PBS script file (a sample PBS job script is given in the steps above).
 In the PBS script file, load the required modules.
 Run the script using the "qsub" command:

o qsub pbsjob_script
 The output will be saved in the working directory.
 Download the output from the HPC to the local system (procedure given in Chapter 3
above). A minimal end-to-end example is sketched below.
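
As a minimal end-to-end illustration, using the same key, user name, and addresses shown earlier in this manual (the file names input.dat and outtest.log are hypothetical examples):

# On the local machine: copy the input file to the HPC and log in
scp -i vimal.pem input.dat vimal@10.10.119.10:/home/vimal
ssh -i vimal.pem vimal@agastya.iitjammu.ac.in

# On the HPC: submit the job script and monitor it
qsub pbsjob_script
qstat -an

# Back on the local machine, once the job has finished: download the result
scp -i vimal.pem vimal@10.10.119.10:/home/vimal/outtest.log .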

Chapter – 5

Troubleshooting

5.1 Lammps_Installation_with_gcc
LAMMPS stands for Large-scale Atomic/Molecular Massively Parallel Simulator.
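
The original installation walkthrough for this section was provided as screenshots. As a hedged sketch only (the module names, LAMMPS version, and package selection are assumptions, not the exact Agastya procedure), a typical GCC/MPI build of LAMMPS looks like this:

# Load a GCC toolchain and an MPI implementation (module names are assumptions; check "module avail")
module load compiler/gcc
module load mpi/openmpi

# Download and unpack a LAMMPS source release (file name is an example)
tar -xvzf lammps-stable.tar.gz
cd lammps-*/src

# Enable any optional packages needed for your simulations
make yes-molecule

# Build the MPI-enabled executable; this produces lmp_mpi in the src directory
make mpi

# Quick test run on one of the bundled example inputs
mpirun -np 4 ./lmp_mpi -in ../examples/melt/in.melt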

5.2 HPC Job Example

5.2.1 MATLAB
Any MATLAB .m file can be run in the queue. The -r command flag for MATLAB will cause it
to run a single command specified on the command line. For proper batch usage, the specified
command should be your own short script. Make sure that you put an exit command at the end of
the script so that MATLAB will automatically exit after finishing your work. In the example
given below, debugging runs of the program on a workstation or in interactive queue runs will
print a message when the job is finished, and unattended batch runs will automatically save
results in a file based off the job ID and then exit. Failure to include the exit command will cause
your job to hang around until its wall clock allocation is expired, which will keep other jobs in
the queue longer, and also tie up MATLAB licenses unnecessarily.

 Example 1 -:

Sample MATLAB (matlabdemo.m)

PBS Jobs Script for Matlab

1. Single core

#!/bin/bash -l

#PBS -q batch

#PBS -N NetCDF_repeat

#PBS -l nodes=1:ppn=1

#PBS -l walltime=100:00:00

#PBS -o out.txt

#PBS -e err.txt

cd $PBS_O_WORKDIR

module load matlab

matlab -nodesktop -nosplash -r running_monthly_mean &> out.log

2. Multicore

#!/bin/bash -l

#PBS -q batch

#PBS -N parallel_matlab

#PBS -l nodes=1:ppn=6

#PBS -l walltime=100:00:00

cd $PBS_O_WORKDIR

module load matlab

matlab -nodesktop -nosplash -r my_parallel_matlab_script &> out.log

 Example of MATLAB with PBS


Here is a simple MATLAB program, "matlabtest.m":

Matlabtest.m

a=4;

b=6;

c=a+b

fprintf ('%i',c)

To run the above program with the PBS scheduler, I use the following PBS script, named "pbsmat":

pbsmat

#!/bin/bash

#PBS -N pbsmat

#PBS -l nodes=1:ppn=40

#PBS -l mem=2gb

#PBS -q cpu-test

cd $PBS_O_WORKDIR

module load codes/matlab-2020

matlab -nodisplay -nodesktop -nosplash -r "run ./matlabtest.m"

Now, to run the script, we type the following command:

#qsub pbsmat

Now we check the job status with the following command:

#qstat -an

Similarly, after the execution finishes, the output file "pbsmat.o1614" and the error log
"pbsmat.e1614" will be generated.

Now we can check the output:

5.2.2 Example of GAUSSIAN with PBS

#!/bin/bash

#PBS -N g16test

#PBS -l nodes=1:ppn=40

#PBS -l mem=20gb

#PBS -q cpu-test

cd $PBS_O_WORKDIR

source /etc/profile.d/gaussian.sh

g16 </home/vimal/GAUSSIAN/Methane-opt-s.gjf

Now, to run the script, we type the following command:

#qsub g16test (g16test is the above job script name)

5.2.3 Example of ANSYS with PBS

#!/bin/bash

#PBS -N pbsmat

#PBS -l nodes=1:ppn=16

#PBS -l mem=200gb

#PBS -q cpu-test

export ANS_FLEXLM_DISABLE_DEFLICPATH=1

export ANSYSLMD_LICENSE_FILE=1055@10.10.100.19:1055@agastya1

export ANSYSLI_SERVERS=2325@10.10.100.19:2325@agastya1

cd $PBS_O_WORKDIR

echo $PBS_NODEFILE > out.$PBS_JOBID

cat $PBS_NODEFILE >> out.$PBS_JOBID

export ncpus=`cat $PBS_NODEFILE | wc -l`

dos2unix *.jou

/apps/codes/ANSYS/ansys_inc/v202/fluent/bin/fluent 2ddp -g -SSH -pinfiniband -mpi=default -t$ncpus -cnf=$PBS_NODEFILE -i $PBS_O_WORKDIR/run.jou >> out.$PBS_JOBID

Note: in the above script, the run.jou file contains the instructions for the case file and data file
(this file is also created by the user, along with the PBS job script):

/file confirm no

/server/start-server ,

rc init_SS_SST_kw_sigma_0pt2_ita_1e11_54826_elements.cas.h5

rd init_SS_SST_kw_sigma_0pt2_ita_1e11_54826_elements.dat.h5

it 10000

wcd tmp.cas.h5

/server/shutdown-server

exit yes

5.2.4 Example of ABAQUS with PBS
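
The ABAQUS job script for this example was shown as a screenshot. As a hedged sketch only (the module name, job name, and input file are assumptions, not the exact Agastya configuration), such a script might look like this:

#!/bin/bash
#PBS -N abaqus_test
#PBS -l nodes=1:ppn=16
#PBS -l mem=20gb
#PBS -q cpu-test

cd $PBS_O_WORKDIR

# Load the ABAQUS environment (module name is an assumption; check "module avail")
module load abaqus

# Count the processors allocated by PBS
ncpus=`cat $PBS_NODEFILE | wc -l`

# mymodel.inp is a placeholder input file; "interactive" keeps the solver in the foreground
abaqus job=abaqus_test input=mymodel.inp cpus=$ncpus scratch=$PBS_O_WORKDIR interactive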

Note: In the above script, the user can change the number of nodes and processors according to the
queue configuration.

5.2.5 Example of CST studio


To run CST jobs, the user has to follow these steps:

1. First, log in to the HPC with the appropriate credentials (the login method is defined in the
previous chapter).

2. Then upload your input file from the local system to the HPC (procedure already defined in
Chapter 3 above).

3. Then execute the CST binary using the following command:

4. The CST home screen will then be shown.

5. After the user presses the Enter key, the next screen will be shown.

6. In the above screen the user has to select a queue; then the next screen will be shown.

7. Then the user has to enter the input file path and press Enter; the next screen will be shown.

8. Here the user has to select the solver method.

9. After that the user has to select the cluster acceleration method, then press Enter for the next
input screen.

Note: If the user selects None, the job will by default run on a single core. If the user wants to
submit parallel jobs, they have to select MPI Computing; otherwise Distributed Computing can be selected.

10. After that the user has to specify the number of nodes required; here the user can select up to 2
nodes maximum.

11. Then the user has to specify the wall time; the wall-time limit is defined in the queue policy,
otherwise it can be left blank.

12. Then the user confirms the final input parameters.

13. The job ID will then be shown.

14. The user can then check the job status with the following command:

# qstat -an

5.2.6 Example of Ludwig
Ludwig is an LBM-based (lattice Boltzmann method) solver, used for solving hydrodynamic problems.

1. First, log in to the HPC system with the appropriate credentials (the login method is already
defined in the previous chapter).

2. Then upload your input file from the local system to the HPC system (procedure already
defined in Chapter 3 above).

Input file for Ludwig

3. Then create a PBS job script to run the input file; here we create a PBS job script named
"demo_ludwig" for the Ludwig input file.

4. Now add the PBS job script content to the "demo_ludwig" file; a hedged sample is given below.
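
The original script content was shown as a screenshot. As a hedged sketch only (the executable name Ludwig.exe, the input file name, the module, and the resource values are assumptions, not the exact Agastya settings), the demo_ludwig script might look like this:

#!/bin/bash
#PBS -N demo_ludwig
#PBS -l nodes=1:ppn=40
#PBS -q cpu-test
#PBS -o ludwig_out.log
#PBS -e ludwig_err.log

cd $PBS_O_WORKDIR

# Load the MPI environment used to build Ludwig (module name is an assumption)
module load mpi/openmpi

# Count the cores allocated by PBS and launch Ludwig on the input file
ncpus=`cat $PBS_NODEFILE | wc -l`
mpirun -np $ncpus ./Ludwig.exe input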

5. After saving the content of the "demo_ludwig" job script, run it with the "qsub"
command.

6. A job ID will be generated; we can now check the job status using the "qstat -an" command.

The status "R" shown above means the job is running.

7. After job completion, the output will be generated in the same location.

Generated output

Now the user can download the output from the HPC to the local machine (the procedure is
already defined in the previous chapter).

Chapter – 6

HPC Usage Policy

6.1 Storage Quota Policy


 In AGASTYA-HPC every user is provided a fixed space quota, with a soft limit of 2 TB, a
hard limit of 3 TB, and a grace period of 7 days, to store their results.
 This disk space is provided to run your programs in your home directory.
 If a user crosses the soft limit of the space quota, the user gets only 7 days to make use of the
space above the soft limit; after that, jobs will stop or will not run.
 If HPC users utilize more than their allocated space quota, they may not be able to run jobs
from their home directory until they clean up their space and reduce their usage. They can
also request additional storage with proper justification, which may be allocated to them,
subject to the availability of space.
 If a user requires more storage space, they have to raise a request to the HPC in-charge.
 Users may store data only for computation. Do not use the storage to keep backups of your
data.

 Kindly compress data that is not currently in use but may be used in the future.
 HPC user data will be deleted:
o Normal user - 15 days after account expiry.
o Faculty - 1 month after the retirement date.
o Other scientific staff - 2 months after account expiry.
 It is the user's responsibility to take their data out or to obtain proper approval from the HPC
authority for extending their account expiry date.

Users can also check their own available quota in AGASTYA-HPC with the following command.

# lfs quota -h -u username /iitjfs

6.2 User Account Deletion Policy


In AGASTYA-HPC, a user account will be suspended:

 Normal user - 1 month after account expiry.


 Faculty - 2 months after the retirement date.
 Other scientific staff - 3 months after account expiry.

Chapter – 7

Appendix A

7.1 Useful Linux commands for HPC

ls - List directory contents

ls -l - List the content and its information

ls -a - List all the content, including hidden files

pwd - Prints the full name (the full path) of current/working directory

mkdir Directory_name - Create a new directory

cd foldername - Change the working directory to foldername

cd - Return to $HOME directory

cd .. - Go up a directory

cd - - Return to the previous directory

nano, vi, vim, - File editors

cp source destination - Copy source to destination

cp -r source destination - Copy a directory recursively from source to destination

mv source destination - Move (or rename) a file from source to destination

rm file1 - Remove file1

rm -rf Directory - Remove a directory and its contents recursively

cat file - view contents of file on the screen

less file - View and paginate file

head file - Show first 10 lines of file

tail file - Show last 10 lines of file

7.2 File and Directory Permissions in Linux


File permissions control file access. They allow you to control who can read, write, or execute files,
and they are specified for each file and directory. In Linux the three basic permissions are as follows:

Read (r)
Having read permission on a file grants the right to read the contents of the file. Read
permission on a directory implies the ability to list all the files in the directory.
Write (w)
Write permission implies the ability to change the contents of the file (for a file) or create
new files in the directory (for a directory).
Execute (x)
Execute permission on files means the right to execute them, if they are programs. (Files
that are not programs should not be given the execute permission.) For directories,
execute permission allows you to enter the directory (i.e., cd into it), and to access any of
its files
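
Permissions are changed with the chmod command, using either octal or symbolic notation; for example (the file and directory names are just examples):

chmod 600 vimal.pem        # owner read/write only; required for private keys
chmod u+x myscript.sh      # add execute permission for the owner
chmod 750 mydir            # owner: rwx, group: r-x, others: no access
ls -l                      # verify the permissions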

7.3 Data backup and restore

7.3.1 Backup and Restore using TAR, filtering the archive through
gzip (-z)
 Backup Syntax:

[root@localhost~]# tar <option> <file_name.tar.gz> <source_file/folder>

Option -> -c create

-v verbose

-f filename

-z compress the backup file

Example: in this example we create a backup file of the directory selected above (ludwig_job).
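
The command itself was shown as a screenshot; assuming the directory is named ludwig_job and the archive bkup_ludwig.tar.gz (the names used later in this section), the backup command would be:

[root@localhost~]# tar -cvzf bkup_ludwig.tar.gz ludwig_job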

Now we check that our backup file has been created.

We can also view the contents of the backup file using the following command:

[root@localhost~]# tar -tvzf bkup_ludwig.tar.gz

Now we can copy our data from the HPC to the local machine in archive format.

 Restore the backup file in HPC

To restore the "bkup_ludwig.tar.gz" file, we use the following command syntax:

[root@localhost~]# tar <option> <file_name.tar.gz> -C <New_Directory>

So first we have to create a new directory into which to restore the backup.

Option -> -x extract

-v verbose

-f filename

-z filter the archive through gzip

[root@localhost~]# tar -xvzf <file_name.tar.gz>

[root@localhost~]# tar -xvzf bkup_ludwig.tar.gz
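
To restore into a separate directory using the -C option from the syntax above (the directory name is just an example):

[root@localhost~]# mkdir ludwig_restore
[root@localhost~]# tar -xvzf bkup_ludwig.tar.gz -C ludwig_restore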

7.4 Compiling parallel code


The compilers are invoked via MPI wrapper scripts that supply the appropriate compiler switches
and link to the parallel libraries.

 In order to compile a parallel program written in FORTRAN, the user should type:

mpifort <FileName>

 If the user compiles a parallel program written in C, then the user should type:

mpicc <file_name>

 And parallel C++ code is compiled by

mpiCC <file_name>

7.5 Execute a parallel job
To execute the parallel job, the user should type:

mpirun ./a.out
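
Putting sections 7.4 and 7.5 together, a minimal PBS job that compiles and runs an MPI program could look like the sketch below (the source file hello_mpi.c, the module name, and the queue are assumptions, not a prescribed Agastya setup):

#!/bin/bash
#PBS -N mpi_test
#PBS -l nodes=1:ppn=40
#PBS -q cpu-test

cd $PBS_O_WORKDIR

# Load an MPI environment (module name is an assumption; check "module avail")
module load mpi/openmpi

# Compile the parallel C source with the MPI wrapper compiler
mpicc hello_mpi.c -o a.out

# Run on all cores allocated by PBS
ncpus=`cat $PBS_NODEFILE | wc -l`
mpirun -np $ncpus -machinefile $PBS_NODEFILE ./a.out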
