---
marp: true
theme: uncover
---
- Understanding the Computer Cluster setup
- Learning command line Linux
- Navigating the cluster
- Submitting cluster jobs
-
You will need to be registered to gain access to the cluster
-
Windows systems software:
- FileZilla Client - https://filezilla-project.org
- Putty: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
-
macOS software:
- FileZilla Client - https://filezilla-project.org
- 130 nodes (on CPU partition)
- 754 GB RAM
- 112 (HT) cores
- 14,560 usable cores
- Simply put, 14,560 processes that can be run in parallel
- Huge data storage (`/cephfs2`: 7.1PB)
- (Almost) never turned off
- Specialist software manages long-running jobs
- Compute cluster needed for modern life sciences datasets
- Maintained by Scientific Computing
- On a Mac open the terminal to connect to head node (either hal, hex or max):
ssh -Y username@hal
- Enter your cluster password
- Connect to atg first if connecting from outside
-
On a PC, open Putty and connect to the head node (either hal, hex or max):
username@hal
-
Connect to atg first if connecting from outside
- FileZilla Client: https://filezilla-project.org
- Free and available for Windows and macOS
- Normal logon credentials, and Host: `hal`, Port: `22`
-
Analogous to a single cluster node
-
21TB of data storage, 80CPUs and 97GB RAM
-
Maintained by the Cell Biology Division
-
Software can be installed as per users' requirements
-
Let us know if you would like an account (and you are a member of the Cell Biology Division)
-
sean-pc-10.lmb.internal
-
NOT BACKED UP!!!
- Similar to Windows and macOS, Linux is an operating system
- Free and open source
- Different types of Linux e.g. Android
- The LMB cluster uses AlmaLinux
- Big difference: you need to use the command line
- Not so intuitive, but more powerful
-
Shells – command line interface interpreter programs
-
We recommend using Bash - arguably the best known
-
This is not the LMB cluster default
-
Ask scientific computing to make it your default
-
Otherwise, temporarily specify the bash shell with:
bash
-
Each command is actually a program
-
Modified by flags, options and arguments
command [-flag(s)] [-option(s) [value]] [argument(s)]
ls
directory1 file1.txt file2.txt file3.txt
command [-flag(s)]
ls -l
total 12
drwxrwxr-x 2 swingett swingett 4096 Jul 15 15:59 directory1
-rw-rw-r-- 1 swingett swingett 0 Jul 15 15:57 file1.txt
-rw-rw-r-- 1 swingett swingett 17 Jul 15 16:35 file2.txt
-rw-rw-r-- 1 swingett swingett 37 Jul 15 16:34 file3.txt
command [-flag(s)]
ls -l --human-readable
total 12K
drwxrwxr-x 2 swingett swingett 4.0K Jul 15 15:59 directory1
-rw-rw-r-- 1 swingett swingett 0 Jul 15 15:57 file1.txt
-rw-rw-r-- 1 swingett swingett 17 Jul 15 16:35 file2.txt
-rw-rw-r-- 1 swingett swingett 37 Jul 15 16:34 file3.txt
command [-flag(s)]
ls -l -h
total 12K
drwxrwxr-x 2 swingett swingett 4.0K Jul 15 15:59 directory1
-rw-rw-r-- 1 swingett swingett 0 Jul 15 15:57 file1.txt
-rw-rw-r-- 1 swingett swingett 17 Jul 15 16:35 file2.txt
-rw-rw-r-- 1 swingett swingett 37 Jul 15 16:34 file3.txt
command [-flag(s)]
ls -lh
total 12K
drwxrwxr-x 2 swingett swingett 4.0K Jul 15 15:59 directory1
-rw-rw-r-- 1 swingett swingett 0 Jul 15 15:57 file1.txt
-rw-rw-r-- 1 swingett swingett 17 Jul 15 16:35 file2.txt
-rw-rw-r-- 1 swingett swingett 37 Jul 15 16:34 file3.txt
command [-flag(s)] [-option(s) [value]]
ls -l --sort=size
total 12
drwxrwxr-x 2 swingett swingett 4096 Jul 15 15:59 directory1
-rw-rw-r-- 1 swingett swingett 37 Jul 15 16:34 file3.txt
-rw-rw-r-- 1 swingett swingett 17 Jul 15 16:35 file2.txt
-rw-rw-r-- 1 swingett swingett 0 Jul 15 15:57 file1.txt
command [-flag(s)] [-option(s) [value]] [argument(s)]
ls -l file2.txt file3.txt
-rw-rw-r-- 1 swingett swingett 17 Jul 15 16:35 file2.txt
-rw-rw-r-- 1 swingett swingett 37 Jul 15 16:34 file3.txt
ls
pwd
cd
cp
mv
mkdir
rmdir
rm
history
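A short session tying these commands together might look like the following. It is only a sketch: it works inside a throwaway directory made with `mktemp`, and the names `project`, `notes.txt` and `backup.txt` are invented for illustration.

```shell
# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir"

pwd                            # print the current (working) directory
mkdir project                  # make a new directory
cd project                     # move into it
echo "hello" > notes.txt       # create a small file to play with
cp notes.txt notes_copy.txt    # copy a file
mv notes_copy.txt backup.txt   # move (here: rename) the copy
ls                             # lists backup.txt and notes.txt
rm backup.txt                  # delete a file (there is no recycle bin!)
cd ..                          # go back up one level
rm project/notes.txt           # empty the directory first...
rmdir project                  # ...because rmdir only removes empty directories
```

Typing `history` afterwards would list the commands entered in an interactive session.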
-
Locations represented as a line of text
-
Each folder ends with a forward slash:
/lmb/home/jsmith/file1.txt
-
Relative paths:
../pjones/file2.txt
./file4.txt
~/folderA/file5.txt
-
Use only alphanumeric characters, the underscore symbol (_) and the dot (.):
my_file1.txt
-
Not spaces!
-
File extension can tell you what a file is
-
Hidden files:
.hidden_file.log
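The path shorthands and hidden files described above can be tried out safely in a temporary directory. This is a minimal sketch; the file and folder names are reused from the slides purely as placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir folderA
touch folderA/file5.txt my_file1.txt .hidden_file.log

ls              # hidden files (names starting with a dot) are not shown
ls -a           # -a (all) shows them too
ls ./folderA    # ./ means "starting from the current directory"
cd folderA
ls ..           # ../ means "one level up"
echo ~          # ~ is shorthand for your home directory
```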
cat
head
tail
more
nano
gzip
zcat
gunzip
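As a rough illustration of the viewing and compression commands listed above, the sketch below builds a 100-line example file (the name `numbers.txt` is made up) and then inspects, compresses and restores it.

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 100 > numbers.txt            # a 100-line example file

cat numbers.txt                    # print the whole file
head -n 3 numbers.txt              # first three lines
tail -n 2 numbers.txt              # last two lines
gzip numbers.txt                   # replaces the file with numbers.txt.gz
zcat numbers.txt.gz | head -n 1    # view compressed content without unpacking it
gunzip numbers.txt.gz              # restores numbers.txt
```

`more` and `nano` are interactive (a pager and an editor respectively), so they are best tried directly at the prompt.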
-
Redirect to a file:
cat file1.txt > file1_copy.txt
cat file1.txt file2.txt file3.txt > combined.txt
-
Append to a file:
cat file4.txt >> combined.txt
- Redirects can be used with other commands (i.e. not just `cat`)
- Pipes take the output from one command and pass it to another:
zcat file.txt.gz | more
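Redirecting, appending and piping can be compared side by side in a small experiment. The sketch below uses made-up file contents in a temporary directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "first"  > file1.txt
echo "second" > file2.txt

cat file1.txt file2.txt > combined.txt   # > creates (or OVERWRITES) combined.txt
echo "third" >> combined.txt             # >> adds to the end of the file
cat combined.txt | wc -l                 # | hands the output to wc, which counts 3 lines
ls -l > listing.txt                      # redirects work with any command
```

Note that `>` silently replaces an existing file, so `>>` is the safer choice when the file's current content matters.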
-
Search text files
-
Return lines in a text file where search term is found:
grep organoid thesis.txt
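A few commonly used `grep` flags are sketched below. The contents of `thesis.txt` are invented here so that the example is self-contained.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'Organoid culture methods\nIntestinal organoid imaging\nConclusions\n' > thesis.txt

grep organoid thesis.txt          # case-sensitive: only the second line matches
grep -i organoid thesis.txt       # -i ignores case: first and second lines match
grep -c -i organoid thesis.txt    # -c counts matching lines instead of printing them
grep -n Conclusions thesis.txt    # -n prefixes each match with its line number
```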
-
Represent symbolically other characters
-
Example files: `england.txt`, `northern_ireland.txt`, `scotland.txt`, `wales.txt`
-
Asterisk matches zero or more characters:
ls *land.txt
england.txt northern_ireland.txt scotland.txt
-
Question mark matches exactly one character:
ls wa?es.txt
wales.txt
-
Character class matches any single character from the list:
ls [es]*.txt
england.txt scotland.txt
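The wildcard examples above can be reproduced by creating the four empty files in a temporary directory, as in this sketch.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch england.txt northern_ireland.txt scotland.txt wales.txt

ls *land.txt     # england.txt northern_ireland.txt scotland.txt
ls wa?es.txt     # wales.txt
ls [es]*.txt     # england.txt scotland.txt
echo *.txt       # the shell expands the pattern before the command runs
```

Because the shell does the expansion, wildcards work the same way with `cp`, `mv`, `rm` and any other command, which is worth remembering before running `rm *`.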
-
Symbolic links akin to shortcuts on Windows and aliases on macOS
-
Link to a single file:
ln -s /target_folder/target_file_of_interest.txt
-
Link to a single file, except link has a different name:
ln -s /target_folder/target_file_of_interest.txt link.txt
-
Links to multiple files:
ln -s /target_folder/*.txt .
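To see how a symbolic link behaves, the following sketch creates a target file and links to it from another folder. The folder name `links` is an assumption made for the example; the other names follow the slides.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir target_folder links
echo "data" > target_folder/target_file_of_interest.txt

cd links
# A relative target is interpreted relative to the folder holding the link
ln -s ../target_folder/target_file_of_interest.txt link.txt
ls -l link.txt    # the listing shows an arrow (->) pointing at the target
cat link.txt      # reads through the link to the real file
```

Deleting the link with `rm link.txt` removes only the link; the target file is left untouched.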
-
Simple description:
whatis
-
Detailed manual:
man
-
Google, ChatGPT
-
Forums
-
Cheat sheet
<style scoped> table { font-size: 20px; } </style>
Column | Description (`ls -l`) |
---|---|
1 | File type (- file / d directory / l link) |
2 | Permission string (owner / group / everyone) (rwx) |
3 | Number of hard links |
4 | Owner name |
5 | Owner group |
6 | File size in bytes |
7 | Modification time |
8 | File name |
-
Add execute privileges for user:
chmod u+x [files]
-
Add write privileges for group:
chmod g+w [files]
-
Remove read privileges for others:
chmod o-r [files]
-
Add read privileges for everyone:
chmod a+r [files]
-
There is also a "numerical" system to do this
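In the numerical system, each permission has a value (read = 4, write = 2, execute = 1) and the values are added up separately for owner, group and others, giving three digits. The sketch below applies both notations to a made-up file, `script.sh`, with the resulting permission string noted after each step.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch script.sh

chmod 644 script.sh    # numeric:  rw-r--r--  (6 = 4+2, 4, 4)
chmod u+x script.sh    # symbolic: rwxr--r--  (owner gains execute)
chmod g+w script.sh    #           rwxrw-r--  (group gains write)
chmod o-r script.sh    #           rwxrw----  (others lose read)
ls -l script.sh
chmod 750 script.sh    # numeric:  rwxr-x---  (7 = 4+2+1, 5 = 4+1, 0)
ls -l script.sh
```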
- Variables (e.g. `$USER`): built-in and user-defined
- Display to screen using `echo`
- Order lines with `sort`
- Transfer data with `curl`
- Fix line endings with `dos2unix` and `mac2unix`
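Variables, `echo` and `sort` can be combined as in this sketch. The variable `sample` and the file `nations.txt` are hypothetical names chosen for the example.

```shell
echo "${USER:-unknown}"        # built-in variable (prints "unknown" if it is not set)

sample=organoid_1              # user-defined: no spaces around the = sign
echo "Processing $sample"      # variables are expanded inside double quotes...
echo 'Processing $sample'      # ...but not inside single quotes

workdir=$(mktemp -d)
printf 'wales\nengland\nscotland\n' > "$workdir/nations.txt"
sort "$workdir/nations.txt"    # alphabetical order
sort -r "$workdir/nations.txt" # -r reverses the order
```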
-
$PATH
/usr/bin:/usr/local/sbin:/usr/sbin
-
which
which ls
/usr/bin/ls
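One way to see what `$PATH` does is to add a folder of your own to it. In this sketch the folder and the script `my_tool` are invented for the demonstration; on the cluster you would add a real software folder instead.

```shell
echo "$PATH"      # colon-separated list of folders searched for programs
which ls          # shows where the ls program was found

# Create a tiny program in a new folder
workdir=$(mktemp -d)
mkdir "$workdir/bin"
printf '#!/bin/bash\necho "hello from my_tool"\n' > "$workdir/bin/my_tool"
chmod u+x "$workdir/bin/my_tool"

# Put that folder at the front of the search path (for this session only)
export PATH="$workdir/bin:$PATH"
which my_tool     # now found in the new folder
my_tool
```

The change lasts only for the current shell session unless the `export` line is added to a start-up file such as `~/.bashrc`.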
- `ps` / `top`
- `nohup` (no hang up)
- Backgrounding with `&`
- Cancel job with CTRL + C
- Suspend with CTRL + Z / `bg` (`fg` will foreground a job)
- `kill [job id]`
- `kill -9 [job id]`
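The life cycle of a background job can be sketched with `sleep` standing in for a long-running program. The variable names `pid` and `status` are just illustrative.

```shell
sleep 300 &       # & starts the program in the background
pid=$!            # $! holds the process ID of the most recent background job
jobs              # list the jobs started from this shell

kill "$pid"       # ask the process to stop (sends SIGTERM)

# wait collects the finished job and reports how it ended
status=0
wait "$pid" 2>/dev/null || status=$?
echo "sleep ended with exit status $status"   # non-zero because it was terminated
```

`kill -9` sends SIGKILL instead, which the program cannot ignore, so it is usually kept as a last resort.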
- Cluster architecture
- Logging in to the cluster
- Using Linux and command line shells
- BASH
- Navigating
- Copying; deleting; moving; linking files
- Reading; writing; searching files
- Compressing data
- Re-direction, appending, piping
- Wildcards
- File permissions
- Downloading
- Variables
- Running programs
- Checking running programs (`ps`, `top`)
- `$PATH`
- Running in the background (`&`, `bg`, `nohup`)
- Where to get help
-
Clusters require job management and scheduling system
-
Keeps the nodes all in contact with one another etc.
-
LMB cluster uses Slurm (updated recently)
-
Slurm is open-source software for large and small Linux clusters
-
Uses the command line
- `man` pages are available
- `squeue`
- `squeue -u $USER`
- `sqsummary` - CPU node state
- `sinfo` - partition node information
- `qinfo` - interactive webpage: http://nagios2/qinfo/
-
Interactive jobs: run short operations that complete quickly while you wait, then check the results and perform another calculation if required
-
Submitted jobs: long-running jobs that do not require user intervention
-
Move to a compute node
-
srun --pty bash
-
Prompt change:
username@fmb376
-
There are options:
srun -c 8 --pty bash
-
Job runs without further user input
-
Write Bash script:
#!/bin/bash
echo Sleeping!
sleep 100
-
Execute script:
bash test.sh
Sleeping!
- Submit script to queue:
sbatch test.sh
-
More options:
sbatch -J test_job -c 2 --mail-type=ALL --mail-user=$USER@mrc-lmb.cam.ac.uk --mem=2G test.sh
<style scoped> table { font-size: 20px; } </style>
Option | Function |
---|---|
-J [jobname] | Specify an easily identifiable jobname |
-c [number of cores] | Number of cores on a node to reserve for the job [default: 1] |
--mem=[RAM]G | GB of RAM to reserve for the job [default: 5] |
--mail-type=ALL | Send email updates on job progress |
--mail-user=$USER@mrc-lmb.cam.ac.uk | Recipient’s email address |
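Slurm also reads options from `#SBATCH` comment lines at the top of a script, which saves retyping them at every submission. The sketch below writes such a script, reusing the option values from the example above; the file name `test_job.sh` is hypothetical, and the email address is left to the command line.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write a batch script whose Slurm options live in #SBATCH lines
cat > test_job.sh <<'EOF'
#!/bin/bash
# Job name
#SBATCH -J test_job
# Cores to reserve on a node
#SBATCH -c 2
# RAM to reserve
#SBATCH --mem=2G
# Send email updates on job progress
#SBATCH --mail-type=ALL

echo Sleeping!
sleep 100
EOF

bash -n test_job.sh    # syntax check only; nothing is run
# On the cluster, submit with: sbatch test_job.sh
```

To Bash the `#SBATCH` lines are ordinary comments, so the same script still runs unchanged with `bash test_job.sh`.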
-
sacct -j [job id]
-
To get the maximum memory usage:
sacct --format=jobID%20,CPUTime,MaxRSS -j [job id]
-
scancel [job id]
-
Import specific software versions
-
To list available modules:
module avail
-
To use a module:
module load [module name]
-
Exit codes – 0 means success!
-
Viewing images requires XQuartz (Mac) or VcXsrv (Windows)
-
Not so responsive – maybe transfer to local machine first?
-
More details: https://www.mrc-lmb.cam.ac.uk/scicomp
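The exit-code convention mentioned above (0 means success) can be checked at the prompt with the special variable `$?`. A minimal sketch, using a deliberately non-existent folder to provoke a failure:

```shell
ls / > /dev/null
echo "exit code: $?"         # 0: the command succeeded

status=0
ls /no/such/folder 2> /dev/null || status=$?
echo "exit code: $status"    # non-zero: something went wrong

# && runs the next command only if the previous one succeeded
workdir=$(mktemp -d)
mkdir "$workdir/results" && echo "folder created"
```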
- `~` (home directory) - config files and scripts
- `/cephfs` & `/cephfs2` - very large data storage / suitable location for processing data
- `/scratch` - suitable location for processing data, BUT FILES ARE AUTOMATICALLY DELETED - DON'T STORE FILES HERE!
- `/istore` or `/isilon` - a place to store data
Refer to Scientific Computing for further information
- `~` (home directory) - config files and scripts
- Create a named folder in `/data1`, `/data2` or `/data3/scratch` to store data files
- Much smaller storage capacity than the cluster (terabytes)
-
To/from the cluster, or to/from another machine, via the intranet
-
scp user@host:[target_to_download] [destination_path]
-
scp [target_to_upload] user@host:[destination_path]
-
Perform a recursive copy for folders:
-r
-
To/from the cluster, or to/from another machine, via the internet
-
Linux command line equivalent of FileZilla
-
sftp [hostname]
-
mget -r [files_to_download]
-
mput -r [files_to_upload]
-
ftp [hostname]
-
bin
-
prompt
-
LMB FTP:
/ftp/pub/
-
LMB FTP:
ftp.mrc-lmb.cam.ac.uk
-
'anonymous' with no password
-
Interactive, but run in background (needed for SFTP)
-
screen -S [screen_name]
-
CTRL + A and then press D
-
screen -ls
-
screen -r [ID_number]
-
exit
- Microsoft Text editor
- Windows / Mac / Linux
- Edit remote files (even via atg)
- Built-in terminal
- View webpages
- Transfer files
- Free
-
Web interface
- Jupyter Notebook - create and share documents that contain live code, equations, plots and descriptive text
-
Not supported on the Cluster
-
Installed on the Cell Biology Workstation (Xeon)
-
We can set you up with an account
-
Course: https://github.com/StevenWingett/data-analysis-with-python-course
-
/public/genomics/soft/bin
-
Add to PATH?
- `software` group
-
Enable software and dependencies to be bundled into one file
-
Most effective way to distribute versioned bioinformatics software
- On the cluster, containers can only be run from `/public/singularity/`
- Add files to that folder: `singularity` group
- Also installed on Xeon, where containers can be run from any location
- NGS QC
- ATAC-seq
- ChIP-seq
- Cut and Run/Tag
- RNA-seq
- Single Cell RNA-seq (10x)
- Single Cell RNA-seq (Parse)
- Taxonomy Profiling
- NGS data downloading
-
Linux / Bash
-
Cell Biology Xeon Workstation
-
Compute cluster and its architecture
-
Slurm
-
R Studio Server / JupyterHub / Visual Studio Code
-
Find a reason to have a go in the coming weeks
-
Thanks for listening!!!