Licence
This manual is © 2025, Steven Wingett.
This manual is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 licence. This means that you are free:
• to copy, distribute, display, and perform the work
• to make derivative works
Under the following conditions:
• Attribution. You must give the original author credit.
• Non-Commercial. You may not use this work for commercial purposes.
• Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a licence identical to this one.
Please note that:
• For any reuse or distribution, you must make clear to others the licence terms of this work.
• Any of these conditions can be waived if you get permission from the copyright holder.
• Nothing in this licence impairs or restricts the author's moral rights.
Full details of this licence can be found at http://creativecommons.org/licenses/by-nc-sa/2.0/uk/legalcode
- Connect to the LMB cluster (you should already be connected to the LMB intranet) using a Mac terminal or PuTTY for Windows.
- What message do you see when you log in?
- Is your username displayed when you log in?
- Connect to the cluster using FileZilla.
- Using FileZilla, create a folder in your cluster home directory named LMB_Cluster_Course_Exercises. Note: your home directory will be named after your username (e.g. /lmb/home/jsmith).
- Copy the data files given to you at the start of this course into the directory on the cluster you just created.
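Hint: connecting from a Mac terminal might look something like the sketch below (the hostname is a placeholder – use the cluster address given to you by the course organiser; PuTTY users enter the same details in the graphical interface):
ssh your_username@cluster.address.here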
- Change your working directory to LMB_Cluster_Course_Exercises (which should already be in your home directory).
- Confirm your new working directory with pwd.
- Create a folder in here named Exercise2.
- Rename this directory to Exercise_2.
- Go into the folder Exercise_2.
- Type ls to confirm the folder is empty.
- We’ve not introduced the touch command so far, so let’s try it now. Enter in the command line: touch file1.txt. Has anything happened?
- What is the size of file1.txt? (You will need to use a command to do this.)
- Make a copy of file1.txt named file2.txt.
- Are the file sizes the same?
- Delete file1.txt.
- Go “up a level” in the filesystem hierarchy (when you have done this you should be able to see the folder Exercise_2 when you run the command ls).
- Delete the folder Exercise_2 with the command rmdir.
- Did that work? If not, why not? Find a way to delete Exercise_2 (look in the manual about recursive deletions).
- Display the recent Bash commands you have entered.
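Hint (only look if you get stuck): one possible sequence of commands for the steps above is sketched below; the comments explain each step.
cd ~/LMB_Cluster_Course_Exercises    # change working directory
pwd                                  # confirm where you are
mkdir Exercise2                      # create the folder
mv Exercise2 Exercise_2              # rename it
cd Exercise_2
ls                                   # should print nothing: the folder is empty
touch file1.txt                      # creates an empty file
ls -l file1.txt                      # -l shows the file size (0 bytes)
cp file1.txt file2.txt               # copy the file
ls -l                                # compare the sizes
rm file1.txt                         # delete the file
cd ..                                # go up a level
rmdir Exercise_2                     # only works on empty folders
rm -r Exercise_2                     # recursive deletion (use with care)
history                              # recent Bash commands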
Only perform this step if you were given the data files in the form of a tar archive.
Let’s open the archive file we copied from your local machine to the cluster:
tar xvzfp [course_title].tar.gz
(Use the actual filename when running this command, i.e. not "[course_title]".)
Certain web browser settings apparently cause the archive to be unzipped during downloading. If your archive on the cluster ends with the file extension .tar instead of .tar.gz, then execute the following command:
tar xvfp [course_title].tar
(The tar command is useful, as it allows multiple files and the associated file hierarchy to be stored within a single archive file. You don’t need to understand this command at the moment.)
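If you want to check what an archive contains before extracting it, tar can also list the contents (a sketch, using the same placeholder filename as above):
tar tzf [course_title].tar.gz    # list the contents of the gzipped archive
tar tf [course_title].tar        # list the contents of the plain tar archive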
Explore the MAZE folder. Use cd to move around the maze and ls to check what is in each room. Can you find the treasure?
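Hint: navigating the maze is just repeated use of cd and ls, for example (the room name below is made up – use the names that ls actually shows you):
cd MAZE
ls                   # list the rooms (sub-folders) and any files
cd some_room         # hypothetical name: pick one of the folders ls listed
ls
cd ..                # back out again if you hit a dead end
If you run out of patience, find MAZE -type f (run from the folder containing MAZE) lists every file hidden in the maze.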
- In the folder Exercise_3 you will find a file entitled poem.txt. Write out the contents to the screen using the command cat.
- Now run a command so you can scroll through the text one page at a time.
- Write out the first 7 lines of the poem to the screen.
- Write out the bottom 3 lines of the poem to the screen.
- Add the line “The Waste Land by T. S. Eliot” to the start of the file, followed by a blank line. Save the file.
- Compress the file. Notice whether the file size changed after compression. Did the filename change?
- Read the compressed file with cat and then zcat. What happens in each case?
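Hint (only if you get stuck): a sketch of commands that could be used for the poem exercise – the title line still needs typing in by hand in the editor:
cat poem.txt                 # write the whole file to the screen
less poem.txt                # scroll a page at a time (space bar to page, q to quit)
head -n 7 poem.txt           # first 7 lines
tail -n 3 poem.txt           # last 3 lines
nano poem.txt                # add the title line and a blank line, then save
gzip poem.txt                # compress; the file becomes poem.txt.gz
ls -l                        # note the new filename and size
cat poem.txt.gz              # prints unreadable compressed data
zcat poem.txt.gz             # decompresses to the screen as readable text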
- What does the command date do?
- Try this command and redirect the output to a file named date.txt.
- Now try the command again, but this time append the output to date.txt.
- View the contents of date.txt.
- What is the result of adding the flag --version to date?
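Hint: a sketch of the commands for the date exercise:
date                   # prints the current date and time
date > date.txt        # > redirects the output to a file (overwriting it)
date >> date.txt       # >> appends to the file instead
cat date.txt
date --version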
- Look at the file named uk_counties.csv. Using the grep command, create a new file named scotland_counties.csv that contains only Scottish counties.
- Using the grep command, create a new file named other_counties.csv that contains all counties except Scottish counties. We advise reading the manual on grep to assist with this. (Hint: this is inverting the grep matching.)
- Using the commands head and tail and also using a pipe, create a file named subset_counties.csv that contains the counties on lines 27-39 of the original file.
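Hint: a sketch of possible grep and head/tail commands, assuming each Scottish county in uk_counties.csv is labelled with the word Scotland (check the file first – the pattern must match how the file is actually laid out):
grep Scotland uk_counties.csv > scotland_counties.csv
grep -v Scotland uk_counties.csv > other_counties.csv            # -v inverts the match
head -n 39 uk_counties.csv | tail -n 13 > subset_counties.csv    # lines 27-39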
Using a single wildcard, create symbolic links to files in the files_list folder that meet the following criteria:
- end with the file extension .tsv
- start with B or C and end with the file extension .txt
The links should be generated outside the files_list folder, in separate folders named TSV_links and TXT_links.
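Hint: a sketch, assuming TSV_links and TXT_links are created alongside the files_list folder (the pattern [BC]* still counts as a single wildcard):
mkdir TSV_links TXT_links
cd TSV_links
ln -s ../files_list/*.tsv .          # link every .tsv file
cd ../TXT_links
ln -s ../files_list/[BC]*.txt .      # link .txt files starting with B or C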
- View the contents of the file add_name1.txt and then edit the file by adding your name to the end of the file. However, don’t do this with a text editor (such as nano); instead, use a single Bash command which contains a redirect.
- Repeat the previous step, but this time edit the file /usr/bin/who. Were you able to do this? If not, why not? Checking the file permissions may clarify the situation.
- To what groups do you belong (the name of the required Linux command is quite intuitive)? To what groups does the person who is running the course belong? If you can't work out how to do this, then look in the man pages.
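Hint: a sketch of commands for the steps above (the final username is made up – substitute the course organiser's actual username):
cat add_name1.txt
echo "Your Name" >> add_name1.txt        # append without opening an editor
echo "Your Name" >> /usr/bin/who         # expect this to fail
ls -l /usr/bin/who                       # the permissions explain why
groups                                   # groups you belong to
groups course_organiser                  # hypothetical username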
- Write a single-line Bash command that takes the contents of the file letters.txt, sorts them alphabetically and then writes them to a new file named sorted.txt.
- Write a Bash command to download the file: https://raw.githubusercontent.com/StevenWingett/Bioinformatics_Computer_Cluster_Course/refs/heads/main/README.md
  (Hint: curl can name the downloaded file the same as the file on the remote server – look in the Linux man pages to find the relevant curl flag.)
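Hint: possible one-liners for the two tasks above:
sort letters.txt > sorted.txt
curl -O https://raw.githubusercontent.com/StevenWingett/Bioinformatics_Computer_Cluster_Course/refs/heads/main/README.md    # -O keeps the remote filename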
- Print the contents of the $USER variable to the screen. Look familiar?
- List your running processes with ps. Then look at all jobs running on your current node with top. Then look at ONLY YOUR jobs running on your current node using top (look in the man pages for top; there is a flag that enables users to do this).
- Where is the curl program found on your system? Check this location is indeed in the $PATH variable.
- Use the sleep command to suspend execution on your system for 10 seconds.
- Try the sleep command again, but end the job once it has started.
- Execute the sleep command for 60 seconds, but this time background the job. Can you see the running sleep command with ps and top?
- Try again, but set the sleep to 100 seconds. Suspend the command. Can you see the suspended command with ps? Now kill the sleep command.
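Hint: a sketch of commands covering the steps above:
echo $USER                # the variable holds your username
ps                        # your processes in this session
top                       # all jobs on the node (press q to quit)
top -u $USER              # only your jobs
which curl                # location of the curl program
echo $PATH                # check that location appears in PATH
sleep 10                  # pauses for 10 seconds
sleep 10                  # press Ctrl+C to end it early
sleep 60 &                # & backgrounds the job; check it with ps and top
sleep 100                 # press Ctrl+Z to suspend, then check with ps
jobs                      # note the job number of the suspended sleep
kill %1                   # kill it (use the job number reported by jobs)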
Look at the available modules for the latest version of R. Import this latest version of R as a cluster module. Check this version of R is indeed now running on your system using the command: Rscript --version
[Note: to run R code already saved to a file, you need to execute the Rscript command.]
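Hint: on a cluster using environment modules, the steps might look like this (the version number is a placeholder – load the latest version that module avail reports):
module avail R            # list the installed R modules
module load R/4.4.1       # placeholder version: use the newest one listed
Rscript --version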
- Look at all the currently running jobs submitted to the cluster. Can you see any long-running jobs? Are CPU / GPU nodes being used? Who has the most jobs running on the cluster?
- Look at the sqsummary command. What percentage of CPUs are currently available?
- Look at the sinfo command.
- Look at the qinfo webpage (http://nagios2/qinfo/). Are most of the CPU nodes in use? Are any nodes listed as “down”? Which user is using the most CPU nodes?
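Hint: assuming the cluster scheduler is SLURM (sinfo and squeue are standard SLURM commands; sqsummary is a site-specific wrapper):
squeue                    # every job currently submitted to the cluster
squeue -u $USER           # just your own jobs
sqsummary                 # summary, including the percentage of CPUs free
sinfo                     # state of the partitions and nodes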
- Log in to a compute node, try some commands and then exit the compute node.
- Log in to a compute node, but this time reserve 4 cores and 5GB RAM.
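Hint: with SLURM, an interactive session can be requested roughly as follows (your site may provide its own wrapper command, so check the local documentation):
srun --pty bash                               # interactive shell on a compute node
exit                                          # return to the login node
srun --cpus-per-task=4 --mem=5G --pty bash    # reserve 4 cores and 5 GB RAM
exit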
Let's run the R script norm_dist_1_billion.R by submitting the job to the cluster as a non-interactive job.
This R script randomly generates 1 billion data values from a normal distribution. The results are then plotted as a histogram.
Firstly, make a Bash script named norm_dist.sh which contains the command:
Rscript norm_dist_1_billion.R
(Don't forget the shebang line pointing to the Bash shell at the top of the file!)
Now run the job, allocating 1 core and 1GB RAM. Make sure the cluster emails you about the job’s progress.
Did the job succeed? If not, try again but increase the RAM allocation to 30GB.
Check how much RAM was actually used by this job.
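Hint: a sketch of one way to do this with SLURM (the email address is a placeholder, and your site may require extra options, such as loading the R module inside the script). The script norm_dist.sh might contain:
#!/bin/bash
# module load R/4.4.1    # placeholder: load R here if it is not already available on the compute nodes
Rscript norm_dist_1_billion.R
Submit it, first with 1 GB and then with 30 GB of RAM:
sbatch --cpus-per-task=1 --mem=1G --mail-type=ALL --mail-user=you@example.com norm_dist.sh
sbatch --cpus-per-task=1 --mem=30G --mail-type=ALL --mail-user=you@example.com norm_dist.sh
Afterwards, seff <job_id> (or sacct -j <job_id> --format=JobID,MaxRSS) reports how much memory the job actually used.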