Python Crash Course
Python
programming
Presented by:
Shailender Nagpal, Al Ritacco
Research Computing
UMASS Medical School
Information Services, 09/17/2012
AGENDA
Python Basics: Lists, Tuples, Expressions, Printing
Built-in functions, Blocks, Branching, Loops
Hash arrays, String and Array operations
File reading and writing
Writing custom functions
Regular expressions: Find/replace/count
Providing input to programs
Using Python scripts with the LSF cluster
What is Python?
• Python is a high-level, general-purpose, interpreted,
interactive programming language
• Provides a simple, iterative, top-down, left-to-right
programming environment for users to create both small
and large programs
• Some string functions (x is a string)
x.strip()
x.lstrip()
x.rstrip()
x.split("\t")
x.count("p")
if x.startswith('#'):
if x.endswith('a'):
if x.isalpha():
if x.islower():
if x.isupper():
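A short sketch of these string functions in use. The sample line and the variable names are made up for illustration. print is called with parentheses and a single argument so the sketch runs under both Python 2 and Python 3.

```python
# Hypothetical sample: one tab-separated line, as might be read from a file.
x = "  gene1\tapple\tpepper\n"

clean = x.strip()            # remove whitespace from both ends
left = x.lstrip()            # remove leading whitespace only
right = x.rstrip()           # remove trailing whitespace only
fields = clean.split("\t")   # split on tabs into a list of strings
p_count = x.count("p")       # count non-overlapping occurrences of "p"

print(fields)
print(p_count)

# The test functions return True or False, so they fit directly in an "if".
if not clean.startswith("#"):
    print("not a comment line")
if clean.endswith("r"):
    print("ends with r")
if fields[1].isalpha():
    print("second field contains only letters")
if fields[1].islower():
    print("second field is lowercase")
if fields[1].upper().isupper():
    print("uppercased copy is uppercase")
```

Note that strip(), lstrip() and rstrip() return new strings; x itself is not changed.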
Iterating over Lists with "in"
• Ok, so we have these lists, but how do we work with
each element automatically?
– How can we iterate over them and perform the same
operation to each element?
• We use looping logic to work with the lists
• We use Python's "for" loop, which behaves like "foreach" in other languages
for named_item in array:
    <some expression using named_item>
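The loop that produced the output below is not shown here. A minimal sketch that would print those lines, assuming a list called nucleotides (the list name and loop variable are assumptions; the values are taken from the output):

```python
# Assumed list; the element values come from the output lines.
nucleotides = ["adenine", "cytosine", "guanine", "thymine", "uracil"]

# Each pass through the loop binds the next element to "nucleotide".
for nucleotide in nucleotides:
    print("Nucleotide is: %s" % nucleotide)
```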
Output:
Nucleotide is: adenine
Nucleotide is: cytosine
Nucleotide is: guanine
Nucleotide is: thymine
Nucleotide is: uracil
for i in range(1,6):
    fib = fibonacci(i)
    print "fibonacci(%d) is %s" % (i,fib)
• Example Output:
fibonacci(1) is 1
fibonacci(2) is 1
fibonacci(3) is 2
fibonacci(4) is 3
fibonacci(5) is 5
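The definition of fibonacci is not shown in this section. One possible definition consistent with the output above is sketched here, assuming an iterative approach in which fibonacci(1) and fibonacci(2) are both 1. Note that range(1, 6) stops before 6, so it yields 1 through 5.

```python
def fibonacci(n):
    """Return the n-th Fibonacci number, with fibonacci(1) == fibonacci(2) == 1."""
    a, b = 0, 1
    for _ in range(n):
        # Step the pair (a, b) forward one position in the sequence.
        a, b = b, a + b
    return a

# range(1, 6) produces 1, 2, 3, 4, 5; the end value is excluded.
for i in range(1, 6):
    print("fibonacci(%d) is %s" % (i, fibonacci(i)))
```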
How to submit a "job"
• The basic syntax is:
bsub <valid linux command>
• bsub: LSF command for submitting a job
• Let's say a user wants to execute a Python script.
On a Linux PC, the command is
python countDNA.py
• To submit a job to do the work, do
bsub python countDNA.py
Specifying more "job" options
• Jobs can be marked with options for better job
tracking and resource management
– Job should be submitted with parameters such as queue
name, estimated runtime, job name, memory required,
output and error files, etc.
• These can be passed on in the bsub command
bsub -q short -W 1:00 -R rusage[mem=2048] -J "Myjob" -o hpc.out -e hpc.err python countDNA.py
Job submission "options"
Option flag or name   Description
-q                    Name of queue to use. On our systems, possible values are "short" (<=4 hrs execution time), "long" and "interactive"
-W                    Allocation of node time. Specify hours and minutes as HH:MM
-J                    Job name. E.g. "Myjob"
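When many jobs share the same options, the bsub command line can be assembled in Python rather than typed by hand. The sketch below is only an illustration: build_bsub_command is a hypothetical helper, not part of LSF or Python, and it uses only the flags and values from the example bsub command (-q, -W, -R, -J, -o, -e). It builds and prints the argument list without submitting anything.

```python
def build_bsub_command(command, queue, walltime, mem, job_name, out_file, err_file):
    """Return the bsub argument list for a job (hypothetical helper)."""
    return [
        "bsub",
        "-q", queue,                    # queue name, e.g. "short"
        "-W", walltime,                 # node time as HH:MM
        "-R", "rusage[mem=%d]" % mem,   # memory request, as in the example command
        "-J", job_name,                 # job name
        "-o", out_file,                 # file for standard output
        "-e", err_file,                 # file for standard error
    ] + command

args = build_bsub_command(
    ["python", "countDNA.py"], "short", "1:00", 2048,
    "Myjob", "hpc.out", "hpc.err")
print(" ".join(args))

# On a machine where LSF is installed, the job could then be submitted with:
#   import subprocess
#   subprocess.call(args)
```

Passing the options as a list avoids shell quoting problems with the square brackets in the -R value.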
Why use the correct queue?
• Match requirements to resources
• Jobs dispatch quicker
• Better for entire cluster
• Helps GHPCC staff determine when new resources are
needed
Questions?
• How can we help further?
• Please check out books we recommend as
well as web references (next 2 slides)