Python Workshop v4

Uploaded by

Ravikanth L
Copyright
© All Rights Reserved

Three players when running a python script

python myscript.py
Python interpreter -> Python script -> other packages (imported by the script)

• Python interpreter: software to compile/execute the script.

• Python script: the script you wrote.

• Python packages: Python libraries called by the script.

Two different ways to run a python script

python myscript.py

If the script has a shebang* line and is executable (chmod +x myscript.py), you can also run the script like this:

./myscript.py

* The shebang line is the first line of a script and specifies the path of the interpreter, e.g. “#!/usr/bin/python3.9.6”.

** In Linux, a file extension like “.py” is ignored. It is the shebang line that defines the script type, whether it is a Python, R, Perl, or shell script. In Windows, the file name extension defines the script type.

Python script: bamCoverage.py

    #!/usr/bin/python3.6

    import sys
    import deeptools.misc
    from deeptools.bamCoverage import main

    if __name__ == "__main__":
        args = None
        # With no command-line arguments, show the help message.
        if len(sys.argv) == 1:
            args = ["--help"]
        main(args)

Two different formats of the shebang line

#!/usr/bin/python3.6
Full path of the Python interpreter.

#!/usr/bin/env python3
Default python3 on the system, as defined in $PATH.
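To check which interpreter a script asks for, you can read its first line. The sketch below is only an illustration: the helper name read_shebang and the temporary file myscript.py are assumptions made for this example, not part of the workshop material.

```python
import os
import tempfile


def read_shebang(path):
    """Return the text after '#!' on the first line, or None if there is no shebang."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline().rstrip("\n")
    if first_line.startswith("#!"):
        return first_line[2:].strip()
    return None


if __name__ == "__main__":
    # Write a tiny hypothetical script and inspect its first line.
    with tempfile.TemporaryDirectory() as tmpdir:
        script = os.path.join(tmpdir, "myscript.py")
        with open(script, "w", encoding="utf-8") as handle:
            handle.write("#!/usr/bin/env python3\nprint('hello')\n")
        print(read_shebang(script))
```

A result of "/usr/bin/env python3" means the interpreter is looked up in $PATH at run time; a full path means that exact interpreter is used.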


Python interpreter & Python packages (libraries)


Which Python?
Multiple Python installations co-exist on the same computer. On BioHPC, we have v2.7.5, v2.7.15,
v3.6.7, v3.9.6. There are more versions of Python in Conda.

How to verify which Python is being used?

which python
python -V
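The same check can be made from inside Python, which is useful when a script is started through a shebang line and you are not sure which interpreter picked it up. This is a minimal sketch; the function name describe_python is an assumption made for illustration.

```python
import shutil
import sys


def describe_python():
    """Report the running interpreter and the first 'python' found in $PATH."""
    return {
        # Path of the interpreter executing this code.
        "executable": sys.executable,
        # Version as major.minor.micro, comparable to the output of "python -V".
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # What "which python" would report (None if no "python" is in $PATH).
        "first_python_on_path": shutil.which("python"),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(key, "=", value)
```

If "executable" and "first_python_on_path" differ, the script is not being run by the Python you get when typing "python" in the shell.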

Alternative ways to use a different version of Python:

• Shebang line: #!/usr/bin/python3.9.6

• Add to PATH: export PATH=/programs/python-3.9.6/bin:$PATH

• Linux module: module load python/3.9.6


Each Python has its own library directories, and
a companion “pip” for library installation
For example:

Python: /usr/bin/python3.6
Alias (symbolic link): /usr/local/bin/python

Pip: /usr/bin/pip3.6
Alias (symbolic link): /usr/local/bin/pip

Packages: /usr/lib/python3.6/ & /usr/lib64/python3.6/

If you run “pip install” with the system Python as a regular user, you will get a “permission denied” error. Run “pip install --user” instead, which installs Python packages under your home directory.
When running a script, Python looks for packages in three different places, in the following order. The first one found is used.

1. Directories defined in $PYTHONPATH
• Custom location, e.g. export PYTHONPATH=/workdir/lib:$PYTHONPATH. This is independent of which “python” or which version of “python” you use.

2. $HOME/.local
• If you run “pip install --user packageName”, the packages are installed under $HOME/.local. This is independent of which “python” you use, but different for each Python version.

3. The installation's own library directories (the default entries of sys.path), e.g. /usr/lib/python3.6
• Each Python installation has its own unique sys.path.
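The three locations can be inspected from inside the interpreter you are actually using. The sketch below is an illustration under assumptions: the function name package_search_locations is made up for this example, and the output differs on every system.

```python
import os
import site
import sys


def package_search_locations():
    """Collect the places this interpreter searches for packages."""
    # Entries of $PYTHONPATH, split on ':' (Linux) and with empty items dropped.
    raw = os.environ.get("PYTHONPATH", "")
    pythonpath = [entry for entry in raw.split(os.pathsep) if entry]
    return {
        "PYTHONPATH": pythonpath,
        # Per-user directory used by "pip install --user" (under $HOME/.local on Linux).
        "user_site": site.getusersitepackages(),
        # The full search list in the order Python actually uses.
        "sys.path": list(sys.path),
    }


if __name__ == "__main__":
    locations = package_search_locations()
    print("PYTHONPATH:", locations["PYTHONPATH"])
    print("user site :", locations["user_site"])
    for entry in locations["sys.path"]:
        print("sys.path  :", entry)
```

Running it with two different interpreters (for example the system Python and a Conda Python) shows how each one searches its own directories.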
Install Python software with pip

(pip downloads software from PyPI)

pip install deepTools
Installs into sys.path, e.g. /usr/lib/python3.6
# you need write permission to the sys.path.

pip install deepTools --user
Installs into $HOME/.local
# packages are only accessible by the user

pip install deepTools --prefix=/workdir/$USER
Installs into /workdir/$USER
# when using this library, you need to specify it in $PYTHONPATH
Some other features of pip

1. Install a specific version of a python package


pip install --user deepTools==3.5.1

2. Upgrade a package to the latest version (with recent pip, its dependencies are upgraded only when the new version requires it)

pip install --upgrade deepTools
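After installing or upgrading, you may want to confirm which version the current interpreter actually sees. The sketch below uses importlib.metadata from the standard library (Python 3.8 or later); the helper name installed_version is an assumption made for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "deepTools" is the package used in the examples above.
    print("deepTools:", installed_version("deepTools"))
```

A result of None means this particular Python cannot find the package, even if another Python on the same computer has it.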


Conda

What is Conda?
• An online software repository (independent from PyPI);
• A package manager for software installation;
• An environment manager for running software.

Why Conda?

Traditionally, Linux software is installed into these three directories:

/usr/bin    For executables
/usr/lib    For libraries
/etc        For config files

Only a system admin can install software into these directories.
Conda adds a directory where a user can install software

/usr/bin
/usr/lib
/etc
/home/qs24/miniconda3/
    bin    Python, pip and other executables go here
    lib    Python packages go here
    etc    Some config files go here

A regular user can install software into these directories.
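The Conda envs directory is a collection of multiple environments

Each software can have its own isolated environment.

/usr/bin
/usr/lib
/etc
$HOME/miniconda3/
    bin, lib, etc
    envs/
        env_1, env_2, env_3, ... env_n

Each environment under envs/ has its own bin, lib and etc directories, and can carry its own Python version (Python3.6 in one environment and Python3.9 in another in this example).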
Each Conda environment has its own python, libraries
and companion pip

• Install packages in Conda base: uses the python and pip of Conda base.
• Install packages in a Conda environment: uses the python and pip of that environment.
Install software in Conda base vs. a Conda environment

Install under Conda base:

conda install -c bioconda deeptools

Create a Conda environment and install software:

conda create -c bioconda -n deeptools deeptools

• -c bioconda: name of the Conda channel. It is the place where conda finds the package.
• -n deeptools: name of the environment you will create. It can be any name.
• deeptools (last argument): name of the Conda package. This name must exist in the channel.
Activate/deactivate a Conda environment

Activate:

# activate conda base
source ~/miniconda3/bin/activate
# activate an environment
conda activate busco

or

# activate an environment directly
source ~/miniconda3/bin/activate busco

De-activate:

conda deactivate

The Conda installer offers to make Conda activated by default at login. Do not accept it. If you have already done so, disable it by modifying your .bashrc file.
Within a conda environment, you can run either
“conda install” or “pip install”.

# create and activate an environment, which only has python in it


conda create -n myEnv python=3.9

conda activate myEnv

# install deeptools in the environment

conda install deeptools    # installation through the Anaconda repository

or

pip install deeptools    # installation through the PyPI repository
Compatibility of software versions within a Conda environment

When depositing a software, the developer provides an installation recipe. For example, the recipe for Deeptools:

run:
- deeptoolsintervals >=0.1.8
- matplotlib-base >=3.1.0
- numpy >=1.9.0
- plotly >=2.0.0
- py2bit >=0.2.0
- pybigwig >=0.2.3
- pysam >=0.14.0
- python >=3
- scipy >=0.17.0

When installing a software, the Conda package manager reads the recipe to determine which version to download:

• Check whether a package exists in the current environment;
• Find a package available in the repository and compatible with all software within the same environment.
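The check behind a “>=” line in a recipe can be pictured with a few lines of Python. This is a deliberately simplified sketch, not how Conda's solver is implemented: it only handles plain dotted version numbers, the function names are made up for illustration, and the "installed" versions below are hypothetical.

```python
def parse_version(text):
    """Turn '3.1.0' into (3, 1, 0). Only plain dotted integers are supported."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """Return True if the installed version is at least the required minimum."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '3' compares like '3.0.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def unmet_requirements(requirements, installed):
    """List the packages that are missing or older than the recipe requires."""
    unmet = []
    for name, minimum in requirements.items():
        version = installed.get(name)
        if version is None or not satisfies_minimum(version, minimum):
            unmet.append(name)
    return unmet


if __name__ == "__main__":
    # A subset of the Deeptools recipe shown above.
    recipe = {"numpy": "1.9.0", "scipy": "0.17.0", "python": "3"}
    # Hypothetical contents of an environment.
    environment = {"numpy": "1.21.2", "scipy": "0.16.1", "python": "3.9.6"}
    print(unmet_requirements(recipe, environment))
```

In this hypothetical environment only scipy is too old, so the package manager would have to replace it with a version that satisfies every recipe in the environment.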
Conda as a package manager

conda create -n deeptools deeptools

Mamba, an alternative to the Conda package manager

Install mamba:

conda install mamba

Use mamba:

mamba install …

mamba create …

* Mamba is often much faster than conda and more robust.
A few tips for using Conda

Sometimes, a little intervention is needed.


For example, when “biopython” was upgraded to 1.77, it was not compatible with
“hicexplorer”. In this case, you need to explicitly specify a lower biopython version.

conda install -c bioconda hicexplorer biopython=1.76


* Afterwards, hicexplorer developers noticed this problem and updated its recipe
to “<1.77”

You might need to update Conda software once in a while

conda update conda


Conda channels

conda install -c bioconda -c conda-forge deeptools

* conda-forge is more comprehensive, but less strictly managed.


Including conda-forge could take much longer to “solve packages”.
Troubleshooting Python
Step 1. Verify which Python you are using

which python

Common errors:

1. You are using the wrong version of Python.
For example, when running a Python 2 script with Python 3, you would see this error message: SyntaxError: Missing parentheses in call to 'print'.
To fix:
module load python/2.7.15

2. A python module is missing, and you need to install it.


If you are using system Python
pip install --user theModuleName
If you are using Python in Conda
pip install theModuleName

3. You are using the wrong version of a Python module. You need to re-install the right version.
pip install theModuleName==3.12
* When running into version issues, it is better to work within a Conda environment, to avoid interference with other software.
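Error 1 in the list above can be checked without running the whole script: Python 3 refuses to compile Python 2 style print statements. The sketch below assumes it is run with Python 3; the function name compiles_under_this_python is made up for illustration.

```python
def compiles_under_this_python(source):
    """Return True if the source text is valid syntax for the running interpreter."""
    try:
        # compile() only parses the code; nothing is executed.
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(compiles_under_this_python('print "hello"'))   # Python 2 style statement
    print(compiles_under_this_python('print("hello")'))  # Python 3 style call
```

When the first check fails under Python 3, the script was most likely written for Python 2 and needs a Python 2 interpreter.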
If you installed the right version but still get an error message, you need to verify which Python module is actually being used.
Python follows this order to find a library:

echo $PYTHONPATH

ls -l ~/.local/lib
Under $HOME/.local, libraries for different Python versions are separated.

To see which copy of a module is loaded, run these commands in the “python” prompt (Python 2 syntax is shown; in Python 3 use print(numpy.__file__)):

>>> import numpy
>>> print numpy.__file__
/usr/lib64/python2.7/site-packages/numpy/__init__.pyc
>>> print numpy.__version__
1.14.3
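The same check can be written as a small Python 3 helper that works for any module name. This is a sketch: the function name module_location is an assumption, and the standard-library module json is used in the demonstration only because it is always available; replace it with the module you are troubleshooting, e.g. numpy.

```python
import importlib


def module_location(module_name):
    """Import a module and return (file path, version), using None where unavailable."""
    module = importlib.import_module(module_name)
    path = getattr(module, "__file__", None)
    version = getattr(module, "__version__", None)
    return path, version


if __name__ == "__main__":
    path, version = module_location("json")
    print("loaded from:", path)
    print("version    :", version)
```

If the path points into $HOME/.local or a $PYTHONPATH directory instead of the expected installation, an older copy there is shadowing the version you installed.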
The most common error: you are in Conda base, but try to run software not installed through Conda

• System default
• Conda base
• Conda environment

You are in Conda base, but try to run a Python script installed by the BioHPC admin.

How to tell that you are in Conda?
The shell prompt shows “(base)”.

How to correct?
Edit the .bashrc file in your home directory. Insert a line with the word “return” before “conda initialize”. Then log out and log in again.
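Besides looking at the prompt, a script can report whether Conda is active. The sketch below relies on the environment variables CONDA_DEFAULT_ENV and CONDA_PREFIX, which "conda activate" sets in the shell; the function name conda_status and its optional environ argument are assumptions made for illustration.

```python
import os
import sys


def conda_status(environ=None):
    """Summarise whether a Conda environment is active in the given environment variables."""
    env = os.environ if environ is None else environ
    env_name = env.get("CONDA_DEFAULT_ENV")
    return {
        # True when some Conda environment (including base) is active.
        "conda_active": bool(env_name),
        # Name shown in the prompt, e.g. "base".
        "conda_env_name": env_name,
        # Directory of the active Conda environment, if any.
        "conda_prefix": env.get("CONDA_PREFIX"),
        # Interpreter that is running this code.
        "python_executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in conda_status().items():
        print(key, "=", value)
```

If "conda_active" is True while you are running software installed outside Conda, deactivate Conda first (conda deactivate) and try again.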
Jupyter Notebook
Three ways to run Python: Python shell, Python script and Jupyter Notebook (Jupyter Lab)

• Python shell

• Python script (run in Linux shell)

python myscript.py    (shebang line ignored)
or
./myscript.py    (shebang line defines which Python interpreter to use)

• Jupyter Notebook (Jupyter Lab)
(https://biohpc.cornell.edu/lab/userguide.aspx?a=software&i=263#c )
Jupyter Notebook runs Python through a web browser

http://cbsum1c2b010.biohpc.cornell.edu:8016/?token=72cc017561bd59ba4dab4a5604d7857c93dd8f68a45d520b

Client: your laptop. Server: cbsum1c2b010.biohpc.cornell.edu

• Putty (ssh): ssh cbsum1c2b010.biohpc.cornell.edu -> ssh daemon (port 22, protocol: ssh)
• Browser (http): http://cbsum1c2b010.biohpc.cornell.edu -> http daemon 1 (port 80, protocol: http)
• Browser (http): http://cbsum1c2b010.biohpc.cornell.edu:8009 -> http daemon 2 (port 8009, protocol: http)

In the URL http://cbsum1c2b010.biohpc.cornell.edu:8009:
• http: communication protocol
• cbsum1c2b010.biohpc.cornell.edu: server address
• 8009: port

• A ‘daemon’ is a software process that runs continuously in the background, often listening on a port.
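The three parts of such a URL can be pulled apart with the standard library. This is a small sketch; the function name describe_url is made up for illustration, and the fallback table covers only http and https.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Return (protocol, server address, port) for a URL."""
    parts = urlsplit(url)
    # When no port is written in the URL, the protocol's default port is used.
    default_ports = {"http": 80, "https": 443}
    port = parts.port if parts.port is not None else default_ports.get(parts.scheme)
    return parts.scheme, parts.hostname, port


if __name__ == "__main__":
    print(describe_url("http://cbsum1c2b010.biohpc.cornell.edu:8009"))
    print(describe_url("http://cbsum1c2b010.biohpc.cornell.edu"))
```

The first URL reaches the daemon listening on port 8009; the second has no explicit port, so the browser uses port 80.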
Daemons running on the server cbsum1c2b010:

• User 1 -> cbsum1c2b010:8009 -> jupyter daemon 1 (port 8009)
• User 2 -> cbsum1c2b010:8010 -> jupyter daemon 2 (port 8010)
• User 3 -> cbsum1c2b010:8011 -> jupyter daemon 3 (port 8011)
• ssh daemon (port 22)
• rstudio daemon (port 8015)

With ssh and rstudio, one daemon can serve multiple users.
To start a Jupyter notebook daemon with the default Python (v3.6)

It is important to keep the server daemon running in a persistent “screen” session.

screen

export PYTHONPATH=/programs/jupyter3/lib/python3.6/site-packages:/programs/jupyter3/lib64/python3.6/site-packages

export PATH=/programs/jupyter3/bin:$PATH

jupyter notebook --ip=0.0.0.0 --port=8017 --no-browser

You will be provided with a URL which you can open in a web browser:
http://cbsum1c2b010.biohpc.cornell.edu:8017/?token=dfe3b002ca2d7721c4a2c0c641de91645e74f59d6519e31b

How to use “screen”: https://biohpc.cornell.edu/lab/doc/Linux_exercise_part2.pdf


If you need a different version of Python,
install and run Jupyter with Conda or Docker

source ~/miniconda3/bin/activate    # activate Conda

conda create -n mypython3 python=3.8    # create a Conda environment “mypython3” with Python v3.8

conda activate mypython3    # activate the mypython3 environment

mamba install -c conda-forge notebook    # install Jupyter Notebook; “mamba” is used here as it is a lot faster than conda

To run Jupyter installed in a Conda environment:

screen

source ~/miniconda3/bin/activate mypython3

jupyter notebook --ip=0.0.0.0 --port=8019 --no-browser

• On BioHPC, only ports between 8009-8039 are open to users;

• Check if a port is already being used: netstat -tulpn | grep 8019
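As an alternative to reading netstat output, a port can be probed from Python by trying to bind to it. This is a sketch under assumptions: the function names are made up for illustration, the default range 8009-8039 is the BioHPC range quoted above, and the probe only tells you whether the port is free at that moment.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Binding fails when another process already uses the port.
            return False
    return True


def first_free_port(start=8009, end=8039, host="0.0.0.0"):
    """Return the first free port in the inclusive range, or None if all are taken."""
    for port in range(start, end + 1):
        if port_is_free(port, host):
            return port
    return None


if __name__ == "__main__":
    print("8019 free:", port_is_free(8019))
    print("first free port:", first_free_port())
```

The port found this way can then be passed to jupyter notebook with the --port option.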


In summary
Installing Python software

• Python software repository: PyPI, Anaconda
• Installation package manager: pip, Conda or Mamba
• Installation directory:
  pip: sys.path or ~/.local (--user option)
  Conda: Conda base and Conda environment

Running Python software

• Which Python interpreter?
  which python
  python -V
  # check the shebang line of the script
• Which Python package?



Some afterthoughts

Why is it so complicated?
Because a server is shared by many people and many
applications. To work peacefully together, we have to follow
certain rules.

Maybe someday computers will be cheap enough that I can have a dedicated computer for each job.
Not likely in the near future.
… But wait, we have something that is close enough, “Docker”
and “Singularity”.
