BPGA User Manual
Developed by: Narendrakumar M. Chaudhari¹, Vinod Kumar Gupta¹ & Chitra Dutta*
Structural Biology & Bioinformatics Division, CSIR-Indian Institute of
Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India
¹ NMC & VKG contributed equally to this work.
* Corresponding author
Availability:
http://sourceforge.net/projects/bpgatool/
http://www.iicb.res.in/bpga/index.html
Installation
Installing BPGA is simple.
1. Download the installer for Windows or Linux from our SourceForge page:
http://sourceforge.net/projects/bpgatool/
2. Run the installer as Administrator. It extracts the files to a local folder.
The executables are inside the bin folder.
3. BPGA is written in Perl but bundled as an executable, so no Perl modules
need to be installed.
(To run BPGA as Perl code, the user should install the following Perl modules
manually.)
Win32::GUI (For Windows only)
Prima (For Linux only)
Term::ANSIScreen
File::chdir
File::Remove
Sort::Fields
Statistics::Basic
Statistics::Descriptive
Bio::Phylo
Data::Dumper
Other requirements
gnuplot (4.6.6) must be installed for plotting graphs. Windows users can
download the 32-bit version from here and the 64-bit version from here.
Linux users can install gnuplot from the terminal with the command:
sudo apt-get install gnuplot-4.6.6
ps2pdf (part of Ghostscript) must also be installed for proper plotting.
Install it with the command:
sudo apt-get install ghostscript
BPGA uses USEARCH as the default clustering tool. Users need to obtain
their own licensed Windows/Linux copy, freely available at:
http://www.drive5.com/usearch/download.html
Note: The 32-bit version is freely available. It also works on 64-bit systems.
Rename the downloaded file to usearch.exe (for Windows) or usearch (for Linux)
and copy it into the bin folder.
Note: For USEARCH to work properly on Windows, check that the
required system file vcomp100.dll is present in the Windows\System32 or
System64 folder of your computer. If it is missing, place it there. It is
available at: http://www.drive5.com/usearch/manual/vcomp100.html
MUSCLE is used for alignments and tree generation. It is provided with
the package.
rsvg-convert.exe is required to handle SVG image data. It is also
provided with the BPGA package.
It is not available for Linux systems. To run rsvg-convert.exe on a Linux
system, install Wine with the command:
sudo add-apt-repository ppa:ubuntu-wine/ppa -y && sudo apt-get update && sudo apt-get install wine
Note: BPGA treats each file as a separate organism. If there are multiple files
(chromosomes) for an organism, the user should concatenate them into a single
file for that organism (this applies to all formats).
All files in a dataset must be in the same format. The user cannot mix
GenBank files for some organisms with FASTA files for others. However, protein
FASTA files of any type can be used together (using the Any Protein FASTA File
option of the Input Preparation step).
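The concatenation of multi-file genomes described in the note above can be
done with any tool. The following is only a minimal Python sketch, not part of
BPGA. The function name concatenate_replicons and all file names are
hypothetical, and the demo uses throwaway files in a temporary folder.

```python
import tempfile
from pathlib import Path


def concatenate_replicons(parts, output_path):
    """Join several sequence files of ONE organism into a single file.

    Hypothetical helper for illustration; BPGA itself does not provide it.
    """
    output_path = Path(output_path)
    with output_path.open("w") as out:
        for part in parts:
            text = Path(part).read_text()
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return output_path


# Demo with two made-up protein FASTA files for the same organism.
with tempfile.TemporaryDirectory() as tmp:
    chromosome = Path(tmp) / "chromosome.faa"
    plasmid = Path(tmp) / "plasmid.faa"
    chromosome.write_text(">geneA\nMKT\n")
    plasmid.write_text(">geneB\nMLS")  # note: no trailing newline
    merged = concatenate_replicons([chromosome, plasmid], Path(tmp) / "organism1.faa")
    merged_text = merged.read_text()

print(merged_text)
```

The merged file (organism1.faa in this sketch) is then the single input file
for that organism.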
Please note that the input file (INPUT_all.faa, generated by
Option-1) must be used for clustering with CD-HIT (online server or offline
package) or with the OrthoMCL pipeline, with the desired options.
When clustering with USEARCH, the identity cutoff can be set by the user (see
picture below).
Option-3 (ONE CLICK MODE): Allows the user to perform all the analyses in
a single step using default parameters:
Clustering: USEARCH (identity cutoff = 50%)
No. of combinations: 30 for fewer than 20 genomes and 20 for 20-50
genomes.
Atypical GC Content Analysis: 2 × δ (Standard Deviation)
Type of phylogeny tree: Neighbor Joining Tree (NJ).
KEGG/COG Functional analysis: performed if the dataset
contains fewer than 50 genomes.
Subset Analysis: NA
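To illustrate what a cut-off of 2 × standard deviation could mean for the
atypical GC content analysis, here is a rough Python sketch. It is an
assumption, not BPGA's actual code: it flags genes whose GC percentage lies
more than 2 sample standard deviations from the mean of all genes. The
function names and the example sequences are made up.

```python
from statistics import mean, stdev


def gc_content(seq):
    """GC percentage of a nucleotide sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq) if seq else 0.0


def atypical_genes(genes, k=2.0):
    """Return names of genes whose GC% deviates more than k * SD from the mean.

    Assumption: the sample standard deviation over all genes is used;
    at least two genes are needed to compute it.
    """
    values = {name: gc_content(seq) for name, seq in genes.items()}
    gc_values = list(values.values())
    mu = mean(gc_values)
    sd = stdev(gc_values)
    return sorted(name for name, v in values.items() if abs(v - mu) > k * sd)


# Made-up data: nine genes at 50% GC and one gene at 100% GC.
genes = {f"gene{i}": "ATGC" * 5 for i in range(1, 10)}
genes["island"] = "GGCC" * 5

flagged = atypical_genes(genes)
print(flagged)
```

In this toy dataset only the gene named island falls outside the 2 × SD range.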
In the next step, after completion of DEFAULT PAN GENOME ANALYSIS,
ADVANCED ANALYSIS OPTIONS will become available.
The user may perform any of the 5 analyses one by one. The completion status of
each analysis is displayed in brackets ( _NOT DONE_ or _DONE_ ).
After completing the desired analyses, the user should exit by typing 'exit'
and then close the terminal.
For MLST analysis, BPGA will halt and allow the user to make the required
changes. To run MLST analysis on user-defined housekeeping genes, check
"mlst_core.txt" for gene details and copy the respective cluster IDs from the
first column to "CORE_MLST.txt". If you press Enter to continue without this
step, 20 random clusters will be used for core phylogeny.
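Copying the cluster IDs can be done by hand in a text editor. As an
alternative, the Python sketch below picks the first-column IDs of rows that
mention chosen gene names. It is only an illustration under assumptions not
stated in this manual: that "mlst_core.txt" is tab delimited with the gene
details in the columns after the cluster ID. The cluster IDs, gene names and
the function name are hypothetical.

```python
def select_cluster_ids(mlst_core_text, keywords):
    """Return first-column IDs of rows whose other columns mention a keyword.

    Assumption: tab-delimited rows, cluster ID in the first column.
    """
    wanted = [k.lower() for k in keywords]
    ids = []
    for line in mlst_core_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        columns = line.split("\t")
        details = " ".join(columns[1:]).lower()
        if any(k in details for k in wanted):
            ids.append(columns[0].strip())
    return ids


# Made-up content standing in for mlst_core.txt.
example = (
    "12\tgyrB DNA gyrase subunit B\n"
    "45\thypothetical protein\n"
    "78\trecA recombinase A\n"
)

selected = select_cluster_ids(example, ["gyrB", "recA"])
print("\n".join(selected))  # lines to paste into CORE_MLST.txt
```

Check the selected IDs against "mlst_core.txt" before continuing, since a
simple keyword match may pick up unintended rows.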
Results
The Input Preparation option produces the input files (INPUT_all.faa,
INPUT_all.ffn) necessary for clustering and a dataset file (DATASET.xls)
containing organism details. A file list, required for further analysis, is
also generated.
Additional instructions
For subset analysis, the user must create a text file describing the groups to
be created. Here is an example:
          Organism ID as per dataset list
Group 1   1    2    3    4
Group 2   6    7    8    9    13   15
Group 3   5    10   11   12   14
Here, each row represents a group and each number represents a genome (refer to
the list file created during input preparation). The blue colored labels are
for illustration only. The actual file should contain only tab-delimited
values. A maximum of 10 groups can be defined. There should be no repeated or
invalid IDs.
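Before running subset analysis, the group file can be checked against the
rules above. The following Python sketch is not part of BPGA. It assumes that
genome IDs run from 1 to the number of genomes in the list file, and the
function name parse_groups is hypothetical.

```python
def parse_groups(text, n_genomes, max_groups=10):
    """Parse tab-delimited group rows and enforce the rules of the manual.

    Assumption: valid genome IDs are the integers 1..n_genomes.
    Raises ValueError on a non-numeric, out-of-range or repeated ID,
    or when more than max_groups groups are given.
    """
    groups = []
    seen = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        ids = []
        for field in line.split("\t"):
            field = field.strip()
            if not field:
                continue  # tolerate a trailing tab
            if not field.isdigit():
                raise ValueError(f"line {line_no}: '{field}' is not a genome ID")
            genome_id = int(field)
            if not 1 <= genome_id <= n_genomes:
                raise ValueError(f"line {line_no}: ID {genome_id} is out of range")
            if genome_id in seen:
                raise ValueError(f"line {line_no}: ID {genome_id} is repeated")
            seen.add(genome_id)
            ids.append(genome_id)
        groups.append(ids)
    if len(groups) > max_groups:
        raise ValueError(f"{len(groups)} groups given, maximum is {max_groups}")
    return groups


# The three groups from the example above, written as the actual file content.
example_file = "1\t2\t3\t4\n6\t7\t8\t9\t13\t15\n5\t10\t11\t12\t14\n"
groups = parse_groups(example_file, n_genomes=15)
print(groups)
```

Each inner list corresponds to one row (group) of the file.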
Accepted file formats: