
Applied Microbiology and Biotechnology

https://doi.org/10.1007/s00253-020-11056-2

MINI-REVIEW

Bioinformatics: new tools and applications in life science and personalized medicine
Iuliia Branco 1 & Altino Choupina 1

Received: 29 November 2020 / Revised: 29 November 2020 / Accepted: 9 December 2020


© The Author(s), under exclusive licence to Springer-Verlag GmbH, DE part of Springer Nature 2021

Abstract
While we have a basic understanding of how genes code for the sequences of specific proteins, we lack information on the role that DNA plays in specific diseases and on the functions of the thousands of proteins that are produced. Bioinformatics combines the methods used in the collection, storage, identification, analysis, and correlation of this huge and complex body of information. All this work produces an “ocean” of information that can only be “sailed” with the help of computerized methods. The goal is to provide scientists with the right means to explain normal biological processes, the dysfunctions of these processes that give rise to disease, and approaches that allow the discovery of new medical cures. Recently, sequencing platforms, which generate genomes and transcriptomes on a large scale, have created new challenges not only for genomics but especially for bioinformatics. The intent of this article is to compile a list of tools and information resources used by scientists to handle the information produced by massive sequencing on recent and new-generation platforms, and to describe the applications of this information in different areas of the life sciences, including medicine.

Key points
• Biological data mining
• Omic approaches
• From genotype to phenotype

Keywords Sequencing . Bioinformatics . Tools . Applications . Life science . Personalized medicine

* Altino Choupina
albracho@ipb.pt

1 Centro de Investigação de Montanha (CIMO), Instituto Politécnico de Bragança, Campus de Santa Apolónia, 5300-253 Bragança, Portugal

Introduction

The knowledge derived from genomic and computational technologies increases in geometric progression. The understanding of this avalanche of data is closely linked to the formidable development of the bioinformatics area. By enabling the overall assessment of this extraordinary amount of data, bioinformatics has considerably accelerated scientific discoveries. A consequence of this growth is a large supply of products, services, and information, so that keeping up to date with, locating, and using the latest innovations has become a full-time activity.

Although we initially tried to create a complete profile of all the available bioinformatics resources, it quickly became evident that the dynamism and constant updating of this field surpassed this goal. We created a division into four categories: sequence analysis software, software for the prediction of protein structures, “online” resource servers, and finally a list of places of interest on the Internet that can shorten the search time. We opted for these categories because we believe that they cover the central dogma of molecular biology in a comprehensive way.

Bioinformatics, as a scientific area, gathers techniques and tools from three subjects: molecular biology, the source of the information to be analyzed; informatics or computer science, which provides the hardware for analysis and the networks to share the results; and mathematics, the origin of the algorithms used in the data analysis. The interrelationship of the three areas creates the basis for bioinformatics applications in molecular biology, as can be seen in the following diagram (Li et al. 2013) (Fig. 1).

Fig. 1 Relationship of biological “-omics” with bioinformatics (Li et al. 2013)

We know that many reviews of bioinformatics tools have been published, but in this review we want to present, in a simple way, the fundamental and most useful tools for the work of the life science researcher, integrating several tools that can cover all omics. These tools are used from the design of the experiments through the obtaining of biological data, the deposit of these data, and their mining in order to deduce phenotypes and their interrelationships, and for applications in molecular cloning, pharmacy, and medicine. This mini-review should therefore be useful both for biotechnology researchers and for agronomists, zootechnicians, ecologists, pharmacists, and medical professionals (Martins et al. 2014, 2019).

Deducing the order of DNA sequences is essential for basic biological research, with several important applications in biotechnology. The high sequencing capacity of modern DNA sequencing technologies has been responsible for the complete sequencing of an immense number of DNA sequences and genomes, including the human genome. The first sequencing method, the “Sanger sequencing technique,” is based on the selective incorporation of chain-terminating dideoxynucleotides (ddNTPs) by DNA polymerase; its automated version, with capillary electrophoresis, was developed by Applied Biosystems (namely AB370). These automated tools, with significant sequencing capacity, have been the main tool in the sequencing of various genomes, including the human genome. These first genome projects were, in turn, a stimulus to the development of new and powerful sequencing platforms, called next-generation sequencing (NGS) (Heather and Chain 2016).

Next-generation sequencing systems

Next-generation sequencing (NGS) is a high-throughput methodology that allows massive base-pair sequencing of DNA or RNA samples. It makes a large number of applications possible, including the full sequencing of numerous genomes, the study of gene expression profiles, the study of epigenetic changes, the study of mutations, and molecular analysis, which together make the future of personalized medicine possible (Goldman and Domschke 2014).

NGS systems include multiple platforms, the so-called second generation of sequencers, with different approaches and sequencing capabilities, such as Life Technologies’ SOLiD/Ion Torrent PGM, Illumina’s Genome Analyzer/HiSeq 2000/MiSeq, and Roche GS FLX Titanium/GS Junior. In the third generation of sequencers, the most popular platform is single-molecule real-time (SMRT) sequencing, a parallelized single-molecule DNA sequencing method. Each of the four DNA bases is attached to one of four different
fluorescent dyes. When a nucleotide is incorporated by the DNA polymerase, a detector registers the fluorescent signal of the incorporation, and the base call is made according to the corresponding fluorescence of the dye. Other sequencing platforms, already of the fourth generation and based on nanopores, are being developed with greater data generation capabilities in less time and at lower cost. Whichever platform is used, millions of data points are generated in hours, so obtaining data is no longer a problem. This has led to a paradigm shift in which data processing, storage, and analysis become the most relevant tasks. It is at these points that bioinformatics, with its ability to analyze large amounts of data with diversified objectives, assumes its essential role, also considering that each of the mentioned platforms incorporates a series of bioinformatics tools for processing the output data (Goldman and Domschke 2014; Kulski 2016).

Primary analysis of DNA sequences

The primary analysis of DNA sequences is essential in the daily life of the biotechnology laboratory: in the detection of mutations, the establishment of phylogenies, the elaboration of restriction maps for cloning, and the design of cassettes for silencing genes in order to see their role in cell metabolism.

Among genome analysis software, several program packages can be found that accompany the entire process, from receiving the sequencer graphics to publishing the data in online databases. These features, along with free access for academics, file compatibility, and their date of conception, are the main factors in the choices made.

We point out that many of the services provided by these programs are also provided by some programs available online, with the disadvantage that each query requires a network connection, but with the advantage that these online resources are updated regularly.

Staden package (http://staden.sourceforge.net/) (Bonfield and Whitwham 2010; Rodger et al. 2003a, b)

This very complete program package for nucleotide sequence analysis, free for students and researchers, can be requested by mail or obtained directly from the network. Staden is very powerful and lends itself to automated processing of data; it is not very intuitive, as it requires some learning, but it is certainly an excellent work tool.

The Staden package was developed at the Medical Research Council (MRC) Laboratory of Molecular Biology, Cambridge, England, by Rodger Staden’s group. The package was converted to open source in 2004, and several new versions have been released since.

The authors describe the current version of the sequence analysis package developed at the MRC Laboratory of Molecular Biology, which has come to be known as the “Staden package”: “the package covers most of the standard sequence analysis tasks such as restriction site searching, translation, pattern searching, comparison, gene finding, and secondary structure prediction, and provides powerful tools for DNA sequence determination.”

This package contains the following programs:

• Gap4 and Gap5: the main tools of the package; they perform sequence assembly, sequence merging, and assembly correction, handle read pairs, and allow them to be edited (Fig. 2);
• Pregap4: receives and analyzes the information from the sequencers, constituting the information input port for this program package;
• Trev: fast and effective, allows the visualization of sequences in ABI, ALF, or SRF formats;
• Trace diff: automatically localizes mutation points by comparing the sequence under study with reference sequences. It supports any number of sequences and allows the visualization of results in Gap4;
• Sip4: compares sequence pairs in various ways, often displaying results graphically. It allows comparisons between nucleotide sequences, between protein sequences, and between protein and nucleotide sequences;
• Nip4: analyzes nucleotide sequences to find genes and restriction sites; allows translation, etc.

pDRAW32 (https://www.acaclone.com/)

A program for the Windows platform, with a nice and intuitive interface, available for free on the Internet at the website (https://www.acaclone.com).

With this program, it is possible to perform various operations, such as annotating the DNA under study, cloning DNA, editing sequences, selecting restriction enzymes, exporting graphics and text, calculating the optimal PCR temperature, and calculating homologies between two DNA fragments; it also contains scientific help files. It is possibly one of the best programs for cloning strategies: extremely intuitive and easy to use, it produces very attractive, simple, and complete images.

GenBeans (http://www.genbeans.org/)

GenBeans is an integrated stand-alone platform for bioinformatics based on the NetBeans open-source software (developed by the Apache Software Foundation). It focuses on molecular biology and provides a fully integrated toolbox in a rich, easy-to-use graphical interface for analyzing and visualizing sequences. Another interesting program based on NetBeans is
geneinfinity (http://www.geneinfinity.org/), which we describe in Table 1.

Fig. 2 Gap interface (Staden Package Handbook) (Bonfield and Whitwham 2010; Rodger et al. 2003a, b)

DNASTAR™ (https://www.dnastar.com/)

Another software package whose use has been growing is DNASTAR™; this package has programs with which we can edit and compare sequences, deduce physicochemical characteristics, and prepare genetic constructions, restriction maps, etc.

Serial Cloner (http://serialbasics.free.fr/Serial_Cloner.html)

This program was developed at Institut Curie by Franck Perez. Serial Cloner is designed to provide molecular biology software for Macintosh and Windows users. It reads and writes DNA Strider compatible files and imports and exports files in universal FASTA format. It consists of graphical display tools and simple interfaces that help you analyze and build in a very intuitive way.

“The user interface is relatively simple to operate, in that it is within the reach of any user with advanced biology knowledge, who should be especially impressed with the huge amount of options available. With Serial Cloner you can, among other things, join DNA fragments obtained through PCR, manipulate the shRNA, or simply assemble fragments of different chains.”

Sequencher (https://www.genecodes.com/)

“Gene Codes Corporation is a privately-owned international firm, which specializes in bioinformatics software for genetic sequence analysis. Its flagship software product, Sequencher, is a sequencing software used throughout the world. Its targeted use is by researchers at academic and government labs as well as for biotechnology and pharmaceutical companies for DNA sequence assembly.”

Sequencher is a simple but useful program that allows us to:

– Analyze nucleic acid sequences in editing modes;
– Align sequences, with possible visualization of the various chromatograms;
– Perform manipulations and restriction maps.

The latest release of Sequencher highlights Gene Codes’ goal of providing researchers with powerful, easy-to-use DNA analysis software tools. Sequencher 5.3 adds RNA-Seq analysis to its long list of DNA sequence analysis features, as well as improvements to Sequencher Connections, its newest architecture for DNA sequence analysis.

FastPCR (https://primerdigital.com/fastpcr.html)

FastPCR is an integrated tool for PCR primer or probe design, in silico PCR, oligonucleotide assembly and analysis, alignment, and repeat searching, developed by PrimerDigital
(Kalendar et al. 2017a, b, c). PrimerDigital is a biotechnology company specialized in high-quality primer and probe design services and in software development that delivers state-of-the-art PCR software. From our wide experience in using FastPCR, we agree with the company's description of this software, which we summarize: “The FastPCR software is an integrated tools environment that provides comprehensive and professional facilities for designing any kind of PCR primers for standard, long-distance, inverse, real-time PCR (TaqMan, LUX-primer, Molecular Beacon, Scorpion), multiplex PCR, Xtreme Chain Reaction (XCR), group-specific (universal primers for genetically related DNA sequences) or unique (specific primers for each from genetically related DNA sequences), overlap extension PCR (OE-PCR) multi-fragments assembling cloning and Loop-mediated Isothermal Amplification (LAMP); single primer PCR (design of PCR primers from close located inverted repeat), automatically detecting SSR loci and direct PCR primer design, amino acid sequence degenerate PCR, Polymerase Chain Assembly (PCA), design multiplexed of overlapping and non-overlapping DNA amplicons that tile across a region(s) of interest for targeted next-generation sequencing (Molecular Tagging) and much more.”

Table 1 Tools for biological sequences characterization

Tool | Description | References/URL
BLAST: Basic Local Alignment Search Tool | Compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches | (Altschul et al. 1990; Lobo 2008)
FASTA | This tool provides sequence similarity searching against protein databases using the FASTA suite of programs | https://www.ebi.ac.uk/Tools/sss/fasta/ (Madeira et al. 2019)
HMMER | Sequence analysis identifies homologous protein or nucleotide sequences and performs sequence alignments | https://www.ebi.ac.uk/Tools/hmmer/search/phmmer (Potter et al. 2018; Finn et al. 2011)
Clustal Omega | A new multiple sequence alignment program that uses seeded guide trees and HMM profile-profile techniques to generate multiple alignments | https://www.ebi.ac.uk/Tools/msa/clustalo/ (Madeira et al. 2019)
Sequerome | Integrating the results of a BLAST sequence-alignment report with external research tools and servers that perform sequence manipulations | https://www.bioinformatics.org/sequerome/wiki/Main/HomePage (Ganesan et al. 2005)
ProtParam | Prediction of multiple physicochemical properties of proteins | https://web.expasy.org/protparam/ (Gasteiger et al. 2005)
JIGSAW | To find genes and predict splicing locations in the DNA sequence selected | (Allen and Salzberg 2005)
novoSNP | Used to find single-nucleotide variations in the DNA sequence | (Weckx et al. 2005)
ORF Finder | Searches for open-reading frames (ORFs) in the DNA sequence you enter and verifies the predicted protein using SMART BLAST or regular BLASTP | https://www.ncbi.nlm.nih.gov/orffinder/
GENEinfinity | Informational resources, tools, and calculators to facilitate work at the bench and analysis of biological data | http://www.geneinfinity.org/
PPP | Finds promoter regions and TFBS in prokaryotes and also finds sigma A binding sites in the upstream region | http://bioinformatics.biol.rug.nl/websoftware/ppp/ppp_start.php
Virtual Footprint | Designed to analyze transcription factor binding sites in whole bacterial genomes and their underlying regulatory networks | (Münch et al. 2005)
WebGeSTer | Database of intrinsic transcription terminators | http://pallab.serc.iisc.ernet.in/gester (Mitra et al. 2011)
Genscan | Used for predicting the locations and exon-intron structures of genes in genomic sequences | https://bio.tools/genscan (Burge and Karlin 1997)
Softberry Tools | Animal, plant, and bacterial genomes annotation and RNA and proteins structures prediction | http://www.softberry.com/
GeneID | Ab initio gene finding program used to predict genes along DNA sequences in a large set of organisms | (Parra et al. 2000)
SpliceView | Tools for prediction and analysis of protein-coding gene structure | http://bioinfo.itb.cnr.it/~webgene/wwwspliceview.html
GeneBuilder | Based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST | http://bioinfo.itb.cnr.it/~webgene/genebuilder.html
GeneFinder | To predict putative internal protein-coding exons in genomic DNA sequences | http://rulai.cshl.org/tools/genefinder/
HCPolyA | Tools for prediction and analysis of protein-coding gene structure | http://bioinfo.itb.cnr.it/~webgene/wwwHC_polya.html
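As a rough illustration of the kind of in silico check that primer-design software such as FastPCR automates, the following Python sketch computes the GC content of a primer and a melting-temperature estimate using the Wallace rule, Tm = 2(A+T) + 4(G+C). This is not FastPCR's algorithm: the Wallace rule is only a rule of thumb for short oligonucleotides, and dedicated tools generally rely on more accurate thermodynamic models. The function names, the example primer, and the length and GC thresholds are assumptions chosen for illustration.

```python
def gc_content(primer: str) -> float:
    """Return the percentage of G and C bases in the primer."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer sequence is empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Estimate Tm in degrees Celsius with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def check_primer(primer, min_len=18, max_len=25, min_gc=40.0, max_gc=60.0):
    """Return a list of warnings; an empty list means no rule was violated.

    The default thresholds are common rules of thumb, assumed here for
    illustration rather than taken from any particular program.
    """
    warnings = []
    if not min_len <= len(primer) <= max_len:
        warnings.append(
            f"length {len(primer)} nt is outside {min_len}-{max_len} nt"
        )
    gc = gc_content(primer)
    if not min_gc <= gc <= max_gc:
        warnings.append(f"GC content {gc:.1f}% is outside {min_gc}-{max_gc}%")
    return warnings


if __name__ == "__main__":
    primer = "AGCTTGACCTGAGGTCAATG"  # made-up 20-nt sequence
    print("GC content (%):", gc_content(primer))
    print("Wallace Tm (deg C):", wallace_tm(primer))
    print("Warnings:", check_primer(primer))
```

In practice, a pair of primers would be checked together so that their estimated melting temperatures are close to each other, which is one of the conditions that is much cheaper to verify on the computer than at the bench.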
The design of the primers has to be done very rigorously to guarantee the future of the PCR project: errors in the design may be noticed only long after much effort and money have already been spent on the project. That is why it is recommended to use software that allows us to establish the reaction conditions and the design of the primers in silico beforehand; FastPCR is free, user-friendly, and extremely versatile software that helps to avoid wasting time and money.

Biological sequences characterization

Annotation is the process of characterizing genes and their biological products in a DNA sequence. This process had to be automated because the number of genes is too large to be annotated by hand. Annotation is made possible by the fact that genes have recognizable start and stop regions. Sequence analysis refers to the study of the different characteristics of molecules such as nucleic acids or proteins that determine their specific functions. In the first instance, the sequences of the molecules are deposited in public biological databases (Mehmood et al. 2014).

Then, several tools can be used to predict, with high precision, their characteristics related to function, structure, and evolutionary history, or to identify their homologous counterparts. These analyses are quite popular because of their many applications in the biological sciences, their simplicity, and the quantity of information they provide about the gene/protein under study. Table 1 presents a list of tools for the characterization of biological sequences (Mehmood et al. 2014).

A standard procedure to deduce genetic information consists of obtaining fragments of genomic DNA by mechanical fragmentation or with restriction endonucleases, or of cDNA obtained from messenger RNA with the enzyme reverse transcriptase; these fragments can be cloned into cloning vectors for sequencing with the help of programs like pDRAW32 (https://www.acaclone.com/). The cloned sequences obtained are separated from the vector sequences (which served as a reference for the design of the sequencing primers) with the simple but useful VecScreen program (https://www.ncbi.nlm.nih.gov/tools/vecscreen/). The partial sequences obtained by sequencing multiple clones can be assembled in programs such as Sequencher (https://www.genecodes.com/) to obtain larger contigs. The information contained in these sequences can then be searched for open-reading frames (ORFs) with programs such as the ORF Finder described in Table 1 (https://www.ncbi.nlm.nih.gov/orffinder/). The homology of the proteins deduced from the ORFs with protein sequences deposited in the databases is then searched using programs such as BLAST, FASTA, or CLUSTAL. Then, the physical-chemical characteristics, the 3D structure, and the fate of the proteins in the cell, as well as their function, will be established with the programs and methodologies that are described below.

Prediction of subcellular protein location

The prediction of the subcellular location of proteins predicts the fate of a protein in the cell, using computational methods applied to the protein sequence.

There are several publicly available software tools that use different methods to predict the location of proteins (amino acid composition, signal peptide composition, physical-chemical composition, among others); this prediction is a very important part of the bioinformatics prediction of protein function and of genome annotation (Nielsen et al. 2019).

Software used for protein location predictions can be accessed via URL addresses as follows:

SignalP 3.0: http://www.cbs.dtu.dk/services/SignalP-3.0/

As written on the website, “SignalP 3.0 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks and hidden Markov models” (Bendtsen et al. 2004a, b).

CELLO2GO: http://cello.life.nctu.edu.tw/cello2go/

“Cello2go is a publicly available, web based system for screening various properties of a targeted protein and its subcellular localization” (Yu et al. 2014).

LocTree2: https://rostlab.org/services/loctree2/

“The method, LocTree2, predicts the location of all proteins in all areas of life. Similar to the previous method, LocTree, incorporates a system of Support Vector Machines organized hierarchically to mimic the mechanism of protein trafficking in cells” (Goldberg et al. 2012).

EuK-mPLoc 2.0: http://www.csbio.sjtu.edu.cn/bioinf/euk-multi-2/

Euk-mPLoc 2.0 predicts the subcellular localization of eukaryotic proteins, including those with multiple sites (Chou and Shen 2010).

ESLpred: http://www.imtech.res.in/raghava/eslpred/index.html
ESLpred is a tool for predicting the subcellular localization of proteins using support vector machines. The predictions are based on dipeptide and amino acid composition and physicochemical properties (Horler et al. 2009).

SecretomeP: http://www.cbs.dtu.dk/services/SecretomeP/

The SecretomeP 2.0 server produces ab initio predictions of non-classical, i.e., not signal peptide-triggered, protein secretion. The method queries a large number of other feature prediction servers to obtain information on various post-translational and localizational aspects of the protein, which are integrated into the final secretion prediction (Bendtsen et al. 2004a, b).

Proteins characterization

After decoding the open-reading frame of a gene, a series of bioinformatics tools can be used to characterize the deduced sequence of the protein. A search on the Expasy Proteomics Server website (http://expasy.org/tools) with a nucleotide sequence allows us to identify and characterize proteins; identify motifs, patterns, and profiles; infer their stability, cell location, or function; make predictions of secondary and tertiary structures; look for similar sequences deposited in databases and compare them; and establish phylogenetic relationships.

The detection of the physical-chemical characteristics of proteins can be carried out in PROSITE (http://prosite.expasy.org/scanprosite/), in the neural network system of the Pôle BioInformatique Lyonnais/Network Protein Sequence Analysis, or in the DiANNA 1.1 application (http://clavius.bc.edu/~clotelab/DiANNA/); the prediction of post-translational modifications can be carried out on the Center for Biological Sequence Analysis website (http://www.cbs.dtu.dk/services).

ProDom is a comprehensive database of protein domain families generated from the global comparison of all available protein sequences. Recent improvements include the use of three-dimensional (3D) information from the SCOP database and a completely redesigned web interface (http://www.toulouse.inra.fr/prodom.html).

Phylogenetic analysis

Phylogenetic analyses are procedures used to reconstruct the evolutionary relationships among a group of related molecules or organisms, and to predict certain characteristics of a molecule whose functions are not known (Mehmood et al. 2014). The underlying principle of phylogeny is to group living organisms according to their degree of similarity. Phylogenetic comparative analysis is usually used to control for the lack of statistical independence between species. Phylogenetic tools are usually used to test various evolutionary hypotheses, and they are indispensable to functional genomics (Mehmood et al. 2014; Khan et al. 2014).

MEGA: Molecular Evolutionary Genetics Analysis (MEGA) is computer software for conducting statistical analysis of molecular evolution and for constructing phylogenetic trees, highly recommended for sequence alignment and phylogeny inference (https://www.megasoftware.net/). We share the opinion of Kumar et al. (2016) that MEGA includes a large repertoire of programs for assembling sequence alignments, inferring evolutionary trees, estimating genetic distances and diversities, inferring ancestral sequences, computing time trees, and testing selection. Over the last 25 years, MEGA’s use in evolutionary analysis has been cited in over one hundred thousand studies in diverse biological fields (Kumar et al. 2018).

MOLPHY: Molecular phylogenetic analysis tool (https://sbgrid.org/software/titles/molphy).
PAML: Package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood (https://bio.tools/paml).
PHYLIP: PHYLogeny Inference Package (PHYLIP). One of the most useful and widely used free packages of programs for inferring evolutionary trees (phylogenies). The author is Joseph Felsenstein, Professor at the University of Washington, Seattle. It consists of 35 programs that include distance matrix and maximum likelihood methods, as well as the calculation of statistical support for clades (bootstrapping) and consensus trees, based on the following types of data: molecular sequences, gene frequencies, restriction sites and fragments, and distance matrices (http://evolution.genetics.washington.edu/phylip.html).
Jalview: Program for multiple sequence alignment editing (https://www.jalview.org/).

Biological sequence databases

Biological sequence databases are a vast collection of biological information, such as sequences of nucleotides, proteins, and macromolecular structures. The information stored in these databases is not only important for future applications but also serves as a tool for primary sequence analysis. The submission and storage of this information, so that it is freely available to the scientific community, led to the development of several databases worldwide. The databases contain varied information; therefore, they are classified as primary and
secondary, according to the information stored. Primary databases are composed of information derived directly from basic scientific research on sequencing. SWISSPROT, UniProt, GenBank, and PDB are examples of primary databases. Secondary databases contain information derived from the interpretation of the information stored in the primary databases. SCOP, CATH, PROSITE, and eMOTIF are examples of secondary databases (Koonin and Galperin 2003) (Table 2).

Proteins structure prediction tools

Proteins are composed of polypeptides, which in turn are polymers of amino acids that fold to create a three-dimensional (3D) structure. Folding into its correct form is a prerequisite for any protein to perform its biological function; therefore, in order to understand the functions of a specific protein, information about its three-dimensional structure is needed (see Table 3).

Molecular interactions

Proteins rarely perform their functions in isolation and therefore interact with other molecules to run a particular process. Understanding how biomolecules interact with other molecules can be used in purification techniques as well as in drug development. It is also essential to understand the interactions between molecules in order to elucidate the biological functions of a molecule. For example, interactions between proteins have a key role in cellular activities such as signaling, transport, metabolism, and various biochemical processes (Table 4).

Molecular dynamics simulations

Biological activity is the result of molecular interactions. This behavior of molecules can be studied with bioinformatics tools usually referred to as molecular dynamics simulation tools. They aim to provide detailed information on the dynamics of the processes that occur in biological systems (refer to Table 5).

Drug design

Before bioinformatics tools, scientists resorted to chemistry, pharmacology, and clinical sciences to discover new compounds. Traditional processes are time-consuming and costly. Bioinformatics came to facilitate this complex process and has a vital role in the discovery and design of new drugs, owing to the quick analysis of molecules in a computer when compared with experimentation (Lekamwasam and Liyanage 2013; Chordia and Kumar 2018) (refer to Table 6).

Integrative bioinformatics modules

As already mentioned, the amount of biological data grows exponentially, and these data are spread over a multitude of public and private repositories and are stored in different formats. This makes it difficult to search for these data and to carry out the analyses necessary to deduce new knowledge from the set of deposited data. Integrative bioinformatics attempts to solve this problem by providing unified access to life science data.

The several directions that may lead to breaking the bottleneck of integrative bioinformatics are described by Chen et al. (2019) and include:

– “Integration of multiple biological data towards systems biology. Different omics data is reflecting different aspects of the biological problem. Often, to solve a problem, there are many different methods developed by many groups. These methods may perform differently, some good, some bad. Combing with big data, and other approaches, artificial intelligence (AI) has been successfully applied in bioinformatics, especially in the field of biomedical image analysis;
– Computing infrastructure development. Integrative Bioinformatics in the big data era requires a more advanced IT environment. To facility the related computing and visualization demands, both hardware (e.g. GPU) and software (e.g. Tensor flow) are developing. Supercomputers are used. Cloud services are provided by more and more institutes and big companies.”

Many ready-made professional commercial bioinformatics programs are offered by development companies using modern sequencing technologies. Bioinformatics groups usually prefer to use ready-made modules and write scripts to pass data between them. Therefore, two approaches to data analysis exist in parallel: ready-made commercial products, and scripts linking different ready-made mini-programs. Both approaches are necessary, so the most important thing is the support and updating of ready-made programs and modules. In the following list, we present the most important modules for integrated bioinformatics:

– Unipro UGENE: UGENE is free bioinformatics software for multiple sequence alignment, genome sequencing data analysis, and amino acid sequence visualization. Unipro UGENE is a multiplatform open-source software with the main goal of assisting molecular biologists without much expertise in bioinformatics to manage, analyze, and

Table 2 Biological databases

Database | Description | References/URL
Nucleotide databases
DNA Data Bank of Japan (DDBJ) | Member of the International Nucleotide Sequence Databases (INSD); one of the largest sources of information regarding nucleotide sequences. | https://www.ddbj.nig.ac.jp/index-e.html (Miyazaki et al. 2003)
European Nucleotide Archive (ENA) | European primary repository for nucleotide sequences | https://www.ebi.ac.uk/ena (Leinonen et al. 2011)
GenBank | Member of the International Nucleotide Sequence Databases (INSD); one of the largest sources of information regarding nucleotide sequences. | http://www.ncbi.nlm.nih.gov/genbank/ (Benson et al. 2010; NCBI Resource Coordinators 2018)
Rfam | A collection of RNA families, each represented by multiple sequence alignments | https://rfam.xfam.org/ (Kalvari et al. 2018)
Protein databases
UniProt | The UniProt Consortium is a collaboration between the European Bioinformatics Institute (EBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB). Contains the manually annotated protein sequence section Swiss-Prot. | https://www.uniprot.org/ (UniProt Consortium 2008)
Protein Data Bank | Database for biomolecule structures like proteins. Structures determined by experimentation. | https://www.rcsb.org/pdb/home/sitemap.do (Kinjo et al. 2012)
Prosite | A large collection of biologically meaningful signatures that are described as patterns or profiles. | https://prosite.expasy.org/ (Sigrist et al. 2012)
Pfam | Large collection of protein families, represented by multiple sequence alignments and hidden Markov models (HMMs). | https://pfam.xfam.org/ (El-Gebali et al. 2019)
InterPro | Provides functional analysis of proteins by classifying them into families and predicting domains and important sites. | https://www.ebi.ac.uk/interpro
Proteomics Identification Database (PRIDE) | Contains information on functional characterization and post-translation modifications. | https://www.ebi.ac.uk/pride/ (Perez-Riverol et al. 2019)
Genomic databases
Ensembl | Contains eukaryotic genomes, including human, rat and other vertebrates, and many tools to work and compare genomes. | https://www.ensembl.org/index.html (Hunt et al. 2018; Spooner et al. 2018)
PIR | Integrated tool that supports genomic and proteomic research. | https://proteininformationresource.org/ (Chen et al. 2011)

visualize their data. It provides visualization modules for biological objects such as annotated genome sequences, next-generation sequencing (NGS) assembly data, multiple sequence alignments, phylogenetic trees, and 3D structures. Availability and implementation: UGENE binaries are freely available for MS Windows, Linux, and Mac OS X (Okonechnikov et al. 2012);
– Vista: “Vista is a comprehensive suite of programs and databases for comparative analysis of genomic sequences. There are two ways of using VISTA - you can submit your own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species” (Frazer et al. 2002). http://genome.lbl.gov/vista/index.shtml;
– Qlucore: Qlucore Omics Explorer (QOE) is next-generation bioinformatics software for research in the life sciences. Qlucore Omics Explorer is built for fast and easy analysis of many different types of data, and a wide range of application areas are supported. With Qlucore Omics Explorer, you can examine and analyze data from gene expression experiments, DNA methylation data, proteomics data, and next-generation sequencing (NGS) data (https://www.qlucore.com/);

Table 3 Tools for analyzing protein structures and functionality

Tool | Description | References/URL
SWISS-MODEL | A fully automated protein structure homology-modeling server. ... the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer) | https://swissmodel.expasy.org/ (Waterhouse et al. 2018)
GOpenMol | Tool for the visualization and analysis of molecular structures and their chemical properties | https://www.softpedia.com/get/Science-CAD/gOpenMol.shtml
Cn3d | Application for a web browser that allows you to view 3-dimensional structures from GenBank Entrez Structure database | https://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml (Wang et al. 2000)
CATH | Publicly available online resource that provides information on the evolutionary relationships of protein domains | https://www.cathdb.info/ (Dawson et al. 2017)
RaptorX | Prediction of protein structures | http://raptorx.uchicago.edu/ (Källberg et al. 2012)
JPRED | Protein secondary structure prediction server | http://www.compbio.dundee.ac.uk/jpred/ (Drozdetskiy et al. 2015)
PHD | Prediction of protein structures | https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_phd.html (Rost et al. 1994)
HMMSTR | Prediction of structural correlations in proteins | http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php (Bystroff and Shao 2002)
APSSP2 | Predicts the secondary structure of proteins from their amino acid sequence | http://crdd.osdd.net/raghava/apssp2/ (Raghava 2002)
MODELLER | Prediction of 3D protein structures based on comparative models | https://salilab.org/modeller/ (Webb and Sali 2016)
Phyre and Phyre2 | Prediction of protein structures on the web | http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index (Kelley et al. 2015)
Prot pi | Web application for calculating physicochemical parameters of proteins and peptides | https://www.protpi.ch/
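
As an illustration of the kind of physicochemical parameter that a tool such as Prot pi (Table 3) reports, the following minimal Python sketch estimates the average molecular weight of an unmodified peptide from its sequence. It is not Prot pi's implementation: the function name is hypothetical, and the table of average residue masses is an assumption supplied here for the 20 standard amino acids only.

```python
# Average residue masses in daltons for the 20 standard amino acids
# (assumed values; each is the free amino acid mass minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free termini


def peptide_molecular_weight(sequence: str) -> float:
    """Return the average molecular weight (Da) of a linear peptide.

    Hypothetical helper: ignores modifications, disulfide bonds,
    and non-standard residues.
    """
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[res] for res in sequence) + WATER_MASS


if __name__ == "__main__":
    # Example call with an arbitrary short peptide.
    print(round(peptide_molecular_weight("ACDEFGHIK"), 2))
```

Summing residue masses and adding one water keeps the calculation independent of sequence order; a production tool would also handle modifications and isotopic (monoisotopic) masses, which this sketch leaves out.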

– CIBI: The CRCM’s Integrative BioInformatics (Cibi) is a technological platform of the Centre de Recherche en Cancérologie de Marseille. The Cibi platform offers a wide range of expertise in bioinformatics (large-scale data integration, development of specific analysis) and develops state-of-the-art bioinformatics pipelines such as NGS (next-generation sequencing) data analysis and integration (ChIP-Seq, RNA-Seq, SC-RNA-Seq, variant analysis for research and cancer diagnostics) (https://cibi.marseille.inserm.fr/);
– iMAP: iMAP is an integrated bioinformatics and visualization pipeline for microbiome data analysis. According to Buza et al., the iMAP tool wraps functionalities for metadata profiling, quality control of reads, sequence processing and classification, and diversity analysis of operational taxonomic units. This pipeline is also capable of generating web-based progress reports for enhancing an approach referred to as review-as-you-go (RAYG). The iMAP pipeline integrates several functionalities for better identification of microbial communities present in a given sample. The pipeline performs in-depth quality control that guarantees high-quality results and accurate conclusions. The vibrant visuals produced by the pipeline facilitate a better understanding of the complex and multidimensional microbiome data (Buza et al. 2019);
– MIGenAS: Migenas is a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max-Planck Society available at www.migenas.org (Rampp et al. 2006);
– Methy-Pipe: Methy-Pipe is an integrated bioinformatics pipeline for whole-genome bisulfite sequencing data analysis. According to Jiang et al., Methy-Pipe uses the Burrows-Wheeler transform (BWT) algorithm to directly align bisulfite sequencing reads to a reference genome and implements a novel sliding window-based approach with statistical methods for the identification of differentially methylated regions (DMRs). Methy-Pipe is a useful pipeline that can process whole-genome bisulfite sequencing data in an efficient, accurate, and user-friendly

Table 4 Tools for studying molecular interactions

Tool | Description | References/URL
SMART | Provides varied information about the protein in question | http://smart.embl-heidelberg.de/help/latest.shtml
AutoDock | Predicts protein-ligand interactions | http://autodock.scripps.edu/downloads/autodock-registration/autodock-4-2-download-page/
HADDOCK | Describes the interaction between protein-protein, protein-DNA | https://haddock.science.uu.nl/
BIND | Biomolecular Interaction Network Database | (Bader et al. 2003)
STRING | Database of known protein interactions and prediction | https://string-db.org/ (Szklarczyk et al. 2019)
MIMO | Tool for molecular interactions | http://www.mybiosoftware.com/mimo-1-0-molecular-interaction-maps-overlap.html (Di-Lena et al. 2013)
IntAct | Provides a freely available, open-source database system and analysis tools for molecular interaction data | https://www.ebi.ac.uk/intact/ (Orchard et al. 2014)
PathBLAST | Research of interactions between proteins | http://www.pathblast.org/ (Ideker et al. 2004)

manner. Software and test dataset are available at http://sunlab.lihs.cuhk.edu.hk/methy-pipe/ (Jiang et al. 2014);
– IGV: The Integrative Genomics Viewer (IGV) is a high-performance, easy-to-use, interactive tool for the visual exploration of genomic data. It supports flexible integration of all the common types of genomic data and metadata, investigator-generated or publicly available, loaded from local or cloud sources (https://igv.org/);
– Bioconductor: Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language and is open source and open development. It has two releases each year, and an active user community. Bioconductor is also available as an AMI (Amazon Machine Image) and as Docker images (https://www.bioconductor.org/);
– Geneious: Geneious is a very useful and popular DNA, RNA and protein sequence alignment, assembly and analysis software platform, integrating bioinformatic and molecular biology tools into a simple interface. These tools are created by the company Biomatters, headquartered in New Zealand with offices in the USA and users in 125 countries worldwide; its solutions enhance productivity in more than 4000 universities, research institutes, and businesses. Biomatters creates powerful, integrated, and visually appealing bioinformatics solutions, with a strong emphasis on ease of use and overall user experience (https://www.geneious.com/).

Trends and future of bioinformatics

Bioinformatics has developed significantly since the development and establishment of molecular cloning methodologies and the automation of DNA sequencing methods. With the development and application of the new generation sequencing platforms, large-scale sequencing of genomes and transcriptomes began, which contributed to the development of bioinformatics methodologies and tools at a level that went beyond academic centers and which includes medical biotechnology, gene therapy, agriculture biotechnology, animal biotechnology, environmental biotechnology, and forensic biotechnology. Currently, bioinformatics has wide application in genomics, proteomics, metabolomics, transcriptomics, and molecular phylogenomics. The development of biomarkers for the creation of safer and more personalized drugs is leading to greater development and use of bioinformatics. The sequencing of personal genomes and metagenomics projects

Table 5 Molecular dynamics simulation tools

Tool | Description | References/URL
Abalone | Program focused on molecular dynamics of biopolymers. | http://www.biomolecular-modeling.com/Abalone/index.html
Ascalaph | Program for modeling for molecular design and simulations | http://www.biomolecular-modeling.com/Ascalaph/index.html
Amber | Package of programs for molecular dynamics simulations of proteins and nucleic acids | https://ambermd.org/ (Salomon-Ferrer et al. 2013)
FoldX | Provides quantitative estimates of molecular interactions | http://foldxsuite.crg.eu/
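
Molecular dynamics programs of the kind listed in Table 5 advance particle positions and velocities in small time steps under a force model. The sketch below illustrates that idea with velocity Verlet, a commonly used integration scheme, applied to a single particle on a harmonic spring. It is a toy under stated assumptions and is not taken from Amber or any other listed tool: the function names, the unit mass, and the spring constant are all illustrative.

```python
import math

MASS = 1.0             # assumed particle mass (arbitrary units)
SPRING_CONSTANT = 1.0  # assumed force constant of the toy potential


def harmonic_force(x: float) -> float:
    """Force from the toy potential U(x) = 0.5 * k * x**2."""
    return -SPRING_CONSTANT * x


def total_energy(x: float, v: float) -> float:
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * MASS * v * v + 0.5 * SPRING_CONSTANT * x * x


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D; return the list of (x, v) states."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


if __name__ == "__main__":
    states = velocity_verlet(1.0, 0.0, harmonic_force, MASS, 0.01, 1000)
    x_end, v_end = states[-1]
    # For this toy system the exact position at time t is cos(t).
    print(x_end, math.cos(10.0), total_energy(x_end, v_end))
```

The harmonic spring is chosen only because its exact solution is known, which makes it easy to check that the integrator keeps the total energy nearly constant; real simulations replace `harmonic_force` with a force field over many atoms.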

Table 6 Databases for target drugs

Database | Description | References/URL
Potential Drug Target Database (PDTD) | Target drug database | http://www.dddc.ac.cn/pdtd/ (Zhang et al. 2018)
Drug Bank | Online database containing information on drugs and drug targets | https://www.drugbank.ca/ (Wishart et al. 2018)
Therapeutic Target Database (TTD) | Collection of therapeutic proteins | http://db.idrblab.net/ttd/ (Yunxia et al. 2019)
TDR Target Database | Identification and prioritization of molecular targets for drug development, focusing on pathogens responsible for neglected human diseases | https://tdrtargets.org/ (Magariños et al. 2012)
MATADOR: Manually Annotated Targets and Drugs Online Resource | Resources for exploring drug-target relationships | http://matador.embl.de/ (Günther et al. 2008)
TB Drug Target Database | Database specialized in drugs and target proteins for tuberculosis (TB) treatment | https://www.bioinformatics.org/tbdtdb/
DrugPort | Structure information available in the PDB for molecules of drugs | http://www.ebi.ac.uk/thornton-srv/databases/drugport/
ChEMBL | Database of bioactive molecules with drug-like properties | https://www.ebi.ac.uk/chembl/ (Mendez et al. 2019)

will increase significantly in the coming years, with the consequent intervention of bioinformatics. We think that the future of bioinformatics will involve specialization in different areas that go into greater scientific depth, down to the level of nanopores and even the atom itself.

Conclusion

Bioinformatics is a relatively new discipline that has progressed very quickly in recent years. It is a discipline that makes it possible to test hypotheses virtually, which allows better knowledge to be gained before proceeding with expensive studies. Despite the development of many tools for genomic and proteomic analysis, inference of structures, drug design, and simulations of molecular dynamics, none can be considered the “perfect” tool. Bioinformatics tools provide more accurate results, which allows reliable interpretations. Perspectives in the field of bioinformatics include contributions to understanding the human genome, leading to the discovery of new drugs and specific therapies. It is essential that bioinformatics and other disciplines move side by side to understand biological systems and the consequent development of human well-being. During the first years of biotechnology, the most important thing was to obtain biological data. With the development of the methods and techniques of next-generation sequencing (NGS), the paradigm shifted to the ability to analyze the large amount of data resulting from the sequencing of genes and genomes. However, with the development of bioinformatics tools in recent years, the most important thing is to know what we want in research, choose the appropriate tool, work hard with it, and know how to correctly interpret the results provided.

Authors’ contributions I.B. and A.C. designed and prepared the manuscript; A.C. wrote the manuscript; and I.B. made the corrections including the English text and confirmation of the bibliography and URLs.

Funding The authors are grateful to the Foundation for Science and Technology (FCT, Portugal) and FEDER under Programme PT2020 for financial support to CIMO (UID/AGR/00690/2019).

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Human and animal rights No human participants or animals were involved in this research.

Informed consent This manuscript is original and submitted with the consent of all authors.

References

Allen JE, Salzberg SL (2005) JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics 21:3596–3603
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Bader GD, Betel D, Hogue CW (2003) BIND: the biomolecular interaction network database. Nucleic Acids Res 31(1):248–250. https://doi.org/10.1093/nar/gkg056
Bendtsen JD, Nielsen H, von Heijne G, Brunak S (2004a) Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340:783–795
Bendtsen JD, Jensen LJ, Blom N, Von Heijne G, Brunak S (2004b) Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel 17(4):349–356
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2010) GenBank. Nucleic Acids Res 38(Database issue):D46–D51. https://doi.org/10.1093/nar/gkp1024
Bonfield JK, Whitwham A (2010) Gap5–editing the billion fragment sequence assembly. Bioinformatics 26(14):1699–1703

Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94
Buza T, Tonui T, Stomeo F, Tiambo C, Katani R, Schilling M, Lyimo B, Gwakisa P, Cattadori IM, Buza J, Kapur V (2019) iMAP: an integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics 20:374
Bystroff C, Shao Y (2002) Fully automated ab initio protein structure prediction using I-SITES, HMMSTR and ROSETTA. Bioinformatics 18(Suppl 1):S54–S61
Chen C, Natale DA, Finn RD, Huang H, Zhang J, Wu CH, Mazumder R (2011) Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation. PLoS One 6(4):e18910
Chen M, Hofestädt R, Taubert J (2019) Integrative bioinformatics: history and future. Journal of Integrative Bioinformatics 16. https://doi.org/10.1515/jib-2019-2001
Chordia N, Kumar A (2018) Bioinformatics in drug discovery. SF Protein Sci J 1:1
Chou K, Shen H (2010) A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0. PLoS One
Dawson NL, Lewis TE, Das S, Lees JG, Lee D, Ashford P, Orengo CA, Sillitoe I (2017) CATH: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Res 45:D289–D295. https://doi.org/10.1093/nar/gkw1098
Di-Lena P, Wu G, Martelli PL, Casadio R, Nardini C (2013) MIMO: an efficient tool for molecular interaction maps overlap. BMC Bioinformatics 14:159. https://doi.org/10.1186/1471-2105-14-159
Drozdetskiy A, Cole C, Procter J, Barton GJ (2015) JPred4: a protein secondary structure prediction server. Nucleic Acids Res 43(Web Server issue):W389–W394. https://doi.org/10.1093/nar/gkv332
El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer ELL, Hirsh L, Paladin L, Piovesan D, Tosatto SCE, Finn RD (2019) The Pfam protein families database in 2019. Nucleic Acids Res. https://doi.org/10.1093/nar/gky995
Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39:W29–W37
Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I (2002) VISTA: computational tools for comparative genomics. Nucleic Acids Res 32(Web Server issue):W273–W279
Ganesan N, Bennett NF, Velauthapillai M, Pattabiraman N, Squier R, Kalyanasundaram B (2005) Web-based interface facilitating sequence-to-structure analysis of BLAST alignment reports. Biotechniques 39(186):188
Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A (2005) In: The proteomics protocols handbook. Protein identification and analysis tools on the ExPASy server. Springer, pp 571-607
Goldberg T, Hamp T, Rost B (2012) LocTree2 predicts localization for all domains of life. Bioinformatics 28(18):i458–i465. https://doi.org/10.1093/bioinformatics/bts390
Goldman D, Domschke K (2014) Making sense of deep sequencing. Int J Neuropsychopharmacol 17(10):1717–1725. https://doi.org/10.1017/S1461145714000789
Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ, Schneider R, Skoblo R, Russell RB, Bourne PE, Bork P, Preissner R (2008) SuperTarget and Matador: resources for exploring drug-target relationships. Nucleic Acids Res 36(Database issue):D919–D922. https://doi.org/10.1093/nar/gkm862
Heather JM, Chain B (2016) The sequence of sequencers: the history of sequencing DNA. Genomics 107(1):1–8. https://doi.org/10.1016/j.ygeno.2015.11.003
Horler RS, Butcher A, Papangelopoulos N, Ashton PD, Thomas GH (2009) EchoLOCATION: an in silico analysis of the subcellular locations of Escherichia coli proteins and comparison with experimentally derived locations. Bioinformatics 25(2):163–166
Hunt SE, McLaren W, Gil L, Thormann A, Schuilenburg H, Sheppard D, Parton A, Armean IM, Trevanion SJ, Flicek P, Cunningham F (2018) Ensembl variation resources. Database Volume 2018. https://doi.org/10.1093/database/bay119
Ideker T, Kelley, Shamir R, Karp R (2004) Identification of protein complexes by comparative analysis of yeast and bacterial protein interaction data. Proceedings: RECOMB 2004, pp. 282-289; J Comput Biol 12: 835–846, 2005
Jiang P, Sun K, Lun FMF, Guo AM, Wang H, Chan KCA, Rossa WK, ChiuY M, Lo D, Sun H (2014) Methy-Pipe: an integrated bioinformatics pipeline for whole genome bisulfite sequencing data analysis. PLoS One 9(6):e100360. https://doi.org/10.1371/journal.pone.0100360
Kalendar R, Khassenov B, Ramankulov Y, Samuilova O, Ivanov KI (2017a) FastPCR: an in silico tool for fast primer and probe design and advanced sequence analysis. Genomics 109:312–319
Kalendar R, Muterko A, Shamekova M, Zhambakin K (2017b) In silico PCR tools a fast primer, probe and advanced searching. Methods Mol Biol 1620:1–31. https://doi.org/10.1007/978-1-4939-7060-5_1
Kalendar R, Tselykh T, Khassenov B, Ramanculov EM (2017c) Introduction on using the FastPCR software and the related Java web tools for PCR, in silico PCR, and oligonucleotide assembly and analysis. Methods Mol Biol 1620:33–64. https://doi.org/10.1007/978-1-4939-7060-5_2
Källberg M, Wang H, Wang S, Peng J, Wang Z, Lu H, Xu J (2012) Template-based protein structure modeling using the RaptorX web server. Nat Protoc 7:1511–1522
Kalvari I, Nawrocki EP, Argasinska J, Quinones-Olvera N, Finn RD, Bateman A, Petrov AI (2018) Non-coding RNA analysis using the Rfam database. Curr Protoc Bioinformatics 62(1):e51. https://doi.org/10.1002/cpbi.51
Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJ (2015) The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 10:845–858
Khan FA, Phillips CD, Baker RJ (2014) Timeframes of speciation, reticulation, and hybridization in the bulldog bat explained through phylogenetic analyses of all genetic transmission elements. Syst Biol 63:96–110
Kinjo AR, Suzuki H, Yamashita R, Ikegawa Y, Kudou T, Igarashi R, Kengaku Y, Cho H, Standley MD, Nakagawa A, Nakamura H (2012) Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format. Nucleic Acids Res 40:D453–D460
Koonin EV, Galperin MY (2003) Sequence - evolution - function: computational approaches in comparative genomics. Chapter 3, Information Sources for Genomics. Kluwer Academic, Boston. https://www.ncbi.nlm.nih.gov/books/NBK20256/
Kulski JK (2016) Next-generation sequencing – an overview of the history, tools, and “Omic” applications, next generation sequencing-advances, applications and challenges. InTech
Kumar S, Stecher G, Li M, Knyaz C, Tamura K (2018) MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol 35(6):1547–1549. https://doi.org/10.1093/molbev/msy096
Leinonen R, Akhtar R, Birney E, Bower L, Cerdeno-Tárraga A, Cheng Y, Cleland I, Faruque N, Goodgame N, Gibson R, Hoad G, Jang M, Pakseresht N, Plaister S, Radhakrishnan R, Reddy K, Sobhany S, Ten Hoopen P, Vaughan R, Zalunin V, Cochrane G (2011) The European nucleotide archive. Nucleic Acids Res 39:D28–D31
Lekamwasam S, Liyanage C (2013) Editorial. Galle Medical Journal 18(1). https://doi.org/10.4038/gmj.v18i1.5520

Li MW, Qi X, Ni M, Lam HM (2013) Silicon era of carbon-based life: application of genomics and bioinformatics in crop stress research. Int J Mol Sci 14(6):11444–11483
Lobo I (2008) Basic Local Alignment Search Tool (BLAST). Nature Education 1(1):215
Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, Lopez R (2019) The EMBL-EBI search and sequence analysis tools APIs. Nucleic Acids Res 47(W1):W636–W641. https://doi.org/10.1093/nar/gkz268
Magariños MP, Carmona SJ, Crowther GJ, Ralph SA, Roos DS, Shanmugam D, Van Voorhis WC, Agüero F (2012) TDR Targets: a chemogenomics resource for neglected diseases. Nucleic Acids Res (Database issue):D1118–D1127. https://doi.org/10.1093/nar/gkr1053
Martins IM, Matos M, Costa R, Silva F, Pascoal A, Estevinho LM, Choupina AB (2014) Transglutaminases: recent achievements and new sources. Appl Microbiol Biotechnol 98:6957–6964
Martins IM, Meirinho S, Costa R, Cravador A, Choupina A (2019) Cloning, characterization, in vitro and in planta expression of a necrosis-inducing Phytophthora protein 1 gene npp1 from Phytophthora cinnamomi. Mol Biol Rep 46:6453–6462
Mehmood MA, Sehar U, Ahmad N (2014) Use of bioinformatics tools in different spheres of life sciences. J Data Mining Genomics Proteomics 5:158. https://doi.org/10.4172/2153-0602.1000158
Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, Magariños MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Marañón M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47(D1):D930–D940. https://doi.org/10.1093/nar/gky1075
Mitra A, Kesarwani AK, Pal D, Nagaraja V (2011) WebGeSTer DB—a transcription terminator database. Nucleic Acids Res 39:129–135
Miyazaki S, Sugawara H, Gojobori T, Tateno Y (2003) DNA Data Bank of Japan (DDBJ) in XML. Nucleic Acids Res 31:13–16
Münch R, Hiller K, Grote A, Scheer M, Klein J, Schobert M, Jahn D (2005) Virtual footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes. Bioinformatics 21:4187–4189
Nielsen H, Tsirigos KD, Brunak S, von Heijne G (2019) A brief history of protein sorting prediction. Protein J 38:200–216. https://doi.org/10.1007/s10930-019-09838-3
Okonechnikov K, Golosova O, Fursov M, the UGENE team (2012) Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics 28(8):1166–1167. https://doi.org/10.1093/bioinformatics/bts091
Orchard S, Ammari M, Aranda B, Breuza L, Briganti L, Broackes-Carter F, Campbell NH, Chavali G, Chen C, del-Toro N, Duesbury M, Dumousseau M, Galeota E, Hinz U, Iannuccelli M, Jagannathan S, Jimenez R, Khadake J, Lagreid A, Licata L, Lovering RC, Meldal B, Melidoni AN, Milagros M, Peluso D, Perfetto L, Porras P, Raghunath A, Ricard-Blum S, Roechert B, Stutz A, Tognolli M, van Roey K, Cesareni G, Hermjakob H (2014) The MIntAct project-IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res 42(D1):D358–D363. https://doi.org/10.1093/nar/gkt1115
Parra G, Blanco E, Guigó R (2000) GeneID in Drosophila. Genome Res 10(4):511–515. https://doi.org/10.1101/gr.10.4.511
Perez-Riverol Y, Csordas A, Bai J, Bernal-Llinares M, Hewapathirana S, Kundu DJ, Inuganti A, Griss J, Mayer G, Eisenacher M, Pérez E, Uszkoreit J, Pfeuffer J, Sachsenberg T, Yilmaz S, Tiwary S, Cox J, Audain E, Walzer M, Jarnuczak AF, Ternent T, Brazma A, Vizcaíno JA (2019) The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res 47(D1):D442–D450. https://doi.org/10.1093/nar/gky1106
Potter SC, Luciani A, Eddy SR, Park Y, Lopez R, Finn RD (2018) HMMER web server: 2018 update. Nucleic Acids Res 46:W200–W204
Raghava GPS (2002) APSSP2 : a combination method for protein secondary structure prediction based on neural network and example based learning. CASP5. A-132
Rampp M, Soddemann T, Lederer H (2006) The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis. Nucleic Acids Res 34:W15–W19. https://doi.org/10.1093/nar/gkl254
Resource Coordinators NCBI (2018) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 46(D1):D8–D13. https://doi.org/10.1093/nar/gkx1095
Rodger S, David PJ, James KB (2003a) Analysing sequences using the Staden package and EMBOSS. In: Krawetz SA, Womble DD (eds) Introduction to Bioinformatics. A Theoretical and Practical Approach. Human Press Inc., Totawa, p 07512
Rodger S, David PJ, James KB (2003b) Managing sequencing projects in the GAP4 environment. In: Krawetz SA, Womble DD (eds) Introduction to Bioinformatics. A Theoretical and Practical Approach. Human Press Inc., Totawa, p 07512
Rost B, Sander C, Schneider R (1994) PHD–an automatic mail server for protein secondary structure prediction. Comput Appl Biosci 10:53–60
Salomon-Ferrer R, Case DA, Walker RC (2013) An overview of the Amber biomolecular simulation package. WIREs Comput Mol Sci 3:198–210
Sigrist CJA, de Castro E, Cerutti L, Cuche BA, Hulo N, Bridge A, Bougueleret L, Xenarios I (2012) New and continuing developments at PROSITE. Nucleic Acids Res 41:D344–D347. https://doi.org/10.1093/nar/gks1067
Spooner W, McLaren W, Slidel T, Finch DK, Butler R, Campbell J, Eghobamien L, Rider D, Kiefer CM, Robinson MJ, Hardman C, Cunningham F, Vaughan T, Flicek P, Huntington CC (2018) Haplosaurus computes protein haplotypes for use in precision drug design. Nat Commun 9:4128. https://doi.org/10.1038/s41467-018-06542-1
Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, Jensen LJ, Mering CV (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 47(D1):D607–D613. https://doi.org/10.1093/nar/gky1131
UniProt Consortium (2008) The universal protein resource (UniProt). Nucleic Acids Res 36:D190–D195
Wang Y, Geer LY, Chappey C, Kans JA, Bryant SH (2000) Cn3D: sequence and structure views for Entrez. Trends Biochem Sci 25(6):300–302
Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, Heer FT, de Beer TAP, Rempfer C, Bordoli L, Lepore R, Schwede T (2018) SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 46(W1):W296–W303
Webb B, Sali A (2016) Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics 54:5.6.1–5.6.37 John Wiley, Sons, Inc.
Weckx S, Del-Favero J, Rademakers R, Claes L, Cruts M, De Jonghe P, Van Broeckhoven C, De Rijk P (2005) novoSNP, a novel computational tool for sequence variation discovery. Genome Res 15:436–442
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. https://doi.org/10.1093/nar/gkx1037
Yu CS, Cheng CW, Su WC, Chang KC, Huang SW, Hwang JK, Lu CH (2014) CELLO2GO: a web server for protein subCELlular LOcalization prediction with functional gene ontology annotation. PLoS One 9(6):e99368. https://doi.org/10.1371/journal.pone.0099368
Yunxia W, Song Z, Fengcheng L, Ying Z, Ying Z, Zhengwen W, Runyuan Z, Jiang Z, Yuxiang R, Ying T, Chu Q (2019) Therapeutic target database 2020: enriched resource for facilitating research and early development of targeted therapeutics. Nucleic Acids Res 48(D1):D1031–D1041. https://doi.org/10.1093/nar/gkz981 ISSN 1362-4962
Zhang S, Zhang L, Wang Y, Liao M, Bi S, Xie Z, Ho C, Wan X (2018) TBC2target: a resource of predicted target genes of tea bioactive compounds. Front Plant Sci 9:211. Published 2018 Feb 22. https://doi.org/10.3389/fpls.2018.00211

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
