Module_2_Reference Course content
Module02
MITBIO/MITADT University
Syllabus:
Module 2:
Biological Databases and Their Types
Objective/Learning Outcome:
CO5 Discuss the basics of gene expression and understand the difference between pattern finding and regular expressions.
CO6 Deduce the evolutionary relationships between sequences by generating a phylogenetic tree.
Databases
Sanket Bapat
Database
• An organized collection of data.
• Three types of databases:
1. Flat file format.
2. Relational database.
3. Object-oriented database.
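To make the first two types concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical tab-separated flat file (accession, organism, sequence; all values invented) and loads the same records into a relational schema with `sqlite3`. A flat file must be scanned record by record, while the relational version splits the data into linked tables and answers questions with an SQL join. The object-oriented type is not shown here.

```python
import sqlite3

# Hypothetical flat file: one record per line, tab-separated fields (invented data)
FLAT_FILE = "seq1\tHomo sapiens\tATGC\nseq2\tMus musculus\tGGTA\n"


def load_flat_file(text):
    """Split each non-empty line into [accession, organism, sequence]."""
    return [line.split("\t") for line in text.splitlines() if line]


def flat_file_lookup(records, organism):
    """Flat-file style search: check every record one after another."""
    return [acc for acc, org, _ in records if org == organism]


def load_relational(records):
    """Store the same records in two related tables (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE sequence ("
        "acc TEXT PRIMARY KEY, organism_id INTEGER REFERENCES organism(id), seq TEXT)"
    )
    for acc, org, seq in records:
        # Each organism is stored once; sequences point to it by key
        conn.execute("INSERT OR IGNORE INTO organism (name) VALUES (?)", (org,))
        (org_id,) = conn.execute(
            "SELECT id FROM organism WHERE name = ?", (org,)
        ).fetchone()
        conn.execute("INSERT INTO sequence VALUES (?, ?, ?)", (acc, org_id, seq))
    return conn


def relational_lookup(conn, organism):
    """Relational search: join the two tables with SQL."""
    rows = conn.execute(
        "SELECT s.acc FROM sequence AS s JOIN organism AS o ON s.organism_id = o.id "
        "WHERE o.name = ? ORDER BY s.acc",
        (organism,),
    ).fetchall()
    return [acc for (acc,) in rows]


records = load_flat_file(FLAT_FILE)
conn = load_relational(records)
print(flat_file_lookup(records, "Mus musculus"), relational_lookup(conn, "Mus musculus"))
```

Both lookups give the same answer; the difference is how the data are organized and queried.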
Difference between Primary and Secondary Databases
Biological Databases
• GenBank:
• DDBJ:
• EMBL:
Ensembl
• Contains all the human genome DNA sequences currently available in the public domain.
• Automated annotation: software tools identify features in the DNA sequences, such as:
• Genes (known or predicted)
• Single nucleotide polymorphisms (SNPs)
• Repeats
• Homologies
• Created and maintained by the EBI and the Sanger Centre (now the Wellcome Sanger Institute), UK
• www.ensembl.org
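As a toy illustration of what "automated annotation" means, the sketch below scans a DNA string with regular expressions for two feature types from the list above: open reading frames (a very crude stand-in for predicted genes) and short tandem repeats. This is not Ensembl's pipeline, which relies on far more sophisticated gene-prediction and repeat-detection tools; the function name `annotate`, the patterns, and the test sequence are all hypothetical.

```python
import re

# Hypothetical, simplified patterns (NOT Ensembl's methods):
# - ORF: ATG, then whole codons (lazy), up to the first in-frame stop codon.
#   The lookahead lets overlapping ORFs starting at different ATGs be reported.
ORF_RE = re.compile(r"(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))")
# - Tandem repeat: a 2-4 base unit followed by at least two more copies of itself
REPEAT_RE = re.compile(r"([ACGT]{2,4}?)\1{2,}")


def annotate(seq):
    """Return (feature_type, start, end) tuples, 0-based and end-exclusive."""
    seq = seq.upper()
    features = []
    for m in ORF_RE.finditer(seq):
        features.append(("predicted_gene", m.start(1), m.end(1)))
    for m in REPEAT_RE.finditer(seq):
        features.append(("repeat", m.start(), m.end()))
    return sorted(features, key=lambda f: f[1])


print(annotate("GGATGAAATTTTAAGGCACACACACAGG"))
```

Here the ORF `ATGAAATTTTAA` and the `CA` repeat are reported with their coordinates, which is the kind of positional feature list an annotation pipeline attaches to a genome sequence.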
Swiss-Prot
• Annotated protein sequence database, established in 1986 and maintained collaboratively since 1987 by the Department of Medical Biochemistry of the University of Geneva and the EBI
• Complete, curated, non-redundant, and highly cross-referenced with 34 other databases
• Available from a variety of servers and through sequence analysis software tools
• Sequences from more than 8,000 different species
• The 20 most represented species account for about 42% of all sequences in the database
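Swiss-Prot entries are distributed in a line-oriented flat-file format in which each line starts with a two-letter code: `ID` (entry name), `AC` (accession numbers), `DR` (cross-references to other databases), `SQ` (start of the sequence block), and `//` (end of entry). The minimal sketch below parses one entry and counts its cross-references per target database. The entry itself is invented for illustration (every identifier in it is fake), and the parser handles only these few line types, not the full format.

```python
from collections import Counter

# Invented entry for illustration only; all identifiers are fake
ENTRY = """\
ID   TEST_HUMAN              Reviewed;          10 AA.
AC   TEST01;
DR   EMBL; TESTEMBL1; -; -; mRNA.
DR   PDB; 0XXX; X-ray; -; -.
DR   EMBL; TESTEMBL2; -; -; Genomic_DNA.
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
"""


def parse_entry(text):
    """Parse ID, AC, DR and sequence lines of a single flat-file entry."""
    entry = {"id": None, "accessions": [], "xrefs": Counter(), "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        code = line[:2]
        if code == "//":  # end of entry
            break
        if in_sequence:  # sequence lines: residues split into space-separated blocks
            entry["sequence"] += line.replace(" ", "")
        elif code == "ID":
            entry["id"] = line[5:].split()[0]
        elif code == "AC":
            entry["accessions"].extend(a.strip() for a in line[5:].split(";") if a.strip())
        elif code == "DR":  # first field is the name of the linked database
            entry["xrefs"][line[5:].split(";")[0].strip()] += 1
        elif code == "SQ":
            in_sequence = True
    return entry


entry = parse_entry(ENTRY)
print(entry["id"], entry["accessions"], dict(entry["xrefs"]), entry["sequence"])
```

Counting `DR` lines this way is one simple way to see how heavily an entry is cross-referenced.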
Protein Data Bank (PDB)
• Structural data in the PDB are important for solving real problems in molecular biology
• Established in 1971 at Brookhaven National Laboratory (BNL)
• Sole international repository of experimentally determined macromolecular structure data
• Later moved to the Research Collaboratory for Structural Bioinformatics (RCSB)
http://www.rcsb.org/
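Structures in the classic PDB file format are stored as fixed-width text: in an `ATOM` record, the atom name occupies columns 13-16, the residue name 18-20, the chain ID 22, the residue number 23-26, and the x, y, z coordinates (in Å) columns 31-38, 39-46, and 47-54. The minimal sketch below slices those columns to pull out C-alpha atoms and measure the distance between two of them. The two-atom fragment and its coordinates are invented, and the parser ignores every other record type.

```python
import math

# Invented two-residue fragment in fixed-column PDB format
PDB_TEXT = """\
ATOM      1  CA  ALA A   1       0.000   0.000   0.000
ATOM      2  CA  GLY A   2       3.800   0.000   0.000
"""


def parse_ca_atoms(pdb_text):
    """Extract C-alpha atoms from ATOM records using the fixed PDB columns."""
    atoms = []
    for line in pdb_text.splitlines():
        # Columns are 1-based in the format description, 0-based in Python slices
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            atoms.append({
                "res_name": line[17:20],
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


atoms = parse_ca_atoms(PDB_TEXT)
print([a["res_name"] for a in atoms], math.dist(atoms[0]["xyz"], atoms[1]["xyz"]))
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can run together.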
TrEMBL (Translation of EMBL)
• Computer-annotated supplement to SWISS-PROT, needed because manual curation cannot keep pace with the flow of new sequence data
• Well-structured, SWISS-PROT-like resource
• Derived from automated translation of the coding sequences (CDS) in EMBL; maintained at the EBI, UK
• Generated and annotated automatically by software tools, so its annotation does not match SWISS-PROT in quality
• Contains all translations that are not yet in SWISS-PROT
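The core step behind TrEMBL, turning an EMBL coding sequence into a protein sequence, is translation with the genetic code. The minimal sketch below uses the standard genetic code (NCBI translation table 1) and stops at the first stop codon. The function name `translate` and the example sequences are hypothetical, and real pipelines must also handle alternative genetic codes, frames, and annotated exceptions, which this sketch does not.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a CDS codon by codon until the first stop codon."""
    cds = cds.upper()
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3], "X")  # 'X' for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAATTTTAA"), translate("ATGGCCTGGTAG"))
```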
Databases in Bioinformatics
Sequence databases: GenBank, UniProt
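Both GenBank and UniProt let you download sequences in FASTA format: a header line starting with `>` followed by one or more sequence lines. The minimal sketch below reads FASTA text into a dictionary keyed by the first word of each header. The records are invented, and because later records with the same identifier overwrite earlier ones, this is only suitable for files with unique identifiers.

```python
def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # save the previous record
                records[header] = "".join(chunks)
            header = line[1:].split()[0]  # first word of the header line
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over many lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Invented example records
print(parse_fasta(">seq1 hypothetical DNA record\nATGC\nGGTA\n>seq2\nMKT\n"))
```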
Pitfalls of Biological Databases
Disclaimer:
The content is intended for internal use only, and the ownership belongs to the coordinator. It should not be uploaded on any platform without proper authorization.