
Lab 1

The document describes a lab report for a bioinformatics lab conducted by a student. The lab report includes sections for an abstract, introduction, objectives, theory, materials and methods, results and discussion, conclusion, and references. The objectives of the bioinformatics lab were to provide hands-on experience using biological databases, explore databases that provide biological information, learn how to obtain DNA sequences and translate them into amino acids, and learn basic and advanced database search techniques.

Uploaded by

Nurul Azera
Copyright © All Rights Reserved

FACULTY OF CHEMICAL ENGINEERING

BIOINFORMATICS LABORATORY (CBE 647)

NAME & MATRIC NUM : NURUL AZERA BT RASHID (2014458562)
GROUP : EH2427A
LAB NO./TITLE OF EXPERIMENT : LAB REPORT 1
DATE PERFORMED : 12TH OCTOBER 2017
SEMESTER : 7
PROGRAMME : EH242/CHEMICAL AND BIOPROCESS ENGINEERING
LECTURER : DR TAN HUEY LING

No   Content                      Allocated Marks   Marks Obtained
1    Abstract
2    Introduction
3    Objectives
4    Theory
5    Materials and Methodology
6    Result and Discussion
7    Conclusion
8    Reference
9    Appendix
     Total Marks                  100

Remarks:

Checked by: Date: 6th November 2017

…………………………
(DR TAN HUEY LING)

TABLE OF CONTENTS

NO   TITLE
1.   Abstract
2.   Introduction
3.   Objectives
4.   Theory
5.   Materials and Methodology
6.   Result and Discussion
7.   Conclusion
8.   Reference
9.   Appendix

1.0 Abstract

The enormous volume of molecular information, with its complex and subtle patterns, has created an absolute need for electronic databases and analysis tools in bioinformatics, where the main focus is on new developments in genome bioinformatics and computational biology. Through this lab activity, various public biological databases were explored through the NAR website, which provides a short description of and data for databases such as OMIM, KEGG, GO, UniProtKB and many more. Moreover, NCBI Entrez greatly helps in searching biological databases for any gene through its different search queries, such as Gene, Nucleotide and Protein. From these, information about a gene, such as its chromosome location, the number of amino acids and even the authors of related journal papers, can be found. In addition, this lab exercise trains users to determine the open reading frame (ORF) of a gene, in which codons (triplets) are translated automatically into an amino acid sequence. This lab also introduced users to ExPASy as a translation tool that translates codon sequences into proteins, and to BLAST for searching the NCBI protein database for matching sequences. For example, Ara h 2 was identified as a peanut allergen and opsin as a gene related to long-wave sensitivity. This laboratory exercise demonstrates that bioinformatics can keep information about organisms stored indefinitely, so that anyone who wants to learn about an organism can obtain the information made available through bioinformatics.

2.0 Introduction

Computerized information is important for the conduct of biomedical research. Recognizing this, the late Senator Claude Pepper sponsored legislation that established the National Center for Biotechnology Information (NCBI) on 4 November 1988 as a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). Its mission is to develop new information technologies that aid the understanding of the fundamental molecular and genetic processes that control health and disease. NCBI also fosters scientific communication and develops, distributes, supports and coordinates access to a variety of databases and software for the scientific and medical communities.

GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Nucleotide Archive (ENA) and GenBank at NCBI. These three organizations exchange data on a daily basis. GenBank aims to provide, and encourage access within the scientific community to, the most up-to-date and comprehensive DNA sequence information. There is no restriction on the use or distribution of GenBank data.

One of the search and retrieval systems of NCBI is Entrez, which gives users integrated access to sequence, mapping, taxonomy and structural information. Entrez also provides graphical views of sequences and chromosome maps, together with the ability to retrieve related sequences, structures and references. The journal literature is available through a Web search interface that provides access to over 11 million journal citations in MEDLINE. In addition, BLAST, another program for sequence similarity searching developed at NCBI, can identify genes and genetic features. BLAST can run sequence searches against the entire DNA database in less than 15 seconds.

Newly sequenced DNA can be searched for potential protein-encoding segments, and predicted proteins can be verified, using ORF Finder. ORF Finder searches the input DNA sequence for open reading frames and returns the range of each ORF together with its protein translation. The web version of ORF Finder is limited to a subrange of the query sequence of up to 50 kb in length (Open Reading Frame Finder, n.d.).

In addition, the results of leading-edge research into the physical, chemical, biochemical and biological aspects of nucleic acids and of the proteins involved in nucleic acid metabolism and interactions are published in Nucleic Acids Research (NAR). The journal allows rapid publication of papers in different categories, for example chemical and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, and many more (Nucleic Acids Research, 2016).

Moreover, ExPASy gives access to scientific databases and software tools in different areas of the life sciences, including proteomics, genomics, phylogeny, systems biology, evolution, population genetics and transcriptomics, through the SIB Bioinformatics Resource Portal. The portal offers detailed information on resources from more than 20 different SIB groups throughout Switzerland. The advanced search feature of ExPASy allows users to query databases and to discover resources on the portal (Expasy, n.d.).

3.0 Objectives

The bioinformatics lab was introduced to students in order to:

• Give practical experience in using the NCBI interface and expose students to bioinformatics applications.
• Provide students with a medium to explore different biological databases that provide specialist and very useful information in the biological area.
• Guide students in obtaining DNA sequences and dividing nucleotide sequences into codons, which are then translated into amino acids.
• Familiarize students with performing basic and advanced searches for use in their own future research.

4.0 Theory

Finding Public Biological Databases

Based on ‘Biological Database for Human Research’, biological databases are developed for diverse purposes, encompass various types of data at heterogeneous coverage and are curated at different levels with different methods, so there are several different criteria applicable to database classification. Databases can be classified by scope of data coverage, level of biocuration and type of data managed. By scope of data coverage, a database can be either comprehensive or specialized. Comprehensive databases cover different types of data from many species, for example GenBank, the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ), whereas specialized databases contain specific types of data or data from specific organisms.

According to the level of data curation, biological databases roughly fall into primary and secondary (derivative) databases. Primary databases contain raw data as an archival repository, such as the NCBI Sequence Read Archive (SRA), whereas secondary or derivative databases contain curated information as added value, such as NCBI RefSeq.

The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. RefSeq sequences form a foundation for medical, functional, and diversity studies. They provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses. RefSeq genomes are copies of selected assembled genomes available in GenBank. RefSeq transcript and protein records are generated by several processes. RefSeq is accessible via BLAST, Entrez, and the NCBI FTP site. Information is also available in NCBI's Assembly, Genomes and Gene resources, and for some organisms additional information is available in NCBI's genome browser Map Viewer. Special properties have been defined to facilitate Entrez-based retrieval.
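RefSeq accession prefixes encode the molecule type, which is how the mRNA and protein forms of a gene (asked for later in Results B) can be told apart. The sketch below is illustrative only: `describe_refseq` is a hypothetical helper, and the prefix table is a partial subset of the prefixes documented by NCBI, not a complete list.

```python
import re

# Partial map of RefSeq accession prefixes to molecule type (not exhaustive).
REFSEQ_PREFIXES = {
    "NC": "genomic (complete chromosome or molecule)",
    "NG": "genomic region",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "mRNA (predicted model)",
    "XR": "non-coding RNA (predicted model)",
    "XP": "protein (predicted model)",
}

# Two capital letters, an underscore, digits, and an optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def describe_refseq(accession: str) -> str:
    """Return the molecule type implied by a RefSeq accession prefix."""
    match = ACCESSION_RE.match(accession.strip())
    if not match:
        return "not a RefSeq-style accession"
    return REFSEQ_PREFIXES.get(match.group(1), "other RefSeq prefix")


print(describe_refseq("NM_000517"))  # HBA2 mRNA accession used in this lab
print(describe_refseq("NM_020061"))  # opsin accession used in this lab
```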

The last classification is the type of data managed. By this criterion, biological databases roughly fall into the following categories: DNA, RNA, protein, expression, pathway, disease, nomenclature, literature, and standard and ontology. DNA databases center on managing DNA data from many species or from specific species, and a primary function of human DNA databases is establishing the reference genome. It is well acknowledged that only a tiny proportion of the human genome is transcribed into mRNAs, whereas the vast majority of the genome is transcribed into many non-coding RNAs. Therefore, an increasing number of human RNA databases have been built to decipher these RNAs.

Protein databases are constructed to collect universal proteins, for example UniProt, which comprises three member databases: the UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). Pathway databases contain biological pathways for metabolic, signaling and regulatory pathway analysis. A representative example is KEGG PATHWAY, a curated biological pathway resource on molecular interaction and reaction networks. As the core of KEGG, KEGG PATHWAY integrates many entities stored in KEGG sibling databases, including genes, proteins, RNAs, chemical compounds and chemical reactions.

NCBI Entrez and Searching Biological Databases

Entrez is a molecular biology database system that provides integrated access to nucleotide
and protein sequence data, gene-centered and genomic mapping information, 3D structure
data, PubMed MEDLINE, and more. The system is produced by the National Center for
Biotechnology Information (NCBI) and is available via the Internet.

Entrez covers over 20 databases including the complete protein sequence data from PIR-
International, PRF, Swiss-Prot, and PDB and nucleotide sequence data from GenBank that
includes information from EMBL and DDBJ.

The Entrez retrieval system uses an intuitive user interface for rapidly searching sequence and
bibliographic data. A unique feature of the system is its use of precomputed similarity searches
for each record to create links to "neighbors" or related records in other Entrez databases. These
links facilitate integrated access across the various databases. An Entrez global query provides
search capability for a subset of Entrez databases at one time. Results may be viewed in various
formats including FlatFile, FASTA, XML, and others. A graphical interface provides easy
visualization of complete genomes or chromosomes, as well as biological annotation on
individual sequences.
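Because the methods in this lab copy sequences from the FASTA view, a small parser helps show what that format contains: each record starts with a ">" header line followed by one or more sequence lines. This is a minimal sketch assuming Python 3.9+; `parse_fasta` is a hypothetical helper and the example records are made up, not real GenBank entries.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return {header: sequence} for each record in a FASTA-formatted string.

    Header lines start with '>'; the sequence lines that follow are joined.
    A repeated header overwrites the earlier record in this simple sketch.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>seq1 hypothetical record
ATGGTGCTG
TCTCCT
>seq2 another hypothetical record
ATGTAA
"""

for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```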

Open Reading Frame (ORF) of the Gene

Table of Genetic Code

The genetic code is a set of rules defining how the four-letter code of DNA is translated into the 20-letter code of amino acids, which are the building blocks of proteins. It consists of three-letter combinations of nucleotides called codons, each of which corresponds to a specific amino acid or to a stop signal. In other words, the genetic code is the set of rules by which living cells translate information encoded in genetic material (DNA or mRNA sequences) into proteins. NCBI takes great care to ensure that the translation of each coding sequence (CDS) in GenBank records is correct. Central to this effort is careful checking of the taxonomy of each record and assignment of the correct genetic code.

Figure: Genetic code table showing how codons translate to amino acids
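The table lookup described above can be sketched in a few lines of Python. This is a minimal illustration assuming the standard genetic code (NCBI translation table 1) and DNA written 5' to 3'; alternative codes such as the mitochondrial ones, which NCBI assigns per taxon, are not covered. `translate` is a hypothetical helper, not an NCBI tool.

```python
# Standard genetic code (NCBI table 1). Codons are enumerated with T, C, A, G
# at each position, first base outermost, matching the 64-letter string order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")  # accept mRNA-style input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for ambiguous codons
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE))           # 64
print(translate("ATGGCCTTTTAA"))  # MAF (translation stops at TAA)
```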


5.0 Materials and Methodology

Materials

1. A computer or laptop with a stable internet connection.

Methodology

A. Finding Public Biological Databases

1. The Nucleic Acids Research website (http://www.oxfordjournals.org/nar/database/a/) was opened and explored.
2. Five databases (GenBank, KEGG, UniProtKB, OMIM and GO) and two additional database websites were chosen and explored.
3. The available information was summarized.

B. NCBI Entrez and Searching Biological Databases

1. The “All Databases” page of the NCBI website (http://ncbi.nlm.nih.gov/) was visited.
2. Search queries in the Gene, Protein and Nucleotide sections of Entrez were run using Triose Phosphate Isomerase as the problem.
3. The results from Gene, Protein and Nucleotide were compared, and a more specific search was made by specifying both the gene name and the organism (human triose phosphate (or triosephosphate) isomerase 1).
4. The RefSeq accession numbers of the gene in mRNA and protein form were identified.
5. The location of the gene on its chromosome was identified.
6. The number of amino acids in the protein chain was determined and the first five amino acids were identified.
7. The three authors of the paper that implicates the triosephosphate isomerase protein in lupus were listed, along with the paper’s unique PubMed ID.
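The lab ran these searches through the web interface. As an alternative programmatic route, the same kind of Entrez query can be expressed as an NCBI E-utilities ESearch URL. This sketch only builds the URL (fetching it needs network access, and NCBI asks users to respect its request-rate limits); `esearch_url` is a hypothetical helper, and the `[Organism]` field tag in the search term is an assumption worth checking against the Entrez search help.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL; db is an Entrez database such as 'gene',
    'protein' or 'nuccore' (the Nucleotide database)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


# One search term combining gene name and organism, sent to the three
# Entrez sections compared in this lab.
term = "triosephosphate isomerase 1 AND Homo sapiens[Organism]"
for db in ("gene", "protein", "nuccore"):
    print(esearch_url(db, term))
```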

C. Determination of the Open Reading Frame (ORF) for Hemoglobin Alpha 2 (HBA2) Gene

1. The alpha 2 globin mRNA sequence (NM_000517) was retrieved from the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/).
2. By clicking ‘Reference sequence details’ and ‘FASTA’, the protein-coding sequence was identified using Notepad or WordPad.
3. The start and stop codons were identified, and the first 10 codons were translated manually into amino acids by referring to the genetic code table.
4. The Open Reading Frame (ORF) Finder website (https://www.ncbi.nlm.nih.gov/orffinder/) was opened.
5. The protein-coding sequence retrieved from GenBank was copied and pasted into the large text box of the ORF Finder webpage, and ‘Submit’ was clicked.
6. The longest ORF was clicked, and its automatically translated amino acid sequence was compared with the manually translated amino acid sequence.
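The core idea behind steps 4 to 6 can be sketched as a scan for the longest ATG-to-stop stretch in each reading frame. This is a simplified sketch, not NCBI's ORF Finder algorithm: it assumes the standard code, uses ATG as the only start codon and scans the forward strand only, whereas the real tool also searches the reverse strand and offers start-codon and minimum-length options. `longest_orf` is a hypothetical helper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> tuple[int, int]:
    """Return 0-based (start, end) of the longest ATG...stop ORF on the
    forward strand; end is exclusive and includes the stop codon.
    (0, 0) means no complete ORF was found."""
    dna = dna.upper()
    best = (0, 0)
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG in this frame
    return best


seq = "GGATGAAATGA"  # toy sequence, not HBA2
start, end = longest_orf(seq)
print(start, end, seq[start:end])  # 2 11 ATGAAATGA
```

In the toy sequence the second ATG (positions 6 to 8) lies inside the ORF but is read as part of the codons AAA and TGA, which illustrates why start- or stop-like triplets outside the true reading frame are ignored.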

D. Sequence Extraction

1. The NCBI website (http://ncbi.nlm.nih.gov/) was visited.
2. ‘Nucleotide’ was selected as the search type and ARA H2 was entered in the search box.
3. The link related to ARA H2, “AY_158467”, was chosen and clicked.
4. FASTA was clicked and the entire nucleotide sequence, from “ATGGCC..” to “..TACTAA”, was copied.
5. The ExPASy page (http://www.expasy.org/) was opened in another tab to access the translation tool. ‘Categories’ was clicked and, under tools, ‘Translate’ (nucleotide sequence translation) was chosen.
6. The sequence copied from FASTA was pasted into the Translate tool box, ‘Includes nucleotide sequence’ was chosen as the output format, and ‘Translate sequence’ was clicked.
7. The 5'3' Frame 1 link was right-clicked and opened in a new tab, and the amino acid sequence was copied.
8. The BLAST page (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was opened and ‘Protein BLAST’ was chosen. The copied amino acid sequence was pasted into the large text box and submitted.
9. The hit with the maximum score under the graphic summary was chosen, and its Definition or Source Organism was reviewed to identify the pasted gene sequence.
10. Steps 1 to 9 were repeated for another gene sequence, opsin (“NM_020061”).
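Steps 5 to 7 rely on the ExPASy Translate tool reporting several reading frames. The sketch below reproduces the idea: three forward (5'3') frames and three frames of the reverse complement (3'5'), with stops marked as '*'. It assumes the standard genetic code, `six_frames` and `translate_frame` are hypothetical helpers, and the frame labels only mirror the ones used above; whether ExPASy numbers its reverse frames the same way is an assumption.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(dna: str, frame: int) -> str:
    """Translate reading frame 1, 2 or 3 in full, marking stops with '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for ambiguous codons
        for i in range(frame - 1, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict[str, str]:
    """Forward (5'3') and reverse-complement (3'5') frames 1 to 3."""
    reverse = dna.upper().translate(COMPLEMENT)[::-1]
    frames = {}
    for frame in (1, 2, 3):
        frames[f"5'3' Frame {frame}"] = translate_frame(dna, frame)
        frames[f"3'5' Frame {frame}"] = translate_frame(reverse, frame)
    return frames


for label, protein in six_frames("ATGGCCTAA").items():  # toy sequence
    print(label, protein)
```

The frame whose translation runs from Met to a stop without interruption is the one worth copying into protein BLAST.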

6.0 Result and Discussion

A. Finding Public Biological Databases

Visit databases of your own selection to see how these databases are accessed and what information is available. Include GenBank (Nucleotide Sequence Databases), KEGG (Metabolic Pathways), UniProtKB (Proteins), OMIM and GO (Gene Ontology) in your visit.

B. NCBI Entrez and Searching Biological Databases

Problem: Triose Phosphate Isomerase

1. Many search queries can be used for this gene that specify both the name of the gene and the organism. Queries were run in the Gene, Protein and Nucleotide sections of Entrez, and the differences can be observed from the information returned.

Query: All Databases

Query: Gene

Query: Nucleotide

2. What is the RefSeq accession number for this gene in the mRNA form and for the protein
form?

3. On what chromosome is this gene found?

4. How many amino acids are in the protein chain? What are the first five? One of the useful abilities of Entrez is to cross-reference recent publications that relate to this gene. A recently published paper implicates the triose phosphate isomerase protein in the disease lupus.

5. Who were the three authors of this paper? What is this paper’s unique PubMed ID?

Authors: Yu SJ, Liao EC and Tsai JJ.

C. Determination of the Open Reading Frame (ORF) of the Hemoglobin Alpha 2 (HBA2) Gene
1. Retrieve the alpha 2 globin mRNA sequence (NM_000517) from the GenBank
database. Can you manually identify the Open Reading Frame (ORF), i.e., the coding
sequence (e.g., in notepad or wordpad)? Proceed by determining the start and stop
codons (use genetic code table). Note that the sequence contains triplets of nucleotides
that are similar to the start/stop codons but which are not the true start and stop codons.
Why is that?

2. Once you have determined the ORF of the HBA2 gene, translate the first 10 codons to
the amino acid sequence.

atg – met, gtg – val, ctg – leu, tct – ser, cct – pro, gcc – ala, gac – asp, aag – lys, acc – thr,
aac – asn
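The manual translation above can be checked mechanically. This sketch assumes the standard genetic code and simply re-translates the ten codons listed in the answer; the three-letter lookup covers only the residues that occur here. The one-letter result can then be compared by eye with the protein translation shown in the NM_000517 record.

```python
# The first ten codons of the HBA2 coding sequence, as listed above.
codons = ["ATG", "GTG", "CTG", "TCT", "CCT", "GCC", "GAC", "AAG", "ACC", "AAC"]

# Standard genetic code (NCBI table 1), codons ordered T, C, A, G per position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

# Three-letter names for the residues appearing here (subset only).
THREE_LETTER = {"M": "Met", "V": "Val", "L": "Leu", "S": "Ser", "P": "Pro",
                "A": "Ala", "D": "Asp", "K": "Lys", "T": "Thr", "N": "Asn"}

one_letter = "".join(CODON_TABLE[c] for c in codons)
print(one_letter)                                        # MVLSPADKTN
print("-".join(THREE_LETTER[aa] for aa in one_letter))  # Met-Val-Leu-Ser-Pro-Ala-Asp-Lys-Thr-Asn
```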

3. Are the ORF and the amino acid sequence confirmed by the NM_000517 annotation in
the GenBank database?

Yes, both the ORF and the amino acid sequence are confirmed by the NM_000517 annotation in the GenBank database.

4. For the automatic determination of putative ORFs you can also use the ORF finder at
the NCBI site.

D. Sequence Extraction

ara h2

opsin

ara h2: peanut allergen

opsins: eye gene related to long-wave sensitivity and color blindness

7.0 Conclusion

The advent of web publishing has greatly improved access to scientific content on a worldwide scale. This has led to calls from the academic community for research to be made freely available online immediately upon publication, without the barrier of a paid subscription. Hence, the many public biological databases listed on the Nucleic Acids Research (NAR) website, such as GenBank, KEGG, UniProtKB, OMIM and many more, can be accessed and explored.

Furthermore, the search queries in the Gene, Protein and Nucleotide sections of Entrez clearly gave different results for the problem being investigated. Although these three types of search query return largely similar information, we can still choose the best search query for any gene, namely one that specifies both the gene name and the organism. This makes bioinformatics applications easier to use and saves time when searching for a problem.

Moreover, determining the open reading frame (ORF) identifies the coding sequence of any gene being investigated. From there, the start and stop codons can be identified either manually, by referring to the genetic code table, or automatically, through translation by ORF Finder. Hence, we can tell whether the manually translated protein sequence is correct by comparing it with the automatically translated protein sequence.

Lastly, this lab also introduced users to the ExPASy page as a translation tool, which reads the codons in a sequence and translates them into protein. It also introduced protein BLAST, which finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. From the results of the search, we can learn the full name of the gene and what it tells us.

8.0 Reference
1. Basic Local Alignment Search Tool. (n.d.). Retrieved October 18, 2016, from https://blast.ncbi.nlm.nih.gov/Blast.cgi
2. GenBank. (2016). Retrieved October 18, 2016, from https://www.ncbi.nlm.nih.gov/genbank/
3. Bailey, R. (2016). Genetic Code. Retrieved October 23, 2016, from http://biology.about.com/od/genetics/ss/genetic-code.htm
4. Nucleic Acids Research. (2016). Retrieved October 22, 2016, from http://www.oxfordjournals.org/our_journals/nar/about.html
5. Open Reading Frame Finder. (n.d.). Retrieved October 21, 2016, from https://www.ncbi.nlm.nih.gov/orffinder/
6. Expasy. (n.d.). Retrieved October 18, 2016, from http://www.expasy.org/features
7. Protein Coding Sequences. (n.d.). Retrieved October 20, 2016, from http://parts.igem.org/Protein_coding_sequences

9.0 Appendix

Name of amino acids

