CH-3 Genomics Bioinformatics Notes


Genomics & Bioinformatics
Biotechnology
Class XII

Chapter VI

Daisy Benoy
Genomics-Classification

Gene Prediction
Gene prediction and counting
• Annotation is the process of identifying genes, their regulatory
sequences and their possible functions. Annotation also records the
non-protein-coding genes, including those for rRNA, tRNA and small
nuclear RNAs, as well as the mobile genetic elements and repetitive
sequence families present in the genome.
• Gene prediction is an important problem for computational biology
and there are various algorithms that do gene prediction using known
genes as a training data set.
• Even when we know where the genes are in a genome, it is not entirely
clear how to count them. Why?
• Because overlapping genes and splice variants exist, it is difficult
to decide which parts of the DNA should be regarded as one gene and
which as several different genes.
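
The counting problem can be made concrete with a small sketch. It assumes, purely for illustration, that each annotated transcript is given as a (start, end) pair on one strand; the coordinates and the function name `count_loci` are hypothetical and not taken from any real annotation. Counting every transcript separately and counting merged overlapping regions give different answers for the same data.

```python
def count_loci(transcripts):
    """Merge overlapping (start, end) intervals and count the merged regions."""
    merged = []
    for start, end in sorted(transcripts):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it instead of adding a new one
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return len(merged)


# Hypothetical coordinates: two splice variants, one overlapping gene, one distant gene
transcripts = [(100, 500), (100, 420), (450, 900), (1200, 1500)]

print("counted per transcript:", len(transcripts))
print("counted per merged region:", count_loci(transcripts))
```

Neither number is "the" gene count; the sketch only shows that the answer depends on the definition chosen.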


Gene Prediction

Organism                            No. of        Genome size                  Number of          Part of the genome
                                    chromosomes   (base pairs)                 predicted genes    that encodes protein
Bacteria  Escherichia coli          1             about 4,600,000              5,000              90%
Yeast     Saccharomyces cerevisiae  16            12,068,000                   6,340              70%
Worm      Caenorhabditis elegans    6             100,000,000                  19,000             27%
Fly       Drosophila melanogaster   4             175,000,000 - 196,000,000    13,600             20%
Weed      Arabidopsis thaliana      5             157,000,000                  25,498             20%
Human     Homo sapiens              23            3,000,000,000                20,000 - 25,000    < 5%

Gene Prediction

• One of the surprises is the relatively small number of genes in the
human genome (20,000 - 25,000 genes) compared with the worm
(19,000 genes).
• Are in silico predictions unreliable?
• Yes. In silico (i.e., computational) gene prediction can be
unreliable, and gene counts hard to interpret, because of:
❑ Computational errors
❑ The lack of any simple correlation between the intuitive complexity
of an organism and the number of genes in its genome
❑ Post-transcriptional and post-translational modifications

GENOME SIMILARITY, SNPS AND
COMPARATIVE GENOMICS

• About 99.8% of the 3.2 billion base pairs are the same between any
two humans. The 0.2% difference in DNA sequence is enough to make
each individual unique.
• On average, only one nucleotide in every 500 differs between two
individuals.
• Variation at a few locations in the DNA sequence can lead to severe
disease and to different characters in human beings.
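
The two figures above are the same statement in different units, which a short calculation confirms. This is only arithmetic on the numbers quoted on the slide (3.2 billion base pairs, 0.2% difference); the variable names are illustrative.

```python
GENOME_SIZE_BP = 3_200_000_000   # 3.2 billion base pairs, as quoted above
DIFF_PER_THOUSAND = 2            # 0.2% expressed as 2 per 1000, to stay in integers

# How many positions differ between two people at a 0.2% difference
differing_sites = GENOME_SIZE_BP * DIFF_PER_THOUSAND // 1000

# Average spacing between differing positions
spacing_bp = 1000 // DIFF_PER_THOUSAND

print(differing_sites)  # 6400000
print(spacing_bp)       # 500
```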

GENOME SIMILARITY, SNPS AND
COMPARATIVE GENOMICS
• SNPs (single nucleotide polymorphisms)
• SNPs are DNA sequence variations that occur when a single base
(A, C, G or T) is altered, so that different individuals may have
different bases at these positions.
• SNPs can occur in both coding and non-coding regions of the genome.
• It is believed that SNPs occur at 1.6 million to 3.2 million sites
in the human genome.
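
A single-base difference between two individuals can be located by comparing their sequences position by position. The sketch below assumes the two sequences are already aligned and of equal length; the sequences and the function name `find_snps` are made up for illustration, and real SNP calling works on sequencing reads with quality filtering.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for every position where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Positions are reported 1-based, as is usual for sequence coordinates
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Hypothetical sequences from two individuals
person_1 = "ATGCCGTAAGCT"
person_2 = "ATGCCATAAGCT"

print(find_snps(person_1, person_2))  # [(6, 'G', 'A')]
```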

Application-SNP
1. The genetic variations between individuals (particularly, in the
non-coding parts of the genome) are exploited in DNA
fingerprinting, which is used in forensic science.
2. Genomic variations underlie differences in our susceptibility to, or
protection from, all kinds of diseases. The severity of illness and the
way our bodies respond to treatments are also manifestations of
genetic variations.

Application-SNP
E.g. A single base difference in the ApoE gene is associated with
Alzheimer's disease.
E.g. A simple deletion within the chemokine-receptor gene CCR5 leads to
resistance to HIV (human immunodeficiency virus) infection and to the
development of AIDS (acquired immunodeficiency syndrome).
3. SNP analysis is therefore important for diagnostics, and an SNP
database has been developed to aid these applications.
4. Physicians can use a patient's DNA sample to determine the SNP
genotype profile and, from that, predict how the patient is likely to
respond to a particular drug. SNP analysis can also be used in
population genetics, as some SNPs occur at different frequencies in
different populations.
Application-SNP
• 5. Genome sequencing projects have revealed that the genomes of
organisms quite different in appearance, for example mouse and man,
are quite similar. As another example, among the elements conserved
between species as different as the worm and the yeast, a substantial
portion belongs to genomic regions coding for proteins.
• 6. It is estimated that the human and chimpanzee genomes differ by
only 1 to 3%, while human and mouse share about 97.5% of their working
DNA. These similarities suggest that none of these genomes has changed
much since we shared a common ancestor about 100 million years ago.

FUNCTIONAL GENOMICS
o Functional genomics uses knowledge about genomes to understand
genes, the functions of their products and their interactions.
o The newer techniques, microarray technology and proteomics, provide
snapshots of all the genes expressed in a cell or tissue under
different environmental conditions.
o DNA microarray technology is used for analysing the expression of
thousands of messenger RNA molecules.
FLUORESCENCE IN SITU HYBRIDIZATION

Nick translation: It is possible to introduce colours into DNA by a
technique called nick translation, developed in 1977 by Rigby and
Paul Berg.
Two enzymes are used: DNA polymerase I, which makes DNA, and DNase I,
which cuts DNA.
They are combined in a buffered reaction with dNTPs, including dUTP
labelled with a red or green fluorescent dye.
FLUORESCENCE IN SITU HYBRIDIZATION
❑ DNA polymerase I adds nucleotide residues to the 3'-hydroxyl
terminus exposed at the nicks (breaks) that DNase I creates in the DNA.
❑ In the process, the fluorescently labelled nucleotide in the free
nucleotide mixture becomes incorporated into the newly synthesized
strands of DNA.
❑ The size of the fluorescently labelled DNA fragments (the probe)
after nick translation depends upon the amount of enzyme and the
incubation time of the reaction. The size range can be 300 to 3000 bp.

APPLICATION OF FISH
• Chronic myelogenous leukemia (CML)
• Karyotype analysis of lymphocyte preparations made from blood
samples of CML patients showed a 9-22 translocation in the
chromosomes (the result is also called the 'Philadelphia chromosome').
• By counting the number of such cells it was possible to find out the
severity of the disease, but it was not an easy procedure.
APPLICATION OF FISH

o The regions involved in the translocation were identified on
chromosomes 9 and 22.
o From the DNA library it was possible to pick up clones carrying the
particular genes involved in CML.
o Using nick translation it was possible to fluorescently label the
chromosome 9 region with red colour and the chromosome 22 region with
green colour, and so prepare the probes.
APPLICATION OF FISH

o Smears of CML lymphocytes were hybridized in situ with the two
probes.
o When observed under a fluorescence microscope, the affected cells
appeared yellow (mixing green and red produces yellow). The
unaffected cells showed separate red and green signals.
o This technique is known as fluorescence in situ hybridization (FISH).
APPLICATION OF FISH

FISH
• Allows knowing the status in interphase cells.
• The status of the disease can easily be identified by counting the
number of cells that appear yellow.
• Fast
• Can be used to monitor the effect of chemotherapy and drugs by
taking out samples and counting the number of cells appearing yellow.
• Results in less than 24 hours

Karyotyping
• Allows knowing the status only in metaphase chromosomes.
• The status of the disease cannot easily be identified by counting
the number of cells.
• Slow and difficult
• Results in 7-10 days
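
Monitoring by FISH comes down to counting what fraction of cells shows the fused (yellow) signal in each sample. The sketch below assumes each scored cell has already been recorded as either "yellow" or "separate"; those labels, the sample data and the function name `affected_fraction` are hypothetical.

```python
def affected_fraction(cell_calls):
    """Fraction of scored cells that show the fused (yellow) signal."""
    if not cell_calls:
        raise ValueError("no cells were scored")
    yellow = sum(1 for call in cell_calls if call == "yellow")
    return yellow / len(cell_calls)


# Hypothetical scoring of the same patient before and after treatment
before_treatment = ["yellow", "yellow", "separate", "yellow"]
after_treatment = ["separate", "yellow", "separate", "separate"]

print(affected_fraction(before_treatment))  # 0.75
print(affected_fraction(after_treatment))   # 0.25
```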

MICROARRAY TECHNOLOGY

• Also known as: DNA arrays, gene chips, biochips, DNA chips and
gene arrays.
• Aim
• This technology promises to monitor the whole genome on a single
chip so that researchers can have a better picture of the
interactions among thousands of genes simultaneously.
• Principle:
• A microarray is typically a glass slide onto which large numbers of
DNA molecules are spotted in a systematic order at fixed locations
(spots).
MICROARRAY TECHNOLOGY

• Principle:
• Base pairing, or hybridization, is the underlying principle of the
DNA microarray. Microarrays exploit the preferential binding of
complementary single-stranded cDNA, which is derived from messenger
RNA. To detect the cDNA bound to the microarray, it must be labelled
with a reporter molecule. The technique of introducing fluorescent
dyes into DNA and using them to detect a target molecule by
hybridization is called fluorescence in situ hybridization (FISH).
MICROARRAY TECHNOLOGY

• Procedure
• Example: comparing a normal cell and a cancerous cell.
• Comparative hybridization experiments compare the amounts of many
different mRNAs in two cell populations.
• mRNA is first purified and then reverse transcribed into cDNA.
• A differently coloured fluor is used for each sample (represented
by the red and green circles attached to the cDNAs). These labelled
cDNAs are the probes.
MICROARRAY TECHNOLOGY

• Procedure
• The two cDNA probes are tested by hybridizing them to a DNA
microarray. The array holds hundreds or thousands of spots, each of
which contains a different DNA sequence.
• If a probe contains a cDNA whose sequence is complementary to the
DNA on a given spot, that cDNA will hybridize to the spot, where it
will be detectable by its fluorescence.
MICROARRAY TECHNOLOGY

• Interpretation
• Spots whose mRNA is present at a higher level in one or the other
cell population show up as predominantly red or green. Red spots
correspond to genes expressed in high amounts in normal cells; green
spots correspond to genes expressed in high amounts in cancerous cells.
• Yellow spots have roughly equal amounts of bound cDNA from each cell
population, i.e. genes expressed approximately equally in both normal
and cancerous cells.
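
The colour reading described above can be sketched as a small classifier. It assumes each spot is summarised by two positive intensities, red for the normal-cell cDNA and green for the cancer-cell cDNA, following the colour convention on this slide. The function name `classify_spot` and the two-fold cutoff are illustrative choices, not part of the slides.

```python
import math


def classify_spot(red, green, fold=2.0):
    """Label a spot 'red', 'green' or 'yellow' from its two channel intensities.

    red and green must be positive. A spot is called red or green only if
    one channel exceeds the other by at least `fold`.
    """
    log_ratio = math.log2(red / green)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "red"      # higher in normal cells
    if log_ratio <= -cutoff:
        return "green"    # higher in cancerous cells
    return "yellow"       # roughly equal in both


# Hypothetical intensities for three spots
print(classify_spot(800, 100))  # red
print(classify_spot(100, 800))  # green
print(classify_spot(500, 450))  # yellow
```

Working with the log ratio treats a two-fold increase and a two-fold decrease symmetrically, which is why it is a common way to summarise two-colour data.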

MICROARRAY TECHNOLOGY

• Advantages
o Microarray technology promises to monitor the whole genome on a
single chip so that researchers can have a better picture of the
interactions among thousands of genes simultaneously.
o To develop protein arrays
• To study the following:
• 1. Tissue-specific genes
• 2. Regulatory gene defects in a disease
• 3. Cellular responses to the environment
• 4. Cell cycle variations


MICROARRAY TECHNOLOGY
• Microarrays are made from a
collection of purified DNA molecules
typically using an arraying machine.
• The choice of DNA to be used in the
spots on a microarray determines
which genes can be detected in a
comparative hybridization assay. In
the case of gene chips, the
substrate for immobilization is a
silicon wafer and the probes are
oligonucleotides spotted through
photolithographic etching.

PROTEOMICS

Proteome refers to the complete protein set of a cell.
Proteomics refers to the large-scale characterization of the entire
protein complement of cells, tissues and even whole organisms.
These studies include protein-protein interaction studies, protein
function, and protein localization.

Importance
Proteins are responsible for the phenotype of the cells.
• It is impossible to understand mechanisms of disease, ageing etc.
solely by studying the genome.
• Only by understanding protein function and protein modifications
can drug targets for various diseases be identified.
• A major aim of proteomics is to create a three-dimensional map of a
cell indicating the location of proteins.
PROTEOMICS

The proteome of a given cell is dynamic. In response to internal and
external cues, the biochemical machinery of the cell can be
modulated. This can lead to several changes in the proteins, such as
post-translational modifications, changes in cellular localization,
and effects on their synthesis or degradation. Thus examination of a
proteome is like taking a snapshot of the protein environment at a
given time.

PROTEOMICS

Types of Proteomics
Expression proteomics:
The quantitative study of protein expression between samples that
differ by some variable is known as expression proteomics.
[Figure: two samples labelled "Black hair" and "Brown hair"]
Use
• Protein expression of the entire proteome or of sub-proteomes can
be compared between samples.
• Identification of disease-specific proteins. For example, tumor
samples from a cancer patient and a similar tissue sample from a
normal individual could be analyzed for differential protein
expression.
PROTEOMICS

Types of Proteomics
Expression proteomics:
• Using two-dimensional gel electrophoresis followed by mass
spectrometry, proteins that are over- or under-expressed in the
cancer patient compared with the normal individual can be identified.
• Identification of these proteins, along with microarray data, could
provide a lead in understanding the basis of tumor development.

PROTEOMICS

Types of Proteomics
Structural proteomics
It aims to map out the structure and nature of the protein complexes
present in a particular cellular organelle.
Aim
• To identify all proteins present in a complex
• To characterize all protein-protein interactions occurring between
these proteins
• Isolating specific subcellular organelles or protein complexes by
purification can help in assembling information about the
architecture of cells
• To explain how expression of certain proteins gives a cell its
unique characteristics
NCBI
The National Center for Biotechnology Information (NCBI)
The NCBI at the National Institutes of Health was created in 1988 to
develop information systems in molecular biology.
Objective
• Maintaining the GenBank nucleic acid sequence database
• Providing data retrieval systems and computational resources for
the analysis of GenBank data and the variety of other biological data
made available through NCBI

Resources available from the NCBI
• Database retrieval tools
• BLAST family of sequence similarity search programs
• Gene level sequences
• Chromosomal sequences
• Genome analysis
• Analysis of gene expression patterns
• Molecular structure

NCBI

• Database retrieval tools: Entrez, Taxonomy Browser, LocusLink

Entrez
• Through this system one can access literature (in the form of
abstracts), sequences and structures.
• Entrez is an excellent system for obtaining comprehensive
information on a given biological question.

Taxonomy Browser
• It provides information on the taxonomic classification of various
species.
• The taxonomy database has information on over 79,000 organisms.

LocusLink
• It carries information on the official gene names and other
descriptive information about genes.
• Through LocusLink one can access information on homologous genes,
for example the mouse homologue of a given human gene.
NCBI

BLAST family - Basic Local Alignment Search Tool
• Tools to analyze sequence information
• These tools are designed to answer the question "Which sequences in
the database are similar (or homologous) to my sequence?"

The principles involved are:
(a) A given sequence is compared with sequences in the database using
substitution matrices that specify scores to either 'reward' a match
or 'penalize' a mismatch.
(b) Top-scoring matches are ranked according to set criteria that
serve to distinguish between a similarity due to ancestral
relationship and one due to random chance.
(c) True matches are further examined thoroughly with other details
accessible through Entrez and other tools available at NCBI.
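
The reward and penalty idea in principle (a) can be illustrated with a toy scorer. This is not how BLAST itself works: BLAST uses seeding, gapped local alignment and statistical significance estimates, none of which appear here. The sketch assumes equal-length sequences compared without gaps, and the scores (+1, -1), the database contents and the function names are all hypothetical.

```python
def ungapped_score(query, subject, match=1, mismatch=-1):
    """Score two equal-length sequences: reward each match, penalize each mismatch."""
    if len(query) != len(subject):
        raise ValueError("this toy scorer needs sequences of equal length")
    return sum(match if q == s else mismatch for q, s in zip(query, subject))


def rank_hits(query, database):
    """Return (name, score) pairs for every database entry, best score first."""
    scored = [(name, ungapped_score(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical query and database
query = "ACGTACGT"
database = {"seq1": "ACGTACGT", "seq2": "ACGTTCGT", "seq3": "TTTTACGA"}

print(rank_hits(query, database))  # [('seq1', 8), ('seq2', 6), ('seq3', 0)]
```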
NCBI

BLAST family - Basic Local Alignment Search Tool

• Homology is defined as similarity due to common ancestry. Two
sequences, one from species A and one from species B, are said to be
homologous if they have descended from a sequence in a common
ancestor of species A and species B. Such homologues often have the
same function.

• Duplicated genes within a genome may also be similar, but these are
referred to as 'paralogs'. Paralogs may differ in function.
NCBI

• Resources for gene level sequences: UniGene, RefSeq, HomoloGene

UniGene
Several cDNA clones (ESTs, expressed sequence tags) can represent the
same gene. To manage this redundancy in EST data, the UniGene
database was created. The objective is to group ESTs into sets,
called clusters, that each belong to one gene.
NCBI

• Resources for gene level sequences: UniGene, RefSeq, HomoloGene

RefSeq
• It is a curated database of mRNAs and proteins of organisms like
human, mouse and rat.
• The data provided in RefSeq has been used in many cases, such as
designing gene chips and describing the sequence features of the
human genome.

HomoloGene
It is a database of orthologs and homologs for several organisms,
covering human, mouse, rat, zebrafish and cow genes represented in
UniGene and LocusLink.
It is easy to infer homologous relations using this database.
NCBI
Database                                        Information available
EMBL (European Molecular Biology Laboratory)    Nucleotide sequences
UniProtKB                                       Annotated protein sequences
PDB (Protein Data Bank)                         Three-dimensional structures of proteins
Ribosomal RNA database                          rRNA subunit sequences
PALI database                                   Phylogenetic analysis and alignment of proteins

Curator: A curator is one who reviews and checks newly submitted data, ensuring that all mandatory
information has been provided, that biological features are adequately described and that the conceptual
translations of any coding regions obey known translation rules. This process is called curation.
ANALYSIS USING BIOINFORMATICS TOOLS

Processing raw information: The experimentally determined sequence (raw
information) is processed using bioinformatics tools to identify genes, the proteins
they encode and their functions, and the regulatory sequences, and to infer
phylogenetic relationships.
Genes: Gene prediction can be done using computer programs such as GeneMark
for bacterial genomes and GENSCAN for eukaryotes.
Proteins: Protein sequences can be inferred from the predicted genes using simple
computer programs.
Regulatory sequences: Regulatory sequences can also be identified and analysed
using bioinformatics tools.
Inferring phylogenetic relationships: Information about the relationships between
organisms can be obtained by aligning multiple sequences, calculating evolutionary
distances and constructing phylogenetic trees.
Making a discovery: Using bioinformatics tools and databases, the functions of
unknown genes can be predicted.
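
Two of the steps above are simple enough to sketch directly: inferring a protein from a predicted coding sequence, and calculating an evolutionary distance between two aligned sequences. The sketch assumes the standard genetic code and uses the plain proportion of differing sites (often called the p-distance) as the distance measure; the input sequences and the function names `translate` and `p_distance` are illustrative, and real pipelines use dedicated tools.

```python
# Standard genetic code, with codons ordered by the bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_dna):
    """Translate a coding DNA sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[pos:pos + 3]]
        if amino_acid == "*":   # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical inputs
print(translate("ATGGCCTAA"))              # MA
print(p_distance("ACGTACGT", "ACGTTCGT"))  # 0.125
```

Pairwise distances of this kind, computed for every pair of sequences in a multiple alignment, are the usual input to distance-based tree-building methods.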