Microarray Technology: Applications and Analysis


Microarray Technology: Applications, Analysis of Data, Clustering Analysis
DR. NEERU REDHU
ASSISTANT PROFESSOR
MBBB (BIOINFORMATICS), CCS HAU, HISAR
What Is a Microarray?
• Microarray technology is a qualitative or quantitative technique for measuring gene expression.

• Microarrays are microscope slides that contain an ordered series of samples (DNA, RNA, protein, tissue).

• The predecessors of this technology were dot blots, slot blots and macroarrays, which used membranes as platforms.
• The use of a collection of distinct DNAs in arrays for expression profiling was first described in 1987.

• The term came into existence in 1995, in the article by Schena et al. in the genome issue of Science.

• It was first used for monitoring gene expression in 1997.


PRINCIPLE
• A microarray works by exploiting the ability of a given mRNA molecule to bind specifically to, or hybridize to, the DNA template from which it originated.
• This is also the principle of traditional hybridization-based analyses (Southern/Northern blotting).
• The expression levels of thousands of genes within a cell can be determined simultaneously.
• With the aid of a computer, the gene expression profile of the cell can be measured precisely.
History
• The approach was first demonstrated by Grunstein and Hogness in 1975, who created a random, unordered collection of DNA spots by lysing bacterial colonies on nitrocellulose filters; the spots represented cloned fragments in E. coli plasmids containing the DNA of interest.
• This enabled rapid screening of thousands of E. coli colonies through subsequent hybridization of the cloned fragments to a radiolabeled DNA/RNA probe (Grunstein and Hogness, 1975).
• Gergen and coworkers further developed and mechanized the technique in 1979, creating a well-organized array to replicate multiple microtiter plates on agar.
• Their mechanical device could be reused several times to transfer colonies onto Whatman filter-paper squares (Gergen et al., 1979).
• Around 1990, the method was automated to organize clones from microtiter plates onto filters as arrays (Craig et al., 1990; Lennon and Lehrach, 1991).
• During the late 1990s and early 2000s, new methods of array fabrication and fluorescence-based detection were incorporated into microarrays.
• The microarray technique can be separated into two steps:
• Array fabrication
(involves synthesis and attachment of both arrays and probes)

• Target preparation
(involves isolation of RNA from at least two samples, i.e., reference and experimental, followed by reverse transcription to obtain cDNAs and their labelling with fluorescent dyes)
• Array fabrication involves synthesis and attachment of both arrays and probes.
• Therefore, microarrays can be classified based on:
• Mode of array preparation
• Spotted arrays
• In-situ synthesized arrays
• Self-assembled arrays
• Type of probe
• DNA, protein, carbohydrate, chemical compound and cellular, among others
• Classification based on the mode of array preparation

• Spotted arrays: Developed by DeRisi and coworkers in 1996. They use a robotic spotter with multiple pins to spot DNA onto poly-lysine-coated glass slides. They generate very high-density DNA arrays that are sensitive and cheaper, but the reproducibility of spotted arrays is very low.

• In-situ synthesized arrays: Developed by Fodor and coworkers in 1991 and licensed to Affymetrix. They use light-directed, spatially addressable chemical synthesis, combining photolabile protecting groups with photolithography, directly on a solid substrate.
• Another in-situ synthesis technique is inkjet printing (licensed to Agilent Technologies), developed by Blanchard and colleagues in 1996. It delivers the four phosphoramidite bases to pre-patterned glass slides in which hydrophilic regions are surrounded by hydrophobic regions. Both Affymetrix and Agilent chips are primarily used in expression analysis, genotyping and sequencing.

• Self-assembled arrays: Developed by David Walt in 1998 (Walt, 2000) and licensed to Illumina. Fabrication consists of DNA synthesis on polystyrene beads, which are subsequently placed randomly on a fiber-optic array.
The Microarray Technologies

Spotted Microarray
• cDNAs, clones, or short and long oligonucleotides deposited onto glass slides
• Simultaneous analysis of two samples (treated vs. untreated cells) provides an internal control
• Relative gene expression

Affymetrix GeneChip
• Short oligonucleotides synthesized in situ onto glass wafers
• Each oligonucleotide has a single-base mismatch partner for internal control of hybridization specificity
• Absolute gene expression
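The single-base mismatch partner used as an internal control can be illustrated with a short Python sketch. It assumes, as is commonly described for Affymetrix-style 25-mer probe pairs, that the mismatch (MM) probe differs from the perfect-match (PM) probe only at the central base (position 13), which is replaced by its complement. The probe sequence and intensities are hypothetical, and `specific_signal` (PM minus MM, floored at zero) is a naive illustration, not the summarization algorithm used by Affymetrix software.

```python
# Complementary base for each DNA nucleotide
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def mismatch_partner(pm_probe: str) -> str:
    """Build the mismatch (MM) partner of a perfect-match (PM) probe.

    Assumption: the MM probe differs from the PM probe only at the central
    base, which is replaced by its complement (position 13 of a 25-mer).
    """
    probe = pm_probe.upper()
    if len(probe) % 2 == 0:
        raise ValueError("expected an odd-length probe with a single central base")
    mid = len(probe) // 2
    return probe[:mid] + COMPLEMENT[probe[mid]] + probe[mid + 1:]

def specific_signal(pm_intensity: float, mm_intensity: float) -> float:
    """Naive specific-hybridization signal: PM minus MM, floored at zero."""
    return max(pm_intensity - mm_intensity, 0.0)

pm = "ATGCGTACGTTAGCCTAGGATCCAT"  # hypothetical 25-mer PM probe
mm = mismatch_partner(pm)
print(mm)                              # → ATGCGTACGTTACCCTAGGATCCAT
print(specific_signal(1200.0, 300.0))  # → 900.0
```

Because the two probes differ at a single position, any signal on the MM probe is taken as an estimate of non-specific binding for that PM probe.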
Pros and Cons of Technologies

Spotted Microarray
• Flexible and cheaper
• Allows study of genes not yet sequenced (spotted ESTs can be used to discover new genes and their functions)
• Variability in spot quality from slide to slide
• Provides information only on relative gene expression between cells or tissue samples

Affymetrix GeneChip
• More expensive and less flexible
• Good for whole-genome expression analysis where the genome of that organism has been sequenced
• High quality with little variability between slides
• Gives a measure of absolute expression of genes
Based on the types of probes used
(1) DNA microarrays: These are also known as DNA chips, gene chips, or biochips. They use oligonucleotide probes on a chip, which hybridize with nucleic acids such as DNA or cDNA. Oligonucleotide microarrays, BAC microarrays, cDNA microarrays and SNP microarrays are popular examples of DNA microarrays.

(2) MMChips: The name stands for Model-based Meta-analysis of ChIP data. These chips use oligonucleotides as probes and are primarily used for scrutinizing chromatin and protein–DNA interactions. The approach provides integrative analysis of datasets from different platforms and laboratories, such as ChIP-chip (chromatin immunoprecipitation (ChIP) followed by array hybridization) and ChIP-seq (ChIP followed by massively parallel sequencing) data.
(3) Protein microarrays: These are also known as protein chips. They mainly use fluorescently labeled proteins as probes and provide a platform for high-throughput protein characterization. They can be further divided into analytical, functional and reverse-phase protein microarrays.
Analytical microarrays, also known as capture arrays, use antibodies, aptamers or affibodies as probes that bind specifically to a particular protein.
Functional protein microarrays, or target protein arrays, use a large number of purified functional proteins in their native form as probes. They are used in studies targeting proteome biochemical activities such as protein–protein, protein–DNA/RNA, protein–phospholipid and protein–small-molecule interactions, among others.
Reverse-phase protein microarrays use cell lysates or serum as probes and are commonly applied in clinical or pharmaceutical settings.
(4) Peptide microarrays: These typically use peptides as probes and aim to screen the proteome for protein–protein interactions.
(5) Tissue microarrays: These embed tissue samples from paraffin blocks in an array format and are primarily applied in pathological studies.
(6) Cellular microarrays: These are also known as transfection or living-cell microarrays and use living cells as probes. Typically, they are used for investigating and screening the local cellular microenvironment.
(7) Chemical compound microarrays: These use chemical compounds as probes and are mostly used for screening and discovery of new drugs.
(8) Antibody microarrays: These are also known as antibody chips or antibody arrays and use antibodies as probes to detect antigens.
(9) Carbohydrate arrays: These are also known as glycoarrays and use carbohydrates as probes. They are used to screen, identify and calculate carbohydrate–protein binding affinities.
(10) Phenotype microarrays: Pre-configured sets of phenotypic tests are used as probes in microarray wells. They quantitatively measure thousands of cellular phenotypes simultaneously and are used in functional genomics and toxicological testing.
(11) Interferometric reflectance imaging sensor (IRIS): IRIS is a biosensor prepared by robotic spotting on Si/SiO2 substrates for analyzing DNA/DNA, protein/DNA and protein/protein interactions without fluorescent labeling.
Microarray
• Also known as expression chips

• A small 1 cm² chip that is divided into thousands of squares

• Each square contains many copies of a DNA sequence from a single gene

• Originally developed by Patrick Brown at the Stanford University School of Medicine
What are DNA Microarrays Used For?

• Expression Analysis
• SNP genotyping
• Chromatin profiling
• Molecular bar-coding
• Identifying insertions
• DNA methylation analysis
• Food Safety
Why use DNA Microarrays for Expression Analysis?

• Conventional expression analysis allows the study of only a single gene's expression in a single experiment.

• The highly parallel nature of microarrays allows the expression of many different genes to be studied simultaneously in a single experiment.

• Microarrays allow researchers to undertake global expression analysis that is not feasible with conventional techniques.
Conventional Expression Analysis
[Figure: control and treated cells → RNA → electrophoresis and blotting → hybridization with a labeled probe for gene "X"]
Expression Analysis Using Spotted DNA Microarrays
[Figure: RNA from control cells is labeled with Cy3 and RNA from treated cells with Cy5; both are hybridized to a microarray containing 16 probes, which is then washed and scanned]


Two different formats of microarray-based technology, depending upon the target nucleic acid component:

Oligonucleotide
• Greatly reduced cross-hybridization
• Uniform Tm
• Requires knowledge of gene sequences
• More expensive

cDNA
• Cross-hybridization possible
• Non-uniform Tm
• No gene sequence knowledge required
• Less expensive
Oligonucleotide array

• Contains targets approximately 25 nucleotides in length

• Generated in situ on a solid surface by light-directed synthesis

• Designed and synthesized based on sequence information

• Cloning and PCR are not required

• Specific, non-overlapping sequences can be designed to increase hybridization sensitivity, even for shorter sequences
cDNA array

• Fabricated by printing cloned and amplified cDNAs onto the solid surface

• It has lower susceptibility and higher specificity because the target sequences are longer
DNA Microarray Experiments Consist of Two Major Steps
• Material processing and data collection
i. Array fabrication
ii. Preparation of the biological samples to be studied
iii. Extraction and labeling of RNA/cDNA
iv. Hybridization of the labeled extracts to the array
v. Scanning of the hybridized array

• Information processing
Array Fabrication

• There are currently two widely used microarray technologies:

• In situ synthesized oligonucleotide microarrays

• Arrays spotted onto glass or nylon membrane matrices

Preparation of biological samples

• This involves extracting and purifying the mRNAs from the tissue of interest.

• It poses several challenges:
• Target mRNA accounts for only a small fraction of the RNA in a cell
• It is difficult to isolate the mRNA specific to the study from a heterogeneous population of cells
• Captured mRNA degrades very quickly

• To resolve this problem, mRNA is usually reverse-transcribed into more stable cDNA immediately after extraction.

• Total cellular RNA is isolated, for example by the TRIzol method.

• RNA quality is checked, for example with the Agilent Bioanalyzer.

• Only high-quality mRNA is converted to cDNA and used for the experiment.
Extraction and labeling of cDNA

• Labeling is done so that the quantity of each RNA present can be measured.

• In Affymetrix arrays, mRNA is extracted from the experimental samples and labeled with biotin (biomarker tags), which is later detected with a fluorescent stain.

• In spotted arrays, after extraction the control and experimental samples are labeled with different dyes:
• Cy5 (red fluorescent dye)
• Cy3 (green fluorescent dye)
Hybridization

• Labeled fragments in the probe are expected to form duplexes with their immobilized complementary targets.

• The number of duplexes formed reflects the relative number of each specific fragment in the probe.

• The amount of immobilized target nucleic acid is in excess, so it does not limit the kinetics of hybridization.

• Two or more samples labeled with different fluorescent dyes can be hybridized simultaneously.
• The microarray hybridization technique works in a very similar way to Southern blotting, except that it is carried out in reverse and the target is first attached to a slide instead of a membrane.

• There are several critical steps in performing a successful microarray hybridization:
• Pre-hybridization
• Hybridization
• Manual
• Automated
• Stringency washes
Scanning

• A confocal laser scanner is used.

• Two different lasers read the red and green dye intensities.

• A graphic image is saved.

• Imaging software reads the red and green intensities for each spot.
A microarray experiment consists of two major steps:
• Material processing and data collection

• Information processing
i. Image quantitation
ii. Data normalization and integration
iii. Gene expression data analysis and mining
iv. Generation of new hypotheses about the underlying biological process from these analyses
Image Quantitation
• Main steps:

1. Image Analysis
• Identification of spots on the array
• Specifying the position of each sub-grid
• Determination of spot signal and estimation of background hybridization
• Use of a fixed region centered on the spot's center of mass
• Identifying the spot boundaries (the pixels within them are used)

2. Measuring and Reporting Expression
• Includes the total intensity for each feature
• Mean, median and mode of the pixel intensity distribution
• Estimation of background
• Calculation and transformation of the expression ratio
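The spot-level calculations listed above (background estimation, a summary statistic of the pixel distribution, and the expression ratio) can be sketched in Python. This minimal illustration assumes the spot and background pixels have already been segmented, uses the median as the summary statistic, and treats Cy5 (red) as the treated sample and Cy3 (green) as the control; the pixel values and the `floor` parameter are hypothetical.

```python
import math
from statistics import median

def spot_signal(foreground_pixels, background_pixels, floor=1.0):
    """Background-corrected spot signal: median foreground minus median background.

    `floor` (an assumed value) keeps the signal positive so that a log ratio
    can still be taken for very faint spots.
    """
    return max(median(foreground_pixels) - median(background_pixels), floor)

def log2_ratio(red_fg, red_bg, green_fg, green_bg):
    """log2(Cy5 / Cy3) expression ratio for one spot."""
    red = spot_signal(red_fg, red_bg)
    green = spot_signal(green_fg, green_bg)
    return math.log2(red / green)

# Hypothetical pixel intensities for one spot (Cy5 = red = treated, Cy3 = green = control)
red_fg = [820, 860, 900, 880, 840]
red_bg = [100, 110, 90, 105, 95]
green_fg = [430, 450, 470, 440, 460]
green_bg = [40, 50, 45, 55, 60]

m = log2_ratio(red_fg, red_bg, green_fg, green_bg)
print(round(m, 3))  # → 0.926 (ratio 760 / 400 = 1.9)
```

Taking the log2 of the ratio makes up- and down-regulation symmetric: a two-fold increase gives +1 and a two-fold decrease gives -1.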
Data normalization and integration
• Normalization allows the data from two samples to be compared appropriately
• Normalization methods:
• Global normalization schemes
• Standardization: Data sets are standardized so that the mean and standard deviation of each data set are equal
• Iterative linear regression: Variation between data sets is assumed to be caused by systematic bias and described by a linear relationship

• Intensity-dependent normalization
• LOWESS (locally weighted linear regression): Removes intensity-dependent effects in log2(ratio)
• Distribution normalization: Makes the distributions of the transformed spot intensities as similar as possible across the array
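The intensity-dependent LOWESS idea can be sketched in plain Python: compute M = log2(R/G) and A = 0.5 · log2(R·G) for each spot, fit a locally weighted straight line of M against A, and subtract the fitted trend. This is a simplified single-pass version (tricube weights, no robustness iterations) written for illustration; the `frac` parameter (fraction of spots in each local window), the function names and the intensities are assumptions, and real analyses would normally use an established implementation.

```python
import math

def lowess_fit(x, y, frac=0.5):
    """Single-pass locally weighted linear regression (no robustness iterations).

    For each x[i], a straight line is fitted by weighted least squares to the
    nearest ceil(frac * n) points (tricube weights) and evaluated at x[i].
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # size of each local window
    fitted = []
    for xi in x:
        h = sorted(abs(xj - xi) for xj in x)[k - 1]  # distance to k-th nearest point
        weights = []
        for xj in x:
            d = abs(xj - xi)
            if h == 0:
                weights.append(1.0 if d == 0 else 0.0)
            else:
                u = d / h  # points at distance >= h get weight 0
                weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
        sw = sum(weights)
        mx = sum(w * xj for w, xj in zip(weights, x)) / sw
        my = sum(w * yj for w, yj in zip(weights, y)) / sw
        sxx = sum(w * (xj - mx) ** 2 for w, xj in zip(weights, x))
        sxy = sum(w * (xj - mx) * (yj - my) for w, xj, yj in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted

def lowess_normalize(red, green, frac=0.5):
    """Return log2(red/green) ratios (M) with the LOWESS trend in A removed."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

# Hypothetical spots whose red/green ratio drifts upward with overall intensity
green = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
red = [110.0, 240.0, 520.0, 1120.0, 2400.0, 5120.0]
print([round(v, 3) for v in lowess_normalize(red, green)])
```

Fitting against A rather than against one channel alone is what lets the correction differ between faint and bright spots.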
Other Methods of Normalization
• Total intensity normalization
• A normalization factor is calculated and used to rescale the intensity of each gene in the array

• Mean log centering
• The data are shifted so that the mean log2(ratio) equals zero
• However, it is sensitive to outlying, differentially expressed genes

• Linear regression
• Assumes that a scatterplot of intensities from the two samples being compared clusters along a straight line with slope 1

• Chen's ratio statistics
• Uses the probability density of the ratio to calculate confidence limits, which help identify differentially expressed genes
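Total intensity normalization and mean log centering are simple arithmetic. The minimal Python sketch below assumes background-corrected red (Cy5) and green (Cy3) intensities for each gene; the normalization factor is taken as the ratio of total red to total green intensity and is used to rescale the green channel, which is one of several equivalent conventions. The intensity values are hypothetical.

```python
import math
from statistics import mean

def total_intensity_normalize(red, green):
    """Rescale the green channel so that both channels have the same total intensity."""
    factor = sum(red) / sum(green)  # normalization factor
    return list(red), [g * factor for g in green]

def mean_log_center(log_ratios):
    """Shift log2 ratios so that their mean is zero (sensitive to outliers)."""
    center = mean(log_ratios)
    return [m - center for m in log_ratios]

# Hypothetical background-corrected intensities for four genes
red = [1200.0, 800.0, 400.0, 1600.0]
green = [500.0, 450.0, 250.0, 800.0]

red_n, green_n = total_intensity_normalize(red, green)
print(green_n)  # → [1000.0, 900.0, 500.0, 1600.0]

centered = mean_log_center([math.log2(r / g) for r, g in zip(red, green)])
print([round(v, 3) for v in centered])
```

Replacing `mean` with `median` in `mean_log_center` is a common way to reduce the sensitivity to outlying, differentially expressed genes noted above.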
Data Filtering
• Reduces data complexity and increases overall data quality

• Used where:
• Data are questionable
• Data are of low quality
• A particular feature of the data is to be enhanced
• Genes are to be selected by their fold-change values

• Replicate data are used to identify and reduce the effect of variability in any experimental assay

• Filtering makes interpretation of the data and comparison between samples more straightforward
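A typical filtering pass can be sketched as follows. The Python example assumes a dictionary of background-corrected red/green intensities per gene, discards spots that are too faint in either channel (questionable or low-quality data) and keeps genes whose |log2 ratio| meets a fold-change cut-off; the gene names and both thresholds are hypothetical, not recommended values.

```python
import math

def filter_genes(spots, min_intensity=100.0, min_abs_log2=1.0):
    """Keep bright-enough spots whose |log2(red/green)| >= min_abs_log2.

    spots: dict mapping gene name -> (red, green) background-corrected intensities.
    min_abs_log2 = 1.0 corresponds to a two-fold change (assumed cut-off).
    """
    kept = {}
    for gene, (red, green) in spots.items():
        if red < min_intensity or green < min_intensity:
            continue  # too faint in one channel: treat as low-quality
        m = math.log2(red / green)
        if abs(m) >= min_abs_log2:
            kept[gene] = m
    return kept

spots = {
    "geneA": (1600.0, 400.0),  # 4-fold up
    "geneB": (500.0, 450.0),   # barely changed
    "geneC": (60.0, 300.0),    # too faint in the red channel
    "geneD": (200.0, 800.0),   # 4-fold down
}
print(filter_genes(spots))  # → {'geneA': 2.0, 'geneD': -2.0}
```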
Identification of differentially expressed genes

• Intensity-dependent estimation of differential expression
• Calculates a Z-score for each gene
• Genes with |Z| > 1.96 are called differentially expressed at the 95% confidence level

• Analysis of variance (ANOVA)
• Tests for significant differences between group means by comparing variances
• It generalizes better-known statistical techniques such as the two-sample t-test
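A simplified Python sketch of both tests follows. For brevity the Z-score is computed globally over all log2 ratios rather than within intensity windows, so it is not the intensity-dependent variant described above; the ANOVA function returns only the F statistic (converting it to a p-value needs the F distribution, which is not shown). Gene names and values are hypothetical.

```python
from statistics import mean, stdev

def z_scores(log_ratios):
    """Standardize log2 ratios: (value - mean) / sample standard deviation."""
    mu, sd = mean(log_ratios), stdev(log_ratios)
    return [(m - mu) / sd for m in log_ratios]

def differentially_expressed(genes, log_ratios, z_cut=1.96):
    """Genes with |Z| > z_cut (1.96 corresponds to 95% two-sided under normality)."""
    return [g for g, z in zip(genes, z_scores(log_ratios)) if abs(z) > z_cut]

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = [v for group in groups for v in group]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical log2 ratios for ten genes
genes = [f"g{i}" for i in range(1, 11)]
ratios = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.2, -0.15, 3.0]
print(differentially_expressed(genes, ratios))  # → ['g10']

# Hypothetical expression of one gene in two groups of three samples
print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # → 13.5
```

With exactly two groups, the F statistic equals the square of the pooled two-sample t statistic, which is the sense in which ANOVA generalizes the t-test.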
Gene Expression Data Matrices

• Rows represent genes, columns represent experimental conditions, and the values are gene expression levels

• In a discretised matrix, each value is compared to the reference sample:
• 0 means no substantial change in expression
• 1 means substantially increased expression
• -1 means substantially reduced expression
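The discretisation rule can be sketched in Python as a mapping from log2 ratios (relative to the reference sample) to -1, 0 or 1. The cut-off of 1.0 (a two-fold change) and the example matrix are assumptions for illustration; no particular threshold is prescribed here.

```python
def discretise(matrix, threshold=1.0):
    """Map log2 ratios (genes x conditions) to 1 / 0 / -1 relative to the reference."""
    def level(value):
        if value >= threshold:
            return 1   # substantially increased expression
        if value <= -threshold:
            return -1  # substantially reduced expression
        return 0       # no substantial change
    return [[level(v) for v in row] for row in matrix]

# Rows = genes, columns = experimental conditions (hypothetical log2 ratios)
expression = [
    [2.1, 0.3, -1.4],
    [-0.2, 1.0, 0.9],
]
print(discretise(expression))  # → [[1, 0, -1], [0, 1, 0]]
```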
Gene Expression Data Analysis and Mining

• Data mining relies on two types of methods:
• Supervised methods
• Annotation is used from the very beginning
• E.g., information obtained from partitioning known samples into healthy and diseased categories
• Unsupervised methods
• Based on looking for structure in the data itself, ignoring any annotation
• E.g., gene clustering, sample clustering and principal component analysis
Class Prediction/Classification

• A typical example of a supervised data analysis approach
• Uses external information, such as annotation
• Uses a classification algorithm to assign samples to classes according to specific expression patterns
• Different classification algorithms include:
• K-nearest neighbour method
• Support vector machines
• Linear regression
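A k-nearest-neighbour classifier, the first algorithm listed above, is simple enough to sketch in a few lines of Python: a new sample is assigned the majority class among the k training samples with the most similar expression profiles. Euclidean distance, k = 3, and the healthy/diseased profiles below are all illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(train_profiles, train_labels, query, k=3):
    """Return the majority label among the k training profiles closest to `query`."""
    ranked = sorted(
        zip(train_profiles, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical log2 expression profiles (three genes) for six labeled samples
train = [
    [0.1, -0.2, 0.0], [0.0, 0.1, 0.2], [-0.1, 0.0, -0.1],  # healthy
    [2.0, 1.9, -1.1], [2.2, 2.1, -0.8], [1.9, 2.3, -1.0],  # diseased
]
labels = ["healthy"] * 3 + ["diseased"] * 3
print(knn_predict(train, labels, [1.8, 2.1, -0.9]))  # → diseased
```

Using an odd k with two classes avoids tied votes.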
Clustering
• Useful for discovering ‘types’ of behaviour, for reducing the dimensionality of the data and for detecting outliers in the data

• Types:
• Hierarchical agglomerative clustering
• Hierarchical divisive clustering
• Non-hierarchical clustering (k-means)
• Self-organizing maps and trees
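Hierarchical agglomerative clustering, the first type listed above, starts with each gene in its own cluster and repeatedly merges the two closest clusters. The Python sketch below uses average linkage and 1 - Pearson correlation as the distance between gene profiles, a common choice for expression data; both choices, the stopping rule (stop at `n_clusters` clusters instead of building the full tree) and the example profiles are assumptions for illustration.

```python
import math
from statistics import mean

def pearson_distance(a, b):
    """1 - Pearson correlation between two expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 1.0  # flat profile: treated as uncorrelated (an assumption)
    return 1.0 - cov / (sa * sb)

def agglomerative(profiles, n_clusters):
    """Average-linkage agglomerative clustering; returns clusters of row indices."""
    clusters = [[i] for i in range(len(profiles))]
    cache = {}

    def d(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            cache[key] = pearson_distance(profiles[i], profiles[j])
        return cache[key]

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # average linkage: mean distance over all cross-cluster pairs
                avg = mean(d(i, j) for i in clusters[p] for j in clusters[q])
                if best is None or avg < best[0]:
                    best = (avg, p, q)
        _, p, q = best
        clusters[p].extend(clusters[q])
        del clusters[q]  # q > p, so index p is unaffected
    return clusters

# Hypothetical expression profiles of four genes across four conditions
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # rising
    [0.5, 1.1, 1.4, 2.1],  # rising
    [4.0, 3.0, 2.0, 1.0],  # falling
    [2.0, 1.6, 0.9, 0.4],  # falling
]
print(agglomerative(profiles, 2))  # → [[0, 1], [2, 3]]
```

Correlation distance groups genes by the shape of their profiles rather than their absolute level, so the two rising genes cluster together even though their magnitudes differ.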
Visualization
• A data mining technique for finding patterns in gene expression data

• Visualization is usually coupled with techniques that reduce the dimensionality of the data

• Various techniques:
• Heat maps
• Profile graphs
• Topo or gene expression terrain maps

• Also used to find sets of co-expressed genes that contain particular sequence elements in their promoters
• Depending upon the kind of immobilised sample used, microarray experiments can be categorised in three ways:

MICROARRAY ANALYSIS
• Microarray expression analysis
• Microarray for mutation analysis
• Comparative genome hybridisation
Microarray expression analysis:

• In this experimental set-up, cDNA derived from the mRNA of known genes is immobilised.
• The sample contains genes from both normal and diseased tissues.
• Spots with greater intensity are obtained for a diseased-tissue gene if that gene is overexpressed in the diseased condition.
• This expression pattern is then compared to the expression pattern of a gene responsible for a disease.
Microarray for Mutation Analysis
• For this analysis, genomic DNA is used.

• Variants of a gene might differ from each other by as little as a single nucleotide base.

• A single-base difference between two sequences is known as a single nucleotide polymorphism (SNP), and detecting such differences is known as SNP detection.
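One common array-based approach interrogates a SNP with allele-specific probes, one matching each allele, and reads the genotype from their relative hybridization signals. The Python sketch below shows that idea for a biallelic SNP with alleles labeled A and B; the intensities and the 0.2/0.8 cut-offs on the B-allele fraction are illustrative assumptions, and real genotype-calling algorithms typically cluster many samples rather than applying fixed thresholds.

```python
def call_genotype(intensity_a, intensity_b, low=0.2, high=0.8):
    """Call a biallelic SNP genotype from two allele-specific probe intensities.

    The B-allele fraction B / (A + B) is compared with assumed cut-offs:
    below `low` -> AA, above `high` -> BB, otherwise heterozygous AB.
    """
    total = intensity_a + intensity_b
    if total <= 0:
        return "no call"
    fraction_b = intensity_b / total
    if fraction_b < low:
        return "AA"
    if fraction_b > high:
        return "BB"
    return "AB"

# Hypothetical (allele A probe, allele B probe) intensities for three samples
for a, b in [(900.0, 50.0), (480.0, 520.0), (30.0, 870.0)]:
    print(call_genotype(a, b))  # → AA, then AB, then BB
```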
Comparative Genome Hybridization
• Comparative genomic hybridization (CGH) is a molecular-cytogenetic method for the analysis of copy number changes (gains/losses) in the DNA content of tumor cells (resolution ~10 Mb).

• First described in 1992 by Kallioniemi et al.

• It is used to identify gains or losses of important chromosomal fragments harboring genes involved in a disease.

• Array comparative genomic hybridization detects chromosome copy number changes at a higher resolution than conventional chromosome-based comparative genomic hybridization (CGH).
Sources of variability

• Binding of cDNA to microarray


• Environmental Conditions
• Experimental Design
• Hybridization of RNA to DNA
• Instrument Error
• Microarray surface chemistry
• Spot placement
• Quality of spotted genes on Array
Tools for Microarrays
• TM4
• A package of open-source software programs for microarray analysis

• MIDAS
• Allows raw experimental data to be processed through various data normalizations, filters and transformations via a user-designed analysis pipeline

• MeV
• Identifies patterns of gene expression and differentially expressed genes

• Spotfinder
• A tool for microarray image processing that uses TIFF image files

• Arrayviewer
• Identifies the genes that are differentially expressed

• TGICL
• The TGI Clustering tool automates clustering and assembly of large EST/mRNA datasets using the CAP3 assembly program
Public Repository for Microarray Data

• ArrayExpress
• From Alvis Brazma and colleagues at the EBI
• Aimed at storing well-annotated data in accordance with MGED recommendations

• GEO: Gene Expression Omnibus at NCBI, USA

• CIBEX: Center for Information Biology gene EXpression database, Japan
Tools for Microarray Data Submissions
• MIAMExpress
• Based on MIAME (Minimum Information About a Microarray Experiment)
• Web-based tool for submitting data to the ArrayExpress database

• Tox-MIAMExpress
• An annotation and submission tool
• Links data to biological endpoints

• Expression Profiler
• Provided by the EBI
• Extensible web-based collaborative platform for microarray gene expression and sequence data analysis
• Exposes distinct chainable components for clustering, pattern discovery, machine-learning algorithms and visualization
Major Applications of Microarray
• Gene expression analysis
• Genomic analysis
• In analysis of SNPs and mutation
• Predicts splice variants of transcripts

• Drug discovery
• By looking at co-expressed genes
• Defines toxic properties

• Evolution of genotyping
• Epitope mapping and clonal analysis
• Diagnosis of genotypic polymorphisms and mutations
• Phenotypic analysis and monitoring disease
• Predicting gene function
Current Challenges
• There is no consensus or precedent on how to translate observations made through microarray experiments into user-friendly clinical tests

• Standardization

• Transcriptional profiling methods require fresh or frozen tissue, and the type of tissue sampling method may also influence the profiling results
THANK YOU
