BB221 AmitDatta DNA Sequencing


First Generation DNA Sequencing Methods

The principles of first-generation DNA sequencing revolve around methods developed in the 1970s, specifically
Sanger sequencing (dideoxy sequencing) and Maxam-Gilbert sequencing. Sanger sequencing became the most widely
used and is still considered the foundation of DNA sequencing technologies.
Sanger Sequencing (Chain-Termination Method):
Principles:
1. DNA Template and Primer: A single-stranded DNA template is used along with a short complementary
primer to initiate synthesis.
2. DNA Polymerase: This enzyme extends the primer by adding nucleotides complementary to the template
strand.
3. Dideoxynucleotide Triphosphates (ddNTPs):
o In addition to the regular deoxynucleotide triphosphates (dNTPs), the reaction includes small amounts
of ddNTPs.
o ddNTPs lack a 3’-OH group, causing DNA synthesis to terminate when incorporated.
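4. Random Termination: In the classic protocol, four separate reactions are run, each containing one type of ddNTP
(A, T, G, or C). Within a reaction, synthesis stops at random positions wherever that ddNTP is incorporated instead of
the matching dNTP, producing a set of fragments that end at every occurrence of that base.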
5. Fragment Separation:
o The resulting DNA fragments of varying lengths are separated by size using polyacrylamide gel
electrophoresis or capillary electrophoresis.
6. Detection:
o Originally, the fragments were labeled radioactively (through a radiolabeled primer or dNTP) and visualized
by autoradiography.
o Modern versions use fluorescently labeled ddNTPs detected by a laser in automated sequencers.
Output:
• Reading the fragments from shortest to longest gives the sequence: the ddNTP at the end of each fragment
identifies the base at that position of the newly synthesized strand (the complement of the template).
Dideoxy nucleotides are similar to regular, or deoxy, nucleotides, but with one key difference: they lack a hydroxyl
group on the 3’ carbon of the sugar ring. In a regular nucleotide, the 3’ hydroxyl group acts as a “hook,” allowing a new
nucleotide to be added to an existing chain.
Once a dideoxy nucleotide has been added to the chain, there is no hydroxyl available and no further nucleotides can be
added. The chain ends with the dideoxy nucleotide, which is marked with a particular color of dye depending on the
base (A, T, C or G) that it carries.
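The chain-termination logic can be made concrete with a small simulation. This is a minimal, idealized sketch of a single-tube dye-terminator reaction, assuming every termination position is represented by at least one fragment and ignoring the primer, polymerase kinetics, and dye mobility effects; the function names and the template are hypothetical.

```python
import random

# Idealized model of single-tube, dye-terminator Sanger sequencing.
# Assumption: every termination position is represented by at least one
# fragment (the goal of a well-balanced ddNTP:dNTP ratio).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_fragments(template: str) -> list[tuple[int, str]]:
    """Return (length, terminal ddNTP) for every possible terminated chain.

    `template` is written 3'->5', so the new strand is its base-by-base
    complement, 5'->3'. A chain that stops after i + 1 added bases ends in a
    dye-labeled ddNTP complementary to template[i]. Primer length is ignored.
    """
    return [(i + 1, COMPLEMENT[base]) for i, base in enumerate(template)]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by size, as electrophoresis does, and read the terminal dyes."""
    return "".join(ddntp for _, ddntp in sorted(fragments))


fragments = synthesize_fragments("TACGGATC")  # hypothetical template, 3'->5'
random.shuffle(fragments)                      # all fragments are mixed in one tube
print(read_sequence(fragments))                # ATGCCTAG
```

Shuffling before reading mimics the fact that fragments of all lengths are mixed together in the tube; only the size separation restores their order.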

Advantages of First-Generation Sequencing


1. High Accuracy:
o Sanger sequencing provides highly reliable results with error rates as low as 0.001%.
2. Long Read Lengths:
o It can sequence up to 800–1000 base pairs in a single read, which simplifies sequence assembly.
3. Gold Standard for Validation:
o It is still used to confirm results from newer technologies due to its precision.
4. Well-Established Method:
o It has a long history of use and is supported by extensive literature and well-developed protocols.
5. Broad Applicability:
o Suitable for a wide range of applications, including small-scale sequencing and targeted sequencing
projects.
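Accuracy figures such as those quoted above are commonly expressed as Phred quality scores, Q = -10 log10(P), where P is the per-base error probability. The Phred scale is not part of these notes; it is added here only as a standard way to compare the accuracy values that appear in this document. By this formula, Q30, Q40 and Q50 correspond to 99.9%, 99.99% and 99.999% accuracy (0.001% error).

```python
import math


def phred_from_error(p_error: float) -> float:
    """Phred quality score: Q = -10 * log10(P_error)."""
    return -10 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """Inverse conversion: P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Accuracy values that appear in these notes, converted to Phred scores.
for label, accuracy in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    q = phred_from_error(1 - accuracy)
    print(f"{label} accuracy -> Q{round(q)}")
```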
Disadvantages of First-Generation Sequencing
1. Low Throughput:
o Each reaction (and each capillary) reads only one DNA fragment, making it inefficient for large-scale projects
like whole-genome sequencing.
2. Time-Consuming:
o The manual preparation and gel electrophoresis steps are labor-intensive and slow.
3. Expensive for Large-Scale Use:
o The cost per base is higher than that of next-generation sequencing methods for large datasets.
4. Requires Large Input DNA:
o It needs relatively large amounts of high-quality DNA compared with newer methods.
5. Limited Automation:
o While automated sequencers exist, they are not as advanced or scalable as next-generation sequencing
systems.
First-generation sequencing methods, particularly Sanger sequencing, are highly accurate and versatile but are not
efficient for high-throughput or large-scale projects, which has led to their partial replacement by next-generation
technologies.
The automation and modernization of Sanger sequencing have significantly improved its efficiency, accuracy, and
usability while reducing labor intensity. Below are key advancements that have shaped modern Sanger sequencing:
1. Capillary Electrophoresis (CE):
• Replacement of Gel Electrophoresis:
Traditional polyacrylamide gel electrophoresis was replaced by capillary electrophoresis, where DNA
fragments migrate through a thin capillary filled with a polymer.
• Advantages:
o Faster separation of DNA fragments.
o Higher resolution for accurate base calling.
o Fully automated sample loading and running.
2. Fluorescent Dye Terminators:
• Improvement Over Radioactive Labeling:
o Fluorescently labeled dideoxynucleotides (ddNTPs) replaced radioactive markers, eliminating the
need for hazardous materials.
• Simultaneous Reactions:
o Different fluorescent dyes are used for each nucleotide (A, T, G, C), allowing all four reactions to
occur in one tube instead of four separate reactions.
• Benefits:
o Simplifies the process.
o Enables automated detection by a laser system.
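With four dyes in one capillary, each detected peak can be assigned to the dye channel with the strongest signal. The sketch below is a hypothetical simplification: it assumes peaks have already been detected, that each peak comes with one intensity per dye channel, and that a simple brightest-channel rule with a minimum-signal threshold is enough. Real base callers also correct for spectral overlap between dyes and for dye-dependent mobility shifts.

```python
# Hypothetical per-peak intensities in migration order (shortest fragment first).
# Each tuple holds the signal in the A, C, G and T dye channels.

CHANNELS = ("A", "C", "G", "T")


def call_bases(peaks: list[tuple[float, float, float, float]], min_signal: float = 100.0) -> str:
    """Call one base per peak as the brightest channel; weak peaks become 'N'."""
    calls = []
    for intensities in peaks:
        best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        if intensities[best] < min_signal:
            calls.append("N")  # too faint to call confidently
        else:
            calls.append(CHANNELS[best])
    return "".join(calls)


peaks = [
    (950.0, 40.0, 30.0, 20.0),   # A dye dominates
    (35.0, 25.0, 20.0, 880.0),   # T dye dominates
    (20.0, 30.0, 910.0, 60.0),   # G dye dominates
    (40.0, 50.0, 60.0, 70.0),    # no clear peak
]
print(call_bases(peaks))  # ATGN
```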

Sequencing by Synthesis
Sequencing by synthesis (SBS) is a DNA sequencing method in which the DNA sequence is determined during strand
synthesis. This approach is widely used in modern next-generation sequencing (NGS) technologies like Illumina
sequencing. It involves detecting signals emitted as nucleotides are incorporated into a growing DNA strand by DNA
polymerase.
The key steps are similar across SBS implementations and include:
(1) amplification of the DNA to enhance the subsequent signal (single-molecule platforms such as PacBio skip this
step) and attachment of the DNA to be sequenced to a solid support,
(2) generation of single-stranded DNA on the solid support,
(3) incorporation of nucleotides using an engineered polymerase, and
(4) real-time detection of nucleotide incorporation.
Steps 3 and 4 are repeated, and the sequence is assembled from the signals obtained in step 4.
This principle of real-time sequencing by synthesis has been used for almost all massively parallel sequencing
instruments, including 454, PacBio, Ion Torrent, Illumina and MGI.
Pyrosequencing
Pyrosequencing is a sequencing-by-synthesis (SBS) technique that detects DNA sequences by measuring light emitted
during nucleotide incorporation. Developed in the late 1990s, it is based on enzymatic reactions that release
pyrophosphate (PPi) as nucleotides are incorporated into the DNA strand.
Principles:
➢ DNA Synthesis:
• A single-stranded DNA template is hybridized with a primer to initiate DNA synthesis.
• DNA polymerase extends the primer by adding complementary nucleotides.
➢ Release of Pyrophosphate (PPi):
• When a nucleotide is incorporated into the growing DNA strand, PPi is released.
➢ Conversion to Light Signal:
• ATP sulfurylase converts PPi to adenosine triphosphate (ATP) in the presence of adenosine 5'-phosphosulfate
(APS).
• Luciferase uses this ATP to oxidize luciferin, producing light in proportion to the amount of PPi released.
➢ Sequential Addition of Nucleotides:
• Nucleotides are added one at a time in a predefined order.
• Only the nucleotide complementary to the next template base is incorporated and generates a signal.
• Apyrase degrades unincorporated nucleotides before the next addition.
➢ Detection and Analysis:
• The emitted light is detected by a charge-coupled device (CCD) camera or photodetector, and the signal
intensity reflects the number of incorporated nucleotides.
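The dispensation cycle described above can be simulated to show why signal intensity encodes homopolymer length. This minimal sketch assumes an idealized reaction: the light signal equals the number of bases incorporated in a dispensation, and apyrase removes all leftover nucleotide before the next one. The dispensation order `TACG`, the cycle count, and the input sequence are hypothetical choices.

```python
def pyrogram(synthesized: str, dispensation_order: str = "TACG", cycles: int = 4) -> list[tuple[str, int]]:
    """Return (dispensed nucleotide, light units) for each dispensation.

    `synthesized` is the strand the polymerase will build (the complement of
    the template). A dispensation extends through any run of matching bases,
    so the light signal equals the homopolymer length; mismatches give 0.
    Bases beyond what `cycles` rounds of dispensation can reach are not read.
    """
    signals = []
    pos = 0
    for _ in range(cycles):
        for nucleotide in dispensation_order:
            run = 0
            while pos < len(synthesized) and synthesized[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append((nucleotide, run))  # light ~ PPi released ~ bases added
            # Apyrase is assumed to degrade all unincorporated nucleotide here.
    return signals


flows = pyrogram("TTAGGGC")
print([nucleotide * count for nucleotide, count in flows if count])  # ['TT', 'A', 'GGG', 'C']
```

The GGG run produces a single flow with three times the light of a single G, which is why accurate intensity measurement matters for long homopolymers.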
Next-generation sequencing (NGS) or Second-generation sequencing
Next-Generation Sequencing (NGS) using massively parallel sequencing by synthesis (SBS) is a high-throughput
method that enables the simultaneous sequencing of millions of DNA fragments. This approach forms the backbone of
popular platforms like Illumina, which have revolutionized genomics by making large-scale sequencing faster and more
cost-effective.
Key Concepts in NGS Using Massively Parallel SBS
1. Massive Parallelization:
o Millions of DNA fragments are sequenced at the same time on a single platform.
o This is achieved by immobilizing DNA fragments on a solid surface (e.g., flow cell) in spatially
separated clusters.
2. Sequencing by Synthesis (SBS):
o DNA polymerase synthesizes a complementary strand to the DNA template.
o In each cycle, all four fluorescently labeled nucleotides with reversible terminators are supplied, and only one
nucleotide is incorporated into each growing strand.
o The sequence of the template is determined by detecting the fluorescence emitted after each
incorporation.
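The reversible-terminator cycle can be sketched for a single cluster. This is an idealized model with no phasing, no errors, and a hypothetical dye-color map (not a real instrument's channel layout). It shows the key property of reversible terminators: exactly one base is read per cycle, so a homopolymer such as TTT takes three cycles instead of producing one large signal as in pyrosequencing.

```python
# Idealized reversible-terminator SBS for one cluster: no phasing, no errors.
# The dye colors are hypothetical labels, not a real instrument's channel map.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE_COLOR = {"A": "red", "C": "blue", "G": "yellow", "T": "green"}
COLOR_TO_BASE = {color: base for base, color in DYE_COLOR.items()}


def run_cycles(template: str, n_cycles: int) -> list[str]:
    """Return the color imaged for the cluster in each sequencing cycle."""
    images = []
    for cycle in range(min(n_cycles, len(template))):
        # Incorporation: all four labeled, 3'-blocked nucleotides are present,
        # but the reversible terminator lets only one base be added per cycle,
        # even inside a homopolymer run.
        added = COMPLEMENT[template[cycle]]
        # Imaging: the cluster fluoresces in the color of the added base's dye.
        images.append(DYE_COLOR[added])
        # Cleavage: dye and 3' block are removed before the next cycle
        # (no state to update in this idealized model).
    return images


def colors_to_bases(images: list[str]) -> str:
    """Translate the per-cycle colors back into bases."""
    return "".join(COLOR_TO_BASE[color] for color in images)


images = run_cycles("TTTGCA", n_cycles=6)  # hypothetical template with a TTT run
print(images)
print(colors_to_bases(images))  # AAACGT
```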

Platforms Using Massively Parallel SBS


1. Illumina: The most popular SBS-based platform, offering high accuracy and scalability.
2. Ion Torrent: Although not fluorescence-based, it uses a similar massively parallel sequencing approach with
ion detection.
Key Principles of Illumina Sequencing

1. Fragmentation and Library Preparation:
o DNA is fragmented into smaller pieces (200–600 bp).
o Adapters containing primer binding sites, indices (for multiplexing), and flow cell binding sites are
ligated to both ends of the DNA fragments.
2. Cluster Generation (Bridge Amplification):
o The prepared DNA library is loaded onto a flow cell coated with oligonucleotides complementary to
the adapter sequences.
o Each DNA fragment hybridizes to the oligonucleotides on the flow cell.
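o Bridge amplification: the free end of each bound fragment bends over and hybridizes to a nearby
complementary oligonucleotide, forming a bridge; the surface-bound oligonucleotides then serve as primers for amplification.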
o The process creates clusters of identical DNA fragments, each originating from a single template
molecule.
3. Sequencing by Synthesis (SBS):
o Reversible Dye Terminators:
▪ Four fluorescently labeled nucleotides (A, T, G, C), each with a reversible terminator, are
added to the flow cell.
▪ DNA polymerase incorporates the complementary nucleotide into the growing DNA strand.
o Signal Detection:
▪ After incorporation, the fluorescent label is excited by a laser, and the emitted light is
captured by a detector.
▪ Each of the four nucleotides carries a distinct dye, so the color of the emitted light identifies
the incorporated base.
o Cleavage of Terminators:
▪ The fluorescent label and reversible terminator group are removed enzymatically or
chemically, allowing the next nucleotide to be added.
o This process repeats cycle by cycle, reading one base per cycle.
4. Massive Parallelization:
o Millions to billions of DNA clusters are sequenced simultaneously, generating massive amounts of
data.
5. Data Analysis:
o The fluorescence signals are converted into base calls by software.
o Reads are aligned to a reference genome or assembled de novo, depending on the application.
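Reference alignment can be illustrated with a toy seed-and-extend aligner. This minimal sketch assumes short, same-strand reads, uses the read's first k-mer as an exact-match seed, and then counts mismatches over the full read; the reference, read, and function names are hypothetical. Production aligners such as BWA and Bowtie2 use compressed FM-index structures and handle gaps, reverse complements, and mapping quality.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read: str, reference: str, index: dict[str, list[int]], k: int, max_mismatches: int = 1) -> list[int]:
    """Seed with the read's first k-mer, then verify the whole read (substitutions only)."""
    hits = []
    for start in index.get(read[:k], []):
        window = reference[start:start + len(read)]
        if len(window) == len(read):
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.append(start)
    return hits


reference = "GATTACAGATTCCAGGTACCA"  # hypothetical 21-bp reference
index = build_kmer_index(reference, k=4)
print(align_read("GATTCCAG", reference, index, k=4, max_mismatches=0))  # [7]
print(align_read("GATTCCAG", reference, index, k=4, max_mismatches=1))  # [0, 7]
```

Allowing one mismatch makes the read map to two positions, a small-scale picture of why short reads are hard to place in repetitive regions.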
Principle of Ion Torrent Sequencing
1. DNA Synthesis and Hydrogen Ion Release:
o During DNA synthesis, DNA polymerase incorporates nucleotides complementary to the template
strand.
o Each incorporation releases a hydrogen ion (H⁺), which lowers the pH of the surrounding solution.
2. pH Detection:
o The change in pH is detected by an ion-sensitive field-effect transistor (ISFET) on a semiconductor
chip.
o Each well on the chip holds a bead carrying many copies of a single DNA template, and the pH change is
proportional to the number of nucleotides incorporated.
3. Sequential Nucleotide Addition:
o Nucleotides (A, T, G, C) are added one at a time in a predetermined order.
o If the nucleotide is complementary to the template, it is incorporated, and a signal is generated.
o If no nucleotide is incorporated, no pH change occurs.
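Because each nucleotide flow gives one measurement, Ion Torrent data can be decoded "in flow space". This minimal sketch assumes the signals are already normalized so that one incorporated base gives about 1.0, and simply rounds each flow to the nearest whole homopolymer length. The flow order and signal values are hypothetical; the rounding step is where homopolymer-length errors arise when a long run produces a signal between two integers.

```python
def decode_flows(flow_order: str, signals: list[float]) -> str:
    """Convert normalized flow signals into bases.

    Each signal is assumed to be normalized so that 1.0 ~ one incorporated
    base; rounding gives the homopolymer length for that flow (0 = no pH change).
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # flows repeat the order cyclically
        length = max(0, round(signal))
        bases.append(nucleotide * length)
    return "".join(bases)


flow_order = "TACG"
signals = [1.05, 0.02, 2.10, 0.97, 0.08, 1.02, 0.04, 2.95]  # hypothetical values
print(decode_flows(flow_order, signals))  # TCCGAGGG
```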
Third-Generation DNA Sequencing
Third-generation sequencing (TGS) refers to advanced sequencing technologies that allow real-time, long-read
sequencing of individual DNA or RNA molecules without the need for amplification. Unlike earlier methods, TGS
provides highly contiguous sequences, enabling detailed analysis of complex genomic regions, epigenetics, and
transcriptomes.

Key Technologies in Third-Generation Sequencing


1. Single-Molecule Real-Time (SMRT) Sequencing by PacBio
o Developed by Pacific Biosciences.
o Uses zero-mode waveguides (ZMWs) to observe DNA polymerase activity in real time.
2. Nanopore Sequencing by Oxford Nanopore Technologies (ONT)
o Employs protein nanopores embedded in a membrane to detect changes in electrical current as single
DNA or RNA strands pass through the pore.

Principle of SMRT Sequencing


Single-Molecule Real-Time (SMRT) sequencing, developed by Pacific Biosciences (PacBio), is a third-generation
sequencing technology. It sequences individual DNA molecules in real-time, offering long read lengths, high accuracy
through consensus sequencing, and the ability to detect epigenetic modifications.

1. Zero-Mode Waveguides (ZMWs):
o ZMWs are nanophotonic structures that confine light to a small volume, enabling real-time
observation of individual DNA polymerase molecules during sequencing.
2. Sequencing by Synthesis:
o DNA polymerase incorporates fluorescently labeled nucleotides into a growing DNA strand.
o Each nucleotide carries a fluorescent dye attached to its terminal phosphate group.
o Upon incorporation, the dye is cleaved off along with the pyrophosphate, leaving the natural DNA strand intact.
3. Real-Time Detection:
o As nucleotides are incorporated, the ZMW detects the fluorescence emitted from the dye, identifying
the nucleotide added.
4. Circular DNA Templates:
o A hairpin loop is added to each end of DNA fragments, forming a circular template.
o This enables multiple passes of the polymerase over the same molecule, increasing accuracy through
consensus reads.
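The accuracy gain from multiple passes can be illustrated with a column-wise majority vote. This minimal sketch assumes the passes (subreads) have already been aligned to each other, that reverse-strand passes have been reverse-complemented, and that errors are substitutions only; real consensus software must also handle insertions and deletions, which are common in raw reads. The subreads are hypothetical.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Column-wise majority vote across equal-length, pre-aligned subreads."""
    if not passes or len({len(p) for p in passes}) != 1:
        raise ValueError("expects one or more pre-aligned passes of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


subreads = [
    "ACGTTAGC",
    "ACGATAGC",  # error at position 3
    "ACGTTAGC",
    "TCGTTAGC",  # error at position 0
    "ACGTTACC",  # error at position 6
]
print(consensus(subreads))  # ACGTTAGC
```

Each pass contains at most one error here, yet the errors fall at different positions, so the majority at every column recovers the true sequence.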
Principle of Nanopore Sequencing
Nanopore sequencing, developed by Oxford Nanopore Technologies (ONT), is a third-generation sequencing
technology. It directly sequences DNA or RNA molecules by measuring changes in electrical current as nucleotides
pass through a nanopore. This real-time method enables ultra-long reads and detection of epigenetic modifications.

Nanopore Structure:
• A nanopore is a tiny protein-based pore embedded in an electrically resistant membrane.
• An ionic current is established across the pore by applying a voltage.
DNA/RNA Passage:
• Single-stranded DNA or RNA molecules are passed through the nanopore.
• A motor protein unwinds the DNA/RNA and guides it through the nanopore one nucleotide at a time.
Electrical Current Disruption:
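• As the strand passes through the pore, the few nucleotides occupying its narrowest region (a short k-mer rather
than a single base) cause a characteristic disruption in the ionic current.
• These disruptions (current signals) are specific to the sequence in the pore, so they encode the identity of the
nucleotides (A, T, G, C, or U).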
Base Calling:
• The changes in current are detected and recorded.
• Advanced algorithms analyze the current patterns to determine the nucleotide sequence.
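The two base-calling stages (finding current events, then assigning bases) can be illustrated with a toy model. This sketch assumes a hypothetical one-base-per-level table of mean currents and a simple jump threshold for segmentation; both are illustrative simplifications. Production nanopore base callers use neural networks on raw signals that depend on several bases at once, and in this toy model two identical consecutive bases would merge into one event.

```python
import statistics

# Hypothetical mean current levels (pA) for a toy one-base-per-level model.
TOY_LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 65.0}


def segment_events(signal: list[float], jump: float = 8.0) -> list[list[float]]:
    """Split a raw current trace into events wherever the level jumps by more than `jump` pA."""
    if not signal:
        return []
    events = [[signal[0]]]
    for previous, current in zip(signal, signal[1:]):
        if abs(current - previous) > jump:
            events.append([])  # a large step marks a new event
        events[-1].append(current)
    return events


def call_events(events: list[list[float]]) -> str:
    """Assign each event the base whose toy level is closest to the event's mean current."""
    calls = []
    for event in events:
        level = statistics.mean(event)
        calls.append(min(TOY_LEVELS, key=lambda base: abs(TOY_LEVELS[base] - level)))
    return "".join(calls)


raw = [79.2, 81.0, 80.4, 96.1, 94.3, 95.5, 64.8, 66.0, 109.2, 111.5, 110.1]
print(call_events(segment_events(raw)))  # ACTG
```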
Overall Comparison

Feature | 1st Generation Sequencing | 2nd Generation Sequencing | 3rd Generation Sequencing
--- | --- | --- | ---
Main Techniques | Sanger sequencing | Illumina (SBS), Roche 454, SOLiD | PacBio SMRT, Nanopore
Principle | Chain termination with dideoxynucleotides | Sequencing by synthesis (SBS) or ligation | Real-time sequencing of single molecules
Read Length | 700–1,000 bp | 50–300 bp | >10,000 bp; up to 1 Mb (Nanopore)
Accuracy | Very high (99.99%) | High (>99.9%) | Moderate for raw reads (90–95%); HiFi >99.9%
Throughput | Low | High | Moderate
Sample Preparation | Requires cloning or PCR | Requires PCR amplification | No amplification needed
Run Time | Slow (hours to days) | Moderate (hours) | Fast (real-time data output)
Cost Per Base | High | Low | Moderate
Epigenetic Detection | Not possible | Indirect (e.g., bisulfite sequencing) | Direct (methylation and modifications)
Portability | Large machines | Benchtop instruments | Portable options (e.g., Nanopore MinION)
Key Advantages | High accuracy | High throughput, cost-effective | Long reads, epigenetic detection, real-time
Key Limitations | Labor-intensive, low throughput | Short reads limit structural variant resolution | Higher error rates (raw reads), high cost
Applications | Small-scale projects, gene sequencing | Large-scale genome and transcriptome sequencing | Structural variants, de novo assembly, epigenetics

Application Comparison

Application | 1st Gen | 2nd Gen | 3rd Gen
--- | --- | --- | ---
Small-Scale Projects | Ideal (e.g., single-gene sequencing) | Overkill | Suitable but more costly
Whole-Genome Sequencing | Limited (low throughput) | Most common use | Excellent for structural variants
De Novo Assembly | Limited | Challenging (short reads) | Best suited (long reads)
Epigenetics | Not possible | Indirect (bisulfite sequencing) | Direct methylation detection
Clinical Applications | Mutation detection in specific genes | Cancer genomics, pathogen detection | Comprehensive genomic profiling
Transcriptomics | Basic (cDNA sequencing) | RNA-seq, differential expression analysis | Full-length isoform sequencing
1st Generation Sequencing: Sanger Sequencing
• Technology: Chain termination method using fluorescently labeled dideoxynucleotides.
• Strengths:
o High accuracy.
o Suitable for small-scale sequencing, such as single genes or targeted regions.
• Limitations:
o Low throughput and labor-intensive.
o Expensive for large-scale sequencing.

2nd Generation Sequencing (Next-Generation Sequencing or NGS)


• Technologies: Illumina (dominant), Roche 454 (discontinued), SOLiD, Ion Torrent.
• Principle: Amplifies DNA fragments, then sequences millions of short reads in parallel.
• Strengths:
o High throughput, low cost per base.
o Ideal for whole-genome sequencing, RNA-seq, and metagenomics.
• Limitations:
o Short read lengths limit resolution of repetitive regions and structural variants.
o Requires complex sample preparation, including PCR amplification, which can introduce biases.

3rd Generation Sequencing


• Technologies: PacBio SMRT (Single-Molecule Real-Time) sequencing, Oxford Nanopore.
• Principle: Directly sequences single DNA or RNA molecules in real-time, without amplification.
• Strengths:
o Ultra-long reads enable resolution of structural variants and de novo genome assembly.
o Detects epigenetic modifications directly.
o Real-time data output for rapid analysis.
• Limitations:
o Higher initial cost and moderate throughput compared to 2nd gen.
o Raw reads have higher error rates, but accuracy improves with consensus sequencing (e.g., HiFi
reads).
