GATKwr12 3 IndelRealignment PDF
Indel-based Realignment
[Figure: workflow diagram. Stage labels: Raw Reads; BWA mem; Mark Duplicates & Sort (Picard); Analysis-Ready Reads; Var. Calling (HC in ERC mode); Joint Genotyping; Genotype Refinement; Variant Annotation; Analysis-Ready Variants (SNPs & Indels)]
Ref seq A G C T A G G G T C A G C T A G G G T C
Sample seq A G C T A G G G T C A G C G G T C
TC
T
Insertion Deletion
The problem we want to fix
Alignment by BWA
Several consecutive “SNPs” only found on reads ending on the right of the homopolymer
Several consecutive “SNPs” only found on reads ending on the left of the homopolymer
7bp “T” homopolymer run
After realignment
Adding a 1-bp insertion brings sanity to the entire alignment
Why does this happen?
How does the realignment algorithm work?
1. Find the best alternate consensus sequence that, together with the
reference, best fits the reads in a pile (maximum of 1 indel)
2. Score for alternate consensus = total sum of quality scores of mismatching bases
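The two steps above can be illustrated with a toy scoring sketch. This is not the IndelRealigner implementation: the function names, the sequences and the flat quality of 30 are all made up for illustration, and reads are simply slid along each haplotype without gaps. It only shows why an alternate consensus with a single 1-bp insertion can beat the reference when the score is the sum of qualities of mismatching bases.

```python
def mismatch_quality_sum(read, quals, haplotype, offset):
    """Sum base qualities where the read disagrees with the haplotype at this offset."""
    window = haplotype[offset:offset + len(read)]
    return sum(q for base, q, hap in zip(read, quals, window) if base != hap)


def best_placement_score(read, quals, haplotype):
    """Lowest mismatch-quality sum over all ungapped placements of the read."""
    placements = range(len(haplotype) - len(read) + 1)
    return min(mismatch_quality_sum(read, quals, haplotype, off) for off in placements)


def pile_score(reads, haplotypes):
    """Charge each read against whichever haplotype it fits best, then add up."""
    return sum(
        min(best_placement_score(seq, quals, hap) for hap in haplotypes)
        for seq, quals in reads
    )


# Toy data (hypothetical): a 7-bp "T" run in the reference, 8 Ts in the sample.
reference = "ACG" + "T" * 7 + "GCA"
alternate = "ACG" + "T" * 8 + "GCA"  # candidate consensus with one 1-bp insertion

reads = [
    ("ACG" + "T" * 8, [30] * 11),       # read ends at the end of the T run
    ("T" * 8 + "GCA", [30] * 11),       # read starts at the start of the T run
    ("G" + "T" * 8 + "GC", [30] * 11),  # read spans the whole T run
]

score_reference_only = pile_score(reads, [reference])
score_with_alternate = pile_score(reads, [reference, alternate])
print(score_reference_only, score_with_alternate)
```

With these toy reads the reference alone leaves four mismatching bases (score 120), while reference plus the 1-bp insertion consensus explains every read without mismatches (score 0).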
RealignerTargetCreator ➔ IndelRealigner
Known Sites
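As a rough sketch of how the two tools chain together, the snippet below only builds and prints GATK 3-style command lines; it does not run them. All file names and the jar path are placeholders, and the exact options should be checked against the tool documentation for the GATK version in use.

```python
import shlex

GATK_JAR = "GenomeAnalysisTK.jar"  # assumed location of a GATK 3.x jar


def target_creator_cmd(reference, bam, known_vcf, intervals_out):
    """Step 1: list the intervals that look like they need realignment."""
    return ["java", "-jar", GATK_JAR, "-T", "RealignerTargetCreator",
            "-R", reference, "-I", bam, "-known", known_vcf,
            "-o", intervals_out]


def indel_realigner_cmd(reference, bam, known_vcf, intervals_in, bam_out):
    """Step 2: realign the reads that fall inside those intervals."""
    return ["java", "-jar", GATK_JAR, "-T", "IndelRealigner",
            "-R", reference, "-I", bam, "-known", known_vcf,
            "-targetIntervals", intervals_in, "-o", bam_out]


# Placeholder file names, for illustration only.
steps = [
    target_creator_cmd("ref.fasta", "dedup.bam", "known_indels.vcf",
                       "targets.intervals"),
    indel_realigner_cmd("ref.fasta", "dedup.bam", "known_indels.vcf",
                        "targets.intervals", "realigned.bam"),
]

for cmd in steps:
    print(shlex.join(cmd))
```

The intervals file written by the first step is the only thing passed to the second, which is why the same reference, BAM and known sites are given to both.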
Before After
Old data
(lower quality)
New data
(higher quality)
DePristo, M., Banks, E., Poplin, R. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43, 491-498 (2011).
Can I see the effects of realignment?
-> Realigned reads keep their original CIGAR in the OC tag, so you can grep for them and view the regions in a genome browser (IGV)
20GAVAAXX100126:1:67:10041:180738 99 20 10011431 70 87M1D14M = 10011720 390
TTAAATGTGTTTATCTATTGTTCTACTATTCAGTTACCTGATTATAAAATCAAAGATTATTTCATGAAACTCAGTACCCCTTCAGGGAAAAAAAAA
AAAAT
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
HHHHHHHHHGGGGGGGG X0:i:1 X1:i:0 MC:Z:101M OC:Z:101M PG:Z:MarkDuplicates RG:Z:20GAV.1 XG:i:0 AM:i:37
NM:i:1 SM:i:37 XM:i:1 XO:i:0
BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@cccddc``a`^\[Y MQ:i:60 XT:A:
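In the record above the CIGAR is 87M1D14M while the OC tag still holds the original 101M, which is what marks the read as realigned. A small sketch of the same filter in Python, assuming tab-separated SAM text as input (for example from samtools view); the function name is made up, and the demo records are abbreviated (SEQ and QUAL replaced by "*", the second read invented) rather than copied in full from the slide.

```python
def realigned_reads(sam_lines):
    """Yield (name, chrom, pos, new_cigar, original_cigar) for reads with an OC tag."""
    for line in sam_lines:
        if line.startswith("@"):       # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:           # not a complete alignment record
            continue
        for tag in fields[11:]:        # optional tags follow the 11 fixed columns
            if tag.startswith("OC:Z:"):
                yield fields[0], fields[2], int(fields[3]), fields[5], tag[5:]
                break


# Abbreviated demo records (hypothetical apart from the name, position and CIGARs).
demo_sam = [
    "@HD\tVN:1.4\tSO:coordinate",
    "20GAVAAXX100126:1:67:10041:180738\t99\t20\t10011431\t70\t87M1D14M\t=\t"
    "10011720\t390\t*\t*\tMC:Z:101M\tOC:Z:101M\tRG:Z:20GAV.1",
    "read_without_oc\t99\t20\t10011500\t60\t101M\t=\t10011800\t401\t*\t*\tRG:Z:20GAV.1",
]

hits = list(realigned_reads(demo_sam))
for name, chrom, pos, cigar, original in hits:
    print(f"{chrom}:{pos}\t{name}\t{original} -> {cigar}")
```

The printed chromosome and position can be pasted into IGV to jump to the realigned region.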
Is realignment still necessary with the latest software?
• BUT potential improvement for Base Quality Score Recalibration when run on realigned BAM files (artifactual SNPs are replaced with real indels).
Further reading
http://www.broadinstitute.org/gatk/guide/best-practices
http://www.broadinstitute.org/gatk/guide/article?id=38
https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_indels_IndelRealigner.php
https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_indels_RealignerTargetCreator.php