Sequence alignment methods
Dr. P. Borah
Professor & Head, Dept. of Animal Biotechnology
College of Veterinary Science
AAU, Khanapara, Guwahati-781022
Two types of alignment:
• Global alignment
• Local alignment
Global alignment
• Compares two sequences over their entire length and gives the best overall alignment.
• May fail to find the best local region of similarity (such as a shared motif) among distantly related sequences.
• Algorithm: Needleman-Wunsch
Local Alignment
• Finds regions with a high degree of similarity within the sequences. The Smith-Waterman algorithm allows gaps inside these regions.
• Better at finding motifs, especially for sequences that differ overall.
• Returns only the best-matching segment for a given pair of sequences.
• Algorithm: Smith-Waterman
Dotplots
Dot plots are two-dimensional graphs that show a comparison of two sequences.
The two axes of the graph represent the two sequences being compared.
Every region of one sequence is compared to every region of the other sequence, and a dot marks each match.
  A T G C A T G C
A *       *
T   *       *
G     *       *
C       *       *
A *       *
T   *       *
G     *       *
C       *       *
Dotplots
Dotplotting is a good way to see, at a glance, all of the structures that two sequences have in common.
Dotplotting can also be used to view repeated structures or inverted repeats in a single sequence. This is accomplished by comparing the sequence to itself.
Dotplotting helps recognize large regions of similarity. In most cases it is not sensitive enough to reveal small structures, such as promoters.
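The comparison described above can be sketched in a few lines of Python. This is a minimal illustration that marks single-letter matches only (no window or threshold); the function names `dot_matrix` and `render` are hypothetical and not part of the original slides.

```python
def dot_matrix(seq_x, seq_y):
    """Return a grid of booleans: cell [i][j] is True when seq_y[i] == seq_x[j]."""
    return [[a == b for b in seq_x] for a in seq_y]


def render(seq_x, seq_y, mark="*"):
    """Format the dot matrix as text: seq_x across the top, seq_y down the side."""
    lines = ["  " + " ".join(seq_x)]
    for base, row in zip(seq_y, dot_matrix(seq_x, seq_y)):
        cells = " ".join(mark if hit else " " for hit in row)
        lines.append((base + " " + cells).rstrip())
    return "\n".join(lines)


# Comparing a sequence to itself: the repeated ATGC unit shows up as
# extra diagonals parallel to the main diagonal.
print(render("ATGCATGC", "ATGCATGC"))
```

Passing two different sequences to `render` gives the two-sequence plot instead of the self-comparison.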
Needleman-Wunsch algorithm for Global Alignment
Seq. 1: AATGCAGCT
Seq. 2: ATATGCAGC
Match = 1
Mismatch = 0
Gap = -1
The first row and the first column are filled with cumulative gap penalties:
      A  A  T  G  C  A  G  C  T
   0 -1 -2 -3 -4 -5 -6 -7 -8 -9
A -1
T -2
A -3
T -4
G -5
C -6
A -7
G -8
C -9
Every remaining cell is given the largest of three values:
• Z + M/MM, where Z is the diagonal neighbour (upper left) and M/MM is the match or mismatch score for the two letters
• X + Gap and Y + Gap, where X and Y are the neighbours directly above and directly to the left
Example, first cell (A against A): Z + M/MM = 0 + 1 = 1, X + Gap = -1 + (-1) = -2, Y + Gap = -1 + (-1) = -2, so the cell gets 1.
The next cells of the first row are filled in the same way:
      A  A  T
   0 -1 -2 -3
A -1  1  0 -1
Completed matrix:
      A  A  T  G  C  A  G  C  T
   0 -1 -2 -3 -4 -5 -6 -7 -8 -9
A -1  1  0 -1 -2 -3 -4 -5 -6 -7
T -2  0  1  1  0 -1 -2 -3 -4 -5
A -3 -1  1  1  1  0  0 -1 -2 -3
T -4 -2  0  2  1  1  0  0 -1 -1
G -5 -3 -1  1  3  2  1  1  0 -1
C -6 -4 -2  0  2  4  3  2  2  1
A -7 -5 -3 -1  1  3  5  4  3  2
G -8 -6 -4 -2  0  2  4  6  5  4
C -9 -7 -5 -3 -1  1  3  5  7  6
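The matrix-filling rule can be sketched in Python as follows. This is a minimal illustration using the scoring values from the slides (match = 1, mismatch = 0, gap = -1); the function name `nw_matrix` and its parameter names are hypothetical.

```python
def nw_matrix(seq1, seq2, match=1, mismatch=0, gap=-1):
    """Fill the global-alignment score matrix (seq1 across the top, seq2 down the side)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and first column: cumulative gap penalties.
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
    # Every other cell: largest of diagonal + match/mismatch, above + gap, left + gap.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score


matrix = nw_matrix("AATGCAGCT", "ATATGCAGC")
for label, row in zip(" ATATGCAGC", matrix):
    print(label + "".join(f"{value:3d}" for value in row))
print("Bottom-right cell:", matrix[-1][-1])  # 6
```

Under this scheme the cell for the second A of Seq. 2 against G of Seq. 1 evaluates to 1 (the cell above, 2, plus the gap penalty).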
Traceback: start at the bottom-right cell (6) and follow, cell by cell, the move that produced each value until the top-left cell is reached. A diagonal move aligns two letters; a vertical or horizontal move inserts a gap.
Seq. 1: A-ATGCAGCT
Seq. 2: ATATGCAGC-
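A possible Python sketch of the traceback is given here. It is self-contained, so it refills the matrix first. The tie-breaking order (diagonal, then vertical, then horizontal) is an assumption made for this sketch; other orders can return different, equally scoring alignments. The function name `needleman_wunsch` is hypothetical.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=0, gap=-1):
    """Return (aligned seq1, aligned seq2, score) for a global alignment."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Walk back from the bottom-right cell to the top-left cell.
    top, bottom = [], []
    i, j = len(seq2), len(seq1)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                top.append(seq1[j - 1])      # diagonal move: align two letters
                bottom.append(seq2[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append("-")                  # vertical move: gap in seq1
            bottom.append(seq2[i - 1])
            i -= 1
        else:
            top.append(seq1[j - 1])          # horizontal move: gap in seq2
            bottom.append("-")
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


print(needleman_wunsch("AATGCAGCT", "ATATGCAGC"))
# ('A-ATGCAGCT', 'ATATGCAGC-', 6)
```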
A-ATGCAGCT
ATATGCAGC-
Matches = 8
Mismatches = 0
Indels = 2
Total score = 8 x 1 + 0 + 2 x (-1) = 8 - 2 = 6
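The score arithmetic can be checked with a short Python sketch. It assumes that gaps are written as "-" and that both aligned strings have the same length; the function name `score_alignment` is hypothetical.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Count matches, mismatches and gaps column by column and total the score."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + gaps * gap
    return matches, mismatches, gaps, total


print(score_alignment("A-ATGCAGCT", "ATATGCAGC-"))  # (8, 0, 2, 6)
```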
The total score (6) is the value in the bottom-right cell of the score matrix.
Smith-Waterman Algorithm for Local Alignment
It is a modification of the Needleman-Wunsch algorithm.
Modifications:
1. No negative values are allowed.
2. Negative values are replaced by zero.
3. Backtracking is initiated from the largest value in the matrix (in the example that follows, it lies in the last row).
4. Follow the path back from that value until a zero is reached.
5. Then align the sequences based on the directions of the arrows, as done in the Needleman-Wunsch algorithm.
X(i,j) = the largest value of
(i) X(i-1,j) + gap, or
(ii) X(i,j-1) + gap, or
(iii) X(i-1,j-1) + (match or mismatch), or
(iv) 0
i and j are the column and row numbers.
Seq 1: AATCGATCGG
Seq 2: TCAAGTC
Match = +2
Mismatch = -1
Gap = -2
    A A T C G A T C G G
  0 0 0 0 0 0 0 0 0 0 0
T 0 0 0 2 0 0 0 2 0 0 0
C 0 0 0 0 4 2 0 0 4 2 0
A 0 2 2 0 2 3 4 2 2 3 1
A 0 2 4 2 0 1 5 3 1 1 2
G 0 0 2 3 1 2 3 4 2 3 3
T 0 0 0 4 2 0 1 5 3 1 2
C 0 0 0 2 6 4 2 3 7 5 3
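The local-alignment matrix can be reproduced with a short Python sketch. It uses the scoring values from the slide (match = +2, mismatch = -1, gap = -2) and also records where the largest value sits; the function name `sw_matrix` is hypothetical.

```python
def sw_matrix(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Fill the local-alignment matrix; return (matrix, best score, (row, col) of best)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]   # first row and column stay at zero
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            score[i][j] = max(
                0,                               # negative values are replaced by zero
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return score, best, best_pos


matrix, best, position = sw_matrix("AATCGATCGG", "TCAAGTC")
for label, row in zip(" TCAAGTC", matrix):
    print(label, " ".join(str(value) for value in row))
print(best, position)  # 7 (7, 8)
```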
Traceback from the largest value (7) until a zero is reached gives the local alignment:
Seq 1: AATCGA-TCGG
Seq 2:   TCAAGTC
Match = 5 x 2 = 10
Mismatch = 1 x (-1) = -1
Gap = 1 x (-2) = -2
Total score = 10 - 1 - 2 = +7
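A possible Python sketch of the local traceback follows. It is self-contained, starts from the largest value anywhere in the matrix, and stops at the first zero. The tie-breaking order (diagonal, then vertical, then horizontal) is an assumption of this sketch, and the function name `smith_waterman` is hypothetical.

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Return (aligned part of seq1, aligned part of seq2, score) for the best local alignment."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    best, i, j = 0, 0, 0
    for r in range(1, rows):
        for c in range(1, cols):
            pair = match if seq2[r - 1] == seq1[c - 1] else mismatch
            score[r][c] = max(0,
                              score[r - 1][c - 1] + pair,
                              score[r - 1][c] + gap,
                              score[r][c - 1] + gap)
            if score[r][c] > best:
                best, i, j = score[r][c], r, c

    # Walk back from the largest value until a zero is reached.
    top, bottom = [], []
    while score[i][j] > 0:
        pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            top.append(seq1[j - 1])          # diagonal move: align two letters
            bottom.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append("-")                  # vertical move: gap in seq1
            bottom.append(seq2[i - 1])
            i -= 1
        else:
            top.append(seq1[j - 1])          # horizontal move: gap in seq2
            bottom.append("-")
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), best


print(smith_waterman("AATCGATCGG", "TCAAGTC"))
# ('TCGA-TC', 'TCAAGTC', 7)
```

Only the aligned segment is returned; the unaligned flanks of Seq 1 (AA at the start, GG at the end) are not part of the local alignment.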
DOT PLOT METHOD
It is a graphical method for comparing two biological sequences and identifying regions of close similarity. The plot is built directly from the two sequences; no prior alignment is needed.
Seq 1: TAATCGATCGG
Seq 2: TTCGAGTCAG
  T A A T C G A T C G G
T x     x       x
T x     x       x
C         x       x
G           x       x x
A   x x       x
G           x       x x
T x     x       x
C         x       x
A   x x       x
G           x       x x
Reading the plot:
• An unbroken diagonal marks a region where the two sequences match (here TCGA).
• A diagonal that shifts to a neighbouring diagonal marks an indel. Here the shift is a gap in Seq 1 opposite the second G of Seq 2.
• A break within a diagonal marks a substitution. Here a G in Seq 1 lies opposite the last A of Seq 2.
Final alignment:
Seq 1: TAATCGA-TCGG
Seq 2:   TTCGAGTCAG
Analysis of dot plot matrix
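Reading diagonals off a dot plot can also be sketched in Python. The example lists every unbroken diagonal run of matches of at least a chosen length; the function name `diagonal_runs` and the `min_len` parameter are hypothetical, and the sketch looks only at forward diagonals (no inverted repeats).

```python
def diagonal_runs(seq1, seq2, min_len=3):
    """List (start in seq1, start in seq2, matched text) for each unbroken diagonal run."""
    runs = []
    # Each diagonal is identified by offset = (position in seq1) - (position in seq2).
    for offset in range(-(len(seq2) - 1), len(seq1)):
        i = max(0, -offset)   # position in seq2
        j = max(0, offset)    # position in seq1
        start = None
        # Run one step past the end so that a run touching the edge is still recorded.
        while i <= len(seq2) and j <= len(seq1):
            hit = i < len(seq2) and j < len(seq1) and seq2[i] == seq1[j]
            if hit and start is None:
                start = (j, i)
            elif not hit and start is not None:
                if j - start[0] >= min_len:
                    runs.append((start[0], start[1], seq1[start[0]:j]))
                start = None
            i += 1
            j += 1
    return runs


print(diagonal_runs("TAATCGATCGG", "TTCGAGTCAG", min_len=4))
# [(3, 1, 'TCGA')]
```

Positions are counted from zero. Lowering `min_len` to 2 also reports the shorter TC run that continues the alignment after the indel.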
Thank you