Blast 2 Sequences, A New Tool For Comparing Protein and Nucleotide Sequences
Prepared for
Zunaira Rauf
COMSATS University
Group Head
Salman Khan (046)
Prepared by
Bioinformatics Page 1
Abstract:
‘BLAST 2 SEQUENCES’ is an interactive tool that uses the BLAST engine for pairwise
DNA-DNA or protein-protein sequence comparison and is based on the same algorithm and
statistics of local alignments. While the standard BLAST program is widely used to search for
homologous sequences in nucleotide and protein databases, one often needs to compare only two
sequences that are already known to be homologous, coming from related species or, e.g.,
different isolates of the same virus. In such cases, searching the entire database would be
unnecessarily time-consuming. A World Wide Web version of the program can be used
interactively at the NCBI. The resulting alignments are presented in both graphical and text
form.
Contents
Abstract:
Background
  What is sequence Alignment?
  Purpose of Sequence Alignment?
  Software use for Sequence alignment:
  Types of Sequence alignment:
    Local alignment
    Global alignment:
Blast:
Needleman–Wunsch algorithm
Smith–Waterman algorithm
INTRODUCTION
Algorithm
  Dynamic programming
  Heuristic method:
USER DEFINE PARAMETERS
Method of Study:
Limitations of the Study:
RESULT AND DISCUSSION
Conclusion
REFERENCE
BLAST 2 SEQUENCES
Background
What is sequence Alignment?
In bioinformatics, a sequence alignment is a way of arranging the sequences
of DNA, RNA, or protein to identify regions of similarity that may be a
consequence of functional, structural, or evolutionary relationships between the
sequences. Aligned sequences of nucleotide or amino acid residues are typically
represented as rows within a matrix. Gaps are inserted between the residues so
that identical or similar characters are aligned in successive columns.
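The row-and-column representation described above can be illustrated with a short Python sketch. It takes two already aligned rows (gaps written as "-") and builds a marker line showing which columns hold identical residues. The function name match_line and the example rows are illustrative assumptions, not part of any BLAST program.

```python
def match_line(row1, row2):
    """Return a marker line with '|' under every column of identical residues."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    # A column counts as a match only if both rows hold the same residue,
    # so a gap character never produces a marker.
    return "".join(
        "|" if x == y and x != "-" else " "
        for x, y in zip(row1, row2)
    )


row1 = "GTT-AC"
row2 = "GTTGAC"
print(row1)
print(match_line(row1, row2))  # ||| ||
print(row2)
```

The gap in the first row shifts the remaining residues so that A and C fall into the same columns in both rows.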
Types of Sequence alignment:
There are two types of sequence alignment:
Local alignment
Global alignment
Local alignment
In a local alignment, you try to match your query with a substring (a portion) of
your subject.
Global alignment:
In a global alignment, you perform an end-to-end alignment of the query with the subject.
Blast:
BLAST stands for Basic Local Alignment Search Tool.
Two types of sequences can be entered: nucleotide and protein.
Needleman–Wunsch algorithm
Steps:
1. Constructing the grid
Start the first string at the top of the third column and the other string at the
start of the third row.
2. Choosing a scoring system
Next, decide how to score each individual pair of letters. Using the example above,
one possible alignment candidate might be:
3. Filling the table:
Start with a zero in the second row, second column. Move through the cells row by
row, calculating the score for each cell. Calculate the candidate scores for each of
the three possibilities:
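For each cell, the three possibilities are conventionally a step from the diagonal neighbour (match or mismatch), a step from the cell above (gap), and a step from the cell to the left (gap); the cell keeps the best of the three. The Python sketch below fills the table under assumed scores of +1 for a match, -1 for a mismatch and -1 for a gap. These scores, the function name nw_score_table and the example strings are illustrative assumptions, not values taken from this report. Cell [0][0] of the table corresponds to the zero in the second row, second column of the grid, because the code omits the header row and column that hold the strings themselves.

```python
def nw_score_table(a, b, match=1, mismatch=-1, gap=-1):
    """Fill a Needleman-Wunsch score table for strings a (rows) and b (columns)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]

    # First column and first row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap

    # Remaining cells, row by row: best of the three candidate scores.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = table[i - 1][j - 1] + pair  # align a[i-1] with b[j-1]
            from_top = table[i - 1][j] + gap       # a[i-1] against a gap
            from_left = table[i][j - 1] + gap      # b[j-1] against a gap
            table[i][j] = max(diagonal, from_top, from_left)
    return table


table = nw_score_table("GATTACA", "GCATGCG")
print(table[-1][-1])  # score of the best global alignment
```

The bottom-right cell holds the score of the best end-to-end alignment; the alignment itself would be recovered by tracing back from that cell to the top-left cell.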
Smith–Waterman algorithm
The Smith–Waterman algorithm performs local sequence alignment; that is, it
determines similar regions between two strings of nucleic acid
sequences or protein sequences.
Steps:
1. Determine the substitution matrix and the gap penalty scheme
A substitution matrix assigns each pair of bases or amino acids a score for
match or mismatch. Usually matches get positive scores, whereas
mismatches get relatively lower scores.
2. Initialize the scoring matrix
3. Scoring.
Score each element from left to right, top to bottom in the matrix,
considering the outcomes of substitutions (diagonal scores) or adding gaps
(horizontal and vertical scores).
4. Traceback.
Starting at the element with the highest score, trace back recursively,
following the source of each score, until 0 is encountered. This process
generates the segment that has the highest similarity score under the given
scoring system. To obtain the second-best local alignment, apply the traceback
process starting at the second-highest score outside the trace of the best
alignment.
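The four steps above can be sketched in Python as follows. This is a minimal illustration, not the BLAST implementation: it assumes a simple substitution scheme (+3 for a match, -3 for a mismatch) and a linear gap penalty of 2 per gap position, and the function name smith_waterman is hypothetical.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # Step 2: initialise the scoring matrix with zeros.
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Step 3: score left to right, top to bottom; negative values become 0.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair,  # substitution (diagonal)
                score[i - 1][j] + gap,       # gap in b (vertical)
                score[i][j - 1] + gap,       # gap in a (horizontal)
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Step 4: trace back from the highest score until a 0 is reached.
    i, j = best_pos
    top, bottom = [], []
    while score[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TGTTACGG", "GGTTGACTA"))  # (13, 'GTT-AC', 'GTTGAC')
```

Only the best-scoring segment is returned; a second-best alignment would need a further traceback that avoids the cells already used.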
Comparison with the Needleman–Wunsch algorithm
Three main differences are:
Traceback
  Smith–Waterman: Begin with the highest score, end when 0 is encountered.
  Needleman–Wunsch: Begin with the cell at the lower right of the matrix, end at the top left cell.
INTRODUCTION
While the standard BLAST program is widely used to search for homologous
sequences in nucleotide and protein databases, one often needs to compare only
two sequences that are already known to be homologous, coming from related
species or, e.g., different isolates of the same virus.
The BLAST 2 SEQUENCES program finds multiple local alignments between two
sequences, allowing the user to detect homologous protein domains or internal
sequence duplications.
BLAST 2 SEQUENCES has been very useful for the comparison of homologous
genes from complete microbial genomes.
Using BLAST 2 SEQUENCES for nucleotide sequence comparison of different
strains or isolates of the same virus offers a convenient strategy to study the
genome variations and evolutionary events, such as substitutions, insertions and
deletions.
Input
Output
Algorithm
‘BLAST 2 SEQUENCES’ is an interactive tool that utilizes the BLAST engine
for pairwise DNA‐DNA or protein‐protein sequence comparison and is
based on the same algorithm and statistics of local alignments. The BLAST
2.0 algorithm generates a gapped alignment by using dynamic
programming to extend the central pair of aligned residues. The heuristic
methods confine the alignments to a predefined region of the path graph. A
performance evaluation of the new gapped BLAST algorithm and its
comparison to that of the original ungapped BLAST and the
Smith‐Waterman algorithm have been presented.
Dynamic programming
Dynamic programming (also known as dynamic optimization) is a
method for solving a complex problem by breaking it down into a collection
of simpler subproblems, solving each of those subproblems just once, and
storing their solutions.
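As a small illustration of solving each subproblem once and storing its solution, the Python sketch below computes the length of the longest common subsequence of two strings. The problem choice and the function name lcs_length are assumptions made for illustration; the stored results are kept by functools.lru_cache from the standard library.

```python
from functools import lru_cache


def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""

    @lru_cache(maxsize=None)  # stores the answer for every (i, j) seen so far
    def solve(i, j):
        # Subproblem: best answer for the suffixes a[i:] and b[j:].
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + solve(i + 1, j + 1)
        # Otherwise drop one character from either string and keep the better result.
        return max(solve(i + 1, j), solve(i, j + 1))

    return solve(0, 0)


print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```

Without the cache the same (i, j) pairs would be recomputed many times; with it, each pair is solved once, which is the same idea the alignment tables rely on.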
Heuristic method:
A heuristic method makes an educated guess instead of examining every
possibility, trading a guarantee of the optimal answer for speed.
The user can set the 'word size', but it is restricted to the range 7-20 for nucleotide
comparisons and 2-3 for proteins.
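The role of the word size can be illustrated with a short Python sketch that lists every position where the two sequences share an identical word of a given length; such shared words are the seeds from which alignments are extended. This is a simplified illustration only: it handles exact matches, whereas protein searches in BLAST also accept similar words, and the function name word_hits and the example sequences are assumptions.

```python
from collections import defaultdict


def word_hits(query, subject, word_size=7):
    """Return (query_pos, subject_pos) pairs where both share an identical word."""
    # Index every word of the query by its starting position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    # Slide over the subject and look each word up in the index.
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


print(word_hits("ACGTACGTAA", "TTACGTACGG", word_size=7))  # [(0, 2)]
```

A smaller word size yields more seeds and a more sensitive but slower comparison; a larger one yields fewer seeds and a faster comparison.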
Method of Study:
Data for this study were collected from websites and research papers.
Limitations of the Study:
This study is limited to BLAST and BLAST 2 Sequences, a new tool for comparing protein and
nucleotide sequences.
the regular BLAST report, providing an easy way to compare the results of
the alignment of two sequences with the results of an entire database search.
Conclusion
• (Original BLAST) Produces only ungapped alignments, which are sometimes combined together.
REFERENCE
[1] Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990)
Basic local alignment search tool. J. Mol. Biol. 215, 403-410.
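[2] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and
Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res. 25, 3389-3402.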
[3] Smith, T.F. and Waterman, M.S. (1981) Identification of common molecular
subsequences. J. Mol. Biol. 147, 195-197.
[4] Needleman, S.B. and Wunsch, C.D. (1970) A general method applicable to the
search for similarities in the amino acid sequences of two proteins. J. Mol. Biol.
48, 443-453.
[5] Tomb, J.-F., White, O., Kerlavage, A.R., Clayton, R.A., Sutton, G.G.,
Fleischmann, R.D., Ketchum, K.A., Klenk, H.P., Gill, S., Dougherty, B.A.,
Nelson, K., Quackenbush, J., Zhou, L., Kirkness, E.F., Peterson, S., Loftus, B.,
Richardson, D., Dodson, R., Khalak, H.G., Glodek, A., McKenney.
[6] Alm, R.A., Ling, L.-S.L., Moir, D.T., King, B.L., Brown, E.D., Doig, P.C.,
Smith, D.R., Noonan, B., Guild, B.C., deJonge, B.L., Carmel, G., Tummino, P.J.,
Caruso, A., UriaNickelsen, M., Mills, D.M., Ives, C., Gibson, R.