Blast 2 Sequences, A New Tool For Comparing Protein and Nucleotide Sequences

Prepared for
Zunaira Rauf
COMSATS University
Group Head
Salman Khan (046)
Prepared by
Ahmer Shoaib (002)
Ammar Baig (024)
Humayun Asghar (037)
Muhammad Kashif (043)
May 21, 2018
Abstract:
‘BLAST 2 SEQUENCES’ is an interactive tool that utilizes the BLAST engine for pairwise
DNA-DNA or protein-protein sequence comparison and is based on the same algorithm and
statistics of local alignments. While the standard BLAST program is widely used to search for
homologous sequences in nucleotide and protein databases, one often needs to compare only two
sequences that are already known to be homologous, coming from related species or, e.g.,
different isolates of the same virus. In such cases searching the entire database would be
unnecessarily time-consuming. A World Wide Web version of the program can be used
interactively at the NCBI. The resulting alignments are presented in both graphical and text
form.
Contents
Abstract
Background
What is sequence Alignment?
Purpose of Sequence Alignment?
Software used for sequence alignment
Types of Sequence alignment
Local alignment
Global alignment
Blast
Needleman–Wunsch algorithm
Smith–Waterman algorithm
INTRODUCTION
Algorithm
Dynamic programming
Heuristic method
USER-DEFINED PARAMETERS
Method of Study
Limitations of the Study
RESULT AND DISCUSSION
Conclusion
Background
What is sequence Alignment?
In bioinformatics, a sequence alignment is a way of arranging the sequences
of DNA, RNA, or protein to identify regions of similarity that may be a
consequence of functional, structural, or evolutionary relationships between the
sequences. Aligned sequences of nucleotide or amino acid residues are typically
represented as rows within a matrix. Gaps are inserted between the residues so
that identical or similar characters are aligned in successive columns.
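As a small illustration of this representation, the sketch below stores two
already-aligned sequences as equal-length rows, with '-' marking a gap, and
counts the columns in which the residues are identical. The sequences and the
helper name identical_columns are made up for this example.

```python
def identical_columns(row_a: str, row_b: str) -> int:
    """Count alignment columns in which both rows hold the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # A column counts only if the characters agree and are not gaps.
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")


# Two hypothetical aligned rows; '-' is a gap inserted so that the
# remaining residues line up in the same columns.
row_a = "GATTACA"
row_b = "GA-TACA"
matches = identical_columns(row_a, row_b)
print(f"{matches} of {len(row_a)} columns are identical")
```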
Purpose of Sequence Alignment?

The purpose of sequence alignment is to arrange the sequences of DNA,
RNA, or protein so as to identify regions of similarity that may be a
consequence of functional, structural, or evolutionary relationships between
the sequences.
Software used for sequence alignment:
• Clustal-W - the famous Clustal-W multiple alignment program
• Clustal-X - provides a window-based user interface to the Clustal-W multiple
alignment program
• GOR - protein secondary structure prediction
• LALIGN - pairwise alignment
Types of Sequence alignment:
There are two types of sequence alignment:
• Local alignment
• Global alignment
Local alignment
In a local alignment, the query is matched against a substring (a portion) of
the subject sequence.
Global alignment:
In a global alignment, the query is aligned end to end with the subject
sequence.
Blast:
BLAST stands for Basic Local Alignment Search Tool.
BLAST finds regions of similarity between biological sequences. The program
compares nucleotide or protein sequences to sequence databases and calculates the
statistical significance of the matches.
Two types of sequences can be entered: nucleotide sequences (Nucleotide BLAST)
and protein sequences (Protein BLAST).
BLAST compares protein and nucleotide sequences much faster than
dynamic programming methods such as Smith-Waterman and Needleman-Wunsch.
Needleman–Wunsch algorithm
The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align
protein or nucleotide sequences. It was one of the first applications of
dynamic programming to compare biological sequences. The algorithm was
developed by Saul B. Needleman and Christian D. Wunsch and published in 1970.
The algorithm essentially divides a large problem (e.g. the full sequence) into
a series of smaller problems and uses the solutions to the smaller problems to
reconstruct a solution to the larger problem. It is also sometimes referred to
as the optimal matching algorithm and the global alignment technique. The
Needleman–Wunsch algorithm is still widely used for optimal global alignment,
particularly when the quality of the global alignment is of the utmost
importance.
Steps:
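1. Constructing the grid
Write the first string along the top of the grid, starting at the third column,
and write the other string down the left side, starting at the third row.
2. Choosing a scoring system
Next, decide how to score each individual pair of letters: a match, a mismatch,
or a letter paired with a gap.
3. Filling the table:
Start with a zero in the second row, second column. Move through the cells row by
row, calculating the score for each cell. Calculate the candidate scores for each of
the three possibilities:
• the score of the diagonal (top-left) neighbour plus the match or mismatch score
• the score of the top neighbour plus the gap score
• the score of the left neighbour plus the gap score
The cell takes the highest of the three candidate scores.
4. Tracing arrows back to origin
For each cell, record which neighbour produced its score, then follow these
arrows from the bottom-right cell back to the top-left cell.
5. Find optimal alignment
Read the alignment off the traced path: a diagonal arrow aligns two letters,
and a horizontal or vertical arrow introduces a gap.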
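The steps above can be sketched in a few lines of Python. This is a minimal
illustration, not the implementation used by any particular program: the
function name needleman_wunsch and the scores (match +1, mismatch -1, gap -1)
are assumptions chosen for the example, and a simple linear gap penalty is
used.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    The default scores are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # first column: a[:i] against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # first row: b[:j] against gaps only

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table row by row from the three candidate scores.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # diagonal
                score[i - 1][j] + gap,             # from above
                score[i][j - 1] + gap,             # from the left
            )

    # Trace back from the bottom-right cell to the top-left cell.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

When several alignments share the best score, this sketch returns only one of
them; the order of the checks in the traceback decides which.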
Smith–Waterman algorithm
The Smith–Waterman algorithm performs local sequence alignment; that is, it
determines similar regions between two nucleic acid sequences or two protein
sequences.
Steps:
1. Determine the substitution matrix and the gap penalty scheme
A substitution matrix assigns each pair of bases or amino acids a score for
match or mismatch. Usually matches get positive scores, whereas
mismatches get relatively lower scores.
2. Initialize the scoring matrix
3. Scoring.
Score each element from left to right, top to bottom in the matrix,
considering the outcomes of substitutions (diagonal scores) or adding gaps
(horizontal and vertical scores).
4. Traceback.
Starting at the element with the highest score, trace back along the source
of each score recursively, until 0 is encountered. This process generates the
segment that has the highest similarity score under the given scoring system.
To obtain the second best local alignment, apply the traceback process
starting at the second highest score outside the trace of the best
alignment.
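The four steps can be sketched as follows. This is a minimal illustration
under stated assumptions: the function name smith_waterman, the simple
match/mismatch scores (+2 and -1) standing in for a substitution matrix, and
the linear gap penalty (-1) are all chosen for the example only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of a and b; returns (score, aligned_a, aligned_b).

    The default scores are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Step 2: the first row and first column stay at 0.
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Step 3: score left to right, top to bottom; negatives are set to 0.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair(i, j),  # substitution (diagonal)
                score[i - 1][j] + gap,             # gap (vertical)
                score[i][j - 1] + gap,             # gap (horizontal)
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Step 4: trace back from the highest score until 0 is encountered.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two made-up sequences that share the segment GATTACA.
print(smith_waterman("TTTGATTACATTT", "CCCGATTACACCC"))
```

Only the shared segment is reported; the unrelated flanking letters, which a
global alignment would be forced to include, are left out.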
Comparison with the Needleman–Wunsch algorithm
Three main differences are:

                Smith–Waterman algorithm         Needleman–Wunsch algorithm
Initialization  First row and first column are   First row and first column are
                set to 0                         subject to gap penalty
Scoring         Negative score is set to 0       Score can be negative
Traceback       Begin with the highest score,    Begin with the cell at the lower
                end when 0 is encountered        right of the matrix, end at top
                                                 left cell
INTRODUCTION
‘BLAST 2 SEQUENCES’, a new BLAST-based tool for aligning two protein or
nucleotide sequences, is described.
While the standard BLAST program is widely used to search for homologous
sequences in nucleotide and protein databases, one often needs to compare only
two sequences that are already known to be homologous, coming from related
species or, e.g., different isolates of the same virus.
The BLAST 2 SEQUENCES program finds multiple local alignments between two
sequences, allowing the user to detect homologous protein domains or internal
sequence duplications.
BLAST 2 SEQUENCES has been very useful for the comparison of homologous
genes from complete microbial genomes.
Using BLAST 2 SEQUENCES for nucleotide sequence comparison of different
strains or isolates of the same virus offers a convenient strategy to study
genome variations and evolutionary events, such as substitutions, insertions and
deletions.
[Figures: searching for BLAST 2 on Google; the input page; the output page]
Algorithm
‘BLAST 2 SEQUENCES’ is an interactive tool that utilizes the BLAST engine
for pairwise DNA-DNA or protein-protein sequence comparison and is
based on the same algorithm and statistics of local alignments. The BLAST
2.0 algorithm generates a gapped alignment by using dynamic
programming to extend the central pair of aligned residues. The heuristic
methods confine the alignments to a predefined region of the path graph. A
performance evaluation of the new gapped BLAST algorithm and its
comparison to that of the original ungapped BLAST and the
Smith-Waterman algorithm have been presented.
Dynamic programming
Dynamic programming (also known as dynamic optimization) is a
method for solving a complex problem by breaking it down into a collection
of simpler subproblems, solving each of those subproblems just once, and
storing their solutions.
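The idea of solving each subproblem once and storing its solution can be
shown with a short sketch. The example problem (the length of the longest
common subsequence of two strings) and the name lcs_length are chosen here
purely for illustration; they are not part of BLAST 2 SEQUENCES.

```python
from functools import lru_cache


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""

    @lru_cache(maxsize=None)  # stores the answer to each (i, j) subproblem
    def solve(i: int, j: int) -> int:
        # Subproblem: the answer for the suffixes a[i:] and b[j:].
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + solve(i + 1, j + 1)
        # Otherwise skip one letter in either string and keep the better result.
        return max(solve(i + 1, j), solve(i, j + 1))

    return solve(0, 0)


print(lcs_length("GATTACA", "GATACA"))
```

Without the stored solutions, the same (i, j) pairs would be recomputed many
times; with them, each pair is solved once, so the work grows with the
product of the two sequence lengths.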
Heuristic method:
A heuristic method makes an educated guess in order to reach a good solution
quickly, without guaranteeing that the solution is the optimal one.
USER-DEFINED PARAMETERS

The ‘BLAST 2 SEQUENCES’ interface allows the user to perform a series of searches
with various parameters. The program can align hundreds of sequences within a
reasonable time. Different scoring matrices are provided for protein-protein comparisons;
each matrix is most sensitive at finding similarities at a specific evolutionary distance.
The default matrix, BLOSUM62, is generally considered to be the best for a wide variety
of distances.
Changing the gap existence and extension penalties may change the number and length of
gaps in an alignment. There is no analytical formula that determines the ‘best’ gap values
to use, so one may wish to experiment with different values in order to explore more of the
alignment ‘space’.
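To make the effect of the two penalties concrete, the sketch below assumes the
affine convention in which a gap of length k costs existence + k x extension.
The function name gap_cost and the values 11 and 1 are illustrative
assumptions, not settings quoted in this report.

```python
def gap_cost(length: int, existence: int = 11, extension: int = 1) -> int:
    """Cost of one gap of the given length under an assumed affine scheme."""
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0  # no gap, no penalty
    return existence + length * extension


# One long gap versus several short gaps covering the same number of letters.
one_long_gap = gap_cost(4)
four_short_gaps = 4 * gap_cost(1)
print(one_long_gap, four_short_gaps)
```

Under these assumed values a single gap of four letters is much cheaper than
four separate one-letter gaps, which is why raising the existence penalty
tends to give fewer gaps, while raising the extension penalty tends to give
shorter ones.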
BLAST initiates extensions between sequences using a word, meaning that alignments
need to share similarity along at least a ‘word size’ number of letters. The default value is
11 for nucleotide-nucleotide alignments, and an exact match of ‘word size’ nucleotides
between the two sequences is required. The default value is 3 for protein-protein
matches, where the sequences may merely be similar along the words, according to the
matrix selected. If better sensitivity is needed, one should use a smaller value for the
‘word size’, but it is restricted to the range 7-20 for nucleotide comparisons and 2-3 for
proteins.
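The role of the word size in the nucleotide case can be sketched as a search
for exact shared words. This is a simplified illustration only: the function
name exact_word_hits and the test sequences are made up, and the sketch covers
exact matches, not the similar-word matching used for proteins or the
extension step that follows a hit.

```python
from collections import defaultdict


def exact_word_hits(seq_a: str, seq_b: str, word_size: int = 11):
    """Return (pos_a, pos_b) pairs where both sequences share an identical word."""
    # Index every word of seq_a by its start position.
    index = defaultdict(list)
    for i in range(len(seq_a) - word_size + 1):
        index[seq_a[i:i + word_size]].append(i)

    # Look up every word of seq_b in that index.
    hits = []
    for j in range(len(seq_b) - word_size + 1):
        for i in index.get(seq_b[j:j + word_size], []):
            hits.append((i, j))
    return hits


# Two made-up sequences sharing the 8-letter stretch TACGTTAG.
seq_a = "ACGTACGTTAGC"
seq_b = "TTACGTTAGG"
print(exact_word_hits(seq_a, seq_b, word_size=7))
print(exact_word_hits(seq_a, seq_b, word_size=11))
```

With a word size of 7 the shared stretch produces hits from which an
alignment could be extended; with a word size of 11 it is too short to be
found at all, which is the loss of sensitivity described above.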
Method of Study:
Data for this study were collected from websites and research papers.
Limitations of the Study:
This study is limited to BLAST and BLAST 2 SEQUENCES, a new tool for comparing
protein and nucleotide sequences.
RESULT AND DISCUSSION

The result page starts with the values of the parameters that were selected to
produce the results. The user can recalculate the alignments by changing the
parameters from this page and clicking on the ‘Align’ button, which provides
a fast and convenient way of comparing the results for different values of
parameters. It might be useful to compare the results of protein-protein
alignments for different scoring matrices or to change the expectation value.
The graphical representation shows schematically the set of gapped local
alignments found between the two sequences, with gaps shown in red.
Clicking on the graphics brings the user to the detailed representation of that
alignment. If the sequences are taken from the GenBank database and
defined by gi or an accession number, a hot link to the Entrez query system
is provided. The last part of the report shows the parameters of the
calculations and the summary of BLAST statistics. The report is the same as
the regular BLAST report, providing an easy way to compare the results of
the alignment of two sequences with the results of an entire database search.
Conclusion
• (Original BLAST) Produces only ungapped alignments, which are sometimes combined together.
• (BLAST2) Extends the high-scoring segment pairs (HSPs) using gapped alignment.
