Blast 2 Sequences, A New Tool For Comparing Protein and Nucleotide Sequences


BLAST 2 SEQUENCES,

A new tool for comparing protein and nucleotide sequences

Prepared for
Zunaira Rauf
COMSATS University

Group Head
Salman Khan (046)

Prepared by
Ahmer Shoaib (002)
Ammar Baig (024)
Humayun Asghar (037)
Muhammad Kashif (043)

May 21, 2018

Bioinformatics Page 1
Abstract:

‘BLAST 2 SEQUENCES’ is an interactive tool that utilizes the BLAST engine for pairwise
DNA-DNA or protein-protein sequence comparison and is based on the same algorithm and
statistics of local alignments. While the standard BLAST program is widely used to search for
homologous sequences in nucleotide and protein databases, one often needs to compare only two
sequences that are already known to be homologous, coming from related species or, e.g.,
different isolates of the same virus. In such cases searching the entire database would be
unnecessarily time-consuming. A World Wide Web version of the program can be used
interactively at the NCBI. The resulting alignments are presented in both graphical and text
form.

Contents
Abstract
Background
What is sequence Alignment?
Purpose of Sequence Alignment?
Software used for Sequence alignment
Types of Sequence alignment
Local alignment
Global alignment
Blast
Needleman–Wunsch algorithm
Smith–Waterman algorithm
INTRODUCTION
Algorithm
Dynamic programming
Heuristic method
USER DEFINED PARAMETERS
Method of Study
Limitations of the Study
RESULT AND DISCUSSION
Conclusion
REFERENCES

Background
What is sequence Alignment?
In bioinformatics, a sequence alignment is a way of arranging the sequences
of DNA, RNA, or protein to identify regions of similarity that may be a
consequence of functional, structural, or evolutionary relationships between the
sequences. Aligned sequences of nucleotide or amino acid residues are typically
represented as rows within a matrix. Gaps are inserted between the residues so
that identical or similar characters are aligned in successive columns.

Purpose of Sequence Alignment?

The purpose of sequence alignment is to arrange the sequences of DNA,
RNA, or protein so that regions of similarity can be identified. Such regions may
be a consequence of functional, structural, or evolutionary relationships between
the sequences.

Software used for Sequence alignment:

• Clustal-W – a widely used multiple alignment program
• Clustal-X – provides a window-based user interface to the Clustal-W multiple
alignment program
• GOR – protein secondary structure prediction
• LALIGN – pairwise alignment

Types of Sequence alignment:
There are two types of sequence alignment:

• Local alignment
• Global alignment

Local alignment
In a local alignment, you try to match your query with a substring (a portion) of
your subject.

Global alignment:
In a global alignment, you perform an end-to-end alignment of the query with the
subject.

Blast:
Basic Local Alignment Search Tool

BLAST finds regions of similarity between biological sequences. The program
compares nucleotide or protein sequences to sequence databases and calculates the
statistical significance of the matches.

Two types of sequences can be entered:

• Nucleotide BLAST
• Protein BLAST

BLAST compares protein and nucleotide sequences much faster than
dynamic programming methods such as Smith-Waterman and Needleman-Wunsch.

Needleman–Wunsch algorithm

The Needleman–Wunsch algorithm is an algorithm used
in bioinformatics to align protein or nucleotide sequences. It was one of the first
applications of dynamic programming to compare biological sequences. The
algorithm was developed by Saul B. Needleman and Christian D. Wunsch and
published in 1970. The algorithm essentially divides a large problem (e.g. the full
sequence) into a series of smaller problems and uses the solutions to the smaller
problems to reconstruct a solution to the larger problem. It is also sometimes
referred to as the optimal matching algorithm and the global
alignment technique. The Needleman–Wunsch algorithm is still widely used for
optimal global alignment, particularly when the quality of the global alignment is
of the utmost importance.

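Steps:
1. Constructing the grid
Start the first string at the top of the third column and start the other string at the
start of the third row.
2. Choosing a scoring system
Next, decide how to score each individual pair of letters. A pair can be a match, a
mismatch, or a letter aligned with a gap (an insertion or deletion), and each case is
given its own score.
3. Filling the table
Start with a zero in the second row, second column. Move through the cells row by
row, calculating the score for each cell. Calculate the candidate scores for each of
the three possibilities and keep the highest:
• coming from the top-left (diagonal) neighbour: its score plus the match or mismatch score
• coming from the top neighbour: its score plus the gap score
• coming from the left neighbour: its score plus the gap score
4. Tracing arrows back to origin
For each cell, record which neighbour produced its score, then follow these arrows from
the bottom-right cell back to the top-left cell.
5. Find optimal alignment
Read the alignment off the path: a diagonal arrow aligns two letters, while a horizontal
or vertical arrow places a gap in one of the sequences.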
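
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by any particular tool: the function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example, and ties between equally good paths are broken in favour of the diagonal move.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # Steps 1-2: build the grid; the first row and column hold cumulative gap scores.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning a[i-1] with b[j-1] (assumed simple match/mismatch scheme).
        return match if a[i - 1] == b[j - 1] else mismatch

    # Step 3: fill the table row by row from the three candidate moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # diagonal: align two letters
                score[i - 1][j] + gap,             # from above: gap in b
                score[i][j - 1] + gap,             # from the left: gap in a
            )

    # Steps 4-5: trace back from the bottom-right cell to the top-left cell.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("GATTACA", "GCATGCU"))
```

Several alignments can share the same optimal score, so the aligned strings depend on the tie-breaking order, while the score itself does not.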

Smith–Waterman algorithm
The Smith–Waterman algorithm performs local sequence alignment; that is, it
determines similar regions between two strings of nucleic acid
sequences or protein sequences.

Steps:
1. Determine the substitution matrix and the gap penalty scheme
A substitution matrix assigns each pair of bases or amino acids a score for
match or mismatch. Usually matches get positive scores, whereas
mismatches get relatively lower scores.
2. Initialize the scoring matrix
3. Scoring.
Score each element from left to right, top to bottom in the matrix,
considering the outcomes of substitutions (diagonal scores) or adding gaps
(horizontal and vertical scores).

4. Traceback.
Starting at the element with the highest score, trace back based on the source
of each score recursively, until 0 is encountered. The segment that has the
highest similarity score under the given scoring system is generated in
this process. To obtain the second best local alignment, apply the traceback
process starting at the second highest score outside the trace of the best
alignment.
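
The four steps can be illustrated with a short Python sketch. It is only an example under stated assumptions: the name `smith_waterman`, the simple match/mismatch scores (+3 and -3) used in place of a full substitution matrix, and the linear gap penalty of -2 are all chosen for illustration.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Best local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # Step 2: initialise the scoring matrix; first row and column stay at 0.
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair(i, j):
        # Step 1 (simplified): match/mismatch score instead of a substitution matrix.
        return match if a[i - 1] == b[j - 1] else mismatch

    # Step 3: score left to right, top to bottom; negative scores are reset to 0.
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,
                h[i - 1][j - 1] + pair(i, j),  # substitution (diagonal)
                h[i - 1][j] + gap,             # gap in b (vertical)
                h[i][j - 1] + gap,             # gap in a (horizontal)
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Step 4: trace back from the highest score until a 0 is encountered.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TGTTACGG", "GGTTGACTA"))
```

Unlike the global case, the returned strings cover only the best-matching segment of each sequence, not the full inputs.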

Comparison with the Needleman–Wunsch algorithm
Three main differences are:

                Smith–Waterman algorithm            Needleman–Wunsch algorithm

Initialization  First row and first column are      First row and first column are
                set to 0                            subject to gap penalty

Scoring         Negative score is set to 0          Score can be negative

Traceback       Begin with the highest score, end   Begin with the cell at the lower right
                when 0 is encountered               of the matrix, end at top left cell

INTRODUCTION

‘BLAST 2 SEQUENCES’, a new BLAST‐based tool for aligning two protein or
nucleotide sequences, is described.

While the standard BLAST program is widely used to search for homologous
sequences in nucleotide and protein databases, one often needs to compare only
two sequences that are already known to be homologous, coming from related
species or, e.g., different isolates of the same virus. In such cases searching the
entire database would be unnecessarily time-consuming.

The BLAST 2 SEQUENCES program finds multiple local alignments between two
sequences, allowing the user to detect homologous protein domains or internal
sequence duplications.

BLAST 2 SEQUENCES has been very useful for the comparison of homologous
genes from complete microbial genomes.

Using BLAST 2 SEQUENCES for nucleotide sequence comparison of different
strains or isolates of the same virus offers a convenient strategy to study the
genome variations and evolutionary events, such as substitutions, insertions and
deletions.

The tool can be found by searching for ‘BLAST 2 SEQUENCES’ on Google.

[Figure: Input]

[Figure: Output]

Algorithm
‘BLAST 2 SEQUENCES’ is an interactive tool that utilizes the BLAST engine
for pairwise DNA‐DNA or protein‐protein sequence comparison and is
based on the same algorithm and statistics of local alignments. The BLAST
2.0 algorithm generates a gapped alignment by using dynamic
programming to extend the central pair of aligned residues. The heuristic
methods confine the alignments to a predefined region of the path graph. A
performance evaluation of the new gapped BLAST algorithm and its
comparison with the original ungapped BLAST and the
Smith‐Waterman algorithm have been presented.

Dynamic programming
Dynamic programming (also known as dynamic optimization) is a
method for solving a complex problem by breaking it down into a collection
of simpler subproblems, solving each of those subproblems just once, and
storing their solutions.
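
As a small illustration of this idea, the sketch below computes the length of the longest common subsequence of two strings. The problem and the name `lcs_length` are chosen only for the example and are not part of BLAST; the point is that each subproblem `(i, j)` is solved once and its answer is stored (here by `functools.lru_cache`) for reuse.

```python
from functools import lru_cache


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""

    @lru_cache(maxsize=None)  # stores the solution of every (i, j) subproblem
    def solve(i, j):
        # Subproblem: longest common subsequence of a[i:] and b[j:].
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + solve(i + 1, j + 1)
        return max(solve(i + 1, j), solve(i, j + 1))

    return solve(0, 0)


if __name__ == "__main__":
    print(lcs_length("GATTACA", "GCATGCU"))
```

The recursive form keeps the example short and is suitable for short inputs only; alignment programs fill an explicit table instead.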

Heuristic method:
In a heuristic method we make an educated guess: a practical shortcut that finds a
good solution quickly but does not guarantee the optimal one.

USER DEFINED PARAMETERS

The ‘BLAST 2 SEQUENCES’ interface allows the user to perform a series of searches
with various parameters. The program can align hundreds of sequences within a
reasonable time. Different scoring matrices are provided for protein-protein comparisons;
each matrix is most sensitive at finding similarities at a specific evolutionary distance.
The default matrix, BLOSUM62, is generally considered to be the best for a wide variety
of distances.
Changing the gap existence and extension penalties may change the number and length of
gaps in an alignment. There is no analytical formula that determines the ‘best’ gap values
to use, so one may wish to experiment with different values in order to explore more of the
alignment ‘space’.
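
To make the two penalties concrete, the sketch below prices a single gap under a common affine convention, assumed here, in which a gap of k residues costs existence + k × extension. The function name `gap_cost` and the default values (existence 11, extension 1) are assumptions for illustration; the values actually in effect are the ones shown on the parameter page.

```python
def gap_cost(length, existence=11, extension=1):
    """Cost of one gap spanning `length` residues (assumed affine convention)."""
    if length < 1:
        raise ValueError("a gap must span at least one residue")
    # One-off charge for opening the gap plus a charge for every gapped residue.
    return existence + length * extension


if __name__ == "__main__":
    for k in (1, 3, 10):
        print(k, gap_cost(k), gap_cost(k, existence=5, extension=2))
```

Under this convention a high existence penalty with a low extension penalty favours a few long gaps, whereas a low existence penalty favours many short ones.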
BLAST initiates extensions between sequences using a word, meaning that alignments
need to share similarity along at least a ‘word size’ number of letters. The default value is
11 for nucleotide-nucleotide alignments, and an exact match of ‘word size’ nucleotides
between the two sequences is required; three is the default value for protein-protein
matches, and the sequences may merely be similar along the words, according to the
matrix selected. If better sensitivity is needed, one should use a smaller value for the
‘word size’, but it is restricted to the range 7-20 for nucleotide comparisons and 2-3 for
proteins.
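
The nucleotide case, where an exact match of ‘word size’ letters is required, can be sketched as follows. This is a simplified illustration and not BLAST's actual implementation: the function name `shared_words` is hypothetical, and the protein case (similar rather than identical words, scored with the selected matrix) is not covered.

```python
def shared_words(query, subject, word_size=11):
    """Return (query_pos, subject_pos) pairs where words of word_size letters match exactly."""
    # Index every word of the subject by its start position.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)

    # Look up every word of the query; each hit is a possible starting point for an extension.
    hits = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    query = "GGGACGTTGCAAT"
    subject = "TTACGTTGCAATCC"
    print(shared_words(query, subject, word_size=7))
    print(shared_words(query, subject, word_size=11))
```

The two example sequences share a run of 10 identical letters, so a word size of 7 finds hits while the default of 11 finds none, which is why a smaller word size gives better sensitivity.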

Method of Study:
Data for this study were collected from websites and research papers.

Limitations of the Study:

This study is limited to BLAST and BLAST 2 SEQUENCES, a new tool for comparing protein and
nucleotide sequences.

RESULT AND DISCUSSION


The result starts with the values of the parameters that were selected to produce
the results. The user can recalculate the alignments by changing the
parameters from this page and clicking on the ‘Align’ button, which provides
a fast and convenient way of comparing the results for different values of
parameters. It might be useful to compare the results of protein-protein
alignments for different scoring matrices or to change the expectation value.
The graphical representation shows schematically the set of gapped local
alignments found between the two sequences, with gaps shown in red.
Clicking on the graphics brings the user to the detailed representation of that
alignment. If the sequences are taken from the GenBank database and
defined by a gi or an accession number, a hot link to the Entrez query system
is provided. The last part of the report shows the parameters of the
calculations and the summary of BLAST statistics. The report is the same as
the regular BLAST report, providing an easy way to compare the results of
the alignment of two sequences with the results of an entire database search.

Conclusion
• Original BLAST: produces only ungapped alignments, which are sometimes combined together.

• BLAST 2: extends the HSPs (high-scoring segment pairs) using gapped alignment.

REFERENCES

[1] Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990)
Basic local alignment search tool. J. Mol. Biol. 215, 403-410.

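[2] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and
Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res. 25, 3389-3402.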

[3] Smith, T.F. and Waterman, M.S. (1981) Identification of common molecular
subsequences. J. Mol. Biol. 147, 195-197.

[4] Needleman, S.B. and Wunsch, C.D. (1970) A general method applicable to the
search for similarities in the amino acid sequences of two proteins. J. Mol. Biol.
48, 443-453.

[5] Tomb, J.-F., White, O., Kerlavage, A.R., Clayton, R.A., Sutton, G.G.,
Fleischmann, R.D., Ketchum, K.A., Klenk, H.P., Gill, S., Dougherty, B.A.,
Nelson, K., Quackenbush, J., Zhou, L., Kirkness, E.F., Peterson, S., Loftus, B.,
Richardson, D., Dodson, R., Khalak, H.G., Glodek, A., McKenney.

[6] Alm, R.A., Ling, L.-S.L., Moir, D.T., King, B.L., Brown, E.D., Doig, P.C.,
Smith, D.R., Noonan, B., Guild, B.C., deJonge, B.L., Carmel, G., Tummino, P.J.,
Caruso, A., UriaNickelsen, M., Mills, D.M., Ives, C., Gibson, R.
