Sorting by Reversals: Bogdan Pasaniuc
Sorting by Reversals: Bogdan Pasaniuc
Sorting by Reversals: Bogdan Pasaniuc
Bogdan Pasaniuc
Dept. of Computer Science & Engineering
Overview
Mouse (X chrom.)
Unknown ancestor
Human (X chrom.)
What is the evolutionary path ? What is the ancestor chromosome? Chromosomes lists of genes permutation
(1 2 3 4 5 6 7) (1 4 3 2 5 6 7) (1 2 3 4 5 6 7) (1 5 6 2 3 4 7) (1 2 3 4 5 6 7) (1 2 3 4 5 2 3 4 6 7)
Inversions
Known as reversals The most common Most often reflect the differences between and within species
What is the minimum number of reversals required to transform one perm. into another? Reversal distance good approx. for evolutionary distance
Reversals
1
3
9 8 10
Genes (blocks)
7 6
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Reversals
1
3 9
8 4
10
7 6
1, 2, 3, 8, 7, 6, 5, 4, 9, 10
Reversals
1
3 9
8 4
10
Breakpoints
7 6
1, 2, 3, 8, 7, 6, 5, 4, 9, 10
The values i i+1are not consecutive If | i - i+1| = 1 then the values i i+1are adjacent Introduce 0 = 0 , n+1 = n+1
A reversal affects the breakpoints only at its endpoints Any reversal can remove or induce at most 2 bkpts.
Choose a reversal that removes the maximum number of breakpoints. (if there is a tie favor the reversal that leaves a decreasing strip)
Quality of approximation
Lemma1: Every permutation with a decreasing strip has a reversal that removes one breakpoint.
Proof:
consider the decreasing strip with i being the smallest i -1 must be in an increasing strip that lies to the left or right
Lemma2: has a decreasing strip. If every reversal that removes one bkpt leaves a permutation with no decreasing strips has a reversal that removes two bkpts.
Proof: consider the decreasing strip with i being the smallest increasing strip must be to the left. i
consider the decreasing strip with j being the largest decreasing strip containing j +1 must be to the right. j
must lie in i if it doesnt then oi has the decreasing strip that contains must lie in if it doesnt then o has the decreasing strip that contains
j j i
j j
Fact 2. i = j
If i - j 0 then - if i - j contains an increasing strip oj has a decreasing strip - if i - j contains an decreasing strip oi has a decreasing strip
If has a decreasing strip at most () -1 reversals If has no decreasing strip every reversal induces a decreasing strip after one step we can apply lemma3 at most () reversals
Runtime
#of steps O(n).
At each step we need to analyze
n 2
reversalsO(n2).
Total runtime = O(n3). analyze only reversals that remove bkpts O(n2).
Signed permutations:
reversals change the sign:
(1,2,3,4,5,6,7,8,9,10) (1,2,3,-8,-7,-6,-5,-4,9,10)
Problem: Given a signed perm., find the minimum length series of reversals that transforms it into the identity perm.
polynomial algorithm (Hannenhalli&Pevzner 95)
relies on several intermediary constructions these constructions have been simplified first completely elementary treatment of the problem (Bergeron 05)
Oriented pair a pair of consecutive integers with different signs (0,3,1,6,5,-2,4,7) o.p. (3,-2) and (1,-2). o.p. reversals that create consecutive integers (3,-2) : (0,3,1,6,5,-2,4,7) (0,3,2,-5,-6,-1,4,7) (1,-2) : (0,3,1,6,5,-2,4,7) (0,3,-5,-6,-1,-2,4,7)
Oriented reversal: reversal that creates consecutive integers Score of a reversal: # of oriented pairs it creates.
Algorithm1: As long as has an oriented pair, choose the oriented reversal that has the maximal score.
output will be a permutation with positive elements. 0 and n+1 are positive; if there is a negative element there exists an o.p.
Claim1: If Alg1 applies k reversals to , yielding then d() = d() + k.
Sorting positive perms.: - signed perm. with positive elements - circular order: 0 successor of n+1. - reduced if it does not contain consecutive elements. framed interval in : i j+1 j+2 j+k-1i+k s.t. i < j+1 j+2 j+k-1 < i+k (0 2 5 4 3 6 1 7 ) (0 2 5 4 3 6 1 7 ) (0 2 5 4 3 6 1 7 ) (0 2 5 4 3 6 1 7 ) hurdle a framed int. that contains no shorter framed int.
Idea: create oriented pairs and then apply Algorithm1 Operations on Hurdles: Hurdle Cutting: i j+1 j+2 i+1j+k-1i+k
(0 1 4 3 2 5 ) (0 -3 -4 -1 2 5 )
Simple hurdle if cutting it decreases the # of hurdles Super hurdles if cutting it increases the # of hurdles
Algorithm2: has 2k hurdles merge any two non-consecutive hurdles has 2k+1 hurdles cut one simple hurdle (if it has none merge any two non-consecutive)
(0 2 1 5 6 9 10 7 8 11 12 4 3 13 ) arcs
Every oriented vertex corresponds to an oriented pair. Fact2: Score of an oriented reversal (oriented vertex v) is T+U-O+1. T= #oriented vertices. U= #unoriented vertices adjacent to v O= #oriented vertices adjacent to v
Oriented component if it contains an oriented v Safe reversal does not create new unoriented components.
J. Kececioglu and D. Sankoff. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. 1995. A. Bergeron. A very elementary presentation of the Hannenhalli-Pevzner Theory. 2005 A. Caprara. Sorting by reversals is difficult. 1997
S. Hannenhalli and Pavel Pevzner. Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. 1999