1

Michal Ozery-Flato and Ron Shamir

An $O(n^{3/2}\sqrt{\log n})$ Algorithm for Sorting by Reciprocal Translocations

2

The Genomic Sorting Problem

HOW?

3

Overview

• Preliminaries
• Reduction to a simpler case
• The main algorithm (reduced case)

4

Genome Modeling

+1 +2 +3 +4

+1 -2 +3 +5

-4 +6 +7

5

Genome Modeling

+1 -2 +3 +5

-4 +6 +7

-4 -3 -2 -1 (chromosome flip of +1 +2 +3 +4)

6

Reciprocal Translocations

• Exchange non-empty ends between two chromosomes

For chromosomes X = (X1 X2) and Y = (Y1 Y2):

Prefix-prefix: (X1 X2), (Y1 Y2) → (X1 Y2), (Y1 X2)

Prefix-postfix: (X1 X2), (Y1 Y2) → (X1 -Y1), (-X2 Y2)
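The two exchange types can be sketched in a few lines of Python. This is a minimal illustration, assuming a chromosome is stored as a list of signed integers and that the cut positions `i` and `j` (hypothetical parameter names) count the genes kept in the prefixes X1 and Y1.

```python
def flip(chrom):
    """Chromosome flip: reverse the gene order and negate every sign."""
    return [-g for g in reversed(chrom)]

def _split(x, y, i, j):
    x1, x2, y1, y2 = x[:i], x[i:], y[:j], y[j:]
    if not (x1 and x2 and y1 and y2):
        raise ValueError("a reciprocal translocation exchanges non-empty ends")
    return x1, x2, y1, y2

def prefix_prefix(x, y, i, j):
    """(X1 X2), (Y1 Y2) -> (X1 Y2), (Y1 X2); no sign changes."""
    x1, x2, y1, y2 = _split(x, y, i, j)
    return x1 + y2, y1 + x2

def prefix_postfix(x, y, i, j):
    """(X1 X2), (Y1 Y2) -> (X1 -Y1), (-X2 Y2); the moved ends are flipped."""
    x1, x2, y1, y2 = _split(x, y, i, j)
    return x1 + flip(y1), flip(x2) + y2
```

For instance, `prefix_prefix([1, -5, 6], [3, -4, 2], 1, 2)` cuts after gene 1 and after gene -4 and joins 1 with 2.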

7

Sorting by Reciprocal Translocations

• Tails({(1, 2, -4), (-3, 5), (6, -8, -7, 9)}) = {1, 4, -3, -5, 6, -9}

• A → B by reciprocal translocations:

– genes(A) = genes(B)

– Tails(A) = Tails(B)

• An $O(n^3)$ algorithm (Hannenhalli 96, Bergeron et al. 06)
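The tail set in the example above follows a simple rule: a chromosome (x1, ..., xk) contributes x1 and -xk. A small Python sketch of that rule and of the two conditions on A and B, assuming genomes are lists of signed-integer lists (the function names are illustrative only):

```python
def tails(genome):
    """Tails of a genome: the first gene and the negated last gene of each chromosome."""
    result = set()
    for chrom in genome:
        result.add(chrom[0])
        result.add(-chrom[-1])
    return result

def genes(genome):
    """Unsigned gene content of a genome."""
    return {abs(g) for chrom in genome for g in chrom}

def same_genes_and_tails(a, b):
    """Check the two conditions listed on the slide for genomes A and B."""
    return genes(a) == genes(b) and tails(a) == tails(b)
```

Flipping a chromosome leaves its two tails unchanged, so the check does not depend on how each chromosome is written down.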

8

The Cycle Graph

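cycle graph(A, B)

A = {(4, -1), (-3, -2, 5), (6, -7, 8)}
B = {(1, 2, 3), (4, 5), (6, 7, 8)}

Gene ends of A, one row per chromosome:
4^0 4^1 1^1 1^0
3^1 3^0 2^1 2^0 5^0 5^1
6^0 6^1 7^1 7^0 8^0 8^1

Edge labels in the figure: external, internal, adjacency

#cycles(A, B) = 3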
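The cycle count for this example can be checked with a short Python sketch. It assumes the usual construction: each gene i has two ends i^0 and i^1 (written here as tuples `(i, 0)` and `(i, 1)`), neighbouring ends in A are joined by one kind of edge and neighbouring ends in B by the other, and A and B have the same genes and tails so that every non-tail end has exactly one edge of each kind. The function names are hypothetical.

```python
def gene_ends(chrom):
    """Expand a chromosome into its gene ends: +i -> i^0 i^1, -i -> i^1 i^0."""
    ends = []
    for g in chrom:
        a = abs(g)
        ends += [(a, 0), (a, 1)] if g > 0 else [(a, 1), (a, 0)]
    return ends

def neighbours(genome):
    """Map every non-tail gene end to the end of the neighbouring gene."""
    nbr = {}
    for chrom in genome:
        ends = gene_ends(chrom)
        for k in range(1, len(ends) - 1, 2):
            u, v = ends[k], ends[k + 1]
            nbr[u], nbr[v] = v, u
    return nbr

def count_cycles(a, b):
    """Number of alternating cycles in the cycle graph of genomes a and b."""
    in_a, in_b = neighbours(a), neighbours(b)
    seen, cycles = set(), 0
    for start in in_a:
        if start in seen:
            continue
        cycles += 1
        u = start
        while u not in seen:          # walk: edge from a, then edge from b
            seen.add(u)
            v = in_a[u]
            seen.add(v)
            u = in_b[v]
    return cycles
```

An adjacency shows up as a cycle of length two, such as the pair 2^1, 3^0 in the example genome.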

9

The Overlap Graph (with Chromosomes)

A = (4, -1, -3, -2, 5, 6, -7, 8) (concatenation of A’s chromosomes)

Overlap graph (A, B, A)

Vertex types: edge, chromosome

Edge vertices: (1,2) (4,5) (2,3) (6,7) (7,8)

Gene ends: 4^0 4^1 1^1 1^0 3^1 3^0 2^1 2^0 5^0 5^1 6^0 6^1 7^1 7^0 8^0 8^1

10

(Connected) Components

Overlap graph (A, B, A)

(1,2) (4,5) (2,3) (6,7) (7,8)

bad component = non-trivial internal component

trivial component = adjacency
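A rough Python sketch of how the components above could be computed. It rests on assumptions that the slides do not spell out: every edge (i, i+1) and every chromosome is treated as an interval over the concatenated gene ends of A, two vertices are joined when their intervals overlap without one containing the other, and B consists of increasing runs of positive genes as in the example. All names are illustrative.

```python
from itertools import combinations

def end_positions(a):
    """Positions of gene ends in the concatenation of a's chromosomes, plus chromosome spans."""
    pos, spans, p = {}, [], 0
    for chrom in a:
        start = p
        for g in chrom:
            first, second = (0, 1) if g > 0 else (1, 0)
            pos[(abs(g), first)], pos[(abs(g), second)] = p, p + 1
            p += 2
        spans.append((start, p - 1))
    return pos, spans

def overlap(s, t):
    """True if the intervals intersect and neither contains the other."""
    (a, b), (c, d) = sorted([s, t])
    return a < c < b < d

def bad_components(a, b):
    pos, spans = end_positions(a)
    nodes = {}
    for chrom in b:                       # edge (i, j) joins i^1 and j^0
        for i, j in zip(chrom, chrom[1:]):
            p, q = pos[(i, 1)], pos[(j, 0)]
            nodes[(i, j)] = (min(p, q), max(p, q))
    for k, span in enumerate(spans):
        nodes[("chr", k)] = span
    parent = {v: v for v in nodes}        # union-find over overlapping vertices
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in combinations(nodes, 2):
        if overlap(nodes[u], nodes[v]):
            parent[find(u)] = find(v)
    comps = {}
    for v in nodes:
        comps.setdefault(find(v), []).append(v)
    bad = []
    for members in comps.values():
        internal = all(m[0] != "chr" for m in members)
        lo, hi = nodes[members[0]]
        trivial = len(members) == 1 and hi - lo == 1   # a single adjacency
        if internal and not trivial:
            bad.append(sorted(members))
    return bad
```

On the example genome this reports the component made of the edges (6,7) and (7,8), which lies entirely inside the chromosome (6, -7, 8).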

11

Overview

• Preliminaries
• Reduction to a simpler case
• The main algorithm (reduced case)

12

The Reciprocal Translocation Distance

• dRT(A,B) = reciprocal translocation distance

• Theorem [Hannenhalli 96, Bergeron et al. 06]: dRT(A,B) = #genes - #chrs - #cycles(A,B) + F(A,B)
  – F(A,B) depends on the topology of the bad components. If there are no bad components then F = 0.
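As a worked instance of the formula, the sketch below plugs in the counts from the cycle-graph example earlier in the deck (8 genes, 3 chromosomes, 3 cycles). The term F is passed in as a plain number because its definition is not given on the slides; the function name and parameters are illustrative.

```python
def rt_distance(n_genes, n_chrs, n_cycles, f=0):
    """dRT = #genes - #chrs - #cycles + F, with F supplied by the caller."""
    return n_genes - n_chrs - n_cycles + f

# Example from the cycle-graph slide: 8 genes, 3 chromosomes, 3 cycles.
base = rt_distance(8, 3, 3)   # the part of the distance that does not depend on F
```

For that example the distance is therefore 2 + F(A,B); with no bad components the formula reduces to #genes - #chrs - #cycles(A,B).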

13

Reduced Case: No Bad Components

Result 1:

The problem “Sorting by Reciprocal Translocations” can be reduced to the problem “Sorting by Reciprocal Translocations, No Bad Components” in linear time.

14

Reduction’s Main Idea

• Isolation: all bad components are found in one chromosome.

• Goal: eliminate the bad components without creating new ones
  – Maintain two lists of chromosomes:
    • Exactly one minimal bad component
    • Two or more minimal bad components
  – Use prefix-prefix translocations (no sign changes)

15

Overview

• Preliminaries
• Reduction to a simpler case
• The main algorithm (reduced case)

16

Translocations Defined by External Edges

e = external edge
ρ(e) = the translocation that transforms e into an adjacency

– Increases #cycles(A,B)
– May create a bad component

dRT(A,B) = #genes – #chrs – #cycles(A,B) + F(A,B)

[Figure: graph G with external edge e (labels x, y, 1, 2), and G after applying ρ(e).]

17

The Main Algorithm

1. Mark all edges (except adjacencies) as “unused”; S ← ∅, L ← ∅
2. While there is an unused external edge e:
   a. Mark e as “used”
   b. If ρ(e) ≠ ρ(FIRST(L)): apply ρ(e) to A and APPEND(S, e)
3. If all the edges are used, return (S, L)
4. While all the unused edges are internal: undo the last translocation and PREPEND(L, POP(S))
5. Goto 2

Solution: “Forward part” (S) followed by “Backward part” (L)
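The steps above can be sketched in Python as a simple, non-optimized loop. Several points are assumptions rather than the authors' implementation: genomes are lists of signed-integer lists, B consists of increasing runs of positive genes so that edge i stands for (i, i+1), the test in step 2b is read as "the translocation defined by e differs from the one defined by FIRST(L) on the current genome" (and treated as true when FIRST(L) is not external), the smallest unused external edge is picked first, and control returns to the loop of step 2 after the undo phase. All function names are hypothetical.

```python
def flip(chrom):
    return [-g for g in reversed(chrom)]

def find_end(genome, gene, head):
    """Locate end gene^1 (head=True) or gene^0 (head=False).
    Returns (chromosome index, cut index next to that end, is_right_end)."""
    for c, chrom in enumerate(genome):
        for p, g in enumerate(chrom):
            if abs(g) == gene:
                is_right = (g > 0) == head   # +g reads g^0 g^1, -g reads g^1 g^0
                return c, (p + 1 if is_right else p), is_right
    raise ValueError(f"gene {gene} not found")

def ends_of(genome, e):
    """The two ends joined by edge e = (e, e+1): e^1 and (e+1)^0."""
    return find_end(genome, e, True), find_end(genome, e + 1, False)

def is_external(genome, e):
    (cx, _, _), (cy, _, _) = ends_of(genome, e)
    return cx != cy

def is_adjacency(genome, e):
    (cx, a, _), (cy, b, _) = ends_of(genome, e)
    return cx == cy and a == b

def rho(genome, e):
    """Translocation that turns the external edge e into an adjacency."""
    (cx, a, x_right), (cy, b, y_right) = ends_of(genome, e)
    if cx == cy:
        raise ValueError("edge is internal")
    x, y = genome[cx], genome[cy]
    x1, x2, y1, y2 = x[:a], x[a:], y[:b], y[b:]
    if x_right != y_right:
        new = [x1 + y2, y1 + x2]                # prefix-prefix
    else:
        new = [x1 + flip(y1), flip(x2) + y2]    # prefix-postfix
    rest = [ch for k, ch in enumerate(genome) if k not in (cx, cy)]
    return rest + new

def canonical(genome):
    """Genome as a set of chromosomes, each taken up to a flip."""
    return frozenset(min(tuple(ch), tuple(flip(ch))) for ch in genome)

def main_algorithm(a, b):
    a = [list(ch) for ch in a]
    edges = [i for ch in b for i in ch[:-1]]
    unused = {e for e in edges if not is_adjacency(a, e)}      # step 1
    s, l, history = [], [], []
    while True:
        while True:                                            # step 2
            external = sorted(e for e in unused if is_external(a, e))
            if not external:
                break
            e = external[0]
            unused.discard(e)
            nxt = rho(a, e)
            clash = (bool(l) and is_external(a, l[0])
                     and canonical(rho(a, l[0])) == canonical(nxt))
            if not clash:
                history.append(a)
                a = nxt
                s.append(e)
        if not unused:                                         # step 3
            return s, l
        while not any(is_external(a, e) for e in unused):      # step 4
            if not s:
                raise RuntimeError("stuck: input may contain a bad component")
            a = history.pop()
            l.insert(0, s.pop())
```

With A = {(1, -5, 6), (3, -4, 2)} and B = {(1, 2), (3, 4, 5, 6)}, the example traced on the following slides, this sketch returns S = 4, 3 and L = 1, and applying ρ for 4, 3, 1 in that order yields B.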

18

The Main Algorithm

A: (1, -5, 6) (3, -4, 2)
Unused edges: 1, 3, 4, 5
S: (empty)
L: (empty)

B = {(1, 2), (3, 4, 5, 6)}; edge (i, i+1) identified by i

19

The Main Algorithm

A: (3, -4, -5, 6) (1, 2)
Unused edges: 3, 4, 5
S: 1
L: (empty)

B = {(1, 2), (3, 4, 5, 6)}; edge (i, i+1) identified by i

20

The Main Algorithm

A: (1, -5, 6) (3, -4, 2)
Unused edges: 3, 4, 5
S: (empty)
L: 1

B = {(1, 2), (3, 4, 5, 6)}; edge (i, i+1) identified by i

21

The Main Algorithm

A: (3, 6) (1, -5, -4, 2)
Unused edges: 3, 5
S: 4
L: 1

B = {(1, 2), (3, 4, 5, 6)}; edge (i, i+1) identified by i

22

The Main Algorithm

A: (-2, 6) (1, -5, -4, -3)
Unused edges: 5
S: 4, 3
L: 1

B = {(1, 2), (3, 4, 5, 6)}; edge (i, i+1) identified by i

23

The Main Algorithm

A: (-2, 6) (1, -5, -4, -3)
Unused edges: (none)
S: 4, 3
L: 1

B = {(1, 2), (3, 4, 5, 6)}; edge (i, i+1) identified by i

24

Implementation of the Algorithm

• Simple $O(n^2)$ time implementation
• $O(n^{3/2}\sqrt{\log n})$ time implementation using a data structure that:
  – Maintains a fragmented signed permutation
  – Allows one to find an external edge e and perform the translocation ρ(e) in $O(\sqrt{n \log n})$ time
  – Based on a data structure by Kaplan & Verbin 05

25

Thank You !

26

Simulating Translocations by Reversals [Hannenhalli & Pevzner]

A translocation can be simulated by:

• A reversal on A, or

• A chromosome flip in A + a reversal on A

1^0 1^1 2^0 2^1 3^0 3^1 4^0 4^1 5^0 5^1

cycle graph(A, B)

1^0 1^1 4^1 4^0 3^1 3^0 2^1 2^0 5^0 5^1
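The two cases listed above can be illustrated with a short Python sketch, assuming chromosomes are lists of signed integers and the two chromosomes involved are concatenated as X followed by Y. The helper names and the `flip_first` flag are illustrative, not taken from the slides.

```python
def flip(seq):
    """Reverse the order and negate the signs."""
    return [-g for g in reversed(seq)]

def reversal(seq, i, j):
    """Reverse (and negate) the segment seq[i:j]."""
    return seq[:i] + flip(seq[i:j]) + seq[j:]

def translocate_via_reversal(x, y, i, j, flip_first=False):
    """Cut x after i genes and y after j genes, then simulate the exchange.

    flip_first=False: one reversal on the concatenation gives (X1 -Y1), (-X2 Y2).
    flip_first=True:  flipping y first gives (X1 Y2), (-X2 -Y1),
                      where (-X2 -Y1) is the flip of (Y1 X2).
    """
    if flip_first:
        y = flip(y)
        j = len(y) - j          # the same cut point, seen from the other side
    out = reversal(x + y, i, len(x) + j)
    split = i + j               # boundary between the two new chromosomes
    return out[:split], out[split:]
```

In both cases the reversal spans the chromosome boundary, from the cut in X to the cut in Y, which is what moves material between the two chromosomes.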

27

Working on the overlap graph

• H = overlap graph(A, B, A)

• H is sorted if every component is trivial

• Operations:
  – (v): a reversal on an oriented external vertex v (cost = 1)
  – (X): a flip on chromosome X (cost = 0)

28

H●(v) (two chromosomes only)

[Figure: overlap graph H with vertex v, and the resulting graph H●(v). Legend: unoriented edge, oriented edge, chromosome.]

29

H●(X)

[Figure: overlap graph H with chromosome X, and the resulting graph H●(X). Legend: unoriented edge, oriented edge, chromosome.]