Genome Rearrangement By Ghada Badr Part II

Page 1: Genome Rearrangement

Genome Rearrangement

By Ghada Badr

Part II

Page 2: Genome Rearrangement

2

Genome Models

Genomes can be modeled by permutations: each gene is assigned a unique number and occurs exactly once in the genome.

Signed permutation: each gene is assigned a + or - sign to indicate the strand it resides on.

Unsigned permutation: used when the corresponding strand is unknown.

Page 3: Genome Rearrangement

3

Genome Rearrangement

Our problem: Given a set of genomes and a set of possible evolutionary events (operations), find a shortest sequence of events transforming those genomes into one another.

What are the rearrangement events (operations)?

Page 4: Genome Rearrangement

4

Rearrangement Operations

Rearrangement operations affect gene order and gene content. There are various types.

In the case of a single-chromosome genome:

• Inversions
• Transpositions
• Reverse transpositions
• Gene duplications
• Gene loss

In the case of multiple-chromosome genomes, we add:

• Translocations
• Fusions
• Fissions
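As a minimal sketch of two of these operations on a signed permutation, assuming a genome is stored as a Python list of signed integers (the function names `inversion` and `transposition` are illustrative, not taken from the slides):

```python
def inversion(genome, i, j):
    """Reverse the segment genome[i:j] and flip the sign of every gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Move the segment genome[i:j] so that it follows genome[j:k] (i < j < k)."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


genome = [1, 2, 3, 4, 5]
print(inversion(genome, 1, 4))         # [1, -4, -3, -2, 5]
print(transposition(genome, 0, 2, 4))  # [3, 4, 1, 2, 5]
```

Applying the same inversion twice restores the original genome, which is a quick sanity check for the sign handling.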

Page 5: Genome Rearrangement

5

Rearrangement Problems

Any set of operations yields a distance between genomes, by counting the minimum number of operations needed to transform one genome into the other.

Page 6: Genome Rearrangement

6

Rearrangement Problems

Two classical problems:

• Computing the distance d(.,.) between two genomes.
• Computing one optimal sorting sequence of events.

Page 7: Genome Rearrangement

7

Rearrangement Operations

Can we have a unifying framework in which circular and linear chromosomes can coexist throughout evolving genomes?

Can we have a unifying view of genome rearrangements? (Bergeron et al., 2006)

The Double Cut and Join (DCJ) operation was introduced for this purpose.

Page 8: Genome Rearrangement

8

Rearrangement Operations - DCJ

• Double Cut-and-Join (DCJ) was first proposed by Yancopoulos et al. (2005).

• It models all the classical operations (inversions, translocations, fissions, fusions, transpositions, and block interchanges) with a single type of operation.

• This general model accounts for the genomic evidence of the coexistence of both linear and circular chromosomes in many genomes.

• Both the DCJ sorting and distance problems can be solved in O(n) time (Bergeron et al., 2006).

Page 9: Genome Rearrangement

9

Rearrangement Operations - DCJ

• A gene a is an oriented sequence of DNA that starts with a tail, written at, and ends with a head, written ah.

• Two consecutive genes do not necessarily have the same orientation. Thus the adjacency of two consecutive genes a and b can be of four different types:

{ah,bt}, {ah,bh}, {at,bt}, {at,bh}

• An extremity that is not adjacent to any other gene is called a telomere and is represented by a singleton set, {ah} or {at}.

• Adjacencies and telomeres can represent genomes with a single chromosome as well as genomes with multiple chromosomes.

Page 10: Genome Rearrangement

10

Rearrangement Operations - DCJ

• A genome is a set of adjacencies and telomeres such that the tail or head of any gene appears in exactly one adjacency or telomere.

Example

Genome A: chr1: a c -d chr2: b e chr3: f g

Replace each gene by its two extremities:

chr1: at ah ct ch dh dt chr2: bt bh et eh chr3: ft fh gt gh

Adjacencies: {ah, ct} {ch, dh} {bh, et} {fh, gt}

Telomeres: {at} {dt} {bt} {eh} {ft} {gh}

A = {{ah, ct}, {ch, dh}, {bh, et}, {fh, gt}, {at}, {dt}, {bt}, {eh}, {ft}, {gh}}
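The encoding in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: chromosomes are linear and non-empty, genes are strings such as "a" or "-d", and the helper names `extremities` and `adjacencies_and_telomeres` are hypothetical.

```python
def extremities(gene):
    """Return the (first, last) extremity of a signed gene such as 'a' or '-d'."""
    name = gene.lstrip("-")
    tail, head = name + "t", name + "h"
    # A reversed gene is read head first.
    return (head, tail) if gene.startswith("-") else (tail, head)


def adjacencies_and_telomeres(chromosomes):
    """Encode linear chromosomes (lists of signed genes) as a set of frozensets."""
    genome = set()
    for chromosome in chromosomes:
        ends = [x for gene in chromosome for x in extremities(gene)]
        genome.add(frozenset([ends[0]]))        # left telomere
        genome.add(frozenset([ends[-1]]))       # right telomere
        for i in range(1, len(ends) - 1, 2):    # junctions between consecutive genes
            genome.add(frozenset(ends[i:i + 2]))
    return genome


A = adjacencies_and_telomeres([["a", "c", "-d"], ["b", "e"], ["f", "g"]])
print(sorted(sorted(e) for e in A))
```

Frozensets are used so that adjacencies are unordered and can themselves be members of a set, which matches the set notation of the slide.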

Page 11: Genome Rearrangement

11

Rearrangement Operations - DCJ

• DCJ operations:

a) Two adjacencies: {p,q} {r,s} --> {p,r} {s,q} or {p,s} {q,r}

Page 12: Genome Rearrangement

12

Rearrangement Operations - DCJ

• DCJ operations:

b) An adjacency and a telomere: {p,q} {r} --> {p,r} {q} or {p} {q,r}

Page 13: Genome Rearrangement

13

Rearrangement Operations - DCJ

• DCJ operations:

c) Two telomeres: {q} {r} --> {q,r}
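The three cases a), b) and c) can be treated uniformly by viewing a telomere as an adjacency with one missing end. The following is a small sketch of that idea, assuming adjacencies and telomeres are frozensets of extremity names; the function name `dcj_options` is hypothetical.

```python
def dcj_options(u, v):
    """Return every genome fragment obtained by cutting u and v and rejoining the ends."""
    # Pad telomeres with None so that both vertices look like pairs.
    p, q = (sorted(u) + [None])[:2]
    r, s = (sorted(v) + [None])[:2]
    results = []
    for first, second in (((p, r), (q, s)), ((p, s), (q, r))):
        joined = [frozenset(x for x in pair if x is not None) for pair in (first, second)]
        new = frozenset(e for e in joined if e)  # drop the "join" of two missing ends
        if new != frozenset({u, v}):             # skip a rejoining that changes nothing
            results.append(new)
    return results


u, v = frozenset({"ah", "ct"}), frozenset({"fh", "gt"})
for result in dcj_options(u, v):
    print(sorted(sorted(e) for e in result))
# [['ah', 'fh'], ['ct', 'gt']]
# [['ah', 'gt'], ['ct', 'fh']]
```

For two telomeres the only non-trivial result is their union, which corresponds to case c).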

Page 14: Genome Rearrangement

14

Rearrangement Operations - DCJ

• DCJ operations:

Example:

Genome A: chr1: a c -d chr2: b e chr3: f g

Adjacencies and telomeres are:

{ah, ct} {ch, dh} {bh, et} {fh, gt} {at} {dt} {bt} {eh} {ft} {gh}

{ah, ct} {fh, gt} --> {ah, fh} {ct, gt} gives chr1: a -f chr2: b e chr3: d -c g

{ah, ct} {fh, gt} --> {ah, gt} {ct, fh} gives chr1: a g chr2: b e chr3: f c -d

Page 15: Genome Rearrangement

15

DCJ sorting and Distance problems

Problem: Given two genomes A and B defined on the same set of genes, find a shortest sequence of DCJ operations that transforms A into B. The length of such a sequence is called the DCJ distance between A and B, dcj(A,B).

Page 16: Genome Rearrangement

16

DCJ sorting and Distance problems

Example:

Genome A: chr1: a c -d chr2: b e chr3: f g

Genome B: chr1: a b c d chr2: e f g

Replace each gene by its two extremities:

A: chr1: at ah ct ch dh dt chr2: bt bh et eh chr3: ft fh gt gh

B: chr1: at ah bt bh ct ch dt dh chr2: et eh ft fh gt gh

Get adjacencies and telomeres for each genome:

A = {{ah, ct}, {ch, dh}, {bh, et}, {fh, gt}, {at}, {dt}, {bt}, {eh}, {ft}, {gh}}

B = {{at}, {ah, bt}, {bh, ct}, {ch, dt}, {dh}, {et}, {eh, ft}, {fh, gt}, {gh}}

Page 17: Genome Rearrangement

17

DCJ sorting and Distance problems

Greedy algorithm to sort by DCJ:

Genome A: chr1: a c -d chr2: b e chr3: f g
{ah, ct} {ch, dh} {bh, et} {fh, gt} {at} {dt} {bt} {eh} {ft} {gh}

Step 1: chr1: a b e chr2: c -d chr3: f g
{ah, bt} {ch, dh} {bh, et} {fh, gt} {at} {dt} {ct} {eh} {ft} {gh}

Step 2: chr1: a b c -d chr2: e chr3: f g
{ah, bt} {ch, dh} {bh, ct} {fh, gt} {at} {dt} {et} {eh} {ft} {gh}

Step 3: chr1: a b c d chr2: e chr3: f g
{ah, bt} {ch, dt} {bh, ct} {fh, gt} {at} {dh} {et} {eh} {ft} {gh}

Step 4 (joining {eh} and {ft}) gives Genome B: chr1: a b c d chr2: e f g
{at} {ah, bt} {bh, ct} {ch, dt} {dh} {et} {eh, ft} {fh, gt} {gh}
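The trace above can be reproduced with a short sketch in the spirit of the greedy procedure of Bergeron et al. (2006): first create every adjacency of B that is missing in A, then split off every missing telomere. The names `parse` and `greedy_dcj_sort` are hypothetical, and the linear scan in `holder` is a simplification, so this sketch does not reach the O(n) bound.

```python
def parse(text):
    """Read 'ah,ct at ...' into a set of frozensets (adjacencies and telomeres)."""
    return {frozenset(item.split(",")) for item in text.split()}


def greedy_dcj_sort(a, b):
    """Transform genome a into genome b; return the genome after each DCJ step."""
    current, steps = set(a), []

    def holder(x):
        # The adjacency or telomere of the current genome containing extremity x.
        return next(e for e in current if x in e)

    # Adjacencies of b are handled first, then telomeres of b.
    for target in sorted(b, key=lambda e: (-len(e), sorted(e))):
        if target in current:
            continue
        if len(target) == 2:
            p, q = sorted(target)
            u, v = holder(p), holder(q)
            current -= {u, v}
            current.add(target)
            rest = (u - {p}) | (v - {q})   # the leftover ends are joined together
            if rest:
                current.add(rest)
        else:
            (p,) = target
            u = holder(p)                  # an adjacency, since {p} is not in current
            current.remove(u)
            current |= {target, u - {p}}
        steps.append(frozenset(current))
    return steps


A = parse("ah,ct ch,dh bh,et fh,gt at dt bt eh ft gh")
B = parse("at ah,bt bh,ct ch,dt dh et eh,ft fh,gt gh")
steps = greedy_dcj_sort(A, B)
print(len(steps))  # 4
```

On this example the four intermediate genomes coincide with steps 1 to 4 of the slide.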

Page 18: Genome Rearrangement

18

DCJ sorting and Distance problems

The greedy algorithm is optimal and runs in O(n) time.

Page 19: Genome Rearrangement

19

DCJ sorting and Distance problems

Adjacency Graph (bipartite graph):

A: {ah, ct} {ch, dh} {bh, et} {fh, gt} {at} {dt} {bt} {eh} {ft} {gh}

B: {at} {ah, bt} {bh, ct} {ch, dt} {dh} {et} {eh, ft} {fh, gt} {gh}

Vertices: the adjacencies and telomeres of A and B.

Edges: for each vertex u of A and each vertex v of B, there is one edge between u and v for each extremity they share.

The graph can be constructed in O(n) time and space.

Page 20: Genome Rearrangement

20

DCJ sorting and Distance problems

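In each iteration, the algorithm increases C by one or I by two, where C is the number of cycles and I is the number of odd paths in the adjacency graph.

Adjacency Graph (bipartite graph): IF SORTED

A: {at} {ah, bt} {bh, ct} {ch, dt} {dh} {et} {eh, ft} {fh, gt} {gh}

B: {at} {ah, bt} {bh, ct} {ch, dt} {dh} {et} {eh, ft} {fh, gt} {gh}

When sorted: n = C + I/2

So the algorithm needs at most n iterations: dcj(A,B) ≤ n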

Page 21: Genome Rearrangement

21

DCJ sorting and Distance problems

Adjacency Graph (bipartite graph):

{ah, ct}{ch, dh} {bh, et} {fh, gt} {at} {dt} {bt} {eh}{ft}{gh}

{at}{ah, bt}{bh, ct}{ch, dt}{dh} {et} {eh,ft} {fh,gt} {gh}

The graph has 1 cycle, 4 odd paths and 1 even path.

dcj(A,B) = n - (cycles + oddPaths/2) = 7 - (1 + 4/2) = 4
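The count of cycles and odd paths can be checked with a small traversal of the adjacency graph. This is a sketch under the same encoding assumptions as before (frozensets of extremity names); `parse` and `dcj_distance` are illustrative names. Each extremity is treated as one edge of the graph, and a component is a path exactly when it touches a telomere.

```python
def parse(text):
    """Read 'ah,ct at ...' into a set of frozensets (adjacencies and telomeres)."""
    return {frozenset(item.split(",")) for item in text.split()}


def dcj_distance(a, b):
    """Compute n - (cycles + odd_paths / 2) from the adjacency graph of a and b."""
    where_a = {x: u for u in a for x in u}   # extremity -> its vertex in A
    where_b = {x: v for v in b for x in v}   # extremity -> its vertex in B
    n = len(where_a) // 2                    # two extremities per gene
    seen, cycles, odd_paths = set(), 0, 0
    for start in where_a:
        if start in seen:
            continue
        stack, edges, is_path = [start], 0, False
        seen.add(start)
        while stack:                         # walk one connected component
            x = stack.pop()
            edges += 1                       # extremity x is one edge of the graph
            for vertex in (where_a[x], where_b[x]):
                if len(vertex) == 1:
                    is_path = True           # a telomere ends a path
                for y in vertex:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
        if not is_path:
            cycles += 1
        elif edges % 2 == 1:
            odd_paths += 1
    return n - (cycles + odd_paths // 2)


A = parse("ah,ct ch,dh bh,et fh,gt at dt bt eh ft gh")
B = parse("at ah,bt bh,ct ch,dt dh et eh,ft fh,gt gh")
print(dcj_distance(A, B))  # 4
```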

Page 22: Genome Rearrangement

22

Genome Rearrangement and phylogeny

Because genome rearrangement events are rare, changes in gene order enable biologists to reconstruct evolutionary histories far back in time.

Extend the notion of genome rearrangement distance to the optimal positioning of Steiner points in the appropriate space of a given distance metric.

Two phylogenetic versions of the Steiner problem (the first nested inside the second):

Inner problem: optimizing the internal nodes of a given tree whose n leaves are labeled.

Outer problem: optimizing over all trees with n leaves.

Page 23: Genome Rearrangement

23

Genome Rearrangement and phylogeny

We will discuss the inner problem, defined as follows:

Given a fixed phylogeny (tree) T, together with a set of K permutations (genomes), each of size n, corresponding to the terminal (leaf) nodes,

find a set of permutations corresponding to the internal nodes such that the total weight w(T) is minimized, where w(T) is defined as:

w(T) = ∑ d(x,y) over all edges (x,y) in T

Here d(.,.) is the genome rearrangement distance metric defined on pairs of permutations.
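To make the objective w(T) concrete, here is a minimal sketch that sums a distance over the edges of a labeled tree. The breakpoint distance on signed linear permutations is used purely as a stand-in for d(.,.); the function names and the tiny star tree are assumptions made for illustration.

```python
def breakpoint_distance(p, q):
    """Adjacent pairs of p (framed by 0 and n+1) that are not adjacent in q."""
    n = len(p)
    framed_p = [0] + list(p) + [n + 1]
    framed_q = [0] + list(q) + [n + 1]
    pairs_q = set()
    for a, b in zip(framed_q, framed_q[1:]):
        pairs_q.add((a, b))
        pairs_q.add((-b, -a))   # the same adjacency read on the other strand
    return sum((a, b) not in pairs_q for a, b in zip(framed_p, framed_p[1:]))


def tree_weight(edges, labels, d):
    """w(T): sum of d(x, y) over all edges (x, y) of the tree."""
    return sum(d(labels[x], labels[y]) for x, y in edges)


# A star with one internal node M and three leaves.
labels = {"M": [1, 2, 3], "x1": [1, 2, 3], "x2": [1, -2, 3], "x3": [2, 1, 3]}
edges = [("M", "x1"), ("M", "x2"), ("M", "x3")]
print(tree_weight(edges, labels, breakpoint_distance))  # 5
```

Any other distance with the same signature, for example a reversal distance, could be passed as `d` without changing `tree_weight`.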

Page 24: Genome Rearrangement

24

Genome Rearrangement and phylogeny

Consider a heuristic for the problem of computing the internal nodes, where T is a star on three vertices.

We will study a more basic problem, the median problem.

Divide the problem on an arbitrary binary tree into a number of overlapping median problems and apply the median algorithm iteratively to search for a heuristic solution to the original problem.

Internal nodes retain biological meaning, and edges represent transitions between genome states.

Page 25: Genome Rearrangement

25

Median Problem

The median-based method for phylogeny reconstruction was first proposed by Sankoff and Blanchette (1998).

The idea is to build the global solution by aggregating local solutions to the simplest problem: find a Steiner point M of three genomes.

After an initialization step, the algorithm iterates over the tree, repeatedly resetting the permutation of each internal node to the median of its three neighbors, and continues until convergence.

Page 26: Genome Rearrangement

26

Median Problem

The median of three, or simply the median problem: Find a permutation M such that the sum of distances between M and each of the starting permutations {π1, π2, π3} is minimized.

Equivalently, find a permutation M that minimizes the median score S(M), where:

S(M) = d1,M + d2,M + d3,M

and di,M denotes the distance between πi and M.

Page 27: Genome Rearrangement

27

Median Problem

Constructing phylogeny from medians

Page 28: Genome Rearrangement

28

Median Problem

The median problem: Find a permutation M such that the sum of distances between M and each of the starting permutations {π1, π2, π3} is minimized.

What are the distance measures that we can use?

Distances: breakpoint, reversal …

Breakpoint medians have no straightforward biological interpretation and are not unique.

Breakpoint medians score poorly compared to reversal medians.

Page 29: Genome Rearrangement

29

Reversal Median

Reversal median problem: Find a solution to the median problem using the reversal distance.

Find a permutation M such that the sum of reversal distances between M and each of the starting genomes is minimized.

The reversal median problem is NP-hard.

Why?

Page 30: Genome Rearrangement

30

Reversal Median

Reversal graph for n = 3

Vertices: all signed permutations for n = 3.

Edges: connect π1 and π2 if the reversal distance d(π1, π2) = 1.
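For n = 3 the reversal graph is small enough to build explicitly. The sketch below assumes signed permutations are stored as tuples of signed integers; the helper names are illustrative.

```python
from itertools import permutations, product


def signed_permutations(n):
    """Yield every signed permutation of 1..n as a tuple."""
    for order in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            yield tuple(s * g for s, g in zip(signs, order))


def reversal_neighbours(p):
    """Yield every permutation obtained from p by a single reversal."""
    n = len(p)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield p[:i] + tuple(-g for g in reversed(p[i:j])) + p[j:]


graph = {p: set(reversal_neighbours(p)) for p in signed_permutations(3)}
print(len(graph))                           # 48
print({len(nb) for nb in graph.values()})   # {6}
```

Every vertex has n(n+1)/2 = 6 neighbours, one for each choice of the reversed interval.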

Page 31: Genome Rearrangement

31

Reversal Median

Reversal graph for n = 3

The distance d(π1, π2) is the length of the shortest path between the corresponding vertices v1 and v2.

Finding the median is equivalent to finding the minimum Steiner tree connecting the three given vertices in the graph.
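On such a tiny graph the median can be found by brute force: compute shortest-path distances from each of the three genomes with breadth-first search and pick the vertex with the smallest sum. This is a sketch for very small n only, with hypothetical function names; it is not the branch-and-bound algorithm discussed later.

```python
from collections import deque


def neighbours(p):
    """All signed permutations one reversal away from p."""
    n = len(p)
    return [p[:i] + tuple(-g for g in reversed(p[i:j])) + p[j:]
            for i in range(n) for j in range(i + 1, n + 1)]


def distances_from(source):
    """Breadth-first search: reversal distance from source to every vertex."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        p = queue.popleft()
        for q in neighbours(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


def brute_force_median(p1, p2, p3):
    """Return (median, score) by scanning every vertex of the reversal graph."""
    tables = [distances_from(p) for p in (p1, p2, p3)]

    def score(m):
        return sum(table[m] for table in tables)

    best = min(tables[0], key=score)
    return best, score(best)


print(brute_force_median((-1, 2, 3), (1, -2, 3), (1, 2, -3)))  # ((1, 2, 3), 3)
```

In this example the median is none of the three inputs: each input is one single-gene reversal away from (1, 2, 3).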

Page 32: Genome Rearrangement

32

Reversal Median

Reversal graph for n = 3

The graph is huge: |V| = n! · 2^n

An exhaustive graph search is not feasible!

What technique can we use to develop an algorithm for this kind of problem?
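A quick calculation shows how fast the vertex count grows. The function name `reversal_graph_size` is only illustrative.

```python
from math import factorial


def reversal_graph_size(n):
    """Number of signed permutations of n genes: n! orders times 2^n sign choices."""
    return factorial(n) * 2 ** n


for n in (3, 5, 10):
    print(n, reversal_graph_size(n))
# 3 48
# 5 3840
# 10 3715891200
```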

Page 33: Genome Rearrangement

33

Reversal Median

We will study a branch-and-bound algorithm by Adam Siepel (2001).

This algorithm depends only on the availability of a rapidly computable distance metric.

Page 34: Genome Rearrangement

34

Reversal Median

The median score S of a set of equally sized permutations {π1, π2, π3}, separated by pairwise distances d1,2, d1,3 and d2,3, obeys these bounds:

(d1,2 + d1,3 + d2,3) / 2 ≤ S ≤ min{(d1,2 + d2,3), (d1,2 + d1,3), (d2,3 + d1,3)}
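Both bounds follow from the triangle inequality. A short derivation, writing d_{i,M} for the distance between π_i and a median M (notation assumed here for illustration):

```latex
\begin{align*}
d_{1,2} &\le d_{1,M} + d_{2,M}, \qquad
d_{1,3} \le d_{1,M} + d_{3,M}, \qquad
d_{2,3} \le d_{2,M} + d_{3,M} \\
\Rightarrow\quad d_{1,2} + d_{1,3} + d_{2,3} &\le 2\,(d_{1,M} + d_{2,M} + d_{3,M}) = 2S
\quad\Rightarrow\quad S \ge \frac{d_{1,2} + d_{1,3} + d_{2,3}}{2} \\
S &\le d_{1,2} + d_{2,3} \quad \text{(take } M = \pi_2\text{; similarly for } \pi_1 \text{ and } \pi_3\text{)}
\end{align*}
```

The upper bound holds because each of the three given permutations is itself a candidate median, and the optimal score cannot be worse than the best of these candidates.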

Page 35: Genome Rearrangement

35

Reversal Median

• Assume that φ lies on a shortest path between π1 and the median M, and is separated from π1, π2 and π3 by distances d1,φ, d2,φ and d3,φ. Then the median score S obeys:

d1,φ + (d2,φ + d3,φ + d2,3) / 2 ≤ S ≤ d1,φ + min{(d2,φ + d3,φ), (d3,φ + d2,3), (d2,3 + d2,φ)}

Page 36: Genome Rearrangement

36

Reversal Median

Algorithm (sketch):

• Establish lower and upper bounds on the median score, Mmin and Mmax, using a rapid reversal distance algorithm.
• Start with one of the three permutations, say π1, and assume the median is M = π1.
• Push the corresponding vertex v onto a priority stack s that keeps the best-scoring vertices on top.
• While s is not empty:
    • Pop the most promising vertex v from s.
    • If the best score of v ≥ Mmax, then stop.
    • Generate all vertices that can be obtained from v by a single reversal.
    • For each unmarked vertex φ:
        • Calculate the bounds φmin and φmax from the previous inequality.
        • If φmax = Mmin, then M = φ and stop (the median is found).
        • Add φ to s only if φmin < Mmax (pruning).
        • Update Mmax = φmax if φmax < Mmax.
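The sketch can be turned into a small runnable illustration. The code below is a simplified variant, not Siepel's implementation: breadth-first search tables stand in for the rapid reversal distance algorithm (feasible only for tiny n), a heap plays the role of the priority stack, and the upper bound Mmax is tracked as the score of the best concrete candidate seen so far. All names are hypothetical.

```python
import heapq
from collections import deque


def neighbours(p):
    """All signed permutations one reversal away from p."""
    n = len(p)
    return [p[:i] + tuple(-g for g in reversed(p[i:j])) + p[j:]
            for i in range(n) for j in range(i + 1, n + 1)]


def bfs_distances(source):
    """Stand-in for a fast reversal distance: distances to every vertex."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        p = queue.popleft()
        for q in neighbours(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


def reversal_median(p1, p2, p3):
    """Best-first branch and bound starting from p1; returns (median, score)."""
    t1, t2, t3 = (bfs_distances(p) for p in (p1, p2, p3))
    d23 = t2[p3]

    def score(x):
        return t1[x] + t2[x] + t3[x]

    def lower(x):
        # Lower bound for medians reached through x on a shortest path from p1.
        return t1[x] + (t2[x] + t3[x] + d23 + 1) // 2

    m_min = lower(p1)                         # global lower bound on the score
    best = min((p1, p2, p3), key=score)       # best candidate known so far
    m_max = score(best)
    heap, marked = [(lower(p1), p1)], {p1}
    while heap and m_max > m_min:
        bound, v = heapq.heappop(heap)
        if bound >= m_max:
            break                             # nothing left can beat the candidate
        for x in neighbours(v):
            if x in marked:
                continue
            marked.add(x)
            if score(x) < m_max:
                best, m_max = x, score(x)     # tighten the upper bound
            if lower(x) < m_max:
                heapq.heappush(heap, (lower(x), x))   # otherwise x is pruned
    return best, m_max


print(reversal_median((-1, 2, 3), (1, -2, 3), (1, 2, -3)))  # ((1, 2, 3), 3)
```

In this run the search stops after expanding only the start vertex, because the first neighbour examined already reaches the global lower bound.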

Page 41: Genome Rearrangement

41

Reversal Median

Worst-case running time of the algorithm above: O(n^(3d)) with d = min{d1,2 + d1,3 + d2,3}, with a faster average running time.

Page 42: Genome Rearrangement

42

Conclusions

Described the Double Cut and Join (DCJ) operation: a unifying view of genome rearrangements.

Presented a branch-and-bound, median-based approach for building phylogenies using the reversal distance.

Many other problems remain in genome rearrangement, such as the genome halving problem.

Page 43: Genome Rearrangement

43

Genome Halving

[Figure sequence: successive diagrams on genes a to g illustrating genome halving, including a duplication step.]

Page 53: Genome Rearrangement

53

References

1. Bergeron A. A very elementary presentation of the Hannenhalli-Pevzner theory. Discrete Applied Mathematics, vol. 146, 134-145, 2005.

2. Marília D. V. Braga. Exploring the solution space of sorting by reversals when analyzing genome rearrangements. PhD thesis, Université Claude Bernard Lyon 1, 2009.

3. Guillaume Fertin, Anthony Labarre, Irena Rusu, Eric Tannier, Stéphane Vialette. Combinatorics of Genome Rearrangements. The MIT Press, Cambridge, MA, 2009.

4. Siepel A. Exact algorithms for the reversal median problem. Master's thesis, University of New Mexico, 2001.

5. Yancopoulos S., Attie O., Friedberg R. Efficient sorting of genomic permutations by translocation, inversion and block exchange. Bioinformatics 21, 3340-3346, 2005.

6. Anne Bergeron, Julia Mixtacki, Jens Stoye. A unifying view of genome rearrangements. WABI 2006, LNBI 4175, 163-173, 2006.

7. Julia Mixtacki. Genome halving under DCJ revisited. Lecture Notes in Computer Science, 5092, 2008.

8. Richard C. Deonier, Simon Tavaré, Michael S. Waterman. Computational Genome Analysis: An Introduction. Springer, 2005.

9. Neil C. Jones, Pavel A. Pevzner. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.