5. Lecture WS 2003/04
Bioinformatics III 1
Genome Rearrangements
Compared to other areas in bioinformatics, we still know very little about
the rearrangement events that produced the existing varieties of
genomic architectures ...
Some material of this lecture is borrowed from:
Nipun Mehra, www.stanford.edu/class/cs374/Notes/lec17.ppt
www.sna.csie.ndhu.edu.tw/~lung/seminar/20020502.ppt
Bafna V. and P.A. Pevzner. "Sorting by reversals: genome rearrangements in plant organelles and evolutionary history of X chromosome."
Hannenhalli S. and P.A. Pevzner. "Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals."
P.A. Pevzner. "Computational Molecular Biology", MIT Press, chapter 10.
Processes of Evolution
- Substitution
- Insertion
- Deletion
- Translocation
- Inversion/ Reversal
- Duplication
What is a reversal = inversion ?
Break and Invert
A T G C C T G T A C T A
T A C G G A C A T G A T
A T G T A C A G G C T A
T A C A T G T C C G A T
• Purines (A, G) and pyrimidines (C, T) switch strands.
• Many organisms have highly similar genes but very different gene orders.
• Very prominent in prokaryotes, mitochondrial DNA and the mammalian X chromosome.
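The break-and-invert picture above can be reproduced with a few lines of Python. This is only a sketch: the function names and the 0-based, end-exclusive interval are assumptions made for illustration, and the inverted interval (the six bases C C T G T A) was read off by comparing the two strand pairs on the slide.

```python
# Watson-Crick pairing used to build the complementary strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def invert_segment(top, start, end):
    """Invert top[start:end] (0-based, end exclusive) of a double-stranded molecule.

    Cutting out a double-stranded piece and re-inserting it flipped means the
    top strand now reads the reverse complement of the original piece.
    """
    segment = top[start:end]
    flipped = "".join(COMPLEMENT[base] for base in reversed(segment))
    return top[:start] + flipped + top[end:]


def bottom_strand(top):
    """Base-by-base complement, written left to right as on the slide."""
    return "".join(COMPLEMENT[base] for base in top)


top = "ATGCCTGTACTA"
inverted = invert_segment(top, 3, 9)
print(inverted)                 # ATGTACAGGCTA
print(bottom_strand(inverted))  # TACATGTCCGAT
```

Because the segment is both reversed and complemented, every base of the inverted piece ends up on the opposite strand, which is why a reversal also flips the orientation of the genes it contains.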
Types of Genome Rearrangements
Two genomes may have many genes in common, but the genes may be
arranged in a different sequence or be moved between chromosomes. Such
differences in gene orders are the results of rearrangement events that are
common in molecular evolution.
For example, in unichromosomal genomes, the most common rearrangement
events are reversals, in which a contiguous interval of genes is put into the
reverse order.
For multichromosomal genomes, the most common rearrangement events are
reversals, translocations, fissions, and fusions.
The pairwise genome rearrangement problem is to find an optimal scenario
transforming one genome to another via these rearrangement events.
Representation of a genome
We consider a unichromosomal genome to be a sequence of n genes. The
genes are represented by numbers 1, 2, ..., n.
The two orientations of gene i are represented by i and -i.
A genome is represented as a signed permutation of the numbers 1, 2, ..., n.
For example, a unichromosomal genome with n = 5 genes is 5 -3 4 2 -1
Multichromosomal Genome
A multichromosomal genome consists of n genes spread over m
chromosomes. We represent it as a signed permutation of 1, 2, ..., n, with
delimiters "$" or ";" inserted between the chromosomes. For example, a genome
with 12 genes spread over 3 chromosomes is
7 -2 8 3 $ 5 9 -6 -1 12 $ 11 4 10 $
The order of the chromosomes and the direction of the chromosomes do not
matter in the multichromosomal algorithms. Thus, we could represent this same
genome by flipping the first chromosome (reverse the order of its entries and
negate them) and then moving the last chromosome to the beginning:
11 4 10 $ -3 -8 2 -7 $ 5 9 -6 -1 12 $
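Since chromosome order and direction do not matter, two written representations can be compared by first bringing each into a canonical form. The sketch below is one possible way to do this; the names `flip`, `canonical` and `same_genome` are made up for illustration and do not come from any particular rearrangement program.

```python
def flip(chromosome):
    """Reverse the gene order and negate every gene."""
    return tuple(-gene for gene in reversed(chromosome))


def canonical(genome):
    """Canonical form of a multichromosomal genome.

    genome is a list of chromosomes, each a sequence of signed integers.
    For every chromosome we keep the smaller of its two directions, then
    sort the chromosomes, so order and direction no longer matter.
    """
    chosen = [min(tuple(chrom), flip(chrom)) for chrom in genome]
    return tuple(sorted(chosen))


def same_genome(a, b):
    """True if a and b are two representations of the same genome."""
    return canonical(a) == canonical(b)


first = [[7, -2, 8, 3], [5, 9, -6, -1, 12], [11, 4, 10]]
second = [[11, 4, 10], [-3, -8, 2, -7], [5, 9, -6, -1, 12]]
print(same_genome(first, second))  # True
```

The same check treats the single chromosomes 3 1 2 and -2 -1 -3 as equal, which corresponds to the multichromosomal convention; a unichromosomal comparison would compare the sequences directly instead.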
Unichromosomal genomes: sorting by reversal
A reversal in a signed permutation is an operation that takes an interval in a
permutation, reverses the order of the numbers, and changes all their signs. For
example,
5 1 3 2 -9 7 -4 6 8
5 1 -7 9 -2 -3 -4 6 8
The reversal distance between two genomes is the minimum number of
reversals it takes to get from one genome to the other.
For a given pair of genomes, the reversal distance is unique, but there are
usually many possible reversal scenarios with this distance.
However, it is possible that this mathematical notion of reversal distance can
underestimate the actual number of steps that occurred biologically.
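A signed reversal as defined above is a short list operation. The following sketch assumes a genome stored as a Python list of signed integers and a 0-based, inclusive interval; both choices are illustrative.

```python
def reversal(genome, i, j):
    """Reverse genome[i..j] (0-based, inclusive) and change the sign of each gene."""
    if not 0 <= i <= j < len(genome):
        raise ValueError("need 0 <= i <= j < len(genome)")
    middle = [-gene for gene in reversed(genome[i:j + 1])]
    return genome[:i] + middle + genome[j + 1:]


before = [5, 1, 3, 2, -9, 7, -4, 6, 8]
after = reversal(before, 2, 5)
print(after)  # [5, 1, -7, 9, -2, -3, -4, 6, 8]
```

Applying the same reversal a second time restores the original genome, so every reversal is its own inverse.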
Multichromosomal genomes: rearrangement operations
We treat four elementary rearrangement events in multichromosomal genomes:
reversals, translocations, fusions, and fissions.
Reversal: An interval within a single chromosome may be reversed in the
same fashion as a reversal acts in the unichromosomal case:
7 -2 8 3 $         ->   7 -2 8 3 $
5 9 -6 -1 12 $     ->   5 9 -12 1 6 $
11 4 10 $          ->   11 4 10 $
Note: When the programs are run in unichromosomal mode, the genomes
3 1 2 and -2 -1 -3 are considered different (one reversal apart, distance=1), while in
multichromosomal mode, those same genomes are considered equivalent
(distance=0) because we have simply flipped an entire chromosome, which
gives an equivalent genome in the multichromosomal mode.
Translocation
Two chromosomes "A B" and "C D" may be rearranged into "A D" and "C B".
(The letters A, B, C, D stand for sequences of genes.)
Because flipping chromosomes does not alter a genome (only its
representation is altered), "A -C" and "-B D" is another possible translocation.
(-B means to reverse the order of the genes in sequence B and negate each
one.)
For example, a translocation on chromosomes 1 and 3 is
7 -2 8 3 $         ->   7 -2 8 -4 -11 $
5 9 -6 -1 12 $     ->   5 9 -6 -1 12 $
11 4 10 $          ->   -3 10 $
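Both kinds of translocation can be written down directly from the "A B" / "C D" description. In this sketch the cut positions and the `flip_exchange` flag are hypothetical parameters chosen for illustration.

```python
def flip(chromosome):
    """Reverse the gene order and negate every gene."""
    return [-gene for gene in reversed(chromosome)]


def translocate(x, y, cut_x, cut_y, flip_exchange=False):
    """Translocation between x = A B and y = C D.

    A is the first cut_x genes of x, C is the first cut_y genes of y.
    Returns (A D, C B), or (A -C, -B D) when flip_exchange is True.
    """
    a, b = x[:cut_x], x[cut_x:]
    c, d = y[:cut_y], y[cut_y:]
    if flip_exchange:
        return a + flip(c), flip(b) + d
    return a + d, c + b


chr1 = [7, -2, 8, 3]
chr3 = [11, 4, 10]
print(translocate(chr1, chr3, 3, 2, flip_exchange=True))
# ([7, -2, 8, -4, -11], [-3, 10])
```

With `flip_exchange=False` the same cuts give 7 -2 8 10 and 11 4 3, the "A D" and "C B" variant.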
Fusion & Fission
Fusion: Two chromosomes may be fused together into a single chromosome.
Due to chromosome flippings, there are four distinct fusions between each pair of
chromosomes. Here is one of the fusions between chromosomes 1 and 3:
7 -2 8 3 $         ->   7 -2 8 3 -10 -4 -11 $
5 9 -6 -1 12 $     ->   5 9 -6 -1 12 $
11 4 10 $
Fission: A chromosome may be broken into two chromosomes between any pair
of genes:
7 -2 8 3 $         ->   7 -2 8 3 $
5 9 -6 -1 12 $     ->   5 9 $
                        -6 -1 12 $
11 4 10 $          ->   11 4 10 $
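Fusion and fission are the simplest of the four operations. The sketch below enumerates the four fusions of a pair of chromosomes (distinct up to flipping the result) and splits a chromosome at a given position; the function names are illustrative.

```python
def flip(chromosome):
    """Reverse the gene order and negate every gene."""
    return [-gene for gene in reversed(chromosome)]


def fusions(x, y):
    """The four fusions of x and y that differ even after flipping the result."""
    return [x + y, x + flip(y), flip(x) + y, flip(x) + flip(y)]


def fission(chromosome, cut):
    """Break a chromosome between gene number cut and cut + 1 (1-based)."""
    if not 0 < cut < len(chromosome):
        raise ValueError("the cut must fall between two genes")
    return chromosome[:cut], chromosome[cut:]


for fused in fusions([7, -2, 8, 3], [11, 4, 10]):
    print(fused)
print(fission([5, 9, -6, -1, 12], 2))  # ([5, 9], [-6, -1, 12])
```

The second fusion in the list, 7 -2 8 3 -10 -4 -11, is the one shown in the example above.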
Signed and unsigned genomes
Most comparative mapping techniques determine the physical locations and
relative order of genes in each chromosome, but do not determine which of two
orientations each gene has.
Current sequencing methods do provide the orientations. It turns out that the
genome rearrangement problem (uni- and multichromosomal) for unsigned
permutations is NP-hard, but the same problems for signed data can be solved in
polynomial time.
Fortunately, with many genomes currently being sequenced, it is likely that
many comparative maps (corresponding to unsigned permutations) will soon be
replaced by sequencing data (corresponding to signed permutations).
Multichromosomal genomes: rearrangement operations
For example, to turn the unsigned genome
1 2 3 4 5
into the unsigned genome
1 4 3 2 5
requires one unsigned reversal. Signs can be assigned to the genes of the
source and destination genomes such that a signed reversal scenario
requires this same number of steps. Here, we get
1 2 3 4 5
1 -4 -3 -2 5
which also takes one step. Note that there may be other sign assignments
taking this minimum number of steps.
Multichromosomal genomes: rearrangement operations
It is possible that correctly signed data would have increased the number of
steps:
1 2 3 4 5
1 -4 -3 -2 5
1 -4 3 -2 5
If the data collection method did not determine signs, it is impossible to know
mathematically whether the one step or two step scenario is more biologically
accurate; the mathematical problem the genome rearrangement programs solve
is to find the signs giving the minimum possible distance.
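For genomes as small as this five-gene example, the statement can be checked by exhaustive search. The sketch below is not how the rearrangement programs work (they use the polynomial-time theory discussed later); it simply runs a breadth-first search over all signed arrangements and then tries every sign assignment for the target. Keeping the source genome all-positive is an assumption made to keep the example small.

```python
from collections import deque
from itertools import product


def neighbors(perm):
    """All signed permutations reachable from perm by one reversal."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            middle = tuple(-gene for gene in reversed(perm[i:j + 1]))
            yield perm[:i] + middle + perm[j + 1:]


def all_distances(source):
    """Breadth-first search: reversal distance from source to every arrangement."""
    source = tuple(source)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def best_sign_assignment(source, unsigned_target):
    """Minimum signed distance over all sign assignments of the target."""
    dist = all_distances(source)
    candidates = (
        tuple(sign * gene for sign, gene in zip(signs, unsigned_target))
        for signs in product((1, -1), repeat=len(unsigned_target))
    )
    return min((dist[cand], cand) for cand in candidates)


dist = all_distances((1, 2, 3, 4, 5))
print(dist[(1, -4, -3, -2, 5)])   # 1
print(dist[(1, -4, 3, -2, 5)])    # 2
print(best_sign_assignment((1, 2, 3, 4, 5), (1, 4, 3, 2, 5)))
# (1, (1, -4, -3, -2, 5))
```

The search space grows as 2^n n!, so this only works for very small n; it is meant to make the difference between the one-step and two-step scenarios concrete.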
X-Alignments
• The "X" factor was discovered by Eisen et al.
• Alignment of whole genomes of prokaryotes such as bacteria revealed X-like patterns in dot plots, called X-alignments.
• Those along the diagonal are orthologs between species.
• Those along the anti-diagonal are duplicates separated by inversion, within species.
• Implication: the reversals took place equidistant from the center of the chromosome.
A biological model case
cabbage: 8 7 6 5 4 3 2 1 11 10 9
turnip: 4 3 2 8 7 1 5 6 11 10 9
Palmer and Herbon found that the mitochondrial genomes in cabbage and
turnip had very similar gene sequences, but with fairly different gene orders.
How can we design a "transformation" of cabbage into turnip?
Mitochondrial DNA of cabbage and turnip are composed of five conserved
blocks of genes that are shuffled in cabbage as compared to turnip. Every
conserved block has a direction that is shown by a + or – sign.
Inversion, Transposition and inverted Transposition
inversion
transposition
inverted transposition
Oriented/Unoriented Blocks
Remember that the unoriented case results in an NP-Hard problem, whereas
the oriented case can be solved in polynomial time.
UNORIENTED BLOCKS (NP-hard)
2 1 3 7 5 4 8 6
1 2 3 4 5 6 7 8
ORIENTED BLOCKS (polynomial time)
8 7 6 5 4 3 2 1 11 10 9
4 3 2 8 7 1 5 6 11 10 9
Sorting by Reversals
Cabbage
8 7 6 5 4 3 2 1 11 10 9
8 2 3 4 5 6 7 1 11 10 9
8 2 3 4 5 1 7 6 11 10 9
4 3 2 8 5 1 7 6 11 10 9
4 3 2 8 7 1 5 6 11 10 9
Turnip
Permutation ($\pi$): an ordered arrangement of the set {1, 2, …, n}
Reversal ($\rho$): a rearrangement that inverts a block in $\pi$
{3 4 7 6 1 5 2} $\rho(3,6)$ = {3 4 5 1 6 7 2}
Signed permutation: a permutation where the elements are oriented;
a reversal switches element orientation
{+3 -4 +7 -6 +1 -5 +2} $\rho(3,6)$ = {+3 -4 +5 -1 +6 -7 +2}
easy to do by eye ...
8 7 6 5 4 3 2 1 11 10 9     $\pi$
8 2 3 4 5 6 7 1 11 10 9     $\pi \cdot \rho_1$
8 2 3 4 5 1 7 6 11 10 9     $\pi \cdot \rho_1 \rho_2$
4 3 2 8 5 1 7 6 11 10 9     $\pi \cdot \rho_1 \rho_2 \rho_3$
4 3 2 8 7 1 5 6 11 10 9     $\pi \cdot \rho_1 \rho_2 \ldots \rho_t = \sigma$
$\pi = \sigma \cdot \rho_t \ldots \rho_2 \rho_1$
Formal Approach: Sorting by Reversals
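The order of genes in 2 organisms is represented by permutations $\pi = \pi_1 \pi_2 \ldots \pi_n$ and $\sigma = \sigma_1 \sigma_2 \ldots \sigma_n$. A reversal $\rho(i,j)$ of an interval $[i,j]$ is the permutation
$$\rho(i,j) = \begin{pmatrix} 1 & 2 & \ldots & i-1 & i & i+1 & \ldots & j-1 & j & j+1 & \ldots & n \\ 1 & 2 & \ldots & i-1 & j & j-1 & \ldots & i+1 & i & j+1 & \ldots & n \end{pmatrix}$$
$\rho(i,j)$ has the effect of reversing the order of $\pi_i \pi_{i+1} \ldots \pi_j$ and transforming
$\pi_1 \ldots \pi_{i-1} \pi_i \ldots \pi_j \pi_{j+1} \ldots \pi_n$ into $\pi \cdot \rho(i,j) = \pi_1 \ldots \pi_{i-1} \pi_j \ldots \pi_i \pi_{j+1} \ldots \pi_n$.
Given permutations $\pi$ and $\sigma$, the reversal distance problem is to find a series of
reversals $\rho_1, \rho_2, \ldots, \rho_t$ such that $\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = \sigma$ and $t$ is minimal.
$t$ is called the reversal distance between $\pi$ and $\sigma$.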
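The formal definition translates directly into code. The sketch below applies a series of reversals to an unsigned permutation using 1-based, inclusive positions, and replays the four-step cabbage-to-turnip scenario from the earlier slides. The intervals were read off by comparing consecutive rows of that scenario, and block orientations are ignored here; the function names are illustrative.

```python
def rho(perm, i, j):
    """Apply the reversal rho(i, j) to an unsigned permutation (1-based, inclusive)."""
    if not 1 <= i <= j <= len(perm):
        raise ValueError("need 1 <= i <= j <= n")
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


def apply_scenario(perm, reversals):
    """Compute perm . rho_1 . rho_2 ... rho_t for a list of (i, j) pairs."""
    for i, j in reversals:
        perm = rho(perm, i, j)
    return perm


cabbage = [8, 7, 6, 5, 4, 3, 2, 1, 11, 10, 9]
scenario = [(2, 7), (6, 8), (1, 4), (5, 7)]
print(apply_scenario(cabbage, scenario))
# [4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9]
```

This shows that four reversals suffice for this pair of unsigned arrangements; it does not by itself prove that four is the minimum.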
Breakpoint Graph
Sorting a permutation by reversals is a hard problem.
Breakpoints were introduced by Watterson et al. (1982) and by Nadeau and Taylor
(1984), and correlations were noticed between the reversal distance and the
number of breakpoints.
Let $i \sim j$ if $|i - j| = 1$. Extend a permutation $\pi = \pi_1 \pi_2 \ldots \pi_n$ by adding $\pi_0 = 0$ and
$\pi_{n+1} = n + 1$. We call a pair of elements $(\pi_i, \pi_{i+1})$, $0 \le i \le n$, of $\pi$ an adjacency
if $\pi_i \sim \pi_{i+1}$, and a breakpoint if $\pi_i \not\sim \pi_{i+1}$.
Example: 2 3 1 4 6 5 7, extended to 0 2 3 1 4 6 5 7 8
adjacencies: (2,3), (6,5), (7,8)
breakpoints: (0,2), (3,1), (1,4), (4,6), (5,7)
As the identity permutation has no
breakpoints, sorting by reversals
corresponds to eliminating breakpoints.
An observation that every reversal can
eliminate at most 2 breakpoints implies that
the reversal distance $d(\pi) \ge b(\pi)/2$, where
$b(\pi)$ is the number of breakpoints in $\pi$.
However, this lower bound is often far from tight.
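Counting breakpoints of an unsigned permutation is a one-liner once the permutation is extended by 0 and n+1. The following sketch (function names are illustrative) also returns the resulting lower bound, rounded up because the distance is an integer.

```python
def breakpoints(perm):
    """Number of breakpoints of an unsigned permutation of 1..n."""
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if abs(a - b) != 1)


def breakpoint_lower_bound(perm):
    """Lower bound d >= b / 2, rounded up to the next integer."""
    return -(-breakpoints(perm) // 2)


print(breakpoints([2, 3, 1, 4, 6, 5, 7]))             # 5
print(breakpoint_lower_bound([2, 3, 1, 4, 6, 5, 7]))  # 3
```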
Breakpoint Graph
The breakpoint graph of a permutation $\pi$ is an edge-colored graph $G(\pi)$ with $n + 2$
vertices $\{\pi_0, \pi_1, \ldots, \pi_n, \pi_{n+1}\} = \{0, 1, \ldots, n, n+1\}$. We join vertices $\pi_i$ and $\pi_{i+1}$ by a
black edge for $0 \le i \le n$. We join vertices $\pi_i$ and $\pi_j$ by a gray edge if $\pi_i \sim \pi_j$.
Black path: 0 2 3 1 4 6 5 7
Gray path: 0 2 3 1 4 6 5 7
Superposition of the black and gray paths forms the breakpoint graph:
A breakpoint graph is obtained by superposing a black path traversing the vertices 0, 1, ..., n, n+1 in the order given by the permutation $\pi$ and a gray path traversing the vertices in the order given by the identity permutation.
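The two edge sets can be generated directly from this definition. The sketch below builds the black and gray edges for an unsigned permutation and checks that every vertex has as many black as gray edges; the representation as lists of vertex pairs is an assumption made for illustration.

```python
from collections import Counter


def breakpoint_graph(perm):
    """Black and gray edge lists of G(pi) for an unsigned permutation of 1..n."""
    n = len(perm)
    extended = [0] + list(perm) + [n + 1]
    # Black path: consecutive positions of the extended permutation.
    black = [(extended[i], extended[i + 1]) for i in range(n + 1)]
    # Gray path: consecutive values, i.e. the identity order.
    gray = [(value, value + 1) for value in range(n + 1)]
    return black, gray


def is_balanced(black, gray):
    """True if every vertex touches equally many black and gray edges."""
    black_degree = Counter(v for edge in black for v in edge)
    gray_degree = Counter(v for edge in gray for v in edge)
    return black_degree == gray_degree


black, gray = breakpoint_graph([2, 3, 1, 4, 6, 5, 7])
print(black)
print(gray)
print(is_balanced(black, gray))  # True
```

Finding a maximum decomposition of this graph into alternating cycles is the hard part for unsigned permutations and is not attempted here.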
Cycle decomposition
A cycle in an edge-colored graph G is called alternating if the colors of every two
consecutive edges of this cycle are distinct. In the following, cycles will mean
alternating cycles.
Cycle decomposition of the breakpoint graph:
[Figure: the vertex row 0 2 3 1 4 6 5 7 shown four times, once per cycle of the decomposition.]
A vertex v in a graph G is called balanced if the
number of black edges incident to v equals the
number of grey edges incident to v.
A balanced graph is a graph in which every
vertex is balanced. $G(\pi)$ is a balanced graph.
Therefore, there exists a cycle decomposition
of $G(\pi)$ into edge-disjoint alternating cycles
(every edge in the graph belongs to exactly one
cycle in the decomposition). Cycles in an edge
decomposition may be self-intersecting. The
previous breakpoint graph can be decomposed
into 4 cycles, one of which is self-intersecting.
Cycle decomposition
What is the decomposition of the breakpoint graph into a maximum number $c(\pi)$
of edge-disjoint alternating cycles? Here, $c(\pi) = 4$.
Cycle decompositions play an important role in estimating reversal distances.
When a reversal is applied to a permutation, the number of cycles in a maximum
decomposition can change by at most one (while the number of breakpoints
can change by two).
Bafna & Pevzner (1996) proved the bound
$d(\pi) \ge n + 1 - c(\pi)$,
which is much tighter than the bound in terms of breakpoints, $d(\pi) \ge b(\pi)/2$.
For many biological problems, $d(\pi) = n + 1 - c(\pi)$.
Therefore, the reversal distance problem reduces to the problem of finding
the maximal cycle decomposition.
Effects of reversals on cycles
(A) For reversals acting on two cycles, $\Delta(b - c) = 1$.
(B) For reversals acting on an unoriented cycle, $\Delta(b - c) = 0$.
(C) For reversals acting on an oriented cycle, $\Delta(b - c) = -1$.
Hannenhalli, Pevzner, Journal of the ACM 46, 1 (1999)
Effect of reversals on gray edges
(a) A proper reversal on an oriented gray edge.
(b) A nonproper reversal on an unoriented gray edge.
Hannenhalli, Pevzner, Journal of the ACM 46, 1 (1999)
Transform signed into unsigned permutation
(a) Optimal sorting of a permutation (3 5 8 6 4 7 9 2 1 10 11) by 5 reversals.
(b) Breakpoint graph of this permutation: black edges connect adjacent vertices that are not consecutive, gray edges connect consecutive vertices that are not adjacent.
(c) Transformation of a signed permutation into an unsigned permutation $\pi$ and the breakpoint graph $G(\pi)$.
(d) Interleaving graph $H_\pi$ with two oriented and one unoriented component.
Hannenhalli, Pevzner, Journal of the ACM 46, 1 (1999)
The Problems
Minimum Sorting by Reversals (MinSortRv):
Given a permutation $\pi$, what is the shortest sequence ($\rho_1 \rho_2 \ldots \rho_t$) of reversals
that sorts $\pi$? (Distance: $d(\pi)$)
Complexity: NP-hard.
Minimum Signed Sorting by Reversals (SignedSortRv):
Given a signed permutation $\pi$, what is the shortest sequence ($\rho_1 \rho_2 \ldots \rho_t$) of
reversals that sorts $\pi$?
Solvable in polynomial time.
Important developments
KS93 - Kececioglu and Sankoff, "Exact and approximation algorithms for the inversion distance between two chromosomes", 4th CPM
- studied MinSortRv
- introduced notion of breakpoints
- 2-approximation algorithm
BP93 - Bafna and Pevzner, "Genome Rearrangements and Sorting by Reversals", 34th FOCS
- breakpoint graph and cycle decomposition
- introduced signed sorting SignedSortRv
- 3/2-approximation algorithm for SignedSortRv
- 7/4-approximation algorithm for MinSortRv
HP95 - Hannenhalli and Pevzner, "Transforming Cabbage into Turnip", 27th STOC
- SignedSortRv resolved
- $O(n^4)$ algorithm
- introduced hurdles and fortresses
- $d(\pi) = b(\pi) - c(\pi) + h(\pi) + f(\pi)$
KS93 - Breakpoints
Extend $\pi$ to include element 0 (L) on the left and element n+1 (R) on the right.
A breakpoint occurs between two adjacent elements that do not differ by 1.
Example: $\pi$ = {3 5 6 7 2 1 4 8} has 5 breakpoints ($b(\pi) = 5$).
L 3 5 6 7 2 1 4 8 R
Breakpoints partition the sequence into strips that are increasing or decreasing.
Reversals add or remove breakpoints. A sorted permutation has 0 breakpoints.
i-reversal (i = 0, 1, 2): a reversal that decreases the number of breakpoints by i.
Theorem (KS): Let $\pi$ contain a decreasing strip. Then $\pi$ has a 1- or 2-reversal.
If every reversal that removes a breakpoint of $\pi$ results in a permutation with no
decreasing strips, then $\pi$ has a 2-reversal.
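Algorithm KS($\pi$)
    $i \leftarrow 0$
    while $\pi$ contains a breakpoint
        $i \leftarrow i + 1$
        $\rho_i \leftarrow$ the reversal that removes the most
            breakpoints, resolving ties in favor of
            reversals that leave a decreasing strip
        $\pi \leftarrow \pi \cdot \rho_i$
    return $(\rho_1, \ldots, \rho_i)$
The optimal reversal distance is at least $b(\pi)/2$.
KS returns a solution of at most $b(\pi)$ reversals, i.e. at most 2 × optimal.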
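The greedy strategy can be tried out with a brute-force search over all intervals. The sketch below is an illustration rather than the authors' implementation: it assumes the convention that a strip of length one counts as decreasing unless it contains 0 or n+1, and it reports reversals as 1-based, inclusive position pairs.

```python
def extend(perm):
    """Add 0 on the left and n+1 on the right."""
    return [0] + list(perm) + [len(perm) + 1]


def breakpoints(perm):
    ext = extend(perm)
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def has_decreasing_strip(perm):
    """Strips are maximal runs without breakpoints.

    Assumed convention: strips holding 0 or n+1 are increasing; any other
    strip of length one counts as decreasing.
    """
    ext, n, start = extend(perm), len(perm), 0
    for k in range(1, len(ext) + 1):
        if k == len(ext) or abs(ext[k] - ext[k - 1]) != 1:
            strip = ext[start:k]
            if 0 not in strip and n + 1 not in strip:
                if len(strip) == 1 or strip[0] > strip[1]:
                    return True
            start = k
    return False


def ks_sort(perm):
    """Greedy sorting; returns the reversals as 1-based (i, j) pairs."""
    perm, scenario = list(perm), []
    while breakpoints(perm) > 0:
        current = breakpoints(perm)
        best = None
        for i in range(len(perm)):
            for j in range(i + 1, len(perm)):
                cand = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
                # Most breakpoints removed first, then prefer a decreasing strip.
                key = (current - breakpoints(cand), has_decreasing_strip(cand))
                if best is None or key > best[0]:
                    best = (key, cand, (i + 1, j + 1))
        perm = best[1]
        scenario.append(best[2])
    return scenario


print(ks_sort([3, 5, 6, 7, 2, 1, 4, 8]))
```

Trying every interval costs O(n^3) work per step here, which is fine for small examples but far from the efficiency of the published algorithms.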
BP93 – Breakpoint Graph
Vertices: elements of $\pi$ (plus 0 (L) and n+1 (R))
L 2 3 1 6 5 4 R
[Figure: each element is drawn as a pair of vertices, e.g. 6 becomes -6 and +6.]
THE DIAGRAM OF REALITY AND DESIRE
Construction of a diagram of reality and desire
Reality: L 3 2 1 4 5 R
Desire: L 1 2 3 4 5 R
L -3 +3 +2 -2 +1 -1 -4 +4 +5 -5 R
[Figure: this vertex row drawn twice, once with the reality edges and once with the desire edges.]
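The vertex row and both edge sets of the diagram can be generated mechanically. In the sketch below, a positive gene x contributes the vertex pair (-x, +x) and a negative gene the pair (+x, -x). The signs of the example, +3 -2 -1 +4 -5, are inferred from the vertex row on the slide, since the orientation arrows of the drawing are not available here. Function names are illustrative.

```python
def vertex_row(signed_perm):
    """Vertex labels of the reality and desire diagram, from L to R."""
    row = ["L"]
    for gene in signed_perm:
        tail, head = f"-{abs(gene)}", f"+{abs(gene)}"
        row += [tail, head] if gene > 0 else [head, tail]
    return row + ["R"]


def reality_edges(signed_perm):
    """Reality edges join neighbouring genes in the current arrangement."""
    row = vertex_row(signed_perm)
    return [(row[k], row[k + 1]) for k in range(0, len(row), 2)]


def desire_edges(n):
    """Desire edges are the reality edges of the identity +1 +2 ... +n."""
    return reality_edges(list(range(1, n + 1)))


genome = [3, -2, -1, 4, -5]
print(" ".join(vertex_row(genome)))
# L -3 +3 +2 -2 +1 -1 -4 +4 +5 -5 R
print(reality_edges(genome))
print(desire_edges(5))
```

Every vertex has exactly one reality edge and one desire edge, so the diagram splits into alternating cycles in a unique way.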
L -3 +3 +2 -2 +1 -1 -4 +4 +5 -5 R
[Figure: the same vertices L, -3, +3, +2, -2, +1, -1, -4, +4, +5, -5, R redrawn in a two-dimensional layout.]
Effect of reversals on $c(\pi)$
$c(\pi)$ = number of cycles in a maximum cycle decomposition
Observation: reversals affect $c(\pi)$.
Example: reversing the bracketed interval in
{L [+1 -1] -2 +2 +3 -3 R} removes 2 breakpoints and 1 cycle.
L +1 -1 -2 +2 +3 -3 R   ->   L -1 +1 -2 +2 +3 -3 R
$d(\pi) \ge b(\pi) - c(\pi)$
Cycles of length 4 are eliminated by 2-reversals.
Let $c_4(\pi)$ = number of 4-cycles.
$(c(\pi) - c_4(\pi))$: cycles of length > 4 include at least three breakpoints
$d(\pi) \ge b(\pi) - c_4(\pi) - (c(\pi) - c_4(\pi)) / 3$
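For signed permutations the cycle decomposition is unique, so the bound $d(\pi) \ge b(\pi) - c(\pi)$ can be evaluated by simply walking the cycles. The sketch below encodes gene +x as the pair (2x-1, 2x) and gene -x as (2x, 2x-1); this encoding and the function names are illustrative choices. It counts all cycles, including the two-edge cycles at adjacencies, and therefore reports the bound in the equivalent form n + 1 - (number of cycles).

```python
def unsigned_image(signed_perm):
    """Replace +x by (2x-1, 2x) and -x by (2x, 2x-1); frame with 0 and 2n+1."""
    seq = [0]
    for gene in signed_perm:
        a = abs(gene)
        seq += [2 * a - 1, 2 * a] if gene > 0 else [2 * a, 2 * a - 1]
    return seq + [2 * len(signed_perm) + 1]


def count_cycles(signed_perm):
    """Number of alternating cycles, counting the trivial ones at adjacencies."""
    seq = unsigned_image(signed_perm)
    reality = {}
    for k in range(0, len(seq), 2):
        u, v = seq[k], seq[k + 1]
        reality[u], reality[v] = v, u
    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = reality[v]   # follow the reality edge
            seen.add(w)
            v = w ^ 1        # follow the desire edge (2k <-> 2k+1)
    return cycles


def cycle_lower_bound(signed_perm):
    """d >= n + 1 - c, which equals b - c when trivial cycles are left out."""
    return len(signed_perm) + 1 - count_cycles(signed_perm)


print(count_cycles([3, -2, -1, 4, -5]))       # 3
print(cycle_lower_bound([3, -2, -1, 4, -5]))  # 3
```

For the small example {L [+1 -1] -2 +2 +3 -3 R}, i.e. the signed permutation -1 +2 -3, the bound drops from 2 to 1 after the bracketed reversal, matching the loss of two breakpoints and one cycle.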
Algorithm BP($\pi$)
    while $\pi$ contains a breakpoint
        if $\pi$ has no decreasing strips
            if a 4-cycle C remains
                find a cycle C' that crosses C
                0-reversal on C', 2-reversal on C
            else
                regular 0-reversal
        else
            regular greedy choice
Algorithm BP produces a solution that is at most (3 × optimal)/2.
HP95 – Hurdles and Fortress
[Figure: interleaving graph with components A, B, C, D, E, F.]
Hurdles
A hurdle is a bad component that does not separate any other two bad
components. Separation is an important concept, in that a reversal through
reality edges in different components A and C will result in every component B
that separates A and C being twisted. A bad component becomes good when
twisted.
Bad components are either non-hurdles or hurdles; hurdles are either simple hurdles or super hurdles.
[Figure: components A, B, C, D, E, F.]
Fortress
A permutation $\pi$ is called a fortress, $f(\pi) = 1$, when its reality and desire diagram
contains an odd number of hurdles and all of them are super hurdles.
Fortresses are permutations that require
one extra reversal to sort, due to their
special structure.
A smallest possible fortress.
Algorithm HP($\pi$)
    if there is a good component in RD($\pi$) then
        pick two divergent edges e, f in this component,
        making sure the corresponding reversal does
        not create any bad components
        return the reversal characterized by e and f
    else
        if $h(\pi)$ is even then
            return merging of two opposite hurdles
        else
            if $h(\pi)$ is odd and there is a simple hurdle
                return a reversal cutting this hurdle
            else // fortress
                return merging of any two hurdles
$d(\pi) = b(\pi) - c(\pi) + h(\pi) + f(\pi)$
$h(\pi)$: number of hurdles
$f(\pi)$: 0/1, according to $\pi$ being a fortress or not
Hurdles
(a) Unoriented component U separates U' and U'' by virtue of the edge (0, 1).
(b) Hurdle U does not separate U' and U''.
Hannenhalli, Pevzner, Journal of the ACM 46, 1 (1999)
Effects of reversals on cycles
Hannenhalli, Pevzner, Journal of the ACM 46, 1 (1999)
Reversal on a cycle C (i) deletes vertex C from the interleaving graph; (ii) changes
the orientation of vertices in V(C); (iii) complements the subgraph induced by V(C).
Merging hurdles
Hannenhalli, Pevzner, Journal of the ACM 46, 1 (1999)
Hannenhalli-Pevzner algorithm
Hannenhalli, Pevzner, Journal of the ACM 46, 1 (1999)
Improvements of Hannenhalli-Pevzner algorithm
Several websites offer programs to sort permutations by reversals. At their roots
is the Hannenhalli-Pevzner algorithm for sorting signed permutations by
reversals. Successive authors improved the algorithm.
• With the Hannenhalli & Pevzner algorithm, the distance computation is
performed in time $O(n^4)$.
• Improvements to the algorithm developed by Haim Kaplan, Ron Shamir and
Robert E. Tarjan bring the time to compute the distance down to $O(n^2)$.
• GRAPPA is written by a multitude of authors. It reduces the distance
computation time to $O(n)$ using improvements by David A. Bader, Bernard
M.E. Moret and Mi Yan.
The main purpose of GRAPPA is to construct phylogenetic trees for multiple
signed unichromosomal genomes; the distance computation on which we are
focused here is but a mere subroutine in that context.
Algorithm by Kaplan, Shamir, Tarjan
The algorithm has three main stages:
1. Pre-process the permutation. This pre-processing contains 3 sub stages:
1a. Unsign the permutation, e.g., p will be unsigned to the permutation 0,
(7,8), (4,3), (1,2), (5,6), (12,11), (9,10), 13.
1b. Define the Overlap graph of the permutation
1c. Find the connected components of the overlap graph
2. Clear the hurdles. A hurdle is a problematic connected component of the
overlap graph. In this stage each reversal merges two hurdles in distinct
connected components into one non-hurdle component.
3. Generate a sequence of safe reversals. A safe reversal is defined as a
reversal that reduces b-c (the number of breakpoints minus the number of
cycles) without creating new hurdles.
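The pre-processing stage can be sketched in a few functions. The code below unsigns a permutation, places one interval per gray edge (the positions of 2k and 2k+1), links intervals that overlap without nesting, and collects connected components with a small union-find. It is a quadratic-time illustration, not the published algorithm. The input permutation +4 -2 +1 +3 -6 +5 is inferred from the unsigned output quoted in step 1a, and keeping every gray edge as a vertex (including those at adjacencies) is a simplifying assumption.

```python
def unsign(signed_perm):
    """Step 1a: +x -> (2x-1, 2x), -x -> (2x, 2x-1), framed by 0 and 2n+1."""
    seq = [0]
    for gene in signed_perm:
        a = abs(gene)
        seq += [2 * a - 1, 2 * a] if gene > 0 else [2 * a, 2 * a - 1]
    return seq + [2 * len(signed_perm) + 1]


def gray_intervals(unsigned):
    """One interval of positions per gray edge (2k, 2k+1)."""
    pos = {value: k for k, value in enumerate(unsigned)}
    return [tuple(sorted((pos[2 * k], pos[2 * k + 1])))
            for k in range(len(unsigned) // 2)]


def overlap(a, b):
    """True if the intervals intersect but neither contains the other."""
    (l1, r1), (l2, r2) = sorted((a, b))
    return l1 < l2 < r1 < r2


def components(intervals):
    """Steps 1b and 1c: connected components of the overlap graph."""
    parent = list(range(len(intervals)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(intervals)):
        for j in range(i + 1, len(intervals)):
            if overlap(intervals[i], intervals[j]):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(intervals)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


unsigned = unsign([4, -2, 1, 3, -6, 5])
print(unsigned)  # [0, 7, 8, 4, 3, 1, 2, 5, 6, 12, 11, 9, 10, 13]
print(components(gray_intervals(unsigned)))
```

Deciding which components are hurdles, and finding safe reversals, requires the orientation information that this sketch leaves out.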
Multichromosomal genomes: more tricky
Word problems and insertions/deletions
So far we did not consider "word problems" in which some genes are repeated,
1 2 -1 3 4
nor did we allow gaps in the numbering (as may arise from insertion/deletion),
1 3 -9 -7 5
Distinguish between microrearrangements (e.g. intrachromosomal
rearrangements with a span < 1 Mb) and macrorearrangements (e.g.
intrachromosomal rearrangements of larger span as well as interchromosomal
rearrangements). The existing rearrangement algorithms do not distinguish
between these two types of rearrangements.
First identify conserved synteny blocks (segments that can be converted into
conserved segments by microrearrangements).
Genome Rearrangements: Synteny
(a) Human and mouse synteny blocks of
conserved gene order. Every block
corresponds to a rectangle, with a diagonal
showing whether the arrangements of
anchors in human and mouse (within the
synteny block) are the same or reversed.
(b) Combining anchors into clusters by the
GRIMM-Synteny algorithm at G = 100 kb. The
edges in the anchor graph connect the closest
ends of the anchors. The anchors are color-
coded by the resulting clusters. At G = 1 Mb,
this forms a single cluster, which in turn forms
a synteny block (the lower right block in the
human 18/mouse 17 rectangle in a).
Pevzner, Tesler, Genome Res 13, 37 (2003)
From Anchors to Breakpoint Graphs
X-chromosome: from local similarities, to
synteny blocks, to breakpoint graph, to
rearrangement scenario. (a) Dot-plot of
anchors. Anchors are enlarged for
visibility. (b) Clusters of anchors. (c)
Rectified clusters. (d) Synteny blocks. (e)
Synteny blocks (symbolic representation
as genome rearrangement units). (f) 2D
breakpoint graph superimposed on
synteny blocks. The projections of the 2D
graph onto the human and mouse axes
form the conventional breakpoint graphs.
(g) 2D breakpoint graph. The four cycles
in the breakpoint graph are shown by
different colors. (h) A most parsimonious
rearrangement scenario for human and
mouse X-chromosomes. Pevzner, Tesler, Genome Res 13, 37 (2003)
Genome Rearrangements
Construction of the breakpoint
graph from synteny blocks.
(a) Solid path through human.
(b) Dotted path through mouse.
(c) Superposition of paths.
(d) Remove blocks to obtain
cycles.
Pevzner, Tesler, Genome Res 13, 37 (2003)
Multichromosomal breakpoint graph
Multichromosomal breakpoint
graph of the whole human and
mouse genomes. The
conventional chromosome
order and orientation are not
suitable for such graphs; an
optimal chromosome order and
orientation were determined by
the algorithm in Tesler (2002).
Three "null chromosomes," N1,
N2, N3, were added to mouse
to equalize the number of
chromosomes in the two
genomes.
Pevzner, Tesler, Genome Res 13, 37 (2003)
Summary
Computational studies on genome rearrangements assume that evolution took
the path of shortest reversal distance.
Algorithmic improvements have significantly reduced the computational
costs.
It may be very advantageous to analyze more than 2 genomes
simultaneously!