
Reconstructing Ancestral DNA (at least the gaps)

Using unrooted phylogeny, multiple alignment, and affine gap cost function.

Work in progress.

2

Overview

• Introduction

• Examples

• Gap Graph construction

• Theory

• Algorithm

• Results

• Next steps

6

Example

(a) Two long indels (ancestor N3 = nnn).

(b) Three short indels (ancestor N3 = n-n).

Which is more parsimonious depends on the gap cost function:

Cost of an indel of length k is g(k) = a + b*k
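To make the trade-off concrete, here is a minimal sketch of the affine cost g(k) = a + b*k applied to both explanations. The indel lengths (two indels of length 3 versus three of length 1) are assumptions chosen for illustration, since the slides do not give them, and the helper names are hypothetical. The pair a=5, b=3 is the one used on the Flashback slide; a=10, b=1 is an arbitrary contrast.

```python
def gap_cost(k, a, b):
    """Affine cost g(k) = a + b*k of one indel of length k."""
    return a + b * k


def scenario_cost(indel_lengths, a, b):
    """Total cost of an explanation, given the lengths of its indels."""
    return sum(gap_cost(k, a, b) for k in indel_lengths)


# Assumed lengths (not taken from the slides).
two_long = [3, 3]         # explanation (a)
three_short = [1, 1, 1]   # explanation (b)

for a, b in [(5, 3), (10, 1)]:
    cost_a = scenario_cost(two_long, a, b)
    cost_b = scenario_cost(three_short, a, b)
    winner = "(a)" if cost_a < cost_b else "(b)"
    print(f"a={a}, b={b}: (a) costs {cost_a}, (b) costs {cost_b}, cheaper: {winner}")
```

With these assumed lengths, (a) costs 2a + 6b and (b) costs 3a + 3b, so (a) wins exactly when a > 3b: a high opening cost favors few long indels, a high extension cost favors many short ones.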

7

Harder Example

N8, N9, N10, N11, N12, N13 ???

Problem: find optimal explanation for gaps in terms of indels.

8

Gap Representation

1. Find gap intervals

2. Create a minimal tree covering in each gap interval: the minimal number of subtrees with gaps in all leaves

A vertex consists of:

a) a subtree with gaps in all leaves

b) its section of the alignment
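The two steps can be sketched as follows. This is only an illustration under stated assumptions: a gap interval is taken to be a maximal run of alignment columns with the same set of gapped leaves, and the tree is written as rooted nested tuples for simplicity, although the talk works with an unrooted phylogeny. The function names and the toy alignment are hypothetical.

```python
from itertools import groupby


def gap_intervals(alignment):
    """Step 1: maximal runs of columns sharing the same set of gapped leaves.

    alignment maps leaf name -> aligned sequence; returns (start, end, leaves)
    with end exclusive, skipping runs that contain no gaps.
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    patterns = [
        frozenset(n for n in names if alignment[n][col] == "-")
        for col in range(length)
    ]
    intervals, start = [], 0
    for pattern, run in groupby(patterns):
        run_length = len(list(run))
        if pattern:
            intervals.append((start, start + run_length, pattern))
        start += run_length
    return intervals


def leaves(tree):
    """Leaf names of a nested-tuple tree; a leaf is a plain string."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def minimal_cover(tree, gapped):
    """Step 2: the maximal subtrees whose leaves are all gapped."""
    subtree_leaves = leaves(tree)
    if all(leaf in gapped for leaf in subtree_leaves):
        return [tuple(subtree_leaves)]
    if isinstance(tree, str):
        return []
    return [cover for child in tree for cover in minimal_cover(child, gapped)]


# Toy data (assumed, not from the slides).
alignment = {"A": "AC--T", "B": "AC--T", "C": "A-G-T", "D": "ACGTT"}
tree = (("A", "B"), ("C", "D"))

for start, end, gapped in gap_intervals(alignment):
    print((start, end), minimal_cover(tree, gapped))
```

Each (interval, subtree) pair printed here plays the role of one vertex: a subtree with gaps in all leaves together with its section of the alignment.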

16

Gap Graph Construction

3. Create a connection between neighboring vertices v and w if one is contained in the other.
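A minimal sketch of this step, assuming that "neighbors" means vertices in adjacent gap intervals and that "contained" refers to the leaf sets of the two subtrees. Vertices are represented as frozensets of leaf names, one list per gap interval in alignment order; the representation and the function name are hypothetical.

```python
def containment_edges(vertices_by_interval):
    """Connect vertices of adjacent intervals when one leaf set contains the other.

    vertices_by_interval: list with one entry per gap interval (in alignment
    order), each entry a list of frozensets of leaf names.
    Returns edges as ((interval_index, leaf_set), (interval_index + 1, leaf_set)).
    """
    edges = []
    for i in range(len(vertices_by_interval) - 1):
        for v in vertices_by_interval[i]:
            for w in vertices_by_interval[i + 1]:
                if v <= w or w <= v:  # subset test in either direction
                    edges.append(((i, v), (i + 1, w)))
    return edges


# Toy input (assumed): three consecutive gap intervals.
vertices = [
    [frozenset({"C"})],
    [frozenset({"A", "B"})],
    [frozenset({"A", "B"}), frozenset({"C"})],
]
for edge in containment_edges(vertices):
    print(edge)
```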

20

What is a vertex?

Either one indel created all gaps in the subtree, or the vertex (subtree) is decomposed into several indels.

Algorithm goal: confirm or decompose vertices using gap cost function.

21

Flashback: ~ Jotun’s Algorithm

This example can be solved optimally: using a=5, b=3, all vertices are confirmed.

- i.e., all gaps are created ‘as high as possible’ in the tree.

24

Horrific Counter Example

At first sight: confirm all vertices, giving 6 indels.

BUT: a solution with 5 indels can be found!

(0,1)

(0,1,2,3)

(1,2,3,4)

Depending on the gap cost function, this may be cheaper. Thus the first solution may not be optimal.

Problem: the indel (2) is invisible.

25

New Type of Connection Needed!

(0,1)

(1,2,3,4)

(0,1,2,3)

3. Create connections between neighbors v and w if they share leaves (replacing the earlier condition that one is contained in the other).

- The indel (2) lies in the intersection of the cousins.
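The revised rule can be sketched as a small classifier. It assumes that the tuples on the slide, such as (0,1,2,3), are sets of leaf indices, and that "cousins" are neighbors that share leaves without one containing the other; both readings are inferred from the slides rather than stated there, and the function name is hypothetical.

```python
def connection_type(v, w):
    """Classify the connection between two neighboring vertices by their leaf sets."""
    v, w = frozenset(v), frozenset(w)
    if v <= w or w <= v:
        return "containment"  # the original rule
    if v & w:
        return "cousin"       # shared leaves, neither contains the other
    return None               # no shared leaves, no connection


print(connection_type((0, 1), (0, 1, 2, 3)))
print(connection_type((0, 1, 2, 3), (1, 2, 3, 4)))
print(sorted(frozenset((0, 1, 2, 3)) & frozenset((1, 2, 3, 4))))
```

The last line prints the intersection of the two cousins, which is where a previously invisible indel such as (2) can be looked for.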

26

Now The(st)ory Begins

By construction of the gap graph, we can prove two theorems:

Theorem 1

Each optimal indel either corresponds directly to a vertex, or it crosses a cousin connection.

Only possible optimal indels: (0,1) (3) (0,1,2,3) (1,2,3,4) (1) (4) (2)

(0,1)

(0,1,2,3)

(1,2,3,4)

27

Now Theory Begins

By construction of the gap graph, we can prove two theorems:

Theorem 2

If a vertex v is decomposed in the optimal solution, all decomposing indels extend beyond v’s section of the alignment, and they do not all extend in the same direction.

Thus we must decompose either both or neither of (0,1,2,3) and (1,2,3,4): otherwise (2) would not extend beyond the section of (0,1,2,3).

(0,1,2,3)

(1,2,3,4)

28

Now Theory Begins

From the theorems we can prove some lemmas:

1: Leaf vertices can be confirmed.

2: Orphans / end vertices can be confirmed.

3: Patriarchs can be confirmed and trimmed.

33

Solving Earlier Example

1: Leaf vertices can be confirmed.

2: Orphans / end vertices can be confirmed.

3: Patriarchs can be confirmed and trimmed.

4: Mono-chain vertices can be decided locally.

34

End of Pre-Processing

• In longer examples there will be undecided vertices (purple) after pre-processing.

• Find the possible decompositions for each vertex and check all combinations in each chain.
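The exhaustive check per chain might look like the sketch below. It assumes that chains can be solved independently of each other (suggested by "in each chain"), so the work is the sum over chains of the product of option counts per undecided vertex. The option labels, the toy price table, and the function names are all hypothetical; a real cost would come from the gap cost function.

```python
from itertools import product
from math import prod


def combination_count(chains):
    """Number of combinations to check: per-chain products, summed over chains."""
    return sum(prod(len(options) for options in chain) for chain in chains.values())


def best_choice_per_chain(chains, cost):
    """Try every combination of options in each chain and keep the cheapest."""
    return {
        chain_id: min(product(*chain), key=cost)
        for chain_id, chain in chains.items()
    }


# Toy data (assumed): each inner list holds the options for one undecided vertex.
chains = {
    "C1": [["confirm", "decompose"], ["confirm", "decompose"]],
    "C2": [["confirm", "decompose", "decompose_alt"]],
}
toy_price = {"confirm": 8, "decompose": 6, "decompose_alt": 9}


def toy_cost(combination):
    """Stand-in for the real indel cost of a combination of choices."""
    return sum(toy_price[choice] for choice in combination)


print(combination_count(chains))
print(best_choice_per_chain(chains, toy_cost))
```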

35

9 sequences, 60% gaps, pre-processing time < 4 s

• Alignment length 3936, divided in 3922 gap intervals.

• 1497 vertices undecided before trimming.

• 1112 vertices undecided after trimming.

• Created 8912 vertices, 871 connections.

• Confirmed 5469 leaf vertices, 2285 patriarchs, 210 end vertices, 217 locally confirmed non-cousin chain vertices, 37 locally confirmed cousin chain vertices, and 487 mono-chain decomposed vertices.

• 207 vertices undecided after all pre-processing.

• Chains with undecided vertices: 89; max undecided in same chain (C31): 7

• Estimated number of combinations: 2788; max in same chain: 1152

37

Is Pre-Processing Important?

9 sequences, 60% gaps; no pre-processing:

• Created 10082 vertices, 7121 connections.

• 1497 vertices undecided with no pre-processing.

• Chains with undecided vertices: 950; max undecided in same chain (C40): 10

• Estimated number of combinations: 71950; max in same chain: 34560

9 sequences, 60% gaps; with pre-processing:

• Created 8912 vertices, 871 connections.

• 207 vertices undecided after all pre-processing.

• Chains with undecided vertices: 89; max undecided in same chain (C31): 7

• Estimated number of combinations: 2788; max in same chain: 1152
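As a quick worked comparison, the reduction factors can be computed directly from the two runs reported above. The numbers are copied from the slides; the dictionary keys are just labels chosen for this sketch.

```python
# Figures reported on the slides for the 9-sequence, 60%-gap data set.
without_preprocessing = {
    "connections": 7121,
    "undecided vertices": 1497,
    "chains with undecided": 950,
    "estimated combinations": 71950,
    "max combinations in one chain": 34560,
}
with_preprocessing = {
    "connections": 871,
    "undecided vertices": 207,
    "chains with undecided": 89,
    "estimated combinations": 2788,
    "max combinations in one chain": 1152,
}

# Reduction factor = value without pre-processing / value with pre-processing.
reduction = {
    key: without_preprocessing[key] / with_preprocessing[key]
    for key in without_preprocessing
}
for key, factor in reduction.items():
    print(f"{key}: reduced by a factor of {factor:.1f}")
```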

38

Next Steps

• Make poster for RECOMB (suggestions?)

• Finish program

• Run it on real data

• Ideas for applications? (The score ranks alignments; use it to find an alignment.)

• Demo

39

Screenshots (in case demo doesn’t work)