61
Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Did you know thatMultiple Alignment

is NP-hard?

Isaac Elias

Royal Institute of TechnologySweden

1

Page 2: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Results

• Multiple Alignment with SP-score

• Star Alignment

• Tree Alignment (with given phylogeny)

are NP-hard under all metrics!

2

Page 3: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Pairwise Alignment

Mutations: substitutions, insertions, and deletions.

Input: Two related strings s1 and s2.

Output: The least number of mutations needed to s1 → s2.

s1 = a a g a c t

s2 = a g t g c t

3

Page 4: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Pairwise Alignment

Mutations: substitutions, insertions, and deletions.

Input: Two related strings s1 and s2.

Output: The least number of mutations needed to s1 → s2.

2× l matrix As1 = a a g a − c t

s2 = − a g t g c t

dA(s1, s2) = 3 mutations

3

Page 5: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Metric symbol distance

Unit metric - binary alphabet

(the edit distance)

Σ 0 1 −0 0 1 1

1 1 0 1

− 1 1 0

4

Page 6: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Metric symbol distance

Unit metric - binary alphabet

(the edit distance)

Σ 0 1 −0 0 1 1.5

1 1 0 1.5

− 1.5 1.5 0

Insertions and deletions occur less frequently!

4

Page 7: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Metric symbol distance

Unit metric - binary alphabet

(the edit distance)

Σ 0 1 −0 0 1 1

1 1 0 1

− 1 1 0

Insertions and deletions occur less frequently!

General metric - α, β, γ have

metric properties

(identity, symmetry, triangle ineq.)

Σ 0 1 −0 0 α β

1 α 0 γ

− β γ 0

4

Page 8: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Multiple Alignment

When sequence similarity is low pairwise alignment may fail to find

similarities.

5

Page 9: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Multiple Alignment

When sequence similarity is low pairwise alignment may fail to find

similarities.

Considering several strings may be helpful.

Input: k strings s1 . . . sk.

Output: k × l matrix A.

a a g a - c t

- - g a g c t

a c g a g c -

a - g t g c t

5

Page 10: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Multiple Alignment

When sequence similarity is low pairwise alignment may fail to find

similarities.

Considering several strings may be helpful.

Input: k strings s1 . . . sk.

Output: k × l matrix A.

a a g a - c t

- - g a g c t

a c g a g c -

a - g t g c t

How do we score columns?

5

Page 11: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Sum of Pairs score (SP-score)

Let the cost be the sum of costs for all pairs of rows:

k∑i=1

k∑j=i

dA(si, sj),

where dA(si, sj) is the pairwise distance between the rows

containing strings si and sj.

6

Page 12: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Earlier NP results

• Wang and Jiang ’94 - Non-metric

• Bonizzoni and Vedova ’01 - Specific metric

(we build on their construction)

• Just ’01 - Many metrics

Page 13: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Earlier NP results

• Wang and Jiang ’94 - Non-metric

• Bonizzoni and Vedova ’01 - Specific metric

(we build on their construction)

• Just ’01 - Many metrics

Page 14: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Earlier NP results

• Wang and Jiang ’94 - Non-metric

• Bonizzoni and Vedova ’01 - Specific metric

(we build on their construction)

• Just ’01 - Many metrics

Page 15: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Earlier NP results

• Wang and Jiang ’94 - Non-metric

• Bonizzoni and Vedova ’01 - Specific metric

(we build on their construction)

• Just ’01 - Many metrics

In particular it was unknown for the unit metric.

Page 16: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Earlier NP results

• Wang and Jiang ’94 - Non-metric

• Bonizzoni and Vedova ’01 - Specific metric

(we build on their construction)

• Just ’01 - Many metrics

In particular it was unknown for the unit metric.

Here: All binary or larger alphabets under all metrics!

Page 17: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Independent R3 Set

Independent set in three regular graphs, i.e. all vertices have degree

3, is NP-hard.

Input: A three regular graph G = (V,E) and a integer c.

Output: “Yes” if there is an independent set V ′ ⊆ V of size ≥ c,

otherwise “No”.

ss@

@@

@@

ss

s ss s�

��

��

ss

@@

@@

@

ss ���

ss @@@ ss s s s

s�

��

��

ss

ssss @

@@

@@

ss

��

��

ss@

@@

s s���s s

8

Page 18: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Independent R3 Set

Independent set in three regular graphs, i.e. all vertices have degree

3, is NP-hard.

Input: A three regular graph G = (V,E) and a integer c.

Output: “Yes” if there is an independent set V ′ ⊆ V of size ≥ c,

otherwise “No”.

ss@

@@

@@

ss

s ss s�

��

��

ss

@@

@@

@

ss ���

ss @@@ ss s s s

s�

��

��

ss

ssss @

@@

@@

ss

��

��

ss@

@@

s s���s s

•• •

8

Page 19: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Reduction

Independent R3 SetG = (V,E)

c

set of size ≥ c

V ′

SP-scoreSet of strings

K

matrix of cost ≤ K

A

9

Page 20: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Reduction

Independent R3 SetG = (V,E)

c

set of size ≥ c

V ′

SP-scoreSet of strings

K

matrix of cost ≤ K

A

ss

��

��

ss

ss@

@@

@@

ss

ss ss•

v1

v2 v3

v4

v1 v2 v3 v41 0. . . 0. . . 1 0. . . 0. . . 1 0. . . 0. . . 1...

......

......

......

......

...1 0. . . 0. . . 1 0. . . 0. . . 1 0. . . 0. . . 1

1...1

0 0. . . 10. . . 1 0. . . 0. . . 0 0. . . 0. . . 0 0. . .0. . . 1 0. . . 0. . . 0 0. . . 10. . . 0 0. . . 0. . . 0

0 0. . . 10. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . .0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0 0. . . 0. . . 00. . . 0 0. . . 0. . . 1 0. . . 0. . . 0 0. . . 10. . . 00. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0

9

Page 21: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Gadgetering

1. Each vertex is represented by a column (n vertex columns).

2. c vertex columns are picked.

3. Each edge is represented by an edge string.

v1 v2 . . . vi . . . vn1 0. . . 0. . . 1 . . . 1 . . . 1...

......

......

...... . . .

...1 0. . . 0. . . 1 . . . 1 . . . 1

1 1...

...1 1

0 0. . . 10. . . 1 0. . . 0. . . 0 0. . . 0. . . 0 0. . .0. . . 1 0. . . 0. . . 0 0. . . 10. . . 0 0. . . 0. . . 0

0 0. . . 10. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . .0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0 0. . . 0. . . 00. . . 0 0. . . 0. . . 1 0. . . 0. . . 0 0. . . 10. . . 00. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0

10

Page 22: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Template strings → Vertex columns

We add very many template strings:

T = (10b)n−1

v1 v2 . . . vi . . . vn

1 0. . . 0. . . 1 . . . 1 . . . 1

......

......

......

... . . ....

1 0. . . 0. . . 1 . . . 1 . . . 1

Fact: Identical strings are aligned identically in an optimal

alignment.

11

Page 23: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Pick strings → V ′ ⊆ V

We add very many pick strings: P = 1c.

v1 v2 . . . vi . . . vn

1 0. . . 0. . . 1 . . . 1 . . . 1...

......

......

...... . . .

...

1 0. . . 0. . . 1 . . . 1 . . . 1

1 1...

...

1 1

Fact: c vertex columns are picked since this is the alignment with

least missmatches.

12

Page 24: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Edge strings

Property: If there is an edge (vi, vj) then both vi and vj can not be

part of the independent set.

Eij = 0s(00b)i−110b−s(00b)j−i−110b(00b)n−j−100s

v1 vi . . . . . . vj . . . vn

1 0. . . 0. . . 1 . . . . . . 1 . . . 1...

......

... . . . . . .... . . .

...

1 0. . . 0. . . 1 . . . . . . 1 . . . 1

good 0s 0 0. . . 0. . . 1 0. . . 1 0s−1 0 0. . . 0. . . 0

good 0 0. . . 0. . . 0 0s−1 1 0. . . 1 0. . . 0. . . 0 0s

bad 0s 0 0. . . 0. . . 1 −s 0. . . 1 0. . . 0. . . 0 0s

13

Page 25: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Independent set ≥ c ⇔ Alignment ≤ K

K = (n − c)γb2 + b(n − 1)βb2 + (sβ + nα)bm +(cα + (s + b(n −

1) + n− c− 2)β + 2γ)bm− 3cb(α + γ − β) + (sβ + 2α)m2

14

Page 26: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Independent set ≥ c ⇔ Alignment ≤ K

1. Remember three regular graph. So there are atmost three ones in

each vertex column.

2. If a column with only two ones is picked then there is a missmatch

extra for each pick string (⇒ score > K).

14

Page 27: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Example

1. Atmost three ones in each vertex column.

2. Pick strings pick columns with most ones.

ss

��

��

ss

ss@

@@

@@

ss

ss ss•

v1

v2 v3

v4

v1 v2 v3 v41 0. . . 0. . . 1 0. . . 0. . . 1 0. . . 0. . . 1...

......

......

......

......

...1 0. . . 0. . . 1 0. . . 0. . . 1 0. . . 0. . . 1

1...

1

E12 0 0. . . 10. . . 1 0. . . 0. . . 0 0. . . 0. . . 0 0. . .E13 0. . . 1 0. . . 0. . . 0 0. . . 10. . . 0 0. . . 0. . . 0E14 0 0. . . 10. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . .E23 0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0 0. . . 0. . . 0

E24 0. . . 0 0. . . 0. . . 1 0. . . 0. . . 0 0. . . 0. . . 0E34 0. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0

15

Page 28: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Example

1. Atmost three ones in each vertex column.

2. Pick strings pick columns with most ones.

ss

��

��

ss

ss@

@@

@@

ss

ss ss•

v1

v2 v3

v4

v1 v2 v3 v41 0. . . 0. . . 1 0. . . 0. . . 1 0. . . 0. . . 1...

......

......

......

......

...1 0. . . 0. . . 1 0. . . 0. . . 1 0. . . 0. . . 1

1...

1

E12 0. . . 1 0. . . 10. . . 0 0. . . 0. . . 0 0. . . 0. . . 0E13 0. . . 1 0. . . 0. . . 0 0. . . 10. . . 0 0. . . 0. . . 0E14 0 0. . . 10. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . .E23 0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0 0. . . 0. . . 0

E24 0. . . 0 0. . . 0. . . 1 0. . . 0. . . 0 0. . . 0. . . 0E34 0. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0

15

Page 29: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Example

1. Atmost three ones in each vertex column.

2. Pick strings pick columns with most ones.

ss

��

��

ss

ss@

@@

@@

ss

ss ss•

v1

v2 v3

v4

v1 v2 v3 v41 0. . . 0. . . 1 0. . . 0. . . 1 0. . . 0. . . 1...

......

......

......

......

...1 0. . . 0. . . 1 0. . . 0. . . 1 0. . . 0. . . 1

1...

1

E12 0. . . 1 0. . . 10. . . 0 0. . . 0. . . 0 0. . . 0. . . 0E13 0. . . 1 0. . . 0. . . 0 0. . . 10. . . 0 0. . . 0. . . 0E14 0 0. . . 10. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . .E23 0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0 0. . . 0. . . 0

E24 0 0. . . 0. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . .E34 0. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0

15

Page 30: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Example

1. Atmost three ones in each vertex column.

2. Pick strings pick columns with most ones.

ss

��

��

ss

ss@

@@

@@

ss

ss ss•

v1

v2 v3

v4

v1 v2 v3 v41 0. . . 0. . . 1 0. . . 0. . . 1 0. . . 0. . . 1...

......

......

......

......

...1 0. . . 0. . . 1 0. . . 0. . . 1 0. . . 0. . . 1

1...

1

E12 0. . . 1 0. . . 10. . . 0 0. . . 0. . . 0 0. . . 0. . . 0E13 0. . . 1 0. . . 0. . . 0 0. . . 10. . . 0 0. . . 0. . . 0E14 0 0. . . 10. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . .E23 0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0 0. . . 0. . . 0

E24 0 0. . . 0. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . .E34 0. . . 0 0. . . 0. . . 0 0. . . 0. . . 1 0. . . 10. . . 0

15

Page 31: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Star Alignment

SP-score not a good model of evolution.

cs���

���

s1

@@

@@

@I

s2

@@

@@

@Rs3

��

��

�s4

16

Page 32: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Star Alignment

SP-score not a good model of evolution.

Input: k strings s1 . . . sk

Output: string c minimizing∑i

d(c, si)?

s1s2

s3s4

16

Page 33: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Star Alignment

SP-score not a good model of evolution.

Input: k strings s1 . . . sk

Output: string c minimizing∑i

d(c, si)?

s1s2

s3s4

Symbol distance metric =⇒ Pairwise alignment distance metric

Star Alignment is a special case of Steiner Star in a metric space.

16

Page 34: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Earlier results - Star Alignment

• Wang and Jiang ’94 - Non-metric APX-complete

• Li, Ma, Wang ’99 - Constant number gaps, NP-hard and PTAS

• Sim and Park ’01 - Specific metric NP-hard

Page 35: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Earlier results - Star Alignment

• Wang and Jiang ’94 - Non-metric APX-complete

• Li, Ma, Wang ’99 - Constant number gaps, NP-hard and PTAS

• Sim and Park ’01 - Specific metric NP-hard

Page 36: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Earlier results - Star Alignment

• Wang and Jiang ’94 - Non-metric APX-complete

• Li, Ma, Wang ’99 - Constant number gaps, NP-hard and PTAS

• Sim and Park ’01 - Specific metric NP-hard

Page 37: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Earlier results - Star Alignment

• Wang and Jiang ’94 - Non-metric APX-complete

• Li, Ma, Wang ’99 - Constant number gaps, NP-hard and PTAS

• Sim and Park ’01 - Specific metric NP-hard

Here: All binary or larger alphabets under all metrics!

Page 38: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Reduction

Vertex CoverG = (V,E)

minimum cover

V ′

Star AlignmentSet of strings

minimum string

c = DDCDD

18

Page 39: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Reduction

Vertex CoverG = (V,E)

minimum cover

V ′

Star AlignmentSet of strings

minimum string

c = DDCDD

v1 v2 v3 . . . vn

C = . . . ? . . . ? . . . ? . . . ? . . .

18

Page 40: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Reduction

Vertex CoverG = (V,E)

minimum cover

V ′

Star AlignmentSet of strings

minimum string

c = DDCDD

v1 v2 v3 . . . vn

CV = . . . 1 . . . 1 . . . 1 . . . 1 . . .

CV - Only 1’s.

18

Page 41: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Reduction

Vertex CoverG = (V,E)

minimum cover

V ′

Star AlignmentSet of strings

minimum string

c = DDCDD

v1 v2 v3 . . . vn

C∅ = . . . 0 . . . 0 . . . 0 . . . 0 . . .

CV - Only 1’s. C∅ - Only 0’s.

18

Page 42: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Construction Idea

(E,G) → c = DDCDD

(E,Sab) → c = vertex cover

G → minimum cover

c,

,,

,,

,,

ss E

��

��

��

ss

G&%'$

ll

ll

ll

l

ss

G

QQ

QQ

QQ

Q

ss E

&%'$

ll

ll

ll

l

ssSab

QQ

QQ

QQ

Q

ss

E &%'$

,,

,,

,,

,

ss

E

��

��

��

ssSa′b′

&%'$

s

sG

String minimizing∑

i d(c, si) ↔ minimum cover

19

Page 43: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Canonical Structure

E = DDCV DD G = DDC∅DD

Differ only in the vertex positions.

20

Page 44: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Canonical Structure

E = DDCV DD G = DDC∅DD

Differ only in the vertex positions.

Many such pairs will vote for the

canonical structure. c ,,

,,

,,

,

ss E

��

��

��

ss

G&%'$

ll

ll

ll

l

ss

G

QQ

QQ

QQ

Q

ss E

&%'$

20

Page 45: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Canonical Structure

E = DDCV DD G = DDC∅DD

Differ only in the vertex positions.

Many such pairs will vote for the

canonical structure.

Vote even in the vertex positions!

c ,,

,,

,,

,

ss E

��

��

��

ss

G&%'$

ll

ll

ll

l

ss

G

QQ

QQ

QQ

Q

ss E

&%'$

20

Page 46: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Canonical Structure

E = DDCV DD G = DDC∅DD

Differ only in the vertex positions.

Many such pairs will vote for the

canonical structure.

Vote even in the vertex positions!

c ,,

,,

,,

,

ss E

��

��

��

ss

G&%'$

ll

ll

ll

l

ss

G

QQ

QQ

QQ

Q

ss E

&%'$

⇒ c = DDCDD

20

Page 47: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

String → Cover

Edge (va, vb) add two strings

E = DDCV DD Sab = CaDCb

v1 . . . va . . . vb . . . vn

Ca = . . . 0 . . . 1 . . . 0 . . . 0 . . .

21

Page 48: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

String → Cover

Edge (va, vb) add two strings

E = DDCV DD Sab = CaDCb

v1 . . . va . . . vb . . . vn

Cb = . . . 0 . . . 0 . . . 1 . . . 0 . . .

21

Page 49: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

String → Cover

Edge (va, vb) add two strings

E = DDCV DD Sab = CaDCb

E votes 1 in all vertex positions.

21

Page 50: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

String → Cover

Edge (va, vb) add two strings

E = DDCV DD Sab = CaDCb

E votes 1 in all vertex positions.

Sab votes 0 in all except vertex

position a or b.

D D C D D

Ca D Cb

Ca D Cb

21

Page 51: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

String → Cover

Edge (va, vb) add two strings

E = DDCV DD Sab = CaDCb

E votes 1 in all vertex positions.

Sab votes 0 in all except vertex

position a or b.

D D C D D

Ca D Cb

Ca D Cb

Either vertex va or vertex vb is part of the cover.

21

Page 52: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Minimal String ↔ Minimum Cover

(E,G) → c = DDCDD

(E,Sab) → c = vertex cover

c,

,,

,,

,,

ss E

��

��

��

ss

G&%'$

ll

ll

ll

l

ss

G

QQ

QQ

QQ

Q

ss E

&%'$

ll

ll

ll

l

ssSab

QQ

QQ

QQ

Q

ss

E &%'$

,,

,,

,,

,

ss

E

��

��

��

ssSa′b′

&%'$

22

Page 53: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Minimal String ↔ Minimum Cover

(E,G) → c = DDCDD

(E,Sab) → c = vertex cover

c,

,,

,,

,,

ss E

��

��

��

ss

G&%'$

ll

ll

ll

l

ss

G

QQ

QQ

QQ

Q

ss E

&%'$

ll

ll

ll

l

ssSab

QQ

QQ

QQ

Q

ss

E &%'$

,,

,,

,,

,

ss

E

��

��

��

ssSa′b′

&%'$

s

sG

G = DDC∅DD penalizes each 1 in vertex positions.

22

Page 54: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Minimal String ↔ Minimum Cover

(E,G) → c = DDCDD

(E,Sab) → c = vertex cover

c,

,,

,,

,,

ss E

��

��

��

ss

G&%'$

ll

ll

ll

l

ss

G

QQ

QQ

QQ

Q

ss E

&%'$

ll

ll

ll

l

ssSab

QQ

QQ

QQ

Q

ss

E &%'$

,,

,,

,,

,

ss

E

��

��

��

ssSa′b′

&%'$

s

sG

G = DDC∅DD penalizes each 1 in vertex positions.

String minimizing∑

i d(c, si) ↔ minimum cover

22

Page 55: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Tree Alignment (given phylogeny)

Star Alignment not a good model of evolution.

p1s

p2 s p3s

s

sJJJJJJJ

s

sAAAAAAAAA

s

s

���������

s

s

��

��

��

��

s

s

CCCCCCCCC

s

ss1 s2 s3 s4

23

Page 56: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Tree Alignment (given phylogeny)

Star Alignment not a good model of evolution.

Input: k strings s1 . . . sk

phylogeny T

Output: strings p1 . . . pk−1

min∑

(a,b)∈E(T )

d(a, b)

?

? ?

s

sJJJJJJJ

s

sAAAAAAAAA

s

s

���������

s

s

��

��

��

��

s

s

CCCCCCCCC

s

ss1 s2 s3 s4

23

Page 57: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Tree Alignment (given phylogeny)

Star Alignment not a good model of evolution.

Input: k strings s1 . . . sk

phylogeny T

Output: strings p1 . . . pk−1

min∑

(a,b)∈E(T )

d(a, b)

?

? ?

s

sJJJJJJJ

s

sAAAAAAAAA

s

s

���������

s

s

��

��

��

��

s

s

CCCCCCCCC

s

ss1 s2 s3 s4

Symbol distance metric =⇒ Pairwise alignment distance metric

Tree Alignment is a special case of Steiner Tree in a metric space.

23

Page 58: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Construction Idea

Same idea as for Star Alignment.

��

��

��

��

��

��

��

��

s

sG

@@

@@

@@

@@

@@

@@

@@

@

s

sSij

���

ssE

��

��

ss

E

@@@

s sG

��

��

ss

E

@@@

s sG

r basecomp.

@@

@@

@@

@@

@@

@@

@@

@

s

sSi′j′

���

ssE

��

��

ss

E

@@@

s sG

��

��

ss

E

@@@

s sG

r basecomp.

��

��

m branches@@

@@

@

ssE

���s sG @

@@

@@

ssE

���s sG

r base comp. inbetweeneach branch

24

Page 59: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Overview and Open problems

Problem Here ApproxSP-score all metrics 2-approx [GPBL]

Star Alignment all metrics 2-approx

Tree Alignment all metrics PTAS [WJGL]

(given phylogeny)

25

Page 60: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Overview and Open problems

Problem Here ApproxSP-score all metrics 2-approx [GPBL]

Star Alignment all metrics 2-approx

Tree Alignment all metrics PTAS [WJGL]

(given phylogeny)

Consensus Patterns NP-hard PTAS [LMW]

Substring Parsimony NP-hard PTAS [≈ WJGL]

25

Page 61: Did you know that Multiple Alignment is NP-hard?bchor/CG05/HardnessElias.pdf · Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

Acknowledgments

My advisor

Prof. Jens Lagergren

Prof. Benny for hosting me

Thanks!

26