DNA computing Solving Optimization problems on a DNA computer Ka-Lok Ng Dept. of Bioinformatics Taichung Healthcare and Management University


DNA computing

Solving Optimization problems on a DNA computer

Ka-Lok Ng

Dept. of Bioinformatics

Taichung Healthcare and Management University


Content

1. Why consider DNA computing?

2. Basic molecular biology & Basic DNA operations

3. Molecular Computation - Solved problems

3.1 Hamiltonian Path Problem

3.2 Boolean formula

3.3 Integer knapsack problem

4. Limitations and Errors

5. Prospects


Why consider DNA computing?

Essences of computation

1. Massive parallelism

A test tube can contain $10^{22}$ DNA strands, and each reaction takes place independently.

2. Number of operations per second

Silicon-based computers are much better (~ 10 operations/sec); DNA computing needs human intervention.

3. Extremely large associative memory

Memory density: DNA ~ 1 bit/nm$^3$ >> video tape ~ 1 bit per $10^{12}$ nm$^3$

Human synapses number ~ $10^{14}$, each storing a few bits.

Associative memory: retrieval by matching a sub-sequence. Store the strands

000000….
110100….
000111….

The input sequence *1*1…… retrieves and reads the 2nd strand.
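The associative lookup above can be mimicked in software. The following is a minimal sketch, assuming that `*` is a wildcard position and that the pattern is matched from the start of each stored strand; the function names `matches` and `retrieve` are hypothetical.

```python
def matches(pattern, strand):
    """True if the strand agrees with the pattern at every non-wildcard position."""
    # zip stops at the shorter string, so the pattern acts as a prefix query.
    return all(p == "*" or p == s for p, s in zip(pattern, strand))


def retrieve(pattern, store):
    """Return every stored strand that matches the query pattern."""
    return [strand for strand in store if matches(pattern, strand)]


store = ["000000", "110100", "000111"]
print(retrieve("*1*1", store))  # ['110100']
```

In a test tube all strands are probed at once by hybridization; the list comprehension here visits them one at a time.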


Basic Molecular Biology

The DNA double helix

5' A G T C C ……… 3'
3' T C A G G ……… 5'

Purines (double-ring structure): adenine (A), guanine (G)

Pyrimidines (single-ring structure): thymine (T), cytosine (C)

Watson-Crick complements: the pairs (A,T) and (C,G) attract each other pairwise through hydrogen bonding.
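As an illustration of the pairing rule, here is a small sketch that computes the complementary strand. The names `COMPLEMENT`, `complement` and `reverse_complement` are hypothetical helpers, not part of the original slides.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Base-by-base Watson-Crick complement; the result runs 3'->5' if the input runs 5'->3'."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand):
    """The same complementary strand, rewritten in the usual 5'->3' direction."""
    return complement(strand)[::-1]


print(complement("AGTCC"))         # TCAGG
print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```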


Basic DNA Operations

1. DNA synthesiser: makes arbitrary DNA strands; time ~ hours

Notation: ATGC = 5'-ATGC-3'

$(ATGC)^C$ = 3'-TACG-5'

2. Hybridization (annealing) by hydrogen bonding; time ~ 30 sec. It is a second-order kinetic reaction.

3. Denaturation: heating until even the longest strands become unstable, dsDNA → ssDNA

4. Ligation: strands $x$ and $y$, held next to each other by hybridizing to the complementary strand $x^C y^C$, are joined into the single strand $xy$ by the ligase enzyme.


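5. Polymerase extension

The polymerase enzyme attaches to the 3' end of a primer sequence annealed to a longer strand $x$ and constructs the complement $x^C$ of the longer sequence.

[Figure: a primer annealed to the strand x is extended from its 3' end by the polymerase enzyme to form x^C.]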


6. Cut: Type II restriction enzymes (endonucleases) cut an ssDNA or dsDNA strand at a specific sub-sequence, usually 4 to 8 nucleotides long.

Nomenclature   Specific site    Expected freq. (1 site per N bp)   Ends after cutting
EcoRI          5'-GAATTC-3'     $4^6$ = 4096                       5'-protruding
HaeIII         5'-GGCC-3'       $4^4$ = 256                        Blunt
PstI           5'-CTGCAG-3'     $4^6$ = 4096                       3'-protruding

Double-stranded recognition sites:

HaeIII        PstI            EcoRI
5'-GGCC-3'    5'-CTGCAG-3'    5'-GAATTC-3'
3'-CCGG-5'    3'-GACGTC-5'    3'-CTTAAG-5'
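The expected frequencies above follow from the site length: in a uniformly random sequence (an assumption), a given k-base site is expected about once every $4^k$ bases. The sketch below computes that figure and locates the sites in a made-up example sequence; `SITES`, `expected_spacing`, `find_sites` and the sequence itself are illustrative.

```python
SITES = {"EcoRI": "GAATTC", "HaeIII": "GGCC", "PstI": "CTGCAG"}


def expected_spacing(site):
    """Expected number of bases per occurrence in a uniformly random sequence."""
    return 4 ** len(site)


def find_sites(seq, site):
    """0-based start positions of every occurrence of the recognition site."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


seq = "AAGAATTCGGCCTTCTGCAGGGCC"  # hypothetical strand, 5'->3'
for name, site in SITES.items():
    print(name, expected_spacing(site), find_sites(seq, site))
# EcoRI 4096 [2]
# HaeIII 256 [8, 20]
# PstI 4096 [14]
```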


7. Merge: combine two or more test tubes of DNA solution

8. Separation by length: gel electrophoresis

Agarose gel electrophoresis (AGE): long sequences, 300 ~ 50,000 bp, t ~ 5 hours

Polyacrylamide gel electrophoresis (PAGE): short sequences, 1 ~ 1000 bp, t ~ 1 hour

[Figure: a gel between buffer reservoirs at the negative (-) and positive (+) electrodes; the shortest DNA strands travel farthest toward the + electrode.]


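High Level Manipulations

1. Polymerase Chain Reaction (PCR): amplification

Developed by Kary Mullis (Nobel Prize in Chemistry, 1993).

Prepare primers for the two ends of the template region $x\,y\,z$.

[Figure: a double-stranded template with strands x y z (5' to 3') and x^C y^C z^C (3' to 5'), with primers annealed at the ends of the region to be amplified.]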


PCR cycle: polymerase extension → dsDNA → melting → polymerase extension → ……

Repeating these two processes (extension and melting) amplifies the template.

[Figure: after melting, a primer anneals to each separated strand and the polymerase extends it along the template.]
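Because each extension-and-melting cycle can at best double the number of template copies, the amplification is exponential in the number of cycles. A minimal sketch of that arithmetic follows; the `efficiency` parameter (fraction of strands copied per cycle) is an illustrative assumption, not a figure from the slides.

```python
def copies_after(cycles, initial=1, efficiency=1.0):
    """Copies after the given number of PCR cycles.

    efficiency = 1.0 models ideal doubling per cycle; lower values model
    cycles in which only a fraction of the strands is copied.
    """
    return initial * (1 + efficiency) ** cycles


print(copies_after(10))  # 1024.0
print(copies_after(30))  # 1073741824.0
```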


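2. Separation by sub-sequence: by magnetic bead (affinity purification)

A primer $s^C$ with an attached bead anneals to the strands that contain the short sequence $s$; a magnet then pulls those strands out of the solution.

3. Append

3.1 Polymerase

3.2 Ligation

[Figure: strand y appended to strand x, either by polymerase extension along a template or by ligation on the complementary strand x^C y^C.]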


4. Mark: for separation, or to operate selectively

4.1 Appending a tag sequence to the ssDNA

4.2 Methylation or (de)phosphorylation of the 5' end, carried out by specific enzymes; it can stop some restriction enzymes from cutting at the site

4.3 Forming dsDNA through hybridization or the action of polymerase

5. Unmark: removes the mark on the strand


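Molecular Computation - Solved problems

3.1 Directed Hamiltonian Path Problem (DHPP)

Adleman, Science, 266, 1021 (1994)

Problem: given 7 cities, is there a path that visits every city exactly once?

[Figure: directed graph on the 7 vertices 0 to 6.]

A possible solution: 0 → 1 → 2 → 3 → 4 → 5 → 6

Algorithm (DHPP)

1. For each vertex V and edge E, create a 20-mer DNA strand

V: $x_0 y_0$, $x_1 y_1$, ……, $x_6 y_6$, and $x_0^C$, $y_6^C$

E: $y_0^C x_1^C$, $y_1^C x_2^C$, …, $y_5^C x_6^C$, $y_0^C x_3^C$, $y_0^C x_6^C$, …

In each vertex strand $x_i y_i$, $x_i$ is the "in" half and $y_i$ the "out" half, so the edge strand $y_i^C x_j^C$ joins the out half of vertex i to the in half of vertex j.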


2. Hybridization: the following kinds of DNA strands can form

Does not begin with $x_0$, ends with $y_6$ (1 → 2 → 3 → 4 → 5 → 6)

Begins with $x_0$, does not end with $y_6$ (0 → 1 → 2 → 3 → 4 → 5)

Begins with $x_0$, ends with $y_6$, but does not visit every city once (0 → 3 → 2 → 3 → 4 → 5 → 6); considered to be noise

Begins with $x_0$, ends with $y_6$, and visits every city once (0 → 1 → 2 → 3 → 4 → 5 → 6)


3. Separation: select those dsDNA that start with $x_0$ and end with $y_6$. Amplify: use PCR to amplify this type of DNA strand.

4. Separate out all dsDNA that go through exactly 7 vertices (140-mer) by PAGE, then amplify by PCR. For strand length $N < 150$, the migration distance is $d = a - b \ln N$.

[Figure: plot of d against N, with intercept a.]


5. Separate out all DNA strands that go through all 7 cities by affinity purification: melt the dsDNA strands from the previous step, then extract by affinity purification with the bead-attached probes $y_0^C x_1^C$, …, $y_5^C x_6^C$.

6. Detect whether any DNA strands remain:

Yes → there is a solution of the DHPP

No → there is no solution of the DHPP
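Steps 2 to 6 amount to a generate-and-filter search, which can be sketched in software. This is a simulation under stated assumptions, not the laboratory protocol: the edge set below contains the edges named on the slides plus 3 → 2, which the noise path implies (the full graph is elided in the source), and `random_walks` and `solve` are hypothetical names.

```python
N = 7
# Assumed edge set: 0->1 ... 5->6, 0->3 and 0->6 from the slides, plus 3->2 (implied).
EDGES = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (0, 3), (0, 6), (3, 2)}


def random_walks(max_len):
    """Step 2: every walk of 1..max_len vertices that follows the edges."""
    walks = [(v,) for v in range(N)]
    result = list(walks)
    for _ in range(max_len - 1):
        walks = [w + (b,) for w in walks for (a, b) in EDGES if a == w[-1]]
        result.extend(walks)
    return result


def solve(max_len=9):
    tube = random_walks(max_len)                               # step 2: hybridization
    tube = [w for w in tube if w[0] == 0 and w[-1] == N - 1]   # step 3: starts at 0, ends at 6
    tube = [w for w in tube if len(w) == N]                    # step 4: exactly 7 vertices
    for v in range(N):                                         # step 5: one extraction per city
        tube = [w for w in tube if v in w]
    return tube                                                # step 6: anything left?


print(solve())  # [(0, 1, 2, 3, 4, 5, 6)]
```

The noise walk 0 → 3 → 2 → 3 → 4 → 5 → 6 survives steps 3 and 4 and is only removed in step 5, when the extraction for city 1 finds no match.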


Step                               Time
Create DNA strands
Hybridization                      ~ 30 sec
PCR                                ~ 2 hrs
Gel electrophoresis                ~ 5 hrs (AGE), ~ 1.2 hrs (PAGE)
Affinity purification, 7 times     ~ 1 hr
Detect                             ~ sec
Total                              ~ 7 days !!


Molecular Computation - Solved problems

Boolean formula

Boolean formula, $B$

In particular, consider the Conjunctive Normal Form (CNF)

$B = C_1 \wedge C_2 \wedge C_3 \wedge \dots \wedge C_m$

where $C_k = x_1 \vee x_2 \vee x_3' \vee \dots$

$C$ is called a clause, $x$ is called a literal, $\wedge$ is the logical AND, $\vee$ is the logical OR, and $x'$ is the negation of $x$.

$B = (x_1 \vee x_2) \wedge (x_1 \vee x_2 \vee x_3') \wedge \dots \wedge C_m$

Satisfiability problem ($B$ = True)

Determine an assignment of the logical variables $(x_1, x_2, x_3, \dots)$ such that $B$ = True.


Algorithm (Boolean Formula)

Example: $B = (x_1 \vee x_2) \wedge (x_1' \vee x_2')$

x1   x2   x1∨x2   B   x1'   x2'   x1'∨x2'
0    0    0       0   1     1     1
0    1    1       1   1     0     1
1    0    1       1   0     1     1
1    1    1       0   0     0     0
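The truth table can be checked by brute-force enumeration. The short sketch below evaluates the example formula for every assignment and lists the satisfying ones; the function name `B` mirrors the formula and `satisfying` is an illustrative name.

```python
from itertools import product


def B(x1, x2):
    """Example formula (x1 OR x2) AND (NOT x1 OR NOT x2)."""
    return bool((x1 or x2) and ((not x1) or (not x2)))


satisfying = [(x1, x2) for x1, x2 in product([0, 1], repeat=2) if B(x1, x2)]
print(satisfying)  # [(0, 1), (1, 0)]
```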


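3.2 Boolean Formula

Lipton, Science, 268, 542 (1995)

Encode an n-bit binary number by a graph, $G_n$

[Figure: graph G_n with vertices a_1, a_2, …, a_{n+1}; between a_i and a_{i+1} there is an upper branch through x_i and a lower branch through x_i'.]

Notation: $x$ = 1 (True), $x'$ = 0 (False); vertices $a_i$; edges $E_{a_i x_i}$, $E_{a_i x_i'}$, $E_{x_i a_{i+1}}$, $E_{x_i' a_{i+1}}$

The path $a_1 x_1 a_2 x_2' a_3$ encodes the binary number 10.

In general, the paths through graph $G_n$ represent $\{0,1\}^n$.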
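The correspondence between paths in $G_n$ and n-bit numbers can be made concrete with a small sketch. The vertex labels are written as plain strings (`a1`, `x1`, `x2'`); `path_for` and `bits_for` are hypothetical helper names.

```python
def path_for(bits):
    """Path through G_n that encodes the given bit string, e.g. '10'."""
    path = []
    for i, bit in enumerate(bits, start=1):
        path.append(f"a{i}")
        path.append(f"x{i}" if bit == "1" else f"x{i}'")  # x = 1, x' = 0
    path.append(f"a{len(bits) + 1}")
    return path


def bits_for(path):
    """Bit string encoded by a path: read off the x / x' vertices in order."""
    return "".join("0" if v.endswith("'") else "1" for v in path if v.startswith("x"))


print(path_for("10"))                             # ['a1', 'x1', 'a2', "x2'", 'a3']
print(bits_for(["a1", "x1", "a2", "x2'", "a3"]))  # 10
```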


1. Create DNA strands to encode vertices and edges

Vertices (3n+1 strands):

$a_1$: $p_{a_1} q_{a_1}$
…
$a_{n+1}$: $p_{a_{n+1}} q_{a_{n+1}}$
$x_1$: $p_{x_1} q_{x_1}$
$x_1'$: $p_{x_1'} q_{x_1'}$

Edges (4n strands):

$E_{a_1 x_1}$: $q_{a_1}^C p_{x_1}^C$
$E_{a_1 x_1'}$: $q_{a_1}^C p_{x_1'}^C$
$E_{x_1 a_2}$: $q_{x_1}^C p_{a_2}^C$
$E_{x_1' a_2}$: $q_{x_1'}^C p_{a_2}^C$

[Figure: each strand is drawn with its 5' and 3' ends labeled.]


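2. Hybridization

A path V-E-V-E-V……V denotes an n-bit binary number.

Example: the path $a_1 x_1 a_2 x_2' a_3$ denotes the 2-bit binary number 10.

Vertex strands along the path: $p_{a_1} q_{a_1}$, $p_{x_1} q_{x_1}$, $p_{a_2} q_{a_2}$, $p_{x_2'} q_{x_2'}$, $p_{a_3} q_{a_3}$

Edge strands bridging consecutive vertices: $q_{a_1}^C p_{x_1}^C$, $q_{x_1}^C p_{a_2}^C$, $q_{a_2}^C p_{x_2'}^C$, $q_{x_2'}^C p_{a_3}^C$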


3. Extraction

Define E(t,i,a) as the operation that extracts from test tube t the strands whose i-th position has the Boolean value a = 0 or 1.

OR is done by using multiple tubes.

AND is done by repeated extraction.


Test tube   Operation                              Values present
t0          Create DNA strands                     00, 01, 10, 11
t1          E(t0,1,1)                              10, 11
t1'         Remainder of t0 after extracting t1    00, 01
t2          E(t1',2,1)                             01
t3          Merge, t1 ∪ t2                         10, 11, 01 (x1 ∨ x2 = T)
t4          E(t3,1,0) (need to remove 11)          01
t4'         Remainder of t3 after extracting t4    10, 11
t5          E(t4',2,0)                             10
t6          Merge, t4 ∪ t5                         01, 10
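The tube-by-tube trace above can be reproduced with lists standing in for test tubes. In this sketch `E` returns both the extracted strands and the remainder, which is an illustrative convention rather than the notation of the original paper.

```python
def E(tube, i, a):
    """Split a tube on bit i (1-based): (strands with bit i == a, the remainder)."""
    extracted = [s for s in tube if s[i - 1] == str(a)]
    remainder = [s for s in tube if s[i - 1] != str(a)]
    return extracted, remainder


t0 = ["00", "01", "10", "11"]
t1, t1_rest = E(t0, 1, 1)   # x1 = 1
t2, _ = E(t1_rest, 2, 1)    # x1 = 0 and x2 = 1
t3 = t1 + t2                # merge: strands satisfying (x1 OR x2)
t4, t4_rest = E(t3, 1, 0)   # x1' = 1
t5, _ = E(t4_rest, 2, 0)    # x1 = 1 and x2' = 1
t6 = t4 + t5                # merge: strands satisfying both clauses

print(t3)  # ['10', '11', '01']
print(t6)  # ['01', '10']
```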


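Algorithm (Boolean CNF Formula)

1. Create DNA strands to encode all n-bit binary numbers

2. Hybridization

3. Extraction

Let $t_k$ be the test tube of strands that satisfy $C_1 \wedge C_2 \wedge C_3 \wedge \dots \wedge C_k$, and let $C_{k+1} = x_a \vee x_{a+1} \vee \dots \vee x_m$ (where each $x$ is either 0 or 1). For simplicity, consider $C_{k+1} = x_a \vee x_{a+1}$. Either of two extraction sequences then gives the strands that satisfy $C_{k+1}$:

Extracting $x_a = 1$ first: E($t_k$, a, 1) gives the tube $T_a^1$ ($x_a$ = 1), with remainder $T_a^{1R}$; E($T_a^{1R}$, a+1, 1) gives the tube $T_{a+1}$ ($x_{a+1}$ = 1); $T_a^1 \cup T_{a+1}$ satisfies $C_{k+1}$.

Extracting $x_a = 0$ first: E($t_k$, a, 0) gives the tube $T_a^0$ ($x_a$ = 0), with remainder $T_a^{0R}$; E($T_a^0$, a+1, 1) gives the tube $T_{a+1}$ ($x_{a+1}$ = 1); $T_a^{0R} \cup T_{a+1}$ satisfies $C_{k+1}$.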
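The clause-by-clause procedure generalizes to any CNF formula: repeated extraction implements AND across clauses, and merging the tubes extracted for each literal implements OR within a clause. The sketch below is a software analogue under that reading; representing a literal as a pair (variable index, wanted bit) and the names `extract` and `satisfy` are assumptions made for illustration.

```python
from itertools import product


def extract(tube, i, a):
    """Split a tube on bit i (1-based): (strands with bit i == a, the remainder)."""
    keep = [s for s in tube if s[i - 1] == a]
    rest = [s for s in tube if s[i - 1] != a]
    return keep, rest


def satisfy(n, clauses):
    """Return the n-bit strings satisfying every clause.

    Each clause is a list of literals (i, a): variable i must equal bit a,
    so (1, "1") is x1 and (1, "0") is x1'.
    """
    tube = ["".join(bits) for bits in product("01", repeat=n)]  # t_0: all 2^n strands
    for clause in clauses:                # AND: one round of extractions per clause
        satisfied, rest = [], tube
        for i, a in clause:               # OR: merge what each literal extracts
            got, rest = extract(rest, i, a)
            satisfied += got
        tube = satisfied                  # t_{k+1}
    return tube


# B = (x1 OR x2) AND (x1' OR x2')
print(satisfy(2, [[(1, "1"), (2, "1")], [(1, "0"), (2, "0")]]))  # ['01', '10']
```

Each literal is applied to the remainder of the previous extraction, so no strand is copied into the merged tube twice.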


Molecular Computation - Solved problems

Integer knapsack problem

Given a set of integers $a_i$ and an integer $A$, does there exist a subset $S \subseteq \{1, \dots, n\}$ such that $\sum_{i \in S} a_i \le A$?

1. To solve this problem, make use of the synthesis, annealing and merging operations.

2. Prepare a starter, S, with one end blunt and blocked by 5'-biotinylation (B) and the other end sticky.

3. Use DNA double strands to encode the integers $a_1, \dots, a_n$, with length proportional to the magnitude and both ends sticky.

[Figure: starter S with the biotin block B at its blunt end; the double strand for a_1 has sticky ends x and x^C.]


4. Generate all $2^n$ possible combinations by concatenation of the DNA strands.

5. The final DNA solution consists of $2^n$ different DNA double strands; the final answer is obtained by checking, by agarose gel electrophoresis, whether the solution contains strands with length equal to $A$.
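Steps 4 and 5 can be sketched in software as enumerating the total length of every possible concatenation and then looking for a given length, which is what the gel readout does. The integers below are made-up example values, the length of the starter is ignored, and `strand_lengths` is a hypothetical name.

```python
from itertools import combinations


def strand_lengths(a):
    """Total encoded length of every concatenation, one per subset of the integers."""
    return {sum(c) for r in range(len(a) + 1) for c in combinations(a, r)}


a = [3, 5, 9]                     # hypothetical integers a_1..a_3
lengths = strand_lengths(a)
print(sorted(lengths))            # [0, 3, 5, 8, 9, 12, 14, 17]
print(8 in lengths, 7 in lengths) # True False
```

The set has at most $2^n$ entries, which is the exponential growth the brute-force approach has to pay for.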


Limitation: this brute-force algorithm has exponential time complexity, $O(2^n)$. Encoding all possible solutions as DNA strands suffers from the exponential growth in the size of the solution space; for instance, a 70-city DHPP will fit in a milliliter of solution ($10^{20}$ DNA strands).

Hence, a parallel computation model is considered.

Dynamic programming approach

1. Parallelism: dynamic programming gives a parallel algorithm because the principle of optimality applies; hence a DNA computer might be useful for solving large instances of problems.

2. For the integer knapsack problem, the worst-case time complexity is $O(\min(2^n, nA))$ [Ref. 1].


Given integer weights $w_1, w_2, \dots, w_n$ and a capacity $W_{max}$, with the corresponding integer profits $p_1, p_2, \dots, p_n$, find a subset $S \subseteq \{1, 2, \dots, n\}$ that satisfies

$\sum_{i \in S} w_i \le W_{max}$ and maximizes $\sum_{i \in S} p_i$.

Dynamic programming solution

Let $f_i(x)$ be the optimal solution to the integer knapsack problem restricted to the first $i$ items,

$f_i(x) = \max \{ f_{i-1}(x),\ p_i + f_{i-1}(x - w_i) \}$

where $x$ is the capacity remaining, $f_0(x) = 0$ for $x \ge 0$, and $f_i(x) = -\infty$ for $x < 0$.

Notice that $f_i(x)$ is an ascending step function: there are points $0 = x_1 < x_2 < \dots < x_k$ with $f_i(x_1) < f_i(x_2) < \dots < f_i(x_k)$; $f_i(x) = -\infty$ for $x < x_1$; $f_i(x) = f_i(x_k)$ for $x \ge x_k$; and $f_i(x) = f_i(x_j)$ for $x_j \le x < x_{j+1}$. To solve this problem, we make use of the method suggested by Horowitz et al. [Ref. 2] to compute $f_i(x_j)$ for $1 \le j \le k$.

Let the ordered set $S^i$ of pairs $(P, W)$ represent $f_i(x)$, where $P = f_i(x_j)$, $W = x_j$ and $S^0 = \{(0,0)\}$, and let $S_1^i = \{ (P, W) \mid (P - p_{i+1}, W - w_{i+1}) \in S^i \}$.


$S^{i+1}$ can be computed from $S^i$ by first computing

$S_1^i = \{ (P, W) \mid (P - p_{i+1}, W - w_{i+1}) \in S^i \}$

and then merging, $S^{i+1} = S^i \cup S_1^i$. If $S^i$ contains $(P_j, W_j)$ and $(P_k, W_k)$ with $P_j \le P_k$ and $W_j \ge W_k$, then the pair $(P_j, W_j)$ can be discarded from $S^i$; this condition is known as the dominance rule.

For example, consider the case $n = 3$, $(w_1, w_2, w_3) = (2,3,4)$, $(p_1, p_2, p_3) = (1,2,5)$ and $W_{max} = 6$. For this case, we have

$S^0 = \{(0,0)\}$; $S_1^0 = \{(1,2)\}$

$S^1 = \{(0,0), (1,2)\}$; $S_1^1 = \{(2,3), (3,5)\}$

$S^2 = \{(0,0), (1,2), (2,3), (3,5)\}$; $S_1^2 = \{(5,4), (6,6), (7,7), (8,9)\}$

$S^3 = \{(0,0), (1,2), (2,3), (3,5), (5,4), (6,6), (7,7), (8,9)\}$

The pair (3,5) is discarded because of the dominance rule: (5,4) gives a higher profit at a lower weight.
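The pair-set recurrence can be sketched as follows. Note two departures from the listing above, both assumptions of this sketch: dominated pairs are pruned after every merge rather than only at the end, and pairs whose weight exceeds $W_{max}$ are dropped as infeasible. The name `knapsack_pairs` is hypothetical.

```python
def knapsack_pairs(weights, profits, w_max):
    """Return the final list of undominated, feasible (P, W) pairs, ordered by weight."""
    s = [(0, 0)]                                    # S^0
    for w, p in zip(weights, profits):
        shifted = [(P + p, W + w) for P, W in s]    # S_1^i: add item i+1 to every pair
        # Merge, then scan by increasing weight (higher profit first on ties).
        merged = sorted(set(s) | set(shifted), key=lambda t: (t[1], -t[0]))
        s, best = [], -1
        for P, W in merged:
            # Keep a pair only if it fits and beats every lighter pair's profit.
            if W <= w_max and P > best:
                s.append((P, W))
                best = P
    return s


pairs = knapsack_pairs([2, 3, 4], [1, 2, 5], 6)
print(pairs)      # [(0, 0), (1, 2), (2, 3), (5, 4), (6, 6)]
print(pairs[-1])  # (6, 6): best profit 6 at weight 6
```

For the example, (3,5) is removed by the dominance rule and (7,7) and (8,9) by the capacity limit, leaving (6,6) as the optimum.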


Implementation of dynamic programming

Consider the case $n = 3$, $(w_1, w_2, w_3) = (2,3,4)$, $(p_1, p_2, p_3) = (1,2,5)$ and $W_{max} = 6$.

DNA operation               Test tubes, TP and TW
(start)                     $S^0 = \{(0,0)\}$
Copy                        $S^0 = \{(0,0)\}$
Addition: (p,w) = (1,2)     $S_1^0 = \{(1,2)\}$
Merge                       $S^1 = S^0 \cup S_1^0 = \{(0,0), (1,2)\}$
Copy                        $S^1 = \{(0,0), (1,2)\}$
Addition: (p,w) = (2,3)     $S_1^1 = \{(2,3), (3,5)\}$
Merge                       $S^2 = S^1 \cup S_1^1 = \{(0,0), (1,2), (2,3), (3,5)\}$
Copy                        $S^2 = \{(0,0), (1,2), (2,3), (3,5)\}$
Addition: (p,w) = (5,4)     $S_1^2 = \{(5,4), (6,6), (7,7), (8,9)\}$
Merge                       $S^3 = S^2 \cup S_1^2 = \{(0,0), (1,2), (2,3), (3,5), (5,4), (6,6), (7,7), (8,9)\}$


Difficulties

1. It is not known how to communicate between DNA strands. This operation is required in order to match $P_k$ and $W_k$.

2. It is not known how to compare numbers between DNA strands. This operation is required in order to test the dominance rule.


Limitations and Errors

1. DNA synthesis: ~ 90% efficiency

2. Long strands of DNA decay quickly; 10,000 bases is the maximum length that can be kept in vitro without significant breakage.

3. Extraction errors

A good path is lost during extraction.

A bad path is taken as if it were a good one.

4. Undesirable hybridization

5. A sequence $s$ could anneal with a sequence that is only similar to $s^C$.


Prospects

1. There appears to be little theoretical difficulty in creating a functional DNA computer.

2. Success depends on finding killer applications uniquely suited to computation by DNA.

3. Improvements are needed in reducing errors and operation costs.