DNA computing
Solving Optimization problems on a DNA computer
Ka-Lok Ng
Dept. of Bioinformatics
Taichung Healthcare and Management University
Content
1. Why consider DNA computing?
2. Basic molecular biology & Basic DNA operations
3. Molecular Computation - Solved problems
3.1 Hamiltonian Path Problem
3.2 Boolean formula
3.3 Integer knapsack problem
4. Limitations and Errors
5. Prospective
Why consider DNA computing?
Essences of computation
1. Massive parallelism
   A test tube can contain 10^22 DNA strands, and each reaction takes place independently.
2. Number of operations/sec
   A silicon-based computer is much better, ~ 10 operations/sec. DNA computing needs human intervention.
3. Extremely large associative memory
   Memory density ~ 1 bit/nm^3 >> video tape ~ 1 bit/10^12 nm^3
   Human synapses ~ 10^14, each storing a few bits.
   Associative memory: retrieve a strand by matching a sub-sequence
   Stored strands: 000000…, 110100…, 000111…
   Input seq. *1*1… → retrieve & read the 2nd strand
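The associative lookup above can be imitated in software. This is only a sketch: the stored strands are truncated to the six bits shown on the slide, `*` is read as "any bit", and the names `matches` and `retrieve` are made up for this illustration.

```python
def matches(pattern, strand):
    """True if every non-wildcard position of the pattern agrees with the strand."""
    return len(strand) >= len(pattern) and all(
        p == "*" or p == s for p, s in zip(pattern, strand)
    )

def retrieve(pattern, memory):
    """Return every stored strand that matches the pattern (content-addressed lookup)."""
    return [s for s in memory if matches(pattern, s)]

memory = ["000000", "110100", "000111"]  # first six bits of the slide's strands
print(retrieve("*1*1", memory))          # ['110100'], i.e. the 2nd strand
```

In a test tube the comparison with all stored strands happens at once through hybridization; the list comprehension here visits them one by one.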
Basic Molecular Biology
5’  A G T C C …  3’
3’  T C A G G …  5’
The DNA Double Helix
Purines (double-ring structure): Adenine, A; Guanine, G
Pyrimidines (single-ring structure): Thymine, T; Cytosine, C
Watson-Crick complements: (A,T) and (C,G), pairwise attraction by hydrogen bonding
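A short sketch of the Watson-Crick pairing rule in code. The function names are hypothetical; `complement` pairs base by base, so a strand written 5’ to 3’ yields its partner written 3’ to 5’, as in the double helix drawn above.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base Watson-Crick partner (same left-to-right order as the input)."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand):
    """The partner strand rewritten in the conventional 5' to 3' direction."""
    return complement(strand)[::-1]

print(complement("AGTCC"))          # TCAGG
print(reverse_complement("AGTCC"))  # GGACT
```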
Basic DNA Operations
1. DNA synthesiser: makes arbitrary DNA strands, time ~ hours
   Notation : ATGC = 5’-ATGC-3’
              (ATGC)^C = 3’-TACG-5’
2. Hybridization (annealing): by hydrogen bonding, time ~ 30 sec; it is a 2nd-order kinetic reaction
3. Denature: by heating until the longest strands become unstable, dsDNA → ssDNA
4. Ligation: the ligase enzyme joins strands x and y, held side by side on the complementary strand x^C y^C, into a single strand xy
5. Polymerase extension: the polymerase enzyme attaches to the 3’ end of the primer sequence and constructs the complement x^C of the longer sequence x
   [Figure: a primer annealed to the longer strand x; the polymerase extends it into x^C.]
6. Cut: Type II restriction enzymes (endonucleases) cut an ssDNA or dsDNA strand at a specific sub-sequence, usually 4 to 8 nucleotides long.

Nomenclature   Specific site    Double-stranded site   Expected freq.   Cut ends
EcoRI          5’-GAATTC-3’     GAATTC / CTTAAG        4^6 = 4096       5’-protruding
HaeIII         5’-GGCC-3’       GGCC / CCGG            4^4 = 256        blunt
PstI           5’-CTGCAG-3’     CTGCAG / GACGTC        4^6 = 4096       3’-protruding
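The "expected frequency" column follows from counting: if the four bases are assumed equally likely and independent, a given n-base site turns up about once every 4^n positions. The sketch below computes that figure and locates sites in a sequence; the helper names and the test sequence are illustrative only.

```python
SITES = {"EcoRI": "GAATTC", "HaeIII": "GGCC", "PstI": "CTGCAG"}

def expected_spacing(site):
    """Average number of bases between occurrences, assuming uniform random bases."""
    return 4 ** len(site)

def find_sites(seq, site):
    """0-based start positions of every (possibly overlapping) occurrence of site in seq."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]

for name, site in SITES.items():
    print(name, site, expected_spacing(site))

print(find_sites("AAGGCCTTGGCC", SITES["HaeIII"]))  # [2, 8]
```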
7. Merge: combine two or more test tubes of DNA solution
8. Separation by length: gel electrophoresis
   Agarose Gel Electrophoresis (AGE): long seq., 300 ~ 50,000 bp, t ~ 5 hours
   Polyacrylamide Gel Electrophoresis (PAGE): short seq., 1 ~ 1000 bp, t ~ 1 hour
   [Figure: gel in buffer between the − and + electrodes, with the position of the shortest DNA strands marked.]
High Level Manipulations
1. Polymerase Chain Reaction (PCR): amplification
   Developed by Kary Mullis (Nobel Prize in Chemistry, 1993)
   [Figure: the template dsDNA x y z / x^C y^C z^C and the prepared primers.]
   Polymerase extension → dsDNA → melting → polymerase extension → ……
   Repeating the above two processes gives amplification.
   [Figure: polymerase extending the primers along both strands of x y z.]
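Because each melt-and-extend round can at best copy every strand once, the number of copies grows geometrically. The sketch below makes that arithmetic explicit; the `efficiency` parameter (fraction of strands copied per cycle) is an assumption added for illustration and does not come from the slides.

```python
def pcr_copies(initial, cycles, efficiency=1.0):
    """Copies after a number of PCR cycles; efficiency=1.0 is ideal doubling."""
    return initial * (1 + efficiency) ** cycles

print(pcr_copies(1, 10))        # 1024.0 with ideal doubling
print(pcr_copies(1, 10, 0.9))   # fewer copies when only 90% of strands are extended
```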
2. Separation by sub-sequence: by magnetic bead (affinity purification)
   A primer s^C with an attached bead anneals to the strands containing the short sequence s; a magnet then pulls them out.
3. Append
   3.1 Polymerase
   3.2 Ligation
   [Figure: strand y appended to strand x, giving x y, using x^C y^C.]
4. Mark: for separation or to operate selectively
   4.1 appending a tag sequence to the ssDNA
   4.2 methylation or (de)phosphorylation of the 5’ end, carried out by specific enzymes; it can stop some restriction enzymes from cutting at the site
   4.3 forming a dsDNA through hybridization or the action of polymerase
5. Unmark: removes the mark on the strand
Molecular Computation - Solved problems
3.1 Directed Hamiltonian Path Problem (DHPP)
Adleman, Science, 266, 1024 (1994)
Problem : Given 7 cities, is there a path that visits every city exactly once?
[Figure: directed graph on the 7 vertices 0, 1, …, 6.]
A possible solution : 0 1 2 3 4 5 6
Algorithm :
1. For each vertex V and edge E, create a 20-mer DNA strand
   V : x0y0, x1y1, ……, x6y6 and x0^c, y6^c
   E : y0^c x1^c, y1^c x2^c, …., y5^c x6^c, y0^c x3^c, y0^c x6^c, …..
   (in each vertex strand, x is the "in" half and y is the "out" half)
Algorithm (DHPP)
2. Hybridization: DNA strands of the following kinds can form
   - not beginning with x0, ending with y6 ( 1 2 3 4 5 6 )
   - beginning with x0, not ending with y6 ( 0 1 2 3 4 5 )
   - beginning with x0, ending with y6, but not visiting every city once ( 0 3 2 3 4 5 6 ); considered to be noise
   - beginning with x0, ending with y6, and visiting every city once ( 0 1 2 3 4 5 6 )
3. Separation: select those dsDNA that start with x0 and end with y6, then use PCR to amplify this type of DNA strand
4. Separate out all dsDNA that go through exactly 7 vertices (140-mer) by PAGE, then amplify by PCR
   For N < 150, d = a – b ln N
   [Figure: plot of d against N, with a marked on the d axis.]
5. Separate out all DNA strands that go through all 7 cities, by affinity purification
   Melt the dsDNA strands from the previous step, then extract by affinity purification.
   [Figure: probe strands y0^c x1^c, …, y5^c x6^c, each with an attached bead.]
6. Detect whether any DNA strands remain
   Yes → a solution of the DHPP exists
   No → no solution of the DHPP
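The generate-and-filter structure of steps 2 to 6 can be sketched in software. Everything here is a stand-in: the edge list is hypothetical (the slides name only some edges; 3→2 is taken from the noise example), exhaustive enumeration replaces random hybridization, and list filters replace PCR, gel separation and bead extraction.

```python
# Hypothetical edge list: edges named on the slides, plus 3->2 from the noise example.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (0, 3), (0, 6), (3, 2)]

def all_walks(edges, n):
    """Step 2 stand-in: every walk with 1..n vertices that follows the edges."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    frontier = [(v,) for v in range(n)]
    while frontier:
        yield from frontier
        frontier = [w + (b,) for w in frontier if len(w) < n
                    for b in succ.get(w[-1], [])]

def dhpp(edges, n):
    tube = list(all_walks(edges, n))
    tube = [w for w in tube if w[0] == 0 and w[-1] == n - 1]  # step 3: right ends
    tube = [w for w in tube if len(w) == n]                   # step 4: right length
    for city in range(n):                                     # step 5: one city at a time
        tube = [w for w in tube if city in w]
    return tube                                               # step 6: non-empty means "yes"

print(dhpp(EDGES, 7))  # [(0, 1, 2, 3, 4, 5, 6)]
```

The noise walk 0 3 2 3 4 5 6 passes steps 3 and 4 and is only removed in step 5, because it never visits city 1.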
Step                     Time
Create DNA strands
Hybridization            ~ 30 sec
PCR                      ~ 2 hrs
Gel electrophoresis      ~ 5 hrs (AGE), ~ 1.2 hrs (PAGE)
Affinity purification    7 times ~ 1 hr
Detect                   ~ sec
Total                    ~ 7 days !!
Boolean formula
Boolean Formula, B
In particular, consider the Conjunctive Normal Form (CNF)
B = C1 ∧ C2 ∧ C3 ∧ … ∧ Cm
where Ck = x1 ∨ x2 ∨ x3’ ∨ …
C is called a clause, x is called a literal, ∧ is the logical AND, ∨ is the logical OR and x’ is the negation of x.
B = (x1 ∨ x2) ∧ (x1 ∨ x2 ∨ x3’) ∧ … ∧ Cm
Satisfiability Problem ( B = True )
Determine an assignment of the logical variables (x1, x2, x3, …) such that B = T.
Algorithm (Boolean Formula)
Example : B = (x1 ∨ x2) ∧ (x1’ ∨ x2’)

x1   x2   x1 ∨ x2   B   x1’   x2’   x1’ ∨ x2’
0    0    0         0   1     1     1
0    1    1         1   1     0     1
1    0    1         1   0     1     1
1    1    1         0   0     0     0
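The truth table can be checked mechanically. In this sketch a clause is a list of literals, and each literal is a pair (variable index, required value); that encoding and the name `eval_cnf` are choices made for the illustration.

```python
from itertools import product

def eval_cnf(clauses, assignment):
    """AND over clauses of (OR over literals); a literal (i, v) holds when x_i == v."""
    return all(any(assignment[i] == v for i, v in clause) for clause in clauses)

# B = (x1 OR x2) AND (x1' OR x2'), with x1 at index 0 and x2 at index 1
B = [[(0, 1), (1, 1)], [(0, 0), (1, 0)]]

for bits in product((0, 1), repeat=2):
    print(bits, int(eval_cnf(B, bits)))

satisfying = [bits for bits in product((0, 1), repeat=2) if eval_cnf(B, bits)]
print(satisfying)  # [(0, 1), (1, 0)]
```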
3.2 Boolean Formula
Lipton, Science, 268, 542 (1995)
Encode an n-bit binary number by a graph, Gn
        x1         x2                 xn
a1            a2         a3 ………… an        an+1
        x1’        x2’                xn’
Notation : x = 1 True, x’ = 0 False; vertices ai; edges E(ai,xi), E(ai,xi’), E(xi,ai+1), E(xi’,ai+1)
The path a1 x1 a2 x2’ a3 encodes the binary number 10.
In general, the graph Gn represents {0,1}^n.
Algorithm (Boolean Formula)
1. Create DNA strands to encode vertices and edges

Vertex (3n+1 strands)                  Edge (4n strands)
a1      5’-p_a1 q_a1-3’                E(a1,x1)     q_a1^c p_x1^c
…                                      E(a1,x1’)    q_a1^c p_x1’^c
an+1    5’-p_an+1 q_an+1-3’            E(x1,a2)     q_x1^c p_a2^c
x1      5’-p_x1 q_x1-3’                E(x1’,a2)    q_x1’^c p_a2^c
x1’     5’-p_x1’ q_x1’-3’              …

(Each strand carries the end labels 5’ and 3’ in the original figure.)
Algorithm (Boolean Formula)
2. Hybridization
   A path V-E-V-E-V……V denotes an n-bit binary number.
   Example : the path a1 x1 a2 x2’ a3 denotes the 2-bit binary number 10.
   Vertex strands : 5’- p_a1 q_a1 | p_x1 q_x1 | p_a2 q_a2 | p_x2’ q_x2’ | p_a3 q_a3
   Edge strands   : q_a1^c p_x1^c, q_x1^c p_a2^c, q_a2^c p_x2’^c, q_x2’^c p_a3^c
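A small sketch of how the paths of Gn map to binary numbers. The vertex labels are written as plain strings and the function name `paths` is hypothetical; the only rule used is the one stated above, that passing through xi means bit 1 and passing through xi’ means bit 0.

```python
from itertools import product

def paths(n):
    """Yield (vertex_path, bit_string) for every path from a1 to a(n+1) in Gn."""
    for bits in product((1, 0), repeat=n):
        path = []
        for i, b in enumerate(bits, start=1):
            path += [f"a{i}", f"x{i}" if b else f"x{i}'"]
        path.append(f"a{n + 1}")
        yield path, "".join(map(str, bits))

for path, number in paths(2):
    print(" ".join(path), "->", number)
```

For n = 2 this lists four paths, one per 2-bit number, including a1 x1 a2 x2' a3 for 10.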
Algorithm (Boolean Formula)
3. Extraction
   Define E(t,i,a) as the extraction, from test tube t, of the strands whose ith position has the Boolean value a = 0 or 1.
   OR: done by using multiple tubes
   AND: done by repeated extraction
3. Extraction (example)

Test tube   Operation                        Values present
t0          Create DNA strands               00, 01, 10, 11
t1          E(t0,1,1)                        10, 11
t1’         Remainder after extracting t1    00, 01
t2          E(t1’,2,1)                       01
t3          Merge, t1 ∪ t2                   10, 11, 01   (x1 ∨ x2 = T)
t4          E(t3,1,0)                        01           (need to remove 11)
t4’         Remainder after extracting t4    10, 11
t5          E(t4’,2,0)                       10
t6          Merge, t4 ∪ t5                   01, 10
Algorithm (Boolean CNF Formula)
1. Create DNA strands to encode all n-bit binary numbers
2. Hybridization
3. Extraction
Let tk be the test tube whose strands satisfy C1 ∧ C2 ∧ C3 ∧ … ∧ Ck,
and let Ck+1 = xa ∨ xa+1 ∨ … ∨ xm (where each x is either 0 or 1); for simplicity consider Ck+1 = xa ∨ xa+1.
   E(tk,a,1) → strands with xa = 1 → tube T1a, remainder T1aR
   E(T1aR,a+1,1) → strands with xa+1 = 1 → tube Ta+1
   T1a ∪ Ta+1 satisfies Ck+1
Equivalently, extracting on xa = 0 first:
   E(tk,a,0) → strands with xa = 0 → tube T0a, remainder T0aR
   E(T0a,a+1,1) → strands with xa+1 = 1 → tube Ta+1
   T0aR ∪ Ta+1 satisfies Ck+1
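The extract-and-merge procedure can be simulated with lists standing in for test tubes. This is a sketch under assumptions: strands are bit strings, a clause is a list of (position, value) literals, and the function names are invented here. It follows the first variant above, extracting on each literal from the remainder left by the previous one and merging the hits.

```python
from itertools import product

def extract(tube, i, a):
    """E(t, i, a): split a tube into (strands whose ith bit is a, remainder)."""
    hit = [s for s in tube if s[i - 1] == a]
    rest = [s for s in tube if s[i - 1] != a]
    return hit, rest

def satisfy_clause(tube, clause):
    """OR: extract for each literal from what is left over, then merge the hits."""
    kept, rest = [], tube
    for i, a in clause:
        hit, rest = extract(rest, i, a)
        kept += hit
    return kept

def solve(n, clauses):
    tube = ["".join(bits) for bits in product("01", repeat=n)]  # t0: all n-bit numbers
    for clause in clauses:                                      # AND: repeated extraction
        tube = satisfy_clause(tube, clause)
    return tube

# B = (x1 OR x2) AND (x1' OR x2')
print(solve(2, [[(1, "1"), (2, "1")], [(1, "0"), (2, "0")]]))  # ['01', '10']
```

For this two-variable formula the intermediate tubes are 10, 11, 01 after the first clause and 01, 10 after the second.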
Integer knapsack problem
Given a set of integers ai (i = 1, …, n) and an integer A, does there exist a subset
S ⊆ {1,…,n} such that Σ_{i∈S} ai ≦ A ?
1. To solve this problem, make use of the synthesis, annealing and merging operations.
2. Prepare a starter, S: one end of the strand is blunt and blocked by 5’-biotinylation (B), and the other end is sticky.
3. Use DNA double strands to encode the integers a1, …, an, with length proportional to the magnitude and with both ends sticky.
[Figure: the starter S with its biotin block B, and the strand for a1 with sticky ends x and x^C.]
4. Generate all 2^n possible combinations by concatenation of the DNA strands.
5. The final DNA solution consists of 2^n different DNA double strands; the final answer is obtained by checking, with agarose gel electrophoresis, whether the solution contains strands of length equal to A.
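Steps 4 and 5 amount to forming every subset and reading off its total length. A brute-force sketch, with hypothetical integers (2, 3, 4) and target A = 6 chosen only for illustration, and with a dictionary lookup standing in for the gel:

```python
from itertools import combinations

def subset_lengths(a):
    """Step 4 stand-in: map every subset S of {1..n} to the total length of its strand."""
    n = len(a)
    return {S: sum(a[i - 1] for i in S)
            for r in range(n + 1)
            for S in combinations(range(1, n + 1), r)}

def subsets_of_length(a, A):
    """Step 5 stand-in: which concatenations have length exactly A?"""
    return [S for S, total in subset_lengths(a).items() if total == A]

a = [2, 3, 4]                    # illustrative integers, not from the slides
print(len(subset_lengths(a)))    # 8 = 2^3 strands
print(subsets_of_length(a, 6))   # [(1, 3)]
```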
Limitation : This brute-force algorithm has exponential time complexity, O(2^n). Encoding all possible solutions by DNA strands suffers from the exponential growth in the size of the solution space; for instance, a 70-city DHPP will fit in a milliliter of solution (10^20 DNA strands).
Hence, people consider developing a parallel computation model.
Dynamic programming approach
1. Parallelism : the algorithm is parallel because the principle of optimality applies; hence, a DNA computer might be useful for solving large instances of problems.
2. For the integer knapsack problem, the worst-case time complexity is O(minimum(2^n, nA)) [Ref. 1].
Given a set of integer weights w1, w2, ……, wn and a capacity Wmax, with the corresponding integer profits p1, p2, ……, pn, does there exist a subset S ⊆ {1,2,….,n} that satisfies
Σ_{i∈S} wi ≦ Wmax and maximizes Σ_{i∈S} pi ?
Dynamic programming solution
Let fi(x) be the optimal solution to the integer knapsack problem using the first i items,
fi(x) = max { fi-1(x) , pi + fi-1(x - wi) }
where x is the capacity remaining, f0(x) = 0 for x ≧ 0, and fi(x) = −∞ for x < 0.
Notice that fi(x) is an ascending step function, i.e. there are points 0 = x1 < x2 < ….. < xk with fi(x1) < fi(x2) < ….. < fi(xk); fi(x) = −∞ for x < x1; fi(x) = fi(xk) for x ≧ xk; and fi(x) = fi(xj) for xj ≦ x < xj+1. To solve this problem, we make use of the method suggested by Horowitz et al. [Ref. 2] to compute fi(xj) for 1 ≦ j ≦ k.
Let the ordered set S^i of pairs (P, W) represent fi(x), where P = fi(xj), W = xj and S^0 = { (0,0) }, and let
S1^i = { (P, W) | (P - pi+1, W - wi+1) ∈ S^i }.
S^(i+1) can be computed from S^i by first computing
S1^i = { (P, W) | (P - pi+1, W - wi+1) ∈ S^i }
and then merging, S^(i+1) = S^i ∪ S1^i. If S^(i+1) contains (Pj,Wj) and (Pk,Wk) with Pj ≦ Pk and Wj ≧ Wk, then the pair (Pj,Wj) can be discarded from S^(i+1); this condition is known as the dominance rule.
For example, consider the case n = 3, (w1, w2, w3) = (2,3,4), (p1, p2, p3) = (1,2,5) and Wmax = 6. For this case, we have
S^0 = {(0,0)};  S1^0 = {(1,2)}
S^1 = {(0,0), (1,2)};  S1^1 = {(2,3), (3,5)}
S^2 = {(0,0), (1,2), (2,3), (3,5)};  S1^2 = {(5,4), (6,6), (7,7), (8,9)}
S^3 = {(0,0), (1,2), (2,3), (3,5), (5,4), (6,6), (7,7), (8,9)}
The pair (3,5) is discarded because of the dominance rule: (5,4) has a higher profit at a lower weight.
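The pair-list computation can be sketched as follows. Pairs are (P, W) as in the text. Two things here are assumptions of the sketch rather than part of the slides: the dominance rule is applied after every merge, and pairs heavier than Wmax are dropped along the way, whereas the listing of S^3 above still shows (7,7) and (8,9).

```python
def knapsack_pairs(weights, profits, w_max):
    """Return the final pair list S^n after merging and purging dominated pairs."""
    S = [(0, 0)]                                        # S^0
    for p, w in zip(profits, weights):
        S1 = [(P + p, W + w) for P, W in S]             # S1^i: add the next item to every pair
        merged = sorted(set(S) | set(S1), key=lambda pw: (pw[1], -pw[0]))
        S, best = [], -1
        for P, W in merged:                             # scan in order of increasing weight
            if W <= w_max and P > best:                 # keep only undominated, feasible pairs
                S.append((P, W))
                best = P
    return S

pairs = knapsack_pairs([2, 3, 4], [1, 2, 5], 6)
print(pairs)      # [(0, 0), (1, 2), (2, 3), (5, 4), (6, 6)]
print(pairs[-1])  # (6, 6): best profit 6 at weight 6
```

The last pair in the list carries the maximum profit, here obtained by taking items 1 and 3.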
Implementation of dynamic programming
Consider the case n = 3, (w1, w2, w3) = (2,3,4), (p1, p2, p3) = (1,2,5) and Wmax = 6.

DNA operation               Test tubes, TP and TW
(start)                     S^0 = {(0,0)}
Copy                        S^0 = {(0,0)}
Addition : (p,w) = (1,2)    S1^0 = {(1,2)}
Merge                       S^1 = S^0 ∪ S1^0 = {(0,0), (1,2)}
Copy                        S^1 = {(0,0), (1,2)}
Addition : (p,w) = (2,3)    S1^1 = {(2,3), (3,5)}
Merge                       S^2 = S^1 ∪ S1^1 = {(0,0), (1,2), (2,3), (3,5)}
Copy                        S^2 = {(0,0), (1,2), (2,3), (3,5)}
Addition : (p,w) = (5,4)    S1^2 = {(5,4), (6,6), (7,7), (8,9)}
Merge                       S^3 = S^2 ∪ S1^2 = {(0,0), (1,2), (2,3), (3,5), (5,4), (6,6), (7,7), (8,9)}
Implementation of dynamic programming
Difficulties
1. It is not known how to communicate between DNA strands. This operation is required in order to match Pk and Wk.
2. It is not known how to compare numbers between DNA strands. This operation is required in order to test the dominance rule.
Limitations and Errors
1. DNA synthesis ~ 90% efficiency
2. Long strands of DNA decay quickly; 10000 bases is the maximum length that can be kept in vitro without significant breakage.
3. Extraction
   A good path may be lost during extraction.
   A bad path may be taken as if it were a good one.
4. Undesirable hybridization
5. A sequence s could anneal with a sequence that is only similar to its complement s^c.
Prospective
1. There appears to be little theoretical difficulty in creating a functional DNA computer.
2. Success depends on finding killer applications uniquely suitable for computation by DNA.
3. Improvements are needed in reducing errors and operation costs.