Decision Problems
Observation: Many polynomial algorithms.
Question: Can we solve all problems in
polynomial time?
Answer: No, absolutely not.
Definition: The class of problems that can
be solved by polynomial-time algorithms is
called P.
In contrast to P, we also have the notion of NP
problems, which captures the hardness of a
problem. Note, however, that NP ≠ Not-P.
A problem is called a decision problem if it
has a yes/no answer. Sometimes we also call
it a language-recognition problem. Many
problems can be cast as decision problems by
imposing simple constraints.
Let U be the set of all possible inputs to the
decision problem, and let L ⊆ U be the set of
inputs for which the answer to the problem is
“yes”. L is called the language correspond-
ing to the problem.
Jiming Peng, AdvOL, CAS, McMaster 1
Polynomial Reduction
Definition: Let L1 and L2 be two languages
from the input spaces U1 and U2. We say
L1 is polynomial reducible to L2 if there is a
polynomial-time algorithm that converts each
input u1 ∈ U1 to another input u2 ∈ U2 such
that u1 ∈ L1 if and only if u2 ∈ L2.
Remark: In the above definition, we assume
the algorithm is polynomial in the size of u1,
which implies the size of u2 is also polynomial
in the size of u1.
Theorem If L1 is polynomial reducible to
L2 and there is a polynomial algorithm for
L2, then there is a polynomial algorithm for
L1.
Theorem If L1 is polynomially reducible to
L2, and L2 to L3, then L1 is polynomially
reducible to L3.
NP Problems
Definition: NP denotes the class of prob-
lems for which a positive answer has a ‘cer-
tificate’ such that the correctness of the pos-
itive answer can be verified in polynomial time.
The algorithm that verifies the correctness of
the positive answer is called a nondetermin-
istic algorithm.
Examples:
L1 := {G | G is a graph with a perfect matching},
L′1 := {(G, M) | G is a graph and M is a perfect matching in G}.
L′1 is polynomially solvable, thus L1 ∈ NP!
L2 := {G | G is a Hamiltonian graph},
L′2 := {(G, M) | G is a graph and M is a Hamilton cycle in G}.
L′2 ∈ P, therefore L2 ∈ NP.
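The certificate view can be sketched in code. A minimal, hypothetical verifier for the perfect-matching example (the graph and matching encodings are my own, not from the text): checking a proposed matching takes polynomial time, which is exactly what places L1 in NP.

```python
# Verify a claimed perfect matching: n vertices 0..n-1, edges as pairs,
# matching as a list of pairs. Runs in polynomial time in the input size.
def is_perfect_matching(n, edges, matching):
    edge_set = {frozenset(e) for e in edges}
    covered = set()
    for u, v in matching:
        if frozenset((u, v)) not in edge_set:
            return False          # the matching must use edges of G
        if u in covered or v in covered:
            return False          # no vertex may be matched twice
        covered.update((u, v))
    return len(covered) == n      # every vertex must be matched
```

The certificate (G, M) is easy to check even though finding M from G alone may not be.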
Clearly, P ⊂ NP .
The relation between P and NP has become
one of the great open mysteries of computer
science and mathematics. It is one of the
seven Millennium Prize Problems.
NP-Completeness
Question: Does P equal NP?
Answer: NP seems much larger than P. How-
ever, we have not found a single problem in
NP that is provably not in P!!
We next introduce two classes of problems
that have not been shown to be in P.
Definition: A problem X is called an NP-
hard problem if every problem in NP is poly-
nomial reducible to X.
Conclusion: If we can solve an NP-hard prob-
lem in polynomial time, then we can solve all
the problems in NP in polynomial time!
Definition: A problem X is called an NP-
complete problem if it is NP-hard and be-
longs to NP.
Conclusion: NP-complete problems are the
hardest problems in NP. If we can prove one
NP-complete problem is P, then P=NP.
SAT: An NP-Complete Problem
The following lemma by Cook (1971) is fun-
damental in the theory of NP-Completeness
(NPC).
Lemma: A problem X is an NPC problem if
(1) X belongs to NP and (2) some NPC prob-
lem Y is polynomially reducible to X.
In his seminal paper, Cook gave the first
example of an NPC problem.
Satisfiability (SAT): Let S be a Boolean
expression, such as S = (x + y + z) · (x̄ + ȳ + z̄),
where addition denotes ‘or’, multiplication
means ‘and’, and a bar denotes negation. A
Boolean expression is said to
be satisfiable if there is an assignment of 0s
and 1s such that the value of the expression is
1. The SAT problem is to determine whether
a given expression is satisfiable.
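For intuition only, satisfiability can be decided by brute force over all 2^n assignments (a sketch; the representation of a clause as (variable, negated) pairs is my own choice, not from the text):

```python
from itertools import product

# Brute-force SAT: the expression is an 'and' of clauses, each clause an
# 'or' of literals; a literal is a (variable, negated) pair.
def satisfiable(clauses, variables):
    for bits in product((0, 1), repeat=len(variables)):
        assign = dict(zip(variables, bits))
        # a literal is true when its value disagrees with its negation flag
        if all(any(assign[v] != neg for v, neg in clause) for clause in clauses):
            return True
    return False
```

For S = (x + y + z) · (x̄ + ȳ + z̄), e.g. x = 1, y = 0 already satisfies both clauses, so the search succeeds; the point of the SAT problem is that no method essentially faster than this search is known.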
A detailed proof can be found in Cook’s pa-
per, which exploits the similarity between a
Turing machine and a Boolean expression.
From now on we use the fact that SAT is NPC.
Other NPC Problems
Definition: An instance of 3SAT is a Boolean
expression in which each clause contains ex-
actly three literals.
3SAT Problem: Given a Boolean expression
in which each clause contains exactly three
literals, determine whether it is satisfiable.
Theorem: 3SAT is NPC.
Proof: Obviously 3SAT belongs to NP be-
cause we can verify whether an assignment is
satisfiable in polynomial time. We next con-
struct a polynomial reduction that transforms
a general SAT into 3SAT.
Let E be an arbitrary instance of SAT. We
try to replace each clause of E by several
3-clauses. We first consider a clause C =
(x1 + x2 + · · · + xk) with k ≥ 4.
Let y1, . . . , yk−3 be new variables. By using these new variables, we can define
C′ = (x1 + x2 + y1) · (x3 + ȳ1 + y2) · (x4 + ȳ2 + y3) · · · (xk−1 + xk + ȳk−3).
Proof of NPC 3SAT
Statement: C′ is satisfiable if and only if C
is satisfiable.
To prove the statement, observe that if C is
satisfiable, then at least one of the xi must
be 1. For instance, if xi = 1 for some i > 2,
then we set y1, · · · , yi−2 to 1 and the rest
to 0, which satisfies C′. If x1 or x2 is 1, then
we can set all yj to zero.
Conversely, if C′ is satisfiable, then at least
one xi must be 1. Otherwise, if all xi are 0, then
C′ = y1 · (ȳ1 + y2) · (ȳ2 + y3) · · · ȳk−3.
This expression is clearly unsatisfiable.
Other cases: If C = (x1 + x2), then
C′ = (x1 + x2 + z) · (x1 + x2 + z̄),
where z is a new variable. If C = x1, then
C′ = (x1 + y + z) · (x1 + y + z̄) · (x1 + ȳ + z) · (x1 + ȳ + z̄).
This reduction can be done in polynomial
time. Therefore, SAT can be reduced to
3SAT in polynomial time. Since SAT is NPC,
so is 3SAT.
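The splitting rule for a long clause can be sketched directly. DIMACS-style literals (a nonzero integer v, with −v denoting v̄) are my own representational choice; the new y-variables are numbered from next_var onward.

```python
# Split a clause (x1 + ... + xk) with k > 3 into k-2 three-literal
# clauses using fresh variables y1, ..., y_{k-3}, as in the text:
# (x1+x2+y1)(x3+~y1+y2)...(x_{k-1}+x_k+~y_{k-3}).
def split_clause(clause, next_var):
    k = len(clause)
    if k <= 3:
        return [clause], next_var         # already small enough
    ys = list(range(next_var, next_var + k - 3))
    out = [[clause[0], clause[1], ys[0]]]
    for i in range(k - 4):                # middle clauses
        out.append([clause[i + 2], -ys[i], ys[i + 1]])
    out.append([clause[k - 2], clause[k - 1], -ys[-1]])
    return out, next_var + k - 3
```

The reduction is linear in the clause length, so the whole SAT-to-3SAT transformation is polynomial.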
3-Coloring Problem
Let G = (V, E) be an undirected graph. A
valid coloring of G is an assignment of col-
ors to the vertices such that no two adjacent
vertices have the same color.
Problem: Given a graph G = (V, E), de-
termine whether G can be colored with three
colors.
Theorem: 3-coloring is NP-complete.
Proof: Obviously 3-coloring is in NP. To
prove it is NP-complete, we reduce 3SAT to
3-coloring.
Let E = (x + y + z) be a clause of 3SAT.
We want to construct a graph G such that
E is satisfiable if and only if G can be 3-
colored. First we construct the main triangle,
denoted by M and labelled with colors T, F, A.
These colors are used only in the proof. For
each variable x, we then build another trian-
gle Mx whose vertices are A, x, x̄.
From 3-SAT to 3-coloring
Figure (basic, or main, triangle): the triangle T, F, A joined with the variable triangles (A, x, x̄), (A, y, ȳ), (A, z, z̄).
Impose Satisfiability
We now impose the condition that at least
one variable in the clause must be 1. We
introduce 6 new vertices and connect them
to the graph. Call the three new vertices
connected to T and to x, y, z the outer ver-
tices (O), and the other three new vertices,
which form a triangle, the inner vertices (I).
Figure (constructing the graph): the inner triangle I1, I2, I3, with each outer vertex O1, O2, O3 joined to T, to one of x, y, z, and to the inner triangle.
Reduce 3-SAT to 3-coloring
We claim that if this graph can be colored
with no more than 3 colors, then at least one
of x, y, z must be colored T: otherwise all
the outer vertices must be colored A, and
then the inner triangle cannot be colored.
Now consider the converse. Suppose that E
is satisfiable and we want to color the graph
with 3 colors. Because E is satisfiable, we
may assume w.l.o.g. that x is 1. Then we
color the outer vertex connected to x with
F, and the remaining outer vertices with A.
Correspondingly we can color the inner tri-
angle.
This is a polynomial reduction from 3-SAT
to 3-Coloring problem. Because 3-SAT is
NP-complete, so is 3-Coloring.
A graph for x+y+z=1
Figure: the full construction, attaching the outer vertices O1, O2, O3 to the variable triangles. x + y + z = 1 if and only if this graph can be colored with three colors.
NP-Complete Clique Problem
Problem: A clique C is a subgraph of G in
which all the vertices are connected to each
other. The clique problem is to determine,
for a given G and a constant k, whether G
has a clique of size ≥ k.
Theorem: The clique problem is NP-complete.
Proof: Obviously the clique problem belongs
to NP. It suffices to reduce SAT to the clique
problem. Let E = E1 · · · Em be an arbitrary
Boolean expression. We construct a graph in
the following way:
1 Cast each literal occurrence in a clause
as a vertex in the graph;
2 Add edges linking the vertices from differ-
ent clauses unless they are complements
of each other;
3 Vertices from the same clause are not
connected.
A Graph for Boolean Clause
Figure: a graph for the expression (x + y + z̄) · (x̄ + ȳ + z) · (y + z̄), with one vertex per literal occurrence.
NP-completeness of Clique
Statement: G has a clique of size ≥ m if and
only if E is satisfiable.
Proof: The construction guarantees that the
maximum clique size does not exceed m. Sup-
pose E is satisfiable; then there is a truth as-
signment such that each clause has at least
one ‘true’ literal. All the corresponding ‘true’
vertices in G are connected, because the cho-
sen vertices cannot contain a complementary
pair. This means the resulting subgraph is a
clique.
Conversely, assume that G contains a clique
of size ≥ m. The clique must consist of m
vertices from m distinct clauses. We assign
the corresponding literals the value 1, their
complements 0, and the remaining variables
arbitrarily. Since all the vertices in the clique
are connected, and a complementary pair is
never connected, this assignment is consis-
tent.
Vertex Covering Problem
Definition: Let G = (V, E) be a graph. A
vertex set S of G is called a vertex cover if
each edge in G is incident to at least one of
the vertices in S.
Problem: Given an undirected graph G =
(V, E) and an integer k, decide whether G has
a vertex cover containing ≤ k vertices.
The vertex cover problem clearly belongs to
NP. Recall that the clique problem is NP-
complete; it therefore suffices to reduce the
clique problem to the vertex cover problem.
The idea is to construct the complement Ḡ
of G on the same vertex set. All the edges
of G are absent in Ḡ, while every pair of ver-
tices that is non-adjacent in G is connected
in Ḡ.
Now we can show that a clique of size k in
G corresponds to a vertex cover of size n − k
in Ḡ, and vice versa.
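A small sketch of this correspondence, with hypothetical helper names of my own:

```python
from itertools import combinations

# Build the complement graph: exactly the vertex pairs that are
# NOT edges of G become edges of G-bar.
def complement(n, edges):
    all_pairs = {frozenset(p) for p in combinations(range(n), 2)}
    return all_pairs - {frozenset(e) for e in edges}

# A set covers the edges if every edge touches at least one cover vertex.
def is_vertex_cover(edges, cover):
    return all(cover & set(e) for e in edges)
```

If C is a clique of size k in G, then no edge of the complement joins two vertices of C, so the remaining n − k vertices cover every complement edge.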
From Clique to Set Cover
Figure: a graph on v1, . . . , v6 with clique (v2, v5, v6); the complement has the vertex cover (v1, v3, v4).
NP-complete family
Figure (NP-complete family): SAT, 3-SAT, Clique, Set Cover, 3-Coloring, Dominating Set, Independent Set, Hamilton Cycle, TSP, Partition.
Branch and Bound for NPCPs
Consider the 3-coloring problem. Note that
if a vertex v is colored, then there are two
ways to color its neighbor. This fits the struc-
ture of a binary tree.
We can start with any two vertices and ex-
plore all the possibilities for the remaining
vertices: pick one child in the tree, and con-
tinue this process until the whole graph is
colored or a ‘No’ answer is reported. In the
latter case, we backtrack and try other chil-
dren.
Algorithm 3-coloring (G, Var U);
Input: G=(V,E); U, the set of colored vertices (initially empty);
Output: a coloring.
Begin
If U=V, then G is colored, stop.
else pick v not in U;
for C := 1 to 3 do
if no neighbor of v is colored with C
U := U + v, color v with C,
3-coloring (G,U)
remove v from U and uncolor it (backtrack)
End
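The pseudocode above translates to a short backtracking routine (a sketch; the adjacency-list representation of G is assumed):

```python
# Backtracking 3-coloring: adj maps each vertex to its neighbors;
# color[v] is None while v is uncolored (v not in U).
def three_coloring(adj):
    color = {v: None for v in adj}

    def extend():
        v = next((u for u in color if color[u] is None), None)
        if v is None:
            return True                      # U = V: all vertices colored
        for c in (1, 2, 3):
            if all(color[w] != c for w in adj[v]):
                color[v] = c                 # color v with c and recurse
                if extend():
                    return True
                color[v] = None              # backtrack: try other children
        return False

    return color if extend() else None       # None reports the 'No' answer
```

A triangle is 3-colorable, while the complete graph K4 is not; both cases fall out of the same search.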
Backtrack for 3-Coloring
We use the colors R(ed), G(reen) and B(lue).
Figure (branch and bound for 3-coloring): the search tree over v1, . . . , v5; branches that cannot be extended are marked ‘No’, and the branch ending in a valid coloring is marked ‘Yes’.
Branch and Bound for ILP
The technique of branch and bound is fre-
quently used in integer linear programming,
where we usually want to minimize or maxi-
mize a linear objective subject to some con-
straints. A common heuristic in ILP is to fix
some variables temporarily and then solve the
resulting ILP, which is usually smaller and rel-
atively easier than the original problem. We
can also use the relaxed linear program to
solve the ILP: if the solution of the relaxed
LP is integer, then it solves the ILP.
Example:
min x1 − 2x2, x1, x2 ∈ {0, 1}.
We first set x1 to 0, and then solve the sub-
problem
min −2x2, x2 ∈ {0, 1},
which has a solution at x2 = 1 with value −2.
Then we set x1 to 1, and solve
min 1 − 2x2, x2 ∈ {0, 1}.
The minimal solution of this problem has
value −1. Comparing these two values, we
get the solution of the original problem:
x1 = 0, x2 = 1.
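The two branches above amount to enumerating the 0-1 assignments; a tiny sketch (the helper name is hypothetical):

```python
from itertools import product

# Enumerate all 0-1 assignments and keep the one minimizing the
# objective, reproducing the x1 = 0 and x1 = 1 branches at once.
def solve_binary(objective, n):
    return min(product((0, 1), repeat=n), key=objective)

best = solve_binary(lambda x: x[0] - 2 * x[1], 2)  # → (0, 1), value -2
```

Branch and bound does the same search, but prunes subtrees whose bound cannot beat the best value found so far.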
ILP for Clique Problem
Problem: Find a clique C in a graph G =
(V, E) of maximal size.
We model the problem as an integer linear
programming problem. Define n variables
corresponding to the vertices as follows:
xi = 1 if the vertex vi is in C,
xi = 0 otherwise.
Therefore, we can formulate the maximal clique
problem as the following:
max z = Σ_{i=1}^{n} xi;
xi ∈ {0, 1};
xi + xj ≤ 1, ∀(vi, vj) ∉ E.
ILP for Clique
Figure: a graph on v1, . . . , v6 with clique (v2, v5, v6).
For the above graph, the ILP model reads
max z = Σ_{i=1}^{6} xi;
xi ∈ {0, 1};
x1 + x3 ≤ 1, x1 + x4 ≤ 1;
x1 + x5 ≤ 1, x1 + x6 ≤ 1;
x2 + x3 ≤ 1, x2 + x4 ≤ 1;
x3 + x5 ≤ 1, x4 + x5 ≤ 1.
The final solution is x1 = 0, x2 = 1, x3 =
0, x4 = 0, x5 = 1, x6 = 1.
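Since the instance is tiny, the ILP can be checked by brute force (a sketch; with the eight constraints as listed, the optimum value is 3, attained for example by x2 = x5 = x6 = 1):

```python
from itertools import product

# Brute-force the 6-variable clique ILP: maximize the sum of x_i
# subject to x_i + x_j <= 1 for every listed non-edge (v_i, v_j).
non_edges = [(1, 3), (1, 4), (1, 5), (1, 6),
             (2, 3), (2, 4), (3, 5), (4, 5)]

def feasible(x):
    return all(x[i - 1] + x[j - 1] <= 1 for i, j in non_edges)

best = max((x for x in product((0, 1), repeat=6) if feasible(x)), key=sum)
```

This works only because 2^6 = 64 assignments fit in a loop; for general n the enumeration is exponential, which is why branch and bound and LP relaxations matter.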
LP Relaxation for ILP
Note that in our ILP example for the clique
problem, if we relax the constraints xi ∈ {0,1}
to 0 ≤ xi ≤ 1, then we get an LP problem
that can be solved efficiently. By solving the
relaxed problem, we can get a solution to the
original problem!
However, this is not true for general ILPs.
Nevertheless, the LP relaxation provides a
useful approach for solving ILPs. For exam-
ple, we can employ the backtracking tech-
nique and use the values of the easily solvable
relaxed LP problems to prune some children.
For instance, suppose we already have a fea-
sible solution and thus a value z1. After fixing
some variables, we solve the relaxed LP. If
the resulting optimal value is worse than z1,
then we can discard the whole branch, and
thus avoid unnecessary work.
The worst case of this branch-and-bound al-
gorithm is exponential. But by exploiting the
special structure of the underlying problem,
special heuristics can be developed, and many
results have been reported.
Approximation Algorithms
Definition: An algorithm that may not lead
to the optimal result but yet give a good fea-
sible solution is called an approximation al-
gorithm.
If NPC problems are so hard to solve, why
not try approximation algorithms?
Definition: For a given problem, an approx-
imation algorithm is called a ρ-approximation
algorithm if it always gives a solution satisfy-
ing C∗/C ≤ ρ or C/C∗ ≤ ρ, depending on
whether the underlying problem is a mini-
mization or a maximization, where C∗ is the
optimal value of the underlying problem.
Question: If we do not know the optimal
value C∗ of the problem, how can we esti-
mate the approximation ratio ρ?
Approximate Bin Packing
Bin Packing: Let X be a set of elements
xi ∈ [0,1], i = 1, · · · , n. Partition these ele-
ments into as few subsets (bins) as possible
such that the sum of the elements in each
subset is less than or equal to 1.
This is a variant of the Knapsack problem.
A direct approach is to place each item into
the first bin with enough room, opening a
new bin only when no previous bin fits the
item. This is called the first fit algorithm,
which requires at most 2·OPT bins. The
proof is easy, because first fit can never leave
two bins less than half full.
A better idea is to sort the items in decreas-
ing order first and then apply first fit again;
the guarantee improves to about 1.22·OPT.
The proof can be found in the main textbook.
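First fit and its sorted variant are a few lines each (a sketch):

```python
# First fit: place each item in the first bin with room; open a new
# bin only when no existing bin fits the item.
def first_fit(items):
    bins = []
    for x in items:
        for b in bins:
            if sum(b) + x <= 1:
                b.append(x)
                break
        else:
            bins.append([x])       # no bin had room: open a new one
    return bins

# First fit decreasing: sort items from large to small, then first fit.
def first_fit_decreasing(items):
    return first_fit(sorted(items, reverse=True))
```

The 2·OPT argument is visible in the code: once a bin is at most half full, no later item that would fit it can open a new bin, so at most one bin can end up less than half full.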
Approximate Vertex Cover
Definition: A vertex cover is a set of vertices
such that each edge in G = (V, E) is incident
to at least one of them.
Problem: Find a minimum vertex cover in a
graph.
The problem is NPC, so we try to find an
approximate solution.
We can use a maximal matching to approach
it. The set of all vertices in a maximal match-
ing forms a vertex cover of size at most
2·OPT, where OPT is the number of vertices
in a minimum vertex cover.
But first we need to know how to find a max-
imal matching in a graph.
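A sketch of the matching-based 2-approximation: greedily build a maximal matching and take both endpoints of every matched edge.

```python
# Greedy maximal matching, returning both endpoints of each matched
# edge as a vertex cover of size at most 2*OPT.
def approx_vertex_cover(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))   # edge (u, v) joins the matching
    return cover
```

The bound holds because any cover must contain at least one endpoint of every matched edge, and the matched edges are vertex-disjoint.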
Euclidean TSP
Problem: Let ci, i = 1, · · · , n, be n points
in the plane. Find the Hamilton cycle with
minimal total distance.
The problem is NP-hard, but since the points
lie in the plane, the distances satisfy the tri-
angle inequality.
We can start with a minimum-cost span-
ning tree, which can be obtained in polyno-
mial time. The cost of the tree is less than
or equal to that of the optimal cycle (note
that by removing one edge from the cycle we
get a spanning tree).
Using a depth-first traversal, we can con-
struct a circuit that traverses each tree edge
twice. The length of this circuit is at most
twice that of the minimal TSP tour. Now
we construct a cycle from this circuit: in-
stead of backtracking, we move directly to
the next new vertex. By the triangle inequal-
ity the shortcuts do not increase the length,
so this gives a cycle whose length is less than
2·OPT for TSP.
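The MST-plus-shortcut idea fits in one function (a sketch; Prim's algorithm builds the tree, and a depth-first preorder walk realizes the shortcutting):

```python
import math

# 2-approximate Euclidean TSP: build an MST with Prim's algorithm,
# then output the vertices in depth-first preorder (shortcutting
# repeated visits), which the triangle inequality keeps below 2*OPT.
def tsp_2approx(points):
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # Prim's MST rooted at vertex 0
    in_tree, parent = {0}, {}
    best = {i: (dist(0, i), 0) for i in range(1, n)}
    while len(in_tree) < n:
        v = min(best, key=lambda i: best[i][0])
        parent[v] = best[v][1]
        in_tree.add(v)
        del best[v]
        for i in best:
            if dist(v, i) < best[i][0]:
                best[i] = (dist(v, i), v)
    children = {i: [] for i in range(n)}
    for v, p in parent.items():
        children[p].append(v)
    # preorder walk = DFS that moves straight to the next new vertex
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour
```

On the four corners of a unit square this already recovers an optimal tour; in general only the 2·OPT bound is guaranteed.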
TSP tour VS Spanning Tree
A spanning tree
A TSP Tour expanded from the spanning tree
Further improvement
We can use the idea of an Eulerian circuit to
construct a better TSP tour. This is done
by adding edges to link all nodes of odd de-
gree. Note that the number of odd-degree
nodes is always even! (Why?)
Suppose in total there are 2k odd-degree
nodes; then we can use k edges to pair them
up. We want to use the edges whose total
distance is minimal. This gives rise to a
minimum-distance matching problem, which
can be solved in O(n³) time.
We can prove that the total distance of the
minimum matching is less than half of the
distance in the final TSP tour. This can be
shown in the following way. We construct
two disjoint paths that link all the odd-degree
nodes together such that the total length of
these two paths is less than that of the TSP
tour.
Eulerian Circuit to TSP
A spanning tree + Matching
A TSP Tour derived from Eulerian Circuit
Two different matchings that link all the odd-degree nodes together
Matching Problem in Graph
Definition: Let G = (V, E) be a graph.
1 A matching is a set of edges such that
no two of them have a vertex in common;
2 A perfect matching is a matching in
which all the vertices are matched;
3 A matching is called a maximal matching
if it has the maximal cardinality.
Sometimes, finding a perfect matching in a
graph is impossible. However, if the graph
is very dense, for example, |V | = 2n and the
degree of each vertex in the graph is greater
than n, then we can use induction and a greedy
algorithm to find a perfect matching in such
a dense graph.
The algorithm proceeds as follows.
First take any edge in the graph, and remove
the corresponding two vertices and the edges
linked to these two. Then we have a smaller
graph with |V1| = 2n − 2. Since the original
graph is very dense, so is the reduced graph.
Maximal matching
Theorem: A matching is maximal if and only
if it has no augmenting path.
Finding a maximal matching
• Start with M = ∅;
• Find an augmenting path P relative to
M and replace M by M + P (flipping the
edges of P in and out of M);
• Repeat the process until no augmenting
path exists.
Idea: From two matchings M and N, we can
obtain an augmenting path for M or N.
Consider the graph G′ = (V, M + N):
• Each vertex is an endpoint of at most one
edge from M and one edge from N;
• Each connected component of G′ forms a
path with edges alternating between M and
N;
• Each path that is not a cycle forms an
augmenting path for M or N.
Finding augmenting path
• Level 0: Start with all unmatched ver-
tices of V ;
• At odd level i: Add new vertices that
are adjacent to a vertex at level i − 1 by a
non-matching edge (the edge is also added);
• At even level i: Add new vertices that
are adjacent to a vertex at level i − 1 via
an edge of the matching M, together with
that edge;
• Continue the process until an unmatched
vertex is added at an odd level, or no more
vertices can be added;
• Augment along the resulting path by re-
moving its edges that lie in the original
matching.
Augmenting path: 1
Figure: levels over v1, . . . , v8; the unmatched vertices v5, v8 start the search; add v5 and (v5, v4).
Augmenting path: 2
Figure: add v3 and (v4, v3); then add v6 and (v3, v6).
Augmenting path: 3
Figure: add v7 and (v6, v7); then add v8 and (v7, v8).
Augmenting path: 4
Figure: remove the edges of the original matching along the path, yielding the final maximal matching.
Computing with DNA
So far we have been working with digital
computers. How about DNA-based comput-
ers?
In 1994, Len Adleman (a computer scientist)
showed that an NP-complete problem can be
solved using DNA! This is impossible in the
classic way. Adleman’s work is based on bio-
chemical processes that work on huge num-
bers of molecules in parallel.
The problem Adleman tackled is the Hamil-
ton path problem (HP) in a directed graph
G = (V, E) with designated start vertex v0
and end vertex vn. The problem is to decide
whether there is a path from v0 to vn with
n = |V | that passes through all the other
vertices in G exactly once. This is an NP-
hard problem.
Let w0, w1, · · · , wq be any path in G. We can
check whether it is an HP by determining
whether it satisfies the following properties:
1: w0 = v0, wq = vn;
2: q = n;
3: Every vertex in V appears once in the
path.
This can be done in polynomial time, but the
problem is that there are too many possible
paths...
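Properties 1-3 are easy to check in code (a sketch for a directed graph; the encoding is my own): verifying a single candidate is polynomial, and the explosion lies in how many candidates there are.

```python
# Check the HP properties for one candidate path in a directed graph.
def is_hamilton_path(path, edges, v0, vn, n_vertices):
    edge_set = {tuple(e) for e in edges}
    return (path[0] == v0 and path[-1] == vn        # property 1: endpoints
            and len(path) == n_vertices             # property 2: length
            and len(set(path)) == n_vertices        # property 3: no repeats
            and all((path[i], path[i + 1]) in edge_set
                    for i in range(len(path) - 1)))  # edges must exist
```

Adleman's insight is that the DNA chemistry performs this filtering on an astronomical number of candidate paths at once.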
Background knowledge on DNA
Using the DNA model, we can perform the
following process
• Generate DNA strands to represent paths
in G;
• Use biochemical processes to extract strands
satisfying 1-3, and discard all others
a: Extract strands that start at v0 and
end at vn;
b: Extract strands that include n ver-
tices;
c: Extract strands that contain every
vertex.
• Any strand that remains represents an HP.
If none remains, then there is no HP in G.
DNA is deoxyribonucleic acid, the genetic
material that encodes the characteristics of
living things. It consists of strings of chem-
icals called nucleotides denoted by: adenine
(A), cytosine (C), guanine (G) and thymine
(T). Thus we can encode any information
using this four-letter alphabet, different from
the binary 0-1 coding.
Background knowledge on DNA
Two Nobel laureates, J. Watson and F. Crick,
found the double helix structure of DNA: A
and T are complements, and C and G are
complements. Two strands of nucleotides
will attach to each other if they have comple-
mentary components in corresponding posi-
tions. But it is also possible that DNA strands
attach to each other without the complemen-
tary elements.
We associate a string Ri = di,1 di,2 · · · di,20
of 20 letters from the alphabet A, C, G, T
with each vertex vi in the graph G. The
recipe for generating DNA strands uses two
ingredients, for edges and for vertices. For
each edge vi vj ≠ v0 vn, make a strand Si,j
of 20 letters, where the first half is the last
part di,11 · · · di,20 of Ri, and the second half
is the first half dj,1 · · · dj,10 of Rj. For edges
starting from v0 or ending at vn, we use all
of R0 or Rn, appending the corresponding
half of the other vertex; for these edges we
thus get 30 letters.
DNA model
A large number of the edge strands, about
10¹⁴ copies of each for the graph, are syn-
thesized and put into a ‘pot’. For each Ri
(except R0 and Rn), create its complement
R̄i and add a large number of copies. For
a path S4,5, S5,2, S2,1, the construction en-
sures it contains the substring R5 R2, so we
can attach the complement R̄5 R̄2 to it. In
this way we can form all the paths in the
graph.
We can then verify the strands with the cor-
rect start and end. A DNA molecule repre-
senting a path has a complete copy of Ri for
each vertex vi on the path; thus we can ex-
tract DNA strands of length n × 20 by a bio-
chemical process. For each vi (except v0 and
vn), we mix in copies of R̄i, extract the strands
to which they attach, and discard the others.
Then the R̄i molecules are separated from
the strands and removed. The remaining
strands represent paths that pass through vi.
When this process is finished for all vertices,
any remaining strand represents a desired HP.
Comments on DNA model
Theoretically, all these steps can be done in
time linear in the problem size. But this also
depends on the volume of material involved
in the biochemical process. How fast does
this volume grow with the size of the under-
lying problem?
It is also possible that errors happen in the
biochemical process, in which case we do not
get the exact solution. Like probabilistic meth-
ods: fast, but with no guarantee of correct-
ness!
Extensive research on this direction is going
on...