University of Lugano
University of Applied Sciences of
Southern Switzerland
IDSIA
Dalle Molle Institute for
Artificial Intelligence
Optimization approaches for
the design of DNA codes
Roberto [email protected]
2
Outline
• Introduction
• The DNA Codes Design problem
• Construction heuristics
• Simple local searches
• A Variable Neighbourhood Search Metaheuristic
• Bibliography
4
DNA – The Blueprint of Life
[Figure: nine clip-art pictures (chimp, cow, dinosaur, bird, fish, worm, bacteria, human) arranged around DNA]
Background: DNA
5
What is DNA?
• All organisms on this planet are made of the same type of
genetic blueprint.
6
Real applications
• DNA computing => using DNA for massively parallel computations
• DNA chemical libraries => for the development and test of new drugs
• DNA microarrays => for profiling genes and tracing genes within long DNA strands
• DNA nanotechnologies => for the development of new materials/devices
8
DNA, Wikimedia Commons
What is DNA?
• genetic material
• four letter alphabet (nucleotides, bases):
– A (adenine),
– C (cytosine),
– G (guanine),
– T (thymine)
• complementary base pairs CG, AT
• hybridization via base pairing
[Figure: perfect hybridization (AACGT paired with TTGCA) and imperfect hybridization (ATGGT paired with TTGCA), with 5' and 3' ends marked]
9
Modeling
Design goals: uniform stability, non-interaction
[Figure: AACGT paired with TTGCA; AACGT against CACCC; 5' and 3' ends marked]
Desired properties
• Desired properties coming from real applications
• Notice that properties are not the same for all applications
10
DNA Codes Design Problem description
Input data:
• The alphabet {A, C, G, T}
• A fixed length n for the codewords
• A required distance d among codewords (used by the constraints
in Z)
• A set Z of constraints (explained in the next slides)
Optimization objective:
• Find the largest possible set of codewords (= code) of length
n over the alphabet {A, C, G, T} that is feasible with respect to the
constraints in Z (based on d)
Why maximize the size of the code? To have
more flexibility in the applications seen before!
11
Example code (solution), word length n = 8, one codeword per entry:
AATTCCGG ACCTGATT ATTCCCAG ACCTTTTT TATATATA
CATTCACC GCTTATTC GATTCAAT TCACCATG CCGTTACA
GCGCGCGC CTATTCAC TTGGCCAA GGCTTTTA CTACTACG
The solution respects a given constraint set Z
(we do not know Z at this stage!)
DNA Codes Design Problem description
12
Requirements of a DNA Code
• Successful specific hybridization between a
DNA codeword and its complement.
• No hybridization between DNA codewords
from the same DNA code, or between a DNA
codeword and the complement of another codeword.
How do these requirements translate into our
constraint set Z?
DNA Codes Design Problem description
13
Constraints considered (set Z):
• Requirement: the distance between two codewords must be large (no
hybridization).
• Answer: HD (Hamming Distance)
- Given two codewords w1 and w2
- H(w1, w2) = number of positions i in which the ith letter of w1
differs from the ith letter of w2
- example: w1 = GCTA, w2 = ATTA, H(w1, w2) = 2
- Constraint: H(w1, w2) ≥ d
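The HD constraint above can be sketched in a few lines of Python. This is only an illustration (the source reports ANSI C implementations); the function names `hamming` and `satisfies_hd` are hypothetical.

```python
def hamming(w1: str, w2: str) -> int:
    """Number of positions in which two equal-length words differ."""
    if len(w1) != len(w2):
        raise ValueError("codewords must have the same length")
    return sum(a != b for a, b in zip(w1, w2))


def satisfies_hd(w1: str, w2: str, d: int) -> bool:
    """HD constraint: H(w1, w2) >= d."""
    return hamming(w1, w2) >= d


print(hamming("GCTA", "ATTA"))          # 2, as in the example above
print(satisfies_hd("GCTA", "ATTA", 3))  # False: distance 2 is below d = 3
```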
DNA Codes Design Problem description
14
Constraints considered (set Z):
• Requirement: the number of G or C letters in each codeword must be the
same (uniform stability) [=> self-hybridization is likely]
• Answer: GC (GC-content constraint)
- A fixed number of the letters of each word has to be
either G or C: floor(n/2) in our case
- example (n = 3, so floor(n/2) = 1): ATA is not feasible, AGA is feasible
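A minimal Python sketch of the GC-content check, assuming the required count is floor(n/2) as stated above. The names `gc_content` and `satisfies_gc` are illustrative only.

```python
def gc_content(word: str) -> int:
    """Number of letters of the word that are G or C."""
    return sum(ch in "GC" for ch in word)


def satisfies_gc(word: str) -> bool:
    """GC constraint: exactly floor(n/2) letters are G or C."""
    return gc_content(word) == len(word) // 2


print(satisfies_gc("ATA"))  # False: 0 G/C letters, 1 required
print(satisfies_gc("AGA"))  # True: 1 G/C letter, 1 required
```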
DNA Codes Design Problem description
15
• Requirement: the distance between a codeword and the complement of
another codeword must be large.
Watson-Crick complement of a DNA codeword
wcc(w) = Watson-Crick complement of a DNA codeword w,
obtained by reversing w and then replacing each A
by T (and vice versa) and each C by G (and vice versa)
- example: wcc(ATGC) = GCAT
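The definition above translates directly into a short Python sketch. The name `wcc` follows the slide; the translation table is an implementation detail chosen for this illustration.

```python
# Letter-wise complement: A <-> T, C <-> G
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def wcc(word: str) -> str:
    """Watson-Crick complement: reverse the word, then complement each letter."""
    return word[::-1].translate(_COMPLEMENT)


print(wcc("ATGC"))       # GCAT, as in the example above
print(wcc(wcc("ATGC")))  # ATGC: applying wcc twice gives back the word
```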
DNA Codes Design Problem description
16
Constraints considered (set Z):
• Requirement: the distance between a codeword and the complement of
another codeword must be large.
• Answer: RC (Reverse Complement Hamming distance)
- Given two codewords w1 and w2
- example: GCTA, ATGC
H(GCTA, wcc(ATGC)) = H(GCTA,GCAT) = 2
- Constraint: H(w1, wcc(w2)) ≥ d
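The RC constraint combines the Hamming distance with the Watson-Crick complement. The following Python sketch recomputes the example above; the helper names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(w1: str, w2: str) -> int:
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(w1, w2))


def wcc(word: str) -> str:
    """Watson-Crick complement: reverse, then complement each letter."""
    return word[::-1].translate(_COMPLEMENT)


def rc_distance(w1: str, w2: str) -> int:
    """Distance used by the RC constraint: H(w1, wcc(w2))."""
    return hamming(w1, wcc(w2))


def satisfies_rc(w1: str, w2: str, d: int) -> bool:
    return rc_distance(w1, w2) >= d


print(rc_distance("GCTA", "ATGC"))  # 2 = H(GCTA, GCAT)
```

Since reversing and complementing both words leaves the Hamming distance unchanged, H(w1, wcc(w2)) = H(w2, wcc(w1)), so each pair needs to be checked only once.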
DNA Codes Design Problem description
17
Example of a problem and its solution
• Input data: n = 4, d = 3.
• Constraints considered: HD, GC, RC
• Solution:
the largest possible code with the characteristics above contains
6 codewords.
Optimal code with respect to the constraints considered (not
unique!):
CTTC GGTT GTCA
AGGA ACTG TTGG
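The feasibility of the code above can be checked mechanically. The Python sketch below tests HD, GC and RC for every pair; it also applies RC to a codeword and its own complement, which is an assumption of this sketch rather than something stated on the slides. It only checks feasibility, not optimality.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(w1, w2):
    return sum(a != b for a, b in zip(w1, w2))


def wcc(word):
    return word[::-1].translate(_COMPLEMENT)


def is_feasible_code(code, d):
    """Check GC, HD and RC on a list of equal-length codewords."""
    for w in code:
        if sum(ch in "GC" for ch in w) != len(w) // 2:  # GC-content
            return False
        if hamming(w, wcc(w)) < d:  # RC against its own complement (assumed)
            return False
    for w1, w2 in combinations(code, 2):
        if hamming(w1, w2) < d:       # HD
            return False
        if hamming(w1, wcc(w2)) < d:  # RC
            return False
    return True


code = ["CTTC", "GGTT", "GTCA", "AGGA", "ACTG", "TTGG"]
print(is_feasible_code(code, 3))             # True
print(is_feasible_code(code + ["CATG"], 3))  # False: CATG equals its own complement
```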
18
Problem description
• Other kinds of constraints are possible.
• They depend on the real-world application
considered
• In this mini-course we limit ourselves to the
constraints on the previous slides
Important observation
19
Outline
• Introduction
• The DNA Codes Design problem
• Construction heuristics
• Simple local searches
• A Variable Neighbourhood Search Metaheuristic
• Bibliography
20
Construction Heuristics
Construction Heuristic (CH)
All possible codewords with the required GC-content are examined in a
given order.
Codewords are incrementally accepted if feasible with respect to the
already accepted ones.
Montemanni, R., Smith, D.H. Heuristic algorithms for constructing binary
constant weight codes. IEEE Transactions on Information Theory 55(10), 4651-
4656 (2009)
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Math. Modelling and
Algorithms 7, 311-326 (2008).
Smith, D.H., Hughes L.A., Perkins S. A new table of constant weight binary
codes of length greater than 28. Electron. J. of Combinatorics, 13(1), #A2 (2006).
21
Construction Heuristics
Example: n = 4, d = 3.
Constraints: HD, GC, RC
Lexicographic order:
AACC AACG AAGC AAGG ACAC ACAG ACCA ACCT ACGA ACGT ACTC ACTG AGAC AGAG
AGCA AGCT AGGA AGGT AGTC AGTG ATCC ATCG ATGC ATGG CAAC CAAG CACA CACT
CAGA CAGT CATC CATG CCAA CCAT CCTA CCTT CGAA CGAT CGTA CGTT CTAC CTAG
CTCA CTCT CTGA CTGT CTTC CTTG GAAC GAAG GACA GACT GAGA GAGT GATC GATG
GCAA GCAT GCTA GCTT GGAA GGAT GGTA GGTT GTAC GTAG GTCA GTCT GTGA GTGT
GTTC GTTG TACC TACG TAGC TAGG TCAC TCAG TCCA TCCT TCGA TCGT TCTC TCTG
TGAC TGAG TGCA TGCT TGGA TGGT TGTC TGTG TTCC TTCG TTGC TTGG
Solution: AACC ACAG AGGA CCTA GTCA
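The example above can be retraced with a short Python sketch of the construction heuristic. It assumes that the RC constraint is also enforced between a codeword and its own complement (an assumption of this sketch; without it, a word such as CATG would be accepted). Function names are illustrative.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(w1, w2):
    return sum(a != b for a, b in zip(w1, w2))


def wcc(word):
    return word[::-1].translate(_COMPLEMENT)


def gc_candidates(n):
    """All words of length n with floor(n/2) G/C letters, in lexicographic order."""
    words = ("".join(p) for p in product("ACGT", repeat=n))
    return [w for w in words if sum(ch in "GC" for ch in w) == n // 2]


def fits(word, code, d):
    """True if word can join code without violating HD or RC."""
    if hamming(word, wcc(word)) < d:  # RC against itself (assumed)
        return False
    return all(hamming(word, c) >= d and hamming(word, wcc(c)) >= d for c in code)


def construct(candidates, d):
    """Accept candidates one by one, in the given order, when feasible."""
    code = []
    for word in candidates:
        if fits(word, code, d):
            code.append(word)
    return code


print(construct(gc_candidates(4), 3))
# ['AACC', 'ACAG', 'AGGA', 'CCTA', 'GTCA']
```

Passing a shuffled copy of the candidate list (for example via `random.shuffle`) gives the random-order variant.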
22
Construction Heuristics
• The method works over any possible order of the candidate codewords
(lexicographic, reverse lexicographic, random) => in fact, different
orders give different algorithms…
• Computational experiments suggest that random orders tend to give
better results on DNA code design problems
• Slow for large problems (all possible codewords have to be
examined!)
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. J. of Math. Modelling and
Algorithms 7, 311-326 (2008).
24
Seed Building local search
Seed Building (SB)
Iterative approach
A set of seed codewords is considered
The set of seed codewords is dynamically adapted through iterations
During each iteration:
• All possible codewords with the required GC-content are examined in a given
order.
• Codewords are incrementally accepted if feasible with those already accepted in
the current iteration and with the seed codewords.
Statistics are used to expand or contract the set of seed codewords every ItrSeed
iterations, based on the quality of the solutions built.
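A single Seed Building iteration can be sketched in Python as follows. The seed management step (expanding or contracting the seed set every ItrSeed iterations) is not shown, because the statistics it relies on are not detailed here. RC is also applied to a codeword and its own complement, which is an assumption of this sketch; all names are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(w1, w2):
    return sum(a != b for a, b in zip(w1, w2))


def wcc(word):
    return word[::-1].translate(_COMPLEMENT)


def fits(word, code, d):
    """True if word can join code without violating HD or RC."""
    if hamming(word, wcc(word)) < d:  # RC against itself (assumed)
        return False
    return all(hamming(word, c) >= d and hamming(word, wcc(c)) >= d for c in code)


def seed_building_pass(seeds, candidates, d):
    """One iteration: start from the seeds, then accept feasible candidates in order."""
    code = list(seeds)  # seeds are assumed to be mutually feasible
    for word in candidates:
        if fits(word, code, d):
            code.append(word)
    return code


n, d = 4, 3
candidates = [w for w in ("".join(p) for p in product("ACGT", repeat=n))
              if sum(ch in "GC" for ch in w) == n // 2]
print(seed_building_pass(["AACC", "ACAG"], candidates, d))
```

In a full implementation the candidate order would be reshuffled at each iteration and the resulting code sizes recorded, so that the seed set can be adapted.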
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. J. of Math. Modelling and
Algorithms 7, 311-326 (2008).
Brouwer A.E., Shearer J.B., Sloane N.J.A., Smith W.D. A new table of constant
weight codes. IEEE Trans. Inf. Theory 36, 1334-1380 (1990).
25
Seed Building local search
[Figure: Seed Building scheme; label: seed codewords management]
26
Seed Building local search
Example: n = 4, d = 3.
Constraints: HD, GC, RC
Seed codewords: AACC ACAG
Random order:
CTTC CTTG CTCA CTCT CTGA CTGT CTAC CTAG CATC CATG CACA CACT CAGA
CAGT CAAC CAAG CCTA CCTT CCAA CCAT CGTA CGTT CGAA CGAT GTTC GTTG
GTCA GTCT GTGA GTGT GTAC GTAG GATC GATG GACA GACT GAGA GAGT GAAC
GAAG GCTA GCTT GCAA GCAT GGTA GGTT GGAA GGAT TTCC TTCG TTGC TTGG
TACC TACG TAGC TAGG TCTC TCTG TCCA TCCT TCGA TCGT TCAC TCAG TGTC
TGTG TGCA TGCT TGGA TGGT TGAC TGAG ATCC ATCG ATGC ATGG AACC AACG
AAGC AAGG ACTC ACTG ACCA ACCT ACGA ACGT ACAC ACAG AGTC AGTG AGCA
AGCT AGGA AGGT AGAC AGAG
Solution: AACC ACAG CCTA GTCA TCCT
27
Seed Building local search
• The method works over any possible order of the candidate codewords
(lexicographic, reverse lexicographic, random).
• Experiments clearly show that a random order is to be
preferred for DNA code design problems.
• The process of identifying a good set of seed codewords is
intrinsically difficult => codes produced are sometimes very
good and sometimes very poor => not a very robust method
• Slow for large problems (all possible codewords are
examined at each iteration!)
28
Clique Search local search
• Clique
Given an undirected graph G, a clique is a set of vertices in
which every vertex is connected to every other vertex of the set
• Maximum clique problem
Given an undirected graph G, identify the largest clique of G (by number of vertices)
• Complexity
Classic NP-hard problem
Example (figure not reproduced):
• {0, 3, 4} is a clique
• {2, 3, 4, 5} is a maximum clique
29
Clique Search local search
Clique Search (CS)
Iterative approach
A partial code can be completed by solving a subproblem (which is a
maximum clique problem) to optimality
During each iteration:
• All possible codewords with the required GC-content are examined in a
random order.
• Codewords are accepted for the second phase if feasible with those of the
partial code.
• A maximum clique problem is solved on the set of accepted codewords to
complete the partial code
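The completion step can be sketched in Python as below. For readability the sketch solves the maximum clique subproblem exactly with a small branch-and-bound, which is only practical on tiny instances; the method described above relies on heuristics for larger ones. RC is also enforced between a codeword and its own complement (an assumption), and all names are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(w1, w2):
    return sum(a != b for a, b in zip(w1, w2))


def wcc(word):
    return word[::-1].translate(_COMPLEMENT)


def compatible(w1, w2, d):
    """Edge of the graph: the two words may coexist in a code (HD and RC)."""
    return hamming(w1, w2) >= d and hamming(w1, wcc(w2)) >= d


def max_clique(vertices, adjacent):
    """Exact maximum clique by simple branch-and-bound (small graphs only)."""
    best = []

    def extend(clique, pool):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        if len(clique) + len(pool) <= len(best):
            return  # cannot improve on the best clique found so far
        for i, v in enumerate(pool):
            extend(clique + [v], [u for u in pool[i + 1:] if adjacent(v, u)])

    extend([], list(vertices))
    return best


def complete_partial_code(partial, d):
    n = len(partial[0])
    words = ("".join(p) for p in product("ACGT", repeat=n))
    # Second-phase candidates: right GC-content, feasible alone and with the partial code
    candidates = [w for w in words
                  if sum(ch in "GC" for ch in w) == n // 2
                  and hamming(w, wcc(w)) >= d
                  and all(compatible(w, p, d) for p in partial)]
    return list(partial) + max_clique(candidates, lambda a, b: compatible(a, b, d))


print(complete_partial_code(["CTTC", "CGAA", "TGGT", "GTGA"], 3))
```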
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Math. Modelling and
Algorithms 7, 311-326 (2008).
Montemanni, R., Smith, D.H. Heuristic algorithms for constructing binary
constant weight codes. IEEE Transactions on Information Theory 55(10), 4651-
4656 (2009)
32
Clique Search local search
Example: n = 4, d = 3. Constraints: HD, GC, RC
Partial code: CTTC CGAA TGGT GTGA
Maximum clique problem on feasible extensions of the partial
solution:
CACT AGTG
AAGC GCTT
Solution: CTTC CGAA TGGT GTGA CACT GCTT
33
Clique Search local search
• Solving a maximum clique problem (the sub-procedure) is itself an NP-hard
problem!
• Heuristics have to be used for the maximum clique problem
=> no optimality is guaranteed for the sub-problem solutions
• The choice of the number of codewords to eliminate is crucial:
- too many codewords eliminated => very large maximum
clique problem => high probability of suboptimal solutions
- not enough codewords eliminated => very likely to find a
code with the same number of codewords as the original
This aspect deserves a deeper study to tackle large problems!
34
Hybrid Search local search
Hybrid Search (HS)
Iterative approach
Merges the concepts of the two methods analyzed before.
A set of seed codewords is managed exactly as in Seed Building.
Seed codewords represent the partial code in the context of Clique
Search.
A relaxed distance d' < d is introduced.
A candidate codeword has to be at distance at least d from the seeds, and at least d' from
the other candidate codewords (this keeps the maximum clique problem at a
reasonable size!)
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
35
Hybrid Search local search
Seed Building
Clique Search
37
Hybrid Search local search
Example: n = 4, d = 3. Constraints: HD, GC, RC
Partial code (seed codewords): CAAC AGAG
Maximum clique problem on feasible extensions of the partial solution (heuristic
distance d'=1 to reduce the codewords considered):
TGGT
TCTC TGTC
TTGC TAGG
TACG ATGC
ACTC
Solution: CAAC AGAG TCTC TGGT TACG ATGC
38
Hybrid Search local search
• Combines the advantages of Seed Building with those of Clique Search
but…
• There is the risk of combining their drawbacks instead!
• The method deserves further detailed study for larger problems
39
Experimental comparison of some of the heuristic
algorithms
Experimental settings
Methods coded in ANSI C
Experiments on Dual AMD Opteron 250 2.4GHz / 4GB RAM
machines
Maximum computation times: 10'000 seconds (2.8 hours)
Statistics over 5 runs for each combination problem/method
A_4^{Cstrs}(5,3,2) identifies the problem with constraints Cstrs (HD is always
present, and therefore not listed), with n = 5, d = 3, and GC-content
= floor(n/2) = 2. [this funny notation comes from coding theory…]
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
40
Experimental comparison of some of the heuristic
algorithms
• SB = Seed Building
• CS = Clique Search
• HS = Hybrid Search
42
Experimental comparison of some of the heuristic
algorithms
Comments
• No clear ranking is possible among the methods considered:
Seed Building, Clique Search, and Hybrid Search
• Methods are therefore likely to represent different
neighbourhoods
43
Idea
• All the methods seen until now work on the search space of
feasible solutions (we never have constraints violated…)
• What if we move into the search space of infeasible solutions?
=> we will have to minimize (i.e. bring down to zero!) a
measure of infeasibility!
• This makes it possible to develop a completely different kind
of local search!
• It is likely that the search space is visited in a different way by
such a family of algorithms…
44
Iterated Greedy Search local search
Iterated Greedy Search (IGS)
Iterative approach
Works on an infeasible code W, trying to make it feasible.
Measure of the infeasibility of W: Inf(W) [formula not reproduced in this text],
where w = floor(n/2)
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
45
Iterated Greedy Search local search
Iterated Greedy Search (IGS)
An infeasible solution is obtained by adding a random codeword to a perturbed feasible
solution
During each iteration:
• A codeword σ is selected at random and the best (according to Inf(W)) change of one
letter of σ is carried out.
• If Inf(W) = 0, the code is feasible: we are done with it, and we can add a new random codeword
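One iteration of this scheme can be sketched in Python. The exact definition of Inf(W) is not reproduced in this text, so the sketch uses a hypothetical measure: the sum of all distance shortfalls max(0, d - H) over the HD and RC constraints (including RC of a codeword against itself) plus the deviation of each GC-content from w = floor(n/2). The values it produces are not claimed to match those of the original method.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(w1, w2):
    return sum(a != b for a, b in zip(w1, w2))


def wcc(word):
    return word[::-1].translate(_COMPLEMENT)


def inf(code, d):
    """Hypothetical infeasibility measure (an assumption); 0 means feasible."""
    w = len(code[0]) // 2
    total = 0
    for i, a in enumerate(code):
        total += abs(sum(ch in "GC" for ch in a) - w)  # GC deviation
        total += max(0, d - hamming(a, wcc(a)))         # RC against itself
        for b in code[i + 1:]:
            total += max(0, d - hamming(a, b))          # HD shortfall
            total += max(0, d - hamming(a, wcc(b)))     # RC shortfall
    return total


def igs_step(code, d, rng):
    """Pick a random codeword and apply its best one-letter change, never worsening Inf."""
    idx = rng.randrange(len(code))
    best_code, best_val = code, inf(code, d)
    for pos in range(len(code[idx])):
        for letter in "ACGT":
            if letter == code[idx][pos]:
                continue
            word = code[idx][:pos] + letter + code[idx][pos + 1:]
            trial = code[:idx] + [word] + code[idx + 1:]
            val = inf(trial, d)
            if val <= best_val:  # equal values are accepted (sideways moves)
                best_code, best_val = trial, val
    return best_code


rng = random.Random(0)
code = ["TGGT", "GGCA", "CGAA", "TCAC", "CCTT", "TTTG"]
for _ in range(50):
    code = igs_step(code, 3, rng)
print(code, inf(code, 3))
```

Whether sideways moves should be accepted is a design choice of this sketch.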
46
Iterated Greedy Search local search
Perturbation of
the solution
Optimization
of the solution
47
Iterated Greedy Search local search
Example: n = 4, d = 3. Constraints: HD, GC, RC
W Inf(W)
...
TGGT GACC CGAA TCAC CCTT 1
TGGT GACT CGAA TCAC CCTT 0
TGGT GGCA CGAA TCAC CCTT TTTG 8
TGGT GGCA CGTA TCAC CCTT TTTG 8
TGGT GGCA CGTA TCAC GCTT TTTG 7
TGGT GGCA CGTC TCAC GCTT TTTG 7
…
TGGT AGTG CGTC TCAC GCTT TTTG 4
TGGT AGTG CGTC TCAC GCTT TTCG 3
TGGT AGTG CTTC TCAC GCTT TTCG 0
TGGT AGTG GTAG TCAC GGTT TTCG AACT 9
TGGT AGTG GTAG TCTC GGTT TTCG AACT 9
...
48
Iterated Greedy Search local search
• We change exactly one letter of a random codeword at each
iteration: more complex neighbourhoods could be considered…
• We never accept changes that make the solution worse: accepting them might be
a way to escape from local minima
• Further investigation is warranted…
50
Experimental comparison of some of the heuristic
algorithms
• SB = Seed Building
• CS = Clique Search
• HS = Hybrid Search
• IGS = Iterated Greedy Search
52
Experimental comparison of some of the heuristic
algorithms
Comments
• No clear ranking is possible among the methods considered:
Seed Building, Clique Search, Hybrid Search and Iterated
Greedy Search
• Methods are likely to represent different neighbourhoods
54
A VNS algorithm for DNA codes design
A primitive Variable Neighbourhood Search (VNS) algorithm is
introduced.
It runs the local search algorithms seen before (its basic
ingredients) in turn, iteratively.
The reference solution for local searches is always the best solution
retrieved so far.
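The control loop described above can be sketched in Python as follows. The local searches are passed in as plain functions mapping a code to a (hopefully larger) code; the toy searches in the usage example are placeholders that stand in for the local searches, not implementations of them. The function name `vns` and the round limit are assumptions of this sketch (the experiments use a time limit).

```python
def vns(initial_code, local_searches, max_rounds):
    """Run the local searches in turn, always restarting from the best code so far."""
    best = list(initial_code)
    for _ in range(max_rounds):
        for search in local_searches:
            candidate = search(list(best))  # each search gets a copy of the best code
            if len(candidate) > len(best):  # objective: number of codewords
                best = candidate
    return best


# Toy placeholders (hypothetical): one search never helps, the other
# adds a dummy codeword until the code has three words.
def shrink(code):
    return code[:-1]


def grow_to_three(code):
    return code + ["w%d" % len(code)] if len(code) < 3 else code


print(vns([], [shrink, grow_to_three], max_rounds=5))  # ['w0', 'w1', 'w2']
```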
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes via a
variable neighbourhood search algorithm. Journal of Mathematical Modelling and
Algorithms 7, 311-326 (2008).
Montemanni, R., Smith, D.H. Heuristic algorithms for constructing binary constant
weight codes. IEEE Transactions on Information Theory 55(10), 4651-4656 (2009)
Montemanni, R., Smith, D.H., Koul, N. Three metaheuristics for the construction of
constant GC-content DNA codes. Post-proceedings of the VIII Metaheuristic
International Conference. S. Voss and M. Caserta eds., Springer (to appear)
55
A VNS algorithm for DNA codes design
Methods involved in
our implementation
56
A VNS algorithm for DNA codes design
• We hope to take advantage of the different philosophies behind the
local search methods listed before
• From previous experiments we know that the basic local searches
visit the search space in different ways
• We hope basic local searches will help each other to exit from
local minima within a VNS framework
58
Experimental comparison of some of the heuristic
algorithms
• SB = Seed Building
• CS = Clique Search
• HS = Hybrid Search
• IGS = Iterated Greedy Search
• VNS = Variable Neighbourhood
Search
60
Experimental comparison of some of the heuristic
algorithms
Comments
• No clear ranking is possible among the basic methods considered:
Seed Building, Clique Search, Hybrid Search and Iterated Greedy
Search (as seen before…)
Methods are likely to represent different neighbourhoods
• Variable Neighbourhood Search clearly dominates the other
methods
VNS takes advantage of the different neighbourhoods
VNS is likely to be competitive against all the other methods!
Reference algorithm
61
Experimental results of VNS
The VNS algorithm discussed in:
• Montemanni, R., Smith, D.H. (2008). Construction of constant GC-content DNA codes via a
Variable Neighbourhood Search Algorithm. Journal of Mathematical Modelling and
Algorithms, 7, 311-326.
is compared with the methods discussed in the following 6 papers [which provide all the best
known codes]:
• Li, M., Lee, H. J., Condon, A. E., and Corn, R. M. (2002). DNA word design strategy for
creating sets of non-interacting oligonucleotides for DNA microarrays. Langmuir, 18, 805-812.
• Tulpan, D. C., Hoos, H. H., and Condon, A. E. (2002). Stochastic local search algorithms for
DNA word design. Lecture Notes in Computer Science, Springer, 2568, 229-241.
• Tulpan, D. C. and Hoos, H. H. (2003). Hybrid randomised neighbourhoods improve
stochastic local search for DNA code design. Lecture Notes in Computer Science, Springer,
2671, 418-433.
• King, O. D. (2003). Bounds for DNA codes with constant GC-content. Electronic Journal of
Combinatorics, 10, #R33.
• Gaborit, P. and King, O. D. (2005). Linear construction for DNA codes. Theoretical
Computer Science, 334, 99-113.
• Chee, Y. M. and Ling, S. (2008). Improved lower bounds for constant GC-content DNA
codes. IEEE Transactions on Information Theory, 54(1), 391-394.
[Slide annotation: the papers above are grouped into theoretical constructions and heuristic algorithms]
62
Experimental results of VNS
Experimental settings
• Methods coded in ANSI C
• Experiments on Dual AMD Opteron 250 2.4GHz / 4GB RAM
machines
• Maximum computation times: 100'000 seconds (27.8 hours)
=> Comparable with that of other heuristic algorithms
• Best over 5 runs for each combination problem/method
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
63
• We will consider 254 problems with
- 4 ≤ n ≤ 20
- 3 ≤ d ≤ n ≤ 20
- Case 1: HD and GC constraints
- Case 2: HD, RC and GC constraints
• These settings match those of the state-of-the-art tables
maintained at http://llama.med.harvard.edu/~king/dnacodes.html by O.D.
King (last checked November 2009)
• We left out problems corresponding to very large codes (the
current VNS algorithm cannot tackle them)
Experimental results of VNS
64
• over 254 problems considered:
• in 128 cases the best known result is matched
• in 52 cases a new best result is found
Experimental results of VNS
Montemanni R., Smith D.H. Construction of constant GC-content DNA codes
via a variable neighbourhood search algorithm. Journal of Mathematical
Modelling and Algorithms 7, 311-326 (2008).
65
Detailed results of VNS
69
• After the publication of the paper we have been improving
the VNS algorithm in many ways (work still in progress!)
• over 254 problems considered:
• in 132 cases (previously 128) the best known result is matched
• in 87 cases (previously 52) a new best result is found
• We miss the best known solution in only 13.8% of the cases (35 out of 254)!
• We feel there is room for further improvements…
Experimental results of VNS
Montemanni, R., Smith D.H. Metaheuristics for the construction of constant GC-
content DNA codes. Proceedings of the MIC 2009 Conference (2009)
Montemanni, R., Smith, D.H., Koul, N. Three metaheuristics for the construction of
constant GC-content DNA codes. Post-proceedings of the VIII Metaheuristic
International Conference. S. Voss and M. Caserta eds., Springer (to appear)
70
Detailed results of VNS
Comments
• VNS works (slightly) better on problems with RC constraints
• Result also confirmed by our latest improved implementations
• Is this because the other methods are more competitive
without RC constraints?
YES => we might not have much chance to improve
on problems without RC constraints
NO => we probably have a chance to improve on problems
without RC constraints
=> Worth investigating!
72
Essential bibliography (1/4)
[HEUR] => Heuristics related publication.
Brenner, S., Lerner, R.A. (1992). Encoded combinatorial chemistry. Proceedings of the
National Academy of Sciences USA, 89, 5381-5383.
Adleman, L. (1994) Molecular computation of solutions to combinatorial problems. Science,
266, 1021-1024.
Frutos, A.G., Liu, Q., Thiel, A.J., Sanner, A.M.W., Condon, A.E., Smith, L.M., Corn, R.M.
(1997). Demonstration of a word design strategy for DNA computing on surfaces. Nucleic
Acids Research, 25, 4748-4757.
Hansen, P., Mladenovic, N. (2001). Variable neighbourhood search: principles and
applications. European Journal of Operational Research, 130, 449-467. [HEUR]
Marathe, A., Condon, A.E., Corn, R.M.. (2001). On combinatorial DNA word design.
Journal of Computational Biology, 8, 201-219.
Arita, M., Kobayashi, S. (2002). DNA sequence design using templates. New Generation
Computing, 20, 263-277.
73
Essential bibliography (2/4)
Li, M., Lee, H.J., Condon, A.E., Corn, R.M. (2002). DNA word design strategy for creating
sets of non-interacting oligonucleotides for DNA microarrays. Langmuir, 18, 805-812.
Tulpan, D.C., Hoos, H.H., Condon, A.E. (2002). Stochastic local search algorithms for DNA
word design. Lecture Notes in Computer Science, Springer, Berlin, 2568, 229-241.
[HEUR]
Tulpan, D.C., Hoos, H.H. (2003). Hybrid randomised neighbourhoods improve stochastic
local search for DNA code design. Lecture Notes in Computer Science, Springer, Berlin,
2671, 418-433. [HEUR]
King, O.D. (2003). Bounds for DNA codes with constant GC-content. Electronic Journal of
Combinatorics, 10, #R33. [HEUR]
Kobayashi, S., Kondo, T., Arita, M. (2003). On template methods for DNA sequence design.
Lecture Notes in Computer Science, 2568, 205-214.
Hoos, H.H., Stuetzle, T. (2004). Stochastic Local Search: foundations and applications.
Morgan Kaufmann/Elsevier. [HEUR]
74
Essential bibliography (3/4)
Gaborit, P., King, O.D. (2005). Linear construction for DNA codes. Theoretical Computer
Science, 334, 99-113. [HEUR]
Tulpan, D.C. (2006). Effective heuristic methods for DNA strand design. PhD thesis,
University of British Columbia. [HEUR]
King, O.D. (2006). Tables of lower bounds for DNA codes with constant GC-content.
http://llama.med.harvard.edu/~king/dnacodes.html, last checked: November 2009. [HEUR]
Chee, Y. M, Ling, S. (2008). Improved lower bounds for constant GC-content DNA codes.
IEEE Transactions on Information Theory, 54(1), 391-394. [HEUR]
Montemanni, R., Smith, D.H. (2008). Construction of constant GC-content DNA codes via a
Variable Neighbourhood Search Algorithm. Journal of Mathematical Modelling and
Algorithms, 7, 311-326. [HEUR]
Montemanni, R., Smith, D.H. (2009). Heuristic algorithms for constructing binary constant
weight codes. IEEE Transactions on Information Theory 55(10), 4651-4656. [HEUR]
Montemanni, R., Smith D.H. (2009). Metaheuristics for the construction of constant GC-
content DNA codes. Proceedings of the MIC 2009 Conference. [HEUR]
75
Essential bibliography (4/4)
Montemanni, R., Smith D.H., Koul, N. (2010). Three metaheuristics for the construction of
constant GC-content DNA codes. Post-proceedings of the VIII Metaheuristic International
Conference. S. Voss and M. Caserta eds., Springer. [HEUR]
Tulpan, D., Montemanni, R., Ghiggi, A. (2010). Computational Sequence Design
Techniques for DNA Microarray Technologies. Submitted for publication. [HEUR]
Ghiggi, A. (2010). DNA strand design with thermodynamic constraints. Master thesis, USI.
[HEUR]
Koul, N. (2010). Heuristic Algorithms for Construction of Constant GC content DNA codes.
Master thesis, USI. [HEUR]
Neelakandan, I. (2010). New Approaches for Constructing Constant Weight Binary Codes.
Master thesis, USI. [HEUR]