
*Supported by AFOSR and Air Force Research Lab, Rome NY

SynDCode: A Tool for Biomolecular Nanotechnology and Engineering

Anthony J. Macula, Morgan Bishop, Thomas Renz

Undergraduate Students: Jackie Dresch, Niels Hanson

Outline

• DNA Hybridization/Cross Hybridization
• DNA Codes
• Nearest Neighbor Thermodynamics
  – complete computations
  – bounds
• Overview of Applications and Purposes
• DNA Bitstring Library
• Biomolecular Computing
• Specific Design Techniques and Tools
  – SynDCode
• Comparison to Other Tools
  – SLSDesigner

This research addresses the development of an engineering tool that generates blueprints for synthetic DNA for biomolecular nanotechnology, computing, and medical applications.

In this research, we developed methods and tools for generating large collections of single-stranded DNA sequences. Such a collection is called a DNA code.

DNA Hybridization

• DNA strands are modeled by directed 3’ --> 5’ sequences of letters from the alphabet {A, C, G, T}.

• (A, T) and (C, G) are complementary pairs.

• Two oppositely directed DNA sequences are capable of coalescing into a duplex.

• Because an A (C) in one strand can (usually) only bind to a T (G) in the oppositely directed strand, the greatest energy of duplex formation is obtained when the two sequences are reverse-complements (complements)
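The pairing rule in the bullets above can be sketched in a few lines of Python. This is only an illustration: the names `COMPLEMENT`, `reverse_complement` and `is_wc_pair` are hypothetical and are not part of SynDCode. Strands are assumed to be plain strings over {A, C, G, T}, both written in the same direction.

```python
# Complementary pairs from the slide: (A, T) and (C, G).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Reverse the strand and complement every base."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def is_wc_pair(x: str, y: str) -> bool:
    """True when y is the reverse-complement of x (a Watson-Crick pair)."""
    return y.upper() == reverse_complement(x)


# Two strands that appear in the DNA code shown on the slides.
print(reverse_complement("TACGCGACTTTC"))          # GAAAGTCGCGTA
print(is_wc_pair("ATCAAACGATGC", "GCATCGTTTGAT"))  # True
print(is_wc_pair("TACGCGACTTTC", "GCATCGTTTGAT"))  # False
```

A check like the last line only says the two strands are not perfect reverse-complements. Whether they still cross-hybridize is a thermodynamic question that this sketch does not address.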

A DNA Code

Coding strands for ligation and probing complement strands for reading. Each row pairs a strand with its reverse-complement, both written 5’ to 3’:

TACGCGACTTTC   GAAAGTCGCGTA
ATCAAACGATGC   GCATCGTTTGAT
TGTGTGCTCGTC   GACGAGCACACA
ATTTTTGCGTTA   TAACGCAAAAAT
CACTAAATACAA   TTGTATTTAGTG
GAAAAAGAAGAA   TTCTTCTTTTTC

Must have: Watson-Crick (WC) duplexes, e.g. 5’TACGCGACTTTC3’ with 5’GAAAGTCGCGTA3’, and ATCAAACGATGC with GCATCGTTTGAT.

Must avoid: cross-hybridized (CH) duplexes, e.g. TACGCGACTTTC with GCATCGTTTGAT, and ATTTTTGCGTTA with GAAAAAGAAGAA.

x = 5’ ggCaCaTcatAct 3’
y = 5’ agatgGttAAccT 3’

5’ ggCaCaTcatAct 3’
3’ TccAAttGgtaga 5’

5’ ggCaCaTcatAct 3’
5’ AggTTaaCcatct 3’

5’ g g c a c a 3’
3’ c c g t g t 5’

5’ g g c a c a 3’
5’ g g c a c a 3’

     AA   AC   AG   AT   CA   CC   CG   CT   GA   GC   GG   GT   TA   TC   TG   TT
AA    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
AC    0 1.44    0    0    0 0.52    0    0    0    0    0    0    0    0    0    0
AG    0 0.13 1.28    0    0    0    0    0    0    0    0    0    0    0    0    0
AT    0    0    0 0.88    0    0    0    0    0    0    0    0    0    0    0    0
CA    0    0    0    0 1.45    0    0    0    0    0    0    0    0    0    0    0
CC    0    0    0    0    0 1.84    0    0    0    0    0    0    0    0    0    0
CG    0    0    0    0 0.47 0.11 2.17    0    0    0    0    0    0    0    0    0
CT    0    0    0    0 0.12 0.32    0 1.28    0    0    0    0    0    0    0    0
GA    0    0    0    0    0    0    0    0  1.3 0.25    0    0    0    0    0    0
GC    0 0.59    0    0    0 1.11    0    0    0 2.24    0 0.27    0 0.25    0    0
GG    0    0 0.32    0    0    0 0.11    0    0 1.11 1.84 0.52    0    0    0    0
GT    0    0    0    0    0    0    0 0.13    0 0.59    0 1.44    0    0    0    0
TA    0    0    0    0    0    0    0    0    0    0    0    0 0.58    0    0    0
TC    0    0    0    0    0    0    0    0    0    0    0    0    0  1.3    0    0
TG    0    0 0.12    0    0    0 0.47    0    0    0    0    0    0    0 1.45    0
TT    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

Watson-Crick Nearest Neighbor Computation

5’ g g c a c a 3’
5’ g g c a c a 3’

gg 1.84 + gc 2.24 + ca 1.45 + ac 1.44 + ca 1.45

NNFE = 8.42
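The Watson-Crick computation for ggcaca can be reproduced with a short Python sketch. It assumes that the diagonal entries of the 16 x 16 table are the weights of perfectly stacked nearest-neighbor pairs, which is consistent with the slide's total of 8.42. The names `WC_NN` and `wc_nnfe` are hypothetical.

```python
# Diagonal of the nearest-neighbor table on the slides (assumed to be the
# weights of perfectly matched dinucleotide stacks).
WC_NN = {
    "AA": 1.00, "AC": 1.44, "AG": 1.28, "AT": 0.88,
    "CA": 1.45, "CC": 1.84, "CG": 2.17, "CT": 1.28,
    "GA": 1.30, "GC": 2.24, "GG": 1.84, "GT": 1.44,
    "TA": 0.58, "TC": 1.30, "TG": 1.45, "TT": 1.00,
}


def wc_nnfe(strand: str) -> float:
    """Sum the table weights over every adjacent pair of bases in the strand."""
    s = strand.upper()
    return sum(WC_NN[s[i:i + 2]] for i in range(len(s) - 1))


# gg + gc + ca + ac + ca = 1.84 + 2.24 + 1.45 + 1.44 + 1.45
print(round(wc_nnfe("ggcaca"), 2))  # 8.42
```

The loop visits n - 1 adjacent pairs, so the Watson-Crick value of a single strand costs linear time in this sketch.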

WC Duplex

5’ g g C a C a T c a t A c t 3’
5’ A g g T T a a C c a t c t 3’

1.84 + 1.45 + 0.88 + 1.28

5’ ggCaCaTcatAct 3’
3’ TccAAttGgtaga 5’


Cross Hybridized Nearest Neighbor Upper Bound Computation

NNFE ≲ 5.45

NNFE ≲ 5.45 + .27 = 5.72

5’ g gC aCaTcatAct 3’
3’ Tc cA AttGgtaga 5’

2.90 (5.3)   2.20 (5.3)

[Figure labels: symmetric loop, asymmetric loop]

Intermolecular Interactions: Duplexes

CAAGACTTTTTGGTAGTAAA

***TTTCCC*********GGAA***GGGAAA***********TTCC***

Intramolecular Interactions

NNFE ≲ 5.66

NNFE ≲ 5.66 + .59 + .32 = 6.57

[Figure: Precise FE versus our FE bound. Length 16435 random; correlation = .737]

DNA codes serve as universal components for biomolecular computing. DNA codes are closed under reverse-complementation. The strands in a DNA code have such binding specificity that a code strand will only hybridize with its reverse-complement and will not cross hybridize with any other code strand in the DNA code.

Such collections of strands are crucial to the success of biomolecular computing and biomolecular nanotechnology.
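Closure under reverse-complementation is easy to test mechanically. The sketch below uses the twelve 12-mers shown earlier on the DNA code slide. The function names are hypothetical, and the check covers only the closure property, not the absence of cross hybridization.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def is_closed_under_reverse_complement(code) -> bool:
    """True when the reverse-complement of every strand is also in the code."""
    strands = {s.upper() for s in code}
    return all(reverse_complement(s) in strands for s in strands)


# The six strand / reverse-complement pairs from the DNA code slide.
DNA_CODE = [
    "TACGCGACTTTC", "GAAAGTCGCGTA",
    "ATCAAACGATGC", "GCATCGTTTGAT",
    "TGTGTGCTCGTC", "GACGAGCACACA",
    "ATTTTTGCGTTA", "TAACGCAAAAAT",
    "CACTAAATACAA", "TTGTATTTAGTG",
    "GAAAAAGAAGAA", "TTCTTCTTTTTC",
]

print(is_closed_under_reverse_complement(DNA_CODE))       # True
print(is_closed_under_reverse_complement(DNA_CODE[:-1]))  # False
```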

The basic idea is to have correct, parallel, and autonomous addressing.

“Characterization of synthetic DNA bar codes in Saccharomyces cerevisiae gene-deletion strains” (Eason et al., PNAS).

DNA codes for self-assembly of any components that can be attached to DNA. Their size presents the potential for increased complexity and location control in nanostructures produced by assembly that is driven by DNA duplex formation. Fundamental physical limits and increasing costs of fabrication facilities will force alternatives to conventional microelectronics manufacturing to be developed.

In self-assembly, weak, local interactions among molecular components spontaneously organize those components into aggregates with properties that range from simple to complex.

DNA memory: The capacity and storage density of such memories are potentially very large. Information could be mined through massively parallel template-matching reactions. In addition, information could be processed based upon context, and information matched associatively based upon content.

Advantages of DNA Computing
- Massive parallelism of DNA strands
- High density of information storage
- Ease of constructing many copies of huge data
- New data structures (very long words, non volatile)
- New operations (cut, paste, adjoin, insert, delete, multiply…)
- New algorithms and computability
- Reconfigurable hardware and hybrid computers: DNA, cell, optical, electronic

“DNA Computer” Performance Evaluation

• Information density: 10^15 CDs per cm^3

• Massively parallel information processing:
  10^6 ops/sec for PCs
  10^12 ops/sec for supercomputers
  10^20 ops/sec possible for DNA
  DNA computers would be > 1,000,000 times faster than any computer today.

• Energy efficiency:
  2 * 10^19 operations/joule for DNA
  10^9 operations/joule for silicon-based computers

DNA Computing Potential

DNA Code and DNA Bitstring Library

A A A A A A A A C C = T1    G G T T T T T T T T = BEAD PROBE (T1)
T T T C C A A A A A = F1    T T T T T G G A A A = BEAD PROBE (F1)
T T T C T T A A C C = T2    G G T T A A G A A A = BEAD PROBE (T2)
A C T A A C A A A A = F2    T T T T G T T A G T = BEAD PROBE (F2)
C A T A A A A C A C = T3    G T G T T T T A T G = BEAD PROBE (T3)
A T C T T T T C A A = F3    T T G A A A A G A T = BEAD PROBE (F3)
C A A T C C A T T A = T4    T A A T G G A T T G = BEAD PROBE (T4)
C C T T C T A A A T = F4    A T T T A G A A G G = BEAD PROBE (F4)
A C T C C T A A T A = T5    T A T T A G G A G T = BEAD PROBE (T5)
T C T C T C T A C T = F5    A G T A G A G A G A = BEAD PROBE (F5)

1. A A A A A A A A C C - T T T C T T A A C C - C A T A A A A C A C - T4 - T5
2. A A A A A A A A C C - T T T C T T A A C C - C A T A A A A C A C - T4 - F5
3. A A A A A A A A C C - T T T C T T A A C C - C A T A A A A C A C - F4 - T5
4. A A A A A A A A C C - T T T C T T A A C C - C A T A A A A C A C - F4 - F5
5. A A A A A A A A C C - T T T C T T A A C C - A T C T T T T C A A - T4 - T5
6. A A A A A A A A C C - T T T C T T A A C C - A T C T T T T C A A - T4 - F5
7. A A A A A A A A C C - T T T C T T A A C C - A T C T T T T C A A - F4 - T5
8. A A A A A A A A C C - T T T C T T A A C C - A T C T T T T C A A - F4 - F5
9. A A A A A A A A C C - A C T A A C A A A A - C A T A A A A C A C - T4 - T5
10. A A A A A A A A C C - A C T A A C A A A A - C A T A A A A C A C - T4 - F5
11. A A A A A A A A C C - A C T A A C A A A A - C A T A A A A C A C - F4 - T5
12. A A A A A A A A C C - A C T A A C A A A A - C A T A A A A C A C - F4 - F5
13. A A A A A A A A C C - A C T A A C A A A A - A T C T T T T C A A - T4 - T5
14. A A A A A A A A C C - A C T A A C A A A A - A T C T T T T C A A - T4 - F5
15. A A A A A A A A C C - A C T A A C A A A A - A T C T T T T C A A - F4 - T5
16. A A A A A A A A C C - A C T A A C A A A A - A T C T T T T C A A - F4 - F5
17. T T T C C A A A A A - T T T C T T A A C C - C A T A A A A C A C - T4 - T5
18. T T T C C A A A A A - T T T C T T A A C C - C A T A A A A C A C - T4 - F5
19. T T T C C A A A A A - T T T C T T A A C C - C A T A A A A C A C - F4 - T5
20. T T T C C A A A A A - T T T C T T A A C C - C A T A A A A C A C - F4 - F5
21. T T T C C A A A A A - T T T C T T A A C C - A T C T T T T C A A - T4 - T5
22. T T T C C A A A A A - T T T C T T A A C C - A T C T T T T C A A - T4 - F5
23. T T T C C A A A A A - T T T C T T A A C C - A T C T T T T C A A - F4 - T5
24. T T T C C A A A A A - T T T C T T A A C C - A T C T T T T C A A - F4 - F5
25. T T T C C A A A A A - A C T A A C A A A A - C A T A A A A C A C - T4 - T5
26. T T T C C A A A A A - A C T A A C A A A A - C A T A A A A C A C - T4 - F5
27. T T T C C A A A A A - A C T A A C A A A A - C A T A A A A C A C - F4 - T5
28. T T T C C A A A A A - A C T A A C A A A A - C A T A A A A C A C - F4 - F5
29. T T T C C A A A A A - A C T A A C A A A A - A T C T T T T C A A - T4 - T5
30. T T T C C A A A A A - A C T A A C A A A A - A T C T T T T C A A - T4 - F5
31. T T T C C A A A A A - A C T A A C A A A A - A T C T T T T C A A - F4 - T5
32. T T T C C A A A A A - A C T A A C A A A A - A T C T T T T C A A - F4 - F5
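The 32-strand library is every way of choosing the T or the F codeword at each of the five bit positions. A minimal Python sketch of that construction follows. The names `CODE` and `build_library` are hypothetical, and the hyphen separator is kept only for readability.

```python
from itertools import product

# 10-mer codewords from the slide: bit position -> (T codeword, F codeword).
CODE = {
    1: ("AAAAAAAACC", "TTTCCAAAAA"),
    2: ("TTTCTTAACC", "ACTAACAAAA"),
    3: ("CATAAAACAC", "ATCTTTTCAA"),
    4: ("CAATCCATTA", "CCTTCTAAAT"),
    5: ("ACTCCTAATA", "TCTCTCTACT"),
}


def build_library(code):
    """Return (bits, strand) for every T/F choice, in the slide's order."""
    positions = sorted(code)
    library = []
    # True is listed before False so that strand 1 is T1-T2-T3-T4-T5.
    for bits in product((True, False), repeat=len(positions)):
        words = [code[p][0 if bit else 1] for p, bit in zip(positions, bits)]
        library.append((bits, "-".join(words)))
    return library


library = build_library(CODE)
print(len(library))   # 32
print(library[0][1])  # strand 1: T1-T2-T3-T4-T5
print(library[8][0])  # strand 9: (True, False, True, True, True)
```

With n bit positions the library has 2^n strands, which is why only 2n codewords (plus their probes) need to be designed.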

DNA CODE

DNA LIBRARY = DNA BITSTRINGS

DNA Computing Strand Engineering

• No codeword-codeword CH (cc-CH)
• No codeword-probe CH (cp-CH)
• No probe-probe CH (pp-CH)

C C A A C C A A A A A A = T1    T T T T T T G G T T G G = Probe(T1)
A A A A A A A C C A C C = F1    G G T G G T T T T T T T = Probe(F1)
T T T T T C C T T C C A = T2    T G G A A G G A A A A A = Probe(T2)
T T A C C T C A A A C C = F2    G G T T T G A G G T A A = Probe(F2)
T T T C A C A A C T C C = T3    G G A G T T G T G A A A = Probe(T3)
T T C A A T C C A C A A = F3    T T G T G G A T T G A A = Probe(F3)
T C A C T C T C T C A A = T4    T T G A G A G A G T G A = Probe(T4)
T C T T T C T C C T C T = F4    A G A G G A G A A A G A = Probe(F4)
C A T C T C A C C A T C = T5    G A T G G T G A G A T G = Probe(T5)
A A C A C T A C A C A C = F5    G T G T G T A G T G T T = Probe(F5)


Only Allowed Hybridizations

No cc-CH

No cp-CH

No pp-CH


DNA Computing Strand Engineering: No codeword cp-CH

Library strand T1-F2-F3-T4-T5 (bitstring 1 0 0 1 1):

C A A C C A A A A A A - T T A C C T C A A A C C - T T C A A T C C A C A A - T C A C T C T C T C A A - C A T C T C A C C A T C

PROBE(F2): G G T T T G A G G T A A

Yes, WC bonding. Yes, bitstring is F2. Good read.


Library strand T1-F2-F3-T4-T5:

C A A C C A A A A A A - T T A C C T C A A A C C - T T C A A T C C A C A A - T C A C T C T C T C A A - C A T C T C A C C A T C

PROBE(T2): G G A G T T G T G A A

Darn! CH bonding. No, bitstring is not T2. Bad read.
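The intended reading behavior, with no codeword-probe cross hybridization, can be mimicked by an idealized model in which a probe binds only where its exact reverse-complement occurs in the strand. The Python sketch below makes that simplifying assumption: it cannot reproduce a bad read, because it ignores thermodynamics entirely. The function names are hypothetical and the 12-mer codewords are taken from the code table on the slides.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# 12-mer codewords from the slides: bit position -> (T codeword, F codeword).
CODE = {
    1: ("CCAACCAAAAAA", "AAAAAAACCACC"),
    2: ("TTTTTCCTTCCA", "TTACCTCAAACC"),
    3: ("TTTCACAACTCC", "TTCAATCCACAA"),
    4: ("TCACTCTCTCAA", "TCTTTCTCCTCT"),
    5: ("CATCTCACCATC", "AACACTACACAC"),
}


def reverse_complement(strand: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def probe_binds(strand: str, probe: str) -> bool:
    """Idealized WC-only model: the probe binds iff its exact
    reverse-complement appears somewhere in the strand."""
    return reverse_complement(probe) in strand


def read_bitstring(strand: str, code) -> list:
    """Read each bit by testing the T probe and the F probe for that position."""
    bits = []
    for position in sorted(code):
        t_word, f_word = code[position]
        t_hit = probe_binds(strand, reverse_complement(t_word))
        f_hit = probe_binds(strand, reverse_complement(f_word))
        if t_hit == f_hit:
            raise ValueError(f"ambiguous read at position {position}")
        bits.append(1 if t_hit else 0)
    return bits


# Library strand T1-F2-F3-T4-T5 built from the full codewords.
strand = CODE[1][0] + CODE[2][1] + CODE[3][1] + CODE[4][0] + CODE[5][0]
print(probe_binds(strand, "GGTTTGAGGTAA"))  # Probe(F2): True
print(probe_binds(strand, "TGGAAGGAAAAA"))  # Probe(T2): False
print(read_bitstring(strand, CODE))         # [1, 0, 0, 1, 1]
```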


DNA Computing Strand Engineering: No codeword pp-CH, cc-CH

Library strand T1-F2-F3-T4-T5:

C A A C C A A A A A A - T T A C C T C A A A C C - T T C A A T C C A C A A - T C A C T C T C T C A A - C A T C T C A C C A T C

PROBE(F2): G G T T T G A G G T A A

PROBE(T4): T T G A G A G A G T G

pp-CH interferes with reading (bonding site competition).

cc-CH interferes with separation and leads to unwanted library strand interaction.

Library strand F1-F2-T3-T4-F5:

T T T C C A A A A - A T T A C C T C A A A C C - T T T C A C A A C T C C - T C A C T C T C T C A A - A A C A C T A C A C A C


DNA Computing Strand Engineering: No junction CH


Intra-strand junctions from coding strands in the coding library:

A A A A A A - T T T T T C = T1T2
A C C A C C - T T T T T C = F1T2
A A A A A A - T T A C C T = T1F2
A C C A C C - T T A C C T = F1F2
C T T C C A - T T T C A C = T2T3
C A A A C C - T T T C A C = F2T3
C T T C C A - T T C A A T = T2F3
C A A A C C - T T C A A T = F2F3
A A C T C C - T C A C T C = T3T4
C C A C A A - T C A C T C = F3T4
A A C T C C - T C T T T C = T3F4
C C A C A A - T C T T T C = F3F4
T C T C A A - C A T C T C = T4T5
T C C T C T - C A T C T C = F4T5
T C T C A A - A A C A C T = T4F5
T C C T C T - A A C A C T = F4F5

No CH between junctions and probes.
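Each junction listed above is the last six bases of one codeword followed by the first six bases of the codeword at the next bit position. A small Python sketch that regenerates the sixteen junctions from the 12-mer code is given here. The names `junction` and `all_junctions` are hypothetical, and the half-length of 6 is read off the slide.

```python
from itertools import product

# 12-mer codewords from the slides: bit position -> {"T": word, "F": word}.
CODE = {
    1: {"T": "CCAACCAAAAAA", "F": "AAAAAAACCACC"},
    2: {"T": "TTTTTCCTTCCA", "F": "TTACCTCAAACC"},
    3: {"T": "TTTCACAACTCC", "F": "TTCAATCCACAA"},
    4: {"T": "TCACTCTCTCAA", "F": "TCTTTCTCCTCT"},
    5: {"T": "CATCTCACCATC", "F": "AACACTACACAC"},
}


def junction(left: str, right: str, half: int = 6) -> str:
    """Tail of the left codeword followed by the head of the right codeword."""
    return left[-half:] + right[:half]


def all_junctions(code):
    """Map labels such as 'T1F2' to the junction between adjacent positions."""
    result = {}
    positions = sorted(code)
    for p, q in zip(positions, positions[1:]):
        for a, b in product("TF", repeat=2):
            result[f"{a}{p}{b}{q}"] = junction(code[p][a], code[q][b])
    return result


junctions = all_junctions(CODE)
print(len(junctions))     # 16
print(junctions["T1T2"])  # AAAAAATTTTTC
print(junctions["F4F5"])  # TCCTCTAACACT
```

A design tool would then have to screen these junction 12-mers against every probe as well, since a probe can bind across a boundary between two codewords.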

Library strand T1-F2-F3-T4-T5, with junctions marked as T1-(T1F2)(F2F3)(F3T4)(T4T5)-T5:

C A A C C A (A A A A A - T T A C C T) (C A A A C C - T T C A A T) (C C A C A A - T C A C T C) (T C T C A A - C A T C T C) A C C A T C

PROBE(T2): G G A G T T G T G A A

Darn! Junction CH bonding. No, bitstring is not T2. Bad read.

DNA Computing Tools: SynDCode

Verified and Extended codewords

• Both the original and “improved” codewords examined by Feldkamp et al. were then run through SynDCode and our uniqueness parameters were obtained.

RESEARCHER/CODEWORDS USED; # OF PAIRS; # NON INTERACTING PAIRS; WC LOWER BOUND; WC UPPER BOUND; MAX 1-STEMS; MAX 2-STEMS

DEATON
cttgtgaccgcttctggggacattggcggcgcgtaggcttatagagtggatagttctggggatggtgcttagagaagtggtgtatctcgttttaacatccgaaaaaggaccaaaagagagttgtaagcctactgcgtgac
14 107 2 23.33 31.35

DEATON (SeqGEN)
aaagccgtcgtttaaggaccaccattttggaggtggaacgtatatcgtagagccacacgctccgcgtactgataatcctcatatgcttaggcacggttggtctcgtgaattggtctggacttactcatctctgtgacgcc
13 87 4 26.27 27.16

• Then we extended these codes, using SynDCode’s “Extend” feature.

Verified and Extended codewords

Make Extend

[A,A,A,G,C,C,G,T,C,G,T,T,T,A,A,G,G,A,C,C],

[G,G,T,C,C,T,T,A,A,A,C,G,A,C,G,G,C,T,T,T],

[A,C,C,A,T,T,T,T,G,G,A,G,G,T,G,G,A,A,C,G],

[C,G,T,T,C,C,A,C,C,T,C,C,A,A,A,A,T,G,G,T],

[T,A,T,A,T,C,G,T,A,G,A,G,C,C,A,C,A,C,G,C],

[G,C,G,T,G,T,G,G,C,T,C,T,A,C,G,A,T,A,T,A],

[T,C,C,G,C,G,T,A,C,T,G,A,T,A,A,T,C,C,T,C],

[G,A,G,G,A,T,T,A,T,C,A,G,T,A,C,G,C,G,G,A],

[A,T,A,T,G,C,T,T,A,G,G,C,A,C,G,G,T,T,G,G],

[C,C,A,A,C,C,G,T,G,C,C,T,A,A,G,C,A,T,A,T],

[T,C,T,C,G,T,G,A,A,T,T,G,G,T,C,T,G,G,A,C],

[G,T,C,C,A,G,A,C,C,A,A,T,T,C,A,C,G,A,G,A],

[T,T,A,C,T,C,A,T,C,T,C,T,G,T,G,A,C,G,C,C],

[G,G,C,G,T,C,A,C,A,G,A,G,A,T,G,A,G,T,A,A],

[C,C,C,C,C,C,C,C,C,C,A,A,A,A,A,A,A,A,A,A],

[T,T,T,T,T,T,T,T,T,T,G,G,G,G,G,G,G,G,G,G],

[C,C,C,G,G,G,G,G,G,G,A,A,A,T,T,T,T,T,T,T],

[A,A,A,A,A,A,A,T,T,T,C,C,C,C,C,C,C,G,G,G],

[C,C,C,C,C,T,T,T,G,G,T,T,C,C,C,T,T,T,T,T],

[A,A,A,A,A,G,G,G,A,A,C,C,A,A,A,G,G,G,G,G],

[T,T,T,T,T,T,A,A,A,C,C,G,C,C,C,A,C,C,C,T],

[A,G,G,G,T,G,G,G,C,G,G,T,T,T,A,A,A,A,A,A],

[T,T,T,T,A,G,G,A,T,T,T,T,C,G,G,C,C,C,C,C],

[G,G,G,G,G,C,C,G,A,A,A,A,T,C,C,T,A,A,A,A],

[A,A,G,G,T,T,G,G,G,G,T,T,T,T,C,C,C,A,T,G],

[C,A,T,G,G,G,A,A,A,A,C,C,C,C,A,A,C,C,T,T],

[A,A,G,A,A,A,A,T,G,G,G,G,G,G,C,A,A,A,C,A],

[T,G,T,T,T,G,C,C,C,C,C,C,A,T,T,T,T,C,T,T],

[A,G,C,C,C,C,A,A,A,C,T,T,T,T,A,A,C,G,G,G],

[C,C,C,G,T,T,A,A,A,A,G,T,T,T,G,G,G,G,C,T],

[C,T,T,G,A,A,A,C,C,C,T,C,C,A,G,G,G,G,A,A],

[T,T,C,C,C,C,T,G,G,A,G,G,G,T,T,T,C,A,A,G],

[A,A,A,A,C,C,C,G,G,C,T,C,T,C,G,A,A,A,A,A],

[T,T,T,T,T,C,G,A,G,A,G,C,C,G,G,G,T,T,T,T],

[G,G,G,C,T,T,T,T,T,T,C,T,C,C,T,T,C,C,C,C],

[G,G,G,G,A,A,G,G,A,G,A,A,A,A,A,A,G,C,C,C],

[A,A,A,A,G,G,C,C,A,C,C,G,T,T,T,C,T,T,T,G],

[C,A,A,A,G,A,A,A,C,G,G,T,G,G,C,C,T,T,T,T],

[A,C,G,A,A,A,G,T,G,T,T,T,T,G,G,T,C,C,C,C],

[G,G,G,G,A,C,C,A,A,A,A,C,A,C,T,T,T,C,G,T],

[C,C,C,A,A,T,T,C,C,C,A,A,A,G,G,T,C,C,A,A],

[T,T,G,G,A,C,C,T,T,T,G,G,G,A,A,T,T,G,G,G],

[A,A,G,T,C,T,C,T,C,C,C,C,A,A,C,A,T,C,C,C],

[G,G,G,A,T,G,T,T,G,G,G,G,A,G,A,G,A,C,T,T],

[C,C,T,T,G,G,C,A,A,C,T,T,A,A,C,C,C,G,T,A],

[T,A,C,G,G,G,T,T,A,A,G,T,T,G,C,C,A,A,G,G],

[C,T,T,T,T,G,T,T,G,A,G,C,A,A,A,G,G,G,C,C],

[G,G,C,C,C,T,T,T,G,C,T,C,A,A,C,A,A,A,A,G],

[A,A,T,T,G,A,A,A,T,T,G,T,C,G,G,C,G,G,T,T],

[A,A,C,C,G,C,C,G,A,C,A,A,T,T,T,C,A,A,T,T],

[A,A,A,C,A,A,A,A,A,C,A,A,C,A,G,G,C,C,C,C],

[G,G,G,G,C,C,T,G,T,T,G,T,T,T,T,T,G,T,T,T],

[A,T,C,A,C,A,A,T,C,C,C,C,C,C,T,G,T,G,T,T],

[A,A,C,A,C,A,G,G,G,G,G,G,A,T,T,G,T,G,A,T],

[T,T,A,T,G,G,T,A,A,A,T,C,C,G,C,T,G,G,G,C],

[G,C,C,C,A,G,C,G,G,A,T,T,T,A,C,C,A,T,A,A],

[C,A,A,T,T,T,T,C,C,T,C,C,A,G,C,T,T,C,G,G],

[C,C,G,A,A,G,C,T,G,G,A,G,G,A,A,A,A,T,T,G],

[T,C,A,A,T,G,G,T,G,C,C,C,G,G,A,A,T,A,A,A],

[T,T,T,A,T,T,C,C,G,G,G,C,A,C,C,A,T,T,G,A],

[A,A,A,A,G,T,A,A,G,C,G,G,A,A,A,C,T,G,C,G],

[C,G,C,A,G,T,T,T,C,C,G,C,T,T,A,C,T,T,T,T],

[G,T,G,G,G,G,A,G,G,T,A,A,T,G,C,A,A,T,G,T],

[A,C,A,T,T,G,C,A,T,T,A,C,C,T,C,C,C,C,A,C],

[C,T,T,T,C,G,A,A,G,A,T,G,G,C,G,A,T,C,C,A],

[T,G,G,A,T,C,G,C,C,A,T,C,T,T,C,G,A,A,A,G],

[A,C,C,G,T,G,A,T,G,A,A,G,G,G,T,G,A,A,T,T],

[A,A,T,T,C,A,C,C,C,T,T,C,A,T,C,A,C,G,G,T],

[C,A,G,G,A,A,G,T,G,A,A,G,G,C,T,A,G,T,C,C],

[G,G,A,C,T,A,G,C,C,T,T,C,A,C,T,T,C,C,T,G],

[T,C,C,C,T,C,C,G,G,C,T,C,A,G,T,A,T,T,T,C],

[G,A,A,A,T,A,C,T,G,A,G,C,C,G,G,A,G,G,G,A],

[T,T,G,C,T,G,C,C,C,T,A,A,A,G,A,A,C,A,C,T],

[A,G,T,G,T,T,C,T,T,T,A,G,G,G,C,A,G,C,A,A],

[C,T,C,C,T,T,C,T,G,G,G,G,A,C,C,T,C,T,A,A],

[T,T,A,G,A,G,G,T,C,C,C,C,A,G,A,A,G,G,A,G],

[C,C,C,G,C,A,T,A,T,A,G,C,A,A,G,A,A,A,C,C],

[G,G,T,T,T,C,T,T,G,C,T,A,T,A,T,G,C,G,G,G],

[A,A,C,C,G,A,T,T,T,A,G,G,C,C,T,T,C,C,T,C],

[G,A,G,G,A,A,G,G,C,C,T,A,A,A,T,C,G,G,T,T],

[G,A,G,A,C,C,T,C,T,T,T,T,G,C,C,C,G,A,T,C],

[G,A,T,C,G,G,G,C,A,A,A,A,G,A,G,G,T,C,T,C],

[A,T,T,C,A,T,C,G,G,G,G,C,T,T,T,A,T,G,G,C],

[G,C,C,A,T,A,A,A,G,C,C,C,C,G,A,T,G,A,A,T],

[C,A,A,C,T,G,G,G,T,T,C,T,G,G,C,T,C,A,T,T],

[A,A,T,G,A,G,C,C,A,G,A,A,C,C,C,A,G,T,T,G],

[T,T,T,G,G,C,C,G,C,T,T,G,T,T,A,A,G,A,T,C],

[G,A,T,C,T,T,A,A,C,A,A,G,C,G,G,C,C,A,A,A],

[G,G,G,T,T,C,A,A,A,C,T,T,G,G,T,G,G,T,T,G],

[C,A,A,C,C,A,C,C,A,A,G,T,T,T,G,A,A,C,C,C],

[G,C,T,G,A,C,G,A,T,A,A,A,G,G,A,T,C,C,G,G],

[C,C,G,G,A,T,C,C,T,T,T,A,T,C,G,T,C,A,G,C],

[G,C,C,A,G,G,T,T,C,T,A,A,A,A,T,T,C,G,C,G],

[C,G,C,G,A,A,T,T,T,T,A,G,A,A,C,C,T,G,G,C],

[A,C,C,T,T,C,T,C,A,C,T,G,A,C,G,T,C,G,A,T],

[A,T,C,G,A,C,G,T,C,A,G,T,G,A,G,A,A,G,G,T],

[G,A,A,A,A,A,A,C,G,C,A,C,G,A,C,C,T,A,G,T],

[A,C,T,A,G,G,T,C,G,T,G,C,G,T,T,T,T,T,T,C],

[A,C,T,A,A,T,G,A,G,G,G,G,C,T,C,C,G,T,A,G],

[C,T,A,C,G,G,A,G,C,C,C,C,T,C,A,T,T,A,G,T],

[T,C,G,A,C,T,C,G,A,A,C,G,T,C,T,T,T,G,T,T],

[A,A,C,A,A,A,G,A,C,G,T,T,C,G,A,G,T,C,G,A],

[C,C,T,G,C,A,C,C,C,C,T,T,G,T,C,A,A,A,T,A],

[T,A,T,T,T,G,A,C,A,A,G,G,G,G,T,G,C,A,G,G],

[C,C,A,C,C,G,G,G,G,A,A,T,A,G,T,A,A,C,A,G],

[C,T,G,T,T,A,C,T,A,T,T,C,C,C,C,G,G,T,G,G],

[T,G,A,A,A,G,G,A,G,G,A,C,G,T,G,G,T,T,A,C],

[G,T,A,A,C,C,A,C,G,T,C,C,T,C,C,T,T,T,C,A],

[A,A,T,A,A,T,G,C,C,C,C,A,T,A,G,T,G,G,C,C],

[G,G,C,C,A,C,T,A,T,G,G,G,G,C,A,T,T,A,T,T],

[C,C,C,C,C,A,C,G,A,A,G,A,T,T,G,T,T,A,G,T],

[A,C,T,A,A,C,A,A,T,C,T,T,C,G,T,G,G,G,G,G],

[C,C,C,A,A,G,A,A,A,G,A,A,T,G,C,C,C,T,C,A],

[T,G,A,G,G,G,C,A,T,T,C,T,T,T,C,T,T,G,G,G],

[A,T,T,G,G,T,C,G,T,C,A,A,A,A,A,C,C,G,A,G],

[G,A,T,C,T,A,A,C,G,G,A,T,C,G,T,T,C,C,C,G],

[C,G,G,G,A,A,C,G,A,T,C,C,G,T,T,A,G,A,T,C],

[G,A,A,C,G,T,A,T,A,A,C,C,C,T,G,C,G,A,C,A],

[T,G,T,C,G,C,A,G,G,G,T,T,A,T,A,C,G,T,T,C],

[C,G,G,T,T,C,G,G,A,T,T,T,G,T,T,G,C,T,T,A],

[T,A,A,G,C,A,A,C,A,A,A,T,C,C,G,A,A,C,C,G],

[A,A,C,C,C,A,C,T,T,G,G,A,G,A,T,C,T,T,G,C],

[G,C,A,A,G,A,T,C,T,C,C,A,A,G,T,G,G,G,T,T],

[A,G,T,C,C,C,C,T,T,A,C,T,A,A,G,C,C,A,G,C],

[G,C,T,G,G,C,T,T,A,G,T,A,A,G,G,G,G,A,C,T],

[T,C,G,A,A,A,C,A,C,C,C,A,T,C,T,A,C,A,G,C],

[G,C,T,G,T,A,G,A,T,G,G,G,T,G,T,T,T,C,G,A],

[T,C,T,C,A,A,A,A,C,C,A,G,G,G,T,A,G,C,T,G],

[C,A,G,C,T,A,C,C,C,T,G,G,T,T,T,T,G,A,G,A],

[T,G,C,A,C,T,A,G,G,A,C,A,T,C,G,C,A,C,T,A],

[T,A,G,T,G,C,G,A,T,G,T,C,C,T,A,G,T,G,C,A],

[A,G,T,T,G,G,T,A,C,A,C,C,A,T,A,G,A,G,C,G],

[C,G,C,T,C,T,A,T,G,G,T,G,T,A,C,C,A,A,C,T],

[A,G,T,T,G,A,G,T,C,A,G,A,T,T,C,G,C,C,T,C],

[G,A,G,G,C,G,A,A,T,C,T,G,A,C,T,C,A,A,C,T],

[T,A,T,A,T,G,G,T,G,G,T,G,A,G,G,C,G,A,G,A],

[T,C,T,C,G,C,C,T,C,A,C,C,A,C,C,A,T,A,T,A],

[T,C,A,G,T,G,C,G,A,G,G,A,T,G,G,A,T,T,T,T],

[A,A,A,A,T,C,C,A,T,C,C,T,C,G,C,A,C,T,G,A],

[C,T,C,G,G,T,T,T,T,T,G,A,C,G,A,C,C,A,A,T],

[C,T,A,C,G,C,C,T,G,G,A,A,A,A,A,C,G,A,A,A],

[T,T,T,C,G,T,T,T,T,T,C,C,A,G,G,C,G,T,A,G],

[G,A,A,G,C,A,A,T,G,T,T,T,C,G,C,C,C,A,A,G],

[C,T,T,G,G,G,C,G,A,A,A,C,A,T,T,G,C,T,T,C],

[A,G,A,A,C,T,T,C,T,G,C,T,G,T,G,T,C,A,G,G],

[C,C,T,G,A,C,A,C,A,G,C,A,G,A,A,G,T,T,C,T],

[C,C,A,T,A,C,G,G,A,A,C,A,C,T,G,G,G,A,A,G],

[C,T,T,C,C,C,A,G,T,G,T,T,C,C,G,T,A,T,G,G],

[G,G,G,G,G,G,G,G,G,G,A,A,A,A,A,A,A,A,A,A],

[T,T,T,T,T,T,T,T,T,T,C,C,C,C,C,C,C,C,C,C],

[G,G,G,C,C,A,A,A,A,G,G,G,G,T,T,T,T,T,T,T],

[A,A,A,A,A,A,A,C,C,C,C,T,T,T,T,G,G,C,C,C],

[C,C,C,T,T,C,T,T,T,T,T,T,T,G,G,G,G,G,G,G],

[C,C,C,C,C,C,C,A,A,A,A,A,A,A,G,A,A,G,G,G],

[G,G,T,T,T,T,G,G,C,T,T,T,T,T,A,C,C,G,G,G],

[C,C,C,G,G,T,A,A,A,A,A,G,C,C,A,A,A,A,C,C],

[G,G,G,C,C,C,C,C,C,T,T,T,A,T,T,A,A,A,C,C],

[G,G,T,T,T,A,A,T,A,A,A,G,G,G,G,G,G,C,C,C],

[T,G,C,G,G,A,A,A,G,G,G,T,T,T,G,A,A,A,A,G],

[C,T,T,T,T,C,A,A,A,C,C,C,T,T,T,C,C,G,C,A],

[T,T,T,G,G,G,G,A,A,A,T,C,C,C,G,G,A,A,G,T],

[A,C,T,T,C,C,G,G,G,A,T,T,T,C,C,C,C,A,A,A],

[A,A,A,A,T,T,G,G,G,G,G,A,G,G,G,T,T,G,T,T],

[A,A,C,A,A,C,C,C,T,C,C,C,C,C,A,A,T,T,T,T],

[T,T,T,C,C,C,C,G,G,G,T,T,T,G,G,G,T,T,A,T],

[A,T,A,A,C,C,C,A,A,A,C,C,C,G,G,G,G,A,A,A],

[C,C,G,G,C,G,G,C,T,T,T,A,A,C,A,A,A,T,T,T],

[A,A,A,T,T,T,G,T,T,A,A,A,G,C,C,G,C,C,G,G],

[G,G,G,G,A,G,G,A,G,A,A,G,G,C,A,A,T,T,T,T],

[A,A,A,A,T,T,G,C,C,T,T,C,T,C,C,T,C,C,C,C],

[C,G,C,A,T,T,T,A,A,C,C,G,C,C,C,T,A,A,A,A],

[T,T,T,T,A,G,G,G,C,G,G,T,T,A,A,A,T,G,C,G],

[A,A,A,C,A,A,A,C,A,A,A,G,G,G,C,C,A,A,G,C],

[G,C,T,T,G,G,C,C,C,T,T,T,G,T,T,T,G,T,T,T],

[C,C,C,A,G,G,G,A,A,C,C,T,G,A,A,C,T,T,T,T],

[A,A,A,A,G,T,T,C,A,G,G,T,T,C,C,C,T,G,G,G],

[C,G,G,A,G,G,A,A,A,C,C,T,T,T,A,G,G,A,C,C],

[G,G,T,C,C,T,A,A,A,G,G,T,T,T,C,C,T,C,C,G],

[G,C,C,C,A,T,T,C,C,A,A,A,A,T,C,T,T,G,C,T],

[A,G,C,A,A,G,A,T,T,T,T,G,G,A,A,T,G,G,G,C],

[T,T,A,T,T,T,T,C,G,A,G,G,C,A,A,G,G,C,C,T],

[A,G,G,C,C,T,T,G,C,C,T,C,G,A,A,A,A,T,A,A],

[C,A,C,C,G,T,T,G,G,A,G,G,T,A,A,A,A,G,G,T],

[A,C,C,T,T,T,T,A,C,C,T,C,C,A,A,C,G,G,T,G],

[A,A,G,G,T,G,G,G,C,G,A,A,A,A,A,C,C,A,T,T],

[A,A,T,G,G,T,T,T,T,T,C,G,C,C,C,A,C,C,T,T],

[A,A,T,T,A,G,T,T,G,G,G,T,A,C,C,A,C,C,C,G],

[C,G,G,G,T,G,G,T,A,C,C,C,A,A,C,T,A,A,T,T],

[T,T,C,A,A,A,A,C,G,T,G,G,G,A,T,G,G,C,T,T],

[A,A,G,C,C,A,T,C,C,C,A,C,G,T,T,T,T,G,A,A],

[C,G,G,G,G,T,A,A,T,G,A,T,T,T,C,T,G,G,G,G],

[C,C,C,C,A,G,A,A,A,T,C,A,T,T,A,C,C,C,C,G],

[T,G,A,C,C,G,T,T,T,T,T,T,G,A,C,T,T,G,C,C],

[G,G,C,A,A,G,T,C,A,A,A,A,A,A,C,G,G,T,C,A],

[A,A,C,C,C,G,G,T,T,T,C,A,A,T,A,G,T,C,G,G],

[C,C,G,A,C,T,A,T,T,G,A,A,A,C,C,G,G,G,T,T],

[T,T,C,G,C,T,A,A,A,A,G,G,A,A,C,A,A,C,G,G],

[C,C,G,T,T,G,T,T,C,C,T,T,T,T,A,G,C,G,A,A],

[C,A,A,T,C,C,C,A,T,A,T,T,C,G,G,C,C,T,C,A],

[T,G,A,G,G,C,C,G,A,A,T,A,T,G,G,G,A,T,T,G],

[T,G,A,T,G,C,T,T,T,C,A,G,G,G,G,G,T,A,G,T],

[A,C,T,A,C,C,C,C,C,T,G,A,A,A,G,C,A,T,C,A],

[C,C,A,C,A,C,A,G,A,A,G,T,G,G,C,G,T,T,T,A],

[T,A,A,A,C,G,C,C,A,C,T,T,C,T,G,T,G,T,G,G],

[T,G,A,G,T,T,C,C,A,G,A,G,G,G,C,A,A,A,A,A],

[T,T,T,T,T,G,C,C,C,T,C,T,G,G,A,A,C,T,C,A],

[G,G,T,T,A,A,C,T,C,C,T,G,G,T,C,A,G,G,G,A],

[T,C,C,C,T,G,A,C,C,A,G,G,A,G,T,T,A,A,C,C],

[G,G,G,C,T,C,T,G,G,A,T,T,T,A,A,G,G,T,C,C],

[G,G,A,C,C,T,T,A,A,A,T,C,C,A,G,A,G,C,C,C],

[G,G,T,T,T,G,G,A,G,A,T,T,T,G,C,C,T,G,C,T],

[A,G,C,A,G,G,C,A,A,A,T,C,T,C,C,A,A,A,C,C],

[T,T,T,C,C,C,G,C,A,T,A,C,A,T,T,C,C,A,G,G],

[C,C,T,G,G,A,A,T,G,T,A,T,G,C,G,G,G,A,A,A],

[T,C,T,A,A,T,T,C,C,C,C,G,C,C,A,A,G,A,C,T],

[A,G,T,C,T,T,G,G,C,G,G,G,G,A,A,T,T,A,G,A],

[T,G,T,G,G,T,G,T,T,T,A,G,T,G,G,A,A,C,C,C],

[G,G,G,T,T,C,C,A,C,T,A,A,A,C,A,C,C,A,C,A],

[A,A,G,G,G,G,C,T,T,A,C,G,T,T,C,A,G,A,A,G],

[C,T,T,C,T,G,A,A,C,G,T,A,A,G,C,C,C,C,T,T],

[G,C,C,T,A,T,T,T,T,T,C,G,T,C,G,G,A,T,C,G],

[C,G,A,T,C,C,G,A,C,G,A,A,A,A,A,T,A,G,G,C],

[A,A,A,A,C,A,C,A,C,G,T,T,C,G,A,C,C,A,A,A],

[T,T,T,G,G,T,C,G,A,A,C,G,T,G,T,G,T,T,T,T],

[A,A,A,G,T,G,C,T,A,C,C,A,A,G,T,A,G,G,C,C],

[G,G,C,C,T,A,C,T,T,G,G,T,A,G,C,A,C,T,T,T],

[G,G,A,G,A,G,T,C,C,A,G,C,T,T,C,A,G,T,T,C],

[G,A,A,C,T,G,A,A,G,C,T,G,G,A,C,T,C,T,C,C],

[C,C,C,A,A,C,G,A,G,A,T,T,G,A,C,A,G,A,G,G],

[C,C,T,C,T,G,T,C,A,A,T,C,T,C,G,T,T,G,G,G],

[C,C,A,A,A,T,A,G,C,A,A,G,G,A,C,C,G,A,C,T],

[A,G,T,C,G,G,T,C,C,T,T,G,C,T,A,T,T,T,G,G],

[C,T,G,G,G,A,A,G,G,A,A,A,A,T,C,G,C,C,T,A],

[T,A,G,G,C,G,A,T,T,T,T,C,C,T,T,C,C,C,A,G],

[C,A,A,C,T,C,G,G,A,C,A,G,G,T,C,T,T,C,A,T],

[A,T,G,A,A,G,A,C,C,T,G,T,C,C,G,A,G,T,T,G],

[T,T,G,G,A,A,A,C,T,G,A,T,C,C,A,C,C,G,A,C],

[G,T,C,G,G,T,G,G,A,T,C,A,G,T,T,T,C,C,A,A],

[C,C,C,T,G,T,T,G,T,T,G,A,T,T,C,C,C,T,C,C],

[G,G,A,G,G,G,A,A,T,C,A,A,C,A,A,C,A,G,G,G],

[C,T,T,C,C,T,T,T,A,A,G,G,C,T,T,G,C,A,C,C],

[G,G,T,G,C,A,A,G,C,C,T,T,A,A,A,G,G,A,A,G],

[C,A,C,C,C,A,C,A,T,G,A,T,C,C,G,T,A,T,C,G],

[G,T,G,C,A,T,G,G,T,T,G,C,A,T,A,A,C,T,C,G],

[C,G,A,G,T,T,A,T,G,C,A,A,C,C,A,T,G,C,A,C],

[G,G,A,C,A,G,A,C,G,G,T,T,T,T,T,A,C,G,A,G],

[C,T,C,G,T,A,A,A,A,A,C,C,G,T,C,T,G,T,C,C],

[A,T,C,G,T,C,A,G,G,G,C,T,C,T,A,C,C,C,T,A],

[T,A,G,G,G,T,A,G,A,G,C,C,C,T,G,A,C,G,A,T],

[C,A,G,T,A,G,A,G,T,T,G,A,G,T,G,T,G,C,C,A],

[T,G,G,C,A,C,A,C,T,C,A,A,C,T,C,T,A,C,T,G],

[G,G,A,C,C,C,A,G,T,G,C,A,G,T,C,T,G,A,T,A],

[T,A,T,C,A,G,A,C,T,G,C,A,C,T,G,G,G,T,C,C],

[T,T,A,C,C,A,A,T,T,C,T,G,C,C,A,G,C,A,G,G],

[C,C,T,G,C,T,G,G,C,A,G,A,A,T,T,G,G,T,A,A],

[T,G,A,C,C,C,C,T,C,G,T,T,C,T,C,C,T,T,A,A],

[T,T,A,A,G,G,A,G,A,A,C,G,A,G,G,G,G,T,C,A],

[T,G,T,T,A,C,A,G,G,C,C,A,C,C,A,G,A,A,T,C],

[G,A,T,T,C,T,G,G,T,G,G,C,C,T,G,T,A,A,C,A],

[C,T,T,T,C,G,G,T,T,C,G,C,A,G,T,C,T,A,A,C],

[G,T,T,A,G,A,C,T,G,C,G,A,A,C,C,G,A,A,A,G],

[A,A,A,A,T,C,G,A,A,A,A,C,G,A,G,A,C,C,C,G],

[C,G,G,G,T,C,T,C,G,T,T,T,T,C,G,A,T,T,T,T],

[G,C,G,A,A,A,G,G,A,A,T,T,C,G,A,T,G,C,A,T],

[A,T,G,C,A,T,C,G,A,A,T,T,C,C,T,T,T,C,G,C],

[G,A,A,G,T,A,T,A,T,C,C,C,G,C,T,C,T,G,C,G],

[C,G,C,A,G,A,G,C,G,G,G,A,T,A,T,A,C,T,T,C],

[G,G,A,A,G,A,A,G,A,G,T,A,C,A,A,C,C,C,C,C],

[G,G,G,G,G,T,T,G,T,A,C,T,C,T,T,C,T,T,C,C],

[A,A,A,A,A,T,G,G,G,G,T,G,C,A,A,A,G,C,A,C],

[G,T,G,C,T,T,T,G,C,A,C,C,C,C,A,T,T,T,T,T],

[G,A,A,A,A,G,C,T,A,G,C,G,A,G,C,A,T,G,G,A],

[T,C,C,A,T,G,C,T,C,G,C,T,A,G,C,T,T,T,T,C],

[T,T,C,C,G,A,G,T,A,G,T,A,G,G,G,A,A,G,G,G],

[C,C,C,T,T,C,C,C,T,A,C,T,A,C,T,C,G,G,A,A],

[C,T,G,A,C,A,C,G,A,C,T,C,C,A,A,T,G,G,A,A],

[T,T,C,C,A,T,T,G,G,A,G,T,C,G,T,G,T,C,A,G],

[C,C,C,T,T,A,C,G,C,C,T,G,T,T,C,C,T,A,T,T],

[A,A,T,A,G,G,A,A,C,A,G,G,C,G,T,A,A,G,G,G],

[G,C,C,T,C,G,T,T,T,C,A,C,A,A,C,A,G,A,A,A],

[T,T,T,C,T,G,T,T,G,T,G,A,A,A,C,G,A,G,G,C],

[A,T,T,G,G,C,C,G,C,A,A,G,A,G,T,A,G,A,T,C],

[G,A,T,C,T,A,C,T,C,T,T,G,C,G,G,C,C,A,A,T],

[C,G,C,T,G,A,T,C,T,C,C,C,A,T,C,A,C,A,A,G],

[C,T,T,G,T,G,A,T,G,G,G,A,G,A,T,C,A,G,C,G],

[T,T,C,A,A,G,C,T,C,G,T,G,C,C,A,T,A,A,A,C],

[G,T,T,T,A,T,G,G,C,A,C,G,A,G,C,T,T,G,A,A],

[A,A,T,A,A,C,C,G,G,A,G,C,T,A,G,A,A,C,G,C],

[G,C,G,T,T,C,T,A,G,C,T,C,C,G,G,T,T,A,T,T],

[C,A,A,C,G,T,T,A,A,C,A,T,C,A,C,A,C,G,C,T],

[A,G,C,G,T,G,T,G,A,T,G,T,T,A,A,C,G,T,T,G],

[T,G,C,A,C,T,G,A,A,A,A,A,G,A,G,C,C,T,G,T],

[A,C,A,G,G,C,T,C,T,T,T,T,T,C,A,G,T,G,C,A],

[T,C,A,G,T,T,T,G,T,A,A,G,C,G,C,G,T,T,C,T],

[A,G,A,A,C,G,C,G,C,T,T,A,C,A,A,A,C,T,G,A],

[C,C,A,A,G,C,A,A,T,G,T,T,C,A,C,A,G,T,C,C],

[G,G,A,C,T,G,T,G,A,A,C,A,T,T,G,C,T,T,G,G],

[A,T,T,A,C,G,C,T,T,C,G,G,T,T,T,G,A,G,C,T],

[A,G,C,T,C,A,A,A,C,C,G,A,A,G,C,G,T,A,A,T],

[C,G,T,A,C,A,T,G,G,G,T,C,G,G,A,A,C,A,A,T],

[A,T,T,G,T,T,C,C,G,A,C,C,C,A,T,G,T,A,C,G],

[C,T,A,C,C,A,C,T,G,G,G,C,T,T,G,A,G,T,T,G],

[C,A,A,C,T,C,A,A,G,C,C,C,A,G,T,G,G,T,A,G],

[T,C,C,G,A,G,A,C,A,C,A,A,C,C,T,T,C,A,A,A],

[T,T,T,G,A,A,G,G,T,T,G,T,G,T,C,T,C,G,G,A],

[C,G,A,T,A,C,G,G,A,T,C,A,T,G,T,G,G,G,T,G],

Verified and Extended codewords

• In addition, the same “verify” feature of SynDCode was used to obtain our parameters for the yeast primer tags in “Characterization of synthetic DNA bar codes in Saccharomyces cerevisiae gene-deletion strains” (Eason et al., PNAS).

• The PNAS study used 5992 unique “uptags”.

• Using SynDCode, we were able to generate an equivalent code containing over 20,000 sequences.

• The tagging of each deletion strain allows identification of genes without prior knowledge of gene function.

• Parallel hybridization to a microarray was used to analyze assays.

SynDCode parameters:
- Free Energy Upper Bound
- WC Upper Bound
- WC Lower Bound
- t-stems and thresholds

Results

In each instance, we could generate more codewords in less time than SlsDesigner

Length of Sequence   WC Lower Bound   WC Upper Bound   CH Upper Bound   Univ. B.C. SLS C codes generated   raw SynDCode extensions   SynDCode SLS extensions
12                   13.86            16.94            3.85             25<|C|<50                          72                        64
16                   18.9             23.1             5.25             25<|C|<50                          572                       405
20                   23.94            29.26            6.65             25<|C|<35                          1056                      668

SynDCode provides the means to create collections of synthetic DNA strands (i.e., a DNA code) with controlled properties such as resistance to crosshybridization. The user has the ability to either verify the properties of an existing DNA code, expand a given DNA code or create an entirely new DNA code. The models built into SynDCode allow for the specification of thermodynamic properties of the generated DNA code and for collections of concatenated combinations of strands taken from the generated code. All pairwise strand computations, including the computation of the bound on free energy of formation, ΔG(x:y), of the x:y duplex have complexity O(n²). This is significant, as other DNA code software tools use ΔG(x:y) approximation algorithms with complexity O(n³). These more computationally complex algorithms give more accurate pairwise ΔG(x:y) computations, but pairwise accuracy is not necessarily the most important consideration when the primary objective is maximizing the size of a code for global thresholds.

SynDCode Conclusions

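A DNA Computing Paradigm

Let $Q_1, Q_2, \ldots, Q_k$ be fixed subsets of $\{1,2,\ldots,n\}$.

a. Find all subsets $S \subseteq \{1,2,\ldots,n\}$ with $S \not\subseteq Q_i$ for all $i$ with $1 \le i \le k$.

b. Find all subsets $T \subseteq \{1,2,\ldots,n\}$ with $Q_i \not\subseteq T$ for all $i$ with $1 \le i \le k$.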

The identification of maximal frequent sets in data fields is the computational bottleneck in association rule discovery. This is an important problem, and the independent sets and maximal cliques problems fit this paradigm.

Example: Independent Sets and Cliques

An independent set is a collection of vertices that contains no edge.
A clique is a subgraph where every pair of vertices has an edge between them.
For a graph G, its complement G’ is the set of edges not in G.
A maximal independent set in G is a maximal clique in G’, e.g., {1,3,5}.

[Figure: G and G’ drawn on the vertices 1, 2, 3, 4, 5]

Edges in G are {1,2}, {2,3}, {3,4}, {4,5}, {1,4}, {2,5}.
Edges in G’ are {1,3}, {1,5}, {2,4}, {3,5}.


DNA Computing for Independent Sets and Cliques

Let {1,2}, {1,4}, {2,3}, {2,5}, {3,4}, {4,5} = Q_1, ..., Q_6 be fixed subsets of {1,2,...,5}.

Finding all subsets T ⊆ {1,2,...,n} (here n = 5) with Q_i ⊄ T for all i with 1 ≤ i ≤ 6 is finding all independent sets in G, or all cliques in the complement G’.

[Figure: G and G’ on vertices 1 to 5.]

Probe(F1) = TTTTTGGAAA
Probe(F2) = TTTTGTTAGT

1. AAAAAAAACC-TTTCTTAACC-CATAAAACAC-T4-T5
2. AAAAAAAACC-TTTCTTAACC-CATAAAACAC-T4-F5
3. AAAAAAAACC-TTTCTTAACC-CATAAAACAC-F4-T5
4. AAAAAAAACC-TTTCTTAACC-CATAAAACAC-F4-F5
5. AAAAAAAACC-TTTCTTAACC-ATCTTTTCAA-T4-T5
6. AAAAAAAACC-TTTCTTAACC-ATCTTTTCAA-T4-F5
7. AAAAAAAACC-TTTCTTAACC-ATCTTTTCAA-F4-T5
8. AAAAAAAACC-TTTCTTAACC-ATCTTTTCAA-F4-F5
9. AAAAAAAACC-ACTAACAAAA-CATAAAACAC-T4-T5
10. AAAAAAAACC-ACTAACAAAA-CATAAAACAC-T4-F5
11. AAAAAAAACC-ACTAACAAAA-CATAAAACAC-F4-T5
12. AAAAAAAACC-ACTAACAAAA-CATAAAACAC-F4-F5
13. AAAAAAAACC-ACTAACAAAA-ATCTTTTCAA-T4-T5
14. AAAAAAAACC-ACTAACAAAA-ATCTTTTCAA-T4-F5
15. AAAAAAAACC-ACTAACAAAA-ATCTTTTCAA-F4-T5
16. AAAAAAAACC-ACTAACAAAA-ATCTTTTCAA-F4-F5
17. TTTCCAAAAA-TTTCTTAACC-CATAAAACAC-T4-T5
18. TTTCCAAAAA-TTTCTTAACC-CATAAAACAC-T4-F5
19. TTTCCAAAAA-TTTCTTAACC-CATAAAACAC-F4-T5
20. TTTCCAAAAA-TTTCTTAACC-CATAAAACAC-F4-F5
21. TTTCCAAAAA-TTTCTTAACC-ATCTTTTCAA-T4-T5
22. TTTCCAAAAA-TTTCTTAACC-ATCTTTTCAA-T4-F5
23. TTTCCAAAAA-TTTCTTAACC-ATCTTTTCAA-F4-T5
24. TTTCCAAAAA-TTTCTTAACC-ATCTTTTCAA-F4-F5
25. TTTCCAAAAA-ACTAACAAAA-CATAAAACAC-T4-T5
26. TTTCCAAAAA-ACTAACAAAA-CATAAAACAC-T4-F5
27. TTTCCAAAAA-ACTAACAAAA-CATAAAACAC-F4-T5
28. TTTCCAAAAA-ACTAACAAAA-CATAAAACAC-F4-F5
29. TTTCCAAAAA-ACTAACAAAA-ATCTTTTCAA-T4-T5
30. TTTCCAAAAA-ACTAACAAAA-ATCTTTTCAA-T4-F5
31. TTTCCAAAAA-ACTAACAAAA-ATCTTTTCAA-F4-T5
32. TTTCCAAAAA-ACTAACAAAA-ATCTTTTCAA-F4-F5

Edge {1,2} STM

DNA Library: 2^(# coding strands / 2) strands, # coding strands / 2 bits

Probe(F1) = TTTTTGGAAA
Probe(F2) = TTTTGTTAGT

X1=F or X2=F (all subsets not containing {1,2}), for example:

10. AAAAAAAACC-ACTAACAAAA-CATAAAACAC-T4-F5
24. TTTCCAAAAA-TTTCTTAACC-ATCTTTTCAA-F4-F5
29. TTTCCAAAAA-ACTAACAAAA-ATCTTTTCAA-T4-T5

X1=T and X2=T:

1. AAAAAAAACC-TTTCTTAACC-CATAAAACAC-T4-T5
2. AAAAAAAACC-TTTCTTAACC-CATAAAACAC-T4-F5
3. AAAAAAAACC-TTTCTTAACC-CATAAAACAC-F4-T5
4. AAAAAAAACC-TTTCTTAACC-CATAAAACAC-F4-F5
5. AAAAAAAACC-TTTCTTAACC-ATCTTTTCAA-T4-T5
6. AAAAAAAACC-TTTCTTAACC-ATCTTTTCAA-T4-F5
7. AAAAAAAACC-TTTCTTAACC-ATCTTTTCAA-F4-T5
8. AAAAAAAACC-TTTCTTAACC-ATCTTTTCAA-F4-F5
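The separation performed by the edge {1,2} STM can be sketched in a few lines. This is an idealized illustration under stated assumptions, not the wet-lab protocol: a probe is assumed to capture a strand exactly when the strand contains the probe's reverse complement as a block, the assignment of the third block pair to T3/F3 is inferred from the strand numbering, and `build_library` and `captured_by` are hypothetical names. The T4/F4 and T5/F5 positions are kept as labels because the slides do not give their sequences.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Watson-Crick reverse complement of a strand written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


# Block sequences read off the library listing (T3/F3 assignment is an assumption).
TRUE_BLOCKS = ["AAAAAAAACC", "TTTCTTAACC", "CATAAAACAC"]
FALSE_BLOCKS = ["TTTCCAAAAA", "ACTAACAAAA", "ATCTTTTCAA"]


def build_library():
    """Map strand number (1..32) to its list of five blocks."""
    library = {}
    for number, bits in enumerate(product([True, False], repeat=5), start=1):
        blocks = [TRUE_BLOCKS[i] if bits[i] else FALSE_BLOCKS[i] for i in range(3)]
        # Positions 4 and 5 are placeholders on the slides.
        blocks += [("T" if bits[i] else "F") + str(i + 1) for i in (3, 4)]
        library[number] = blocks
    return library


def captured_by(probe, blocks):
    """Idealized hybridization: the probe binds its exact reverse complement."""
    return reverse_complement(probe) in blocks


PROBE_F1 = "TTTTTGGAAA"
PROBE_F2 = "TTTTGTTAGT"

library = build_library()
captured = sorted(n for n, blocks in library.items()
                  if captured_by(PROBE_F1, blocks) or captured_by(PROBE_F2, blocks))
not_captured = sorted(set(library) - set(captured))
```

In this sketch the captured strands are those with X1=F or X2=F (the subsets not containing {1,2}), and strands 1 to 8 pass through uncaptured.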

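[Figure: the DNA Library flows through one STM for each pair: {1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {3,4}, {2,5}, {3,5}, {4,5}. Pairs are drawn in black or red.]

• Black ON, Red OFF = Independent Sets in G
• Black OFF, Red ON = Cliques in G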

Universal DNA Computer for any Graph on n Vertices

Every graph G on n vertices has G ∪ G’ = all possible pairs on n vertices. This enables the construction of a universal device.

Each possible edge is an STM. Then, depending on the problem, the flow is directed by the edges present (or absent) in the given graph.

[Figure: the DNA Library flows through the STMs for the pairs {1,2}, {1,3}, ..., {n-2,n}, {n-1,n}.]

• Edges in G ON, edges in G’ OFF = Independent Sets in G when flow completed
• Edges in G OFF, edges in G’ ON = Cliques in G when flow completed

We use the concept of (t-gap) block isomorphic subsequences, coupled with classical sequence alignment ideas, to describe new abstract weighted string metrics that are similar to the weighted Levenshtein insertion-deletion metric. These metrics for DNA code design capture key aspects of the nearest neighbor thermodynamic model for hybridized DNA duplexes. One version of the metric gives the maximum number of stacked pairs of hydrogen-bonded nucleotide base pairs that can be present in any secondary structure without pseudoknots in a hybridized DNA duplex.

SynDCode generates a collection of DNA blueprints such that certain conditions hold for strands, pairs of strands, and concatenations of certain pairs of strands. What is generated is a DNA code C such that:

• Every strand x in C has length n, and the WC complement of each strand in the code is also in the code.
• The maximum number of t_i-stems in each CH duplex from C is at most s_i.
• Each CH duplex in C has a free energy of formation above -∆G_CH.
• Each WC duplex in C has a free energy of formation between two -∆G_wc thresholds.

Script for Comparing Results

read_input → SlsDnaDesigner → complement → pairfold → get_threshold

• read_input: We made this program to ask the user to specify the parameter settings to be used in the University of British Columbia’s SlsDnaDesigner to initially generate codewords.
• SlsDnaDesigner: This is the University of British Columbia’s SlsDnaDesigner, which uses the specified parameters to generate a pool of codewords.
• complement: We made this program to take the codewords that SlsDnaDesigner outputs and create a complemented pool of codewords.
• pairfold: This is the University of British Columbia’s pairfold program, which uses the complemented pool of codewords and does pairwise comparisons to obtain thermodynamic values.
• get_threshold: We made this program to take the output from pairfold, find the thermodynamic upper bound, and store this value to be used later.

get_params → read_params → SynDCode → pairfold → Subcode

• read_params: We made this program to read in the parameters that get_params outputs and write them in the correct format to be read in by SynDCode.
• SynDCode: Morgan Bishop made this program, which extends the pool of codewords under the given input constraints, creating new codewords.
• pairfold: This is the University of British Columbia’s pairfold program, which uses the pool of codewords that was extended by SynDCode and does pairwise comparisons to obtain thermodynamic values.
• Subcode: We made this program to take the output from pairfold and create a binary matrix of sequence comparisons using the thermodynamic upper bound stored earlier. It then uses a greedy algorithm to parse the matrix and obtain a new pool of codewords, with sequences that “violate” the thermodynamic upper bound deleted.

Binary matrix of sequence comparisons:

        S1-C1  S2-C2  S3-C3  S4-C4  S5-C5  S6-C6
S1-C1     0      1      0      0      1      0
S2-C2     1      0      1      1      1      0
S3-C3     0      1      0      0      0      0
S4-C4     0      1      0      0      0      1
S5-C5     1      1      0      0      1      0
S6-C6     0      0      0      1      0      0

After S5-C5 is deleted (marked X):

        S1-C1  S2-C2  S3-C3  S4-C4    X    S6-C6
S1-C1     0      1      0      0      0      0
S2-C2     1      0      1      1      0      0
S3-C3     0      1      0      0      0      0
S4-C4     0      1      0      0      0      1
X         0      0      0      0      0      0
S6-C6     0      0      0      1      0      0
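The slides do not spell out the greedy rule used by Subcode, so the following is only a sketch under assumptions: a 1 in the matrix is read as a violation of the thermodynamic upper bound, any pair with a 1 on the diagonal is dropped first (which reproduces the deletion of S5-C5 shown above), and then the pair with the most remaining violations is removed until none are left. The function name `greedy_subcode` is hypothetical.

```python
NAMES = ["S1-C1", "S2-C2", "S3-C3", "S4-C4", "S5-C5", "S6-C6"]

# Binary comparison matrix from the slide (assumed: 1 = violation).
MATRIX = [
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
]


def greedy_subcode(names, conflict):
    """Return the names that survive greedy deletion of violating entries."""
    n = len(names)
    # Assumed first step: drop entries that conflict with themselves.
    alive = [conflict[i][i] == 0 for i in range(n)]
    while True:
        # Number of remaining violations for each surviving entry.
        degree = [
            sum(conflict[i][j] for j in range(n) if alive[j] and j != i) if alive[i] else 0
            for i in range(n)
        ]
        worst = max(range(n), key=lambda i: degree[i])
        if degree[worst] == 0:
            break  # no violations remain among the survivors
        alive[worst] = False
    return [names[i] for i in range(n) if alive[i]]


surviving = greedy_subcode(NAMES, MATRIX)
```

With this particular rule the survivors are S1-C1, S3-C3 and S6-C6; a different tie-breaking or ordering rule could keep a different subset of the same size.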

DNA Bit String

r1 = aattttaa    l1 = ttaaaatt
r2 = taaccccg    l2 = cggggtta
r3 = ttccaagg    l3 = ccttggaa
r4 = ggccaatt    l4 = aattggcc
r5 = gctacggg    l5 = cccgtagc
r6 = gtattgat    l6 = atcaatac
r7 = atgcgttg    l7 = caacgcat

3’ aattttaa S cggggtta S ccttggaa S aattggcc S gctacggg S atcaatac S atgcgttg 5’

Bits: 1 0 0 0 1 0 1

Read/Write/Filter

Correct read: a correct DNA bit string is affixed.

3’ aattttaa S cggggtta S ccttggaa S aattggcc S gctacggg S atcaatac S atgcgttg 5’

A “0” (gctacggg) is being correctly read at register 5. Note l5 = cccgtagc is correctly reading a “0” at register 5.

Incorrect read: an incorrect DNA bit string is affixed.

3’ aattttaa S cggggtta S ccttggaa S aattggcc S gctacggg S atcaatac S atgcgttg 5’

A “0” (atgcgttg) at register 7 is being incorrectly read as a “0” (gctacggg) at register 5. Note l5 = cccgtagc is incorrectly hybridizing to register 7.

[Figure: an electronic database is represented as DNA bitstrings. Electronic queries P1, P2, ..., Pm (yes, no, ..., yes) set Filter 1 ON, Filter 2 OFF, ..., Filter m ON. Non-patterns are dumped at the filters; the remaining molecular patterns go to molecular reads and are returned as electronic patterns.]

Algorithm work done 2002 with L. Popyack

• Universal objects
• Wet lab work

1. p(x,y) is the maximum length of all subsequences common to both n-sequences x and y.

2. The insertion-deletion (Levenshtein) metric is L(x,y) = n - p(x,y).

3. L(x,y) ≥ d <=> p(x,y) ≤ n - d.

4. For x = 31011301 and y = 22113300, p(x,y) = 4, e.g., via the common subsequence 1130.

5. x --> delete the four symbols outside the common subsequence --> 1130 --> insert 2, 2, 3, 0 --> y

6. There are spheres of the same radius but different volumes (non-uniform distance).

• 00000000 has only one subsequence of length 4. By making 4 deletions you can only get to one subsequence, 0000.

• 00001111 has 5 subsequences of length 4. By making 4 deletions you can get to 5 subsequences: 0000, 0001, 0011, 0111, 1111.

7. The volume of the (ID) sphere of a given radius centered at x depends on the number of runs in x. For example, 111211000033 has five runs. The more runs in x, the more codewords in the sphere (exact value unknown).

Insertion-Deletion Metric

V. Levenshtein, Binary Codes Capable of Correcting Deletions, Insertions, and Reversals, Soviet Phys.--Doklady, 10 707-710, (1966).
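The numerical claims on this slide can be reproduced with a short sketch. It uses the standard dynamic program for the longest common subsequence; the function names are illustrative and are not taken from SynDCode.

```python
from itertools import combinations


def lcs_length(x, y):
    """Length p(x, y) of a longest common subsequence, by dynamic programming."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def insertion_deletion_distance(x, y):
    """L(x, y) = n - p(x, y) for two sequences of the same length n."""
    assert len(x) == len(y)
    return len(x) - lcs_length(x, y)


def distinct_subsequences(x, length):
    """Set of distinct subsequences of x with the given length (brute force)."""
    return {"".join(x[i] for i in idx) for idx in combinations(range(len(x)), length)}


def count_runs(x):
    """Number of maximal blocks of equal consecutive symbols in x."""
    return sum(1 for i, c in enumerate(x) if i == 0 or c != x[i - 1])


p = lcs_length("31011301", "22113300")
distance = insertion_deletion_distance("31011301", "22113300")
```

The brute-force subsequence count is exponential in general and is meant only for the 8-symbol examples on the slide.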

r1 = aattttaa (tttta)    l1 = ttaaaatt
r2 = taaccccg (acccg)    l2 = cggggtta (ggtta)
r3 = ttccaagg (tcaag)    l3 = ccttggaa
r4 = ggccaatt            l4 = aattggcc
r5 = gctacggg            l5 = cccgtagc
r6 = gtattgat            l6 = atcaatac (tcaac)
r7 = atgcgttg            l7 = caacgcat
r8 = cccccccc            l8 = gggggggg
r9 = aaaaaaaa            l9 = tttttttt
r10 = ggtttccc (ggtcc)   l10 = gggaaacc
r11 = cccctttt           l11 = aaaagggg
r12 = ttttgggg           l12 = cccctttt

Strings in parentheses are the length-5 subsequences marked on the slide.

DNA(8,4) code: no pair of codewords has a common subsequence of length 5; that is, the longest common subsequence of any two codewords has length at most 4.

A collection of DNA strands (5’-3’) of length n is called a DNA(n,d) code if:

1. The complement of every strand in the collection is also in the collection.
2. No strand is equal to its complement.
3. The longest common subsequence between any two strands in the collection has length at most n-d.
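The three conditions translate directly into a checker. This is a minimal sketch, assuming that "complement" means the Watson-Crick reverse complement of a strand written 5’ to 3’; `is_dna_code` is a hypothetical name and does not describe how SynDCode verifies a code.

```python
from itertools import combinations

COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def reverse_complement(strand):
    """Watson-Crick reverse complement of a lowercase strand."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def is_dna_code(strands, n, d):
    """Check the three DNA(n, d) conditions for a collection of strands."""
    code = set(strands)
    if any(len(s) != n for s in code):
        return False
    for s in code:
        rc = reverse_complement(s)
        # Condition 2 (not self-complementary) and condition 1 (closure).
        if rc == s or rc not in code:
            return False
    # Condition 3: bounded longest common subsequence for every pair.
    return all(lcs_length(a, b) <= n - d for a, b in combinations(code, 2))


small_code = ["aattttaa", "ttaaaatt"]  # r1 and l1 from the table
```

For the two-strand collection {r1, l1} the longest common subsequence has length 4, so it satisfies the DNA(8,4) conditions but not the DNA(8,5) conditions.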

Virtual Duplex and Virtual Stacked Pairs

5’ ggCaCaTcatAct 3’
3’ TccAAttGgtaga 5’

5’ ggCaCaTcatAct 3’
5’ AggTTaaCcatct 3’

5’ gg C a C a T c a t A ct 3’
3’ T cc AA t t G g t a ga 5’

5’ gg C a C a T c a t A ct 3’
5’ A gg TT a a C c a t ct 3’

Stacked-pair values shown: 1.84, 1.45, 0.88, 1.28

5’ GGCACATCATACT 3’
5’ AGGTTAACCATCT 3’

5’ AGTATGATGTGCC 3’
5’ AGATGGTTAACCT 3’

Nearest Neighbor Approx. Free Energy of duplex formation (WC)

5’ GGCACATCATACT 3’
5’ AGTATGATGTGCC 3’

1.84 + 2.24 + 1.45 + … = 18.8

5’ GGCACATCATACT 3’
5’ AGGTTAACCATCT 3’

NNFECH = 6.45

1-stems = 16
2-stems = 3 + 2 + 5 + 2 = 12
3-stems = 2 + 1 + 4 + 1 = 8
4-stems = 1 + 0 + 3 + 0 = 4
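The stem counts above are consistent with four maximal stems of lengths 4, 3, 6 and 3, where a stem of length L contains L - t + 1 t-stems when L ≥ t and none otherwise. The stem lengths are inferred from the listed sums rather than stated on the slide, so the sketch below is an assumption-labeled check and `count_t_stems` is a hypothetical helper.

```python
# Assumed maximal stem lengths, inferred from the sums 3+2+5+2, 2+1+4+1, 1+0+3+0.
STEM_LENGTHS = [4, 3, 6, 3]


def count_t_stems(stem_lengths, t):
    """Number of t-stems: each stem of length L contributes max(L - t + 1, 0)."""
    return sum(max(length - t + 1, 0) for length in stem_lengths)


counts = {t: count_t_stems(STEM_LENGTHS, t) for t in (1, 2, 3, 4)}
```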

1615,139,8,74,31 2,0,3 3,0,0,1,1,3,0,3,2,2,2,0,1,2,2,x

2,102,3,02,12x 2,1,3,2,1)( bs

X=2,0,1,2,2,3,0,3,2,0,0

Y=3,2,0,2,1,3,2,0,3,2,0


9,10,111,2-4-9,10

1-3-8,9,102,3,4-6,7-11

2-5-9,10,11

2,0,2,3,2,0

0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0

0 0 1 1 1 1 1 1 1 1 1

0 0 1 1 1 1 1 1 1 1 1

0 0 1 1 2 2 2 2 2 2 2

0 0 1 1 2 2 3 3 3 3 3

0 0 1 1 2 2 3 3 3 3 3

0 0 1 1 2 2 3 3 3 4 4

0 0 1 1 2 2 3 3 4 4 4

0 0 1 1 2 2 3 3 4 4 4

0 0 1 1 2 2 3 3 4 4 4

0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0

0 0 1 1 1 1 1 1 1 1 1

0 0 1 1 2 2 2 2 2 2 2

0 0 1 1 2 3 3 3 3 3 3

0 0 1 1 2 3 4 4 4 4 4

0 0 1 1 2 3 4 4 5 5 5

0 0 1 2 2 3 4 4 5 6 6

0 0 1 2 3 3 4 4 5 6 6

0 0 1 2 3 3 4 4 5 6 7

0 0 1 2 3 4 4 4 5 6 7

[Figure: the two tables above are shown with rows labeled by the sequence 1,0,0,3,0,2,0,3,0 and columns labeled by the sequence 1,2,0,0,3,4,0,2,3. Alignment patterns shown on the slides: ***-*, *-**-*, *-***-***, *-***-*-*, ****-**, ******-*.]

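$$M_{i,j} = \max\{\,M_{i-1,j},\; M_{i,j-1},\; M_{i-k-1,\,j-k-1} + k\,\}, \qquad M'_{i,j} = \max\{\,M'_{i-1,j},\; M'_{i,j-1},\; M'_{i-k,\,j-k} + k\,\},$$

where $k = \mathrm{LongComSuf}(x_i, y_j)$ is the length of the longest common suffix of the prefixes of x and y ending at positions i and j.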
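One reading of the two recurrences can be checked against the tables labeled by the sequences 1,0,0,3,0,2,0,3,0 and 1,2,0,0,3,4,0,2,3. In the sketch below, which is an interpretation rather than SynDCode's implementation, k is the longest common suffix at (i, j), entries with a non-positive index count as 0, and a hypothetical `gap` parameter selects the recurrence: gap = 1 uses the predecessor (i-k-1, j-k-1) and gap = 0 uses (i-k, j-k).

```python
def common_suffix_table(x, y):
    """k[i][j] = length of the longest common suffix of x[:i] and y[:j]."""
    k = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                k[i][j] = k[i - 1][j - 1] + 1
    return k


def block_table(x, y, gap):
    """Fill M[i][j] = max(M[i-1][j], M[i][j-1], M[i-k-gap][j-k-gap] + k)."""
    k = common_suffix_table(x, y)
    m = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]

    def cell(i, j):
        # Entries outside the table (index <= 0) are treated as 0.
        return m[i][j] if i > 0 and j > 0 else 0

    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            best = max(m[i - 1][j], m[i][j - 1])
            if k[i][j] > 0:
                best = max(best, cell(i - k[i][j] - gap, j - k[i][j] - gap) + k[i][j])
            m[i][j] = best
    return m


ROWS = [1, 0, 0, 3, 0, 2, 0, 3, 0]
COLS = [1, 2, 0, 0, 3, 4, 0, 2, 3]

gapped = block_table(ROWS, COLS, gap=1)    # first recurrence
ungapped = block_table(ROWS, COLS, gap=0)  # second recurrence
```

Under this reading the final entries are 4 and 7, matching the bottom-right values of the two tables; with gap = 0 the recurrence reduces to the ordinary longest common subsequence table.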

Strands as sequences of nearest-neighbor pairs:

AGGTCAT: AG, GG, GT, TC, CA, AT
AGCGGTC: AG, GC, CG, GG, GT, TC

5’ ggCaCaTcatAct 3’
3’ TccAAttGgtaga 5’

5’ ggCaCaTcatAct 3’
5’ AggTTaaCcatct 3’

5’ gg C a C a T c a t A ct 3’
5’ A gg TT a a C c a t ct 3’

With A = 0, C = 1, G = 2, T = 3:

5’ 2 2 1 0 1 0 3 1 0 3 0 1 3 3’
5’ 0 2 2 3 3 0 0 1 1 0 3 1 3 3’

Quaternary:

5’ 22, 21, 10, 01, 10, 03, 31, 10, 03, 30, 01, 13 3’
5’ 02, 22, 23, 33, 30, 00, 01, 11, 10, 03, 31, 13 3’

Decimal:

5’ 10, 9, 4, 1, 4, 3, 13, 4, 3, 12, 1, 7 3’
5’ 2, 10, 11, 15, 12, 0, 1, 5, 4, 3, 13, 7 3’
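The encoding on this slide can be reproduced as follows. The mapping A = 0, C = 1, G = 2, T = 3 is read off the quaternary rows, and each nearest-neighbor pair is treated as a two-digit base-4 number; the function names are illustrative.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_quaternary(strand):
    """Encode each base as a digit 0..3 (case is ignored)."""
    return [BASE_CODE[base] for base in strand.upper()]


def pair_indices(strand):
    """Encode each consecutive base pair as 4 * first + second, a value in 0..15."""
    digits = to_quaternary(strand)
    return [4 * a + b for a, b in zip(digits, digits[1:])]


first = pair_indices("ggCaCaTcatAct")
second = pair_indices("AggTTaaCcatct")
```

Encoding pairs as integers 0 to 15 makes it possible to index a 16-entry table of nearest-neighbor energies directly.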

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 2.17 2.17 2.17 2.17 2.17 2.17

0 0 0 0 0 0 0 0 0 2.17 2.17 2.17 2.17 2.17 2.17

0 0 0 0 0 0 0 0 0 2.17 2.17 2.17 2.17 2.17 2.17

0 0 0 0 0 0 0 0 0 2.17 2.17 2.17 2.17 2.17 2.17

0 0 0 0 0 0 0 0 0 2.17 2.17 2.17 2.17 2.17 2.17

0 0 0 0 0 0 0 0 0 2.17 2.17 2.17 4.01 4.01 4.01

0 0 0 2.24 2.24 2.24 2.24 2.24 2.24 2.24 2.24 2.24 4.01 6.25 6.25

0 0 0 2.24 2.24 2.24 2.24 4.41 4.41 4.41 4.41 4.41 4.41 6.25 8.42

0 0 0 2.24 2.24 2.24 2.24 4.41 4.41 4.41 4.41 4.41 4.41 6.25 8.42

0 0 0 2.24 2.24 2.24 4.48 4.48 4.48 4.48 4.48 4.48 4.48 6.65 8.42

0 0 0 2.24 2.24 2.24 4.48 4.48 4.48 4.48 4.48 4.48 4.48 6.65 8.42

0 0 0 2.24 2.24 2.24 4.48 4.48 4.48 4.48 4.48 4.48 4.48 6.65 8.42

0 0 0 2.24 2.24 2.24 4.48 4.48 4.48 4.48 4.48 4.48 6.32 6.65 8.42

Sequences: [C, G, C, A, G, C, G, C, C, A, G, G, C, G] and [T, C, C, C, T, G, G, C, G, G, C, T, G, G]

Nearest-neighbor pairs of the first sequence (table columns): CG, GC, CA, AG, GC, CG, GC, CC, CA, AG, GG, GC, CG

Nearest-neighbor pairs of the second sequence (table rows): TC, CC, CC, CT, TG, GG, GC, CG, GG, GC, CT, TG, GG

• E. Baum, DNA sequences useful for computation, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 44, 235-242.

• H. Cai, P. White, D. Torney, A. Deshpande, Z. Wang, B. Marrone, and J. Nolan, Flow Cytometry-Based Minisequencing: A New Platform for High Throughput Single Nucleotide Polymorphism Scoring, Genomics, 66, 135-143 (2000).

• A. D'yachkov and D. Torney, On Similarity Codes, IEEE Trans. on Information Theory, 46, 1558-1564 (2000).

• A. D'yachkov, D. Torney, P. Vilenkin, and P. White, On a Class of Codes for Insertion-Deletion Metric, 2002 IEEE International Symposium on Information Theory, Lausanne, Switzerland (2002).

• A. Brenneman and A. Condon, Strand Design for biomolecular computation, Theoretical Computer Science, 287, 39-58 (2002).

• A. D'yachkov, P. Erdos, A. Macula, V. Rykov, D. Torney, C. Tung, P. Vilenkin, P. White, Exordium for DNA Codes, Journal of Combinatorial Optimization, 7, no. 4, 369-380 (2003).

• A. D'yachkov, D. Torney, P. Vilenkin, and P. White, Reverse-Complement Similarity Codes, IEEE Trans. on Information Theory, to appear.

• V. Levenshtein, Efficient reconstruction of sequences from their subsequences or supersequences, Journal of Combinatorial Theory, Series A, 93, 310-332 (2001).

• V. Levenshtein, Binary Codes Capable of Correcting Deletions, Insertions, and Reversals, Soviet Phys.--Doklady, 10, 707-710 (1966).

• V. Levenshtein, Bounds for Deletion-Insertion Correcting Codes, 2002 IEEE International Symposium on Information Theory, Lausanne, Switzerland (2002).

• Macula, DNA-TAT Codes, USAF Technical Report, TR-2003-57, AFRL-IF-RS, http://stinet.dtic.mil/cgi-bin/fulcrum_main.pl (2003).

• Macula, G. Page, T. Renz and V. Rykov, DNA Code Generator, available at https://community.biospice.org
