Improved Models and Algorithms for Universal DNA Tag Systems Tejas Iyer Georgia Tech David Cash Georgia Tech

Improved Models and Algorithms for Universal DNA Tag … people.math.gatech.edu/~heitsch/Teaching/Sp08/Projects/bio.pdf

Improved Models and Algorithms for Universal DNA Tag Systems

Tejas Iyer, Georgia Tech

David Cash, Georgia Tech

Outline of Part 1: Exposition

Motivation: The bio problem and applications

Formalization: The math problem

Analysis: Bounding the best possible solution

Part 2 (Tejas) is original contribution

Motivation (1): DNA computing

• Methods that exploit the massively parallel, self-assembling nature of DNA to solve hard computational problems

• Step 1 of DNA computing: encode the problem

• A trivial (ignored) step in most models of computation, e.g. Turing machines, circuit families, random access machines

• But the thermodynamics of DNA gets in the way. Hybridization? Secondary structures? More...

[Figure: two example DNA strands, ACTGTTTCATTAAGCGCGTT and GGTAATTAAC]

Motivation (2): SNP microarrays

• Single Nucleotide Polymorphism (SNP) genotyping

• Detecting variation at a single locus (base) within a population

• Several important applications in medicine: helps explain how single bases affect our reaction to diseases and drugs

• Testing several SNPs is expensive or impossible if done individually

• One solution: SNP microarrays

• Main technical component is mass produced to reduce cost.

• Allows one to run hundreds of thousands of SNPs simultaneously

Motivation (2): SNP microarrays

[Animated figure, ten frames: genotyping SNPs with tags, anti-tags, and a microarray. Labels and sequences recoverable from the frames:]

Tags: TGGATTAAC, GTAATCCAA, GGGTTACAC, TATGACCAG

Anti-tags: ACCTAATTG, CATTAGGTT, CCCAATGTG, ATACTGGTC

Microarray: ACCTAATTG, CATTAGGTT, CCCAATGTG, ATACTGGTC

SNPs to genotype: ?TGAA (four copies)

ACTT is attached to each tag and pairs with the TGAA of a SNP strand. Bases shown: G, T, A, C.

Final frame: Observe
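The tag/anti-tag pairing in the figure can be checked mechanically. The sketch below is illustrative only: the helper name `complement` is an assumption, and it uses the base-wise complement (not the reverse complement) because that is how the anti-tags are written on the slide.

```python
# Base-wise Watson-Crick complement: A<->T, C<->G.
# Assumption: strands are written in the same left-to-right order as on the slide,
# so no reversal is applied.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-wise complement of a DNA strand (hypothetical helper)."""
    return strand.translate(COMPLEMENT)


# Sequences copied from the slide.
tags = ["TGGATTAAC", "GTAATCCAA", "GGGTTACAC", "TATGACCAG"]
anti_tags = ["ACCTAATTG", "CATTAGGTT", "CCCAATGTG", "ATACTGGTC"]

# Each anti-tag should be the complement of the tag in the same position.
pairs_match = [complement(t) == a for t, a in zip(tags, anti_tags)]
print(pairs_match)

# The ACTT attached to each tag is the complement of the known part of ?TGAA.
print(complement("TGAA"))
```

Every tag on the slide pairs with the anti-tag in the same position, and ACTT is the complement of TGAA.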

Motivation (2): SNP microarrays

• Same tags and anti-tags are mass produced and used for SNPs as needed, analogous to general computer hardware.

• Our project focus: choosing tags so that they always “find” the correct anti-tag

[Animated figure: the anti-tags ACCTAATTG, CATTAGGTT, CCCAATGTG, ATACTGGTC, with a strand containing ACTT and the tag TATGACCAG]

Choosing tags/anti-tags

• Want to avoid mishybridizations and have as many tags as possible.

• But as we add tags, some will eventually be “too similar” and start hybridizing.

• One approach: choose tags to have high Hamming distance

• i.e. few matches when aligned

• Use techniques from error correcting codes

• Limited success...

• Other ad hoc approaches suggested
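As a rough illustration of the Hamming-distance criterion, the sketch below counts mismatched positions between aligned, equal-length tags and reports the smallest distance in a tag set. The function names are hypothetical and the three toy tags are made up for the example; this is not the authors' method.

```python
from itertools import combinations


def hamming_distance(u: str, v: str) -> int:
    """Number of aligned positions at which two equal-length strings differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(u, v))


def min_pairwise_distance(tags: list[str]) -> int:
    """Smallest Hamming distance over all pairs of tags in the set."""
    return min(hamming_distance(a, b) for a, b in combinations(tags, 2))


# Toy tag set (illustrative only, not from the slides).
toy_tags = ["AAAA", "AATT", "TTTT"]
print(min_pairwise_distance(toy_tags))
```

A larger minimum distance means fewer matches between any two aligned tags. As the slide notes, this criterion alone had limited success.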

Later approach to tag design

• Coding was developed for communications theory; it ignores thermodynamic properties of DNA that determine hybridization

• Ben-Dor et al. and Brenner suggested that we assume:

Mishybridization only occurs when two tags contain long common substrings.

• Still very simple and unrealistic, but allows one to formalize the problem and get provably good results for tag sets.

• But how good is it in practice?

• Not addressed in current work!

DNA Thermodynamics (Review)

• Melting temperature $T_M(U,V)$: the temperature at which 50% of U, V are in duplex

• Higher implies stronger bond

• Calculating melting temperature:

1. 2-4 Rule: $T_M(U,V)$ proportional to 2(# A-T bonds) + 4(# G-C bonds)

2. Nearest neighbor: look up interactions between adjacent bases in an experimental table.

3. Wetmur's equation: applies to longer strings only.
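The 2-4 Rule is simple enough to sketch directly. The function below is a minimal illustration, assuming the strand is fully bound to its perfect complement so that every base contributes one bond; the name `two_four_score` is hypothetical, and the result is only the quantity the rule says $T_M$ is proportional to, not a calibrated temperature.

```python
def two_four_score(strand: str) -> int:
    """2 * (# A-T bonds) + 4 * (# G-C bonds) for a strand paired with its complement."""
    at_bonds = sum(base in "AT" for base in strand)
    gc_bonds = sum(base in "GC" for base in strand)
    if at_bonds + gc_bonds != len(strand):
        raise ValueError("strand must contain only A, C, G, T")
    return 2 * at_bonds + 4 * gc_bonds


# First tag from the SNP microarray slides: 6 A/T bases and 3 G/C bases.
print(two_four_score("TGGATTAAC"))  # 2*6 + 4*3 = 24
```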

A model for tag design

• Formalized by Ben-Dor et al.

• Define the weight of a string s as w(s) = (#A/T) + 2(#G/C)

The application fixes temperatures h, c. An (h,c)-code satisfies two conditions:

1. Each tag t must have w(t) ≥ h.

2. Any string s such that w(s) ≥ c appears in at most one tag.

• (1) ensures that each tag hybridizes with its anti-tag strongly.

• (2) is meant to ensure that tags do not bond with the wrong anti-tag, but it is more subtle.
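The two conditions can be checked by brute force on small tag sets. The sketch below is an illustration under stated assumptions: the function names are hypothetical, "appears in at most one tag" is read as "is a substring of at most one tag", and all substrings are enumerated, which is only practical for short tags.

```python
def weight(s: str) -> int:
    """w(s) = (# A/T) + 2 * (# G/C), as defined on the slide."""
    total = 0
    for base in s:
        if base in "AT":
            total += 1
        elif base in "GC":
            total += 2
        else:
            raise ValueError(f"unexpected base {base!r}")
    return total


def heavy_substrings(tag: str, c: int) -> set[str]:
    """All substrings of tag whose weight is at least c."""
    n = len(tag)
    return {tag[i:j] for i in range(n) for j in range(i + 1, n + 1)
            if weight(tag[i:j]) >= c}


def is_hc_code(tags: list[str], h: int, c: int) -> bool:
    """Check conditions 1 and 2 of an (h,c)-code by exhaustive search."""
    owner: dict[str, int] = {}
    for index, tag in enumerate(tags):
        if weight(tag) < h:          # condition 1 fails
            return False
        for s in heavy_substrings(tag, c):
            # setdefault records the first tag containing s; a different
            # owner means s occurs in two tags, so condition 2 fails.
            if owner.setdefault(s, index) != index:
                return False
    return True
```

For example, `is_hc_code(["ACGCTGTA", "TCTGTAATGAC"], h=10, c=7)` returns `False`, because both strings contain a common substring of weight 7.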

The Ben-Dor et al. Model (cont)

2. Any string s of weight ≥ c appears in at most one tag.

Not allowed: ACGCTGTA and TCTGTAATGAC

• Reflects original assumption that hybridization occurs only if long tags share a long substring.

• Also incorporates 2-4 Rule: more G/C bases imply stronger bond

• Allows them to prove an upper bound on the number of tags in an allowed system.
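To see why the "not allowed" pair violates condition 2, one can search for the heaviest substring the two strings share. The sketch below is illustrative: the helper names are hypothetical, the search is brute force, and the slide does not state the value of c it has in mind.

```python
def weight(s: str) -> int:
    """w(s) = (# A/T) + 2 * (# G/C); assumes s contains only A, C, G, T."""
    return sum(1 if base in "AT" else 2 for base in s)


def heaviest_common_substring(u: str, v: str) -> str:
    """Return a common substring of u and v with maximum weight (brute force)."""
    substrings_of_v = {v[i:j] for i in range(len(v))
                       for j in range(i + 1, len(v) + 1)}
    best = ""
    for i in range(len(u)):
        for j in range(i + 1, len(u) + 1):
            candidate = u[i:j]
            if candidate in substrings_of_v and weight(candidate) > weight(best):
                best = candidate
    return best


shared = heaviest_common_substring("ACGCTGTA", "TCTGTAATGAC")
print(shared, weight(shared))  # CTGTA 7
```

The pair shares CTGTA, of weight 7, so it is excluded from any (h,c)-code with c ≤ 7.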

Upper Bound of Ben-Dor et al.

Let $G_n$ be the number of strings of weight $n$ (proportional to $(1+\sqrt{3})^n$ by standard recurrence relation)

Theorem: For any c and h, an (h,c)-code may contain at most

$$\frac{2 \cdot G_{c-1} + 6 \cdot G_{c-2} + 8 \cdot G_{c-3}}{h - c + 1}$$

tags.

Remark: Still exponential in c, so it allows for quite large codes.
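The bound is easy to evaluate numerically. The slide does not spell out the recurrence, so the sketch below assumes the natural one: a string of weight n ends either in A/T (weight 1) or in G/C (weight 2), giving $G_n = 2G_{n-1} + 2G_{n-2}$ with $G_0 = 1$. The function names are hypothetical.

```python
from math import sqrt


def strings_of_weight(n: int) -> int:
    """G_n: number of DNA strings of total weight n (A/T weigh 1, G/C weigh 2)."""
    if n < 0:
        return 0
    g_prev, g = 0, 1  # G_{-1} = 0, G_0 = 1 (the empty string)
    for _ in range(n):
        # Last base is weak (2 choices, weight 1) or strong (2 choices, weight 2).
        g_prev, g = g, 2 * g + 2 * g_prev
    return g


def max_tags_upper_bound(h: int, c: int) -> int:
    """Floor of (2*G_{c-1} + 6*G_{c-2} + 8*G_{c-3}) / (h - c + 1)."""
    if h < c:
        raise ValueError("the bound assumes h >= c")
    total_tail_weight = (2 * strings_of_weight(c - 1)
                         + 6 * strings_of_weight(c - 2)
                         + 8 * strings_of_weight(c - 3))
    return total_tail_weight // (h - c + 1)


print(max_tags_upper_bound(h=10, c=4))  # (2*16 + 6*6 + 8*2) // 7 = 12
# The ratio of consecutive terms approaches 1 + sqrt(3), about 2.732.
print(strings_of_weight(20) / strings_of_weight(19), 1 + sqrt(3))
```

The characteristic equation of the assumed recurrence is $x^2 = 2x + 2$, whose larger root is $1 + \sqrt{3}$. This agrees with the growth rate quoted on the slide.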

Upper Bound: Proof

Definitions:

Let a c-token be a string that contains no proper suffix of weight ≥ c.

The tail weight of a c-token is the weight of its last character.

The tail weight of a tag is the sum of tail weights of all of the c-tokens it contains.

Condition 2 restated in terms of c-tokens:

2. Any c-token s of weight ≥ c appears in at most one tag.

Strategy:

1. Show that each tag has tail weight ≥ h - c + 1

2. Show that an (h,c)-code can have total tail weight at most $2 \cdot G_{c-1} + 6 \cdot G_{c-2} + 8 \cdot G_{c-3}$

Upper Bound Proof, Part 1

Claim 1: Each tag has tail weight ≥ h - c + 1

Example: c = 4

Tag:       G A C C A A T    Tail Wt
c-tokens:  G A C            2
               C C          2
               C C A        1
                 C A A      1
                 C A A T    1

Observation: every character gets counted, except for a prefix of weight at most (c-1) at the beginning of the tag
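The example can be reproduced with a short sketch. It assumes a c-token is the shortest substring ending at a given position whose weight reaches c (weight ≥ c, with every proper suffix lighter than c); the function names are hypothetical.

```python
def weight(s: str) -> int:
    """w(s) = (# A/T) + 2 * (# G/C); assumes s contains only A, C, G, T."""
    return sum(1 if base in "AT" else 2 for base in s)


def c_tokens(tag: str, c: int) -> list[str]:
    """For each end position, the shortest substring ending there with weight >= c."""
    tokens = []
    for end in range(1, len(tag) + 1):
        start, w = end, 0
        # Extend leftwards one base at a time until the weight reaches c.
        while start > 0 and w < c:
            start -= 1
            w += weight(tag[start])
        if w >= c:
            tokens.append(tag[start:end])
    return tokens


def tail_weight(tag: str, c: int) -> int:
    """Sum of the weights of the last characters of all c-tokens in the tag."""
    return sum(weight(token[-1]) for token in c_tokens(tag, c))


tag = "GACCAAT"
print(c_tokens(tag, 4))        # ['GAC', 'CC', 'CCA', 'CAA', 'CAAT']
print(tail_weight(tag, 4))     # 2 + 2 + 1 + 1 + 1 = 7
print(weight(tag) - 4 + 1)     # w(t) - c + 1 = 10 - 4 + 1 = 7
```

Here the tag has weight 10 and its tail weight meets the bound w(t) - c + 1 with equality. Only the first two characters (G, A, total weight 3 = c - 1) end no token.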

Upper Bound Proof, Part 2

Claim 2: Total tail weight ≤ $2 \cdot G_{c-1} + 6 \cdot G_{c-2} + 8 \cdot G_{c-3}$

Note: There are at most $G_c$ c-tokens by definition, so a bound of $2 \cdot G_c$ is trivial.

For this bound, divide them into classes:

Class      Occurrences      Total Tail Wt.
<c-2>S     2·G_{c-2}        4·G_{c-2}
S<c-3>S    4·G_{c-3}        8·G_{c-3}
<c-1>W     2·G_{c-1}        2·G_{c-1}
S<c-2>W    4·G_{c-2}        2·G_{c-2}

(Slide annotation on the S<c-2>W row: "Actually 2·G_{c-2}")
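Summing the last column of the table gives the bound in Claim 2, and combining it with Claim 1 gives the theorem. The derivation below is a sketch. It reads S as a strong base (G or C, weight 2), W as a weak base (A or T, weight 1), and ⟨k⟩ as any string of weight k, which is an assumption about the slide's notation. N denotes the number of tags, a symbol introduced here for illustration.

```latex
\begin{align*}
\text{total tail weight}
  &\le \underbrace{4\,G_{c-2}}_{\langle c-2\rangle S}
     + \underbrace{8\,G_{c-3}}_{S\langle c-3\rangle S}
     + \underbrace{2\,G_{c-1}}_{\langle c-1\rangle W}
     + \underbrace{2\,G_{c-2}}_{S\langle c-2\rangle W} \\
  &= 2\,G_{c-1} + 6\,G_{c-2} + 8\,G_{c-3}. \\[6pt]
N\,(h - c + 1)
  &\le \text{total tail weight}
  \quad\text{(Claim 1: each tag has tail weight } \ge h - c + 1\text{)} \\
\Longrightarrow\quad
N &\le \frac{2\,G_{c-1} + 6\,G_{c-2} + 8\,G_{c-3}}{h - c + 1}.
\end{align*}
```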

Part 2