Portable and Random-Access DNA-Based Storage Systems

Olgica Milenkovic

A joint work with R. Gabrys, E. Garcia Ruiz, H. M. Kiah, J. Ma, G. J. Puleo, M. Riedel, D. Soloveichik, H. M. Tabatabaei Yazdi, Y. Yuan, E. Yaakobi and H. Zhao

Henry Taub TCE Conference, June 2017

Outline: Motivation, Implementations, End-to-End Design


Motivation


DNA as Storage Media

DNA stores the “information of life”, the most massive data source on Earth.

▸ DNA is extremely durable: we can still “read” DNA from mammoths, Neanderthals, and 700,000-year-old horses!

▸ DNA information content of a human cell: 6.4 GB. Mass of a cell: ∼3 picograms. Number of cells: (15 to 40) × $10^{12}$.

▸ Can one store information in DNA?

▸ This question has been raised before: “There is plenty of room at the bottom,” R. Feynman.


We can write...

We can “write” in DNA using the process of DNA synthesis.

Biochemistry of synthesis: Stitching together bases from the set {A,T,G,C} through deprotection & coupling cycles.

[Figure: the oligonucleotide synthesis cycle. Labeled steps: initial deprotection, de-blocking, coupling (with a nucleoside 2-cyanoethyl phosphoramidite), capping (blocking unreacted 5’-OH groups), oxidation, and final deprotection and cleavage from the support, yielding the final oligonucleotide.]


We Can Write...

Commercial synthesis: Agilent, Gen9, IDT, Twist Bioscience (recent feature on “Rewriting DNA Synthesis on Silicon”).

▸ DNA microarray-based short string (oligo) synthesis (left): Cost effective, large scale. Moderate error rates.

▸ Long strand (gBlock) synthesis (right): Assembles short blocks. Chemical error-correction.

▸ Types of synthesis errors: Deletions, insertions, substitutions.
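
The three error types can be illustrated with a toy channel model. The sketch below is a hypothetical simulation, not a model of any vendor's process: the name `ids_channel` and the error probabilities are assumptions chosen only for illustration.

```python
import random

BASES = "ATGC"

def ids_channel(strand, p_del=0.01, p_ins=0.01, p_sub=0.01, rng=None):
    """Pass a DNA string through a toy insertion/deletion/substitution channel.

    The probabilities are illustrative assumptions, not measured error rates.
    """
    rng = rng or random.Random()
    out = []
    for base in strand:
        if rng.random() < p_ins:
            # Insertion: a random base is written before the current one.
            out.append(rng.choice(BASES))
        r = rng.random()
        if r < p_del:
            # Deletion: the current base is dropped.
            continue
        if r < p_del + p_sub:
            # Substitution: the current base is replaced by a different one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.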


We Can Read...

Illumina (MiSeq): Best overall performance of modern sequencing technologies in terms of yield and accuracy. Relatively small error rates (substitutions and rare deletions). Short read length.

Steps: Cloning → Shearing → Reading of unordered pool → Computer-aided alignment of overlapping fragments → Consensus


We Can Read...

Oxford Nanopore MinION: Longer read lengths, portable architecture. Context-dependent deletion, insertion and substitution errors.

Key properties: Biological pore(s) and motor, base calling using deep learning techniques.


We Can Amplify and Enable Random Access...

Polymerase Chain Reaction (PCR): Cheap, fast, “exponential” information replication.

Primers - Key enablers of PCR: Short DNA strands that initiate replication at “strand-matching” locations (red blocks).


Implementations


“Double Helix Serves Double Duty”, NY Times, Jan 2013

▸ Church et al. (Science, 2012) and later Goldman et al. (Nature, 2013) stored 739 KB of data in synthetic DNA, mailed it and recreated the original digital files using Illumina readers.

▸ Digital archival storage systems that will safely store the equivalent of one million CDs in a gram of DNA for 10,000 years.

[Figure: the encoding pipeline in four stages.]

A - Binary/text: B e n a m k h o d a v a n d … … → …10001001111001010110110…

B - Base-3-encoded: 20112 20200 02110 10002 02212 01112 02110 10221 02212 11021 02212 10101 02212 10002 … …

C - DNA-encoded: TCACT ATATA TGTGA CGATA TAGTA TGTGC GCACG TCTAC GCTGC ACGCA TAGTA CGTCA TAGTA CGATA … …

D - DNA fragments: each fragment carries encoded data, file identification, intra-file location information, a parity-check and a reversed or non-reversed flag. Alternate fragments have file information reverse complemented.
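
The base-3 to DNA step (B to C in the figure) can be illustrated with a rotating code, in which each trit selects one of the three bases that differ from the previously written base, so that no base is ever repeated. The sketch below rests on assumptions: the function names, the starting base and the particular rotation order are hypothetical, and its output is not expected to reproduce the example strings in the figure.

```python
BASES = "ACGT"

def trits_to_dna(trits, prev="A"):
    """Map a string of trits ('0', '1', '2') to DNA without repeated adjacent bases.

    The rotation order and the initial `prev` base are illustrative assumptions.
    """
    out = []
    for t in trits:
        # An offset of 1..3 in the cyclic order ACGT never returns the previous base.
        nxt = BASES[(BASES.index(prev) + int(t) + 1) % 4]
        out.append(nxt)
        prev = nxt
    return "".join(out)

def dna_to_trits(dna, prev="A"):
    """Invert trits_to_dna; raises ValueError if a base repeats its predecessor."""
    out = []
    for base in dna:
        t = (BASES.index(base) - BASES.index(prev) - 1) % 4
        if t == 3:
            raise ValueError("repeated base, not a valid codeword")
        out.append(str(t))
        prev = base
    return "".join(out)
```

Avoiding repeated bases is the design point here: the encoded strand contains no homopolymer runs by construction.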


“Data Storage on DNA Can Keep it Safe for Centuries,” NY Times, Dec 2015

Our Work: A fully operational random access and rewritable DNA-based memory with Sanger sequencing.

▸ Yazdi et al., 2015: First random access, rewritable DNA-based storage system. Encoded Wikipedia entries for six US universities.

▸ Large scale testing of the methods: Microsoft Research/Twist Bioscience, 2016. Coverage by Spectrum, Nature, New Scientist etc.


Practical Obstacles

▸ Cost of synthesis is 5 to 6 orders of magnitude higher than for classical recorders.

  ▸ Use native instead of synthetic DNA for encoding.

▸ Readout devices such as Illumina are very expensive and bulky.

  ▸ Use portable and cheap nanopore sequencers such as MinION.

▸ No known interfaces between DNA-based memories and computational platforms.

  ▸ Use generalizations of strand displacement to implement Minsky’s register machines.


End-to-End Design


Writing on DNA

▸ File selection, (JPEG) compression, Base64 conversion.

▸ Constrained and Error-Control Coding, Addressing.

▸ Synthesis with IDT (Integrated DNA Technologies).

[Figure: encoding pipeline. Raw image → JPEG compression → Base64 conversion → Encoding into DNA → DNA synthesis → DNA blocks (AGTCTGAC ACTGTCTC ⋯ TGCTTGCA). Each DNA block consists of a 16-base address (e.g., TTACCGCCTCCCCCCA) and 984 bases of information.]

Encoding steps shown in the figure:

1. JPEG and Base64 encoded image: 100010111001010…

2. Conversion into a sequence over the alphabet {A,T,G,C}: AATGCAGCGTTTAGAGAT…

3. Parsing into blocks of length 896: AATGCAGC|GTTTAGAG|AT…

4. Block GC-content balancing and ECC: AATGCCCAGCATAG GTTTCGAGAGCTCT

5. Addressing: CGAAAATGCCCAGCATAG GCTAGTTTCGAGAGCTCT
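
The conversion, parsing and addressing steps can be sketched as follows. This is a minimal illustration under stated assumptions: the 2-bits-per-base mapping and all function names are hypothetical, and the GC-content balancing and ECC step is omitted entirely.

```python
import base64

# Assumed mapping of bit pairs to bases; the slides do not specify one.
BIT_PAIR_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}

def bytes_to_dna(data):
    """Base64-encode the bytes, then map every 2 bits of the result to one base."""
    b64 = base64.b64encode(data)
    bits = "".join(f"{byte:08b}" for byte in b64)
    return "".join(BIT_PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def parse_into_blocks(seq, block_len):
    """Cut the sequence into consecutive blocks; the last block may be shorter."""
    return [seq[i:i + block_len] for i in range(0, len(seq), block_len)]

def add_addresses(blocks, addresses):
    """Prepend one address sequence to each block."""
    if len(addresses) < len(blocks):
        raise ValueError("not enough addresses for the blocks")
    return [addr + block for addr, block in zip(addresses, blocks)]
```

With the toy strings from the figure, `parse_into_blocks("AATGCAGCGTTTAGAGAT", 8)` yields the blocks `AATGCAGC`, `GTTTAGAG` and `AT`, and prepending the addresses `CGAA` and `GCTA` to the two balanced blocks reproduces the final line of the figure.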


Why Balancing?

▸ 50% GC-content. Evens out melting temperature, creates more stable DNA.

▸ Makes synthesis simpler!
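
A balance check is straightforward to state in code. The sketch below is illustrative only: the function names and the `tolerance` parameter are assumptions, and it checks balance rather than enforcing it.

```python
def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)

def is_balanced(seq, tolerance=0.0):
    """True if the GC-content is within `tolerance` of 50%."""
    return abs(gc_content(seq) - 0.5) <= tolerance
```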


Why Addressing?

▸ Need to equip every block with a unique address sequence to be used as a PCR primer: random access (RA) equals amplification!

▸ Addresses need to be sufficiently different (Hamming distance) and avoided elsewhere in the blocks.

▸ Addresses should not have a secondary structure: needed for accurate sequence/primer synthesis.
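
The first two requirements can be checked mechanically. The following sketch uses hypothetical names (`addresses_ok`, `min_distance`) and does not model secondary structure, which would need a folding tool.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions in which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def addresses_ok(addresses, payloads, min_distance):
    """Check pairwise address distance and absence of addresses inside payloads."""
    far_apart = all(hamming(a, b) >= min_distance
                    for a, b in combinations(addresses, 2))
    absent = all(addr not in payload
                 for addr in addresses for payload in payloads)
    return far_apart and absent
```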


The Constrained Coding Components

Definition. A sequence $a = (a_1, \ldots, a_n) \in \mathbb{F}_q^n$ is self uncorrelated if no proper prefix of $a$ matches its suffix, i.e., $(a_1, \ldots, a_i) \neq (a_{n-i+1}, \ldots, a_n)$ for all $1 \le i < n$.

Extension: A mutually uncorrelated (cross-bifix-free) code is a set of sequences such that for any two sequences $a, b \in \mathbb{F}_q^n$ in the code no proper prefix of $a$ appears as a suffix of $b$ and vice versa [L70, G60, B12].
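
Both definitions translate directly into brute-force checks. The sketch below assumes all codewords have the same length and treats the pair $(a, a)$ as included in "any two sequences"; the function names are hypothetical.

```python
def is_self_uncorrelated(a):
    """True if no proper prefix of `a` equals the suffix of the same length."""
    return all(a[:i] != a[-i:] for i in range(1, len(a)))

def is_mutually_uncorrelated(code):
    """True if no proper prefix of any codeword is a suffix of any codeword.

    Assumes equal-length codewords; a codeword is also compared with itself.
    """
    for a in code:
        for b in code:
            if any(a[:i] == b[-i:] for i in range(1, len(a))):
                return False
    return True
```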


Address Sequence Construction

Enumeration and constructions of strings of a given length that contain no elements of some fixed set of strings as subwords [GO80’s].

Take addresses as “forbidden words”: Ensures specific random access.

MU vs. Weakly MU: A $k$-weakly mutually uncorrelated (WMU) code is a set of sequences such that for any two sequences $a, b \in \mathbb{F}_q^n$ in the code no proper prefix of $a$ of length $\geq k$ appears as a suffix of $b$ and vice versa [TKM16].

Construction of WMU codes with balancing and Hamming weight constraints?

For $a = (a_1, \ldots, a_s),\ b = (b_1, \ldots, b_s) \in \{0,1\}^s$, define

$$\Psi : \{0,1\}^s \times \{0,1\}^s \to \{A,T,C,G\}^s$$

according to $\Psi(a,b) = (c_1, \ldots, c_s)$, where for $1 \le i \le s$,

$$c_i = \begin{cases} A & \text{if } (a_i, b_i) = (0,0) \\ C & \text{if } (a_i, b_i) = (0,1) \\ T & \text{if } (a_i, b_i) = (1,0) \\ G & \text{if } (a_i, b_i) = (1,1) \end{cases}$$
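
The map $\Psi$ and the $k$-WMU condition can be written down as a short sketch. The function names are hypothetical, binary words are assumed to be tuples of 0/1 integers, and the WMU check is a brute-force test over equal-length codewords rather than a construction.

```python
# Symbol-wise table for Psi, taken from the case definition above.
PSI_TABLE = {(0, 0): "A", (0, 1): "C", (1, 0): "T", (1, 1): "G"}

def psi(a, b):
    """Combine two equal-length binary words into one DNA word."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return "".join(PSI_TABLE[(x, y)] for x, y in zip(a, b))

def is_k_wmu(code, k):
    """True if no proper prefix of length >= k of a codeword is a suffix of a codeword."""
    if k < 1:
        raise ValueError("k must be at least 1")
    for a in code:
        for b in code:
            if any(a[:i] == b[-i:] for i in range(k, len(a))):
                return False
    return True
```

With `k = 1` the check coincides with the mutually uncorrelated condition.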


Address Sequence Construction

Decoupling the construction: Let $C_1, C_2 \subseteq \{0,1\}^s$ be two binary block codes of length $s$. Encode all pairs $(a,b) \in C_1 \times C_2$ using $C_3 = \{\Psi(a,b) \mid a \in C_1, b \in C_2\}$. Then:

1. $C_3$ is balanced if $C_2$ is balanced.

2. $C_3$ is a $k$-WMU code if either $C_1$ or $C_2$ is a $k$-WMU code.

3. If $d_1$ and $d_2$ are the minimum Hamming distances of $C_1$ and $C_2$, respectively, then the minimum Hamming distance of $C_3$ is at least $\min(d_1, d_2)$.
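
Properties 1 and 3 can be checked numerically on a toy instance. In the sketch below the two small binary codes are arbitrary examples chosen for illustration, and all function names are hypothetical.

```python
from itertools import combinations, product

PSI_TABLE = {(0, 0): "A", (0, 1): "C", (1, 0): "T", (1, 1): "G"}

def psi(a, b):
    """Symbol-wise map from a pair of binary words to one DNA word."""
    return "".join(PSI_TABLE[(x, y)] for x, y in zip(a, b))

def build_c3(c1, c2):
    """C3 = { Psi(a, b) : a in C1, b in C2 }."""
    return [psi(a, b) for a, b in product(c1, c2)]

def min_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(sum(x != y for x, y in zip(u, v))
               for u, v in combinations(code, 2))

def gc_balanced(word):
    """True if exactly half of the symbols are G or C."""
    return 2 * sum(ch in "GC" for ch in word) == len(word)

# Toy example: C1 is a repetition code, C2 contains only words of weight s/2.
C1 = [(0, 0, 0, 0), (1, 1, 1, 1)]
C2 = [(0, 0, 1, 1), (1, 1, 0, 0), (0, 1, 0, 1), (1, 0, 1, 0)]
C3 = build_c3(C1, C2)
```

Since G and C occur exactly where $b_i = 1$, the GC-content of a word in `C3` is determined by the weight of `b` alone, which is why balancing can be delegated to $C_2$.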

See also [LY17] for MU codes.

Information sequence encoding?


Information Sequence Encoding?

Modification based on [WI90’s], and new approach in [TGM17].


Random Access and Rewriting Experiments

▸ Random access achieved via PCR, with addresses used as primers.

▸ Context identification and rewriting performed via gBlock or OE-PCR methods.
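
Address-based selection can be mimicked in software. The sketch below is a toy model under strong assumptions: primer binding is replaced by exact prefix matching, each matching strand doubles once per cycle, and the names `pcr_select` and `read_block` are hypothetical.

```python
def pcr_select(pool, address, cycles=10):
    """Return copy counts after a toy PCR run with `address` as the primer.

    Strands that start with the address are doubled every cycle;
    all other strands keep a single copy.
    """
    counts = {}
    for strand in pool:
        gain = 2 ** cycles if strand.startswith(address) else 1
        counts[strand] = counts.get(strand, 0) + gain
    return counts

def read_block(pool, address, cycles=10):
    """Return the payload of the dominant amplified strand, without its address."""
    counts = pcr_select(pool, address, cycles)
    matches = {s: c for s, c in counts.items() if s.startswith(address)}
    if not matches:
        raise LookupError("no strand carries this address")
    dominant = max(matches, key=matches.get)
    return dominant[len(address):]
```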


Reading from DNA I

Reading=Sequencing: MinION Oxford Nanopore (R7).


Reading from DNA II

MinION Oxford Nanopore (R7): Sequence traces.


Reading from DNA III

Major Problem: Very large number of sequence-dependent indel and substitution errors (∼20%)!

[Figure: example error statistics.]


Sequence Alignment I

First Step: “Merge traces” into one consensus sequence.

Bioinformatics: Sequence alignment [NW, SW]. Computer Science: Reconstructing sequences from traces [Batu et al. 2004].

[Figure: two-phase readout pipeline (Phase 1, Phase 2). Labeled blocks: reads, read selection based on address, MSA, CNS, BWA alignment of reads (Reads) against a reference (Ref), aligned reads with a region of poor alignment, error-correction (runlength & balancing), deletion correction, recovered file.]
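
Pairwise global alignment, the [NW] building block mentioned above, can be sketched with the standard Needleman-Wunsch dynamic program. The scoring values are assumptions for illustration, and this is not the alignment tool used in the pipeline.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two strings; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

The table has $(n+1)(m+1)$ entries, so time and memory grow with the product of the two lengths.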


Sequence Alignment II

Optimal, but of very high complexity! Works poorly on real data!

[Figure: trace reconstruction example in two panels.]

Panel 1: traces AGTTGCGCACTTTAA…, GTGCCCAACTTTTA…, AATGCGGACTTTAA…, AATTGGCAACTTA…, shown with ATGCCACTTAA…

Panel 2 (after the first symbol A… is fixed): GTTGCGCACTTTAA…, GTGCCCAACTTTTA…, ATGCGGACTTTAA…, ATTGGCAACTTA…, shown with TGCCACTTAA…; the next step is ambiguous: AG… or AA…

▸ Reference-free sequence alignment: CLUSTAL, KALIGN, MUSCLE, TCOFFEE, etc.

▸ Simple trace reconstruction: [Batu et al. 2004], [Holenstein et al. 2016]
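
The step-by-step voting in the example can be sketched as a majority procedure with one pointer per trace, in the spirit of the trace reconstruction literature. This is a simplified illustration that handles deletions only; the function name is hypothetical, ties are broken by whichever symbol is encountered first, and it is not the readout method of the talk.

```python
from collections import Counter

def bitwise_majority_alignment(traces, length):
    """Estimate a sequence of the given length from deletion-corrupted traces.

    Each trace keeps a pointer; at every step the majority symbol under the
    pointers is output and only the agreeing traces advance.
    """
    pointers = [0] * len(traces)
    out = []
    for _ in range(length):
        current = [t[p] for t, p in zip(traces, pointers) if p < len(t)]
        if not current:
            break  # every trace is exhausted
        majority = Counter(current).most_common(1)[0][0]
        out.append(majority)
        for idx, (t, p) in enumerate(zip(traces, pointers)):
            if p < len(t) and t[p] == majority:
                pointers[idx] += 1
    return "".join(out)
```

On the four traces from the figure, the first vote is A against a single G, so the first output symbol is A; the second vote is tied between G and A, which matches the ambiguity shown in the second panel.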


Sequence Alignment III

Nanopolish: A specialized deep-learning algorithm for base-calling developed by Oxford Nanopore.

Example Statistics:


Our Readout Solution

▸ Select “best” traces for first alignment: Best traces = the first sequenced traces, which have a small number of errors! Nanopores are biological and tend to “tire,” so traces are of nonuniform quality.

▸ Perform alignment: Roughly 30 traces involved.

▸ Adjust balance: Iterative procedure. Balancing limits runlengths!

▸ Repeat while recruiting new traces.

[Figure: readout pipeline. Block labels: read selection based on address, BWA alignment, MSA, deletion correction, error-correction using runlength and balancing properties. Data labels: reads, Ref, CNS, reconstructed data.]


Deletion Correction Through Balancing

Cest = Current estimate of the consensus sequence

▸ Initial alignment:
Cest TTCACCCAAAAACCCGAAAACCGCTTCAGCGA
Trace1 TTCACCCCAAAACCGAAAACCGCTTCACGA
Trace2 TTCACCCAAAAACCCGAAAACCGCTTCAGCGA
Trace3 TTCACCCAAAAACCCGAAAACCGCTTCAGCGA

▸ After MSA
Cest T2C1A1C4A4...
Trace1 T2C1A1C3A4...
Trace2 T2C1A1C4A4...
Trace3 T2C1A1C3A5...

▸ After Balancing
Cest T2C1A1C3A4...
Trace1 T2C1A1C3A4...
Trace2 T2C1A1C3A4...
Trace3 T2C1A1C3A5...
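
The run-length notation used on this slide (for example `T2C1A1C3A4`) and the GC content that a balancing step would look at can be computed as in the Python sketch below. The helper names are hypothetical, and the slide does not spell out the actual rule used to adjust run lengths, so the sketch covers only the notation and the balance measure, not the authors' iterative procedure.

```python
from itertools import groupby


def run_length_encode(seq):
    """Write a DNA string as symbol/run-length pairs, e.g. 'TTC' -> 'T2C1'."""
    return "".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(seq))


def gc_fraction(seq):
    """Fraction of G and C symbols; a balanced sequence has a value near 0.5."""
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical prefix used only to illustrate the notation.
prefix = "TTCACCCAAAAA"
encoded = run_length_encode(prefix)   # 'T2C1A1C3A5'
balance = gc_fraction("TTCACCCA")     # 4 of the 8 symbols are G or C
```

When traces disagree on a run length, one candidate can bring a balanced codeword closer to 50% GC content than the others, which is the kind of side information the balancing step can exploit.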


After Balancing Deletion Correction...

Asymmetric deletion errors: Handful of errors affecting runs of length $6 \le \ell \le 8$. Most errors appear in runs of the symbol A! Example Statistics:


And the Decoding Works?


Homopolymer Codes

▸ Definition. The integer sequence of a vector $x \in \mathbb{F}_4^n$ is the sequence of the lengths of the runs in $x$.

Example: $x = (0,0,1,3,3,2,1,1) \to I(x) = (2,1,2,1,2)$. The alternating (string) sequence is the sequence of symbols in $x$, with all runs reduced to length one. The string sequence for $x$ above is $S(x) = (0,1,3,2,1)$.

▸ Suppose that $C_H(n,t)$ is a code that can correct up to $t$ substitution errors. Let

$$C(n,t) = \{x \in \mathbb{F}_4^n : I(x) \bmod 2 \in C_H(n,t)\}.$$

The code $C(n,t)$ can correct up to $t$ asymmetric (decreasing) run-preserving deletions.

▸ Related to sticky deletions [B90’s], [DA05].
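
The definitions above can be checked on the slide's own example with a short Python sketch. The function names are hypothetical. The sketch also shows why the construction works: a deletion that shortens a run by one symbol without removing the run leaves the string sequence unchanged and flips exactly one entry of I(x) mod 2, so it looks like a single substitution error to the code C_H.

```python
from itertools import groupby


def integer_sequence(x):
    """I(x): the lengths of the runs of x, in order."""
    return [len(list(run)) for _, run in groupby(x)]


def string_sequence(x):
    """S(x): the symbols of x with every run reduced to length one."""
    return [symbol for symbol, _ in groupby(x)]


def parity_sequence(x):
    """I(x) mod 2, the binary word that must lie in C_H."""
    return [length % 2 for length in integer_sequence(x)]


x = (0, 0, 1, 3, 3, 2, 1, 1)   # example from the slide
y = (0, 1, 3, 3, 2, 1, 1)      # x after one run-preserving deletion in the first run

# Positions where the parity words of x and y disagree.
flipped = [i for i, (a, b) in enumerate(zip(parity_sequence(x), parity_sequence(y))) if a != b]
```

Here `flipped` contains a single position, the index of the shortened run, while `string_sequence(x)` and `string_sequence(y)` coincide.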


Readout Time

▸ The sequencing time was ∼ 10 hours, but only “junk” reads were generated after ∼ 6 hours.

▸ How long shall we sequence for?

▸ Definition. The k-deck of a sequence x is the multiset of all subsequences of length k of x.

▸ Hybrid reconstruction: One is given a small number M of “long” asymmetric traces (o(n) deletions). What is the smallest value of k for a k-deck that, along with the M long traces, ensures unique reconstruction?

▸ See [GM17], and some general trace results by the Dolecek and Yaakobi groups at ISIT!
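
The k-deck definition above translates directly into code. The Python sketch below enumerates all length-k subsequences (not necessarily contiguous) of a short string and counts them with multiplicity. The function name and the toy input are illustrative assumptions, and the enumeration is exponential in general, so it is only suitable for small examples.

```python
from collections import Counter
from itertools import combinations


def k_deck(x, k):
    """Multiset of all length-k subsequences of x, as a Counter."""
    # combinations() picks k positions in increasing order, which is exactly
    # the set of ways to delete len(x) - k symbols from x.
    return Counter("".join(sub) for sub in combinations(x, k))


# Toy example (hypothetical data): the 2-deck of a length-4 binary string.
deck = k_deck("0110", 2)
```

The total multiplicity of the k-deck of a length-n sequence is the binomial coefficient C(n, k), here C(4, 2) = 6.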


▸ Formal mathematical theory of coding for DNA storage?

  ▸ Microarray Synthesis and Shotgun Sequencing: DNA Profile Codes.

  ▸ Microarray Synthesis and Nanopore Sequencing: Asymmetric Lee Distance and Homopolymer Codes.

  ▸ DNA Media Aging: Codes in the Damerau Distance.

▸ Mathematical approaches for enabling random access and rewriting.

  ▸ Address Design: (Weakly) Mutually Uncorrelated Codes.

  ▸ Sequence Alignment: Reconstructing sequences from short and long traces.

  ▸ Controlled Assembly: Repeat-avoiding codes.
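
As a small illustration of the address-design item, the Python sketch below checks the standard mutually uncorrelated property: no proper prefix of any codeword equals a suffix of the same or of another codeword. The function name and the toy address sets are hypothetical, equal-length codewords are assumed, and the weak variant mentioned on the slide is not covered.

```python
def is_mutually_uncorrelated(codewords):
    """Return True if no proper prefix of a codeword is a suffix of any codeword.

    Assumes all codewords have the same length.
    """
    for a in codewords:
        for b in codewords:
            # Proper prefixes have lengths 1 .. len(a) - 1.
            for length in range(1, len(a)):
                if length <= len(b) and a[:length] == b[-length:]:
                    return False
    return True


# Toy address sets (hypothetical data).
good = ["AAC", "ATC"]   # prefixes A, AA, AT never match the suffixes C, AC, TC
bad = ["ACA", "GTC"]    # prefix "A" of ACA is also a suffix of ACA
```

The property means an address cannot be mistaken for an overlap of two neighbouring addresses, which is the motivation for using such codes when selecting sequences by address.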


THANK YOU!