Jennifer Idol Poster LC 2018 - FINAL · 2018. 6. 4. · Title: Jennifer Idol Poster LC 2018 - FINAL Created Date: 20180514204542Z


www.jax.org

CONCLUSIONS

• 30X coverage of a human genome can be completed in 2 weeks, and 40X coverage in 2.5 weeks, using one GridION.

• High molecular weight (HMW) extractions produce longer reads and higher yield when used fresh for library prep.

• Phenol:chloroform extractions combined with needle shearing produce high-quality sequence data.

• Stringent (0.4X) bead cleanups help remove short DNA fragments, reducing the number of short reads in the sequencing data.

• There is a tradeoff between read length and yield on the GridION. However, even with increased yield, the maximum read length remains between 100 kb and 450 kb.

• Library length is sample dependent. In our work, one sample (HG00514) had slightly shorter N50 and max read lengths, yet yields were consistent across samples despite these differences.

ABSTRACT

The long reads and improved data throughput of the Oxford Nanopore platform have enabled many genomic analyses, including detection of structural variants (SVs) and de novo genome assembly. Here, we describe the use of GridIONs to generate high-coverage human genome sequences of three individuals from the 1000 Genomes Project, selected by the Human Genome Structural Variation Consortium for comprehensive haplotype-aware structural variation detection. To exploit Nanopore’s potential for long read lengths and high sequencing yield, we optimized protocols for preparing high-molecular-weight genomic DNA libraries. Notably, we achieved multiple gigabases (Gb) of reads per run, with aligned sequences hundreds of kb long. To take advantage of the GridION’s highly parallel design, we also built an efficient sample-to-data workflow and streamlined the data storage and QC pipeline. A single human genome at 30X coverage could be completed in two and a half weeks. With high-coverage Nanopore data from these well-characterized human genomes, we report the characterization and standardization of Oxford Nanopore performance metrics based on the latest technology updates.

High Coverage, Ultra-long Read Sequencing of Human Genomes

Jennifer Idol, Liang Gong, Chee Hong Wong, Dave Harrison, Lauren Bellfy, Adam Mil-Homens, Mallory Ryan, Chew Yee Ngan, Charles Lee, Chia-Lin Wei

Genome Technologies, The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032 USA

INTRODUCTION

The purpose of the Human Genome Structural Variation Consortium (HGSV) is to sequence three trios from the 1000 Genomes Project for high-quality structural variation detection. The project presented in this poster includes one individual from each trio (GM19240, HG00514, and HG00733). GM19240 is from the Yoruba population, HG00514 is Han Chinese, and HG00733 is Puerto Rican, so the three samples span diverse populations worldwide. The HGSV also aims to use new technologies to achieve its goal, including the growing field of long-read sequencing.

Oxford Nanopore has been a leader in long-read sequencing, and the GridION system delivers long-read data at high throughput. Because five flow cells can be sequenced simultaneously for 48 hours, ten libraries can be run per week. This throughput enables up to 70 Gb of data to be sequenced for a single sample in one week. Adding time for high molecular weight extractions, a 30X human genome can be sequenced in as little as 2 weeks.
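The throughput figures above can be checked with simple arithmetic. This is a minimal sketch under stated assumptions: two 48-hour runs of five flow cells per week (the ten libraries per week stated above), a per-flow-cell yield of 7 Gb (the value implied by 70 Gb from ten flow cells, not a measured figure), and a human genome size of about 3.1 Gb. The function names are hypothetical.

```python
GENOME_SIZE_GB = 3.1  # assumed approximate human genome size (Gb)


def weekly_yield_gb(flow_cells_per_run: int, runs_per_week: int, gb_per_flow_cell: float) -> float:
    """Total Gb one GridION can produce for a single sample in one week."""
    return flow_cells_per_run * runs_per_week * gb_per_flow_cell


def sequencing_weeks(target_coverage: float, weekly_gb: float,
                     genome_size_gb: float = GENOME_SIZE_GB) -> float:
    """Weeks of sequencing needed to reach target_coverage (excludes extraction and prep time)."""
    required_gb = target_coverage * genome_size_gb
    return required_gb / weekly_gb


# Assumed values: 5 flow cells per run, 2 runs per week, 7 Gb per flow cell.
weekly = weekly_yield_gb(flow_cells_per_run=5, runs_per_week=2, gb_per_flow_cell=7.0)
print(f"Weekly yield: {weekly:.0f} Gb")
for cov in (30, 40):
    print(f"{cov}X: {sequencing_weeks(cov, weekly):.2f} weeks of sequencing")
```

Under these assumptions, sequencing alone takes roughly 1.3 weeks for 30X and 1.8 weeks for 40X; extraction and library prep time come on top of that.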

METHODS

LITERATURE

• For more information on the 1000 Genomes Project, please visit: http://www.internationalgenome.org/

• For more information on the Human Genome Structural Variation Consortium (HGSV), please visit: http://www.internationalgenome.org/human-genome-structural-variation-consortium/

• For more information on the Oxford Nanopore GridION, please visit: https://nanoporetech.com/products/gridion

ACKNOWLEDGEMENTS AND FUNDING

The authors would like to thank James Brayer at Oxford for his continuing support throughout this project. Genome Technologies at The Jackson Laboratory for Genomic Medicine is partially supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA034196.

Extraction and Shearing

• Input: 30 M human cell pellet
• Phenol:chloroform extraction with Qiagen MaXtract tubes
• Resuspend overnight
• Needle shearing (30X shears with a 27G needle)

Library Prep and Sequencing

• Input: 20 µg HMW sheared DNA
• DNA repair (20 °C, 1 hour)
• 1X AMPure XP bead cleanup
• End prep (20 °C for 1 hour, then 65 °C for 5 min)
• 0.4X AMPure XP bead cleanup
• Adapter ligation (RT, 30 min)
• 0.4X AMPure XP bead cleanup
• Prepare library for loading
• Load flow cells (v106)
• Sequence for 48 hours
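The library prep steps can be written down as data, which makes the incubation times and bead volumes easy to compute. In a bead cleanup, "0.4X" means adding bead suspension equal to 0.4 times the sample volume; a lower ratio keeps mainly longer fragments, which is why the 0.4X cleanups remove short fragments. This is a minimal sketch: the `Step` class and `bead_volume_ul` function are hypothetical, and the 60 µL reaction volume is an illustrative assumption, not a value from the poster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    incubation_min: int = 0              # total incubation time for the step, in minutes
    bead_ratio: Optional[float] = None   # AMPure XP bead:sample ratio for cleanup steps


# Library prep steps as listed on the poster (incubation times summed per step).
LIBRARY_PREP = [
    Step("DNA repair (20 °C, 1 h)", incubation_min=60),
    Step("AMPure XP cleanup", bead_ratio=1.0),
    Step("End prep (20 °C 1 h, 65 °C 5 min)", incubation_min=65),
    Step("AMPure XP cleanup", bead_ratio=0.4),
    Step("Adapter ligation (RT, 30 min)", incubation_min=30),
    Step("AMPure XP cleanup", bead_ratio=0.4),
]


def bead_volume_ul(sample_volume_ul: float, ratio: float) -> float:
    """Bead volume to add for a given bead:sample ratio (e.g. 0.4X -> 0.4 x sample volume)."""
    return sample_volume_ul * ratio


# Incubation time only; cleanup handling time is not included.
total_incubation = sum(step.incubation_min for step in LIBRARY_PREP)
print(f"Total incubation time: {total_incubation} min")

sample_ul = 60.0  # hypothetical reaction volume, for illustration only
for step in LIBRARY_PREP:
    if step.bead_ratio is not None:
        volume = bead_volume_ul(sample_ul, step.bead_ratio)
        print(f"{step.bead_ratio}X cleanup: add {volume:.1f} uL beads")
```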

Project Summary

| Metric | GM19240 | HG00514 |
|---|---|---|
| # Extractions needed | 3 | 4 |
| # Flow cells for 30X coverage | 23 | 30 |
| Total yield (Gb) | 99.1 | 97.4 |
| Average flow cell yield (Gb) | 4.3 ± 1.6 | 2.9 ± 1.7 |
| N50 read length (bp) | 45,470 ± 5,117 | 30,111 ± 8,565 |
| Max read length (bp) | 268,840 | 220,670 |
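Fold coverage is the number of sequenced bases divided by the genome size. A minimal sketch applying this to the total yields in the Project Summary, assuming a genome size of about 3.1 Gb (the poster does not state the value used). Because it uses raw yield, it slightly overstates coverage from aligned reads only.

```python
GENOME_SIZE_GB = 3.1  # assumed approximate human genome size; not stated on the poster

# Total raw yield (Gb) per sample, from the Project Summary table.
total_yield_gb = {"GM19240": 99.1, "HG00514": 97.4}


def raw_coverage(yield_gb: float, genome_size_gb: float = GENOME_SIZE_GB) -> float:
    """Approximate fold coverage = sequenced bases / genome size (counts unaligned reads too)."""
    return yield_gb / genome_size_gb


for sample, gb in total_yield_gb.items():
    print(f"{sample}: ~{raw_coverage(gb):.1f}X raw coverage")
```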

Analysis

• GridION basecalling
• hg19 alignment using minimap2
• Metric collection and graphic generation using R
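The poster names minimap2 for hg19 alignment but not the options used. A minimal sketch, assuming minimap2's standard Nanopore preset (`-x map-ont`) with SAM output (`-a`) and a chosen thread count (`-t`); the helper functions and file names are hypothetical placeholders, not the authors' pipeline.

```python
import shlex
import shutil
import subprocess


def minimap2_command(reference: str, reads: str, threads: int = 8) -> list[str]:
    """Build a minimap2 command aligning Nanopore reads to a reference (SAM on stdout)."""
    return ["minimap2", "-ax", "map-ont", "-t", str(threads), reference, reads]


def align(reference: str, reads: str, out_sam: str, threads: int = 8) -> None:
    """Run minimap2 and write its SAM output to out_sam."""
    if shutil.which("minimap2") is None:
        raise FileNotFoundError("minimap2 not found on PATH")
    with open(out_sam, "w") as handle:
        subprocess.run(minimap2_command(reference, reads, threads), stdout=handle, check=True)


# Hypothetical file names, for illustration only.
print(shlex.join(minimap2_command("hg19.fa", "flowcell_reads.fastq")))
```

Building the argument list separately from running it keeps the command easy to log and check before committing a long alignment job.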

Raw Yield of Flow Cells (figure: raw flow cell yield in Gb; GM19240, N=23; HG00514, N=30)

Raw yields of the flow cells for each sample. HG00514 has more flow cells with lower yield because of an older extraction.
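Per-sample values such as "4.3 ± 1.6" are a mean and standard deviation across flow cells. A minimal sketch of that summary; the per-flow-cell yields below are made up because the individual values are only shown graphically, and whether the poster used the sample or population standard deviation is not stated (sample SD is assumed here).

```python
import statistics


def summarize(yields_gb: list[float]) -> str:
    """Format per-flow-cell yields as 'mean ± SD Gb (N=n)', using the sample SD."""
    mean = statistics.mean(yields_gb)
    sd = statistics.stdev(yields_gb)
    return f"{mean:.1f} ± {sd:.1f} Gb (N={len(yields_gb)})"


hypothetical_yields = [2.0, 3.0, 4.0, 5.0, 6.0]  # illustrative values only
print(summarize(hypothetical_yields))  # 4.0 ± 1.6 Gb (N=5)
```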

Max and N50 Read Length (figure: per-flow-cell max and N50 read lengths; GM19240, N=23; HG00514, N=30)

Max and N50 read lengths of the flow cells for each sample. These vary with extraction, library, and flow cell quality.
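N50 is the read length L such that reads of length L or longer contain at least half of all sequenced bases; the max read length is simply the longest read. A minimal sketch of the standard N50 definition; the poster does not say which tool computed its values, and the read lengths below are illustrative.

```python
def n50(read_lengths: list[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not read_lengths:
        raise ValueError("no reads")
    total = sum(read_lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half are covered.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


reads = [10_000, 20_000, 45_000, 100_000]  # illustrative read lengths (bp)
print(f"N50: {n50(reads):,} bp, max: {max(reads):,} bp")
```

Because N50 is weighted by bases, a few very long reads can raise it substantially even when most reads are short.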

Fresh vs Old Extraction (figure: N50 read length and max read length for libraries from 1-day-old vs 4-week-old extractions)

Five libraries were made from both old and fresh extractions. Fresh extractions produce longer N50 and max read lengths.

Alignment (figure: read counts vs read length (bp) for raw and aligned reads from one flow cell)

Total reads for one example flow cell. Aligned reads (red) overlap raw reads (blue), and the overlap appears purple. More than 90% of all reads align.
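The fraction of aligned reads can be computed from the aligner's SAM output by counting each read's primary record and checking the "unmapped" FLAG bit (0x4). A minimal sketch, assuming SAM output from the minimap2 alignment step; secondary (0x100) and supplementary (0x800) records are skipped so each read is counted once. The function name and example records are hypothetical.

```python
def aligned_fraction(sam_lines) -> float:
    """Fraction of reads whose primary SAM record is mapped (FLAG bit 0x4 unset)."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.rstrip("\n").split("\t")[1])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) record
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


example_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t1000\t60\t8M\t*\t0\t0\tACGTACGT\t*",
    "read1\t2048\tchr5\t5000\t60\t8M\t*\t0\t0\tACGTACGT\t*",  # supplementary, skipped
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\t*",             # unmapped
    "read3\t16\tchr2\t2000\t60\t8M\t*\t0\t0\tGGCCTTAA\t*",    # mapped, reverse strand
]
print(f"Aligned: {aligned_fraction(example_sam):.1%}")
```

An open SAM file can be passed directly, for example `with open("aln.sam") as fh: aligned_fraction(fh)`.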

Read Lengths of Libraries (figure: percentage of bases vs read length, 0 to 450 kb, for runs 180226.18.lee.001.GXB01102.002 and 180321.18.lee.001.GXB01102.005)

Complete read length distributions for two example flow cells (one blue, one red). Some flow cells reach read lengths of up to 250 kb, while others have shown reads up to 450 kb.

Distribution of Read Length (figure: percentage of bases vs read length, 0 to 450 kb, Protocol 1 vs Protocol 2)
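The read length plots show the percentage of bases, not the percentage of reads, in each length range, so long reads carry more weight. A minimal sketch of that weighting, assuming 50 kb bins to match the axis ticks; the bin size actually used for the plots is not stated, and the read lengths are illustrative.

```python
from collections import Counter


def percent_bases_by_bin(read_lengths: list[int], bin_size: int = 50_000) -> dict[int, float]:
    """Percentage of total bases contributed by reads in each length bin.

    Keys are bin start positions (0, 50_000, 100_000, ...)."""
    total = sum(read_lengths)
    if total == 0:
        return {}
    bases = Counter()
    for length in read_lengths:
        bases[(length // bin_size) * bin_size] += length
    return {start: 100.0 * b / total for start, b in sorted(bases.items())}


reads = [10_000, 40_000, 60_000, 90_000, 300_000]  # illustrative read lengths (bp)
print(percent_bases_by_bin(reads))
```

Here a single 300 kb read contributes 60% of the bases while being only one of five reads, which is why base-weighted plots emphasize the long tail.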

Read Quality Scores (figure: Oxford GridION mean read Q-score (mean_qscore_template) per read for one flow cell from each sample; GM19240 and HG00514, runs 180202-18-lee-001-GXB01102-002 and 180420-18-lee-001-GXB01102-001, N=187,977 and N=324,482 reads)

Q-scores for a single flow cell from each sample, with each dot representing one read. Q-scores average 10.2 for GM19240 and 10.0 for HG00514.
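The plotted column, `mean_qscore_template`, is the per-read mean quality column in the tab-separated `sequencing_summary.txt` files written by ONT basecallers. A minimal sketch that averages it for one flow cell; the in-memory file and its values are made up. Averaging Q-scores arithmetically, as done here, differs from averaging per-base error probabilities, and how the poster's averages of 10.2 and 10.0 were computed is not stated.

```python
import csv
import io
import statistics


def mean_qscore(summary_file) -> float:
    """Average the per-read mean_qscore_template column of a tab-separated sequencing summary."""
    reader = csv.DictReader(summary_file, delimiter="\t")
    return statistics.mean(float(row["mean_qscore_template"]) for row in reader)


# Tiny in-memory stand-in for a sequencing_summary.txt file (values are made up).
example = io.StringIO(
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "r1\t12000\t9.5\n"
    "r2\t45000\t10.5\n"
    "r3\t30000\t10.0\n"
)
print(f"Mean read Q-score: {mean_qscore(example):.1f}")
```

For a real run, open the summary file in text mode and pass the handle to `mean_qscore` in the same way.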