37
NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies February 9, 2010 Elliott Margulies, Ph.D 1 Next‐Genera*on Sequencing Technologies Ellio8 H. Margulies, Ph.D. Genome Technology Branch Na2onal Human Genome Research Ins2tute Overview Technologies Applica*ons Background

Next‐Generaon Sequencing Technologies

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

Next‐Genera*on Sequencing Technologies 

Ellio8 H. Margulies, Ph.D. 

Genome Technology Branch Na2onal Human Genome Research Ins2tute 

Overview 

Technologies 

Applica*ons 

Background 

Page 2: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

Why Sequence DNA? 

De Novo Genome Sequencing Compara*ve Genomics 

Human Chicken Fugu 

Zebrafish 

Tetraodon 

Rat 

Rabbit Horse 

Cow 

Pig Dog 

Cat 

Platypus 

Hedgehog 

Vervet 

Orangutan 

Lemur 

Armadillo  Monodelphis 

Gorilla Mouse lemur  Mouse 

Opossum  Wallaby 

Duck 

Page 3: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

Varia*on Detec*on 

“Coun*ng Experiments” 

Page 4: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

History of Nucleic Acid Sequencing 

1

15

150

50,000

25,000

1,500

200,000

1,000,000

Efficiency (bp/person/year)

Avery: Proposes DNA as ‘Genetic Material’

Watson & Crick: Double Helix Structure of DNA

Holley: Sequences Yeast tRNAAla

1870

1953

1940

1965

1970

1977

1980

1990

2000

Miescher: Discovers DNA

Wu: Sequences λ Cohesive End DNA

Sanger: Dideoxy Chain Termination Gilbert: Chemical Degradation

Messing: M13 Cloning

Hood et al.: Partial Automation

•  Cycle Sequencing •  Improved Sequencing Enzymes •  Improved Fluorescent Detection Schemes

1986

Slide kindly provided by Eric Green

Plateau in Sequencing Technology 

AB 3730 xl 

Current Topics in Genome Analysis, E. Green, Lecture 1

Page 5: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

New Sequencing Technologies 

SOLiD 

Liga2on‐based extension 

Genome Analyzer 

Reversible Terminator Chemistry 

454 

Pyrosequencing 

Even Newer than New… 

HeliScope  SMRT Technology 

True Single Molecule Sequencing 

Page 6: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

Trade‐offs with Newer Sequencing Technologies 

Throughp

ut 

Length of Read (bp) 

~700 

Capillary‐based (AB) 

454 (Roche) 

~300 ~50 

Illumina AB/LifeTech Helicos 

kb 

Mb 

Gb 

Throughput = Amount of Sequence Generated Unit of Time or Cost 

PacBio ??? 

Page 7: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

Template/Library Prepara*on Methods 

“Single Molecule” Sequencing Really a clonal amplifica.on of a single DNA molecule 

DNA 

Shear 

Add Adapters 

Select for fragments With an ‘A’ and ‘B’ adapter 

A[achment to  solid surface 

Sequence & Analysis 

Page 8: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

Crea*ng Large Insert Paired‐End Libraries 

Genomic  DNA 

Ligate  Adapters  EcoP15I Linkers 

Circularize with Bio2nylated  Internal Adapter 

Bio2n Adapter EcoP15I 

2‐3KB Fragment 

Digest 

27 bp of genomic DNA 

Shear 

2‐3kb Fragments 

Slide adapted from Andrew Young and Ha.ce Ozel Abaan 

Paired‐End Sample Prepara*on 

300‐400 bp insert Sequencing primer R1 A  Adaptor  B Adaptor 

Sequencing primer R1 A  Adaptor  B Adaptor 

2‐3 kb fragment 

27bp insert 

27bp insert 

Bio2nylated Internal Adaptor 

Slide adapted from Andrew Young and Ha.ce Ozel Abaan 

Page 9: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

Mul*plexing Samples 

Sequence of Interest Adapter  Adapter 

Index ACAGTG ACTTGA ATCACG CAGATC CGATGT GATCAG GCCAAT GGCTAC TAGCTT TGACCA TTAGGC

Paired End Indexing Strategy 

454 Sequencing Technology  

Nature 31st July 2005 

Slide (though slightly modified) courtesy of Elaine Mardis, Wash U., St. Louis 

Page 10: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

10 

Emulsion PCR (Template Prep) 

Each bubble in the emulsion will potentially contain a different fragment.

Slide Courtesy of Alice Young, NISC 

Load PicoTiter Plate 

Packing beads and enzyme beads Slide Courtesy of Alice Young, NISC 

Page 11: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

11 

PicoTiter Plate Apparatus 

Instead of 96 reads/run, there are hundreds of thousands. Slide Courtesy of Alice Young, NISC 

PyroSequencing 

Luciferin

Slide Courtesy of Alice Young, NISC 

Page 12: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

12 

Figure 4, Genome Res (2001) 11:3‐11 

Data ‐ Flowgram 

Slide Courtesy of Alice Young, NISC 

Page 13: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

13 

454 Sequencing Summary   Run *me ~8 hrs   Produces 100’s of Mb of sequence   Read length ~300‐400 bp   Most “mature” of the next‐genera*on technologies   Homopolymer runs can be an issue 

Applica8ons:   de novo sequencing   Varia*on detec*on   Gene Expression   “Metagenomics” 

Nature, 2006 November 16; vol. (7117), 444 330-336

Science, 2006 November 17 ; vol. 314, 1113-111

http://popsci.typepad.com/photos/uncategorized/2007/10/25/laluezafox1lr.jpg

Page 14: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

14 

Page 15: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

15 

Illumina Genome Analyzer 

DNA (0.1‐1.0 ug) 

Single molecule array Sample 

prepara*on  Cluster growth  5’ 

5’ 3’ 

T C 

T A G 

C G 

T A 

G T 

Sequencing  

Bioinforma*cs Analyses 

Illumina/Solexa Sequencing 

Slide Courtesy of Illumina Inc. 

Page 16: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

16 

5’ 

3’ 

5’ 

 First base incorporated 

Cycle 1: Add sequencing reagents 

 Remove unincorporated bases 

 Detect signal 

Cycle 2‐n:  Add sequencing reagents and repeat 

All four labeled nucleo*des in one reac*on  Base‐by‐base sequencing No problems with homopolymer repeats 

Sequencing By Synthesis (SBS) 

Slide (though modified) Courtesy of Illumina Inc. 

The Illumina Genome Analysis System 

Sequencing & Imaging 

Flow Cell 

Cluster Genera*on 

Page 17: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

17 

100 MICRONS 

Pseudo‐color Enhanced Image  

Slide (though modified) Courtesy of Illumina Inc. 

Page 18: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

18 

Base Calling from Raw Data 

T T T T T T T G T … 

T G C T A C G A T … 

The iden2ty of each base of a cluster is read off from sequen2al images. 

1 2 3 7 8 9 4 5 6

Slide (though modified) Courtesy of Illumina Inc. 

Illumina Throughput 

  Each lane can sequence 20‐30 million molecules –  8 lanes = up to 240 million reads 

  36 bp reads suitable for coun*ng based experiments 

  Capable of up to 100bp paired end reads –  50 Gigabases of sequence per run 

Page 19: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

19 

HiSeq2000  

  Same chemistry   Runs 2 flowcells at the same *me –  Imaging one flowcell – chemistry on the other 

  Flowcells are bigger – More surface area can be scanned –  Focuses on top and bo[om of flowcell 

  Improvements to hardware –  Be[er lasers, cameras, etc. 

  Ini*al release mid‐February at 100‐125G per flowcell 

  8 day run*me for 2 flowcells –  Two “whole genomes” in 8 days! 

Page 20: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

20 

SOLiD from Applied Biosystems (now Life Technologies) 

Two‐base encoding 

Must know iden8ty of first base to decode color space 

Page 21: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

21 

Liga*on‐based extension 

Page 22: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

22 

A SNP requires an adjacent valid color change 

Errors do not have compensatory color changes 

Page 23: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

23 

Summary of Three Plakorms 

Pacific Biosiences 

Page 24: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

24 

Systems engineering challenges 

  We are s*ll learning what the important bits are that need to be stored. Data volumes are MASSIVE, especially short‐term 

  Challenges with cloud compu*ng 

~15 Terabytes Raw data 

~100 Gigabytes  Processed  

Per Human Genome 

Page 25: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

25 

Data and Compute – a shim in complexity 

Data  Computer Cluster Data 

Applica*ons 

Page 26: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

26 

Isolate parts of the genome that are “interes*ng” 

Shear DNA into “small” fragments 

Purify DNA fragments of interest 

SEQUENCE 

Page 27: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

27 

How do “coun*ng” experiments work? 

Cell (2007) May 18;129(4):823-37. 

  One of the first publica*ons using Solexa data 

  Reproducible data produc*on 

  Correlates with other sequence‐based coun*ng experiments 

  Iden*fy biologically‐relevant pa8erns of histone methyla*on –  Transcrip2on –  Enhancers –  Insulators 

  Stay tuned for Laura Elnitski’s lecture! 

Page 28: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

28 

Nature. 2007 Aug 2;448(7153):553-60

Sequencing‐based methods equivalent to  Microarray‐based methods 

Whole Genome Sequencing 

Page 29: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

29 

First two “personal” genomes sequenced 

Watson  Venter 

Page 30: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

30 

What does “whole‐genome” mean? 

  Average 30x base‐wise alignment depth of coverage 

  90 Gigabases of aligned sequence   120 Gigabases of purity filtered data   600 Million paired‐end 100bp reads 

  Realignment back to reference sequence. 

Why 30X? 

All SNPs 

Homozygous SNPs 

Heterozygous SNPs 

Page 31: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

31 

Tumor/Normal Whole‐Genome Comparison 

Experimental Design 

Tumor Genome  Normal Genome Compare: 

Tumor‐specific varia2on 

Tumor sample  “Normal” sample 

Page 32: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

32 

Summary of Data Generated 

Comparison to Other “Personal” Genomes 

Page 33: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

33 

Pipeline for Iden*fying Soma*c Muta*ons 

Melanoma Tumor Cell Line 

Page 34: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

34 

Defining a Soma*c Variant 

  Minimum of 3 high‐quality reads in the tumor with a variant 

  Minimum of 10X coverage in the normal and no evidence of the variant 

  Systema*c biases/errors are eliminated –  Library prepara2on –  Sequencing chemistry –  Read alignment 

Valida*on Efforts 

  32,325 soma*c variants detected 

  Valida*on against Sanger sequencing data: –  42 of 48 previously known soma2c variants detected 

88% sensi2vity 

–  452 of 470 newly detected soma2c variants confirmed 3% false posi2ve rate 

  Muta*onal profile reflec*ve of UV DNA damage:     

Page 35: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

35 

Transloca*ons and Copy Number Varia*ons 

Page 36: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

36 

www.1000genomes.org 

cancergenome.nih.gov 

Page 37: Next‐Generaon Sequencing Technologies

NHGRI Current Topics in Genome Analysis 2010 Week 5: Next-Generation Sequencing Technologies

February 9, 2010 Elliott Margulies, Ph.D

37