63
A A R H U S U N I V E R S I T E T Faculty of Agricultural Sciences An introduction to “Next Generation” DNA Sequencing Jakob Hedegaard Post doc Dept. of Genetics and Biotechnology Molecular Genetics and Systembiology Blichers Allé 20, P.O. BO 50, DK-8830 Tjele Researchcentre Foulum Tlf (+45)89991363 E-mail: [email protected]

An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Embed Size (px)

Citation preview

Page 1: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

A A R H U S U N I V E R S I T E T

Faculty of Agricultural Sciences

An introduction to “Next Generation”DNA Sequencing

Jakob HedegaardPost doc

Dept. of Genetics and BiotechnologyMolecular Genetics and Systembiology

Blichers Allé 20, P.O. BO 50, DK-8830 TjeleResearchcentre Foulum

Tlf (+45)89991363E-mail: [email protected]

Page 2: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Programme

Monday NGS intro(RNA to sequences)

Tuesday

Wednesday NGS, pre-processing(Sequences to counts)

RNA- seq analysis(Counts to list(s))

Thursday Sequence based analysis Supervised and un-supervised learning

Friday Gene set analysis

a.m. p.m.

Page 3: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

RNA to sequences

The aims of today are to understand (or get an idea about):

• the potential use of NGS technology

• the basic principles of the Illumina technology

• the sample prep methods for mRNA-Seq (and RNA-Seq)

• the options for experimental design for mRNA-Seq

• the basic principles of processing the raw data from an Illumina GA

• the output and format of the data produced by an Illumina GA

Case study

Software check

Questions and breaks!

Page 4: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

DNA Sequencing technology

Sanger method (chain-terminator method)- Fluorescent labeled primer- 4 reactions/sample

Pharmacia ALF DNA Sequencer

- 10 sequences of 3-500 bp- sequence preparation and ON run

Page 5: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Short history of DNA Sequencing

1977– Maxim-Gilbert– Sanger

1986– First Automated DNASequencer ABI 370 (373)

1988– Pharmacia ALF

1995– ABI 377Up to 96 lanes

1996– First Capillary DNASequencer ABI 310

1998– First 96 Capillary instruments MegaBace, ABI 3700

2000– ABI 3100, 16 Capillary

2002– ABI 3730, 48 or 96 Capillary

2005– Genome Sequencer GS20 (454)

2006– Solexa (Illumina)

2007– SOLiD

2011– ?

Page 6: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Short history of DNA Sequencing

Method Read length (max bp)

Sequences per run

Run time Output/day

Output/run

Dye-Terminator (3730xl)

1,500 96 1 hr ~1-2 Mbp ~70 Kbp

Applied(SOLiD)

75 ~2.8 billions 7 days(60x60 pb)

~25 Gbp ~200 Gbp

Illumina (GA)

150 ~300 millions

14 days(2x150 bp)

~7 Gbp ~90 Gbp

Illumina (HiSeq)

100 ~1 billion 8 days(2x100 bp)

~25 Gbp ~200 Gbp

454/Roche (GS FLX)

400 ~ 1 million 10 hrs ~1 Gbp ~400 Mbp

Page 7: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

NGS – also known as…

• Next generation sequencing (NGS)

• High-throughput sequencing (HTS or HT-Seq)

• Flow cell sequencing (FCS)

• Massively parallel sequencing (MPS)

• Deep sequencing

• Many other synonyms!

Page 8: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

NGS technologies

Roche/454Genome Sequencer FLX

IlluminaGenome Analyser IIx

Applied Biosystems5500xl SOLiD

Additional technologies – and more to come!

IlluminaHiSeq 2000

Page 9: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Illumina Sequencing technologyLibrary Generation

(hours-days)Applications

Whole genomeTargeted genome

Long PCRPull-down

EpigenomeBisulfiteRestrictionRestriction + bisulfiteAntibody pulldownAb. + bisulfite

Whole transcriptomeAll RNARNA minus rRNA, tRNAPolyAncRNA

Gene expression (DGE) Small RNARISC RNA productsProtein:DNAProtein:RNA

DNA Fragments

+Adapters

Cluster Generation(~5 hours)

Sequencing(days, 1.1 hrs/cycle)

~200 bp

Page 10: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

DNA(0.1-1.0 ug)

Single molecule arraySample preparation Cluster growth 5’

5’3’

G

T

C

A

G

T

C

A

G

T

C

A

C

A

G

TC

A

T

C

A

C

C

TAG

CG

TA

GT

1 2 3 7 8 94 5 6

Image acquisition Base calling

T G C T A C G A T …

Sequencing

Illumina Sequencing TechnologyReversible Terminator Chemistry

Page 11: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Flow cellA flow cell contains eight

lanes Lane 1Lane 2

Lane 8

.

.

.

Column 1

Column 2

Each lane contains two columns of tiles

Each column contains multiple tiles – total 120

Each tile is imaged four times per cycle –one image per base.

~340.000 clusters/tile ->

~40.000.000 clusters/lane ->

~320.000.000 clusters/flowcell

Illumina Sequencing Technology

Page 12: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Illumina GA output

Page 13: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

100 Mbps

Pipeline Computer”Donkey”DELL PowerEdge 29008 cores (2 x 4 CPU, 2.6 GHz)16 GB RAM4.7 TB hard drive

GAIIx, “Oban” GAIIx, “Lagavulin”Cluster Station

Switch

• Storage• Analysis

Illumina GA Sequencing technology

Page 14: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Illumina GA Sequencing technology

Illumina Genome Analyzer IIx, “Oban”

GA PC•2.66 GHz cpu•3 GB RAM•80 GB hard drive

Paired End (PE) module

Genome Analyzer (GAIIx)

Uninterruptible Power Supply (UPS)•Back up for ~10 min

Switch and 100 Mbps network to pipeline computer

Cooling unit!

Page 15: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Illumina GA Sequencing technology

Flowcell& Prism

Laser

Liquid inlet

Pielter element

Camera

Liquid outlet

Page 16: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Illumina Sequencing technology

Single-end

Paired-end

1.

2.

Multiplex

1.

(3.)

2.

Page 17: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Illumina Sequencing technology

Single End Paired End

Page 18: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Illumina Sequencing technology

Mate Pair

Page 19: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

SR and PE flowcells

Page 20: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Single-Read flowcells

Page 21: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Single-Read flowcells

Page 22: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Single-Read flowcells

Page 23: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

GAII, “Oban”

Cluster Station

Amplification

Linearization

Blocking

Primer hybridization

Sequencing – single read

Single-Read flowcells

Page 24: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Paired-End flowcells

Page 25: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Paired-End flowcells

Page 26: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Paired-End flowcells

Page 27: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Paired-End flowcells

Page 28: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

GAII, “Oban”

Cluster Station

Amplification

Linearization

Blocking

Primer hybridization

Sequencing – read 1

Re-synthesis

Linearization

Blocking

Primer hybridizationDenaturation + de-protection

Sequencing – read 2

Paired-End flowcells

Page 29: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Using NGS for expression profiling: mRNA-Seq

Digital gene expression• Sequencing of tags• Mapping tags to transcriptome -> counts• Counts, 0 - ∞• All genes/transcripts• Added information

(Alternative transcripts, SNPs, novel genes,…)

Page 30: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

RNA-Seq – library preparation

DNA methylation

DNA methylation

Ribosome associatedRibosome associated

Non-PolyA mRNA

Non-PolyA mRNA

RNARNA

PolyA mRNAPolyA mRNA

Non-codingNon-coding

RegulatoryRegulatory

StructuralStructural RNAassociated

RNAassociated

DNA associated

DNA associated

ReplisomeReplisome

DNA RepairDNA Repair

TelomericTelomeric

rRNArRNA

MicroRNAMicroRNA

Transcriptionalstart site

associated

Transcriptionalstart site

associated

Enhancer RNA

Enhancer RNA

Anti-senseAnti-sense

CodingCoding

Page 31: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

mRNA-Seq – library preparationTotal RNA (10 µg)

Purify and fragment mRNA

ds cDNA synthesis (random primed)

Repair ends

Add ‘A’ bases to 3’ ends

Ligate adapters

Purify ligation product (200-500 bp)

PCR amplification

Page 32: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

RNA – library preparations

mRNA-Seq

Adapter Ligation

Purified Total RNA

Poly-A Selection

RNA Fragmentation

cDNA Synthesis

PCR

TotalRNA-Seq II

Removal of rRNA

Adapter Ligation

Purified Total RNA

RNA Fragmentation

cDNA Synthesis

PCR

TotalRNA-Seq I

Normalizatione.g. DSN to remove

very abundant elements

Adapter Ligation

Purified Total RNA

RNA Fragmentation

cDNA Synthesis

PCR

Page 33: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

RNA – library preparations

smallRNA-Seq

Purified Total RNA

PAGE Selection

RNA Adapter Ligation

1. strand cDNA Synthesis

PCR

PAGE Selection

Directional mRNA-Seq

Purified Total RNA

RNA Adapter Ligation

1. strand cDNA Synthesis

PCR

Poly-A Selection

RNA Fragmentation

Page 34: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

RNA – library preparations

More variations:

• Multiplexing/indexing (more than 1 sample/lane)

• PCR free protocols

• Robot compatible protocols

• Tagged RT primer + terminal tagged oligo + index-PCR (Epicentre)

• Amplification based methods for samples with limited RNA

• …… more to come!

Confused?

Page 35: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

mRNA-Seq – library preparationTotal RNA (10 µg)

Purify and fragment mRNA

ds cDNA synthesis (random primed)

Repair ends

Add ‘A’ bases to 3’ ends

Ligate adapters

Purify ligation product (200-500 bp)

PCR amplification

Page 36: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Illumina - multiplex

Page 37: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

genomic DNA

Intensity versus Cycle

small-RNAmRNA (RNA-Seq)”random” priming?

Hansen et al, Nucl. Acids Res. (2010) 38 (12): e131.

Page 38: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Array vs NGS

• Hybridization to probes• Intensities, 0 – N (analog)• Fixed gene-set• Proven technology• Costs ?• ……

• Sequencing of tags• Counts, 0 - ∞ (digital)• All genes/transcripts• Added information• Costs ?• ……

Page 39: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Array vs NGS

mRNA-Seq data is information rich• mRNA Expression Profiling

• Alternative Splicing Analysis

• Analysis of expressed SNPs and mutations

• Analysis of Allelic-specific Expression

• Chimeric Transcript Discovery

• Gene Discovery and Annotation

• ……

(Module 11, Thursday: Sequence based analysis)

Page 40: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Array vs NGS

As in the early days of microarray technology….• The NGS technology is still under development

• The library prep methods (technology, biases, robot assisted, ……)

• Development of analytical tools and methods

• Education and training to handle the data

•……

Page 41: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Experimental design (mRNA-Seq)

Objective(s) of the study -> number of reads needed/sample

Counting/profiling 5-10 million reads/sampleAlternative splicing 50-100 million reads/sampleNovel discovery >100 million reads/sample

Multiplexing?- Illumina GAIIx: 35-40 million reads/lane- Done after 1 lane? <-> repeat in several lanes/flowcells(runs)?

Batch effects- RNA purification- library preparation (and method)- flowcell/sequencing run- chemistry batches- ?

Replication?- Biological vs. technical

Page 42: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Experimental design (mRNA-Seq)

• Replication• Randomization• Blocking

Genetics 185: 405–416 ( June 2010)

!

Page 43: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Data processing

Page 44: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Data processing

Illumina softwareAlternative software

Page 45: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Analysis

Pipeline Computer, ”Donkey”• sequence analysis

alignment to referenceproduces fastq files

GA-PC (SCS software, RTA)• image analysis• base calling1

2

Page 46: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Data - structure

Real Time Analysis (RTA) output

Page 47: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Data - structure

Disk space use

• Depends on the run (SE,PE, #sequences)

• Images are deleted

• Complete run folder ~ 1-2 TB (no images!)

• Minimum data set ~ 20-40 GB

• More space needed for analyzing the data!

BCL Converter output

Page 48: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Data, the qseq files1 x _qseq.txt file/tile/lane -> • single-end: 960 _qseq.txt files (120x8)• paired-end: 1,920 _qseq.txt files (120x8x2)• paired-end, multiplex: 2,880 _qseq.txt files (120x8x3)

…………………………………………O 11 1 20 18235 1120 0 1 GGCGGCGGGC....GAGGGGGGGG BBBBBBBBBB....BBBBBBBBBB 0O 11 1 20 18284 1125 0 1 ATGCCACCGC....ATGAAGAATA BBBBBBBBBB....BBBBBBBBBB 0O 11 1 20 18449 1128 0 1 ACTTGAAATA....ATTTGGATAT `BBBBBBBBB....BBBBBBBBBB 0O 11 1 20 18499 1123 0 1 CTTATTCAGG....GTGATTGAAG `][XUYKOMM....BBBBBBBBBB 0O 11 1 20 18546 1128 0 1 CGCATATCGT....TGAGCATGAA e_cS]becbe....BBBBBBBBBB 1O 11 1 20 18607 1120 0 1 CCCATTCGCA....TAATGTGATC SRSRQ[QW\\....BBBBBBBBBB 1O 11 1 20 18669 1125 0 1 CTTTTACGCA....AGACTGGAAA c_Scca[]cc....BBBBBBBBBB 1O 11 1 20 18783 1121 0 1 CCGGTGTGAT....AGTTAATGGA ccccR`Za`a....^^ZYb^]Hb^ 1…………………………………………

Sequence

Quality string

Filtering (1 for pass and 0 for failed)

Read number (1 or 2 for paired end runs; 1,2 or 3 for indexed paired-end)

A B C D

A

B

C

D

Page 49: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Data, the fastq filesConversion of qseq -> fastq1 x fastq file/lane/read -> • single-end: 8 x fastq files (s_4_sequence.txt)• paired-end: 16x fastq files(s_4_1_sequence.txt, s_4_2_sequence.txt)• paired-end, multiplex: as paired-end (index included in fastq)

Fastq: same information as in qseq files, but• different format• 4 lines/sequence -> .txt files with 4x~40 millions lines• only sequences passing the signal purity filter

Pass: the second lowest chastity score in the first 25 cycles of a read must be greater than 0.6

Chastity is the ratio of the highest of the four (base type) intensities to the sum of highest two

Further quality (phred) based filtration can be applied on the fastq files

Page 50: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Data, the fastq files…………………………@L_0023:1:1:16103:1200#0/1TTATGTGTTTATTACGTTNTTTG……AATGTTTATTTACGGGTTATTTA+L_0023:1:1:16103:1200#0/1hhhhhhghhhhhhghdeeBee__…… hfhhghhghhghfffaehghhcX@L_0023:1:1:16318:1210#0/1TGTGAAAGAATTGTTAGTTGGGA…… TTAGGTGTTTATATGTTTAAATT+L_0023:1:1:16318:1210#0/1Y_fXfdfafc`a_g_g_gdd\af…… ffddfcffffffgggggfffcff…………………………

Sequence

Quality string

Header

Header

…………………………………………O 11 1 20 18235 1120 0 1 GGCGGCGGGC....GAGGGGGGGG BBBBBBBBBB....BBBBBBBBBB 0O 11 1 20 18284 1125 0 1 ATGCCACCGC....ATGAAGAATA BBBBBBBBBB....BBBBBBBBBB 0O 11 1 20 18449 1128 0 1 ACTTGAAATA....ATTTGGATAT `BBBBBBBBB....BBBBBBBBBB 0O 11 1 20 18499 1123 0 1 CTTATTCAGG....GTGATTGAAG `][XUYKOMM....BBBBBBBBBB 0O 11 1 20 18546 1128 0 1 CGCATATCGT....TGAGCATGAA e_cS]becbe....BBBBBBBBBB 1O 11 1 20 18607 1120 0 1 CCCATTCGCA....TAATGTGATC SRSRQ[QW\\....BBBBBBBBBB 1O 11 1 20 18669 1125 0 1 CTTTTACGCA....AGACTGGAAA c_Scca[]cc....BBBBBBBBBB 1O 11 1 20 18783 1121 0 1 CCGGTGTGAT....AGTTAATGGA ccccR`Za`a....^^ZYb^]Hb^ 1…………………………………………

Fastq file

qseq file

Page 51: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Data, the fastq files

…………………………@L_0023:1:1:16103:1200#0/1TTATGTGTTTATTACGTTNTTTG……AATGTTTATTTACGGGTTATTTA+L_0023:1:1:16103:1200#0/1hhhhhhghhhhhhghdeeBee__…… hfhhghhghhghfffaehghhcX@L_0023:1:1:16318:1210#0/1TGTGAAAGAATTGTTAGTTGGGA…… TTAGGTGTTTATATGTTTAAATT+L_0023:1:1:16318:1210#0/1Y_fXfdfafc`a_g_g_gdd\af…… ffddfcffffffgggggfffcff…………………………

Fastq file

…………………………@L_0017:1:1:7461:1105#CGATGT/1GGGTGATCATTTAATATTNCCGCCGTGCAGGGTAGCTAATCAACTAAATA+L_0017:1:1:7461:1105#CGATGT/1cdddaa^c[addcdda\`Bba^aZ\Z`UR_a^K[\c^a]Lddb]Wad`\]@L_0017:1:1:15703:1104#CGATGT/1CTGCTTTCAGATGCTGTCNTTTTATTAGTGAGTGATGATGGTTTGCTAAT+L_0017:1:1:15703:1104#CGATGT/1hhfhhhhhhhhhfhhdbbBbffdb`h_cecf_bacdeedegfffdcaca]…………………………

Fastq file from indexed run

Page 52: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Data, the fastq filesThe Illumina fastq header - sequence

Paired-end read:@L_0020:4:4:4:1052#0/1@L_0020:4:4:4:1052#0/2

L_0001 unique instrument name and run number3 flowcell lane (1-8)1 tile number within the flowcell lane (1-120)4 'x'-coordinate of the cluster within the tile 1303 'y'-coordinate of the cluster within the tile #0 index sequence for a multiplexed sample (0 for no indexing) /1 the member of a pair, /1 or /2 (paired-end)

Indexed read:@L_0001:3:1:4:1303#GATCAG/1

Page 53: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Data , the fastq filesThe Illumina fastq header – quality score

+L:4:4:4:1052#0/1abbbb`ba]aYD[``aa[`[_ZSVYLWOZ^`aYTWKTUXR[ZTMZYNIZ^OWY_`aaa]LVNYab_BBBBBBBBBB

Quality score are represented as ASCII characters (to save space)–One ASCII character per base

To get Phred score:

ASCII value–64Phred Score

Sanger quality scores use the same principle–Same as a Phred score but the ASCII score calculation is different

ASCII value–33Sanger Score

Page 54: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Phred quality scores

Phred quality score:

QV = - 10 * log10(Pe)

where Pe is the probability thatthe base call is an error.

Phred score

Pe Accuracy of the base call

10 1 in 10 90 %

20 1 in 100 99 %

30 1 in 1,000 99.9 %

40 1 in 10,000 99.99 %

50 1 in 100,000 99.99 %

Charac-ter

ASCII value

Phred score

^ 94 30

_ 95 31

” 96 32

a 97 33

b 98 34

c 99 35

d 100 36

e 101 37

f 102 38

g 103 39

Page 55: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

QC - FASTX-Toolkit (Hannonlab)

quality vs cycle

Page 56: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

QC - FASTX-Toolkit (Hannonlab)

Genomic DNA mRNA-Seq(“random” priming!)

Nucleotide distribution vs cycle

Page 57: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

QC - FASTX-Toolkit (Hannonlab)

Methylation: Paired-end sequencing of bisulfite treated gDNA

• bisulfite treatment converts C to U (T), except for 5-methylcytosine

Read 1(low % C)

Read 2(low % G)

Page 58: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Analysis……

Page 59: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression
Page 60: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Discussions (support)SeqAnswers (http://seqanswers.com/forums/index.php)

Page 61: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

How much does it cost?

Library preparation

Cluster kit (flowcell )

Sequence kits

Project A:5 human genomes ~20x • 5x PE library of gDNA• 5x PE runs, 2x 101 bp

30x 36 cycles: $ 42.000

5x Paired-end: $ 22.000

~ $ 2000

Total cost: $ 66.000 (+)

Project B:84 mRNA-Seq profiles• 84x RNA-Seq libraries

(7x12 plex)• 3x SE runs, 50+7 bp

6x 36 cycles: $ 8.400

3x Single-read: $ 7.600

~ $ 25.000

Total cost: $ 41.000 (+)

Page 62: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression

Summary

The aims of today were to understand (or get an idea about):

• the potential use of NGS technology

• the basic principles of the Illumina technology

• some sample prep methods for mRNA-Seq (and RNA-Seq)

• the options for experimental design for mRNA-Seq

• the basic principles of processing the raw data from an Illumina GA

• the output and format of the data produced by an Illumina GA

Case study

Page 63: An introduction to “Next Generation” DNA Sequencingstaff.vbi.vt.edu/mlawre04/NextGen Genomics Class/Technology... · An introduction to “Next Generation ... Digital gene expression