NGS II Illumina Sequencing

Genetics for Dummies 2017

NGS II – Illumina Sequencing

Robert Kraaij

Department of Internal Medicine

r.kraaij@erasmusmc.nl

• Data Analysis

• Applications

• Example: Exome Sequencing

Overview

Things to be addressed

NGS: many short reads that might contain errors

data analysis will handle these reads and errors

• Data Analysis

• Applications

• Example: Exome Sequencing

Overview

cBot

flowcell

bridgePCR

HiSeq2000

Illumina Sequencing

Per Cycle Imaging

G A T C

Per Cycle Imaging

G

good quality

G

poor quality

Per Cycle Base Calling

Phred Score    Incorrect base    Accuracy
10             1 in 10           90 %
20             1 in 100          99 %
30             1 in 1000         99.9 %
40             1 in 10000        99.99 %
50             1 in 100000       99.999 %

0 to 93 → ASCII 33 to 126 = single character

Quality Scoring
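
The relation behind the Phred table can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard definition Q = -10 · log10(P), where P is the probability that the base call is wrong, and the ASCII offset of 33 given on the slide. The function names are made up for this example.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

def phred_to_char(q, offset=33):
    """Encode a Phred score (0 to 93) as a single ASCII character (33 to 126)."""
    if not 0 <= q <= 93:
        raise ValueError("Phred score must be between 0 and 93")
    return chr(q + offset)

def char_to_phred(c, offset=33):
    """Decode a single quality character back to its Phred score."""
    return ord(c) - offset

# Reproduce the rows of the table and show the encoded character.
for q in (10, 20, 30, 40, 50):
    p = phred_to_error_prob(q)
    print(f"Q{q}: 1 error in {round(1 / p)} bases, "
          f"accuracy {100 * (1 - p):.3f} %, character {phred_to_char(q)!r}")
```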

@SEQ_ID

GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC

+SEQ_ID

!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>

FASTQ File
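
A FASTQ record is four lines: header, sequence, separator, and one quality character per base. The sketch below reads such records and decodes the quality string, assuming the offset of 33 from the quality scoring slide and unwrapped, four-line records. `read_fastq` is a hypothetical helper, not part of any library.

```python
from io import StringIO

def read_fastq(handle):
    """Yield (name, sequence, phred_scores) for each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        header = header.rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a valid four-line FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, [ord(c) - 33 for c in qual]

# The record shown on the slide.
EXAMPLE = (
    "@SEQ_ID\n"
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC\n"
    "+SEQ_ID\n"
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>\n"
)

for name, seq, quals in read_fastq(StringIO(EXAMPLE)):
    print(name, len(seq), "bases, lowest quality", min(quals),
          "highest quality", max(quals))
```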

T A C G G T A C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

Alignment or Mapping of Reads

REFERENCE GENOME (HG19)

chromosome + position + strand
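
To make "chromosome + position + strand" concrete, here is a deliberately naive mapping sketch that looks for an exact match of the read, or of its reverse complement, in a toy reference. Real aligners such as BWA use an index and tolerate mismatches and gaps; this only illustrates what an alignment reports. The chromosome name `chrDemo` and the function names are assumptions for the example.

```python
REFERENCE = "GATTACGGTACTTGCATAGCT"  # toy reference taken from the slide

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def map_read(read, reference, chrom="chrDemo"):
    """Return (chromosome, 1-based position, strand) or None if unmapped."""
    pos = reference.find(read)
    if pos >= 0:
        return (chrom, pos + 1, "+")
    pos = reference.find(reverse_complement(read))
    if pos >= 0:
        return (chrom, pos + 1, "-")
    return None

print(map_read("TACGGTACTTGCATA", REFERENCE))
print(map_read(reverse_complement("TACGGTACTTGCATA"), REFERENCE))
print(map_read("AAAAAAAAAA", REFERENCE))
```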

sample.bam

Run QC and filtering

sample.bam

sample.bam

• both reads

• quality scores

• chromosome

• position

• quality flag

• duplicate flag

• off target flag

sorted BAM file
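
The quality and duplicate flags are stored as bits of the FLAG field of each BAM record. The sketch below decodes that field using the bit values from the SAM specification. An "off target" status is not one of those bits; it is assumed here to be derived separately by comparing the read position with the capture target. The filter in `passes_filter` is one possible choice, not necessarily the one used in the course.

```python
# Bit values of the SAM/BAM FLAG field (SAM specification).
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flag(flag):
    """List the names of all bits that are set in a FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]

def passes_filter(flag):
    """Keep reads that are mapped, passed vendor QC and are not duplicates."""
    return not flag & (0x4 | 0x200 | 0x400)

print(decode_flag(99), passes_filter(99))
print(decode_flag(1123), passes_filter(1123))
```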

Coverage

T A C G G T A C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T A C T T G C A T A G

G A T T A C G G T A C T T G C

G G T A C T T G C A T A G C T

T T A C G G T A C T T G C A T

5x coverage

Mean Coverage = bases on target / size of target

% of Bases Above a Certain Threshold

T A C G G T A C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T A C T T G C A T A G

G A T T A C G G T A C T T G C

G G T A C T T G C A T A G C T

T T A C G G T A C T T G C A T

5x 5x 4x 1x
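
Both coverage metrics can be computed from the per-base depth. The sketch below uses the toy reference and the five reads from the slide, places each read by exact matching (an assumption that only works for this error-free toy data), and then derives the mean coverage (bases on target divided by size of target) and the percentage of bases at or above a threshold.

```python
REFERENCE = "GATTACGGTACTTGCATAGCT"
READS = [
    "TACGGTACTTGCATA",
    "ACGGTACTTGCATAG",
    "GATTACGGTACTTGC",
    "GGTACTTGCATAGCT",
    "TTACGGTACTTGCAT",
]

def depth_per_base(reference, reads):
    """Count how many reads cover each position of the reference."""
    depth = [0] * len(reference)
    for read in reads:
        start = reference.find(read)  # exact match only; toy data
        if start < 0:
            continue                  # unmapped read
        for i in range(start, start + len(read)):
            depth[i] += 1
    return depth

def mean_coverage(depth):
    """Bases on target divided by size of target."""
    return sum(depth) / len(depth)

def percent_at_least(depth, threshold):
    """Percentage of target bases covered at least `threshold` times."""
    return 100 * sum(d >= threshold for d in depth) / len(depth)

depth = depth_per_base(REFERENCE, READS)
print(depth)
print(f"mean coverage: {mean_coverage(depth):.2f}x")
print(f"bases at 5x or better: {percent_at_least(depth, 5):.1f} %")
```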

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T G C T T G C

G G T G C T T G C A T A G C T

T T A C G G T G C T T G C A T

G = homozygous alternative

- G A T T A C G G T G C

C G G T G C T T G C A T A G C

T G C A T A G C T -

A T T A C G G T G C T T G C A

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T A C T T G C

G G T G C T T G C A T A G C T

T T A C G G T A C T T G C A T

A/G = heterozygous

- G A T T A C G G T A C

C G G T G C T T G C A T A G C

T G C A T A G C T -

A T T A C G G T G C T T G C A

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T A C T T G C

A/G = heterozygous?

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T A C T T G C

G

sequencing quality

good poor
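
The variant calling slides can be summarised as: count the bases observed at a position, ignore poor-quality calls, and refuse to call when too few reads remain. The sketch below is a naive allele-fraction caller illustrating that idea. The thresholds (`min_quality`, `min_depth`, `het_low`, `het_high`) are arbitrary assumptions, and this is not the statistical model used by GATK.

```python
from collections import Counter

def call_genotype(observations, ref_base, min_quality=20, min_depth=4,
                  het_low=0.2, het_high=0.8):
    """Classify one position from (base, phred_score) observations.

    Returns 'no_call', 'hom_ref', 'het' or 'hom_alt'.
    All thresholds are illustrative assumptions.
    """
    bases = [base for base, qual in observations if qual >= min_quality]
    depth = len(bases)
    if depth < min_depth:
        return "no_call"                    # too little reliable evidence
    counts = Counter(bases)
    alt_count = depth - counts[ref_base]    # all non-reference observations
    fraction = alt_count / depth
    if fraction >= het_high:
        return "hom_alt"
    if fraction >= het_low:
        return "het"
    return "hom_ref"

# Nine good reads all showing G where the reference has A.
print(call_genotype([("G", 30)] * 9, "A"))
# Five G and four A: heterozygous.
print(call_genotype([("G", 30)] * 5 + [("A", 30)] * 4, "A"))
# Only two reads: heterozygous? Not enough evidence.
print(call_genotype([("G", 30), ("A", 30)], "A"))
# The G calls are of poor quality and are ignored.
print(call_genotype([("G", 5)] * 3 + [("A", 30)] * 6, "A"))
```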

sample.vcf

• chromosome

• position

• quality

• annotations

VCF File
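
Each data line of a VCF file is tab-separated, with the fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO; the annotations live in INFO as `key=value` pairs separated by semicolons. The sketch below parses one such line. The example line and its values are made up for illustration, and `parse_vcf_line` is a hypothetical helper.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, flt, info = fields[:8]
    annotations = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, _, value = item.partition("=")
        annotations[key] = value if value else True   # flags have no value
    return {
        "chromosome": chrom,
        "position": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),
        "quality": None if qual == "." else float(qual),
        "filter": flt,
        "annotations": annotations,
    }

# Made-up example line, not taken from real data.
line = "chr1\t12345\t.\tA\tG\t50.0\tPASS\tDP=9;AF=0.5;DB"
variant = parse_vcf_line(line)
print(variant["chromosome"], variant["position"], variant["quality"],
      variant["annotations"])
```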

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G T A G

G A T T A C G G T A C T T G C

G G T G C T T G C A T A G C T

- G A T T A C G G T A C T T G C A T

deletion = heterozygous

- G A T T A C G G T A C

C G G T G C T T G C A T A G C

T G C A T A G C T -

- G A T T A C G G T G C T T G C A

Paired-End Sequencing

2 x 100 bp

Variant Calling: Mate Pairs

normal

400 bp

deletion

800 bp

insertion

200 bp

Variant Calling: Mate Pairs

normal

400 bp

translocation
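
The read-pair signals on these slides can be expressed as a simple rule: compare the mapped distance between the two reads of a pair with the expected 400 bp. A larger distance suggests a deletion in the sample, a smaller one an insertion, and mates on different chromosomes suggest a translocation. The sketch below assumes an arbitrary tolerance of 100 bp and ignores read orientation; real callers look for many pairs supporting the same event.

```python
def classify_pair(chrom1, pos1, chrom2, pos2, expected=400, tolerance=100):
    """Classify one read pair by the distance between its mapped mates.

    `expected` follows the 400 bp on the slide; `tolerance` is an assumption.
    """
    if chrom1 != chrom2:
        return "translocation"
    distance = abs(pos2 - pos1)
    if distance > expected + tolerance:
        return "deletion"      # mates map further apart than expected
    if distance < expected - tolerance:
        return "insertion"     # mates map closer together than expected
    return "normal"

print(classify_pair("chr1", 1000, "chr1", 1400))
print(classify_pair("chr1", 1000, "chr1", 1800))
print(classify_pair("chr1", 1000, "chr1", 1200))
print(classify_pair("chr1", 1000, "chr5", 1400))
```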

Variant Calling: Split Reads

genome

800 bp

mRNA (cDNA)

• Data Analysis

• Applications

• Example: Exome Sequencing

Overview

Applications

• Re-sequencing → full genome → SNPs and indels

• Re-sequencing → mate pairs → structural variations

• Re-sequencing → regional → SNPs and indels

• Sequencing → de novo assembly

• RNAseq

• ChIPseq

• …seq

www.illumina.com

Example:

Exome Sequencing

funding by NGI-NCHA, NWO, BBMRI

n > 3,000 samples of random set from RS-I

start May 2011; Nimblegen

part of “CHARGE-S” effort:

>5,000 exomes across 4 cohorts

Framingham, CHS, ARIC, Rotterdam Study

Expand with exome variants array?

CHARGE

Exome Sequencing

Exome vs Full Genome

exon exon exon

genome → 3 Gb

exome → ~30 Mb

Exome Sequencing Workflow

DNA isolation → Library preparation → Exome capture → Sequencing → Data analysis

Nimblegen SeqCap EZ v2 Capture

• CCDS (Sept 2009)

• miRBase (v14, Sept 2009)

• RefSeq (Jan 2010)

• 2,100,000 probes

• 30,246 coding genes

• 329,028 exons

• 710 miRNAs

• 36.5 Mb primary target

• 44.1 Mb capture target

Illumina TruSeq V3 2x100 PE Sequencing

Data analysis: BWA-GATK pipeline

Demultiplexing
• BclToFastQ (CASAVA)
• Chastity Filter

Alignment
• BWA (paired)
• SortSam, MarkDuplicates (picard)

Processing
• BaseQualityScoreRecalibration, IndelRealignment (GATK)

Variant-Calling
• HaplotypeCaller
• VQSR
• VarEval

Analysis
• ANNOVAR, VCFtools
• PlinkSeq, SKAT, R
• Spotfire

Sample QC and Variant QC

RSX-2 Samples were sequenced to ~54x Mean Coverage

Average Mean Depth of Coverage

across the 44Mb SeqCap Exome

Percentage of 44Mb covered 10x or better

Mean Depth of Coverage by Flowcell

Mean Depth of Coverage

Flowcell Number (Roughly Chronological Order)

Determining Heterozygous Concordance versus 550k genotyping arrays

Heterozygous Concordance

Flowcell Number (Roughly Chronological Order)
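
One way to compute a heterozygous concordance is sketched below. It assumes the metric is the fraction of sites that are heterozygous on the genotyping array for which sequencing reports the same genotype; the slides do not give the exact definition, so treat this as an illustration. Site names and genotypes are made up.

```python
def heterozygous_concordance(array_calls, sequencing_calls):
    """Fraction of array-heterozygous sites with an identical sequencing call.

    Both arguments map a site identifier to a genotype such as 'A/G'.
    Sites missing from the sequencing calls are skipped.
    """
    def alleles(genotype):
        return tuple(sorted(genotype.split("/")))

    het_sites = [
        site for site, genotype in array_calls.items()
        if len(set(alleles(genotype))) == 2 and site in sequencing_calls
    ]
    if not het_sites:
        raise ValueError("no shared heterozygous sites to compare")
    agreeing = sum(
        alleles(array_calls[site]) == alleles(sequencing_calls[site])
        for site in het_sites
    )
    return agreeing / len(het_sites)

# Made-up genotypes for four hypothetical sites.
array_calls = {"rs1": "A/G", "rs2": "C/T", "rs3": "G/G", "rs4": "A/C"}
sequencing_calls = {"rs1": "G/A", "rs2": "C/C", "rs3": "G/G"}
print(heterozygous_concordance(array_calls, sequencing_calls))
```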

Sample QC and Variant QC

Number of Detected SNPs per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)

Heterozygous to Homozygous ratio per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)
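
The heterozygous to homozygous ratio of a sample can be derived from its genotype calls. The sketch below assumes VCF-style genotype strings (`0/1`, `1/1`, and so on) and counts only homozygous alternative calls in the denominator, which is one common convention rather than necessarily the one used for the plot.

```python
def het_hom_ratio(genotypes):
    """Ratio of heterozygous to homozygous-alternative genotype calls."""
    het = hom_alt = 0
    for genotype in genotypes:
        alleles = genotype.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue                 # missing or non-diploid call
        first, second = alleles
        if first != second:
            het += 1
        elif first != "0":
            hom_alt += 1             # homozygous reference is not counted
    if hom_alt == 0:
        raise ValueError("no homozygous alternative calls")
    return het / hom_alt

# Made-up genotype calls for one sample.
print(het_hom_ratio(["0/1", "0/1", "1/1", "0/0", "./.", "1|0"]))
```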

purines

Transition to Transversion Ratio

pyrimidines

transversion

transition

Transition to Transversion Ratio per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)
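
A transition replaces a purine with a purine (A and G) or a pyrimidine with a pyrimidine (C and T); every other substitution is a transversion. The sketch below computes the ratio for a list of (reference, alternative) SNPs. The example SNPs are made up.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """True if the substitution stays within purines or within pyrimidines."""
    if ref == alt:
        raise ValueError("reference and alternative base are identical")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES

def ts_tv_ratio(snps):
    """Transition to transversion ratio for (ref, alt) pairs."""
    transitions = sum(is_transition(ref, alt) for ref, alt in snps)
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions

# Made-up SNPs: two transitions and one transversion.
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```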

QC and filtering results

Things to Remember

NGS: many short reads that might contain errors

coverage indicates the number of independent reads that cover a base → needed to analyse a genome

FASTQ file → sequence + quality scores

BAM file → aligned reads

VCF file → called variants + annotation
