
Next Generation Sequencing in Virus and Parasite Research


Page 1: Next Generation Sequencing in Virus and Parasite Research

Next Generation Sequencing in Virus and Parasite Research

Page 2: Next Generation Sequencing in Virus and Parasite Research

Sanger read: >800 bp

GS-FLX read: ~250 bp, 500 bp

100 Mb, 500 Mb per run

Applications presented: four main projects in the lab

• WGS
• Annotation
• Population diversity
• Pathogen discovery

Page 3: Next Generation Sequencing in Virus and Parasite Research

Brugia malayi Genome Project
Parasitic nematode, causes lymphatic filariasis

• Total scaffolds: ~8250
• Longest scaffold: 6.5 Mb
• Total bases in scaffolds: 71 Mb
• Total span of scaffolds: 80 Mb

Genome size: ~100 Mb

6 chromosomes in 8250 pieces

Sanger (cloning bias)

Page 4: Next Generation Sequencing in Virus and Parasite Research

Brugia malayi Genome Project
PHASE II – Use Next-Gen Data

Closing the Genome
• Next-generation sequencing
• Fingerprint maps

Curating the Data
• Database
• Mapping 5’ and 3’ UTRs
• Functional annotation

Re-assemble genome (hybrid Sanger-GSFLX assembly)
Re-annotate (confirm UTRs by GSFLX)

Page 5: Next Generation Sequencing in Virus and Parasite Research

GS-FLX Sequencing of Worm gDNA and cDNA

Mix of random reads and paired reads
Avg read length: ~220 bp

~100 Mb

5 runs = 5X coverage of the genome
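The coverage figure on this slide follows from dividing the total sequenced bases by the genome size. A minimal sketch of that arithmetic, assuming the "~100 Mb" on the slide is the yield of one GS-FLX run and using the ~100 Mb genome size quoted earlier (the function and variable names are illustrative only):

```python
def fold_coverage(total_read_bases: float, genome_size: float) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


MB = 1_000_000
runs = 5
bases_per_run = 100 * MB   # assumption: ~100 Mb is the per-run yield
genome_size = 100 * MB     # ~100 Mb Brugia malayi genome (from the slides)

coverage = fold_coverage(runs * bases_per_run, genome_size)
print(f"{coverage:.1f}X")  # prints 5.0X
```

The same function returns 0.25 for 20 Mb of reads if the 80 Mb scaffold span is used as the denominator; whether the deck's ~0.25X figure was computed that way is an assumption.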

[Figure: plate layouts (whole plate; 4-well gasket). Labels: paired-ends and WGS; UTRs; 5’UTR, 3’UTR, SL, gDNA]

Page 6: Next Generation Sequencing in Virus and Parasite Research

Mapping of paired and non-paired reads onto genomic assembly

[Figure: hits plotted along the sequence assembly, axis up to 100%. Label: 80% paired-ends]

No apparent bias

20 Mb of Brugia reads = ~0.25X coverage

Page 7: Next Generation Sequencing in Virus and Parasite Research

Sequencing UTRs of B. malayi

[Figure: library construction schematic. Labels: mRNA (P, AAAA); CIP; TAP; RNA ligase; RNA oligo with MmeI site; RT-PCR; NlaIII; SAGE tag (unique sequence); ditags (variable length); concatenated SAGE tags]

Page 8: Next Generation Sequencing in Virus and Parasite Research

Sequencing Results

One sequence run: ~50 Mb of data in ~400,000 reads

5’UTR, 3’UTR, SL

Page 9: Next Generation Sequencing in Virus and Parasite Research

Data processing

Raw data
Remove: linker, small tags (<10), identical, junk
Blast against: genome, EST, exon, CDS
Unmatched tags
Blast against: small contigs, mitochondrion, bacterial, singletons
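The first cleanup step (remove linker, small tags, identical tags, junk) can be sketched as a simple filter. This is only an illustration under stated assumptions: the linker sequence is hypothetical, and "junk" is assumed here to mean tags with non-ACGT characters or a single repeated base; the slides do not define it.

```python
LINKER = "GATCGTAC"   # hypothetical linker sequence, not from the slides
MIN_TAG_LEN = 10      # slide: small tags (<10) are removed


def strip_linker(read: str, linker: str = LINKER) -> str:
    """Remove one leading copy of the linker, if present."""
    return read[len(linker):] if read.startswith(linker) else read


def is_junk(tag: str) -> bool:
    """Assumed junk rule: non-ACGT characters or one repeated base."""
    return any(base not in "ACGT" for base in tag) or len(set(tag)) == 1


def clean_tags(reads):
    """Strip linkers, then drop short, junk, and identical tags."""
    seen = set()
    kept = []
    for read in reads:
        tag = strip_linker(read.upper())
        if len(tag) < MIN_TAG_LEN or is_junk(tag) or tag in seen:
            continue
        seen.add(tag)
        kept.append(tag)
    return kept


reads = [
    "GATCGTACACGTTGCAAGGCTTAC",  # linker + valid tag
    "GATCGTACACGTTGCAAGGCTTAC",  # identical duplicate
    "GATCGTACACGT",              # too short once the linker is removed
    "AAAAAAAAAAAAAAAA",          # single repeated base
    "ACGTNNACGTACGTAC",          # ambiguous bases
    "TTGACCGTAGGCTAAC",          # valid tag without linker
]
cleaned = clean_tags(reads)
print(cleaned)  # ['ACGTTGCAAGGCTTAC', 'TTGACCGTAGGCTAAC']
```

The surviving tags would then go to the BLAST steps listed on the slide.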

Page 10: Next Generation Sequencing in Virus and Parasite Research

Mapping of Tags

[Figure: EST, 5’-tag, SL-tag and 3’-tag mapped onto the 40S ribosomal protein S18 gene]

Page 11: Next Generation Sequencing in Virus and Parasite Research

Intra-Host Diversity of Influenza A Virus

• Antigenic variants
• Drug-resistant and sensitive variants

Page 12: Next Generation Sequencing in Virus and Parasite Research

Mapped GS-FLX Sequence Reads on Antigenic Domain of Hemagglutinin

Hemagglutinin: HA1 and HA2, 566 aa, 1,757 nt

Amplicons: 450 bp

Page 13: Next Generation Sequencing in Virus and Parasite Research

Mapped Translated GS-FLX Reads on Epitopes of HA1 Domain

E D A B D B D D E C

Page 14: Next Generation Sequencing in Virus and Parasite Research

Patterns: non-synonymous mutations are predominantly in epitope regions (13/19 sites)

[Figure: number of reads (#reads) per variant, labelled by epitope: B, B, A, A, A, A, D]

Page 15: Next Generation Sequencing in Virus and Parasite Research

Identifying rare variants: drug resistance mutation

Matrix segment in H1N1 isolate

N31S: agt (S), aat (N)

Resistant H1N1: 1/437 = 0.2%

[Figure: number of reads (#reads) per variant]
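The rare-variant frequency quoted on this slide is the number of reads carrying the variant divided by the number of reads covering the site. A minimal sketch of that calculation (the function name is illustrative):

```python
def variant_frequency(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a site that carry the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must be between 0 and total_reads")
    return variant_reads / total_reads


freq = variant_frequency(1, 437)  # counts from the slide
print(f"{freq:.1%}")              # prints 0.2%
```

A single supporting read is also where sequencing error matters most, which is why the per-site SNP probability on the next slide is relevant.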

Page 16: Next Generation Sequencing in Virus and Parasite Research

SNP Analyses: Probability that Polymorphism is Real

[Table columns: Base#, A, C, G, N, T, GAP, SNP probability]

pbShort (PolyBayes), Marth Lab, Boston College

Page 17: Next Generation Sequencing in Virus and Parasite Research

Error Correction (homopolymer tracts)

Page 18: Next Generation Sequencing in Virus and Parasite Research

Signal Processing: Length Distribution

Adjusting the stringency of quality filters changes the length distribution: reads are slightly shorter BUT average quality is higher

[Figure: read length distribution]
Default: 75,000, avg length 200
Higher stringency: 70,000, avg length 195

Page 19: Next Generation Sequencing in Virus and Parasite Research

Signal Processing: Quality Distribution

Reduce the # of bases BUT increase the proportion of bases of HIGH QUALITY

[Figure: quality score distribution]
Default: 15 million bp
Higher stringency: 14 million bp
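The base totals here are consistent with the counts and average lengths on the length-distribution slide, assuming the 75,000 and 70,000 figures there are read counts. A short check of that arithmetic (the dictionary layout is illustrative only):

```python
# Assumption: 75,000 and 70,000 are read counts, lengths are in bp.
settings = {
    "default": {"reads": 75_000, "avg_len": 200},
    "higher stringency": {"reads": 70_000, "avg_len": 195},
}

# Total bases = number of reads x average read length.
totals = {name: s["reads"] * s["avg_len"] for name, s in settings.items()}

for name, total in totals.items():
    print(f"{name}: {total / 1e6:.2f} million bp")
# default: 15.00 million bp
# higher stringency: 13.65 million bp
```

The stricter filter costs about 9% of the bases, which the slide rounds to 14 million bp.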

Page 20: Next Generation Sequencing in Virus and Parasite Research

Whole Virus Genome Sequencing

Limitation of read length BUT:

- Isolate single genome (limiting dilution, other?)
- Random prime or specific primers with barcodes
- Use barcode to amplify
- Multiplex: 20 barcodes, 16-well gasket = 320 samples
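The multiplexing capacity in the last bullet is the number of distinct (gasket region, barcode) pairs. A small sketch that enumerates them; the barcode and well labels are placeholders, not the ones used in the lab:

```python
from itertools import product

barcodes = [f"BC{i:02d}" for i in range(1, 21)]  # 20 hypothetical barcode labels
wells = [f"well{i:02d}" for i in range(1, 17)]   # 16 gasket regions

# Each sample gets a unique (well, barcode) slot.
slots = list(product(wells, barcodes))
print(len(slots))  # prints 320
```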

Page 21: Next Generation Sequencing in Virus and Parasite Research

Virus Genomic Library Construction - Discovery -

Random primers carry a tag (NNNN = random bases)

1a. Reverse transcription: RNA to cDNA (RT)
1b. DNA extension from random primers: cDNA or ssDNA to dsDNA (Klenow exo- DNA polymerase)
2. Amplification from tags (PCR)
3. Size selection & sequencing: select 500 bp amplicons for emulsion PCR and pyrosequencing

Page 22: Next Generation Sequencing in Virus and Parasite Research

Multiplexing by Barcoding

Pools

Page 23: Next Generation Sequencing in Virus and Parasite Research

Post-Processing Pipeline

Barcodes mapped onto reads (NUCMER)
Reads clustered and reduced to a unique set
BLASTN, BLASTX
MySQL db
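The first two pipeline stages (assign reads to barcodes, then reduce them to a unique set) can be illustrated as follows. The slide maps barcodes with NUCMER; this sketch swaps in exact prefix matching and exact-duplicate collapsing as simpler stand-ins, and the barcode sequences are hypothetical.

```python
from collections import defaultdict

# Hypothetical barcode sequences; only the tag names echo the slides.
BARCODES = {"ACGTAC": "TagA", "TGCATG": "TagB"}
BARCODE_LEN = 6


def demultiplex(reads):
    """Group reads by leading barcode and keep unique inserts per tag."""
    by_tag = defaultdict(set)
    for read in reads:
        tag = BARCODES.get(read[:BARCODE_LEN])
        if tag is not None:              # reads with no known barcode are dropped
            by_tag[tag].add(read[BARCODE_LEN:])
    return {tag: sorted(seqs) for tag, seqs in by_tag.items()}


reads = [
    "ACGTACGGGTTTAAACCC",
    "ACGTACGGGTTTAAACCC",  # exact duplicate, collapsed
    "TGCATGCCCAAATTTGGG",
    "NNNNNNGGGTTTAAACCC",  # unknown barcode, dropped
]
unique_by_tag = demultiplex(reads)
print(unique_by_tag)
# {'TagA': ['GGGTTTAAACCC'], 'TagB': ['CCCAAATTTGGG']}
```

Real pyrosequencing reads would need tolerant matching (mismatches, indels in homopolymers), which is presumably why an aligner is used on the slide.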

Page 24: Next Generation Sequencing in Virus and Parasite Research

26,750 contigs, BLASTN: 56% match human DNA
12,889 contigs, BLASTX: 120 match viruses

Page 25: Next Generation Sequencing in Virus and Parasite Research

Oral Microbiome Project

[Figure: pooling scheme. Conditions: Periodontal Disease and Caries, each with HIGH and LOW groups. Each family contributes a VIRAL and a BACTERIAL fraction. Barcodes: TagA, TagB, TagC, TagD. Pool labels: Pool 1; 5 2 3 76 84]

Families:
BU128, WV409, BK026, BR095
WV001, WV213, BK044, BU130
BR009, WV597, WV631, BU133
BR023, WV041, BU137, WV628

Page 26: Next Generation Sequencing in Virus and Parasite Research

Bacterial Diversity Heat Maps:

Sequencing of 16S rRNA variable region

Sequencing of PCR amplicons 250 bp in size

Page 27: Next Generation Sequencing in Virus and Parasite Research

Acknowledgments

School of Dental Medicine
Mary Marazita

Ghedin Lab, School of Medicine
Jay DePasse
Adam Fitch
Xu Zhang

Graduate School of Public Health
Robert Ferrell
Mike Barmaba

GPCL
Debby Hollingshead
Paul Wood
Janette Lamb

Funding:
NIDCR/NIH
CTSI
JDRF
Burroughs-Wellcome Fund