Bioinformatics Series: Designing An NGS Study For My Biological Question
Yussanne Ma
Genome Sciences Centre


Genome Sciences Centre

Sequencing
  5 Illumina HiSeq X
  4 Illumina HiSeq 2500
  2 NextSeq 500
  3 Illumina MiSeq
  1 Life 3730xl
  1500 libraries per month
  >80 Tbases per month

Compute
  2 secured data centres
  Compute clusters: 800 nodes, 24,000 hyper-threaded cores
  48-384 GB RAM per node
  High memory (1.5 TB RAM) computers
  >11 Petabytes on-line disk storage

Engaged in over 50 ongoing projects and collaborations from experimental design to data interpretation

Why experimental design is important

Analysis and interpretation of sequencing data is completely dependent on everything upstream

Sample quality, sequencing and type of sequencing matter

The area being sampled matters

Bioinformatic corrections can be made but it’s always best to plan ahead

Outline

Genome sequencing
Transcriptome sequencing
Integrative approaches
Other technologies

Outline

Genome sequencing
  Genotyping arrays
  Exome and custom capture
  Amplicon
  Whole genome sequencing
  Population size and controls
  Factors affecting quality of variant calls

Genome sequencing overview

[Figure: genome regions (introns and exons) covered by SNP array, custom capture, exome capture and WGS]

There are many ways to subsample the genome
The cost trade-off is between area covered and depth
Sometimes the genome can be overkill
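The trade-off between area covered and depth comes down to simple arithmetic: a fixed sequencing yield spread over a smaller target gives proportionally more depth. The sketch below is illustrative only. The target sizes (roughly 3.1 Gb for a human genome, about 50 Mb for an exome, 1 Mb for a custom panel) and the on-target fraction are assumptions chosen for the example, not figures from this talk.

```python
# Illustrative target sizes in base pairs (assumptions, kit- and design-dependent).
TARGET_SIZE_BP = {
    "WGS": 3.1e9,
    "Exome capture": 50e6,
    "Custom capture": 1e6,
}


def mean_depth(yield_bp, target_size_bp, on_target_fraction=1.0):
    """Average depth = usable bases divided by the size of the region covered."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    return yield_bp * on_target_fraction / target_size_bp


def yield_needed(depth, target_size_bp, on_target_fraction=1.0):
    """Bases that must be sequenced to reach a given mean depth over a target."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return depth * target_size_bp / on_target_fraction


# Same hypothetical 100 Gb of sequence, three different targets.
for name, size in TARGET_SIZE_BP.items():
    fraction = 1.0 if name == "WGS" else 0.6  # assumed capture efficiency
    print(f"{name}: ~{mean_depth(100e9, size, fraction):,.0f}X mean depth")
```

Mean depth hides unevenness of coverage, so a real design would add a margin on top of these numbers.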

Genotyping arrays

Sampling of the genome at locations of known single nucleotide polymorphisms using intensity probes

Used for: Studying common variants in a large number of cases and controls

Limitations: Cannot be used for calling rare or novel variants, or structural variants. Resolution for copy number variation is low

Example project: Genome-wide association study to look for inherited cancer susceptibility loci

Exome and custom capture

Probes are used to capture all exons, or a specific set of genomic regions

Used for: Studying only coding changes, or those found in the pre-defined area of interest

Limitations: Cannot call variants outside of the capture area. Copy number and structural variants are difficult to call

Example project: Discovery of recurrently mutated genes in a large cohort, clinical panel

Amplicon and Sanger sequencing

Primers are designed that span a genomic event, or sequence across an event.

Used for: Determining the presence or absence of specific events (SNVs, indels, SVs)

Limitations: Cannot be used for discovery, need exact breakpoints in most cases

Example project: Orthogonal verification of a putative event discovered by WGS to benchmark tools, determining the presence of a metastatic fusion event in the primary sample.

Whole genome sequencing

Used for: Full characterization of genome including
  Novel genes and events
    SNVs and indels not seen in the population
    Private events in recurrent genes or pathways
  Complex events
    Copy number
    Structural variants
  Genomic landscape of a population
    Mutation signatures

Limitations: Sample size and depth due to cost

Best used for studies with no a priori knowledge of population samples, in depth study of single patient tumour

Genome sequencing: summary

Technology | SNVs | CNVs | SVs | Mutational burden | Mutational landscape
WGS | +++ | +++ | +++ | +++ | +++
Exome | +++ (coding only) | + (coding) | + (coding) | ++ | -
Custom capture | +++ (on target) | + (on target) | + (on target) | + | -
Genotyping array | Specific events only | + | - | - | -
Amplicon | Specific events only | - | Specific events only | - | -
Sanger | Specific events only | - | Specific events only | - | -

Population size and controls

3 million germline variants, 10,000-100,000 somatic variants on average per sample

Raw variant calls

High quality variants

Not in controls

Functionally important
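The funnel above (raw calls, high quality, not in controls, functionally important) can be sketched as a chain of filters. This is a minimal illustration rather than the pipeline used at the GSC: the record fields (`qual`, `consequence`), the quality threshold and the list of "important" consequences are all hypothetical choices made for the example.

```python
def filter_variants(calls, control_keys, min_qual=30.0,
                    important=("missense", "nonsense", "frameshift", "splice_site")):
    """Apply the funnel stages in order; return survivors and per-stage counts.

    calls: list of dicts with chrom, pos, ref, alt, qual, consequence (hypothetical schema).
    control_keys: set of (chrom, pos, ref, alt) tuples seen in controls.
    """
    counts = {"raw": len(calls)}

    high_quality = [v for v in calls if v["qual"] >= min_qual]
    counts["high_quality"] = len(high_quality)

    not_in_controls = [
        v for v in high_quality
        if (v["chrom"], v["pos"], v["ref"], v["alt"]) not in control_keys
    ]
    counts["not_in_controls"] = len(not_in_controls)

    functional = [v for v in not_in_controls if v["consequence"] in important]
    counts["functionally_important"] = len(functional)
    return functional, counts


# Toy data: one variant survives every stage.
example_calls = [
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "T", "qual": 50, "consequence": "missense"},
    {"chrom": "chr1", "pos": 200, "ref": "G", "alt": "C", "qual": 10, "consequence": "missense"},
    {"chrom": "chr2", "pos": 300, "ref": "C", "alt": "T", "qual": 60, "consequence": "synonymous"},
    {"chrom": "chr3", "pos": 400, "ref": "T", "alt": "G", "qual": 45, "consequence": "nonsense"},
]
example_controls = {("chr3", 400, "T", "G")}
survivors, stage_counts = filter_variants(example_calls, example_controls)
print(stage_counts)
```

Reporting the count at each stage makes it easy to see which filter removes the most calls and whether the controls are doing useful work.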

Population size and controls

GWAS: Large sample size needed to achieve statistical significance, 1:1 cases and controls

Rare disease: Sequencing of parents reveals patterns of inheritance, sequencing of unaffected relatives helps to filter out passengers

Somatic variants: Matched normal is needed to filter out passenger mutations

Factors affecting quality of variant calls: sequencing depth

Why 30X genome?
Rule of thumb: it takes 3-10 high quality reads to call a variant
Need to account for variable coverage, evenness of coverage, tumour content, ploidy
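To see why tumour content and ploidy push the required depth up, the rule of thumb can be turned into a rough probability. The sketch below assumes a simple binomial model (independent reads, a fixed depth at the site, no sequencing error), which is my simplification and not something stated in the talk. It estimates the chance of seeing at least `k` variant-supporting reads for a clonal variant on one copy of the tumour genome.

```python
from math import comb


def expected_vaf(tumour_content, tumour_copies=2, mutated_copies=1, normal_copies=2):
    """Expected variant allele fraction for a clonal somatic variant.

    tumour_content is the fraction of cells that are tumour (0-1).
    A germline heterozygous variant corresponds to tumour_content=1.0 here.
    """
    variant_alleles = tumour_content * mutated_copies
    all_alleles = tumour_content * tumour_copies + (1.0 - tumour_content) * normal_copies
    return variant_alleles / all_alleles


def prob_at_least(k, depth, vaf):
    """P(X >= k) for X ~ Binomial(depth, vaf): chance of >= k variant reads."""
    below = sum(comb(depth, i) * vaf**i * (1.0 - vaf) ** (depth - i) for i in range(k))
    return 1.0 - below


# Chance of seeing at least 3 supporting reads at different depths and tumour contents.
for content in (1.0, 0.7, 0.4, 0.2):
    vaf = expected_vaf(content)
    row = ", ".join(f"{d}X: {prob_at_least(3, d, vaf):.3f}" for d in (30, 60, 100))
    print(f"tumour content {content:.0%} (VAF {vaf:.2f}) -> {row}")
```

Real coverage varies from site to site, so the depth needed in practice is higher than this idealised model suggests.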

Factors affecting quality of variant calls: sequencing depth

Type of variant | Depth needed
Germline, diploid | 30X
Tumour >70% tumour content | 30-40X
Tumour 40-70% | 40-60X
Low tumour content, subclonal events | >100X

Factors affecting quality of variant calls: sample type

FFPE vs Fresh frozen
All of our protocols (WGS, RNA, miRNA) can be run on FFPE samples, but they may result in slightly lower yield and diversity, and a higher false positive rate for SNV and SV detection, as well as noisier CNV calls

[Figure: Fresh vs FFPE]

Outline

Transcriptome sequencing
  RNA sequencing
  miRNA sequencing
  Batch effects

RNA sequencing

Ribosomal depletion vs. polyA selection

Ribosomal depletion:
  no ribosomal RNA captured
  non-polyadenylated transcripts are captured
  lower minimum input requirement
  higher intergenic and intronic content

PolyA selection:
  higher ribosomal RNA content
  only polyadenylated transcripts are captured
  higher minimum input requirement
  lower intergenic and intronic content

RNA sequencing

Used for:
  Gene, exon and isoform-level quantification
  Quantifying expression of genomic events (SNVs, SVs)
  Detecting novel transcripts
  Detecting RNA edits
  Differential expression between groups (condition/tissue/tumour type) to identify expression markers
  Correlation and clustering of samples by gene expression to identify subgroups

RNA sequencing

Differential expression between tumour groups
Results are more difficult to interpret with low sample size

RNA sequencing

Hierarchical clustering of medulloblastoma samples by gene expression
Samples cluster by subgroup
Within a subgroup samples cluster by patient type
Most informative with large sample size and detailed covariates, e.g. clinical data

Sequencing depth

Gene diversity for UHR control at different levels of downsampling
Diversity does not tend to saturate
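A downsampling curve of this kind can be computed by shuffling the reads once and counting how many genes are detected as more reads are included. The sketch below is a hypothetical illustration: it assumes each aligned read has already been assigned to one gene, and it approximates "diversity" as the number of genes with at least `min_reads` reads, which may differ from the measure used in the plot.

```python
import random
from collections import Counter


def rarefaction_curve(read_genes, depths, min_reads=1, seed=0):
    """Genes detected (>= min_reads reads) at each downsampled depth.

    read_genes: one gene label per aligned read (hypothetical input format).
    depths: read counts to downsample to; depths above the library size are clipped.
    Reads are shuffled once and taken as nested prefixes, so the curve never decreases.
    """
    shuffled = list(read_genes)
    random.Random(seed).shuffle(shuffled)

    counts = Counter()
    detected = 0
    used = 0
    curve = []
    for depth in sorted(depths):
        depth = min(depth, len(shuffled))
        for gene in shuffled[used:depth]:
            counts[gene] += 1
            if counts[gene] == min_reads:  # gene has just crossed the threshold
                detected += 1
        used = depth
        curve.append((depth, detected))
    return curve


# Toy library: one abundant gene, one moderate, one rare.
toy_reads = ["A"] * 50 + ["B"] * 10 + ["C"]
print(rarefaction_curve(toy_reads, [10, 30, 61]))
```

If the curve is still rising at the full library size, as the slide reports for gene diversity, deeper sequencing keeps detecting new low-abundance genes.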

Sequencing depth

Number of reads per library | 200M | 120M | 60M | 40M
Number of genes at 1X | 23,000 | 20,000 | 18,000 | 15,000
Number of genes at 10X | 14,000 | 12,000 | 10,000 | 5,000
Expression quantification | ++ | ++ | ++ | ++
Differential expression | ++ | ++ | ++ | ++
Known transcript quantification | ++ | ++ | ++ | ++
Detection of structural variants with gene partners or breakpoints specified | ++ | ++ | ++ | +
Detection of SNVs and small indels with known coordinates | ++ | ++ | ++ | ++
De novo SNV calling | ++ | ++ | + | -
De novo structural variant calling | ++ | + | Alignment based only | -

miRNA sequencing

Used for:
  Quantification of miRNA expression
  Differential expression and expression clustering
  Correlation with gene expression to identify targets

Using miRNA to determine a signature for prognosis
miRNA clustering is found to be more sensitive to subgroups
Search space is smaller and cost of sequencing is lower
May be easier to translate into clinical test
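As a rough illustration of correlating miRNA with gene expression to shortlist targets, the sketch below computes a Pearson correlation across matched samples and keeps genes that are negatively correlated with the miRNA. Looking for negative correlation (miRNAs generally repress their targets) and the `max_r` cut-off are assumptions made for this example; the talk does not specify the method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treated as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def candidate_targets(mirna_expr, gene_expr, max_r=-0.5):
    """Genes whose expression is negatively correlated with the miRNA.

    mirna_expr: miRNA expression per sample.
    gene_expr: {gene: expression per sample}, same sample order (hypothetical format).
    Returns (gene, r) pairs sorted from most to least negative.
    """
    scored = ((gene, pearson(mirna_expr, values)) for gene, values in gene_expr.items())
    return sorted((pair for pair in scored if pair[1] <= max_r), key=lambda pair: pair[1])


# Toy data: G1 falls as the miRNA rises, G2 rises with it, G3 is flat.
toy_mirna = [1, 2, 3, 4]
toy_genes = {"G1": [8, 6, 4, 2], "G2": [1, 2, 3, 4], "G3": [5, 5, 5, 5]}
print(candidate_targets(toy_mirna, toy_genes))
```

Correlation alone only nominates candidates; with few samples many gene pairs will correlate by chance.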

Sequencing depth

miRNA diversity vs number of reads aligned to miRNA
Two failed samples on far left
Saturation between 2 and 4 million reads

Batch effects

Samples cluster by protocol, so clustering is difficult to do across multiple protocols
Sample sets sequenced using different protocols are best used as validation, or for meta analysis
Batch effect correction is the most effective with technical replicates
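One reason technical replicates help is that they let the protocol effect be measured directly: when the same samples are run under both protocols, the per-gene difference between the two runs is the batch effect. The sketch below shows that idea with a simple per-gene offset on log2 expression. It is a deliberately minimal, hypothetical approach (established batch-correction tools model more than a constant shift), and the input layout is assumed for the example.

```python
from statistics import mean


def estimate_offsets(protocol_a, protocol_b):
    """Per-gene mean difference (B minus A) over samples run under both protocols.

    protocol_a / protocol_b: {sample: {gene: log2 expression}} for the SAME
    samples, i.e. technical replicates (hypothetical input format).
    """
    shared_samples = sorted(set(protocol_a) & set(protocol_b))
    if not shared_samples:
        raise ValueError("no technical replicates shared between protocols")
    # Only genes measured in every replicate under both protocols get an offset.
    genes = set.intersection(
        *(set(protocol_a[s]) & set(protocol_b[s]) for s in shared_samples)
    )
    return {
        g: mean(protocol_b[s][g] - protocol_a[s][g] for s in shared_samples)
        for g in genes
    }


def correct_sample(expression_b, offsets):
    """Shift a protocol-B sample onto the protocol-A scale.

    Genes without an estimated offset are left unchanged.
    """
    return {g: value - offsets.get(g, 0.0) for g, value in expression_b.items()}


# Two replicates sequenced under both protocols, then one new protocol-B sample.
run_a = {"s1": {"g1": 5.0, "g2": 2.0}, "s2": {"g1": 6.0, "g2": 3.0}}
run_b = {"s1": {"g1": 6.0, "g2": 1.5}, "s2": {"g1": 7.0, "g2": 2.5}}
gene_offsets = estimate_offsets(run_a, run_b)
print(correct_sample({"g1": 8.0, "g2": 4.0, "g3": 1.0}, gene_offsets))
```

Without replicates, a protocol difference cannot be separated from a real biological difference between the two sample sets, which is why such sets are better kept for validation.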

Outline

Integrative approaches
  Genome and transcriptome sequencing
  Clonal evolution experiment
  Integrative analysis to study ‘dark matter’ in cancer

Integrative approaches: genome and transcriptome

RNAseq provides orthogonal validation of genomic events
Combined approach improves specificity, and can identify/confirm alternative splicing and elucidate the effects of genomic events on transcription

Alternative splicing at M1 is identified in the structural variant analysis of RNA and DNA, and gene expression data confirms exon 9 skipping event

Integrative approaches: Clonal evolution

[Figure: tumour and mouse models (xenografts) across Timepoint 1, Timepoint 2 and Timepoint 3, with treatment events Rx1 and Rx2]

WGS to identify SNVs common to tumour and xenografts
Custom panel designed from WGS calls and literature
Clonal evolution over multiple timepoints and treatment events

Integrative approaches: ‘dark matter’

Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma, The Cancer Genome Atlas Research Network, N Engl J Med 2016; 374:135-145, January 14, 2016

miRNA, RNA, WGS and methylation sequencing identify multiple mechanisms by which CDKN2A function is disrupted in papillary renal-cell carcinoma

Outline

Other technologies
  Microbial analysis
  De novo genome assembly
  Single-cell sequencing
  Epigenomics
  Immunogenomics

Microbial analysis

16S sequencing: Identification and quantification of known bacterial species. Useful for survey of a large number of samples

Short read sequencing (shallow): Rapid classification of known microbial species in metagenomic samples

Whole genome and transcriptome sequencing: Microbial expression and genome integration in tumour samples

De novo genome assembly

Short read sequencing at ~30X is sufficient for de novo assembly using ABySS to produce contigs
Contigs can be:
  Aligned to existing references to identify variants in new strains
  Annotated to identify putative genes

Extension to a full draft reference will require additional sequencing to build scaffolds:
  Mate-pairs: large insert, long reads to extend assembly
  10X Chromium: Phased genomes, localized assemblies
  Oxford Nanopore: high throughput long reads
  PacBio: consensus long read with lower error rate

Single-cell sequencing

WGS and RNA sequencing from individual cells allows for single cell resolution of copy number and expression

Cell populations treated under different conditions can be examined separately

Epigenomics

Post-transcriptional modification cannot be detected through genome and transcriptome sequencing

Efforts such as the International Human Epigenome Consortium have provided comprehensive datasets for comparison and interpretation of epigenomic data

ChIP-seq and bisulphite sequencing (array, whole genome or capture) are used to study histone modification and DNA methylation

Examples of analysis: Identify genes and pathways that are epigenetically modified, correlate ChIP data with expression and mutational data, cluster samples by DNA methylation profile

Immunogenomics

TCR/BCR sequencing
HLA typing
Analysis from WGS and WTS sequencing:
  T and B cell repertoire
  HLA typing
  Cell type abundance
  Neoantigen prediction

How much disk do I need

Data* | Typical file size
30X genome | 50G
Full-depth transcriptome | 15G
miRNA | 500M
1 lane HiSeq 2500 | 50G
1 lane HiSeq X | 65G
Variant files | 10-100M

*human data, bam/fastq.gz/raw assembly data are similar in size
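The typical file sizes above can be turned into a rough project-level estimate. The sketch below multiplies each data type by the number of libraries, with an optional multiplier for keeping more than one copy of each file (for example both fastq.gz and bam, which the footnote says are similar in size). It reads "G" and "M" as gigabytes and megabytes and uses the upper end of the variant-file range; both are assumptions, as are the `copies` and `overhead` parameters.

```python
# Typical per-file sizes in GB for human data, taken from the table above.
TYPICAL_SIZE_GB = {
    "30X genome": 50.0,
    "Full-depth transcriptome": 15.0,
    "miRNA": 0.5,
    "Variant files": 0.1,  # upper end of the 10-100M range (assumption)
}


def estimate_storage_gb(library_counts, copies=1, overhead=0.0):
    """Rough disk estimate for a project.

    library_counts: {data type: number of libraries}, keys from TYPICAL_SIZE_GB.
    copies: how many similar-sized files are kept per library (e.g. fastq.gz + bam = 2).
    overhead: extra fraction for intermediate files (hypothetical parameter).
    """
    unknown = set(library_counts) - set(TYPICAL_SIZE_GB)
    if unknown:
        raise KeyError(f"no typical size for: {sorted(unknown)}")
    base = sum(TYPICAL_SIZE_GB[kind] * n for kind, n in library_counts.items())
    return base * copies * (1.0 + overhead)


# Hypothetical project: 10 tumour/normal genome pairs plus 10 tumour transcriptomes.
project = {"30X genome": 20, "Full-depth transcriptome": 10}
print(f"{estimate_storage_gb(project):,.0f} GB for one copy of each file")
print(f"{estimate_storage_gb(project, copies=2):,.0f} GB keeping fastq.gz and bam")
```

Tumour genomes sequenced deeper than 30X will be proportionally larger, so the genome entry should be scaled for those libraries.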

Questions

About this talk: [email protected]: [email protected]

Acknowledgements

Images and results
Richard Corbett, Reanne Bowlby, Denise Brooks, Emilia Lim, Caralyn Reisle, Eric Chuah, Steve Bilobram

Westgrid
Jana Makar

GSC
Marco Marra, Director
Steven Jones, Director
GSC Group leaders in the room
Andy Mungall, Biospecimen and Library Cores
Richard Moore, Sequencing Core
Robin Coope, Engineering
Diane Miller, Projects

My awesome bioinformatics team
Richard, Nina, Eric, Kane, Gina, Irene, Dorothy, Correy, Dean, Athena, Dan, Young, Brenna, Laura, Tina, Karen, Marc, Tori, Yisu, Patrick, Mya, Darryl, Nav, James, Brandon, Karen, Morgan, An, Diana, Johnson, Denise, Shirin, Marcus, Amir, Cara, Dustin, Caleb, Daniel, Simon, Steve, Reanne, Wei, Sara, Stuart

Lab, Systems, Quality systems, Projects, Admin and all other teams