Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Bioinforma)csSeries:DesigningAnNGSStudyForMyBiological
Ques)on
YussanneMaGenomeSciencesCentre
GenomeSciencesCentre
Sequencing5IlluminaHiseqX4IlluminaHiSeq25002NextSeq5003IlluminaMiseq1Life3730xl1500librariespermonth>80Tbasespermonth
Compute2secureddatacentresComputeclusters:800nodes,24,000
hyper-threadedcores48-384GBRAMpernodeHighmemory(1.5TBRAM)computers>11Petabyteson-linediskstorage
Engagedinover50ongoingprojectsandcollaboraYonsfromexperimentaldesign
todatainterpretaYon
Whyexperimentaldesignisimportant
Analysis and interpretation of sequencing data is completely dependent on everything upstream
Sample quality, sequencing and type of sequencing matter
The area being sampled matters
Bioinformatic corrections can be made but it’s always best to plan ahead
Outline
GenomesequencingTranscriptomesequencingIntegraYveapproachesOthertechnologies
Outline
GenomesequencingGenotypingarraysExomeandcustomcaptureAmpliconWholegenomesequencingPopulaYonsizeandcontrolsFactorsaffecYngqualityofvariantcalls
Genomesequencingoverview
SNP array
Custom Capture
Exome Capture
WGS
Genome Intron Exon
There are many ways to subsample the genome The cost trade-off is between area covered and depth Sometimes the genome can be overkill
Genotypingarrays
SamplingofthegenomeatlocaYonsofknownsinglenucleoYdepolymorphismsusingintensityprobes
Usedfor:Studyingcommonvariantsinlargenumberofcasesandcontrols
LimitaYons:Cannotbeusedforcallingofrareornovelvariants,orstructuralvariants.ResoluYonsforcopynumbervariaYonarelow
Exampleproject:Genome-wideassociaYonstudytolookforinheritedcancersuscepYbilityloci
Intron Exon
Exomeandcustomcapture
Probesareusedtocaptureallexons,oraspecificsetofgenomicregions
Usedfor:Studyingonlycodingchanges,orathosefoundinthepre-definedareaofinterest
LimitaYons:Cannotcallvariantsoutsideofcapturearea.Copynumberandstructuralvariantsaredifficulttocall
Exampleproject:Discoveryofrecurrentlymutatedgenesinlargecohort,clinicalpanel
Intron Exon
AmpliconandSangersequencing
Primersaredesignedthatspanagenomicevent,orsequenceacrossanevent.
Usedfor:Determiningthepresenceorabsenceofspecificevents(SNVs,indels,SVs)
LimitaYons:Cannotbeusedfordiscovery,needexactbreakpointsinmostcases
Exampleproject:OrthogonalverificaYonofputaYveeventdiscoveredbyWGStobenchmarktools,determiningpresenceofmetastaYcfusioneventinprimarysample.
Intron Exon
X
Wholegenomesequencing
Usedfor:FullcharacterizaYonofgenomeincludingNovelgenesandeventsSNVsandindelsnotseeninthepopulaYonPrivateeventsinrecurrentgenesorpathways
ComplexeventsCopynumberStructuralvariants
GenomiclandscapeofapopulaYonMutaYonsignatures
LimitaYons:SamplesizeanddepthduetocostBestusedforstudieswithnoaprioriknowledgeof
populaYonsamples,indepthstudyofsinglepaYenttumour
Genomesequencing:summary
Technology SNVs CNVs SVs Muta)onalburden
Muta)onallandscape
WGS +++ +++ +++ +++ +++
Exome +++(codingonly)
+(coding)
+(coding)
++
-
Customcapture +++(ontarget)
+(ontarget)
+(ontarget)
+
-
Genotypingarray Specificeventsonly + - - -
Amplicon Specificeventsonly - Specific
eventsonly - -
SangerSpecific
eventsonly
- Specificeventsonly - -
PopulaYonsizeandcontrols3milliongermlinevariants,10,000-100,000somaYcvariantsonaveragepersample
Raw variant calls
High quality variants
Not in controls
Functionally important
PopulaYonsizeandcontrols
GWAS:LargesamplesizeneededtoachievestaYsYcalsignificance,1:1casesandcontrols
Raredisease:Sequencingofparentsrevealspafernsofinheritance,sequencingofunaffectedrelaYveshelpstofilteroutpassengers
SomaYcvariants:MatchednormalisneededtofilteroutpassengermutaYons
Factorsaffec)ngqualityofvariantcalls:sequencingdepth
Why30Xgenome?Ruleofthumb:ittakes3-10highqualityreadstocallavariantNeedtoaccountforvariablecoverage,evennessofcoverage,tumour
content,ploidy
Typeofvariant Depthneeded
Germline,diploid 30X
Tumour>70%tumourcontent 30-40X
Tumour40-70% 40-60X
Lowtumourcontent,subclonalevents >100X
Factorsaffec)ngqualityofvariantcalls:sequencingdepth
FFPEvsFreshfrozenAllofourprotocols(WGS,RNA,miRNA)canberunonFFPE
samples,buttheymayresultinslightlyloweryieldanddiversity,andahigherfalseposiYverateforSNVandSVdetecYon,aswellasnoisierCNVcalls
Fresh FFPE
Factorsaffec)ngqualityofvariantcalls:sampletype
Outline
TranscriptomesequencingRNAsequencingmiRNAsequencingBatcheffects
RNAsequencing
RibosomaldepleYonvs.polyAselecYon
no ribosomal RNA captured non-polyadenylated transcripts are captured lower minimum input requirement higher intergenic and intronic content
higher ribosomal RNA content only polyadenylated transcripts are captured higher minimum input requirement lower intergenic and intronic content
RNAsequencing
Usedfor:Gene,exonandisoform-levelquanYficaYonQuanYfyingexpressionofgenomicevents(SNVs,SVs)DetecYngnoveltranscriptsDetecYngRNAeditsDifferenYalexpressionbetweengroups(condiYon/Yssue/
tumourtype)toidenYfyexpressionmarkersCorrelaYonandclusteringofsamplesbygeneexpressionto
idenYfysubgroups
RNAsequencingDifferenYalexpressionbetweentumourgroupsResultsaremoredifficulttointerpretwithlowsamplesize
RNAsequencingHierarchicalclusteringofmedulloblastomasamplesbygeneexpressionSamplesclusterbysubgroupWithinasubgroupsamplesclusterbypaYenttypeMostinformaYvewithlargesamplesizeanddetailedcovariateseg.clinicaldata
Sequencingdepth
Gene diversity for UHR control at different levels of downsampling Diversity does not tend to saturate
SequencingdepthNumberofreadsperlibrary 200M 120M 60M 40M
Numberofgenesat1X 23,000 20,000 18,000 15,000
Numberofgenesat10X 14,000 12,000 10,000 5,000
ExpressionquanYficaYon ++ ++ ++ ++
DifferenYalexpression ++ ++ ++ ++
KnowntranscriptquanYficaYon ++ ++ ++ ++
DetecYonofstructuralvariantswithgenepartnersorbreakpointsspecified
++ ++ ++ +
DetecYonofSNVsandsmallindelswithknowncoordinates
++ ++ ++ ++
DenovoSNVcalling ++ ++ + -
Denovostructuralvariantcalling ++ + Alignmentbasedonly
-
miRNAsequencingUsedfor:QuanYficaYonofmiRNAexpressionDifferenYalexpressionandexpressionclusteringCorrelaYonwithgeneexpressiontoidenYfytargets
Using miRNA to determine a signature for prognosis miRNA clustering is found to be more sensitive to subgroups Search space is smaller and cost of sequencing is lower May be easier to translate into clinical test
Sequencingdepth
miRNA diversity vs number of reads aligned to miRNA Two failed samples on far left Saturation between 2 and 4 million reads
Batcheffects
Samples cluster by protocol, so clustering is difficult to do across multiple protocols Sample sets sequenced using different protocols are best used as validation, or for meta analysis Batch effect correction is the most effective with technical replicates
Outline
IntegraYveapproachesGenomeandtranscriptomesequencingClonalevoluYonexperimentIntegraYveanalysistostudy‘darkmafer’incancer
IntegraYveapproaches:genomeandtranscriptomeRNAseq provides orthogonal validation of genomic events Combined approach improves specificity, and can identify/confirm alternative splicing and elucidate the effects of genomic events on transcription
Alternative splicing at M1 is identified in the structural variant analysis of RNA and DNA, and gene expression data confirms exon 9 skipping event
IntegraYveapproaches:ClonalevoluYon
Tumour
Mousemodel
Mousemodel
Mousemodel
WGS to identify SNVs common to tumour and xenografts
Mousemodel
Mousemodel
Mousemodel
Mousemodel
Mousemodel
Mousemodel
Mousemodel
Mousemodel
Mousemodel
Custom panel designed from WGS calls and literature
Clonal evolution over multiple timepoints and treatment events
Timepoint 1 Timepoint 2 Timepoint 3
Rx1 Rx2
IntegraYveapproaches:‘darkmafer’
Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma, The Cancer Genome Atlas Research Network, N Engl J Med 2016; 374:135-145, January 14, 2016
miRNA, RNA and WGS and methylation sequencing identify multiple mechanisms in which CDKN2A function is disrupted in papillary renal-cell carcinoma
Outline
OthertechnologiesMicrobialanalysisdenovogenomeassemblySingle-cellsequencingEpigenomicsImmunogenomics
Microbialanalysis
16Ssequencing:IdenYficaYonandquanYficaYonofknownbacterialspecies.Usefulforsurveyoflargenumberofsamples
Shortreadsequencing(shallow):RapidclassificaYonofknownmicrobialspeciesinmetagenomicsamples
Wholegenomeandtranscriptomesequencing:MicrobialexpressionandgenomeintegraYonintumoursamples
Denovogenomeassembly
Shortreadsequencingat~30XissufficientfordenovoassemblyusingABySStoproduceconYgsConYgscanbe:AlignedtoexisYngreferencestoidenYfyvariantsinnewstrainsAnnotatedtoidenYfyputaYvegenes
ExtensiontoafulldraoreferencewillrequireaddiYonalsequencingtobuildscaffolds:
Mate-pairs: large insert, long reads to extend assembly 10X Chromium: Phased genomes, localized assemblies
Oxford nanopore: high throughput long reads Pacbio: consensus long read with lower error rate
Single-cellsequencing
WGS and RNA sequencing from individual cells allows for single cell resolution of copy number and expression
Cell populations treated under different conditions can be examined separately
Epigenomics
Post-transcripYonalmodificaYoncannotbedetectedthroughgenomeandtranscritomesequencing
EffortssuchastheInternaYonalHumanEpigenomeConsorYumhaveprovidedcomprehensivedatasetsforcomparisonandinterpretaYonofepigenomicdata
ChIPseqandbisulphitesequencing(array,wholegenomeorcapture)areusedtostudyhistonemodificaYonandDNAmethylaYon
Examplesofanalysis:IdenYfygenesandpathwaysthatareepigeneYcallymodified,correlatedChIPdatawithexpressionandmutaYonaldata,clustersamplesbyDNAmethylaYonprofile
ImmunogenomicsTCR/BCRsequencingHLAtypingAnalysisfromWGSandWTSsequencing:TandBcellrepertoireHLAtypingCelltypeabundanceNeoanYgenpredicYon
HowmuchdiskdoIneed
Data* Typicalfilesize
30Xgenome 50G
Full-depthtranscriptome 15G
miRNA 500M
1laneHiseq2500 50G
1laneHiseqX 65G
Variantfiles 10-100M
*human data, bam/fastq.gz/raw assembly data are similar in size
QuesYons
Aboutthistalk:[email protected]:[email protected]
AcknowledgementsImagesandresultsRichardCorbefReanneBowlbyDeniseBrooksEmiliaLimCaralynReisleEricChuahSteveBilobramWestgridJanaMakar
GSCMarcoMarra,DirectorStevenJones,DirectorGSCGroupleadersintheroomAndyMungall,BiospecimenandLibraryCoresRichardMoore,SequencingCoreRobinCoope,EngineeringDianeMiller,ProjectsMyawesomebioinforma)csteamRichard,Nina,Eric,Kane,Gina,Irene,Dorothy,Correy,Dean,Athena,Dan,Young,Brenna,Laura,Tina,Karen,Marc,Tori,Yisu,Patrick,Mya,Darryl,Nav,James,Brandon,Karen,Morgan,An,Diana,Johnson,Denise,Shirin,Marcus,Amir,Cara,DusYn,Caleb,Daniel,Simon,Steve,Reanne,Wei,Sara,StuartLab,Systems,Qualitysystems,Projects,Adminandallotherteams