35
1 Genetic Mapping by Bulk Segregant Analysis in Drosophila: Experimental Design and 1 Simulation-Based Inference 2 3 John E. Pool 4 5 Laboratory of Genetics, University of Wisconsin–Madison, Madison, WI, 53706 6 7 8 9 Running Title: Bulk Segregant Mapping in Drosophila 10 11 Keywords: bulk segregant analysis, quantitative trait locus mapping, simulation, 12 experimental design, Drosophila 13 14 Contact: 15 425-G Henry Mall 16 Madison, WI 53706 17 +1-608-265-1036 18 [email protected] 19 20 . CC-BY-NC 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under The copyright holder for this preprint (which was not this version posted June 9, 2016. ; https://doi.org/10.1101/057984 doi: bioRxiv preprint . CC-BY-NC 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under The copyright holder for this preprint (which was not this version posted June 9, 2016. ; https://doi.org/10.1101/057984 doi: bioRxiv preprint . CC-BY-NC 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under The copyright holder for this preprint (which was not this version posted June 9, 2016. ; https://doi.org/10.1101/057984 doi: bioRxiv preprint . CC-BY-NC 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under The copyright holder for this preprint (which was not this version posted June 9, 2016. ; https://doi.org/10.1101/057984 doi: bioRxiv preprint . CC-BY-NC 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under The copyright holder for this preprint (which was not this version posted June 9, 2016. ; https://doi.org/10.1101/057984 doi: bioRxiv preprint . CC-BY-NC 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under The copyright holder for this preprint (which was not this version posted June 9, 2016. ; https://doi.org/10.1101/057984 doi: bioRxiv preprint . CC-BY-NC 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under The copyright holder for this preprint (which was not this version posted June 9, 2016. ; https://doi.org/10.1101/057984 doi: bioRxiv preprint

Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

1

GeneticMappingbyBulkSegregantAnalysisinDrosophila:ExperimentalDesignand1

Simulation-BasedInference2

3

JohnE.Pool4

5

LaboratoryofGenetics,UniversityofWisconsin–Madison,Madison,WI,537066

7

8

9

RunningTitle:BulkSegregantMappinginDrosophila10

11

Keywords:bulksegregantanalysis,quantitativetraitlocusmapping,simulation,12

experimentaldesign,Drosophila13

14

Contact:15

425-GHenryMall16

Madison,WI5370617

+1-608-265-103618

[email protected]

20

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 2: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

2

ABSTRACT:21

22

Identifyingthegenomicregionsthatunderliecomplexphenotypicvariationisakey23

challengeinmodernbiology.Manyapproachestoquantitativetraitlocusmappingin24

animalandplantspeciessufferfromlimitedpowerandgenomicresolution.Here,I25

investigatewhetherbulksegregantanalysis(BSA),whichhasbeensuccessfullyappliedfor26

yeast,mayhaveutilityinthegenomicerafortraitmappinginDrosophila(andother27

organismsthatcanbeexperimentallybredinsimilarnumbers).Iperformsimulationsto28

investigatethestatisticalsignalofaquantitativetraitlocus(QTL)inawiderangeofBSA29

andintrogressionmapping(IM)experiments.BSAconsistentlyprovidesmoreaccurate30

mappingsignalsthanIM(inadditiontoallowingthemappingofmultipletraitsfromthe31

sameexperimentalpopulation).TheperformanceofBSAandIMismaximizedbyhaving32

multipleindependentcrosses,moregenerationsofinterbreeding,largernumbersof33

breedingindividuals,andgreatergenotypingeffort,butislessaffectedbytheproportionof34

individualsselectedforphenotypicextremepools.Ialsointroduceaprototypeanalysis35

methodforSimulation-basedInferenceforBSAMapping(SIBSAM).Thismethodidentifies36

significantQTLsandestimatestheirgenomicconfidenceintervalsandrelativeeffectsizes.37

Importantly,italsotestswhetheroverlappingpeaksshouldbeconsideredastwodistinct38

QTLs.ThisapproachwillfacilitateimprovedtraitmappinginDrosophilaandotherspecies39

forwhichhundredsorthousandsofoffspring(butnotmillions)canbestudied.40

41

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 3: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

3

INTRODUCTION:42

43

Connectingphenotypicdiversitytothegeneticvariantsthatencodeitisafundamental44

challengeformodernbiology.Inevolutionaryresearch,thereisstronginterestin45

revealingthegeneticarchitectureofadaptivephenotypicchange,includingthenumberof46

causativegenesandmutations,andtheirfunctionalandpopulationgeneticproperties.In47

moleculargenetics,themappingofphenotypicdifferencesfromnaturalorinduced48

mutationshasgreatutilityforelucidatinggeneticpathwaysthatunderliespecificbiological49

processes.Inanimalandplantbreeding,localizingthegenesunderlyingagronomically50

importanttraitvariationcanbeakeysteptowardgeneticimprovement.51

Especiallyinspeciesthatcanbeexperimentallycrossed,quantitativetraitlocus(QTL)52

mappingprovidesanimportanttoolforidentifyinggenomicregionsthatcontaincausative53

geneticvariantsunderlyingatraitdifference.Often,theF2orlateroffspringofacross54

betweenphenotypicallycontrastingparentalstrainsaregenotyped,individuallyorin55

groups,toidentifysectionsofthegenomethatwereinheritednon-randomlywithrespect56

tothephenotype(oftenonthemegabasescale).ThesimplestexampleofQTLanalysisis57

F2mapping,inwhichindividualsecondgenerationoffspringarephenotypedand58

genotyped.Toachievemuchgenomicprecision,however,thismethodrequiresthe59

individualgenotypingofalargenumberofF2offspring.PreparingmanygenomicDNA60

librariesfornextgenerationsequencingisoftenatime-andresource-intensive61

proposition,althoughprogresshasbeenmadeinthisregard(Andolfattoetal.2011).62

Introgressionmapping(IM)providesanotheralternativeforQTLanalysis.Here,63

followinganinitialcrossbetweenparentalstrainsAandB,offspringofsubsequent64

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 4: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

4

generationsarerepeatedlyselectedforstrainA’sphenotype,butareback-crossedtostrain65

B(Figure1).Toallowrecessivevariantstobeselected,thisselectionandintrogressioncan66

beperformedineverysecondgeneration.Thedesiredresultisanintrogressionlinethatis67

largelysimilartostrainBacrossthegenome,butthatmatchesstrainAatlocithatwere68

selectedalongwiththephenotype.Anotablemodernexampleofthisapproachis69

describedbyEarlyandJones(2011),whointrogressedabehavioraldifferencefromD.70

simulansintoD.sechellia.Here,30F2femalesweretestedforsimulans-likebehavior,anda71

subsetwasthenback-crossedtoD.sechellia.Afterrepeatingthisprocessfor1572

generations,next-generationsequencingwasusedtoidentifygenomicregionsthat73

introgressedwiththetraitfromD.simulans.74

Inbulksegregantanalysis(BSA),largenumbersofprogeny(fromF2orlater75

generations)aresorted/selectedbyphenotype,thencontrastingphenotypicpoolsof76

individualsareeachgenotyped(Figure1)(Michelmoreetal.1991).ComparedtoIM,BSA77

mayallowforalargernumberofuniquerecombinationeventstobegeneratedand78

sampled,whichcouldyieldsharperQTLpeaks.LikeIM,BSAdoesnotrequirelarge79

numbersofoffspringtobeindividuallygenotyped–insteadeachphenotypicextremecan80

besequencedasasinglepool.BSAhasbeenappliedverysuccessfullyforselectabletraits81

inyeast(e.g.Ehrenreichetal.2010;Magweneetal.2011;Partsetal.2011),facilitatedbya82

smallgenomeandtheeaseofgeneratingmillionsofsegregants.BSAhasalsoseendiverse83

applicationstotraitmappinginmulticellularorganisms(e.g.Michelmoreetal.1991;Wicks84

etal.2001;Bairdetal.2008;VanLeeuwenetal.2012;Haaseetal.2015),including85

Drosophila(Laietal.2007).86

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 5: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

5

Here,IusesimulationstoexaminethemappingsignalsofBSAandIMunderawide87

rangeofexperimentalparametersforthemappingofmulti-genetraits.IfindthatBSA88

producesstrongerandbetter-localizedmappingsignalsforallstudiedexperimental89

designs.Thetradeoffsofeffortandperformanceindicatedbytheseresults,alongwiththe90

newsimulationprogramsthatproducedthem,willhelpresearchersdesignmoreeffective91

mappingexperiments.92

IalsousethisBSAsimulationapproachtodeviseanewQTLinferencemethod.Existing93

BSAanalysismethodseffectivelyidentifyQTLsfromyeastdata(e.g.Magweneetal.2011;94

Edwardsetal.2012).However,thesemethodsdonotallowthediscriminationoftwo95

nearbyQTLpeaksversusasinglepeakwithnoisy,raggedcontours–anissuethatmaybe96

moreproblematicfororganismsinwhichmanyfewersegregantscanbesurveyedrelative97

toyeast.ThesemethodsalsodonotestimatetherelativestrengthofeachQTL.TheBSA98

inferencemethodproposedhereusesamulti-stepsimulationprocessto(1)identify99

significantQTLsandtheirgenomicconfidenceintervals,(2)separatesinglefrommultiple100

linkedQTLs,and(3)providearoughestimateoftheeffectsizesoftheidentifiedQTLs.101

Thismethodisvalidatedusingsimulationsinthepresentstudy,andappliedtodatainan102

accompanyingarticle(Bastideetal.2016).103

104

MATERIALSANDMETHODS105

106

PreliminarysimulationsforBSAandIM107

SimulationprogramswerewrittentoassesstheQTLsignalsofBSAandIM(software108

relatedtothisarticleisavailableathttps://github.com/JohnEPool/SIBSAM1).BSA109

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 6: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

6

simulationanalysesfocusedonasummarystatistic,“ancestrydifference”(ad).Foragiven110

geneticmarkerlocusorgenomicwindowofsequence,adreferstothedifferencebetween111

thehighandlowphenotypicpoolsintheproportionofancestryfromtheparentalstrain112

withthehigherphenotypicvalue.Forexample,ifthehighphenotypicpoolisestimatedto113

have60%ofitsancestryfromthisparentalstrainataparticularlocus,andthelow114

phenotypicpool40%,thenad=0.6–0.4=0.2.ForIM,theproportionofancestryinthe115

mappingpopulationfromthenon-backcrossparentalstrain(ap)wasevaluated.This116

quantitymayapproachzerofornon-causativelociaftermanygenerationsofback-crossing117

totheotherparentalstrain.Foreachstatistic,IexaminedhowoftenthetallestlocalQTL118

peakwasobservedwithin0.5centiMorgans(cM)ofthetruesimulatedtargetlocus,andthe119

average(median)distancebetweentheQTLpeakandthetargetlocus.120

TheBSAandIMsimulatorsarelargelysimilar.Theseprogramstrackparentalstrain121

ancestryalongthechromosomesofeachindividualinthemappingpopulation,fromtheF1122

generationuntiltheendoftheexperiment.APoisson-distributednumberof123

recombinationeventshappeneachgeneration,withtheexpectednumberforeach124

chromosomebeingitslengthinMorgans(interferenceisnotmodeled).Tofocusonthe125

caseofDrosophila,chromosomesX,2,and3wereexplicitlysimulated,andno126

recombinationwasallowedinmales.Atotalof5,000markers/windowsweresimulated127

oneachchromosome.IntheBSAsimulation,aspecifiednumberofindividualsexistineach128

newgeneration,andeachonedrawsrandomparentsfromthepreviousgeneration,with129

nophenotypicselectionuntilthelastgeneration.IntheIMsimulations,individualswere130

subjecttophenotypicselectionineverysecondgeneration.131

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 7: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

7

Phenotypesforeachindividualweremodeledbasedongenotypesandrandomvariance132

(thelattermaystemfromenvironmentaleffects,measurementerror,orothercauses).For133

mostofthesepreliminarysimulations,thesamenumberofequal-effectlociweresimulated134

oneachchromosomearm(X,2L,2R,3L,3R).Randomvariancewasaddedbymodifying135

eachindividual’sphenotypicvaluebyanormally-distributedrandomeffectwithmeanzero136

andstandarddeviation(SD)equaltotheaveragetraitvalue.Forexample,ifeachofthe137

fivearmsholdsasingleQTLthatadds1toadiploidindividual’sphenotypicvalueforeach138

alleleinheritedfromthehighparentalstrain,therangeofgeneticcontributionscould139

rangefrom0to10,withameanof5,andtheSDforenvironmentalvariancewouldalsobe140

5.Phenotypicselectionwasthenbasedonchoosingadefinedquantile(q)ofindividuals141

fromthemappingpopulationwiththehighestandthelowestphenotypicvalues.142

ForBSA,phenotypicselectionhappensonlyattheendoftheexperiment,followedby143

sequencing/genotypingofbothhighandlowphenotypicpools.ForIM,thelastbatchof144

selectedindividualsissequencedandcomparedagainsttheparentalstrains.The145

simulationsmodel“depth”ofsequencingcoverage(orgenotypesampling),drawingan146

appropriatenumberofrandomancestry-informativereadsfromtheselectedpoolof147

individualsforeachwindow/marker.Theproportionofancestryfromeachparentalstrain148

isthencalculated,andthusdependsonboththesamplingofindividualsandthesampling149

ofsequencereads.150

Tofacilitateconsistentanalysis,QTLsinthesepreliminarysimulationswerespaced151

uniformlyandeachwasassignedaspecificanalysiszonealongthechromosome.For152

example,iftheXchromosomehadfiveQTLs,theywouldbeplacedatrelativepositions0.1,153

0.3,0.5,0.7,and0.9(representingthechromosomeasa0to1interval).Theirzonesof154

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 8: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

8

analysiswouldthenbe0to0.2,0.2to0.4,andsoon.TheassessmentofQTLsignal155

strengthandprecisionwasbasedonthelocationwithinitszoneofthehighestQTLpeak156

(i.e.themaximumadorap),relativetothetrueQTLposition.157

Mostsimulationanalysesassumedthateachmappingexperimentwouldbeanalyzed158

separately,However,Ialsoinvestigatedcaseswheremultipleindependentmapping159

populationswereconstructedfromparentalstrainssharingthesamecausativegenetic160

differences.Here,adorapforeachwindowwassummedacrossreplicatedmapping161

populations.162

Forawidevarietyofexperimentalparametercombinations,1,000independent163

replicatesweresimulatedandanalyzed,andstatisticalperformancewascompared164

betweenthesescenariostoaidintheoptimizationofexperimentaldesign.165

166

Simulation-basedinferenceofQTLfromBSA:Overview167

PreliminaryempiricalBSAdatafromthePoollaboratoryindicatedtheneedforaQTL168

inferencemethodcapableofdealingwithneighboringQTLsthathavewide,overlapping169

statisticalsignals.Suchscenariosaredifficulttoaccountforinmostanalysisapproaches,170

butthesimulationframeworkdescribedaboveoffersapotentiallyflexiblefoundationfor171

QTLinference.Ithereforedevelopedamethodof“Simulation-basedInferenceforBulk172

SegregantAnalysisMapping”(SIBSAM).SIBSAMusesBSAsimulationsanalogoustothose173

describedabove,inordertoidentifyandlocalizesignificantQTLs,estimatetheirstrength,174

anddistinguishindividualQTLamongclustersoflinkedcausativeloci.175

ThroughouttheSIBSAMpipeline,thedistinctionbetweenprimaryQTLpeaksand176

secondaryQTLpeaksisrelevant.AprimaryQTLpeakisdefinedbasedonthehighestvalue177

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 9: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

9

ofadacrossacontinuousintervalinwhichthisstatisticremainsabovezero(whichisthe178

nullvalueexpectedintheabsenceofcausativeloci).AsecondaryQTLpeakwithinthat179

sameintervalhasalowerheightthanitsassociatedprimarypeak.Animportantquantity180

inassessingthesignificanceofasecondarypeakisits“secondarydeviation”(v),definedas181

thedifferencebetweensecondarypeakheightandtheminimumadvaluebetweenthe182

primaryandsecondarypeaks(Figure2).Multiplesecondarypeaksmaybeassociatedwith183

thesameprimarypeak,impactingthecalculationofv,asdiscussedbelow.184

AschematicoftheSIBSAMpipelineisillustratedinFigure3.First,primaryand185

secondarypeaksofadareidentifiedfromtheempiricaldata.Todeterminewhichprimary186

peaksareunexpectedintheabsenceoftrueQTLs,nullsimulationsareconductedinwhich187

phenotypesaredeterminedbynon-geneticfactorsonly.Pvaluescanthenbeobtainedfor188

eachprimarypeak.Next,simulationswithasinglecausativeQTLareconducted.Basedon189

arejectionsamplingapproach,estimatesofthestrengthandgenomicconfidenceintervals190

ofeachsignificantprimarypeakareobtained,alongwithaPvalueforeachsecondarypeak.191

Lastly,simulationsinvolvingaclusteroflinkedQTLsareconducted,reflectingaprimary192

peakanditsassociatedsecondarypeak(s).Thisphaseallowsfortherefinementof193

strengthestimatesandgenomicconfidenceintervalsforeachpeakinthecluster.194

Alloftheabovesimulationsoperatewithuser-definedwindowsofvariablecMlength.195

ThesewindowscouldalsobeviewedasmarkersseparatedbyvariouscMdistances,but196

thisarticle’sterminologymainlyassumesthatQTLmappingdatacomesfromthefull197

resequencingofmappingpopulationgenomes.Intheexamplespresentedhere,the198

windowbpspanswerebasedonD.melanogasterpolymorphismdata(Lacketal.2015)and199

cMdistanceswerecalculatedfromempiricalrecombinationrateestimates(Comeronetal.200

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 10: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

10

2012).Windowsweredefinedtoeachcontain200non-singletonvariablesitesfromthe201

Zambia-Siavongapopulationsample.Theusercanalsodefinethe“informativedepth”for202

eachwindowineachphenotypicpool.Thisquantityreferstothenumberofsequence203

readsthatcontaininformationaboutparentalstrainancestry.Thesimulatorwilldrawa204

correspondingnumberofallelesatthiswindowforancestryproportioncalculations.205

206

SIBSAMidentificationofprimaryandsecondarypeaksfromempiricaldata207

Primaryandsecondarypeaksofadareidentifiedfromdatabasedonpreliminary208

thresholdsforprimarypeakheightandsecondarypeakdeviation(adtandvt,respectively),209

plusanoptionalsmoothingstep.Thetwothresholdsshouldrepresentvalueslowenough210

thatnoshorterpeakwouldbestatisticallysignificant(thedefaultvalueforbothis0.1).211

Thesmoothingenabledhereisasimpleweightedaverage.Oneachsideofthefocal212

window,mflankingwindowsareincluded(thedefaultusedhereism=4).Thefocal213

windowreceivesaweightofm+1,theadjacentwindowoneachsidereceivesaweightof214

m,thenextwindowsreceiveaweightofm–1,andsoonuntilthemthwindowtoeachside215

receivesaweightof1.Alternativesmoothingschemesarenotafocusofthisstudy;the216

optimalstrategyshoulddependonthedatabeinganalyzed.Empiricalandsimulatedad217

valuesmustbesmoothedusingthesameprocedure.218

Primarypeakidentificationisstraightforward:thehighestvalueofadinacontinuous219

blockofwindowswithad>0,conditionalonthepeakvalueofadexceedingadt.Toidentify220

secondarypeaks,localminimaandmaximaofadmovingawayfromtheprimarypeakare221

noted.Arecovery,beyondvt,fromthelowpointsincethelastpeaksignifiesanew222

secondarypeak.Whenaddropsmorethanvtbelowthissecondarypeak’smaximumvalue,223

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 11: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

11

thispeakendsanditsmaximumvalueandassociatedwindowpositionarenoted.224

Statisticalsignificanceoftheseprimaryandsecondarypeaks,alongwiththeirconfidence225

intervalsandrelativestrengths,willbeassessedinsubsequentstagesofthispipeline.226

227

SIBSAMidentificationofsignificantprimarypeaks228

Thefalsepositiveprobability(P)foreachprimarypeakisestimatedbycomparing229

empiricalpeakheightsagainstsimulationsunderthenullhypothesisofnotrueQTLs,in230

whichallphenotypicvarianceinthemappingpopulationisrandomwithrespectto231

genotype.Allprimarypeaksexceedingadtfromeachsimulationreplicatearenoted.The232

enrichment(e)ofpeaksequaltoorgreaterthanagivenpeak’sheightintherealdatais233

givenbytheratioofthefrequencyofpeaksofthisheightintherealdatarelativetothe234

simulateddata.Ifthereisanenrichment(e>1),anestimateoftheproportionofreal235

peaksofthisheightrepresentingfalsepositivesisthengivenby1/e.Forexample,ifad236

peaksofatleast0.2inheightarethreetimesmorecommonintheempiricaldatathanin237

nullsimulations,thenonaverageoneoutofthreesuchempiricalpeakscanbeexplainedby238

theexpectedfalsepositiverate.PrimarypeakswithanestimatedPlessthansome239

threshold(bydefault,0.05)arecarriedforwardforsubsequentanalysis.240

241

SIBSAMinferencesfromsingleQTLsimulations242

GenomicsimulationswithasingleQTLareusedtoestimatethegenomicconfidence243

intervalsandstrengthofeachprimarypeak,alongwithaPvalueforeachsecondarypeak.244

SingleQTLsimulationsareperformedwitheachfixedgenomicpositionscorrespondingto245

thewindowwiththepeakmaximumadforeachQTL,thusconservinglocalwindow246

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 12: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

12

patternsofdepthandcMdistance.Foragivensetofsimulatedgenomesfromthemapping247

population(pre-selection),arandomQTLeffectsizeisdrawn.SuchaQTListhen248

separatelysimulatedateachpositioncorrespondingtoanempiricalprimarypeak,with249

phenotypesimulationandreadsamplingperformedseparatelyineachcase.Thesimulated250

cageancestriesarereusedforeachseparateQTLsimulationasatime-savingefficiency.251

ThesimulatedQTLstrength,s,rangingfrom0to1,istheestimatedproportionof252

variancethataQTLexplainsamongthemappingpopulationindividuals.Inthesesingle253

locussimulations,allotherphenotypiccontributionsaremodeledasrandomvariance,254

whichhereisintendedtoencompasstheeffectsofunlinkedQTLsinadditiontonon-255

geneticeffectsonphenotypicmeasurements.Theamountofrandomvariancesimulatedis256

fixedtoapproximatethevariancecontributedbyacodominantlocusinwhicheachallele257

adds1tothephenotypicscore.ThiseffectwasimplementedbyobtainingGaussian258

randomvalueswithmean0andstandarddeviation1,andthenmultiplyingeachvalueby259

0.5toobtaintherandomvarianceeffectoneachindividual’sphenotypicscore.The260

simulatedeffectsizeofeachQTL,e,describesthequantitythateachalleleofthislocus261

(inheritedfromthehighparentalstrain)addstoanindividual’sphenotypicscore.Since262

randomeffectscorrespondtothevariancecontributedbyalocuswithe=1,theproportion263

ofvariancecontributedbyasingleQTL(s)isequaltoe/(1+e).Andcorrespondingly,a264

singleQTLintendedtohavestrengthsissimulatedwithaneffectsizee=s/(1+e).265

Foreachsimulatedreplicate,thesimulatedstrengthisrecorded,alongwitheachQTL’s266

maximumheight,peakwindowlocation,andmaximumsecondarydeviation.Toanalyze267

theonelocussimulationdataforeachprimarypeak,arejectionsamplingapproachisused268

toidentifysimulationreplicatesinwhichmaximumadfallswithinaspecifiedtolerance269

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 13: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

13

(default0.025)oftheempiricalpeak’smaximumad.Foreachacceptedsimulation270

replicate,thestrengthofthesimulatedlocusgoesintotheposteriordistributionforthe271

empiricalQTL’sstrength(fromwhichstrengthvaluescorrespondingtothe0.05,0.5,and272

0.95quantilesarereturned).Agenomicconfidenceintervalissimilarlyobtainedby273

examiningthefarleftandfarrightquantilesforthesimulatedpeaklocationsresulting274

fromaQTLsimulatedattheempiricalpeaklocation.Thisassumesacertaintransitivity.275

Here,wearesimulatingQTLswithfixedpositionsandobservinghowfarawaythe276

maximumadfallsinthesesimulations.Intheempiricaldata,weobservethelocationofthe277

maximumad,andwe’dliketoknowhowfarfromthiswindowthetrueQTLmightbe.278

Thus,weassumethedistancesfromtrueQTLtomaximumadinthesimulateddataarea279

goodproxyforthedistancesbetweenmaximumadandtrueQTLintheempiricaldata.280

Lastly,thesecondarydeviationsfromeachacceptedsimulationenablePvaluestobe281

calculatedforeachoftheempiricalprimarypeak’sassociatedsecondarypeaks.Ifmore282

thanonesecondarypeakispresentonthesamesideoftheprimarypeakintheempirical283

data,thetallestsecondarypeakistestedfirst,anditsvisbasedonthedifferencebetween284

itsheightandthelowestadvaluebetweenitselfandtheprimarypeak(evenifother285

secondarypeaksexistbetweenthispeakandvalley;Figure2).Forashortersecondary286

peakbetweenaprimarypeakandatallersecondarypeak,vwouldbedefinedasthe287

differencebetweenitsheightandthehigherofthevalleysoneithersideofit.Givingtaller288

peaksthispriorityavoidsthesituationofashortersecondarypeakbeingdeemed289

significantandatallerpeakbeyonditmissingthisthreshold(asmightoccurifsecondary290

peaksweresimplyevaluatedsequentiallybyposition).Aftersuchadjustments,each291

secondarypeakdeviationintheempiricaldataassociatedwiththisprimarypeakis292

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 14: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

14

comparedtothedistributionofvfromacceptedsimulations.Theproportionof293

simulationswithavgreaterthanobservedforagivenempiricalsecondarypeakbecomes294

thePvalueforthatpeak(i.e.theprobabilityofgettingasecondarydeviationthisextreme295

whenthetruemodelisasingleQTLoftheobservedmagnitude).296

297

SIBSAMinferencesfromQTLclustersimulations298

Incaseswhereanempiricalprimarypeakisaccompaniedbyoneormorestatistically299

significantsecondarypeaks,thestrengthsandconfidenceintervalsofallpeaksinthis“QTL300

cluster”arebestapproximatedfromsimulationsthatincludeeachmemberQTL.For301

example,apairofnearbyQTLsmayeachaddtotheadpeakheightoftheother,leadingto302

overestimatesofeffectsize.Therefore,multi-QTLsimulationsareconductedseparatelyfor303

eachQTLclusterinferredfromtheempiricaldata.Forsimplicity,thewindowpositionof304

eachsimulatedQTLisfixedaccordingtothewindowsshowingmaximumadforeach305

significantpeakintheempiricalcluster.ToexamineeachQTLseparately,eachisassigned306

ananalysiszonewithboundariescorrespondingtotheempiricalvalleys(localminima)307

betweenpeaks.Movingawayfromtheouterpeaksinthecluster,thisanalysiszoneis308

boundedonlybytheendsofthechromosome.309

Foreachclustersimulationreplicate,arandomstrengthvalueisfirstdrawnforthefull310

cluster(representingthecumulativeproportionofphenotypicvarianceexplainedbythe311

QTLsinthiscluster).ThatclusterstrengthisrandomlyapportionedamongtheQTLs,and312

eachpeak’sstrengthisthentranslatedintothesimulatedeffectsizeasdescribedabove.313

AclustersimulationreplicateisacceptedonlyifthelocalmaximumadineveryQTL’s314

analysiszonefallswithinatoleranceofthecorrespondingempiricalpeakheights.Here,it315

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 15: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

15

couldbenecessarytouseaslightlyhighertolerancevaluetoaccrueenoughaccepted316

simulations(defaultadtolerance0.05).ThisoranyothersimulationstepinSIBSAMcanbe317

parallelizedtoincreasethenumberofreplicates,followedbyjointanalysisofmultiple318

simulationoutputfiles(Figure3).319

Theestimatedstrengthofeachpeakincluster,alongwithconfidenceintervalsof320

strengthandgenomicposition,areobtainedfromasimilarrejectionprocessasdescribed321

fortheonelocussimulations(basedonthedistributionofstrengthvaluesandpeak322

locationsforthatpeakamongtheacceptedsimulations).Thus,theclusterQTLsimulations323

provideestimatesofeffectsizeandgenomicconfidenceintervalsforallsignificant324

secondarypeaks.Theyalsoreplacepriorestimatesofthesequantitiesfortheassociated325

primarypeaks,sinceclusterestimatesthataccountfortheeffectsoflinkedQTLsshouldbe326

moreaccurate.327

ThefinalSIBSAMoutputfilecontains,foreachsignificantprimaryandsecondarypeak,328

itsPvalue,thegenomiccoordinatesofthepeakwindowandtheconfidenceintervalforthe329

QTL’sgenomiclocation,andthepointestimateandconfidenceintervalforQTLstrength.330

InformationsuchasPvaluesfornon-significantpeakscanbefoundintheintermediate331

filesproducedatdifferentstagesoftheSIBSAMpipeline(Figure3).332

333

SimulationstestingtheperformanceofSIBSAM334

SimulationtestingofSIBSAMwasperformedtotestitsQTLdetectionpowerunder335

differentscenarios,andtoconfirmthatestimatesandconfidenceintervalsofgenomic336

locationandQTLstrengthwereperforminginlinewithexpectations.Althoughanearly337

infiniterangeofscenarioscouldpotentiallybeinvestigated,Ifocusedonexperimental338

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 16: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

16

parametersrelevanttoourcurrentempiricalapplicationsinDrosophila(e.g.Bastideetal.339

2016),inwhich1,200individualsinterbreedfor16generations,and10%phenotypictails340

areselectedforsequencing.Testsimulationssampled1,000informativereadsforeach341

windowforeachphenotypicpool,whichisabouthalfthemediandepthperwindowfrom342

currentempiricalapplications(e.g.Bastideetal.2016).Windowsweredesignedtoeach343

contain200non-singletonvariablesitesintheZambia-Siavongapopulationgenomicdata344

describedbyLacketal.(2015).These14,107windowshadamedianlengthof6.8kb.345

SimulationswithonegenuineQTLwereperformedwithvaryinglocusstrengths(s=346

0.05,0.1,0.15,0.2,0.25,0.33,0.5).Theseinitialtestsimulationsusedfixedgenomic347

positionscorrespondingtothelocationsofDrosophilapigmentationgenestan(ontheX348

chromosome)andebony(onarm3R).Additional3Rscenarioswiths=0.2investigatedthe349

consequencesoftheremainingvariancebeingduetounlinkedQTLs(1withs=0.8orelse350

4otherswiths=0.2)insteadofrandomGaussianvariance.Comparingeachtestreplicate351

againstSIBSAMnullsimulationsrevealedthetruepositiverateforQTLdetection.Running352

thetestreplicatesthroughtheSIBSAMonelocussimulationanalysisindicatedthe353

frequencyatwhichsecondaryQTLswerefalselyinferred,alongwithallowedtheinferred354

distributionsofQTLstrengthandgenomiclocationtobecomparedagainstknowntrue355

values.356

Additionalsimulationswereconducted(focusingonthe3Rlocation)toinvestigate357

SIBSAM’sperformanceinthepresenceoftwolinkedQTLs.ScenarioswithsymmetricQTL358

strength(s=0.15or0.3)andasymmetricQTLstrength(s=0.15and0.3)were359

investigated.ThedistancebetweenthetwoQTLswasvariedat2.5,5,10,and25cM.The360

testreplicateswerethenevaluatedwithSIBSAMto(1)testthepowertodetectoneorboth361

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 17: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

17

QTLs,(2)testtherateoffalselydetectingthreeormoreQTLs,(3)evaluatethe362

performanceofQTLlocalization,and(4)evaluatetheperformanceofQTLsizeestimation.363

364

365

RESULTS:366

367

InitialSimulationStudyofBSAandIM368

SimulationswereperformedtoexaminethepropertiesofQTLsignalsunderBSAand369

IMapproaches.Importantly,theseexploratorysimulationsarenotconnectedtoany370

formalQTLinference.Instead,theyfocusontheperformanceofsummarystatisticsrelated371

tothesignatureofaQTL.ForBSA,Iexamineancestrydifference(ad),thedifference372

betweenhighandlowphenotypicpoolsintheproportionofancestrysampledfromthe373

parentalstrainwiththehigherphenotypicvalue(ataparticulargenomiclocus).ForIM,I374

examineancestryproportion(ap),theproportionofthemappingpopulation’sancestrythat375

derivesfromthenon-backcrossparentalstrain.Ratherthanfocusingontherawvaluesof376

thesestatistics,IassesstheperformanceofBSAandIMbyexaminingthegeneticdistance377

betweenatruesimulatedQTLandthe“QTLpeak”(themaximumvalueofadorapinthis378

partofthegenome).379

Theaboveapproachallowsawiderrangeofscenariostobeexaminedthanwouldbe380

computationallyfeasibleunderthefullSIBSAMinferenceprocess.Beyondatentative381

comparisonofthegenomicprecisionofBSAvs.IM,animportantgoalhereistooptimize382

criticalexperimentalparameterstoimprovetheoutcomesoffuturetraitmappingstudies.383

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 18: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

18

Asapointofreference,thesesimulationsbeganwitha“default”scenarioinwhich600384

individualswerebredeachgeneration,for10totalgenerations,phenotypicselection385

retainedthe20%mostextremeindividualsineachdirection,andeachwindow/locushada386

sequencingdepthof300.Individualparameterswerethenvaried,aloneorincombination,387

andtheaccuracyoftheadorapsignalwasexamined.388

First,performancewasexaminedwhentandemlyvaryingthenumberofQTLsandthe389

numberofindependentcrosses.Withineachsimulationcase,allQTLswereofequal390

magnitudeandexplained5/6oftotalphenotypicvariance.Independentcrosseswere391

simulatedundertheassumptionthatallpairsofparentalstrainsshareagivenQTL392

differencebetweenthem.Whenmultiplecrosseswereanalyzedtogether,adorapwere393

addedbetweencrossesforeachgenomicwindowtotestwhetheramoreprecise394

localizationemergedfromthisjointsignal.Threeprimarythemesemergedfromthis395

analysis.First,BSAoutperformedIMforanygivencombinationofcrossesandloci(Figure396

4).Second,combiningdatafrommultiplecrosseshadamarkedlypositiveeffectonthe397

accuracyoftheseancestrysignals.Third,performanceshowedapredictabledeclinefor398

more/weakerQTLs.Still,caseswithmultiplecrossesstillmanagedrelativelystronger399

performanceformorepolygenicscenarios(Figure4),particularlyinthecaseofBSA.For400

simplicity,theremainingsimulationsbelowwillfocusonasinglecrossreplicateanda401

scenariowithfiveQTLs.402

Thenumberofgenerationsbeforegenotyping/sequencingwasalsovaried.Strong403

performanceimprovementwasobservedbyincreasingthenumberofgenerationsto8or404

10,withfurtherincreasesyieldingongoingbutdiminishingimprovements(Figure5A).405

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 19: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

19

Additionalgenerationsallowmorerecombinationbetweenparentalgeneticbackgrounds,406

whichshouldleadtosharperQTLpeaks.407

Pastresultsindicatethatselectingonlythemostextremeindividualsisnotoptimalfor408

BSA(Magweneetal.2011).Concordantly,forthefocalsimulationscenariostudiedhere,409

optimumbulkproportionswerearound10-15%foreachBSApool,and20%forthesingle410

IMpool(Figure5B).Theseresultsappeartoreflectabalancebetweenenrichingfor411

causativegenotypes(favoringfewerindividuals)andminimizingtheeffectsofrandom412

samplingvariance(favoringmoreindividuals).Thus,bothBSAandIMstudiesmaybenefit413

fromselectingsignificantnumbersofindividuals,whichshouldhelptomaximizethe414

diversityofrecombinationbreakpointsrepresentedinthefinaldata.415

Relatedtotheissueofsamplingvarianceareparameterssuchasthenumberof416

individualspresentineachgenerationandthenumberofgenotypessampledinthedata417

(e.g.sequencingdepth).Whensimulationsjointlyscaledupthenumberofindividuals418

presentineachgeneration,thenumbersampledforsequencing,andthesequencingdepth,419

performanceimprovedconsiderably(Figure5C).Thenumberofindividualssampledin420

thefinalgenerationmadeaparticulardifference,atleastifdepthwasscaleduplinearly421

(FigureS1).Increasingsequencedepthconsistentlyledtobetterperformance(viaa422

reductioninsamplingvariance),althoughwithsomediminishingreturns(Figure6).423

Simulationsalsoconsideredtheinteractionbetweenselectionproportionand424

populationsize.Theoptimalselectionproportion(s)tendstoscaleinverselywith425

populationsize(N).ForBSApopulationsizesbetween100and2,400,therewasarelative426

stabilityintheoptimalnumberofsampledindividualsforsequencing(Ns),withthis427

quantityrangingonlyfrom35to60(TableS1).InlinewiththefindingsofMagweneetal.428

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 20: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

20

(2011),thisresultsuggeststhatreducingsamplingvarianceisofprimaryimportance,429

whereasenrichingforthemostphenotypicallyextremeindividualsisasecondarypriority.430

431

SimulationtestingoftheSIBSAMpipeline432

AselaboratedintheMaterialsandMethodssection,Idevelopedaprototypemethodfor433

Simulation-basedInferenceforBSAMapping(SIBSAM).Theflexibilityofthissimulation-434

drivenpipelineallowsarangeofinferences,includingforchallengingcasesinwhichtwoor435

moreQTLsarepartofthesamecomplexpeak(Figure2).ThegoalsofSIBSAMinclude436

assessingthesignificanceofpeaks,andestimatingthestrengthandgenomicconfidence437

intervalofsignificantQTL.TheperformanceofSIBSAMwasassessedviaaseriesoftest438

simulationswithoneormoreQTLs.WhileavastrangeofQTLandexperimentalscenarios439

couldpotentiallybeexamined,Ifocushereonparametersrelevanttoongoingempirical440

workinDrosophila(Bastideetal.2016).TheBSAexperimentaldesignsimulatedhere441

wentfor16generations,with1,200individualsineachgeneration,with600females442

phenotypedinthelastgenerationwith10%poolsselected,and1,000informative443

sequencereadsforeachgenomicwindow.444

Fortheabovescenario,SIBSAM’sQTLdetectionpowerwentfromweakforaQTL445

explaining10%oftheexperimentalpopulation’sphenotypicvariance(withtheremainder446

duetorandomenvironmentalormeasurementvariance)tostrongfora20%QTL,with447

intermediatepowerfor15%QTL(Figure7).Asillustratedbytheexploratorysimulations448

above,theperformanceofQTLmappingislikelytobeimprovedbyincreasingthenumber449

ofgenerations,thepopulationsize,sequencingdepth,and/orthenumberofindependent450

crosses.451

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 21: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

21

TheestimationofQTLstrengthforsignificantpeakswasquiteaccuratefor452

intermediatestrengthQTL(15%to33%)whentheremainingphenotypicvariancewas453

randomandnormallydistributed(Figure7).However,inotherscenariosthestrength454

estimatebecameupwardlybiased.ForaweakerQTL(e.g.10%inthisexample),there455

appearstobea“detectionbias”inwhichonlythetestreplicatesgivingthetallestpeaks456

weredeemedsignificant,andsincethesepeaksareunusuallyhighforas=10%QTL,their457

strengthwastypicallyoverestimated.Ifstrengthestimatesfornon-significantpeakswere458

included,therewasnodirectionalbias.ThehighestQTLstrength(50%)alsoshowed459

upwardbias,whichmayreflecta“saturationeffect”oftheadstatistic.Here,peakheights460

wereverycloseto1(individualswerewell-sortedintotheextremepoolsbasedonQTL461

genotype),whichisthesameoutcomeproducedbyaQTLwiths>50%.Upwardstrength462

biaswasalsoobservediftheremainingphenotypicvariancewasproducedbyotherstrong463

QTLs,ratherthannormallydistributedrandomvariance.Ifa20%QTLwasaccompanied464

byanunlinked80%QTL(withnoenvironmental/measurementvariance),themedian465

estimateofswas24.2%.Ifa20%QTLwasaccompaniedbyfourunlinkedQTLsofequal466

strength,themedianestimateofswas31.4%(althoughpowerincreasedfrom94%to467

100%forbothofthesecases).Inlightoftherecurrentbiasineffectsizeestimation,the468

reportedquantitiesarebestviewedasroughestimatesofQTLstrength.Future469

methodologicalstudiesmayexplorealternativeapproachestotheestimationofQTL470

strengthinasimulationframework.471

OtheraspectsofSIBSAMinferenceperformedlargelyasexpectedonthesimulated472

data.ForsignificantQTL,onlyaround5%hadafalsepositivesecondarypeak(inlinewith473

nullexpectations;FigureS2).ForQTLstrengthswithadequatepower,approximatelythe474

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 22: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

22

predictedproportionoflocifellwithintheprovidedconfidenceintervalsforQTLstrength475

andgenomicposition(FigureS2),withperformanceonlydecliningfortheweakers=10%476

casethatwasrarelydetectedforthisscenario.477

DetectionpowerwasalsoexaminedforcasesinvolvingtwolinkedQTLs(ofstrength478

15%and/or30%)separatedbyvariousdistances(2.5cM,5cM,10cM,25cM).ForQTLof479

equalstrength,the25cMlinkagehadnoadverseeffectonQTLdetection.Powerwas480

actuallyslightlyhigherinthecaseof15%QTLseparatedby25cM(relativetotheunlinked481

case),eventhough55%ofthesetestreplicateshadoneoftheQTLsasasecondarypeak.482

PowertodetectasecondpeakdroppedsignificantlyasthedistancebetweenQTLdropped483

to10cMand5cM(Figure8).InthecasewhereoneQTLshads=30%andtheotherhads484

=15%,powerremainedhighforthestrongerQTLatalldistances,butwaslowforweaker485

QTLat10cMorcloser(Figure8).486

487

DISCUSSION488

Mappingthegeneticarchitectureofphenotypictraitdifferencesremainsachallenging489

butcriticalprobleminthefieldsofgeneticsandevolutionarybiology.Above,Ihave490

comparedthebehaviorofbulksegregantanalysisandintrogressionmapping,while491

assessingtheexperimentalparametersthatmodulatetheiroutcomes.Ithenofferedanew492

simulation-basedapproachtoBSAinference,gearedtowardsystemslikeDrosophilain493

whichhundredsorthousands(butnotmillions)ofindividualscanbeexamined,andin494

whichBSAQTLsignalsmaysometimesoverlapeachother.495

AgeneralprincipleofQTLmappingisthatperformanceisenhancedbysamplinga496

diverserangeofrecombinantgenotypes.Thus,simulationresultssuggestthatBSAandIM497

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 23: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

23

shouldbothbemoresuccessfulwhenmoregenerationsofinterbreedingoccur,when498

largernumbersofindividualsarepresentinthemappingpopulation,andwhengreater499

sequencingeffortisemployed.Theimportanceofsamplingatleastafewtensof500

individualsinphenotypically-selectedpoolsisclearaswell.Theseresultssuggestthatthe501

typicalmethodofintrogressionmapping,inwhichsmallnumbersofindividualsare502

phenotypicallyselectedeverygenerationortwo,isnotadvisableformappingoligogenic503

traits(andisnotidealformonogenictraitseither;Figure4).Instead,ifIMisused,larger504

numbersofphenotypedandretainedindividualsaredesirable.However,basedonthe505

criteriaemployedhere,BSAgaveamoreprecisemappingsignalthanIMforevery506

combinationofexperimentalandQTLparametersexamined.Thisfindingmayagainrelate507

totheprincipleofmaintainingadiversityofrecombinationbreakpoints,whichis508

maximizedbyavoidingIM’spopulationbottlenecksassociatedwithphenotypicselection509

duringtheintermediategenerationsofinterbreeding.510

ThetradeoffsamongBSA,IM,andothermappingapproachesarecomplexandmerit511

furtherattention.AcompellingadvantageofBSAisthatthesameexperimentalpopulation512

maybeusedtomapmultipletraitdifferences(e.g.oncetheadultshavealready513

reproduced,selectforonetraitingeneration12,anothertraitingeneration13,etc.).For514

thesamesetofexperimentalparametersasdefinedhere,BSAactuallyrequireslesseffort515

thanIMduringtheexperiment,sincephenotypingmustbeperformedonlyinthelast516

generation.BSAdoesrequirethesequencingoftwophenotypicpools(highandlow),517

whereasIMrequiresjustonephenotypicpooltobesequenced(notehoweverthat518

doublingIMdepthdoesnotallowittomatchBSA’sperformance;FigureS1).Becauseboth519

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 24: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

24

parentalstrains’genotypesarepresentacrossthegenomesofmappingpopulation520

individuals,BSAmaybemoreinfluencedbythecomplexitiesofepistaticinteractions.521

InthecourseofaBSAexperiment,parentalstrainancestryfrequenciesinthemapping522

populationcoulddeviatefrom50%.Theeffectsofgeneticdriftshouldbemodestwhenthe523

populationsizeisvastlygreaterthanthenumberofgenerationsofinterbreeding,and524

SIBSAMallowsfordrift’soccurrence.Althoughnotmodeledhere,inadvertentlaboratory525

selectioncouldalsoshiftmappingpopulationancestryfrequencies.Ingeneral,such526

ancestryshiftsshouldnotleadtofalsepositiveQTL,becausebothphenotypicpoolswillbe527

equallyaffected.Ifancestryfrequenciesbecomeextreme,theresponseofadtoaQTLcould528

bedampened,leadingtoreducedpowerandunderestimationofQTLstrength.Hence,it529

maybeworthwhiletocollectBSAsequencedatabeforeanexcessivenumberof530

generationshaveelapsed.Genomicregionsfoundtoshowancestryshiftscouldbe531

interestingintheirownright,sincetheymaycontaindriversoflaboratoryadaptation,532

differentialmatingsuccess,orsegregationdistortion.533

ItismorechallengingtocompareBSAorIMagainstalternativemappingmethodssuch534

asthoseinvolvingindividualgenotyping(e.g.Andolfattoetal.2011)orthegenerationof535

recombinantinbredlines(e.g.Kingetal.2012).However,itmaybeworthevaluatingthe536

benefitsofcombiningelementsofBSAwiththeseapproaches.Followingmultiple537

generationsinalargemappingpopulation,offspringwithextremephenotypescouldbe538

individuallygenotyped.Or,themappingpopulationcouldbeusedtofoundalargenumber539

ofrecombinantinbredlines(RILs),withBSAandRILmappingpotentiallyintegrated.540

Themappingapproachandmethoddescribedhererequiresamoderateinvestmentof541

researchertimeandfunding,anddeliversarangeofQTLinferences.Whileusefulinits542

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 25: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

25

currentform,SIBSAMmayalsomotivatefuturesimulation-basedmappingmethods.543

AlthoughmotivatedbyDrosophilaQTLmapping,thisapproachmayprovebroadlyuseful544

fornon-modelinsectsandothersmallerorganismswithshortgenerationtimes.545

546

547

ACKNOWLEDGEMENTS548

IthankKarlBromanforhelpfuldiscussionsregardingSIBSAMandDavidBegunfor549

suggestingBSAtomeseveralyearsago.TheUW-MadisonCenterforHighThroughput550

Computingprovidedcomputationalassistanceandresourcesforthiswork.Thisproject551

wasfundedbyNIHgrantR01GM111797.552

553

554

LITERATURECITED555

Andolfatto,P.,D.Davison,D.Erezyilmaz,T.T.Hu,J.Mast,etal.,2011Multiplexedshotgun556

genotypingforrapidandefficientgeneticmapping.GenomeRes.21:610-617.557

Baird,N.A.,P.D.Etter,T.S.Atwood,M.C.Currey,A.L.Shiver,etal.,2008RapidSNP558

discoveryandgeneticmappingusingsequencedRADmarkers.PLoSONE3:e3376.559

Bastide,H.,J.D.Lange,J.B.Lack,A.Yassin,andJ.E.Pool,2016Oligogenicadaptation,soft560

sweeps,andparallelmelanicevolutioninDrosophilamelanogaster.Accompanying561

manuscript.562

Comeron,J.,R.Ratnappan,andS.Bailin,2012Themanylandscapesofrecombinationin563

Drosophilamelanogaster.PLoSGenet.8:e1002905.564

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 26: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

26

Earley,E.J.,andC.D.Jones,2011Next-generationmappingofcomplextraitswith565

phenotype-basedselectionandintrogression.Genetics189:1203-1209.566

Edwards,M.D.,andD.K.Gifford,2012High-resolutiongeneticmappingwithpooled567

sequencing.BMCBioinformatics13:S8.568

Ehrenreich,I.M.,J.Bloom,N.Torabi,X.Wang,Y.Jia,etal.,2012Geneticarchitectureof569

highlycomplexchemicalresistancetraitsacrossfouryeaststrains.PLoSGenet.8:570

e1002570.571

Haase,N.J.,T.Beissinger,C.N.Hirsch,B.Vaillancourt,S.Deshpande,etal.,2015Shared572

genomicregionsbetweenderivativesofalargesegregatingpopulationofmaize573

identifiedusingbulkedsegregantanalysissequencingandtraditionallinkageanalysis.574

G3(Bethesda)5:1593-1602.575

King,E.G.,S.J.Macdonald,andA.D.Long,2012PropertiesandpoweroftheDrosophila576

SyntheticPopulationResourcefortheroutinedissectionofcomplextraits.Genetics577

191:935-949.578

Lack,J.L.,C.M.Cardeno,M.W.Crepeau,W.Taylor,R.B.Corbett-Detig,etal.,2015The579

Drosophilagenomenexus:apopulationgenomicresourceof623Drosophila580

melanogastergenomes,including197fromasingleancestralrangepopulation.581

Genetics199:1229-1241.582

Lai,C.Q.,J.Leips,W.Zou,J.F.Roberts,K.R.Wollenberg,etal.,2007Speed-mapping583

quantitativetraitlociusingmicroarrays.Nat.Methods4:839-841.584

Magwene,P.M.,J.H.Willis,andJ.K.Kelly,2011Thestatisticsofbulksegregantanalysis585

usingnextgenerationsequencing.PLoSComputBiol,7:e1002255.586

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 27: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

27

Michelmore,R.W.,I.Paran,andR.V.Kesseli,1991Identificationofmarkerslinkedto587

disease-resistancegenesbybulkedsegregantanalysis:arapidmethodtodetect588

markersinspecificgenomicregionsbyusingsegregatingpopulations.Proc.Natl.Acad.589

Sci.USA88:9828-9832.590

Parts,L.,F.A.Cubillos,J.Warringer,K.Jain,F.Salinas,etal.2011Revealingthegenetic591

structureofatraitbysequencingapopulationunderselection.GenomeRes.21:1131-592

1138.593

VanLeeuwen,T.,P.Demaeght,E.J.Osborne,W.Dermauw,S.Gohlke,etal.2012Population594

bulksegregantmappinguncoversresistancemutationsandthemodeofactionofa595

chitinsynthesisinhibitorinarthropods.Proc.Natl.Acad.Sci.USA109:4407-4412.596

Wicks,S.R.,R.T.Yeh,W.R.Gish,R.H.Waterston,andR.H.Plasterk,2001Rapidgene597

mappinginCaenorhabditiselegansusingahighdensitypolymorphismmap.Nat.Genet.598

28:160-164.599

600

601

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 28: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

28

Figure1.TheinvestigatedexperimentaldesignsforBSAandIMareillustrated.InBSA,offspringofreciprocalparentalstraincrossesarecombinedandallowedtobreedwithouttraitselectionforanumberofgenerations.Phenotypingoccursonlyinthefinalgeneration,andpoolsofindividualswiththehighestandlowesttraitvaluesareeachsequenced.TheIMframeworkinvestigatedhereinvolvestraitselectionandparentalstrainbackcrossingeverysecondgeneration(allowingrecessivegenotypestobeexpressed).Inthelastgeneration,onephenotypicextremeissequencedandcomparedagainstthebackcrossparentalstraingenome.

Selection for high trait

Parental Odd Generations Even Generations Last Even Generation

high trait ♀♀ (seq.)

all others (discard)

High Strain ♀♀× Low Strain ♂♂

Low Strain ♀♀× High Strain ♂♂

F1 ♀♀ F1 ♂♂

F1 ♀♀ F1 ♂♂ Interbreeding

without trait selection

Parental F1 Several Generations Last Generation

high trait ♀♀ (seq.)

mid trait ♀♀ (discard)

low trait ♀♀ (seq.)

High Strain ♀♀× Low Strain ♂♂

Interbreeding w/o selection

High-selected ♀♀ × Low Strain ♂♂

(backcross)

INTROGRESSION MAPPING (IM):

BULK SEGREGANT ANALYSIS (BSA):

.CC-BY-NC 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

The copyright holder for this preprint (which was notthis version posted June 9, 2016. ; https://doi.org/10.1101/057984doi: bioRxiv preprint

Page 29: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

29

Figure2.Definitionsofprimaryandsecondarypeaks,alongwithsecondarydeviation,usedbySIBSAMareillustratedhere.Amongagroupofcontiguouswindowswithsmoothedadvaluesabovezero,theprimarypeakisdefinedbythewindowwiththehighestvalue.Secondarypeaksrepresentotherlocalmaxima,andtheirsignificanceisjudgedbasedonsecondarydeviation(v).Secondarydeviationisdeterminedbythedifferenceinadbetweenthesecondarypeak’smaximumvalueandtheminimumvaluebetweenthatpeakandtheprimarypeak(oratallersecondarypeak,whicheverminimumisgreater).

PrimaryPeak

SecondaryPeak1

SecondaryPeak2

v2v1

Page 30: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

30

Figure3.AflowchartillustratingtheSIBSAManalysispipelineisshown.Asingleinputfilecontainsphysicalandgeneticmappositionsofwindowboundariesforallchromosomes,alongwithancestrydifferencevaluesandinformativedepth(thenumberofreadswithininformationaboutparentalstrainancestry)foreachwindow.NullsimulationswithnotrueQTLsareusedtoidentifysignificantprimarypeaksintheempiricaldata.SimulationswithoneQTL(matchingaprimarypeaklocation)arethenusedtoestimateconfidenceintervalsforprimarypeakeffectsizeandgenomiclocation,whilealsoidentifyingsignificantsecondarypeaks.Foranyprimarypeakwithsignificantsecondarypeaks,clustersimulationsareconductedwithQTLsateachpeak’slocation,inordertogeneratefinalconfidenceintervalsforeffectsizeandgenomiclocation.Theseanalysesaresummarizedintoasingleoutputfilecontainingallrelevantinferencesforeachsignificantpeak.

Input File: Window Data, Depth, Ancestry

sibsam_cluster_prepare.pl

sibsam_cluster_analysis.pl

Cluster Simulations

Null Simulations

sibsam_null.pl

Null Sim. Output File(s)

Secondary Peak File

Primary Peak P Value File

sibsam_cluster.pl

Cluster Peak Data File(s)

Cluster Output File(s)

Cluster Sim. Output File(s)

Secondary Peak P Value File

One Locus Simulations

1 Locus Sim. Output File(s)

Primary Peak Results File

sibsam_1locus.pl

sibsam_1locus_analysis.pl sibsam_null_analysis.pl

SIBSAM pipeline:

sibsam_summarize.pl Final Output File

Page 31: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

31

Figure4.ResultsareshownforexploratoryBSAandIMsimulationswithvaryingnumbersofQTLsandnumbersofjointly-analyzedindependentcrosses.Asaproxyformethodperformance,themediancentiMorgandistancebetweenthetrueQTLandthestatisticmaximum(ofadforBSAorapforIM)isshown.ThenullexpectationforarandomlylocatedpeakwithinaQTL’sanalysiswindowisalsoshown(gray).Theseresultsindicate:(1)theincreasingchallengeofmorepolygenicscenariosforallapproaches,(2)ageneraladvantageofBSAoverIM,and(3)theutilityofcombiningdatafromindependentcrossesthatallshareagivenQTLincommon.

(26.8) (24.3) (14.6) (7.3)

Page 32: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

32

Figure5.TheresultsofexploratoryBSAandIMsimulationsareshowninwhichoneormoreexperimentalvariablesweremanipulated.(A)Increasingthetotalnumberof

generationsintheexperimentreducesthemediancentiMorgandistancebetweenthetrue

QTLsandtheobservedpeak.(B)Abroadoptimalrangeofselectionproportionexistsfor

thefocalBSAandIMscenarios.(C)Scalinguptheexperimentalpopulationsize(andhence

thenumberofphenotypedindividuals),alongwiththesequencingdepth,leadsto

improvedstatisticalperformance.

0

1

2

3

4

5

0 4 8 12 16 20

cMfrom

PeaktoQTL

Genera2ons

IMBSA

0.0

0.5

1.0

1.5

2.0

0 0.1 0.2 0.3 0.4 0.5

cMfrom

PeaktoQTL

Selec2onPropor2on

IMBSA

0

1

2

3

4

5

6

7

75 150 300 600 1200

cMfrom

PeaktoQTL

Pop.Size,#Phenotyped,&(Depthx2)

IMBSA

C

A

B

Page 33: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

33

Figure6.OutcomesofexploratoryBSAandIMsimulationswithvariablesequencingdepthareshown.Tomoreclearly

illustratetheinfluenceofdepthonsamplingvariance,depthisplottedintermsoftheaveragenumberofreadsforeach

individualinaphenotypicallyselectedpool.Fromagroupof600phenotypedindividuals,resultsforaseriesofselection

proportionsareillustrated.Resultsillustratetheadvantageofincreasedsequencingdepth,withsomediminishingreturns.

0.0

0.5

1.0

1.5

2.0

2.5

0.625 1.25 2.5 5 10

Distan

cefrom

peaktoQTL(cM)

Depthpersampledindividual

5%(n=30)10%(n=60)20%(n=120)35%(n=210)50%(n=300)

0.0

0.5

1.0

1.5

2.0

2.5

0.625 1.25 2.5 5 10Distan

cefrom

peaktoQTL(cM)

Depthpersampledindividual

5%(n=30)10%(n=60)20%(n=120)35%(n=210)50%(n=300)

(11.9) (3.5)BSAIM

Page 34: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

34

Figure7.ResultsofonelocustestsimulationsassessingtheperformanceoftheSIBSAMpiplineareshownforQTLsontheautosomes(reddiamond)andXchromosome(blueX).Asshowninthetoppanel,thescenarioinvestigatedhere(involvingapopulationof1,200individualswith600phenotypedafter16generationsand10%retainedineachphenotypicpool)hadintermediatepowerforaQTLexplaining15%ofphenotypicvarianceintheexperimentalpopulation,withlow/highpowerbelow/abovethatmark.Asillustratedinthebottompanelanddiscussedinthetext,someupwardbiasineffectsizeestimationwasobservedfortheweakestandstrongestQTLsexamined.

0

0.2

0.4

0.6

0.8

1

0 0.1 0.2 0.3 0.4 0.5

Powerto

Detect

QTLEffectSize

0

0.2

0.4

0.6

0.8

1

0 0.1 0.2 0.3 0.4 0.5

Es#m

ated

Effe

ctSize

QTLEffectSize

Page 35: Genetic Mapping by Bulk Segregant Analysis in Drosophila ...Jun 09, 2016  · 46 revealing the genetic architecture of adaptive phenotypic change, including the number of 47 causative

35

Figure8.ThedetectionpowerofSIBSAMintestsimulationswithtwolinkedQTLsis

illustrated.ThetoppanelsillustratethepowertodetectthesecondoftwolinkedQTLsof

equalmagnitude,conditionalondetectingthefirst.Thebottompanelillustratesthepower

todetecteithertheweakerorthestrongeroftwolinkedQTLsofunequalsizes.

0

0.2

0.4

0.6

0.8

1

Unlinked

25cM

10cM

5cM

2.5cM

Powerto

DetectS

econ

dQTL

Strength30%

Linked QTLs of Equal Strength

0

0.2

0.4

0.6

0.8

1Unlinked

25cM

10cM

5cM

2.5cM

Powerto

DetectS

econ

dQTL

Strength15%

0

0.2

0.4

0.6

0.8

1

Unlinked 25cM 10cM 5cM 2.5cM

Powerto

DetectE

achQTL

LinkedQTLsofUnequalStrength

Strength15%

Strength30%