Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
1
HybridizationCaptureUsingRADProbes1
(hyRAD),aNewToolforPerformingGenomic2
AnalysesonCollectionSpecimens 3
4
TomaszSuchan1‡*,CamillePitteloud1‡,NadezhdaS.Gerasimova2,3,AnnaKostikova3,Sarah5
Schmid1,NilsArrigo1,MilaPajkovic1,MichałRonikier4,NadirAlvarez1*6
7
1DepartmentofEcologyandEvolution,UniversityofLausanne,Lausanne,Switzerland,8
2BiologyFaculty,LomonosovMoscowStateUniversity,Moscow,Russia,9
3InsideDNALtd.,London,UnitedKingdom,10
4InstituteofBotany,PolishAcademyofSciences,Kraków,Poland11
‡Theseauthorsarejointco-firstauthorsonthiswork.12
*[email protected](TS);[email protected](N.Alvarez)13
14
15
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
2
Intherecentyears,manyprotocolsaimedatreproduciblysequencingreduced-genomesubsetsin16
non-modelorganismshavebeenpublished.Amongthem,RAD-sequencingisoneofthemost17
widelyused.ItreliesondigestingDNAwithspecificrestrictionenzymesandperformingsize18
selectionontheresultingfragments.Despiteitsacknowledgedutility,thismethodisoflimiteduse19
withdegradedDNAsamples,suchasthoseisolatedfrommuseumspecimens,asthesesamples20
arelesslikelytoharborfragmentslongenoughtocomprisetworestrictionsitesmakingpossible21
ligationoftheadaptersequences(inthecaseofdouble-digestRAD)orperformingsizeselection22
oftheresultingfragments(inthecaseofsingle-digestRAD).Here,weaddresstheselimitationsby23
presentinganovelmethodcalledhybridizationRAD(hyRAD).Inthisapproach,biotinylatedRAD24
fragments,coveringarandomfractionofthegenome,areusedasbaitsforcapturinghomologous25
fragmentsfromgenomicshotgunsequencinglibraries.Thissimpleandcost-effectiveapproach26
allowssequencingoforthologouslocievenfromhighlydegradedDNAsamples,openingnew27
avenuesofresearchinthefieldofmuseumgenomics.Notrelyingontherestrictionsitepresence,28
itimprovesamong-samplelocicoverage.Inatrialstudy,hyRADallowedustoobtainalargesetof29
orthologouslocifromfreshandmuseumsamplesfromanon-modelbutterflyspecies,withahigh30
proportionofsinglenucleotidepolymorphismspresentinalleightanalyzedspecimens,including31
58-year-oldmuseumsamples.Theutilityofthemethodwasfurthervalidatedusing49museum32
andfreshsamplesofaPalearcticgrasshopperspeciesforwhichthespatialgeneticstructurewas33
previouslyassessedusingmtDNAamplicons.Theapplicationofthemethodiseventually34
discussedinawidercontext.Asitdoesnotrelyontherestrictionsitepresence,itisthereforenot35
sensitivetoamong-samplelocipolymorphismsintherestrictionsitesthatusuallycausesloci36
dropout.ThisshouldenabletheapplicationofhyRADtoanalysesatbroaderevolutionaryscales.37
38
39
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
3
Introduction40
41Withtheadventofnext-generationsequencing,conductinggenomic-scalestudieson42
non-modelspecieshasbecomeareality[1].Thecostofgenomesequencinghassubstantially43
droppedoverthelastdecadeandrepositoriesnowencompassanincredibleamountofgenomic44
data,whichhasopenedavenuesfortheemergingfieldofecologicalgenomics.However,when45
workingatthepopulationlevel—atleastineukaryotes—sequencingwholegenomesstilllies46
beyondthecapacitiesofmostlaboratories,andanumberoftechniquestargetingasubsetofthe47
genomehavebeendeveloped[2,3].Amongthemostpopularareapproachesrelyingon48
hybridizationcaptureofexome[4]orconservedfragmentsofthegenome[5],RNAsequencing49
(RNAseq[6]),andRestriction-Associated-DNAsequencing(RADseq[7,8]).Thelatterhasbeen50
developedinmanydifferentversions,butgenerallyreliesonspecificenzymaticdigestionand51
furtherselectionofarangeofDNAfragmentsizes.RAD-sequencinghasprovedtobeacost-and52
time-effectivemethodofSNP(singlenucleotidepolymorphisms)discovery,andcurrently53
representsthebesttoolavailabletotacklequestionsinthefieldofmolecularecology.Thewide54
utilityofRAD-sequencinginecological,phylogeneticandphylogeographicstudiesishowever55
limitedbytwomainfactors:i)thequalityofthestartinggenomicDNA;ii)thedegreeof56
divergenceamongthestudiedspecimens,thattranslatesintoDNAsequencepolymorphismat57
therestrictionsitestargetedbytheRADprotocols.58
SequencepolymorphismattheDNArestrictionsitecausesaprogressivelossofshared59
restrictionsitesamongdivergingcladesandresultsinnullallelesforwhichsequencedata60
cannotbeobtained.Thislimitationcriticallyreducesthenumberoforthologouslocithatcanbe61
surveyedacrossthecompletesetofanalyzedspecimensandleadstobiasedgeneticdiversity62
estimates[9-11].Thisphenomenon,combinedwithothertechnicalissues–suchaspolymerase63
chainreaction(PCR)competitioneffects–isaseriouslimitationofmostclassicRAD-sequencing64
protocolsthatneedstobeaddressed.65
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
4
Inaddition,RAD-sequencingprotocolsrelyonrelativelyhighmolecularweightDNA66
(especiallyfortheddRADprotocol[12]),notablybecauseenzymedigestionandsizeselectionof67
resultingfragmentsarethekeystepsforretrievingsequencedataacrossorthologousloci.68
Therefore,RAD-sequencingcannotbeappliedtodegradedDNAsamples,alimitationalso69
sharedbyclassicalgenotypingmethodsandampliconsequencing.Museumcollections,70
althoughencompassingsamplescoveringlargespatialareasandbroadtemporalscales,have71
notnecessarilyensuredoptimalconditionsforDNApreservation.Asaresult,manymuseum72
specimensyieldhighlyfragmentedDNA–evenforrelativelyrecentlycollectedsamples[13-73
15],limitingtheiruseformolecularecology,conservationgenetics,phylogeographicand74
phylogeneticstudies[16,17].Acost-effectiveandwidelyappliedapproachforgenomic75
analysesonmuseumspecimenswouldallowexploringoftenuniquebiologicalcollections,e.g.,76
encompassingrareornowextincttaxa/lineagesororganismsoccurringephemerallyinnatural77
habitatsandposingproblemsforsampling.Itwouldalsoallowstudyingtemporalshiftsin78
geneticdiversityusinghistoricalcollections,nowappliedonlyinahandfulofcasesatagenomic79
scale[18].80
Hybridization-capturemethodshavebeenacknowledgedasapromisingwaytoaddress81
boththeallelerepresentationandDNAqualitylimitations[19,20].Suchapproacheshowever82
usuallyrelyonpriorgenome/transcriptomeknowledgeanduntilrecentlyhavebeenlargely83
confinedtomodelorganisms.Addressingthislimitation,therecentdevelopmentof84
UltraConservedElements(UCE[3,5])oranchoredhybridenrichment[21]capture-based85
methodsallowedtargetinghomologouslociatbroadphylogeneticscalesusingonesetof86
probes.Ithoweverrequiresatime-consumingdesignandcostlysynthesisoftheprobesfor87
capturingtheDNAsequencesofinterest.Similarly,exoncapturetechniques,recentlyappliedin88
thefieldofmuseumgenomics[18],requirefreshspecimensforRNAextractionorsynthesizing89
theprobesbasedontheknowntranscriptome.90
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
5
Here,wepresentanapproachwecalled‘hybridizationRAD’(hyRAD),inwhichDNA91
fragments,generatedusingdoubledigestionRADprotocol(ddRAD[8])appliedtofresh92
samples,areusedashybridization-captureprobestoenrichshotgunlibrariesinthefragments93
ofinterest.OurmethodthuscombinesthesimplicityandrelativelylowcostofdevelopingRAD-94
sequencinglibrarieswiththepowerandaccuracyofhybridization-capturemethods.This95
enablestheeffectiveuseoflowqualityDNAandlimitstheproblemscausedbysequence96
polymorphismsattherestrictionsite.Moreover,utilizingstandardddRADandshotgun97
sequencingprotocolsallowsapplicationofthehyRADprotocolinlaboratoriesalreadyutilizing98
theabovementionedmethods,forlittlecost.99
Inshort,thehyRADapproachconsistsofthefollowingsteps(Fig1):100
1)generationofaddRADlibrarybasedonhigh-qualityDNAsamples,narrowsize101
selectionoftheresultingfragmentsandremovingadaptersequences;102
2)biotinylationoftheresultingfragments,hereaftercalledtheprobes;103
3)constructionofashotgunsequencinglibraryfromDNAsamples(eitherfreshor104
degradedasinmuseumspecimens);105
4)hybridizationcaptureoftheresultingshotgunlibrariesontheprobes;106
5)sequencingofenrichedshotgunlibrariesandoptionallyoftheddRADlibrary107
(probesprecursor)forfurtheruseasareference;108
6)bioinformatictreatment(Fig2):assemblyofthereadsintocontigs,alignmenttothe109
sequencedddRADlibraryordenovoassembly,SNPcalling.110
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
6
111
Fig1.Lab-workprocedureusedforhyRADdevelopment.Homologousreadsfrom112
shotgungenomiclibrariesarecapturedthroughhybridizationonrandomRAD-basedprobes.113
Thesefragmentsarethenseparatedusingstreptavidin-coatedbeadsandsequenced.114
I) Hybridization
RAD-seq libraries preparation & size selection
Adaptors removal & biotin labelling
B) RAD-seq probes generation (from high quality DNA)
II) Capture on streptavidin beads
IV) Libraries enrichment
A) Shotgun libraries preparation (DNA to be captured)
DNA of interest
III) Wash
C) Hybridization- capture
Probes sequencing
Final sequencing libraries
Contaminantor undesired loci
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
7
115
Fig2.BioinformaticpipelineusedforprocessinghyRADsequences.First,thereadsare116
demultiplexedandcleaned.Differenttypesofreferenceswerebuiltandthecapturedfragments117
weremappedonthereference.TheSNPsarethencalledaftercorrectingforpost-mortemDNA118
damages.119
120
Inthispaper,wedescribethelaboratoryandbioinformaticpipelinesforobtaining121
hyRADdata,aswellasvalidatetheusefulnessofthemethodontwoempiricaldatasets.Wefirst122
testthemethodontheDNAobtainedfrommuseumandfreshspecimensofLycaenahelle123
butterflies.Weexploredifferentbioinformaticapproachesforassemblinglocioutofthe124
hybridization-capturelibraries,namely:i)mappingcapturedlibrariesonpreviouslysequenced125
RADlocifromfreshsamples;ii)usingRADlociasseedsandthecapturedlibraries’readsto126
extendtheRADlociinordertoobtainlongerlociformapping;iii)denovoassemblyofthe127
Data
Mapcapturesbow.e2
CallSNPsFreeBayes
Cleantrimmoma.c
Clusterprobestolocivsearch
AssemblecapturesfromfreshsampleSOAPdenovo
RescaleBAMwithpostmortaldamagesmapDamage
Demul>plexpairedreadsbybarcodesea-u.lsfastq-multxunzip Qualitycontrol
fastqc
VCFtoNEXUSPGDSpider
VCFtostructureVcf-consensus
reads
Referencecrea.on–differenttypes
reads
Dataprepara.on
SNPsearch
Output
FilterSNPsvcNools
ElongatelociPriceTI
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
8
capturedreadsfromasingle,well-preservedbutterflyspecimenforthereference.Secondly,asa128
proofofconcept,weapplytheprotocoltomuseumandfreshsamplesofOedaleusdecorus,a129
Palearcticgrasshopperspeciesforwhichamarkedeast-westspatialgeneticstructurehasbeen130
identifiedinapreviousstudy[22].131
MaterialsandMethods132
133
Studyspeciesandstudydesign134
Forthefirststepofthemethoddevelopment,weusedsamplesofthebutterflyLycaenahelle135
(Lepidoptera,Lycaenidae)(Table1).Threerecentlycollected,ethanol-preservedsamplesfrom136
Romania,FranceandKazakhstanwereusedforgeneratingtheRADprobes,inordertocover137
variationwithinthefullspeciesrange.Genomiclibrariestobeenrichedbysequencecapture138
werebuiltusingeightsampleswhichincludedsevenmuseumdry-pinnedspecimensfrom139
Finland(4collectedin1985and3collectedin1957)andonerecentlycollectedandethanol-140
preservedspecimenfromRomania.Usingtheseeightsampleswecomparedtheoutputs141
betweenfreshandhistoricalDNAofdifferentage,andtestedtheimportanceofDNAsonication142
ineachcase.ThemuseumsampleswereloanedfromtheFinnishMuseumofNaturalHistoryin143
Helsinki,andtheethanol-preservedsampleswereobtainedfromRogerVila’sButterfly144
DiversityandEvolutionlab(InstituteofEvolutionaryBiology,CSIC,Barcelona,Spain).145
146 147
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
9
Table1.Lycaenahellesamplesusedinthestudy.148
Typeofpreservation
Yearofcollection
DNAconcentration[ng/ul]
Locality
dry(pin-mounted) 1957 2.12 Kuusamo,Finland
dry(pin-mounted) 1957 3.02 Kuusamo,Finland
dry(pin-mounted) 1957 1.48 Kuusamo,Finland
dry(pin-mounted) 1985 29.2 Kuusamo,Finland
dry(pin-mounted) 1985 19.7 Kuusamo,Finland
dry(pin-mounted) 1985 17.5 Kuusamo,Finland
dry(pin-mounted) 1985 8.84 Kuusamo,Finland
ethanol 2007 2.12 DumbravaVadului,Romania 149
Forthemethodvalidation,weused53samplesofthegrasshopperOedaleusdecorus,150
including49samplesofbothfreshandmuseumcollectionspecimensforconstructinggenomic151
librariesforthecapture(seeTable2),andfourfreshspecimensspanningthespecies’152
distribution(Switzerland,Spain,Hungary,Russia)forgeneratingtheRADprobes.Themuseum153
sampleswereonaverage64-years-old,withtheoldestsampledatingbackto1908andwere154
providedbytheNationalHistoryMuseumofLondon(UK),theNaturalHistoryMuseumofBern155
(Switzerland),theZoologicalMuseumofLausanne(Switzerland),theETHEntomological156
Collection(Switzerland),theNaturalHistoryMuseumofBasel(Switzerland),theNatural157
HistoryMuseumofGeneva(Switzerland),andtheNaturalHistoryMuseumofZurich158
(Switzerland).ThefourfreshgrasshoppersampleswereprovidedbyG.Heckel(Universityof159
Bern).160
161
162
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
10
Table2.SummaryofOedaleusdecorussamplesusedinthestudy.163
Typeofpreservation
Meanyearofcollection(range)
MeanDNAconcentration[ng/ul]±SD(range)
Localities
ethanol 2006(2005-2009) 28.34±28.04(3.2-105.7) Croatia,France,Italy,Russia,Spain,Switzerland
dry(pin-mounted) 1952(1908-1997) 18.31±21.73(0.3-121.4) Algeria,France,Greece,Italy,Madeira,,Portugal(mainlandandMadeira),Spain(mainlandandCanaryislands),Switzerland,Turkey
164
DNAextraction165
DNAwasextractedfrominsectlegsforallthesamples.Asmuseumspecimensare166
usuallycharacterizedbylow-contentofdegradedDNA,theisolationprotocolwasoptimized167
accordingly.ThesampleswereextractedusingQIAampDNAMicrokit(Qiagen,Hombrechtikon,168
Switzerland)inalaboratorydedicatedtolow-DNAcontentsamplesattheUniversityof169
Lausanne,Switzerland.Forthesesamples,DNArecoverywasimprovedbyprolongedsample170
grinding,overnightincubationinthelysisbufferfor14handfinalDNAelutionin20μlofthe171
bufferwithgradualcolumncentrifugation.Extractionoffreshsampleswasperformedusing172
DNeasyBlood&TissueKit(Qiagen).DNAextractionandlibrarypreparationusingmuseum173
specimenswasperformedusingconsumablesdedicatedtothemuseumspecimensonly.174
Bencheswerethoroughlycleanedwithbleachandfiltertipswereusedatallstagesoflabwork.175
RADprobespreparation176
Theprobeprecursorswerepreparedusingadouble-digestionRADprotocol[8,23],177
withfurthermodifications.178
TotalgenomicDNAwasdigestedat37°Cfor3hoursina9μlreaction,containing6μlof179
DNA,1xCutSmartbuffer(NewEnglandBiololabs-NEB,Ipswich,MA,USA),1UMseI(NEB)and180
2UofSbfI-HF(NEB).ThereactionproductswerepurifiedusingAMPureXP(BeckmanCoulter,181
Brea,USA),witharatioof2:1withthesample,accordingtothemanufacturer’sinstructions,182
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
11
andresuspendedin10μlof10mMTrisbuffer.Subsequently,adapterswereligatedtothe183
purifiedrestriction-digestedDNAina20μlreactioncontaining10μloftheinsert,0.5μMof184
RAD-P1adapter,0.5μMofuniversalRAD-P2adapter,1xT4ligasebuffer,and400UofT4DNA185
ligase(NEB).AdaptersequencesareshowninTable3;singlestrandadapteroligonucleotides186
areannealedbeforeusebyheatingto95°Candgradualcooling.Ligationwasperformedat16°C187
for3hours.ThereactionproductswerepurifiedusinganAMPureXPratio1:1withthesample,188
andresuspendedin10mMTrisbuffer.Theligationproductwassize-selectedusingthePippin189
Prepelectrophoresisplatform(SageScience,Beverly,USA)withapeakat270bpand‘tight’size190
selectionrange.191
192Table3.Oligonucleotidesusedintheprotocol.x=barcodesequenceintheadapters;barcode193sequencescanbedesignedusingpublishedscripts[24],availableat:194https://bioinf.eva.mpg.de/multiplex/;I=inosineintheregioncomplementarytothebarcodein195blockingoligonucleotidessequences.196
RADprobesP1adapters,SbfI-compatible(RAD-P1)
RAD-P1.1 ACACTCTTTCCCTACACGACGCTCTTCCGATCTxxxxxxCCTGCA
RAD-P1.2 GGxxxxxxAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
RADprobesP2adapter,MseI-compatible(RAD-P2)
RAD-P2.1 GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
RAD-P2.2 TAAGATCGGAAGAGCGAGAACAA
ShotgunlibraryP1adapters
P1.1 ACACTCTTTCCCTACACGACGCTCTTCCGATCTxxxxxx
P1.2 xxxxxxAGATCGGAAGAGC
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
12
ShotgunlibraryP2oligonucleotide
P2 GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCCCCC
PCRprimers
ILLPCR1 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTT
ILLPCR2_01 CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_02 CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_03 CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_04 CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_05 CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_06 CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_07 CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_08 CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_09 CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_10 CAAGCAGAAGACGGCATACGAGATAAGCTAGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_11 CAAGCAGAAGACGGCATACGAGATGTAGCCGTGACTGGAGTTCAGACGTGTGC
ILLPCR2_12 CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTTCAGACGTGTGC
Blockingoligonucleotides
BO1.P5.F AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
BO2.P5.R AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
BO3.P7.F CAAGCAGAAGACGGCATACGAGATIIIIIIGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
BO4.P7.R AGATCGGAAGAGCACACGTCTGAACTCCAGTCACIIIIIIATCTCGTATGCCGTCTTCTGCTTG
197
TheresultingtemplatewasamplifiedbyPCRina10μlmixconsistingof1xQ5buffer,198
0.2mMofeachdNTP,0.6μMofeachprimer(Table3),and0.2UQ5hot-startpolymerase199
(NEB).Thethermocyclerprogramincludedinitialdenaturationfor30secat98°C;30PCR200
cyclesof20secat98°C,30secat60°C,and40secat72°C;followedbyafinalextensionfor10201
minat72°C.Inordertoobtainsufficientamountofmaterialfortheprobes,thereactionhadto202
beruninreplicates.Thenecessarynumberofreplicateshastobedeterminedempiricallyto203
reachintotal500-1000ngoftheamplifiedproductrequiredforeachcapture.DNAamounts204
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
13
wereassessedusingaQubitfluorometer(Waltham,MA,USA).Successofsizeselectionandof205
PCRreactionswasconfirmedbyrunningthemonaFragmentAnalyzer(seeS1Fig;Advanced206
Analytical,Ankeny,IA,USA;seeS1Fig).Afterwards,thePCRproductswerepooledandpurified207
usingAMPureXP,witharatioof1:1withtheamplifiedDNAvolume.208
Analiquotoftheresultinglibrarywassequencedandtherestofthelibrarywas209
convertedintoprobesbyremovingadaptersequencesbyenzymaticrestrictionfollowedby210
biotinylation.Theprobeprecursorswereincubatedat37°Cfor3hoursina50μlreaction211
containing30μlofDNA,1xCutSmartbuffer(NEB),5UofMseI(NEB)and10UofSbfI-HF212
(NEB),replicatedasrequiredbytheamountoftheamplifiedproduct.Thereactionwasended213
with20minenzymeinactivationat65°Cfor20minandtheresultingfragmentswerepurified214
usingAMPureXP:reactionvolumeratioof1.5:1.Purifiedfragmentswerebiotinnick-labelled215
usingBioNickDNALabelingSystem(ThermoFisher,Waltham,MA,USA)accordingtothe216
supplier’sinstructionsandpurifiedusingAMPureXP:reactionvolumeratioof1.5:1.The217
resultingfragmentswillthereafterbereferredtoasprobes.218
Shotgunlibrarypreparation219
Shotgunlibrarieswerepreparedfromthefreshandmuseumspecimensbasedona220
publishedprotocolfordegradedDNAsamples[15],modifiedinordertoincorporateadapter221
designofMeyer&Kircher[24].Theapproachusedforlibrarypreparation,utilizingbarcoded222
P1adapterand12indexedP2PCRprimers,allowsahighsamplemultiplexingonasingle223
sequencinglane(seeTable3[24]).224
ForL.helle,DNAfromeachindividualwasdividedintwoaliquots.Onealiquotwaskept225
intact(i.e.highmolecularweightDNAinthefreshsampleandnaturallydegradedDNAin226
museumspecimens)andthesecondwassonicatedusingCovarisfocusedultrasonicator227
(Woburn,MA,USA)withapeakat300bp.Bothaliquotswereprocessedinparallelduring228
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
14
subsequentstepsoflibrariespreparation.AllthelibrariesfromO.decoruswereprepared229
withoutsonication,basedonthetestresultsobtainedfromtheL.hellelibraries.230
DNAsampleswerefirst5’-phosphorylatedinordertoallowadapterligationinthenext231
stepsoftheprotocol.8μlofDNAwasdenaturedat95°Cfor10minutesandquicklychilledon232
ice.The10μlreactionconsistingofdenaturedDNA,1xPNKbufferand10UofT4polynucleotide233
kinase(NEB)wasincubatedat37°Cfor30minandheat-inactivatedat65°Cfor20min.The234
DNAwasthenpurifiedusinganAMPure:reactionvolumeratioof2:1andresuspendedin10μl235
of10mMTrisbuffer.236
Aguanidinetailingreactionofthe3’-terminuswasperformedafterheatdenaturationof237
DNAat95°Cfor10minutesandquicklychillingonice.Thereactioncomposedof1xbuffer4238
(NEB),0.25mMcobaltchloride(NEB),4mMGTP(LifeTechnologies),10UTdT(NEB)and10µl239
ofdenaturedDNAin20µlreactionvolumewasincubatedat37°Cfor30minandheat-240
inactivatedat70°Cfor10min.241
ThesecondDNAstrandwassynthesizedusingKlenowFragment(3’→5’exo-),witha242
primerconsistingoftheIlluminaP2sequenceandapoly-Csequencehomologoustothepoly-G243
tail(seeTable3)addedtotheDNAstrandinthepreviousreaction.A10μlreactionmix244
consistingof1μlofNEBuffer4(10x),0.6μlofdNTPmix(25mMeach),1μloftheP2245
oligonucleotide(15mM),5.4μlofwater,and2μlofKlenowFragment(3’→5’exo-;NEB,5246
U/μl)wasaddedtothe20μloftheTdTreactionmix,incubatedat23°Cfor3h,andheat-247
inactivatedat75°Cfor20min.Thedoublestrandedproductwasthenblunt-endedbyaddinga248
mixconsistingof0.5μlofNEBuffer4(10x),0.35μlofBSA(10mg/ml),0.2μlofT4DNA249
polymerase(NEB,3U/μl)and3.95μlofwater,andincubatedat12°Cfor15min.Theresulting250
productwaspurifiedusingAMPureXP:reactionratioof2:1andresuspendedin10μlof10mM251
Trisbuffer.252
BarcodedP1adapters(seeTable3)wereligatedtothe5’-phosphorylatedendofthe253
double-strandedproductina20μlreactionconsistingof10μlofthedouble-strandedDNA,1μl254
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
15
ofthe25uMadapters,1xT4DNAligasebuffer,and400UofT4DNAligase(NEB).Adapters255
havetobeannealedbeforeuseintheRADprobesprotocol.Thereactionwasincubatedat16°C256
overnight.TheresultingproductwaspurifiedusinganAMPure:reactionratioof1:1and257
resuspendedin20μlof10mMTrisbuffer.LigatedP1adapterswerefilled-inina40μlreaction258
consistingof20μlofpurifiedligationproduct,1xThermoPolreactionbuffer(NEB),12UofBst259
polymerase(NEB),anddNTPs(0.25mMeach),andincubatedat37°Cfor20min.260
TheresultingtemplatewasamplifiedbyPCRadding15μlofamixconsistingof5μlof261
Q5reactionbuffer(5x),0.2μlofdNTPs(25mMeach),2.5μlofthePCRprimermix(5μMeach),262
and0.5UofQ5HotStartHigh-FidelityDNApolymerase(NEB)tothe10μlofthetemplate.The263
programstartedwith20secat98°C,followedby25cyclesof10secat98°C,20secat60°C,and264
25secat72°C,followedbyafinalextensionfor2minat72°C.SuccessofeachPCRreactionwas265
checkedusinggelelectrophoresis,andtheresultingproductswerepurifiedusingAMPure266
XP:reactionratioof0.7:1.Sampleswerethenpooledinequimolarratios.267
Insolutionhybridizationcapture,libraryreamplificationandsequencing268
Thehybridizationcaptureandlibraryenrichmentstepsdescribedbelowarebasedon269
previouslypublishedprotocols[13,25]withsomemodifications.Thehybridizationmix270
consistedof6xSSC,50mMEDTA,1%SDS,2xDenhardt’ssolution,2μMofeachblocking271
oligonucleotide(topreventhybridizationofadaptersequences;seeTable3),500to1000ngof272
theprobesand500to1000ngoftheshotgunlibraries,inatotalvolumeof40μl.Onaccountof273
grasshopperlargergenomesizeandpreliminaryresultsindicatinglowsignal-noiseratiointhe274
butterflylibraries,500ngofhumanCot-1DNA(ThermoFisherScientific,Switzerland)was275
addedtotheO.decorushybridizationmixinordertopreventnon-specifichybridizationscaused276
byrepetitivesequences.Themixwasdenaturedat95°Cfor10minandsubsequentlyincubated277
at65°Cfor48hours.Theprobes,hybridizedwithtargetedfragmentsofthelibrary,werethen278
separatedonstreptavidinbeads(DynabeadsM-280,LifeTechnologies).10μlofthebeads279
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
16
solutionwaswashedthreetimesonthemagnetwith200μlofTENbuffer(10mMTris-HCl7.5,280
1mMEDTA,1MNaCl)andresuspendedin200μlofTEN.40μlofthehybridizationmixwas281
addedtothe200μlofthebeadssolutionandincubatedfor30minatroomtemperature.After282
separatingthebeadswiththemagnet,thesupernatantwasremovedandthebeadswere283
washedfourtimesunderdifferentstringencyconditionsasfollows.Thebeadswere284
resuspendedin200μlof65°C1xSSC/0.1%SDSwashbuffer,incubatedfor15minat65°C,285
separatedonthemagnetandthesupernatantwasremoved.Theabovestepwasperformed286
againwith1xSSC/0.1%SDS,followedby0.5xSSC/0.1%SDSand0.1xSSC/0.1%SDS.Finally,287
thehybridization-enrichedproductwaswashed-offfromtheprobesbyadding30μlof80°C288
waterandincubatingat80°Cfor10min.289
Enrichmentofthecapturedlibrarieswasperformedina50μlPCRreactioncontaining290
1xQ5reactionbuffer(NEB),0.2mMdNTPs,0.5μMofeachPCRprimer(theP1universalprimer291
andoneofthe12P2indexedprimers,seeTable3),1UofQ5HotStartHigh-FidelityDNA292
Polymerase(NEB),and15μlofthetemplate.Theprogramstartedwith20secinitial293
denaturationat98°C;followedby25PCRcyclesof10secat98°C,20secat60°C,and25secat294
72°C;andafinalextensionfor2minat72°C.Theenriched-capturedlibrarieswerepurified295
usinganAMPureXP:reactionratio1:1andpooledinequimolarratiosforsequencing(seeS2Fig296
foraprofileexampleofthere-amplifiedcapturelibraryafterAMPurepurification).297
Theprobesprecursors(RADlibrary)forthebutterflylibrariesweresequencedonone298
laneofIlluminaMiSeq300bpsingle-end.Butterflycapture-enrichedlibrariesweresequenced299
ononelaneofMiSeq150bppaired-end,andgrasshoppercapture-enrichedlibrarieswere300
sequencedononelaneofIlluminaHiSeq100bppaired-end.301
Updatedversionsofthelabprotocolcanbefoundat302
https://github.com/chiasto/hyRAD.303
304
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
17
Dataanalysis305
ThehyRADdatasetscorrespondtotarget-enrichedlibrariesandcannotbeanalysedwiththe306
usualRADpipelines[26,27]Indeed,althoughtheyweregeneratedusingRADloci,theobtained307
sequencesarenotflankedbytherestrictionsitesandinsteadmaynotoverlapcompletely308
and/orextendbeforeandaftertheRADlocus.Asaresult,theanalysispipelinemustincludethe309
followingsteps:310
1)demultiplexingandcleaningofrawreads;311
2)buildingofreferencesequencesforeachRADlocus;312
3)alignmentofreadsagainsttheobtainedreferences;313
4)SNPcalling.314
AllbioinformaticstepsofthehyRADpipelinecanberunathttps://insidedna.me.315
Demultiplexinganddatapreparation316
Theobtainedreadsweredemultiplexedusingthefastx_barcode_splittertoolfromthe317
FASTX-Toolkitpackage[28].RAD-seqsequences(probeprecursors)wereprocessedbyTrim318
Galore![29]andcleanedwiththefastq-mcftoolfromea-utilspackage[30]toremovelow319
qualitynucleotidesandadaptersequences.ThePCRduplicateswereremovedfromRAD-seq320
probesprecursorsandhyRADdatasetsusingtheMarkDuplicatestoolofPicardtoolkit[31].321
ReadsfromhyRADlibrariesweretestedforexogeneousDNAcontaminationusingBLAST322
againstNCBInucleotidedatabase(50,000readsforsonicatedornon-sonicatedfreshor323
museumDNAsamples).324
ExploringthemethodsofreferencecreationonLycaenahellelibraries325
Paired-endreadsobtainedfromthehybridization-capturelibraryforeachsamplewere326
mappedontothreereferences:(1)consensussequencesfortheclusteredRAD-seqreads(RAD-327
ref),(2)RAD-seqreadsextendedusinghybridization-capturedreads(RAD-ref-ext),and(3)328
contigsassembledfromthereadsofhybridization-capturedsamples(assembly-ref).Aswehad329
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
18
noreferencegenomeavailable,wefocusedoncheckingthenumbersofloci/SNPsobtainedby330
mappingthereadsoneachreferenceandtheoverlapofthelociobtainedusingdifferent331
methods.Theoretically,weshouldbeabletoretrieveallthelocifromthesequencedprobe332
precursors(RAD-seqlibrary)inthecapturedlibraries.Wethusevaluatethenumberof333
capturedlocihomologoustothoseretrievedbysequencingtheRAD-seqlibraries.However,334
differentfactorsaffectsignaltonoiseratio(i.e.theproportionoffragmentshomologouswith335
theprobe)inthecapturedlibrariesandcandecreasethenumbersofhomologousfragments336
retrieved.337
VsearchRADlociclustering(RAD-ref).HighqualityreadsofRADprobeswere338
clusteredbysimilarityusingVsearch[32]toobtainlociforfurthermappingthereadsfromthe339
hybridization-capturelibraries.BeforeVsearchrun,weconvertedcleanedfastqfilesintofasta340
formatusingthefasta_to_fastqtoolfromthetheFASTX-Toolkitpackage[28].Toobtainthemost341
reliablecontigsacrosssamples,Vsearchwasrunintwoiterations.Duringthefirstiteration,we342
obtainedconsensusclustersatthewithin-individuallevel(i.e.clusteringoftherawreadsfor343
eachsampleindependently).Duringtheseconditeration,Vsearchwasrunontheconsensus344
clustersobtainedfromthefirstiteration.Theseconditerationallowedustoobtainconsensus345
clustersattheamong-individuallevel.ForbothiterationsweranVsearchwithvariousidentity346
thresholds(0.51,0.61,0.71,0.81,0.83,0.91,0.93,0.96,0.98forthewithin-individualleveland347
0.51,0.61,0.71,0.81,0.85,0.88fortheamong-individuallevel)inordertoidentifyanoptimal348
identitythresholdforclustering,i.e.athresholdthatmaximizesthenumberofclusterswitha349
minimalcoverageof2xand3x,respectively.Inallcases,weusedthecluster_fastoptionfor350
clustering.Theconsensussequencesofeachsecondaryclusterwerethenusedaslocus351
referencesinsubsequentalignmentandSNPcallingsteps.352
VsearchRADlociclusteringandextensionusingcapturedreads(RAD-ref-ext).To353
obtainRAD-ref-ext,weiterativelyextendedcontigsoftheRAD-refusingreadsfromthe354
hybridization-capturelibrary(bypoolingreadscontributedbyalltheanalysedspecimens)355
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
19
usingPriceTI[33]with30cyclesofextensionandaminimumoverlapofasequencematchto356
30.Theobtainedreferencesweretrimmedby60bpateachendinordertoremovesequences357
withputativelylow-qualityends.WeappliedthistoughthresholdforRAD-ref-extonly,as358
probesextensioncanbeperformedonverylow-coveragedata,andwethereforewantedto359
keeptheerrorrate(usuallyhigheronbothsequenceends)attheminimum.360
De-novoassemblyfromcapturedreadsonly(assembly-ref).Assemblywas361
performedonthehybridization-capturedreadsofonegoodqualityethanol-preserved,362
sonicated,sample.Onlysequencesobtainedfromthesinglefreshsamplewereused,as363
stringentcleaningparametersinTrimmomatic[34],usedforthereferenceconstruction,ledtoa364
largeloss(upto80%)ofthereadsfromhistoricalsamples(andsuchdatawasthereforeless365
optimalthanthatfromthefreshspecimenforproducingareference).Moreover,usingone366
individualallowsobtainingamorereliablereference,asanyamong-sampledivergencecan367
resultinbubblesinthecontigsandbiastheassemblybyoversplittingalleles.Asaresultwe368
couldhaveobtainedchimericduplicatedlocipresentedindifferentcontigs.Weused369
SOAPdenovoV2.04toassemblecleanedreadsintocontigs[35].370
MappingandSNPcalling371
ReadsfromthehyRADlibrarywerecleanedbyTrimmomaticwithmilderparameters372
thanreadsusedinthedenovoassembly(keeping60-85%ofthereadsinhistoricalspecimens).373
Readmappingwasperformedusingbowtie2-buildforthereferenceindexingandbowtie2for374
mapping[36],PCRduplicateswereremovedusingtheMarkDuplicatestoolfromthePicard375
toolkit[31]andSNPswerecalledwithFreeBayesusingthedefaultparameters[37,38].To376
evaluatethelevelofDNAdamageinmuseumDNAsamples,weusedmapDamage2.0that377
rescalesbasequalityscoresofputativelypost-mortemdamagedbases[39]inordertominimize378
presenceofpost-mortemconversionsintheresultingSNPs.Datasetsforreplicateswere379
mergedandanalysedforSNPswithFreeBayes.ObtainedVCFfileswerefilteredbythevcffilter380
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
20
toolfromthevcflib[40]andVCFtools[41].Intheresultingsetonlybiallelicsiteswithhigh381
quality(PHRED>30),minorallelecountlargerthan1/6ofall,presentinatleast50%ofthe382
samplesandwithaminimumdepthof6werekept,indelswereremoved.Potentialparalogsand383
multi-copysiteswereremovedbasedonacoverageofastandard-deviationthreetimeshigher384
thanthemean[42].VCFformatfiles[43]wereconvertedtoSNP-basedNEXUSfilesusing385
PGDSpiderconverter[44]andtostructuredatafilesforeveryindividualusingthevcf-386
consensustool(seealsoFig2).Updatedversionsofthebioinformaticpipelinecanbefoundat387
https://github.com/chiasto/hyRAD.388
Geneticstructure389
Inordertocheckwhetherdatawasreflectinggeneticstructure,weapplied390
fastStructure,aStructure-likealgorithmadaptedtolargeSNPgenotypedata[45]tothesixfinal391
datasets(RAD-ref,RAD-ref-extandassembly-ref,bothforsampleswithandwithout392
sonication).Theanalyseswereconductedusingasimplepriorandassumingtwogroups(k=2).393
Overlapbetweenassemblyreferences394
ToevaluatethelevelofoverlapamongthethreeassemblyreferencesforL.helle(RAD-395
ref,RAD-ref-ext,andassembly-ref)weusedtheOrthoMCL[46]pipelinefororthologydetection.396
Mostofthepipelinewasrunwiththedefaultparameters,exceptforBlastallandMCLclustering397
steps.Here,weusedmorestringentparametervalues(e-valueof0.0001andMCLwasrunwith398
aninflationparameterof2.0)inordertoreducechancesofdetectingfalseorthologygroups.As399
aresult,weobtainedclustersofcontigsbeingcontributedbythethreeassemblyreferences.We400
thencountedhowmanyoftheseclusters–presumablycorrespondingtohomologousloci–401
weresharedamongtheavailablereferenceassemblyapproaches.Eventually,torevealthe402
numberofRADlocipresentinthereferences,readsoftherawRADlibraryweremappedon403
RAD-refandassembly-refusingbowtie2[36]andlevelsofmappingwerecompared.404
Proofofconcept:applicationofhyRADtoOedaleusdecorus405
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
21
Theutilityofthemethodwasfurthervalidatedusing49museumandfreshsamplesofa406
Palearcticgrasshopperspeciesforwhichamarkedeast-westspatialgeneticstructurewas407
identifiedpreviously[22].Thecatalogwasbuiltbasedoneightspecimensfromthecaptured408
librarythatshowedthelargestnumberofreadsandspannedthespecies’distributionarea409
(Switzerland,Italy,Spain,Russia),usingthemethodthatyieldedthehighestnumberofcontigs410
andproducedconsistentgeneticstructureinL.helle(assembly-ref,i.e.,denovoreferencebuilt411
usingSOAPdenovo,seeResultsandDiscussionsectionandTable4).Thegeneratedcontigswere412
blastedagainstGenBankdatabasesforbacterial,fungalandtechnicalsequenceswitha413
minimumE-valuethresholdof0.1.Endogeneouscontigsofeachsampleswereassembledto414
generatethefinalreferenceusingGeneiousV9.0.2[47].Filtersweresubsequentlyappliedto415
keeponlyhighqualityandinformativeSNPsasfortheL.hellelibraries. ThefinalVCFmatrix416
wasconvertedintoStructureformatusingPGDSpiderV2.0.9.0[44]andpopulationstructure417
wasinferredusingfastStructure[45]usingasimplepriorandassumingtwogroups(k=2).418
QuantumGISV2.4.0wasusedtopresentthegeographicdistributionofthegeneticclusters[48].419
420
Table4.DataonobtainedreferencesforL.helle(RAD-ref,RAD-ref-ext,assembly-ref)andO.421decorus(assembly-ref).422
Reference Numberofcontigs Largestcontig(bp)
Totallength(bp) N50
RAD-ref(L.helle) 25478 544 5445942 209
RAD-ref-ext(L.helle) 24820 851 2613024 98
assembly-ref(L.helle) 304161 2352 35579979 666
assembly-ref(O.decorus) 408851 13103 119789911 321 423
ResultsandDiscussion424
425
Sequencinganddataquality426
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
22
Lycaenahellelibraries.RAD-seqlibrariessequencingyielded14,188,023and427
hybridization-capturelibraries16,636,502rawreads:8,217,522forsonicatedand8,418,980428
fornon-sonicatedsamples.Additionalsequencingofthereferencelibraryfromtheethanol-429
preservedspecimen(usedfortheRAD-refcreation)yielded4,703,744reads.Theproportionof430
readskeptafterqualityfilteringvariedwithsampleageandpreparationmethod.Forthe431
ethanol-preservedsample,89.8%ofreadsfromsonicatedand89.4%fromnon-sonicated432
samplewereretained.Forthe30yearsoldsamplesthemeanwas74.3%and79.6%,andfor58433
yearsoldsamples70.8%and73.8%forsonicatedandnon-sonicatedsamples,respectively.434
Oedaleusdecoruslibraries.Hybridization-capturelibrariesyieldedatotalof435
69,306,042rawreads.Afterqualityfiltering,80.3%ofreadswerekeptamongallthesamples.436
ComparisonofreferencesobtainedforL.hellelibraries437
Proportionofsingle-hitalignmentsandSNPnumbers.Consensusclusteringofthe438
RAD-basedreferencewithinindividualsproducedthelargestnumberofclusterswith2xand3x439
coveragewithaclusteringidentitythresholdof0.91,comparedtootherthresholdvalues.440
Consensusclusteringamongindividualsproducedthebestresultswithaclusteringidentity441
thresholdof0.71(seeS3Fig).442
Thehighestnumber,lengthandthetotallengthofreferencecontigswereobtained443
usingdenovoassemblywithSOAPdenovo(assembly-ref;Table4).BothRAD-basedassemblies444
producedanorderofmagnitudelowernumberofcontigs.Theextensionperformedonthe445
obtainedRADreferencefollowedbytrimmingofadaptersequencesresultedinreferenceswith446
anaverageshorterlength(lowerN50)thanthestartingcontigs—whereaspriceTIextendeda447
largenumberofprobes,thisdidnotreflectinasubstantiallyhigheraveragelocilengthbecause448
offurthertrimmingofobtainedcontigs.449
Thehighestlevelsofsingle-hitalignmentsformostofthesamples,excepttheoldest450
ones,forbothpreparationmethods(sonicatedandnotsonicated)wereobtainedwhenmapped451
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
23
onthesequencedRADlociextendedusingPriceTI(RAD-ref-ext).Thismethodwasfollowedby452
denovoassemblyusingreadsfromthehybridization-capturelibraryfromasinglefresh453
specimen(assembly-ref)andmappingontheRADloci(RAD-ref);althoughthedifference454
betweenthelasttwoapproacheswasnotlarge(Fig3).Onlyfortheoldestsamplesaswellasin455
thenon-sonicatedfreshsample,denovoreferenceprovidedslightlybetterresults.456
457
Fig3.Percentageofthecapturedreadsshowinguniquemappingeventsfordifferent458
typesofDNApreparationsandbioinformaticpipelines.459
460
IntermsofthenumberofSNPsretainedaftercoverage,paralogsandamong-samples461
overlapfiltering,theRAD-refpipelinedetectedthehighestnumbersofloci,regardlessofsample462
ageandpreparation(Fig4).Noclearcorrelationwithsampleagecouldbeobserved,although463
allthemethodsprovidedthelowestSNPnumbersinthefreshsamples–mostlikelyaneffectof464
0.0
2.0
4.0
6.0
8.0
10.0
12.0
fresh museum,30y.o.
museum,58y.o.
fresh museum,30y.o.
museum,58y.o.
Percen
tageofsingle-mappingevents
Sonicatedlibraries Non-sonicatedlibraries
TypeofthesampleandlibrarypreparaEonprotocol
RAD-ref
RAD-ref-ext
assembly-ref
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
24
smallgeneticdistancebetweenthereferencesampleandthefreshspecimens(asthefresh465
samplewasusedbothforthereferenceandthealignedsample,onlyheterozygotesitesaccount466
forSNPshere).AhighernumberofSNPswasdetectedinthesonicatedlibraryonlyforthefresh467
specimens.468
469
Fig4.MeannumberofSNPspersampleobtainedfordifferenttypesofDNA470
preparationsandbioinformaticpipelines.471
472
RAD-sequencingdatasetsdependonthepresenceoftherestrictionsitesandtherefore473
anypolymorphisminsuchsitesleadstoeithermissinglocioralleles.Asourmethoddoesnot474
dependontherestrictionsitepresence,combinedwiththehighnumberofgatheredSNPs,this475
allowsobtaininglargelyfilleddatamatrices.Matrixfullnesswas>50%inallcases:476
- assembly-ref,non-sonicated:66.1%477
0
500
1000
1500
2000
2500
3000
fresh museum,30y.o.
museum,58y.o.
fresh museum,30y.o.
museum,58y.o.
Num
bero
fSNPs
Sonicatedlibraries Non-sonicatedlibraries
TypeofthesampleandlibrarypreparaAonprotocol
RAD-ref
RAD-ref-ext
assembly-ref
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
25
- assembly-ref,sonicated:63.4%478
- RAD-ref,non-sonicated:52.0%479
- RAD-ref,sonicated:69.1%480
- RAD-ref-ext,non-sonicated:63.4%481
- RAD-ref-ext,sonicated:72.5%482
Lackinganobjectivecriterionforassessingthe‘best’performingmethodforbuilding483
thecatalog,wetestedwhichofthethreereferencescreatedthedatasetproducingtheexpected484
spatialdivisionofgeneticstructurebetweenFinnishandRomanianL.hellesamples.Thespatial485
geneticstructureinferredbyfastStructurerevealedthattheeightsamplesaredividedintotwo486
clustersof,respectively,sevenFinnishvs.oneRomaniansampleonlywhenusingthenon-487
sonicatedlibrarymappedtotheassembly-refcatalog.Thisresultisinagreementwiththe488
hypothesisthatwhenusingsonicatedDNA,weareathighriskofincorporatingcontaminant489
DNAwhichcanblurthesignal(MatthiasMeyer,MaxPlanckInstituteforEvolutionary490
Anthropology,Liepzig,DE;personalcommunication).ThisisaninterestingresultasBLAST491
analysesonthesixcatalogsdidnotretrievedifferencesinthelevelofknowncontaminants(see492
below).493
Locioverlapamongtheassemblymethods.About15.2%ofthereferenceloci494
obtainedviadenovoassemblyweresharedwiththosebasedontheclusteringofRAD-seqreads495
(eitherRAD-reforRAD-ref-ext;Fig5).Incontrast,anappreciablefractionoftheobtained496
referenceloci(59.7%)wereuniquetotheassembly-refapproach.WhenmappingrawRAD497
readsonRAD-ref,26.3%didnotmap,42.4%mappedonceand31.3%mappedmorethanonce.498
ThisshowsthatRAD-refsummarizesca.¾ofthereadsfromtheRADdataset.Incontrast,when499
mappingrawRADreadsonassembly-ref,78,3%ofreadsdidnotmap,21,6%mappedonceand500
0,1%mappedmorethanonce.Thisresultshowsthatassembly-refpossiblycontainsthree501
quartersofalllocithatarenothomologoustotheRADprobes.Suchlowsignaltonoiseratio502
(targetedreadstothenumberoftotalreads)ismostlikelyaresultofbackgroundcarryoverin503
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
26
thehybridizationcapturestep,aphenomenonwhichcanhavemanysources.,Itcouldresult504
from‘daisy-chaining’ofthecapturedfragments[49,50],wherepartiallycomplementaryDNA505
moleculeshybridizewiththeotherfragmentsthatarealreadyhybridizedtotheprobes.Wecan506
howeverdiscardthisexplanationasaprimaryreasonforthebackgroundcarryoveras507
extendingofRADprobesdidnotproducelongercontigs(inRAD-ref-extassemblies).Another508
likelyreasoncouldbecarryoverofrandomDNAfragmentswithrepetitivesequences.The509
extentofsuchprocesscanbesignificantlyreducedbyaddingblockingagentstothe510
hybridizationmix(typicallyCot-1aswasperformedfortheO.decorusdataset[51]orsalmon511
spermDNA),aswellasoptimizinghybridizationtemperatureandwashstringencytoincrease512
captureefficiency.Suchdevelopmentsaredesirablebecausetheyincreasethepercentageof513
readsmatchingthelociofinterestandeventuallyimprovetheoverallsequencingcoverage.514
515
516
Fig5.Numberoflociobtainedusingdifferentbioinformaticapproaches,identifiedusing517
theOrthoMCL[42]pipelinefororthologydetection.518
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
27
519
EffectsofsamplepreparationandageonthenumbersofSNPsobtainedandthe520exogeneousDNAcontent521
Readscanbemappedonareferenceeitheroncewithahighestscore(i.e.,single522
mapping)oronmorethanoneregionofreferencewithclosescores(i.e.,multimapping).The523
reasonsformulti-mappingeventscanbebiological(e.g.,paralogsequences)ortechnical524
(splittingsinglelociintomorereferenceloci),neverthelessthesemappingeventscannotbe525
usedforSNPcallingandofferanotherbenchmarkfortheassemblymethodsused.Differencesin526
thenumberofsinglemappingeventsandinthenumbersofSNPsobtainedwerenotsubstantial527
betweensonicatedandnon-sonicatedsamples,anddependedonthesampleageandthe528
bioinformaticpipelineused(Figs3and4).Weexpectedthatmuseumspecimensshould529
performbetterwithoutsonication,astheDNAwasalreadyvisiblyfragmented,andsonicationof530
museumspecimensmayincreasethelevelsofexogenousDNAcontamination(byfragmenting531
intactfungalorbacterialDNAcontaminatingmuseumsamples;M.Meyer,pers.comm.).In532
termsofmappingevents,whereasthefreshsampleusuallyshowedabetterratioofsingle-to533
multi-mappingeventswhensonicated,therewasatrendtowardsahigherpercentageofunique534
mappingeventsfornon-sonicated30-yearsoldmuseumspecimens,onaverage1.3times535
higher,irrespectiveofthereferenceused(thedifferencewaslessclearfor58-yearsoldmuseum536
specimens,anddependedonthereferenceused).Wewouldthereforeadvisenottosonicate537
DNAobtainedfromthemuseumspecimens,whichsignificantlycutsdownthepriceandtime538
requiredforlibrarypreparation,exceptincaseswhennosignsofdegradationareobservable539
ontheDNAprofile.AslevelsofDNAdegradationofcontemporarysamplesmayvary,onemay540
considerthatthesonicationstepshouldbeadvisablewhenworkingwithwell-preservedDNA.541
However,thisisstillanopenquestion,aswhereasBLASTsearchesdidnotretrievehigher542
fractionsofcontaminantsinsonicatedvs.non-sonicatedlibraries,theexpectedpopulation543
structurewasretrievedwasthenon-sonicatedonemappedonassembly-ref.544
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
28
Asoneofthemaintypesofpost-mortemDNAdegradationisdeaminationofcytosines,545
highlydamagedancientormuseumDNAsamplesareusuallycharacterizedbyhigheruracil546
content[52-54].Inclassicallibrarypreparationprotocols,theusageofaproofreading547
polymerasesshouldstallthechainelongationinthepresenceofuracilandthusreducethe548
misincorporationerrorsinthefinaldataset.Ontheotherhandthisapproachmightnotbe549
optimalforhighlydegradedancientDNAsamples,wherealargefractionofDNAfragmentsmay550
carrycytosinetouracilmisincorporations[53].Moreovertheusageofaproofreading551
polymerasedoesnotpreventthemisincorporationscausedbydirectdeaminationof552
methylatedcytosinetothymine,orlesscommondeaminationofguanidinetoadenine[52,54].553
Intheprotocolusedabove[15],thesecondstrandsynthesiswasperformedusingKlenow554
Fragment(3’→5’exo-),lackingproofreadingability,andthusapproximatelyhalfofthe555
resultingDNAfragmentsshouldhavecytosinetouracilmisincorporationsubstitutedfor556
thymine,amplifiablebyproofreadingDNApolymerase.Wethusoptedforabioinformaticpost-557
processingwayoffiltering-outsuchbases.Post-mortemdamageinthesequencedsampleswas558
assessedbymapDamage2.0,whichrescalessequencefilesbydownscalingqualityscoresof559
likelypost-mortemdamagedbases.AssomeSNPsbecamefilteredbylowerqualityscoresafter560
therescaling,thenumberofSNPsisdecreasedaftermapDamage2.0.Weexpectedhigher561
numberofdiscardedSNPsintheoldestsamples,becauseofahigherproportionofDNAdamage562
occurringwithtime.TheproportionofSNPsdiscardedafterapplyingmapDamage2.0wasthe563
highestamongthe58yearsoldsamples(1.46%forRAD-ref;1.89%forRAD-ref-ext;2.36%for564
assembly-ref),althoughrelativelylow,giventhesamples’ageandpreservationtype(seeS1565
Table).566
Itisworthmentioning,thatthehighernumberofSNPsdetectedinlibrariesfrom567
museumspecimens,comparingtothefreshsamples,isnotaneffectofpost-mortemdamage568
(anoppositetrendwasdetectedwiththehigestproportionoftypeIItransitionstotransitions569
[55]andtransversionspresentinthefreshspecimens;S4Fig)570
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
29
Application:spatialgeneticstructureofOedaleusdecorus571
Thedenovoreferencecatalogwascomposedof408,851referencecontigs.TheN50572
lengthwas321bpandthetotallengthwas119,789,911bp.Amongthetotalnumberofcontigs,573
9%wereshowntobeofexogeneousoriginbytheBLASTsearch,eitheragainstfungiand574
bacteriaGenBankdatabasesoragainsttechnicalsequences.Suchalevelofcontaminantsis575
expectedhere,asincontrasttotheL.hellereferences,whichwerebuiltfromfreshsamples,the576
O.decorusassembly-refwasbasedoneightspecimensfromthecapturedlibrary—eitherfresh577
orpin-mounted—thatshowedthelargestnumberofreads.Atotalof4,783,774informative578
siteswereretrievedafterSNPcalling.Afterremovingindelsandlowqualitysites,125,890sites579
wereconserved.Keepingonlybialleliclociwithaminorallelecountofatleast6,withdata580
fullnesshigherorequalto50%ofthesamples,weobtained6,046loci.Finally,weconserved581
2,979SNPsaftertheremovalofpotentialparalogoussites.ThemediandepthforeachSNPwas582
10.Onaverage,eachofthe49sampleswerecharacterizedby1864SNPsandeachSNPwas583
foundin32individuals(62.7%ofmatrixfullness).584
ThespatialgeneticstructureinferredbyfastStructurerevealedtwogeographically585
distinctcladesinthewestandtheeastofthePalearctic(Fig6).Thisresultsupportstheeastern-586
westernsplitpreviouslyhighlightedinOedaleusdecorusbasedonmtDNAamplicons[22].This587
demonstratesthathyRADisareliabletechniquetoinferspatialgeneticstructurefromboth588
freshsamplesandmuseumsamplescollectedatvarioustimepointsinthepast.589
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
30
590
Fig6.SpatialgeneticstructureofO.decorusinferredusingfastStructurewithk=2and591
simplepriors.Coloursdenotethetwodifferentgeneticgroupssupportedbyapreviousstudy592
relyingonmtDNAmarkers[22].593
594
Conclusions595
596Here,wepresentamethodforobtaininglargesetsofhomologouslocifrommuseum597
specimens,withoutanyapriorigenomeinformation.Despitethedifferencesinsingle-mapping598
eventsamongsamplesofdifferentageswereretrieved,theobtainednumbersofSNPswerenot599
significantlydifferentforthetwoageclassesoftheL.hellemuseumspecimens.Weobtained600
aroundathousandofSNPsfromL.hellesamplesupto58yearsold,confirmingthatitcanbe601
successfullyappliedinthefieldofmuseumgenomics.Theapplicationofthecatalog-building602
methodthatwasthemostpromisingforresolvingpopulationgeneticstructure(i.e.,non-603
sonicatedlibrarymappedtoassembly-ref)tothegrasshopperO.decorus,confirmedthe604
usefulnessofhyRADtoretrievephylogeographicdatausingmuseumsamplesuptoone605
hundredyearsold.Ourmethoddoesnotrequiretime-consumingandcostlyprobesdesignand606
synthesis,noraccesstofreshsamplesforRNAextraction,makingitoneofthesimplestand607
moststraightforwardtechniqueforobtainingorthologouslocifromdegradedmuseumsamples.608
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
31
Intheprotocol,weappliedamodifiedshotgunlibrarypreparationmethod,optimized609
fordegradedDNAfrommuseumspecimens[15].However,thecaptureprotocolpresentedhere610
canbeappliedtoanytypeoflibrarypreparation,includingcommercialones,simplifyingthe611
workflowandcuttingdownthepreparationtime.612
Wealsoexploredseveralbioinformaticapproachesforlociassemblyfromthecaptured613
libraries,acrucialstepwhenworkingonorganismswithoutareferencegenome.Identifyingthe614
mostappropriatecatalog-buildingmethodmaydependonthegoalsofeachstudy.Inourcase,615
thepipelinethatwasthebestatidentifyingpopulationstructureinthebutterflywasrelyingon616
anon-sonicatedlibrarymappedtothedenovoreferenceassemblyfromcapturedreadsfroma617
singleethanol-preservedspecimen,usingSOAPdenovoassembler(assembly-ref).Despitethe618
factthatamaximumof26%oftheobtainedsequencesmappedtothereferencesandthe619
proportionofsinglemappingeventswerenothigherthan10%onaverage(Fig3),wecould620
successfullycallaroundathousandoflociineachcase(Fig4),withhighcoverageacrossthe621
samples.622
Importantlyfromthewetlabprotocolperspective,inthehybridizationstep,wehave623
usedblockingoligonucleotidestoprevent‘daisy-chaining’ofcapturedsequencesbyadapter624
sequences’homology.Usingablockingagentpreventingsimilarchainingcausedbyrepetitive625
sequences(Cot-1DNA,appliedtothegrasshopperlibraries,seebelow[51])aswellas626
optimizingtheconditionsofhybridizationandcapturereactionsforincreasedstringency(e.g.,627
bydecreasinghybridizationtemperatureandthestringencyofthewashesusinghigher628
concentrationofSDSand/orlowerconcentrationofSSC)mayfurtherincreasethe629
hybridizationefficiencyandthusthenumbersofreadsmappingonthereferenceandreduce630
thenumberoflow-coverageloci.631
Thedenovoassemblybuildingpipelineproducedthelargestcontigof2,352bpforthe632
butterflyand13,103bpforthegrasshopperdataset.Althoughthemeanlengthoftheassembled633
contigswasmuchsmaller,ourmethodalsoallowsretrievinglongersequencesthanthelength634
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
32
oftheprobesused.ThereasonforthisisthatcapturedsequenceshybridizewithotherDNA635
fragmentswithhomologoussequences,flankingtheprobesequence(i.e.,‘daisy-chaining’[49,636
50]).Thismayleadtoenrichmentacrosslargerfractionsofgenome,asideeffectofourmethod,637
thatcanbeutilizedforassemblyoflargercontigsbyusinglongerprobesandcapturinglonger638
targets.639
Themethodpresentedhere,althoughbasedontherestrictionenzymedigestionofDNA640
tocreatetherandomgenomicprobes,doesnotdependontherestrictionsitepresenceinthe641
capturedlibrary.ThisrepresentsasignificantimprovementoverclassicalRAD-sequencing642
datasets,inwhichincreaseinthephylogeneticdistanceamongsamplesiscorrelatedwithan643
increaseinthenumberofmissingsites[56-61],sometimesleadingtoconflictingsignals644
betweenRAD-andcapture-baseddatasets[62],orarecharacterizedbythepresenceofnull645
allelesthatleadtoheterozygosityorFSTunderestimation[9,10].Inthisaspect,ourapproachis646
similartoothercapture-enrichmentprotocols,suchasUltraConservedElements[5]orexome-647
capture[4],withthebenefitofmuchsimplerandlessexpensiveprobegeneration,without648
accesstogenomeinformationorfreshspecimensforRNAisolation.Notrelyingonthepresence649
ofrestrictionsite,themethodpresentedhereshouldbealsousefulforbroaderphylogenetic650
scales,allowingsequencinghomologouslocifrommoredivergenttaxa,whichwouldnotbe651
possibletoretrieveusingclassicalRAD-seqapproaches.652
653
Acknowledgments654
655WethankAlanBrelsford,AliciaMastretta-YanesandPawelRosikiewiczfortheirhelp656
withdevelopingRAD-sequencingprotocols.JairoPatiñotestedearlyversionsoftheprotocol657
andprovidedvaluablefeedback.RogerVilaandGeraldHeckelkindlyprovidedfreshsamplesfor658
thestudy.Wethankthefollowingmuseumcuratorsforprovidingcollectionsamples:Hannes659
Baur(NaturalHistoryMuseum,Bern,Switzerland),AnneFreitag(ZoologicalMuseum,660
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
33
Lausanne,Switzerland),RodEastwood(ETHEntomologicalCollection,Zurich,Switzerland),661
DanielBurckhardt(NaturalHistoryMuseum,Basel,Switzerland),PeterSchwendinger(Natural662
HistoryMuseum,Geneva,Switzerland),BarabaraOberholzer(NaturalHistoryMuseum,Zurich,663
Switzerland),GeorgeBeccaloni(NaturalHistoryMuseum,London,UK),LauriKaila(Finnish664
MuseumofNaturalHistory,Helsinki,Finland).WealsothankBrentEmersonforhisconstant665
supportduringthedevelopmentofthismethod.666
667 668
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
34
References669
670
1.EllegrenH.Genomesequencingandpopulationgenomicsinnon-modelorganisms.671
TrendsinEcology&Evolution2014;29:51-63.672
2.DaveyJW,HohenlohePA,EtterPD,BooneJQ,CatchenJM,BlaxterML.Genome-wide673
geneticmarkerdiscoveryandgenotypingusingnext-generationsequencing.NatureReviews674
Genetics2011;12:499-510.675
3.McCormackJE,HirdSM,ZellmerAJ,CarstensBC,BrumfieldRT.Applicationsofnext-676
generationsequencingtophylogeographyandphylogenetics.MolecularPhylogeneticsand677
Evolution2013;66:526-538.678
4.SulonenAM,EllonenP,AlmusaH,LepistöM,EldforsS,HannulaS,etal.Comparisonof679
solution-basedexomecapturemethodsfornextgenerationsequencing.GenomeBiology680
2011;12:R94.681
5.FairclothBC,McCormackJE,CrawfordNG,HarveyMG,BrumfieldRT,GlennTC.682
Ultraconservedelementsanchorthousandsofgeneticmarkersspanningmultipleevolutionary683
timescales.SystematicBiology2012;61:717-726.doi:10.1093/sysbio/sys004.684
6.ChepelevI,WeiG,TangQ,ZhaoK.Detectionofsinglenucleotidevariationsin685
expressedexonsofthehumangenomeusingRNA-Seq.NucleicAcidsResearch2009;37:e106.686
7.BairdNA,EtterPD,AtwoodTS,CurreyMC,ShiverAL,LewisZA,etal.RapidSNP687
discoveryandgeneticmappingusingsequencedRADmarkers.PLoSONE2008;3:e3376.688
doi:10.1371/journal.pone.0003376.689
8.PetersonBK,WeberJN,KayEH,FisherHS,HoekstraHE.DoubledigestRADseq:an690
inexpensivemethodfordenovoSNPdiscoveryandgenotypinginmodelandnon-model691
species.PLoSONE2012;7:e37135.692
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
35
9.ArnoldB,Corbett-DetigRB,HartlD,BombliesK.RADsequnderestimatesdiversityand693
introducesgenealogicalbiasesduetononrandomhaplotypesampling.MolecularEcology694
2013;22:3179–3190.695
10.GautierM,GharbiK,CezardT,FoucaudJ,KerdelhuéC,PudloP,etal.Theeffectof696
RADalleledropoutontheestimationofgeneticvariationwithinandbetweenpopulations.697
MolecularEcology2013;22:3165-3178.doi:10.1111/mec.12089.698
11.DaveyJW,CezardT,Fuentes-UtrillaP,ElandC,GharbiK,BlaxterML.Specialfeatures699
ofRADSequencingdata:implicationsforgenotyping.MolecularEcology2013;22(11),3151-700
3164.doi:10.1111/mec.12084.701
12.PuritzJB,MatzMV,ToonenRJ,WeberJN,BolnickDI,BirdCE.DemystifyingtheRAD702
fad.MolecularEcology2014;23:5937–5942.doi:10.1111/mec.12965.703
13.MasonVC,LiG,HelgenKM,MurphyWJ.Efficientcross-speciescapturehybridization704
andnext-generationsequencingofmitochondrialgenomesfromnoninvasivelysampled705
museumspecimens.GenomeResearch2011;21:1695-1704.706
14.StaatsM,CuencaA,RichardsonJE,Vrielink-vanGinkelR,PetersenG,SebergO,etal.707
DNADamageinPlantHerbariumTissue.PLoSONE2011;6:e28448.doi:708
10.1371/journal.pone.0028448.709
15.TinMM-Y,EconomoEP,MikheyevAS.SequencingdegradedDNAfromnon-710
destructivelysampledmuseumspecimensforRAD-taggingandlow-coverageshotgun711
phylogenetics.PLoSONE2014;9:e96793.doi:10.1371/journal.pone.0096793.712
16.WandelerP,HoeckPE,KellerLF.Backtothefuture:museumspecimensin713
populationgenetics.TrendsinEcology&Evolution2007;22:634-642.714
17.RoweKC,SinghalS,MacManesMD,AyrolesJF,MorelliTL,RubidgeEM,etal.715
Museumgenomics:low-costandhigh-accuracygeneticdatafromhistoricalspecimens.716
MolecularEcologyResources2011;11:1082–1092.doi:10.1111/j.1755-0998.2011.03052.x.717
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
36
18.BiK,LinderothT,VanderpoolD,GoodJM,NielsenR,MoritzC.Unlockingthevault:718
nextgenerationmuseumpopulationgenomics.MolecularEcology2013;22:6018–6032.719
19.JonesMR,GoodJM.Targetedcaptureinevolutionaryandecologicalgenomics.720
MolecularEcology2015.doi:10.1111/mec.13304.721
20.OrlandoL,GilbertMTP,WillerslevE.Reconstructingancientgenomesand722
epigenomes.NatureReviewsGenetics2015;16:395-408.doi:10.1038/nrg3935.723
21.LemmonAR,EmmeSA,LemmonEM.Anchoredhybridenrichmentformassively724
high-throughputphylogenomics.SystematicBiology2012;sys049.725
22.KindlerE,ArlettazR,HeckelG.Deepphylogeographicdivergenceandcytonuclear726
discordanceinthegrasshopperOedaleusdecorus.Molecularphylogeneticsandevolution727
2012;65(2),695-704.728
23.Mastretta-YanesA,ArrigoN,AlvarezN,JorgensenTH,PiñeroD,EmersonBC.729
Restrictionsite-associatedDNAsequencing,genotypingerrorestimationanddenovoassembly730
optimizationforpopulationgeneticinference.MolecularEcologyResources2015;15:28-41.731
doi:10.1111/1755-0998.12291.732
24.MeyerM,KircherM.Illuminasequencinglibrarypreparationforhighlymultiplexed733
targetcaptureandsequencing.ColdSpringHarborProtocols2010;2010:t5448.doi:734
10.1101/pdb.prot5448.735
25.OpenWetWarecontributors'HybSeqPrep'.OpenWetWare2015;736
http://openwetware.org/index.php?title=Hyb_Seq_Prep&oldid=553025.737
26.CatchenJ,HohenloheP,BasshamS,AmoresA,CreskoW.Stacks:ananalysistoolset738
forpopulationgenomics.MolecularEcology2013;22:3124-3140.739
27.EatonDA.PyRAD:assemblyofdenovoRADseqlociforphylogeneticanalyses.740
Bioinformatics2014;30:844-1849.doi:10.1093/bioinformatics/btu121.741
28.FASTX-Toolkit.2015.Database:GitHub[Internet].Available:742
https://github.com/agordon/fastx_toolkit.743
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
37
29.KruegerF.TrimGalore:AwrappertoolaroundCutadaptandFastQCtoconsistently744
applyqualityandadaptertrimmingtoFastQfiles,withsomeextrafunctionalityforMspI-745
digestedRRBS-type(ReducedRepresentationBisufite-Seq)libraries.2015.Available:746
http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/.747
30.AronestyE.ea-utils:command-linetoolsforprocessingbiologicalsequencingdata;748
2011.Database:GoogleCode[Internet]Available:http://code.google.com/p/ea-utils749
31.Picardtools.2015.Database:GitHub[Internet].Available:750
http://broadinstitute.github.io/picard/.751
32.FlouriT,IjazUZ,MahéF,NicholsB,QuinceC,RognesT.VSEARCHGitHubrepository.752
Release1.0.16;2015.Database:GitHub[Internet].Available:753
https://github.com/torognes/vsearch.doi:10.5281/zenodo.15524.754
33.RubyJG,BellareP,DeRisiJL.PRICE:softwareforthetargetedassemblyof755
componentsof(meta)genomicsequencedata.G3:Genes,Genomes,Genetics2013;3:865-880.756
34.BolgerAM,LohseM,UsadelB.Trimmomatic:aflexibletrimmerforIllumina757
sequencedata.Bioinformatics2014;30:2114-2120.doi:10.1093/bioinformatics/btu170.758
35.LuoR,LiuB,XieY,LiZ,HuangW,YuanJ,etal.SOAPdenovo2:anempirically759
improvedmemory-efficientshort-readdenovoassembler.GigaScience2012;1(1):18.doi:760
10.1186/2047-217X-1-18.761
36.LangmeadB,SalzbergS.Fastgapped-readalignmentwithBowtie2.NatureMethods762
2012;9:357-359.763
37.GarrisonE,MarthG.Haplotype-basedvariantdetectionfromshort-readsequencing.764
arXiv2012;arXiv:1207.3907.765
38.LiH,HandsakerB,WysokerA,FennellT,RuanJ,HomerN,etal.TheSequence766
alignment/map(SAM)formatandSAMtools.Bioinformatics2009;25:2078-2079.767
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
38
39.JónssonH,GinolhacA,SchubertM,JohnsonP,OrlandoL.mapDamage2.0:fast768
approximateBayesianestimatesofancientDNAdamageparameters.Bioinformatics2013;29:769
1682-1684.doi:10.1093/bioinformatics/btt193.770
40.GarrisonE.vcflib.2015.Database:GitHub[Internet].Available:771
https://github.com/ekg/vcflib.772
41.AutonA,DanecekP,MarckettaA.VCFtools.2015.Database:GitHub[Internet].773
Available:https://vcftools.github.io/.774
42.PuritzJB,HollenbeckCM,GoldJR.dDocent:aRADseq,variant-callingpipeline775
designedforpopulationgenomicsofnon-modelorganisms.PeerJ2014;10.7717/peerj.431.776
43.DanecekP,AutonA,AbecasisG,AlbersCA,BanksE,DePristoMA,etal.Thevariant777
callformatandVCFtools.Bioinformatics2011;27:2156-2158.778
44.LischerHEL,ExcoffierL.PGDSpider:Anautomateddataconversiontoolfor779
connectingpopulationgeneticsandgenomicsprograms.Bioinformatics2012;28:298-299.780
45.RajA,StephensM,PritchardJK.fastSTRUCTURE:VariationalInferenceofPopulation781
StructureinLargeSNPDataSets.Genetics2014;197:573-589;doi:782
10.1534/genetics.114.164350.783
46.LiL,StoeckertCJ,RoosDS.OrthoMCL:identificationoforthologgroupsfor784
eukaryoticgenomes.GenomeResearch2003;13:2178-2189.785
47.KearseM,MoiR,WilsonA,Stones-HavasS,CheungM,SturrockS,etal.Geneious786
Basic:anintegratedandextendabledesktopsoftwareplatformfortheorganizationandanalysis787
ofsequencedata.Bioinformatics2012;28(12),1647-1649;doi:10.1093/bioinformatics/bts199.788
48.QGISDevelopmentTeam.QGISGeographicInformationSystem.OpenSource789
GeospatialFoundationProject.2014;http://qgis.osgeo.org.790
49.CronnR,KnausBJ,ListonA,MaughanPJ,ParksM,SyringJV,etal.Targeted791
enrichmentstrategiesfornext-generationplantbiology.AmericanJournalofBotany2012;99:792
291-311.793
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
39
50.TsangarasK,WalesN,Sicheritz-PonténT,RasmussenS,MichauxJ,IshidaY,etal.794
HybridizationcaptureusingshortPCRproductsenrichessmallgenomesbyCapturingFlanking795
sequences(CapFlank).PLoSONE2014;9:e109101.796
51.FairclothBC,BranstetterMG,WhiteND,BradySG.Targetenrichmentof797
ultraconservedelementsfromarthropodsprovidesagenomicperspectiveonrelationships798
amongHymenoptera.MolecularEcologyResources2015;15:489–501.doi:10.1111/1755-799
0998.12328.800
52.BriggsAW,StenzelU,MeyerM,KrauseJ,KircherM,PääboS.Removalofdeaminated801
cytosinesanddetectionofinvivomethylationinancientDNA.NucleicAcidsResearch2010;38:802
e87.803
53.BriggsAW,StenzelU,JohnsonPL,GreenRE,KelsoJ,PrüferK,etal.Patternsof804
damageingenomicDNAsequencesfromaNeandertal.ProceedingsoftheNationalAcademyof805
SciencesUSA2007;104:14616-14621.806
54.StillerM,GreenRE,RonanM,SimonsJF,DuL,HeW,etal.Patternsofnucleotide807
misincorporationsduringenzymaticamplificationanddirectlarge-scalesequencingofancient808
DNA.ProceedingsoftheNationalAcademyofSciencesUSA,2006;103:13578-13584.809
55.GilbertMTP,HansenAJ,WillerslevE,RudbeckL,BarnesI,LynnerupN,etal.810
Characterizationofgeneticmiscodinglesionscausedbypostmortemdamage.TheAmerican811
JournalofHumanGenetics,2003,72:48-61.812
56.CruaudA,GautierM,GalanM,FoucaudJ,SaunéL,GensonG,etal.Empirical813
assessmentofRADsequencingforinterspecificphylogeny.MolecularBiologyandEvolution814
2014;31:1272-1274.815
57.EatonDA,ReeRH.InferringphylogenyandintrogressionusingRADseqdata:an816
examplefromfloweringplants(Pedicularis:Orobanchaceae).SystematicBiology2013;62:689-817
706.818
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
40
58.HippAL,EatonDAR,Cavender-BaresJ,FitzekE,NipperR,etal.AFramework819
PhylogenyoftheAmericanOakCladeBasedonSequencedRADData.PLoSONE2014;9:820
e93975.821
59.JonesJC,FanS,FranchiniP,SchartlM,MeyerA.Theevolutionaryhistoryof822
Xiphophorusfishandtheirsexuallyselectedsword:agenome-wideapproachusingrestriction823
site-associatedDNAsequencing.MolecularEcology2013;22:2986-3001.824
60.RubinBE,ReeRH,MoreauCS.InferringphylogeniesfromRADsequencedata.PLoS825
ONE2012;7:e33394.826
61.WagnerCE,KellerI,WittwerS,SelzOM,MwaikoS,GreuterL,etal.Genome-wide827
RADsequencedataprovideunprecedentedresolutionofspeciesboundariesandrelationships828
intheLakeVictoriacichlidadaptiveradiation.MolecularEcology2013;22:787-798.829
62.LeachéAD,ChavezAS,JonesLN,GrummerJA,GottschoAD,LinkemCW.830
Phylogenomicsofphrynosomatidlizards:conflictingsignalsfromsequencecaptureversus831
restrictionsiteassociatedDNAsequencing.GenomeBiologyandEvolution2015;7:706–719.832
833
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint
41
SupportingInformation834
835
S1Fig.ProfileoftheRAD-probesprecursor,theRAD-seqlibrary.Leftpanel,X-axis:836
fragmentsize(semi-logscale);Y-axis:fragmentdensity(RelativeFluorescentUnits).Right837
panel,gel-likerepresentationoftheleftpanel.838
839
S2Fig.Profileofthere-amplifiedcapturelibraryafterAMPurepurification.Leftpanel,840
X-axis:fragmentsize(semi-logscale);Y-axis:fragmentdensity(RelativeFluorescentUnits).841
Rightpanel,gel-likerepresentationoftheleftpanel.842
843
S3Fig.IllustrationoftheclusteringoptimizationofRAD-refassemblyclustering844
thresholdsusingVsearch.X-axis:clusteringthreshold;Y-axis:numberofclusterswith2x(red)845
or3x(greenline)coverage.Thetoppanelshowswithin-sample,whereasthebottompanel846
showsamong-sampleclusteringresults.Theoptimalthresholdoptimizesthenumberofthe847
clusterswith2xand3xcoverage.848
849
S4Fig.ProportionoftypeIItransitionstoallthetransitionsandtransversionsforeach850
ofthereferencecatalog,withandwithoutpost-mortembiascorrection.Thedatashownarefor851
thefreshsamplewithDNAsonicationandthemuseumsampleswithoutsonication.Theleftplot852
showsvalueswithoutandtherightplotwithmapDamage2.0correction.853
854
S1Table.mapDamage2.0resultsbasedoneachofthethreereferencecatalogsforL.855
helleanalyses,withthenumberofobtainedSNPs(withandwithoutapplicationof856
mapDamage2.0).857
.CC-BY 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted March 22, 2016. ; https://doi.org/10.1101/025551doi: bioRxiv preprint