Sakamoto, M., Venditti, C., & Benton, M. J. (2017). 'Residual diversity estimates' do not correct for sampling bias in palaeodiversity data. Methods in Ecology and Evolution, 8(4), 453–459. https://doi.org/10.1111/2041-210X.12666
Peer reviewed version
License (if available): CC BY-NC
Link to published version (if available): 10.1111/2041-210X.12666
Link to publication record in Explore Bristol Research (PDF document)
This is the author accepted manuscript (AAM). The final published version (version of record) is available online via Wiley at http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12666/abstract. Please refer to any applicable terms of use of the publisher.
University of Bristol - Explore Bristol Research
General rights
This document is made available in accordance with publisher policies. Please cite only the published version using the reference above. Full terms of use are available: http://www.bristol.ac.uk/pure/about/ebr-terms

‘Residual diversity estimates’ do not correct for sampling bias in palaeodiversity data

SHORT TITLE: Do not use residuals method

WORD COUNT: 3,635

Manabu Sakamoto¹, Chris Venditti¹ and Michael J. Benton²

¹School of Biological Sciences, University of Reading, Reading, RG6 6AJ, UK
²School of Earth Sciences, University of Bristol, Bristol, BS8 1RJ, UK

EMAIL: m.sakamoto@reading.ac.uk

ABSTRACT
1. It is widely accepted that the fossil record suffers from various sampling biases (diversity signals through time may partly or largely reflect the rock record), and many methods have been devised to deal with this problem. One widely used method, the ‘residual diversity’ method, uses residuals from a modelled relationship between palaeodiversity and sampling (the sampling-driven diversity model) as ‘corrected’ diversity estimates. However, the unorthodox way in which these residuals are generated presents serious statistical problems: the response and predictor variables are decoupled through independent sorting, rendering the new bivariate relationship meaningless.
2. Here, we use simple simulations to demonstrate the detrimental consequences of independent sorting by assessing error rates and biases in regression model coefficients.
3. When the true parameter values are known, regression models based on independently sorted data produce unacceptably high rates of incorrect estimates, and these estimates are systematically and directionally biased. The large number of recent papers that used the method are likely to have produced misleading results, and their implications should be reassessed.
4. We note that the ‘residuals’ approach based on the sampling-driven diversity model cannot be used to ‘correct’ for sampling bias. Instead, we advocate phylogenetic multiple regression models that can include various confounding factors, including sampling bias, while simultaneously accounting for statistical non-independence owing to shared ancestry. Evolutionary dynamics such as speciation are inherently phylogenetic processes, and only an explicitly phylogenetic approach will model them correctly.

KEYWORDS
Palaeodiversity; residuals; modeling; sampling bias; fossil record; independent sorting

INTRODUCTION
It has been well known since the time of Darwin that the fossil record is largely incomplete (Darwin 1859), prompting generations of macroevolutionary researchers to take a cautious approach when interpreting patterns of palaeodiversity through time (Raup 1972; Raup 1976; Raup 1991; Prothero 1999; Smith & McGowan 2007; Alroy 2010b). There have been many attempts to account for this sampling bias (Raup 1972; Raup 1976; Smith & McGowan 2007; Alroy 2010b), but one approach in particular, often referred to as the ‘residual diversity’ method, devised by Smith and McGowan (2007) and modified by Lloyd (2012), has been widely used (~215 citations up to August 2016, according to Google Scholar).

Using regression residuals as data ‘corrected’ for confounding factors is a widely used method in biology, the social sciences and economics (King 1986; Freckleton 2002), and even in palaeodiversity studies (Raup 1976). However, Smith and McGowan’s (2007) approach differs from these classical residuals approaches in one key way: the ‘residuals’ are generated not as regression residuals (ε = y − ŷ) from a simple regression of diversity (y) on a proxy of sampling (x), but from “a model in which rock area at outcrop was a perfect predictor of sampled diversity” (Smith & McGowan 2007), here referred to as the sampling-driven diversity model (SDDM). The SDDM is constructed as a regression model between y sorted from low to high values (y’) and x sorted from low to high values (x’), and the relationship between these two independently sorted variables is assumed to represent the sampling-driven diversity generating process, although there is no reason to make this assumption. ‘Residuals’ are obtained as the difference between the SDDM predictions ŷ’ and the observed values y, and are then treated as the ‘residual diversity estimates’ (figure 1).
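Written out explicitly, the two kinds of residuals differ only in which pairs enter the least-squares fit. The notation below is ours rather than the original authors', with x₍ᵢ₎ and y₍ᵢ₎ denoting the i-th smallest values of x and y, and the SDDM line follows our reading of figure 1:

```latex
\begin{aligned}
(\hat\alpha,\hat\beta) &= \arg\min_{\alpha,\beta}\sum_{i=1}^{n}\left(y_i-\alpha-\beta x_i\right)^2,
& \hat\varepsilon_i &= y_i-\left(\hat\alpha+\hat\beta x_i\right) && \text{(classical residuals)}\\
(\hat\alpha',\hat\beta') &= \arg\min_{\alpha,\beta}\sum_{i=1}^{n}\left(y_{(i)}-\alpha-\beta x_{(i)}\right)^2,
& r_i &= y_i-\left(\hat\alpha'+\hat\beta' x_i\right) && \text{(SDDM `residuals')}
\end{aligned}
```

In the first line each pair (xᵢ, yᵢ) comes from the same time bin; in the second, x₍ᵢ₎ and y₍ᵢ₎ generally come from different bins. Because both sorted sequences are non-decreasing, their sample covariance cannot be negative (Chebyshev's sum inequality), so the SDDM slope β̂’ is never negative, whatever the sign of the true relationship.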

However, sorting y and x independently as outlined above decouples a paired, bivariate dataset, which is plainly problematic in statistics. Fitting a model to decoupled data (e.g. y’ and x’) will lead to spurious predictions and ‘residuals’, because the estimated regression coefficients will be based on a forced (false) linear relationship (figure 1b). Yet, given the continued wide use of the SDDM as a preferred method for identifying supposedly ‘true’ palaeodiversity signals (as recently as Grossnickle & Newham 2016), this basic statistical concept appears to have been overlooked. It has been suggested that using formation counts to ‘correct’ palaeodiversity time series data is unlikely to be meaningful because of substantial redundancy between the two metrics (Benton et al. 2011; Benton 2015), and a recent study has scrutinized how accurately SDDM residuals predict true simulated biodiversity signals (Brocklehurst 2015); however, the performance of the SDDM itself has never been assessed. Here, we demonstrate the detrimental effects of decoupling data in regression modelling using simple simulations.

MATERIAL AND METHODS
We first generated random deviates x by sampling from a normal distribution (μ = 0, σ = 1) at a sample size of n = 100 (see SI for the other sample sizes, n = 30 and 1000). We then calculated y using a linear relationship of the form y = a + bx + e, where a is the intercept, b is the slope and e is Gaussian noise. For simplicity, we fixed a = 0.4 and b = 0.6 while varying e (μe = 0; σe = 0.05, 0.1, 0.25, 0.5); other values of a and b should return similar if not identical results (though b = 1 would be meaningless). Following Smith and McGowan (2007), we sorted y and x independently of each other to generate y’ and x’, and fitted an ordinary least squares (OLS) regression model of y’ on x’ (the SDDM). For comparison, we fitted an OLS regression model of y on x in their original, paired bivariate relationship (the standard regression model, SRM), whose performance serves as a benchmark.
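The simulation design above can be sketched for a single dataset as follows. This is a minimal illustration in pure Python, not the authors' code; the function names `ols`, `simulate` and `fit_srm_and_sddm`, and the random seed, are our own choices.

```python
import random


def ols(x, y):
    """Ordinary least squares fit of y on x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy * sxy / (sxx * syy)


def simulate(rng, n=100, a=0.4, b=0.6, sigma_e=0.5):
    """Draw x ~ N(0, 1) and compute y = a + b*x + e with e ~ N(0, sigma_e^2)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [a + b * xi + rng.gauss(0.0, sigma_e) for xi in x]
    return x, y


def fit_srm_and_sddm(x, y):
    """SRM: y on the paired x. SDDM: y' on x', each sorted independently."""
    return ols(x, y), ols(sorted(x), sorted(y))


if __name__ == "__main__":
    x, y = simulate(random.Random(1))
    for name, (alpha, beta, r2) in zip(("SRM", "SDDM"), fit_srm_and_sddm(x, y)):
        print(f"{name:4s}: alpha = {alpha:.3f}, beta = {beta:.3f}, r2 = {r2:.3f}")
```

Whatever the seed, the SDDM r² can never be lower than the SRM r² when the true slope is positive: sorting both variables yields the pairing that maximizes their correlation (the rearrangement inequality), which is why the sorted fit looks deceptively good.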

To test Smith and McGowan’s (2007) assertion that the SDDM is indeed “a model in which rock area at outcrop was a perfect predictor of sampled diversity”, we evaluated whether the estimated regression coefficients α and β differed significantly from the true regression parameters a and b using a t-test. We repeated the procedure over 5000 simulations and calculated the percentage of simulations in which the estimated coefficients differed significantly from the true parameters. By chance alone, we would expect about 5% of the simulations to yield regression coefficients significantly different from the true parameters; anything substantially above this threshold would indicate that the model has an unacceptably high Type I error rate, i.e. that it falsely rejects a true null hypothesis, our null hypothesis being that the SDDM correctly estimates the ‘true’ model parameters.
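One way to implement this Type I error check is sketched below. It assumes the t statistic is (estimate − true value) / standard error on n − 2 degrees of freedom, which the text does not spell out. To stay within the Python standard library, the t critical value is approximated from the normal quantile with the asymptotic expansion in Abramowitz & Stegun (eq. 26.7.5); `scipy.stats.t.ppf` would be the usual alternative. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def ols_with_se(x, y):
    """OLS of y on x; returns (alpha, beta, se_alpha, se_beta)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                          # residual variance on n - 2 df
    se_beta = math.sqrt(s2 / sxx)
    se_alpha = math.sqrt(s2 * (1.0 / n + mx * mx / sxx))
    return alpha, beta, se_alpha, se_beta


def t_crit(df, level=0.95):
    """Approximate two-sided Student's t critical value from the normal
    quantile via the asymptotic expansion of Abramowitz & Stegun (26.7.5)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z + (z**3 + z) / (4 * df) + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2)


def type_i_error_rates(n_sims=5000, n=100, a=0.4, b=0.6, sigma_e=0.5, seed=1):
    """Percentage of simulations in which alpha (or beta) differs significantly
    from the true a (or b), for the SRM and the SDDM."""
    rng = random.Random(seed)
    tc = t_crit(n - 2)
    hits = {"SRM": [0, 0], "SDDM": [0, 0]}
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [a + b * xi + rng.gauss(0.0, sigma_e) for xi in x]
        for model, (xs, ys) in (("SRM", (x, y)), ("SDDM", (sorted(x), sorted(y)))):
            alpha, beta, se_a, se_b = ols_with_se(xs, ys)
            hits[model][0] += abs(alpha - a) / se_a > tc   # t-test on intercept
            hits[model][1] += abs(beta - b) / se_b > tc    # t-test on slope
    return {m: (100.0 * h[0] / n_sims, 100.0 * h[1] / n_sims) for m, h in hits.items()}
```

For example, `type_i_error_rates(sigma_e=0.05)` returns a dictionary of (intercept %, slope %) pairs for each model; looping over the four σe values gives a table with the same layout as table 1, although a different random number generator will not reproduce the published figures exactly.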

In addition, we tested for bias in the estimated regression slopes, i.e. whether the estimates deviated systematically from the simulation parameter b = 0.6. The mean of the 5000 slopes was compared with the fixed value 0.6 using a one-sample t-test. If deviations were random, we would expect no significant difference between the mean slope and the theoretical value, with all slopes distributed randomly around it.
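The bias test above amounts to a one-sample t-test on the collection of fitted slopes. A minimal standard-library sketch, assuming the same simulation settings as the Methods (the function names and seed are ours):

```python
import math
import random
from statistics import stdev


def ols_slope(x, y):
    """OLS slope of y on x (model with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def slope_bias_test(sort_independently, n_sims=5000, n=100, a=0.4, b=0.6,
                    sigma_e=0.5, seed=1):
    """Mean fitted slope over n_sims simulations and the one-sample t statistic
    for H0: mean slope == b (df = n_sims - 1)."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [a + b * xi + rng.gauss(0.0, sigma_e) for xi in x]
        if sort_independently:          # SDDM: break the original pairing
            x, y = sorted(x), sorted(y)
        slopes.append(ols_slope(x, y))
    mean_slope = sum(slopes) / n_sims
    t = (mean_slope - b) / (stdev(slopes) / math.sqrt(n_sims))
    return mean_slope, t
```

Calling `slope_bias_test(False)` gives the SRM result and `slope_bias_test(True)` the SDDM result; with thousands of simulations the t statistic is large whenever the mean slope is even slightly biased, which is the pattern reported for the SDDM in table 2.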

RESULTS
SRM coefficients were significantly different from the true model parameters in only ~5% of the 5000 iterations across all values of σe (figure 2a; table 1; SI), within the level expected by chance alone. Regression lines across the 5000 iterations are distributed randomly about the simulated line (figure 3a), with no significant difference between the mean regression slope and the simulation parameter b = 0.6 (table 2; SI). In contrast, SDDM coefficients were significantly different from the true parameters (figure 2b) at rates much higher than the conventionally accepted 5% (26.1–68.7% for α and 28.5–100% for β; table 1; SI). The mean slope of the SDDM regressions differed significantly from the simulation parameter b in a systematic and directional manner (figure 3b; table 2; SI): SDDM regression coefficients are not only incorrect but grossly misleading. This systematic bias increases with the amount of noise in the data (table 2); the more noise there is, the steeper the positive relationship between y’ and x’ becomes.
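The direction of this bias, and its growth with noise, can be anticipated with a simple large-sample argument. This is our own heuristic, not part of the original analysis, and it assumes x and e are normal as in the simulations. Independently sorted samples of two normal variables behave like a quantile-quantile plot, so their points lie close to a line whose slope is the ratio of the two standard deviations:

```latex
\begin{aligned}
&y = a + bx + e,\quad x\sim N(0,\sigma_x^2),\quad e\sim N(0,\sigma_e^2)
\;\Rightarrow\; y\sim N\!\left(a,\;\sigma_y^2\right),\quad \sigma_y^2 = b^2\sigma_x^2+\sigma_e^2\\
&x_{(i)} \approx \sigma_x\,\Phi^{-1}(p_i),\qquad
 y_{(i)} \approx a + \sigma_y\,\Phi^{-1}(p_i),\qquad p_i \approx \tfrac{i}{n+1}\\
&\Rightarrow\; y_{(i)} \approx a + \frac{\sigma_y}{\sigma_x}\,x_{(i)}
\quad\Rightarrow\quad
\hat\beta' \approx \frac{\sigma_y}{\sigma_x} = \sqrt{b^2+\sigma_e^2/\sigma_x^2}\;\ge\;|b|
\end{aligned}
```

With σx = 1 and b = 0.6, this gives √0.3625 ≈ 0.602, √0.37 ≈ 0.608, √0.4225 = 0.650 and √0.61 ≈ 0.781 for σe = 0.05, 0.1, 0.25 and 0.5, close to the SDDM mean slopes in table 2 (0.602, 0.607, 0.646, 0.775). The same expression stays positive when b < 0 and equals σy/σx for unrelated variables (b = 0).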

DISCUSSION
By establishing “a model in which rock area at outcrop was a perfect predictor of sampled diversity”, Smith and McGowan (2007) attempted to create a sampling-driven diversity model. However, their SDDM is not based on any hypothesized or empirical relationship between diversity and sampling, nor is it formulated from first principles. This contrasts with well-formulated biological models such as the various scaling models, in which the parameter of interest (i.e. the scaling coefficient, or the slope of the bivariate relationship) is founded on first-principle theory, e.g. the 2/3 rule for the scaling of area with mass. Rather, the SDDM rests on the assumption that y’ and x’ (y and x sorted independently of each other) form the expected theoretical bivariate relationship between y and x. This study shows that assumption to be incorrect (figures 2, 3), as one would expect, since there is no reason to assume such a thing.

A further, and perhaps more serious, problem with using a forced pairing of y’ and x’ is that each data point (y’_i, x’_i) does not represent a natural pairing and has no meaning; the new pairing is actually (y_i, x_j), where the ith and jth orders are independent of each other. For instance, in the marine generic diversity and rock area data of Smith and McGowan (2007) (figure 4), the lowest marine generic diversity is in the Cambrian Tommotian Stage (529–521 million years ago [Ma]; genus count = 309), whereas the smallest marine rock outcrop area (after removing zero-valued data; Smith & McGowan 2007) is from the Early Permian Asselian/Sakmarian Stage (299–290 Ma; rock area = 1). Similarly, the highest diversity is recorded for the Pliocene (5.3–2.58 Ma; genus count = 3911), whereas the largest rock area is found in the Cenomanian (100–94 Ma; rock area = 373). These two extreme points alone demonstrate that the paired diversity and rock area values come from time bins roughly 230 and 90 million years apart, respectively, and are independent of each other (figure 4).
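The re-pairing described above can be made concrete with a toy example. The four bins and their values below are hypothetical, invented purely for illustration and not taken from any dataset; the sketch simply reports which bin supplies each coordinate of every SDDM point.

```python
def sddm_pairings(bins, diversity, rock_area):
    """Return the (diversity bin, rock-area bin) label pairs that end up
    side by side after sorting each variable independently."""
    by_div = sorted(range(len(bins)), key=lambda i: diversity[i])
    by_rock = sorted(range(len(bins)), key=lambda i: rock_area[i])
    return [(bins[i], bins[j]) for i, j in zip(by_div, by_rock)]


# Hypothetical four-bin series (not data from any study).
bins = ["A", "B", "C", "D"]
diversity = [30, 10, 40, 20]
rock_area = [5, 9, 1, 7]
print(sddm_pairings(bins, diversity, rock_area))
# [('B', 'C'), ('D', 'A'), ('A', 'D'), ('C', 'B')]
```

In this toy case not a single SDDM point combines a diversity value and a rock area value from the same time bin.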

This may be obvious, but sorting y and x independently has serious statistical consequences. For instance, in Smith and McGowan’s (2007) data, log10 marine generic diversity has no significant relationship with log10 rock area in the original paired bivariate data (figure 4a; r² = 0.0398; p = 0.0979), but once sorted, it has a significant and strong positive relationship with log10 rock area sorted independently of log10 diversity (figure 4b; r² = 0.903; p < 0.001). The same general pattern holds in at least two further datasets (Benson et al. 2010; Benson & Upchurch 2013) (figures S1 and S2). The independent sorting procedure has forced a strong but false linear relationship between two variables that otherwise show no significant (or, if significant, only a very weak) relationship. In fact, two randomly generated deviates (e.g. sampled from a normal distribution) that have no relationship with each other (figure 5a) will inevitably have a significant and strong relationship once each is sorted independently from lowest to highest (r² ≈ 1; figure 5b). Perhaps more detrimental is the fact that the independently sorted bivariate relationship will always be positive, and typically strongly so: a simulated negative relationship between x and y (figure 5c) becomes a strong positive relationship once the variables are sorted independently (figure 5d).
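Both effects in figure 5 can be reproduced on a single simulated dataset (the figure itself summarizes 1000 simulations). The sketch below is ours; the seed and n = 100 are arbitrary choices, and the negative case uses the figure 5c settings (a = 0.4, b = −0.6, σe = 0.5).

```python
import random


def corr_and_slope(x, y):
    """Pearson r and OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return sxy / (sxx * syy) ** 0.5, sxy / sxx


rng = random.Random(2016)
n = 100

# (a)/(b): two unrelated normal variables, before and after independent sorting.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
r_raw, _ = corr_and_slope(x, y)
r_sorted, _ = corr_and_slope(sorted(x), sorted(y))

# (c)/(d): a negative relationship, y = 0.4 - 0.6x + e with sigma_e = 0.5.
xn = [rng.gauss(0, 1) for _ in range(n)]
yn = [0.4 - 0.6 * v + rng.gauss(0, 0.5) for v in xn]
_, slope_raw = corr_and_slope(xn, yn)
_, slope_sorted = corr_and_slope(sorted(xn), sorted(yn))

print(f"unrelated: r2 raw = {r_raw**2:.3f}, r2 sorted = {r_sorted**2:.3f}")
print(f"negative:  slope raw = {slope_raw:.3f}, slope sorted = {slope_sorted:.3f}")
```

The raw negative slope should sit near −0.6, while the sorted slope is necessarily positive, because two non-decreasing sequences cannot have a negative covariance.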

In some clades (namely Mesozoic dinosaurs), diversity measures can have very strongly positive relationships with some sampling metrics, such as geological formation counts (β = 0.868; r² = 0.85; p < 0.001; Barrett, McGowan & Page 2009) or fossil collection counts (β = 0.865; r² = 0.79; p < 0.001; Butler et al. 2011). Such relationships would justify correcting for these confounding factors if the sampling metrics were indeed non-redundant with diversity (Benton et al. 2011; Benton et al. 2013; Benton 2015). However, even in such cases, the modelled relationship obtained from the SDDM will still be systematically biased (figure 3), and alternative methods should be considered.

It is problematic to stipulate that this forced relationship is the ‘true’ relationship between sampled palaeodiversity and the rock record. Our simulations show that regression models fitted to independently sorted data have unacceptably high Type I error rates when the data-generating processes are known, meaning that Smith and McGowan’s (2007) approach is not statistically viable. In particular, the fact that slopes are incorrectly estimated at very high rates (~100% when σe = 0.5; table 1) has severe consequences, because SDDM predictions are systematically biased (figures 2b, 3b), leading to erroneous ‘residuals’. Inferences made from such problematic ‘residuals’ (e.g. Smith & McGowan 2007; Barrett, McGowan & Page 2009; Benson et al. 2010; Butler et al. 2011; Benson & Upchurch 2013) will inevitably be misleading (Brocklehurst 2015) and lack any biological or geological meaning.

Given our simulations, we strongly recommend against using the SDDM approach to model the relationship between palaeodiversity and rock record data; standard regression on unsorted data is a sensible option. However, using the residuals of a regression model as data for subsequent analyses has also long been known to introduce biased statistical estimates (King 1986; Freckleton 2002). Successive rounds of modelling remove variance and degrees of freedom from subsequent parameter estimation, so the final models and statistical analyses do not account appropriately for the removed errors (King 1986). Instead, one can model the confounding effects directly, alongside the effects of interest (e.g. environment, climate), through multiple regression (OLS, GLMs or generalized least squares [GLS]). In the context of palaeodiversity studies, one can fit a multiple regression model with a diversity metric as the response variable and a sampling proxy as a confounding covariate, alongside additional predictor variables such as sea level and temperature. The resulting coefficients for the environmental predictors then estimate the effects of interest after accounting for the undesired effects of rock availability. Since diversity measures are frequently counts, it is advisable to use models that appropriately account for errors in count data, such as Poisson or negative binomial models (O'Hara & Kotze 2010). Whether or not to include time series terms (e.g. autoregressive [AR] terms) depends on the level of serial autocorrelation in the data and on sample size. Palaeontological time series tend to be short, with 30 time bins or fewer being fairly typical (Mesozoic dinosaurs span a maximum of only 26 geological stages; Butler et al. 2011; Benson & Mannion 2012), in which case complex models risk over-parameterisation. Model selection using the Akaike Information Criterion (Akaike 1973) or similar indices can help with this decision (Burnham & Anderson 2002). However, we do not lightly advocate time series modelling: appropriate time series methods are severely underdeveloped when the dependent variable, sampled diversity, is a count (but see generalised linear autoregressive moving average [GLARMA] models (Dunsmuir & Scott 2015) or Poisson exponentially weighted moving average [PEWMA] models (Brandt et al. 2000)), and, more importantly, more appropriate alternative methods exist, namely phylogenetic approaches (?; Silvestro et al. 2015; Sakamoto, Benton & Venditti 2016).
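Why the two-step residuals approach biases estimates while multiple regression does not can be seen in a small simulation. The sketch below is a hypothetical illustration, not an analysis from the paper: x1 plays the role of a sampling proxy, x2 an environmental predictor correlated with it (ρ = 0.7), and the true effect of x2 on y is 0.5. For brevity it uses Gaussian OLS; for count data the same logic carries over to a Poisson or negative binomial GLM with both predictors included.

```python
import random


def centred(v):
    """Subtract the mean from each element."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def simple_slope(x, y):
    """OLS slope of y on x (model with intercept)."""
    xc, yc = centred(x), centred(y)
    return dot(xc, yc) / dot(xc, xc)


def multiple_regression(x1, x2, y):
    """OLS of y on x1 and x2 together (with intercept); returns (b1, b2)
    from the 2x2 normal equations on centred data."""
    a, b, c = centred(x1), centred(x2), centred(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def residuals_then_regress(x1, x2, y):
    """Two-step approach: regress y on x1, then regress those residuals on x2."""
    g = simple_slope(x1, y)
    resid = [yi - g * xi for xi, yi in zip(centred(x1), centred(y))]
    return simple_slope(x2, resid)


# Hypothetical data: x1 (sampling proxy) correlated with x2 (environment).
rng = random.Random(42)
n, rho = 2000, 0.7
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [rho * v + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for v in x2]
y = [0.8 * u + 0.5 * v + rng.gauss(0, 1) for u, v in zip(x1, x2)]

print("multiple regression, effect of x2:", round(multiple_regression(x1, x2, y)[1], 3))
print("residuals approach,  effect of x2:", round(residuals_then_regress(x1, x2, y), 3))
```

Under this setup, the multiple regression estimate should be close to 0.5, whereas the residuals estimate should be close to 0.5 × (1 − 0.7²) ≈ 0.26. The first step absorbs the part of the x2 effect that is shared with the sampling proxy, which illustrates the bias King (1986) and Freckleton (2002) describe.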

Fundamentally, macroevolutionary studies aim to increase our understanding of evolutionary processes (speciation and extinction through time), rather than of the resulting patterns or phenomena (sampled diversity, e.g. richness). Thus, we should seek to characterize the process using biologically meaningful and interpretable models instead of merely describing the patterns. Further, simply exploring error in the fossil record in itself seems rather fruitless, because uncertainty depends on the questions being posed; palaeontological studies of macroevolution should be no different from other statistical approaches in the natural sciences, in which uncertainty is assessed while exploring the phenomena of interest (Benton 2015). Explicitly phylogenetic approaches (e.g. Lloyd et al. 2008; Didier, Royer-Carenzi & Laurin 2012; Stadler 2013; Stadler et al. 2013; Sakamoto, Benton & Venditti 2016) offer the best and most appropriate means to tackle questions of evolutionary processes. Especially when extrinsic causal mechanisms for changes in biodiversity are tested using regression models, ignoring phylogeny is a serious violation of statistical independence (Felsenstein 1985; Harvey & Pagel 1991). It is also worth noting that although subsampling approaches (e.g. Alroy’s SQS; Alroy 2010a; Alroy 2010b; Alroy 2010c) are gaining wide popularity as modern methods to account for sampling bias, they are not without problems (Hannisdal et al. 2016). They certainly do not take into account the shared ancestry described by phylogeny, and thus also suffer from statistical non-independence (Felsenstein 1985; Harvey & Pagel 1991), and they can frequently lead to incorrect interpretations of the data. For instance, while recent studies using binned time series approaches (including SDDM and SQS) have reached mixed conclusions regarding the long-term demise of dinosaurs before their final extinction at the Cretaceous-Paleogene (K-Pg) boundary 66 Ma (Barrett, McGowan & Page 2009; Lloyd 2012; Brusatte et al. 2015), an explicitly phylogenetic Bayesian analysis has strongly suggested that dinosaurs were indeed in long-term decline tens of millions of years before the K-Pg mass extinction event, with the speciation rate exceeded by the extinction rate and dinosaurs increasingly incapable of replacing extinct taxa with new ones (Sakamoto, Benton & Venditti 2016). Such evolutionary dynamics cannot be identified using time-binned (tabulated) data. Phylogenetic mixed modelling approaches (Hadfield 2010) further allow the incorporation of confounding variables such as sampling, as well as environmental effects (Sakamoto, Benton & Venditti 2016). Therefore, in order to advance our understanding of the evolutionary dynamics of biodiversity, speciation and extinction through time (i.e. the underlying process generating the observed patterns in sampled diversity, e.g. taxon richness), while accounting for sampling and phylogenetic non-independence, it is imperative that we have an abundance of large-scale, comprehensive phylogenetic trees of fossil (and extant) taxa.

ACKNOWLEDGEMENTS
We thank Jo Baker, Ciara O’Donovan and Henry Ferguson-Gow for discussion and insightful comments. We also thank Neil Brocklehurst and Michel Laurin for reviewing this manuscript and providing helpful commentary. We have no conflicts of interest.

FUNDING
MS and CV are funded by Leverhulme Trust Research Project Grant RPG-2013-185 (awarded to CV). MJB is funded by Natural Environment Research Council Standard Grant NE/I027630/1.

REFERENCES
Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle. 2nd International Symposium on Information Theory (eds B.N. Petrov & F. Csaki), pp. 267–281. Akademiai Kiado, Budapest.
Alroy, J. (2010a) Fair sampling of taxonomic richness and unbiased estimation of origination and extinction rates. Quantitative methods in paleobiology. Paleontological Society Papers, 16, 55–80.
Alroy, J. (2010b) Geographical, Environmental and Intrinsic Biotic Controls on Phanerozoic Marine Diversification. Palaeontology, 53, 1211–1235.
Alroy, J. (2010c) The Shifting Balance of Diversity Among Major Marine Animal Groups. Science, 329, 1191–1194.
Barrett, P.M., McGowan, A.J. & Page, V. (2009) Dinosaur diversity and the rock record. Proceedings of the Royal Society B: Biological Sciences, 276, 2667–2674.
Benson, R.B.J., Butler, R.J., Lindgren, J. & Smith, A.S. (2010) Mesozoic marine tetrapod diversity: mass extinctions and temporal heterogeneity in geological megabiases affecting vertebrates. Proceedings of the Royal Society B: Biological Sciences, 277, 829–834.
Benson, R.B.J. & Mannion, P.D. (2012) Multi-variate models are essential for understanding vertebrate diversification in deep time. Biology Letters, 8, 127–130.
Benson, R.B.J. & Upchurch, P. (2013) Diversity trends in the establishment of terrestrial vertebrate ecosystems: Interactions between spatial and temporal sampling biases. Geology, 41, 43–46.
Benton, M.J. (2015) Palaeodiversity and formation counts: redundancy or bias? Palaeontology, 58, 1003–1029.
Benton, M.J., Dunhill, A.M., Lloyd, G.T. & Marx, F.G. (2011) Assessing the quality of the fossil record: insights from vertebrates. Comparing the Geological and Fossil Records: Implications for Biodiversity Studies, 358, 63–94.
Benton, M.J., Ruta, M., Dunhill, A.M. & Sakamoto, M. (2013) The first half of tetrapod evolution, sampling proxies, and fossil record quality. Palaeogeography, Palaeoclimatology, Palaeoecology, 372, 18–41.
Brandt, P.T., Williams, J.T., Fordham, B.O. & Pollins, B. (2000) Dynamic modeling for persistent event-count time series. American Journal of Political Science, 44, 823–843.
Brocklehurst, N. (2015) A simulation-based examination of residual diversity estimates as a method of correcting for sampling bias. Palaeontologia Electronica, 18.
Brusatte, S.L., Butler, R.J., Barrett, P.M., Carrano, M.T., Evans, D.C., Lloyd, G.T., Mannion, P.D., Norell, M.A., Peppe, D.J., Upchurch, P. & Williamson, T.E. (2015) The extinction of the dinosaurs. Biological Reviews, 90, 628–642.
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multimodel inference: a practical information-theoretical approach, 2nd edn. Springer, New York.
Butler, R.J., Benson, R.B.J., Carrano, M.T., Mannion, P.D. & Upchurch, P. (2011) Sea level, dinosaur diversity and sampling biases: investigating the 'common cause' hypothesis in the terrestrial realm. Proceedings of the Royal Society B: Biological Sciences, 278, 1165–1170.
Darwin, C. (1859) On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life, 1st edn. London, UK.
Didier, G., Royer-Carenzi, M. & Laurin, M. (2012) The reconstructed evolutionary process with the fossil record. Journal of Theoretical Biology, 315, 26–37.
Dunsmuir, W.T.M. & Scott, D.J. (2015) The glarma Package for Observation-Driven Time Series Regression of Counts. Journal of Statistical Software, 67, 1–36.
Felsenstein, J. (1985) Phylogenies and the Comparative Method. American Naturalist, 125, 1–15.
Freckleton, R. (2002) On the misuse of residuals in ecology: regression of residuals vs. multiple regression (vol 71, pg 542, 2002). Journal of Animal Ecology, 71, 722–722.
Grossnickle, D.M. & Newham, E. (2016) Therian mammals experience an ecomorphological radiation during the Late Cretaceous and selective extinction at the K–Pg boundary. Proceedings of the Royal Society of London B: Biological Sciences, 283.
Hadfield, J.D. (2010) MCMC methods for multi-response Generalized Linear Mixed Models: The MCMCglmm R Package. Journal of Statistical Software, 33, 1–22.
Hannisdal, B., Haaga, K.A., Reitan, T., Diego, D. & Liow, L.H. (2016) Common species link global ecosystems to climate change. bioRxiv, 043729.
Harvey, P.H. & Pagel, M.D. (1991) The comparative method in evolutionary biology. Oxford University Press.
King, G. (1986) How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science. American Journal of Political Science, 30, 666–687.
Lloyd, G.T. (2012) A refined modelling approach to assess the influence of sampling on palaeobiodiversity curves: new support for declining Cretaceous dinosaur richness. Biology Letters, 8, 123–126.
Lloyd, G.T., Davis, K.E., Pisani, D., Tarver, J.E., Ruta, M., Sakamoto, M., Hone, D.W.E., Jennings, R. & Benton, M.J. (2008) Dinosaurs and the Cretaceous Terrestrial Revolution. Proceedings of the Royal Society B: Biological Sciences, 275, 2483–2490.
O'Hara, R.B. & Kotze, D.J. (2010) Do not log-transform count data. Methods in Ecology and Evolution, 1, 118–122.
Prothero, D. (1999) Fossil record. Encyclopedia of paleontology (ed. R. Singer). Fitzroy Dearborn Publishers, Chicago, USA.
Raup, D.M. (1972) Taxonomic Diversity during the Phanerozoic. Science, 177, 1065–1071.
Raup, D.M. (1976) Species Diversity in the Phanerozoic: An Interpretation. Paleobiology, 2, 289–297.
Raup, D.M. (1991) Extinction: bad genes or bad luck? W.W. Norton, New York.
Sakamoto, M., Benton, M.J. & Venditti, C. (2016) Dinosaurs in decline tens of millions of years before their final extinction. Proceedings of the National Academy of Sciences, USA, 113, 5036–5040.
Silvestro, D., Antonelli, A., Salamin, N. & Quental, T.B. (2015) The role of clade competition in the diversification of North American canids. Proceedings of the National Academy of Sciences, USA, 112, 8684–8689.
Smith, A.B. & McGowan, A.J. (2007) The shape of the Phanerozoic marine palaeodiversity curve: How much can be predicted from the sedimentary rock record of western Europe? Palaeontology, 50, 765–774.
Stadler, T. (2013) Recovering speciation and extinction dynamics based on phylogenies. Journal of Evolutionary Biology, 26, 1203–1219.
Stadler, T., Kuhnert, D., Bonhoeffer, S. & Drummond, A.J. (2013) Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proceedings of the National Academy of Sciences, USA, 110, 228–233.

TABLES

Table 1. Type I error rates (%) for SRM (Standard Regression Model) and SDDM (Sampling-Driven Diversity Model) estimates (intercept α and slope β) across residual error (σe).

σe      SRM α    SRM β    SDDM α    SDDM β
0.05    5.34     4.90     26.1      28.5
0.10    4.84     4.92     40.2      48.4
0.25    4.82     4.78     57.3      91.3
0.50    5.48     5.14     68.7      100.0

Table 2. t-test results comparing the mean regression slope of 5000 iterations with the theoretical slope b = 0.6, for SRM (Standard Regression Model) and SDDM (Sampling-Driven Diversity Model) across residual error (σe).

        SRM                                 SDDM
σe      mean slope   t-value   p-value     mean slope   t-value   p-value
0.05    0.6          1.230     0.220       0.602        20.9      0
0.10    0.6          -1.790    0.073       0.607        46.0      0
0.25    0.6          -0.042    0.967       0.646        131.0     0
0.50    0.6          0.685     0.493       0.775        244.0     0

FIGURES

Figure 1. Procedure for generating ‘residuals’ from a sampling-driven diversity model. (a) A paired, bivariate dataset of x (sampling proxy) and y (sampled diversity) was simulated so that x is randomly drawn from a normal distribution (μ = 0, σ = 1) and y is calculated as y = a + bx + e, where a = 0.4, b = 0.6 and e is noise (μ = 0, σ = 0.5). The thick black line is the expected relationship y = a + bx. Vertical lines represent the true residuals, i.e. deviations in y from the thick line. (b) Following Smith and McGowan (2007), x and y are sorted from low to high values independently of each other (x’ and y’, respectively), and an ordinary least squares (OLS) regression model (pink line) is fitted to y’ on x’. Although the pink line supposedly represents the data-generating process, it is clearly not a good estimator of the true, known generating process (the thick line). (c) The OLS model from (b) is used as the sampling-driven diversity model (SDDM), i.e. the expected relationship between y and x, from which ‘residuals’ are computed as the deviations in y from the pink line (vertical pink dotted lines). There is an immediately clear and substantial difference between the true residuals (a) and the SDDM ‘residuals’ (c).

[Figure 1 image: (a) y against x; (b) y’ against x’; (c) y against x; x axes span about −2 to 3, y axes about −1 to 2]

Figure 2. Regression modelling on a decoupled bivariate dataset fails to estimate the simulation slope parameter. (a) A bivariate dataset (y and x) was generated so as to follow a theoretical relationship (thick line) with intercept a = 0.4, slope b = 0.6 and noise e (μe = 0, σe = 0.5). The best-fit regression line (blue) is not significantly different from the theoretical line (the dashed 95% confidence intervals encompass the thick line; see table 1 for Type I error rates over 5000 simulations), with y and x forming a moderately strong relationship (r² = 0.526) appropriate for the degree of e modelled. Regression model residuals (vertical lines) show no structure, as expected. (b) The bivariate data in (a) were sorted independently of each other (y’ and x’), and a regression model was fitted to them. The best-fit sampling-driven diversity model (SDDM) regression line (pink) deviates strongly from the theoretical relationship (the dashed 95% confidence intervals do not encompass the thick line; table 1), and y’ and x’ form a very strong (but false) linear relationship (r² = 0.973). Regression residuals (vertical lines) show clear structure. One pair of model comparisons out of the 5000 simulations is shown.

Figure 3. SDDM regression predictions are systematically biased. (a) Standard regression lines (blue) for 5000 simulated datasets at σe = 0.5 deviate randomly around the theoretical relationship (thick line), with the mean slope showing no significant difference from the theoretical slope b = 0.6 (table 2). (b) SDDM regression lines on decoupled datasets (pink) deviate systematically away from the theoretical relationship (thick line), with a significant difference between the mean regression slope and the theoretical slope (table 2).

Figure 4. The difference between the original paired, bivariate relationship (a) and the forced, false relationship (b), shown using the data from Smith and McGowan (2007). Log-transformed marine generic diversity has a non-significant and weak relationship with log-transformed rock area (β = 0.105; r² = 0.0398; p = 0.0979; a). However, once diversity and rock area are sorted independently of each other following Smith and McGowan (2007), the relationship becomes significant and strong (β = 0.499; r² = 0.903; p < 0.001; b). Points are coloured according to their geological age, with cooler colours at the older end and warmer colours at the younger end of the timescale. Filled and outline colours in (b) correspond to the ages of the rock record and diversity, respectively, and visually demonstrate the mismatch between y’ and x’. Dashed lines are confidence intervals, while dotted lines are prediction intervals.

[Figure 4 image: (a) Diversity against Rock area; (b) Sorted diversity against Sorted rock area; log-transformed values, rock area axis 0.0 to 2.5, diversity axis 2.6 to 3.6]

Figure 5. Independently sorting any two variables results in a forced positive relationship. (a) Two randomly generated variables y and x show no significant relationships across 1000 simulations, with the slopes of the regression lines (blue) distributed around the expected slope of zero. (b) When regression models are fitted to independently sorted datasets (y’ and x’), the estimated slopes are significantly different from the expected value of zero and result in a strong positive relationship (r² ≈ 1; inset, pink) despite the unrelated nature of the original datasets (r² ≈ 0; inset, blue). (c) A bivariate dataset (y and x) was generated so as to follow a theoretical relationship (thick line) with intercept a = 0.4, slope b = −0.6 and noise e (μe = 0, σe = 0.5). Standard regression lines (blue) deviate randomly around the theoretical relationship, with the mean slope showing no significant difference from the theoretical slope b = −0.6. (d) However, once the variables are sorted independently, the regression lines (pink) deviate systematically away from the theoretical relationship, with all estimated slopes being positive. Thus, SDDM slope estimates are systematically and directionally biased.

[Figure 5 image: (a) y against x; (b) y’ against x’, with inset histogram of Frequency against r² (0 to 1); (c) y against x; (d) y’ against x’; axes span about −2 to 2]